The Increasing Sophistication of Spam

by Krishna on September 25, 2011

Erik talks about a new type of spam comment:

a series of posts have started to come in that follow a distinct pattern. They’ll include some insipid compliment not referring to the content at all, and contain exactly one misspelled word. So, something like “Wow, great contnet!” The other defining characteristic is that they all register a URL to a popular site — google, facebook, yahoo, etc. In essence, harmless as far as SPAM goes — no link to something unsavory, no obvious attempt to game search engines, etc.

[…] I believe they’re sending out feelers to test whether or not a blog has some kind of automated SPAM blocking so that they can target blogs that won’t get their IP addresses black-listed on anti-SPAM rolls of sites like akismet. The misspelled words make it a lot easier for some automated web crawling utility to go find the Spammer’s handiwork, and the harmless links tell them whether or not the blog in question will even allow them to ply their trade. Basically, the Spammer or Spammers try to create something that is borderline enough to trigger SPAM protection if it’s there, but not so blatant as to be classified as a real threat.

I have been noticing these kinds of comments too, though I am not entirely sure whether Erik’s analysis is right. At least some of these comments escape Akismet's net, while the more straightforward spam comments are caught. So someone who sends a feeler first to avoid being black-listed by Akismet may still end up in the wrong place: the feeler gets through, but the real spam that follows is caught. However, this may not be the case for other kinds of anti-spam filters, so there might be some benefit to the spammer.
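To make the pattern concrete, here is a rough sketch of how a blog owner might flag such feeler comments by hand: a short comment, exactly one word that is not in a known-word list, and an author URL pointing at a popular site. This is only my illustration of the pattern Erik describes, not how Akismet or any real filter works. The names `POPULAR_DOMAINS`, `KNOWN_WORDS`, `domain_of` and `looks_like_feeler` are made up for the example, and the tiny word list stands in for a real dictionary.

```python
import re
from urllib.parse import urlparse

# Hypothetical lists for illustration only; a real check would need a
# proper dictionary and a much longer list of popular sites.
POPULAR_DOMAINS = {"google.com", "facebook.com", "yahoo.com"}
KNOWN_WORDS = {
    "wow", "great", "content", "nice", "post", "thanks", "for",
    "sharing", "this", "article", "blog", "good", "very", "useful",
}


def domain_of(url):
    """Return the host part of a URL, without a leading 'www.'."""
    # urlparse only recognises the host if the URL contains '//'.
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def looks_like_feeler(comment_text, author_url, max_words=8):
    """Heuristic: short comment, exactly one unknown word, popular-site URL."""
    words = re.findall(r"[a-z']+", comment_text.lower())
    if not words or len(words) > max_words:
        return False
    unknown = [w for w in words if w not in KNOWN_WORDS]
    return len(unknown) == 1 and domain_of(author_url) in POPULAR_DOMAINS


if __name__ == "__main__":
    print(looks_like_feeler("Wow, great contnet!", "http://www.google.com"))
```

A check like this would produce plenty of false positives (any short comment with one unusual name or word, for instance), so at best it is a reason to hold a comment for moderation rather than delete it.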

If the analysis is correct, it shows how spammers are improving their techniques. One of the interesting uses of Project Gutenberg seems to have been by spammers who “regularly harvest PG books to provide innocent-looking text to poison your spam filter”. It is interesting how spammers will parse the content of your web page, match it with text (such as from Shakespeare), add their link and post a comment. Equally interesting is how they use social techniques such as flattery to persuade you to keep the comment in place.

Blogging software is a popular application that programmers like to write in their spare time. But with the increasing importance of blogs as a primary driver of news and entertainment on the Web, blogging software needs to be industrial-strength. Take a look at the software that drives the Top 20 blogs on the Internet. WordPress drives 9 of them, followed by Movable Type, Drupal and Typepad.

I suppose a similar pattern would be found for the Top 100 or 1000 blogs. And there is a reason. A blog needs much more infrastructure than just writing and scheduling posts. In addition to protecting against the sophisticated spammers that we saw, it also needs to make it easy to change the web design, be SEO-friendly, allow multiple authors at various permission levels and so on. Thus, blogging software that can securely support a wide variety of third-party plugins and themes will be preferred by site owners.

For example, right now, I run this site on WordPress using the Thesis theme framework. The default layout of Thesis is very pleasant and gives me a professional look with little work on my end. In addition, I have plug-ins like Akismet, a Share and Follow widget for various social networks, Google XML sitemaps, 404 redirects, database backup and typography. Some of them would have taken forever to write if I were rolling my own blog software.

