The Increasing Sophistication of Spam

by Krishna on September 25, 2011

Erik describes a new type of spam comment:

a series of posts have started to come in that follow a distinct pattern. They’ll include some insipid compliment not referring to the content at all, and contain exactly one misspelled word. So, something like “Wow, great contnet!” The other defining characteristic is that they all register a URL to a popular site — google, facebook, yahoo, etc. In essence, harmless as far as SPAM goes — no link to something unsavory, no obvious attempt to game search engines, etc.

[…] I believe they’re sending out feelers to test whether or not a blog has some kind of automated SPAM blocking so that they can target blogs that won’t get their IP addresses black-listed on anti-SPAM rolls of sites like akismet. The misspelled words make it a lot easier for some automated web crawling utility to go find the Spammer’s handiwork, and the harmless links tell them whether or not the blog in question will even allow them to ply their trade. Basically, the Spammer or Spammers try to create something that is borderline enough to trigger SPAM protection if it’s there, but not so blatant as to be classified as a real threat.

I have been noticing this kind of comment too, though I am not entirely sure Erik’s analysis is right. Some of these comments slip past Akismet, even though the more blatant spam comments are caught. So a spammer who sends a feeler first to avoid being blacklisted by Akismet could be misled: the feeler gets through, but the real spam that follows may still be caught. That may not hold for other anti-spam filters, though, so there might be some benefit to the approach.
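The pattern Erik describes is regular enough that a site owner could check for it directly. Here is a minimal Python sketch of such a heuristic, assuming a comment consists only of its text and the author URL field. The word list, the domain list and the function names (`looks_like_feeler`, `is_popular_domain`) are hypothetical illustrations, not part of Akismet or WordPress; a real check would need a full spelling dictionary.

```python
import re
from urllib.parse import urlparse

# Hypothetical word list standing in for a real spelling dictionary.
KNOWN_WORDS = {
    "wow", "great", "content", "nice", "post", "article", "thanks",
    "for", "this", "really", "good", "blog", "awesome", "the",
}
# Hypothetical list of popular sites that feeler comments link to.
POPULAR_DOMAINS = {"google.com", "facebook.com", "yahoo.com"}


def is_popular_domain(url: str) -> bool:
    """True if the URL's host is a popular domain or one of its subdomains."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in POPULAR_DOMAINS)


def looks_like_feeler(text: str, author_url: str, max_words: int = 10) -> bool:
    """Flag short comments with exactly one unrecognised (probably
    misspelled) word whose author URL points at a popular site."""
    words = re.findall(r"[a-z']+", text.lower())
    unknown = [w for w in words if w not in KNOWN_WORDS]
    return (
        0 < len(words) <= max_words
        and len(unknown) == 1
        and is_popular_domain(author_url)
    )


if __name__ == "__main__":
    print(looks_like_feeler("Wow, great contnet!", "http://www.facebook.com/someone"))  # True
    print(looks_like_feeler("Wow, great content!", "http://www.facebook.com/someone"))  # False
```

A comment like “Wow, great contnet!” with a facebook.com URL is flagged, while the correctly spelled version is not. Because a legitimate comment with one unusual word and a popular-site URL would also match, holding flagged comments for moderation is safer than deleting them outright.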

If the analysis is correct, it shows how spammers are refining their techniques. One unexpected use of Project Gutenberg seems to have been by spammers who “regularly harvest PG books to provide innocent-looking text to poison your spam filter”. It is interesting how spammers parse the content of your web page, match it with related text (such as a passage from Shakespeare), add their link and post a comment, and how they use social techniques such as flattery to persuade you to leave the comment in place.

Blogging software is a popular hobby project for programmers. But as blogs become a primary source of news and entertainment on the Web, blogging software needs to become industrial-strength. Take a look at the software that powers the Top 20 blogs on the Internet: WordPress runs 9 of them, followed by Movable Type, Drupal and Typepad.

I suspect a similar pattern holds for the Top 100 or Top 1000 blogs, and there is a reason for it. A blog needs much more infrastructure than tools for writing and scheduling posts. Besides protecting against sophisticated spammers like these, it needs to make the web design easy to change, be SEO-friendly, support multiple authors at different permission levels, and so on. So site owners will prefer blogging software that can securely support a wide variety of third-party plugins and themes.

For example, I currently run this site on WordPress with the Thesis theme framework. The default Thesis layout is pleasant and gave me a professional-looking site with little work on my end. I also use plugins such as Akismet, a Share and Follow widget for various social networks, Google XML Sitemaps, 404 redirects, database backup and typography. Some of these would have taken forever to write if I were rolling my own blog software.

3 comments

Erik September 26, 2011 at 2:07 pm

The increasing sophistication is an interesting angle. I find myself wondering what would happen if the Spammers became sophisticated enough to have their crawling mechanism parse blog posts and create actual, meaningful replies to the content. In other words, what if Spammers were actually able to further your point or contribute to your discussion and, in exchange, helped themselves to a free backlink? Assuming that the link wasn’t to something unsavory, that would start to blur the line between Spam and automated discussion.

Finding ways to augment mind share for products or ideas is an interesting problem. It’s too bad that it’s so frequently solved through schemes and trickery.

Krishna September 26, 2011 at 2:58 pm

Thanks for your comment, Erik. I have wondered why spammers don’t already do this manually. For example, if someone is selling electronic widgets, they could participate in electronics blogs all the time and earn links back by posting knowledgeable comments. That way, they would at least be contributing something.

As for the technology, I recently saw a New York Times article about automated sports commentary that read very much as if a human had written it. So we might see something like that in comments soon. It would perhaps work best on medium-sized sites whose owners are too busy to police their comments effectively.

Amanda March 23, 2012 at 1:26 pm

Finally, a possible explanation! I have been seeing this exact trend in our blog comments as well. I was wondering whether someone was placing these links with the intention of pointing them to spammy locations later, but they all link to random Facebook profiles that do not exist. So strange, and it is getting extremely annoying: we have received at least 15 in the past two days alone. Looks like we may have to add an ugly CAPTCHA box :(
