A Look Inside the Think Tank...

Blog Spam on My blogccasion Software

Created on and categorized as Technical.
Written by Thomas Steiner.

Comment spam on my personal blog

Spam seems to be one of the most widespread evils of the internet age. Since December 26 it has also reached my blog. I get a lot of poker spam from various online gambling sites. Besides the fact that I am a really poor poker player (I am good at making a poker face but bad at holding the right cards...), I prefer losing offline rather than online.

Reason enough to fight spam. I did some research and found out that comment spam seems to be a huge problem on WordPress, a popular semantic personal publishing platform. I am surprised that the spammers found their way into my homebrew blogccasion software (my reasons for developing my own blogging software). Its structure is very open (it was designed as an experiment in trusting the commenter's personal responsibility). So far I have had good results with this approach and only once decided to change a comment into a text link to the originating site (the last comment on the blog entry on the Paris riots).

The comment form passes its contents to WriteBlogComment.php, which first removes all HTML tags and then writes the database entry. I require the commenter neither to provide a valid email address nor to give a name. This open structure seems too naïve in the age of spam bots. A common way to fight comment spam is the so-called CAPTCHA, which has the drawback of being a graphical solution. There has been a lot of research on the inaccessibility of CAPTCHAs. I try to follow the web accessibility guidelines proposed by many organisations, among them the W3C. So I developed a set of trivial arithmetic problems, as proposed by the document mentioned above:

The goal [...] is to separate human from machine. One reasonable way to do this is to test for logic. Simple mathematical word puzzles, trivia, and the like may raise the bar for robots, at least to the point where using them is more attractive elsewhere.
Problems: Users with cognitive disabilities may still have trouble. Answers may need to be handled flexibly, if they require free-form text. A system would have to maintain a vast number of questions, or shift them around programmatically, in order to keep spiders from capturing them all.
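A minimal Python sketch of the idea in the quote, assuming the arithmetic puzzles are generated on the fly rather than drawn from a fixed list (the actual blogccasion code is PHP and is not shown here; `make_challenge`, `check_answer` and `WORDS` are hypothetical names). Spelling the operands out as words turns each question into a small word puzzle, and the checker accepts either digits or the spelled-out number, which is one way of handling answers flexibly.

```python
import random

# Hypothetical word list: operands are shown as words, and answers may be
# typed either as digits ("7") or as words ("seven").
WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]


def make_challenge(rng: random.Random) -> tuple[str, int]:
    """Return a (question, answer) pair for a trivial arithmetic puzzle.

    Operands lie in 0..10, so every answer lies in 0..20 and has a word form.
    """
    a = rng.randint(0, 10)
    b = rng.randint(0, 10)
    if rng.random() < 0.5:
        return f"What is {WORDS[a]} plus {WORDS[b]}?", a + b
    # Put the larger operand first so subtraction never goes negative.
    big, small = max(a, b), min(a, b)
    return f"What is {WORDS[big]} minus {WORDS[small]}?", big - small


def check_answer(reply: str, expected: int) -> bool:
    """Accept ASCII digits or an English number word, ignoring case and spaces."""
    text = reply.strip().lower()
    if text.isascii() and text.isdigit():
        return int(text) == expected
    return text in WORDS and WORDS.index(text) == expected


if __name__ == "__main__":
    question, answer = make_challenge(random.Random())
    print(question)
    print(check_answer(" Four ", 4))  # True: word answers are accepted too
```

In a real form the expected answer would have to stay on the server side (for example in the session), so that a bot cannot simply read it from the page.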

My simple Turing Test can be used for free.
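The comment handling described above (check the puzzle answer, strip all HTML tags, then write the database entry) might be sketched in Python as follows. This is an illustration under assumptions, not the actual WriteBlogComment.php: the SQLite table `comments(entry_id, name, email, body)` and the function `write_blog_comment` are hypothetical, and `strip_tags` here only approximates PHP's `strip_tags` by keeping the text content reported by the standard library's `html.parser`.

```python
import sqlite3
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops every tag (roughly like PHP's strip_tags)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(text: str) -> str:
    """Return only the text content of `text`, with all HTML tags removed."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def write_blog_comment(db: sqlite3.Connection, entry_id: int, name: str,
                       email: str, body: str, reply: str, expected: int) -> bool:
    """Store a comment only if the arithmetic answer is correct.

    Name and email stay optional, matching the open design of the blog.
    `expected` is assumed to come from server-side state such as a session.
    """
    if reply.strip() != str(expected):
        return False  # failed the simple Turing test: treat as a bot
    # Step 1: remove all HTML tags; step 2: write the database entry.
    db.execute(
        "INSERT INTO comments (entry_id, name, email, body) VALUES (?, ?, ?, ?)",
        (entry_id, strip_tags(name), strip_tags(email), strip_tags(body)),
    )
    db.commit()
    return True
```

Because entities such as `&lt;` are decoded into plain text, the stored comment should still be HTML-escaped (for example with `html.escape`) when it is rendered on the page.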
