Fighting spam with reCaptcha (Part 1)

Posted on Dec 17, 2007 in Tech

I have a couple of sites which suffer from a deluge of contact form spam. This particular type of spam attempts to use a site’s contact form to send out unsolicited email. My Cape Holiday site has been receiving a steady stream of these emails over the last few weeks.

Up to now I have relied on server settings and good-quality form-handling code to keep spam at a manageable level. Matt Wright’s CGI script was my standby for years, and just recently I switched to the PHP form script from Tectite. Both are great, and both have some fantastic settings for keeping spam to a minimum. Even so, the time came this week for me to establish a new line of defence for my sites.

A challenge-response test seemed the logical route forward, so I sallied forth this week to find a captcha that would work for me (preferably free and not difficult to implement). Let me add at this point that my knowledge of programming languages is fairly non-existent, so my second criterion demanded a method that took this into account.

I toyed with the idea of using the free captcha offered on the Tectite website, but I wasn’t crazy about the images it generated. Other versions presented themselves until my attention was caught by a version that offered a unique angle.

reCaptcha Screenshot

ReCaptcha is a spam fighter with a heart: this little beauty fights spam while helping to digitize libraries of books.

reCaptcha OCR Example

The graphic above illustrates the problem with scanning documents: often the image is not of sufficient quality for the OCR software to pick up the exact words. Any word that the OCR software flags as “hard to read” needs to be checked by a human eye. So reCaptcha uses its challenge-response test to verify the scanned documents held in its databases. The challenge you receive from the server comes in two parts. The first word is known and forms the actual challenge; the second word is for the benefit of the research being carried out. By typing out the second challenge word you are helping the archivists to “decipher” their scanned texts. How cool is that?
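
For the programmers among you, the two-word idea can be sketched roughly as follows. This is only an illustration in Python, not reCaptcha's actual code: the function names, the `votes` store and the three-vote threshold are all my own assumptions, made up to show the principle that only the known word decides whether you pass, while your reading of the unknown word is simply recorded.

```python
from collections import Counter, defaultdict

# Hypothetical store: readings collected for each unknown (OCR-failed) word.
votes = defaultdict(Counter)


def normalise(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


def check_challenge(known_answer, unknown_id, typed_known, typed_unknown):
    """Return True if the visitor passes the challenge.

    Only the known word decides pass or fail. If the visitor passes,
    their reading of the unknown word is recorded as one vote.
    """
    if normalise(typed_known) != normalise(known_answer):
        return False  # failed the real test, so ignore the second word
    votes[unknown_id][normalise(typed_unknown)] += 1
    return True


def best_guess(unknown_id, min_votes=3):
    """Return the most common reading once enough visitors agree, else None.

    The threshold of 3 is an arbitrary value chosen for this sketch.
    """
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch, a visitor who types the known word correctly is let through whatever they type for the second word, and a reading of the second word is only trusted once several visitors have independently typed the same thing.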

OK, so after reading about this project I wanted to use this system on my own site. Have I mentioned my two criteria yet? Sure I have, but just in case here they are again:

My captcha needs to be 1. Free and 2. Easy to implement.

Well, truth be told, I struggled on this one. In Part Two of this post I’ll discuss my method of implementing this system on my site.