In this episode of Hacker Public Radio, I will describe the method I chose to combat spam bots filling out my company's contact form. About 99% of the submissions we receive are spam, which makes filtering for valid messages painful. After some research into different methods, I decided to go with the honey pot method.
I was already using Bootstrap CSS for our site, so I decided to use Bootstrap's "sr-only" class. This class is used for elements that you only want visible to screen readers. It takes the element and uses a combination of absolute positioning, setting the size and width to 1 pixel, setting a negative left margin, and hiding content overflow to prevent the honey pot showing up visually. I figured if the bot was scanning CSS for classes or properties, this wouldn't trigger any warnings. It does bring up the issue of how to prevent impacting the experience of people using screen readers. I applied the aria-hidden attribute with a value of true to the label element surrounding the honey pot input field. "[this] removes that element and all of its children from the accessibility tree." So we now have the field hidden both visually in the browser and from assistive technologies. Given the short end of the stick accessibility usually gets, I doubt there are any spam bots scanning for that ARIA attribute. For the minority of users who might be viewing with the classic lynx browser, I put 'For office use' as the label text before the honey pot, hoping this would get the message across without tipping off the bot to the intended purpose of the related input field.
The other main issue with this method is the value of the name attribute used for the input field. Some argue to use obfuscated values like "mmxxName" instead of "name", or "sxysPhone" for "phone". Apparently some bots will skip fields they don't recognize. By using more standard names for multiple honey pot fields, it easier to determine if it is a bot. The counter argument to this naming scheme is about the user experience, by obfuscating the name, then browsers won't auto-fill the valid fields of the form. This also brings up the matter of not auto-filling the spam fields by the browser of your users. This is done by setting any of your honey pot input elements' "autocomplete" attributes to "off".
So far this spam filtering method is working nicely. I currently send any messages flagged as spam to a different email address with the subject prepended with the words "[Spam review]". Once I am confident there are not that many false positives, I will just skip sending flagged messages. The one issue I have experienced with this method is when using the tab key to move through the form. Since the input field is only visually hidden, it still receives focus as you tab through. If you happen to hit another key while still in the hidden field, it will get captured by the honey pot and then the submission will be flagged as spam.
I have created a sample form on my personal site. Please visit URL: http://www.horning.us/hpr/SpamBotHoneyPot.php to try it out. It is a simple PHP page using the GET method when submitting the form. Once you press the submit button you will see the form fields and their values, along with the result messages. I chose to use "URL" as the name for my honey pot input field. I use it on my example form, and I use it for my work form. For my work form, a URL is not something we ask to be submitted, and being a common field in forms, makes it very tempting for bots. In my example code, the CSS for hiding the honey pot section is from the minicss.org websites. Their "visibility-hidden" class is very similar to Bootstrap's "sr-only" class. I would be interested to hear if others have implemented something similar. I would also love to hear from someone who uses a screen reader. Does it prevent the honey pot section from being read?
- Better Honeypot Implementation (Form Anti-Spam)
- Honeypot Technique: Fast, Easy Spam Prevention
- Using the aria-hidden attribute
- Spam Bot Honey Pot example