What's in your spamtail?
by Red Ant on Wednesday, July 29, 2009
We've recently moved this site from WordPress to Radiant, and one of the fun tasks I got to do was moving all of our comments over. The comments database table was quite large, so I naturally assumed it must be brimming with all the witty and insightful comments that people had left on our blog and that I'd somehow missed. It turns out that of the ~4000 comments, 5 were from actual people; the rest were spam.
Out of procrastination (sorry, idle curiosity), I wrote a query to break down the results a bit further, splitting on spaces, removing markup, and then counting how often each word appears. This revealed a magnificent long tail. A more detailed version appears further down.
View online to see the interactive version of this diagram
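The original query isn't reproduced here, but the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `word_counts`, the regex used to strip tags, and the sample comments are all made up for the example, and the real comments lived in a database table rather than a list.

```python
import re
from collections import Counter
from html import unescape

# Crude tag stripper: good enough for a sketch, not a real HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def word_counts(comments):
    """Strip markup, split on whitespace, and count each word.

    Counting is case-sensitive, so "Hi" and "hi" are separate words.
    """
    counts = Counter()
    for body in comments:
        text = unescape(TAG_RE.sub(" ", body))
        counts.update(text.split())
    return counts


# Hypothetical sample comments, not taken from the real table.
sample = [
    '<a href="http://example.com">nice site</a> Hi',
    "Hi nice car",
    "<b>nice</b> car",
]

counts = word_counts(sample)
for word, n in counts.most_common():
    print(word, n)
```

Sorting the counts from most to least common is what produces the long tail: a handful of words at the top, then a long run of words that appear only once or twice.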
The top 13 phrases
I was expecting porn and viagra to top the list of frequent phrases, but the most common words turned out to be:
| car | 253 |
| site | 188 |
| Hi | 158 |
| htm | 131 |
| and | 129 |
| my | 119 |
| Good | 118 |
| visit | 106 |
| nice | 106 |
| Please | 97 |
| Hello | 95 |
| Great | 93 |
| Sex | 93 |
Here is a graph of those appearing more than 3 times:
View online to see the interactive version of this diagram
Most common names
The most popular name chosen by our spammers is Ipatiplakat, followed by:
- Britney: 15
- Helga: 13
- Hillary: 12
- Replica watches: 11
- Betty: 9
- Mikle: 9
- Joey: 8
- Eddie: 8
- tramadol: 8
Most common URLs
Some variation of Google.com was popular. IPs were a little bit more straightforward: the clear winner was 81.95.146.227, with almost 30% of our spam, followed by several variations of 66.232.126.195.
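For anyone wanting to run the same kind of tally on their own comments, here is a rough Python sketch of how a per-IP share could be worked out. The function name `ip_shares` and the sample addresses are hypothetical (the addresses come from the ranges reserved for documentation), and this is not the query that was actually used.

```python
from collections import Counter


def ip_shares(ips):
    """Return (ip, count, share of total) tuples, most frequent first."""
    counts = Counter(ips)
    total = sum(counts.values())
    return [(ip, n, n / total) for ip, n in counts.most_common()]


# Hypothetical sample data, one entry per spam comment.
sample_ips = ["192.0.2.1", "192.0.2.1", "192.0.2.1", "198.51.100.7"]

shares = ip_shares(sample_ips)
for ip, n, share in shares:
    print(f"{ip}: {n} comments ({share:.0%})")
```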