WordPress comment POSTs by Googlebot / “slike za facebook”?
July 18th, 2012
While you may have spotted Googlebot POST requests in your logs and dismissed them, you may also have spotted some odd requests and search phrases turning up in your traffic analysis. For me it was “slike za facebook” (roughly “pictures for Facebook”). Not only was it turning up in my search-phrase stats, I could see my blog returning a 200 response to the URLs in the Google search results. The content in those responses came from a directory in my document root called “/photographyyikb/”, which sure enough existed and had the following structure:
ls -a ../photographyyikb/
.  ..  .htaccess  index.php  .tmpbz
The .htaccess rewrites the incoming URL as a parameter to index.php, which contains obfuscated code that (I’m guessing; I didn’t look that hard) extracts ‘articles’ from what appears to be a pseudo-filesystem built in the .tmpbz directory tree. You can get a good idea of what kind of content was in that filesystem by searching Google for the directory name. I’m guessing it’s plain-old link farming … only using farmland that’s not your own, obviously.
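If you want to check your own document root for something similar, here is a rough Python sketch. It only encodes the structure I found (an .htaccess with a RewriteRule pointing at index.php, an index.php, and a hidden subdirectory like .tmpbz); the function name and the heuristic are my own, not anything taken from the exploit. A normal WordPress root that happens to contain a hidden directory such as .well-known will also be flagged, so treat hits as things to inspect by hand.

```python
import os
import re
import sys

# Heuristic (my own assumption): a RewriteRule line whose target contains index.php.
REWRITE_TO_INDEX = re.compile(
    r"^\s*RewriteRule\s+\S+\s+\S*index\.php", re.IGNORECASE | re.MULTILINE
)


def find_suspect_dirs(docroot):
    """Return directories under docroot that have an .htaccess rewriting to
    index.php, an index.php, and at least one hidden subdirectory."""
    suspects = []
    for dirpath, dirnames, filenames in os.walk(docroot):
        if ".htaccess" not in filenames or "index.php" not in filenames:
            continue
        if not any(d.startswith(".") for d in dirnames):
            continue
        htaccess = os.path.join(dirpath, ".htaccess")
        with open(htaccess, encoding="utf-8", errors="replace") as fh:
            if REWRITE_TO_INDEX.search(fh.read()):
                suspects.append(dirpath)
    return sorted(suspects)


if __name__ == "__main__":
    # Usage: python find_suspects.py /path/to/docroot
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_suspect_dirs(root):
        print(path)
```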
If you want an archive of the files from the exploit, including the .htaccess, the index.php and the .tmpbz directory, there’s a 10MB tarball here. When I search for unique-looking sequences of words from the injected content, other sites appear to have been compromised in a similar way, although the URL varies slightly. The URL seems to be made up of the stem “photography”, “pics” or “photos” with a 4-character string appended (“yikb” in my case).
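That naming pattern is easy enough to check with a regular expression. This is only a sketch: it assumes the 4-character suffix is always lowercase ASCII letters, which is a guess based on “yikb”, and legitimate names such as “photographytips” or “picsmall” would match too.

```python
import os
import re
import sys

# Stem ("photography", "photos" or "pics") plus exactly four lowercase letters.
# The lowercase a-z suffix is an assumption based on the one example I have.
SUSPECT_NAME = re.compile(r"(?:photography|photos|pics)[a-z]{4}")


def suspicious_names(names):
    """Return the entries in names that fit the stem + 4-letter pattern."""
    return [n for n in names if SUSPECT_NAME.fullmatch(n)]


if __name__ == "__main__":
    # Usage: python suspicious_names.py /path/to/docroot
    docroot = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in suspicious_names(sorted(os.listdir(docroot))):
        print(name)
```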
The exploit itself appears to be known. If I understand correctly it’s this one:
http://bot24.blogspot.co.uk/2012/07/xss-redirector-and-csrf-vulnerabilities.html
My blog did indeed have the Akismet setting “Auto-delete spam submitted on posts more than a month old” checked. If the POSTing Googlebot is bothering you very much, you could also try something from this page:
http://rankexploits.com/musings/2012/comment-control-for-worpdress-htacess-rules/
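The user-agent string is trivially faked, so it’s worth checking whether these POSTs really come from Google before blocking or reporting anything. Google’s documented method is a reverse DNS lookup on the client IP, a check that the hostname ends in googlebot.com or google.com, and then a forward lookup confirming that hostname resolves back to the same IP. A minimal Python sketch of that check follows; the resolver parameters exist only so the lookups can be swapped out for testing, and socket.gethostbyname_ex only returns IPv4 addresses, so IPv6 clients would need socket.getaddrinfo instead.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a client IP claiming to be Googlebot."""
    try:
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:  # covers socket.herror / socket.gaierror: no PTR record
        return False
    if not hostname.rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        _, _, addresses = forward(hostname)  # IPv4 addresses for that name
    except OSError:
        return False
    # A spoofed PTR record will not normally resolve back to the same IP.
    return ip in addresses
```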
The POSTs to wp-comments-post seem to originate (for the ones I’ve checked) from Brazil, Turkey, Saudi Arabia and China. Some of the networks the traffic appears to come from have abuse/spam email addresses; some don’t. I sent out a few emails with highlights from my web log, but I suspect all I’ve accomplished is to get myself on some premium spam lists.
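To pull the relevant lines out of a web log before sending abuse reports, something like the following Python sketch should do, assuming the Apache/nginx “combined” log format. The regex and function name are my own; a custom LogFormat would need a different pattern.

```python
import re
import sys
from collections import Counter

# Assumed "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_comment_posts(lines):
    """Count, per client IP, POSTs to wp-comments-post.php whose UA claims Googlebot."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in some other format
        if (m["method"] == "POST"
                and "wp-comments-post.php" in m["path"]
                and "Googlebot" in m["agent"]):
            counts[m["host"]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python comment_posts.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ip, n in googlebot_comment_posts(fh).most_common():
            print(n, ip)
```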
I’m not sure I haven’t put 2 and 2 together and come up with 7 here. I don’t have an explanation that firmly links the new content directory to the (not) Googlebot POSTs; the two certainly coincided in time. I just put this online because there didn’t seem to be many other people experiencing the same thing. I hope it helps.
Update: this seems to be a well-established exploit. This page (deliberately not linked; you’ll have to copy and paste it) lists many similarly affected WordPress blogs:
http://www.tcvv.org/cgi-bin/autolink.cgi/www.billygamble.com/www.infobarrel.com/www.infobarrel.com/www.thehackingforum.com
Here’s a saved copy of the page, just in case: www.thehackingforum.com.html in zip (~1MB)
Update 2: Google have quarantined blog.lolyco.com and lolyco.com. I’m not sure what the detection method is, but it was preceded by a Webmaster Tools message about a massive rise in 404s (I don’t often use Webmaster Tools; I only looked because I saw the message in search results). The owners of /photographyyikb/ must be making links to lolyco.com available to Google’s crawler on other sites before the pages exist on lolyco.com. Or perhaps they’re merely failing to synchronise their URLs across all their hijacked hosts. I’m not sure what to do with the list of 521 URLs that Google reports as returning 404: what use is it without the referring URLs?
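Google doesn’t supply the referrers, but if your server logs in the “combined” format, your own access log records a Referer header for each 404. Here is a rough Python sketch (assumed log format, hypothetical function name) that groups 404s by path and lists the referrers seen for each. Crawlers generally don’t send a Referer at all, so expect a lot of “-” entries; any real referrers will mostly come from human visitors following the planted links.

```python
import re
import sys
from collections import defaultdict

# Assumed "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrers_for_404s(lines):
    """Map each path that returned 404 to the set of Referer values seen ("-" = none sent)."""
    result = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if m and m["status"] == "404":
            result[m["path"]].add(m["referer"])
    return dict(result)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python referrers_404.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for path, refs in sorted(referrers_for_404s(fh).items()):
            print(path, ", ".join(sorted(refs)))
```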