Correct response for wrong Host: field in HTTP request header

September 25th, 2009 | by Sean

Still frantically trying to get Spinneret into shape for a first open-source release. I’m also working on a new project that I’m quite excited about, which is providing plenty of examples of where Spinneret could be improved!

The new project is hosted on a VPS at VPSLink (recommended, so far!). Today's curve ball was Googlebot requesting pages that belonged to the previous owner of my IP address. I could see the previous owner's domain name in the 'Host:' request header field in my debug output. Almost every request from Googlebot was making my server return a 404, because the URLs all came from the previous domain's site.

In the interests of not cluttering up search indexes with my content duplicated under an expired domain, I decided that my new site should simply reject any request whose Host: header names a different host. I couldn't decide which 4xx status code to use, but finally settled on 403 (Forbidden), with an explanation in the response body that the Host: header field was what caused the problem.
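Spinneret's own code isn't shown here, so the following is only a sketch of the same policy written as Python WSGI middleware. The host names `example.com` and `www.example.com`, and the functions `strip_port`, `reject_unknown_hosts`, `hello_app` and `call`, are hypothetical stand-ins. The wrapper compares the Host: header (minus any port) against an allow-list and answers anything else with a 403 whose body names the Host: header as the reason.

```python
from wsgiref.util import setup_testing_defaults


def strip_port(host):
    """Return the host part of a Host: header value, without any :port suffix."""
    if host.startswith("["):  # IPv6 literal, e.g. "[::1]:8080"
        end = host.find("]")
        return host[: end + 1] if end != -1 else host
    return host.split(":", 1)[0]


def reject_unknown_hosts(app, allowed_hosts):
    """Wrap a WSGI app so that requests naming any other host get 403 Forbidden."""
    allowed = {h.lower() for h in allowed_hosts}

    def wrapper(environ, start_response):
        # WSGI exposes the Host: request header as HTTP_HOST (absent if none was sent).
        hostname = strip_port(environ.get("HTTP_HOST", "")).lower().rstrip(".")
        if hostname in allowed:
            return app(environ, start_response)
        # Explain in the body that the Host: header is what caused the refusal.
        body = ("403 Forbidden: this server does not serve the host named "
                "in the Host: header (%s).\n" % (hostname or "none")).encode("utf-8")
        start_response("403 Forbidden", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


# Hypothetical host names standing in for the new site's real domain.
app = reject_unknown_hosts(hello_app, ["example.com", "www.example.com"])


def call(app, host):
    """Invoke a WSGI app directly with a given Host: header; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    environ["HTTP_HOST"] = host
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call(app, "www.example.com:8080"))
    print(call(app, "old-domain.example"))
```

As written, a request with no Host: header at all (which HTTP/1.0 clients may send) is refused too; whether a real server should instead serve its default site in that case is a judgment call.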

A quick search turned up nothing on which status code is correct. I could annoy some admins by using cURL to send wrong Host: fields to various sites and see what they return, but I don't really have the time. I'd be delighted if anyone could upgrade my ignorance!
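With cURL, the `-H "Host: ..."` option replaces the Host: header it would otherwise send. The sketch below does the same from Python's standard library, aimed at a throwaway local server rather than anyone else's site. The names `echo_host_app`, `request_with_host` and `probe_local`, and the host `old-domain.example`, are hypothetical. Passing `Host` explicitly in the `headers` argument makes `http.client` skip its automatically generated Host: header.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress the per-request log line on stderr


def echo_host_app(environ, start_response):
    """Tiny WSGI app that reports which Host: header it received."""
    body = ("Host seen: " + environ.get("HTTP_HOST", "(none)") + "\n").encode("latin-1")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def request_with_host(address, port, host_header, path="/"):
    """Connect to address:port but claim to be asking for host_header."""
    conn = http.client.HTTPConnection(address, port, timeout=5)
    try:
        # An explicit Host entry stops http.client from adding its own.
        conn.request("GET", path, headers={"Host": host_header})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def probe_local(host_header):
    """Start a local server on a free port, send one request, return (status, body)."""
    server = make_server("127.0.0.1", 0, echo_host_app, handler_class=QuietHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return request_with_host("127.0.0.1", server.server_port, host_header)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(probe_local("old-domain.example"))
```

Pointing `request_with_host` at the address of a real server would show what status code that server chooses for a Host: it doesn't serve.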
