HTTP Header viewer updates: see redirects and test If-Modified-Since
February 28th, 2010 | by Sean
Just a quickie to bring your attention to a couple of updates to my Server Headers viewer page at spider.my that make it a bit more useful. You can now check how a server responds to If-Modified-Since, follow requests that are redirected, and see the effect that persistent ‘keep-alive’ connections have.
If-Modified-Since
I’ve added an If-Modified-Since field so that you can test your server’s ‘Last-Modified’ response. The best-known use of this pair of headers is probably browser caching. Most browsers cache content from the Internet so that the next time they need to build a page that uses the same content (think images, style sheets, that kind of thing), they can retrieve it from memory or the local disk instead of downloading it from the remote server again.
The way browser caching works is that when the browser is about to repeat a request for content that is still in its cache, it sends the “Last-Modified” date and time it received with the cached copy to the server in an “If-Modified-Since” field. The server, if it is capable, compares the “If-Modified-Since” time against the time it knows the content was last modified. If the content hasn’t been modified since then, the server sends a “304 Not Modified” response. A 304 response has no body, so it’s a big saving on bandwidth. If the content has been modified, the server sends the new content along with a “Last-Modified” field giving the new modification time.
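To make the server’s side of that exchange concrete, here is a minimal Java sketch of the decision, assuming the server already knows when the resource was last modified. The class and method names (`ConditionalGet`, `statusFor`) are hypothetical, not part of any library or of spider.my; `DateTimeFormatter.RFC_1123_DATE_TIME` is the standard java.time parser for HTTP dates such as ‘Tue, 27 Oct 2009 20:18:44 GMT’.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: the server-side half of a conditional GET.
class ConditionalGet {

    /**
     * Chooses the status code for a GET that may carry an If-Modified-Since header.
     *
     * @param ifModifiedSince raw header value, or null if the client didn't send one
     * @param lastModified    when the resource last changed on the server
     * @return 304 if the client's cached copy is still current, otherwise 200
     */
    static int statusFor(String ifModifiedSince, ZonedDateTime lastModified) {
        if (ifModifiedSince == null) {
            return 200; // unconditional request: send the full content
        }
        ZonedDateTime since;
        try {
            since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        } catch (DateTimeParseException e) {
            return 200; // RFC 2616: an invalid date is treated like a normal GET
        }
        // HTTP dates have one-second resolution, so ignore any fraction of a second
        ZonedDateTime modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return modified.isAfter(since) ? 200 : 304;
    }
}
```

With a hypothetical Last-Modified of ‘Mon, 26 Oct 2009 20:18:44 GMT’, passing ‘Tue, 27 Oct 2009 20:18:44 GMT’ gets a 304, while a date a day before the modification gets a 200. The sketch leaves out RFC 2616’s rule that a date later than the server’s current time also counts as invalid.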
For example, check Google’s response to a HEAD request for its favicon:
You can see it was “Last-Modified” four months ago. Something worth pointing out here is the Content-Length of ‘0’ (zero), which I think is an incorrect response from Google’s webserver. RFC 2616, the HTTP/1.1 specification, says:
The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
If you send the same request to www.google.com as a GET request, the Content-Length is 1150 bytes (28th February 2010), which is clearly not identical. Sending ‘Content-Length: 0’ in response to a HEAD request is redundant: the body MUST NOT be returned, so the client already knows that no content will follow. Sending the same Content-Length a GET would receive is a good idea, as it provides a facility to ‘check’ a resource before it is transferred. In the case of the Content-Length field, HEAD can be thought of as asking “what length is this resource?”.
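To check the RFC 2616 rule yourself, one approach is to fetch the same URL twice with `HttpURLConnection`, once after `setRequestMethod("HEAD")` and once with the default GET, and compare the maps each returns from `getHeaderFields()`. The sketch below, using a hypothetical `HeadGetCheck` class, covers only the comparison step. You pick which header names to compare, since fields like Date will legitimately differ between two requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical checker for RFC 2616's "HEAD metainformation SHOULD match GET" rule.
class HeadGetCheck {

    /**
     * Returns the header names whose values differ between a HEAD and a GET response.
     * Both maps have the shape returned by HttpURLConnection.getHeaderFields().
     */
    static List<String> mismatches(Map<String, List<String>> head,
                                   Map<String, List<String>> get,
                                   String... names) {
        List<String> differing = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(lookup(head, name), lookup(get, name))) {
                differing.add(name);
            }
        }
        return differing;
    }

    // Header names are case-insensitive; the status line sits under a null key, so skip it.
    private static List<String> lookup(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            if (entry.getKey() != null && entry.getKey().equalsIgnoreCase(name)) {
                return entry.getValue();
            }
        }
        return null; // header absent
    }
}
```

For the favicon responses described in this post, comparing "Content-Length" would flag the difference between 0 and 1150.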
Sticking with Google, we can fill in the If-Modified-Since field with the day after the Last-Modified date given for the favicon. This is the response when If-Modified-Since is set to ‘Tue, 27 Oct 2009 20:18:44 GMT’:
See the ‘(Response Status)’ line showing the “304 Not Modified” response? Google’s webserver is clearly responding correctly to the If-Modified-Since header field.
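The same probe can be scripted in Java. `URLConnection.setIfModifiedSince(long)` is the standard way to make `HttpURLConnection` send an If-Modified-Since header; the class name `IfModifiedSinceProbe` and the use of GET rather than HEAD are assumptions for this sketch, not how spider.my itself is written.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical client-side probe for a server's If-Modified-Since handling.
class IfModifiedSinceProbe {

    /**
     * Sends a GET carrying an If-Modified-Since header and returns the status code:
     * 304 means the server considers the client's copy current, 200 means new content.
     */
    static int probe(URL url, String httpDate) throws IOException {
        // Convert an HTTP date such as "Tue, 27 Oct 2009 20:18:44 GMT" to epoch milliseconds
        long millis = ZonedDateTime.parse(httpDate, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setIfModifiedSince(millis); // sent as the If-Modified-Since request header
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Pointed at a favicon URL, a date on or after its Last-Modified time should come back as 304 from a server that handles the header correctly, and an earlier date as 200.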
Connection: close
Spider.my uses a Java HttpURLConnection, which defaults to an HTTP/1.1 persistent connection. That means it won’t immediately close its connection to a server, in the expectation that the connection will be used again. When that expectation holds, a lot of time is saved by not setting up a new connection for each subsequent request, which greatly improves latency. Try a few requests for big sites’ favicons with and without Connection: close. If you get similar results to mine, the ‘Time to Headers’ figure with ‘Connection: close’ should be about double what it is with a persistent connection. It looks to me as though an extra round trip is needed for non-persistent connections, most likely the TCP handshake to open a fresh connection for every request.
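If you’d like to reproduce the comparison outside the viewer, here is a rough Java sketch that times a HEAD request up to the point where the response headers arrive. It assumes ‘Time to Headers’ means roughly that interval; the class and method names are hypothetical. The HttpURLConnection documentation notes that `disconnect()` may close an idle persistent socket, so the sketch deliberately doesn’t call it and leaves the connection available for reuse.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical timing probe, roughly what a "Time to Headers" figure might measure.
class TimeToHeaders {

    /**
     * Sends a HEAD request and returns the milliseconds from starting the request
     * until the status line and headers have been read.
     */
    static long measure(URL url, boolean closeConnection) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        if (closeConnection) {
            // Tell the server not to keep this TCP connection open after the response
            conn.setRequestProperty("Connection", "close");
        }
        long start = System.nanoTime();
        // Connects (or reuses a kept-alive socket), sends the request, reads the headers
        conn.getResponseCode();
        long elapsedNanos = System.nanoTime() - start;
        // No disconnect(): it may close an idle persistent socket and defeat keep-alive
        return elapsedNanos / 1_000_000;
    }
}
```

To compare the two modes, call `measure(url, false)` several times in a row, then `measure(url, true)` several times. The first keep-alive call still pays for setting up the connection, so compare the later ones.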
Follow redirects
When your browser lets you type in just the domain name and then updates the URL bar to say ‘www.example.com/home.html’, chances are that the webserver at example.com has sent your browser a redirect saying “don’t ask for that, ask for this instead”. Redirects are also very common on sites with forms that update server-side data. If you set “Don’t follow redirects”, you’ll be able to follow the conversation between your browser and a server in much greater detail. The page follows redirects by default, so if you don’t check the box, you’ll only see the final page in the redirect chain.
I’ve just noticed that Java’s HttpURLConnection doesn’t record the final URL in a redirect, so when redirect-following is selected, the page currently reports the requested URL rather than the URL that finally produced the response. That’s a job for the next update!
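For a picture of what ‘Don’t follow redirects’ exposes, here is a sketch that switches off automatic redirects with `setInstanceFollowRedirects(false)` and walks the chain by hand, reading each Location header. Because it does the following itself, the last entry in its list is the URL that actually produced the final response. `RedirectTracer`, the hop limit and the use of HEAD are assumptions for illustration, not spider.my’s code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical redirect tracer: records every hop instead of following redirects silently.
class RedirectTracer {

    /**
     * Follows at most maxHops redirects by hand. Each entry is "status URL"; the last
     * entry is the URL that produced the final, non-redirect response.
     */
    static List<String> trace(URI start, int maxHops) throws IOException {
        List<String> hops = new ArrayList<>();
        URI current = start;
        for (int i = 0; i <= maxHops; i++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // the "Don't follow redirects" behaviour
            conn.setRequestMethod("HEAD");
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            hops.add(status + " " + current);
            if (status < 300 || status >= 400 || location == null) {
                return hops; // not a redirect, so this is the end of the chain
            }
            // Location is often relative in practice, so resolve it against the current URL
            current = current.resolve(location);
        }
        throw new IOException("More than " + maxHops + " redirects from " + start);
    }
}
```

Some servers answer HEAD differently from GET, so switching the request method to GET may give a chain closer to what a browser sees.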

