Sean's blog

Pos Malaysia widget in Malay and Chinese (browser language detection)

March 2nd, 2010

Browser language 'en'. Netherlands quote requests 1-to-NL.js

Shipping widget appearance for browser language 'en'.

Adding the Pos Parcel rates (ripped yesterday) to the previously ripped Pos Laju rates made it glaringly obvious how difficult it is to write software for data that’s in as bad a shape as is Pos Malaysia’s. With the problems in country names I’d already mentioned (Luxembourg/Luxzemboug, Netherlands/Netherland, United Kingdom/Great Britain), the benefit of replying to a shipping quote query by weight / country is lost. If the country names don’t match for different datasets, there’s no easy way of grouping different quotes for the same destination together.

I mentioned ISO 3166-1, the ISO country code standard, before. I wrote a quick tool to map Pos Malaysia country names (there’s some really funny stuff in there!) onto ISO 3166-1 2-letter codes in my version of their data, and instantly broke all my code. I’d written the demo code to use Pos Malaysia’s names, so I had to convert everything to use the 2-letter ISO codes. That finally merged all the strangely-named countries. Having done that, the only sensible option left for the API (and it’s a good one!) was to use the 2-letter codes. The query for 0.5kg to Netherlands is now:
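The mapping tool itself isn't shown in the post. Here is a minimal Java sketch of one way to do it, assuming a hand-maintained alias table for the misspellings (the entries are the variants named in these posts) with every other name matched against the English country names already in the JDK's Locale data. `CountryCodeMapper` and `toIsoCode` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: normalise free-format carrier country names to ISO 3166-1 alpha-2 codes.
class CountryCodeMapper {
    // Hand-maintained aliases for variants that no standard name list will match.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("luxzembourg", "LU");
        ALIASES.put("netherland", "NL");
        ALIASES.put("great britain", "GB");
        ALIASES.put("german", "DE");
    }

    // Lower-cased English country name -> ISO code, built from the JDK's own locale data.
    private static final Map<String, String> BY_ENGLISH_NAME = new HashMap<>();
    static {
        for (String code : Locale.getISOCountries()) {
            Locale region = new Locale.Builder().setRegion(code).build();
            BY_ENGLISH_NAME.put(region.getDisplayCountry(Locale.ENGLISH).toLowerCase(Locale.ROOT), code);
        }
    }

    // Returns the 2-letter code for a carrier's country name, or null if it can't be matched.
    static String toIsoCode(String carrierName) {
        String key = carrierName.trim().toLowerCase(Locale.ROOT);
        String code = ALIASES.get(key);
        return code != null ? code : BY_ENGLISH_NAME.get(key);
    }
}
```

A null result flags a name that a human still needs to add to the alias table, which is safer than guessing.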

~~http://spider.my/pos-malaysia-shipping-quote-2/0.5-to-NL.xml~~
(updated 31st Dec 2010)
http://spider.my/api/shipping-quote/0.5-to-NL-ex-MY.xml

Using two-letter codes also meant I lost my source of country names for the pop-up selection menus on my pages. Fortunately Java comes equipped with a Locale class which can provide a nice country name for a 2-letter code. Here’s the good bit though: it can give the name in any language for which it has the necessary platform data. Coverage is pretty good, so widely-spoken languages in Malaysia like Malay, English and Chinese are available for free! Unfortunately for the large local Tamil-speaking population, Tamil doesn’t seem to be included in the Java platform at the moment.
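To illustrate the Locale trick, here is a short sketch that uses only standard `java.util.Locale` calls. `CountryNames.name` is a hypothetical helper. According to the `getDisplayCountry` documentation, when no localised name is available the method falls back to English, and then to the code itself.

```java
import java.util.Locale;

// Sketch: localised country names straight from the JDK's locale data.
class CountryNames {
    // Display name of an ISO 3166-1 alpha-2 region, in the language given by a tag such as "ms" or "zh".
    static String name(String isoCode, String languageTag) {
        Locale region = new Locale.Builder().setRegion(isoCode).build();
        return region.getDisplayCountry(Locale.forLanguageTag(languageTag));
    }
}
```

`CountryNames.name("NL", "en")` gives "Netherlands". The exact Malay and Chinese strings depend on the locale data shipped with the JDK in use. The screenshot captions in this post show Belanda and 荷兰.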

Since I was using Locale to provide country-code-to-name mapping, I added a small amount of extra code to detect browser language preference. Now if you use the widget demo, you’ll get it ‘your way’ if your browser is set to ask for Malay (ms), English (en – the default) or Chinese (zh). Other browser settings will probably get you a best-effort at country names and everything else in English.
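The post doesn't show the detection code. A minimal sketch of one approach feeds the browser's Accept-Language header to the JDK's RFC 4647 matching (`Locale.LanguageRange.parse` and `Locale.lookup`, available since Java 8) and restricts the result to the three supported languages. `LanguagePicker` is a hypothetical name, and the English default matches the behaviour described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: choose ms, en or zh from an Accept-Language header, defaulting to English.
class LanguagePicker {
    private static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("ms"), Locale.ENGLISH, Locale.forLanguageTag("zh"));

    static Locale pick(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return Locale.ENGLISH;
        }
        try {
            // "Lookup" tries ranges in q-value order, truncating e.g. "zh-CN" to "zh"
            // until one of the supported locales matches.
            Locale match = Locale.lookup(Locale.LanguageRange.parse(acceptLanguage), SUPPORTED);
            return match != null ? match : Locale.ENGLISH;
        } catch (IllegalArgumentException malformedHeader) {
            return Locale.ENGLISH;
        }
    }
}
```

For example, `pick("zh-CN,zh;q=0.9,en;q=0.8")` selects Chinese and `pick("fr-FR")` falls back to English.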

Browser language 'ms' (Malay). Belanda (Netherlands) uses same 1-to-NL.js

Browser language 'zh' (Chinese). 荷兰 (Netherlands) uses same 1-to-NL.js

That’s more or less as far as setting up an API for Pos shipping quotes can go. I’ve got one more demonstration of the API that I want to do, and then I’ll do a recap of the current state of the API. Not a hint of a reply to my emails to Pos yet; I’ll try phoning later in the week.

Posted in Fixed, software, Spider.my

NEW: Pos Malaysia Parcel rates

March 1st, 2010

All this work is now packaged up into the spider.my Shipping Quotations API. Check there for latest shipping API methods, shipping modules etc.

0.5kg to France, 4 quotes: Pos Laju Document and Parcel, Pos Parcel Surface and Air

This morning I modified the International Pos Laju rates ripping code to get a full set of Pos Malaysia’s International Parcel rates. If you use the widget on my blog, you’ll see that the countries to which Pos Malaysia ships by Pos Laju will also offer Pos Parcel ‘Surface’ and ‘Air’ prices where these are available. Try 0.5kg to France, for example. You should see something like the image here.

There are far more countries listed under the Parcel shipping method at Pos’ website, but I’m not listing those for now. I mentioned earlier that there’s an issue with free-format country names on Pos’ website. The Parcel method country names are just as bad as the Pos Laju ones, listing Yugoslavia (for example), which hasn’t existed as a country since 2003. Also, you won’t see parcel quotes for Germany because Pos’ website has it listed as ‘German’ (no ‘y’; see also Luxembourg and Luxzembourg, Netherlands and Netherland). And the United Kingdom is listed as ‘Great Britain’. The ‘new’ countries (including misspelled ones) are all available from the AJAX shipping quotation demonstration page. See the updated XML for a 0.2kg shipment to Bahrain. Full rating tables are also available for download from the shipping rates download page. Here, for example, are all rates for Norway.

It’s an eye-bleedingly painstaking job ripping this data. I mentioned previously the omission of per-country max weights on some Pos Laju quotes, resulting in a default (I guess) max weight of 999kg being offered. The max weights all seem to be present on the Parcel quotes, but the tables don’t all give prices up to the max weight, using ‘-’ (I imagine) to signify that the upper weights are not available. This simply isn’t good enough for use in 3rd-party computer systems. I incorporate some guesses in my ripping code so that it can run to completion despite the inconsistencies in the Pos data. That alone would be a good reason why no entity other than Pos Malaysia should provide shipping data.
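The ripping code isn't shown. The hypothetical Java sketch below illustrates the two kinds of guess described above. It assumes that a ‘-’ or empty cell means "band not offered" and that a declared maximum of 999kg means "maximum never set". The class, field and method names are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cope with two inconsistencies in scraped rate tables.
class RateRowParser {
    // Assumed placeholder: a declared maximum this high probably means "never set".
    static final double SUSPECT_MAX_WEIGHT_KG = 999.0;

    // One price band: parcels up to maxKg cost priceRm; a null price means "not offered".
    static final class Band {
        final double maxKg;
        final Double priceRm;
        Band(double maxKg, Double priceRm) { this.maxKg = maxKg; this.priceRm = priceRm; }
    }

    // Pairs band limits with scraped cells, treating "-" (or an empty cell) as not available.
    static List<Band> parseRow(double[] bandLimitsKg, String[] cells) {
        if (bandLimitsKg.length != cells.length) {
            throw new IllegalArgumentException("band/cell count mismatch");
        }
        List<Band> bands = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            String cell = cells[i].trim();
            Double price = (cell.isEmpty() || cell.equals("-")) ? null : Double.valueOf(cell);
            bands.add(new Band(bandLimitsKg[i], price));
        }
        return bands;
    }

    static boolean maxWeightLooksMissing(double declaredMaxKg) {
        return declaredMaxKg >= SUSPECT_MAX_WEIGHT_KG;
    }

    // Heaviest band that actually has a price; this may be below the declared maximum.
    static double effectiveMaxKg(List<Band> bands) {
        double max = 0.0;
        for (Band band : bands) {
            if (band.priceRm != null) {
                max = Math.max(max, band.maxKg);
            }
        }
        return max;
    }
}
```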

I’ve sent emails so far to csc and corpcomm at Pos in an attempt to find someone I can talk to about this stuff. It would take only minutes to put the API online on Pos Malaysia’s domain, and possibly only a day to apply Pos branding to it. If (as I hope) Pos were to use spider.my’s Spinneret web server, they would instantly be able to serve millions of queries per hour from just about any old hardware they had lying about the office. In tests on my 4-year-old laptop, Spinneret can serve an average of 1,500 shipping quotations per second – before optimisations, and with the server test software running on the same machine! I should have made Spinneret open source before now, but would provide a full set of sources to Pos in any case.

If anybody has a suggestion of who I could contact to expedite this matter, please let me know. Thanks!

Posted in Fixed, software, Spider.my, Spinneret

HTTP Header viewer updates: see redirects and test If-Modified-Since

February 28th, 2010

Just a quickie to bring your attention to a couple of updates to my Server Headers viewer page at spider.my, making it a bit more useful. Now you can check server responses to If-Modified-Since, see requests that are redirected, and see the effect that persistent ‘keep-alive’ connections have.

If-Modified-Since

I’ve added an If-Modified-Since field so that you can test your server’s ‘Last-Modified’ response. The best-known example will probably be browser caching. If your browser can do it (most can), it will cache content from the Internet so that the next time it needs to build a page that uses the same content (think images, style files – that kind of thing), it can retrieve it from memory or your hard disk instead of downloading it from the remote server again.

Browser caching works like this. When the browser is about to repeat a request for some content that is still in its cache, it sends the server the “Last-Modified” date it received with the cached copy, as an “If-Modified-Since” field. The server, if it is capable, checks the “If-Modified-Since” field against the time it knows the content was last modified. If the content hasn’t been modified since the “If-Modified-Since” time, the server sends a “304 Not Modified” response. A 304 response has no content, so it’s a big saving on bandwidth. If the content has been modified, the server sends the new content along with a “Last-Modified” field carrying the new modification time.
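On the client side, Java's `URLConnection.setIfModifiedSince(long)` adds the header for you. The server's half of the exchange is a date comparison. Below is a minimal sketch of that decision using `java.time`. `ConditionalGet` is a hypothetical name, and real servers also consider headers such as ETag / If-None-Match, which this ignores.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch of the server-side decision for a conditional GET.
class ConditionalGet {
    // HTTP-dates look like "Tue, 27 Oct 2009 20:18:44 GMT" (RFC 1123 format).
    static ZonedDateTime parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    // Returns 304 if the resource hasn't changed since the client's date, otherwise 200.
    static int statusFor(String ifModifiedSince, ZonedDateTime lastModified) {
        if (ifModifiedSince == null) {
            return 200; // unconditional request: send the full response
        }
        try {
            long since = parseHttpDate(ifModifiedSince).toEpochSecond();
            // HTTP-dates only have one-second resolution, so compare whole seconds.
            return lastModified.toEpochSecond() <= since ? 304 : 200;
        } catch (DateTimeParseException invalidDate) {
            return 200; // an unparseable date is ignored, as if the header were absent
        }
    }
}
```

Suppose (with illustrative dates) a resource was last modified on "Mon, 26 Oct 2009 20:18:44 GMT". An If-Modified-Since of the following day gets a 304, and a date before the modification gets a full 200 response.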

For an example, check Google’s response to a HEAD request for its favicon:

Google server headers for /favicon.ico

You can see it was “Last-Modified” 4 months ago. Something to point out here is the Content-Length of ‘0’ (zero). I think that’s a wrong response by Google’s webserver. RFC 2616, the HTTP/1.1 specification, says:

The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.

If you send the same request to www.google.com as a GET request, the Content-Length is 1150 bytes (28th February 2010), which is clearly not identical. Sending ‘Content-Length: 0’ in response to a HEAD request is redundant: the body MUST NOT be returned, so a zero length tells the client nothing it didn’t already know. Sending the same Content-Length in a HEAD response as in the GET response is a good idea, because it provides a facility to ‘check’ a resource before it is transferred. For the Content-Length field, HEAD can be thought of as asking “what length is this resource?”.

Sticking with Google, we can fill in the If-Modified-Since field with the day after the Last-Modified date given for the favicon. This is the response when If-Modified-Since is set to ‘Tue, 27 Oct 2009 20:18:44 GMT’:

304 Not Modified response from Google to If-Modified-Since request

See the ‘(Response Status)’ line showing the “304 Not Modified” response? Google’s webserver is clearly responding correctly to the If-Modified-Since header field.

Connection: close

Spider.my uses Java’s HttpURLConnection, which defaults to an HTTP/1.1 persistent connection. That means it won’t immediately close its connection to a server, in the expectation that the connection will be used again. When the expectation is valid, a lot of time is saved by not setting up a new connection for each subsequent request, which greatly improves latency. Try a few requests for big sites’ favicons with and without Connection: close. If you get similar results to me, you should see that the ‘Time to Headers’ figure with ‘Connection: close’ is about double what it is with a persistent connection. It looks to me as though an extra round-trip (presumably the TCP handshake for a fresh connection) is needed for non-persistent connections.

Follow redirects

When your browser lets you type in just a domain name and then updates the URL bar to say ‘www.example.com/home.html’, chances are that the webserver at example.com has sent your browser a redirect to say “don’t ask for that, ask for this instead”. Redirects are also very common on sites with forms that update server-side data. If you set “Don’t follow redirects”, you’ll be able to follow the conversation between the client and a server in much greater detail. The default for the page is to follow redirects, so if you don’t check the box, you’ll only see the final page in the redirect chain.

I’ve just noticed that Java’s HttpURLConnection doesn’t record the final URL in a redirect, so when redirect-following is selected, the page currently reports the requested URL rather than the URL that finally produced the response. That’s a job for the next update!
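The viewer's own code isn't shown. One way to see every hop, including the URL that finally answers, is to switch off automatic redirects with `setInstanceFollowRedirects(false)` and follow Location headers by hand. Below is a minimal sketch. It assumes the servers involved answer HEAD requests in the same way as GET (not all do), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Sketch: follow redirects by hand so every hop, including the final URL, is recorded.
class RedirectTracer {
    // Resolves a Location header (absolute or relative) against the URL that returned it.
    static String resolveLocation(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    // Returns the URLs visited in order; the last entry produced the final response.
    static List<String> trace(String startUrl, int maxHops) throws IOException {
        List<String> chain = new ArrayList<>();
        String current = startUrl;
        for (int hop = 0; hop <= maxHops; hop++) {
            chain.add(current);
            HttpURLConnection conn = (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // report each 3xx instead of only the end result
            conn.setRequestMethod("HEAD");
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return chain; // not a redirect (a 304, for example, has no Location)
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects from " + startUrl);
    }
}
```

`trace("http://example.com/", 5)` would list the requested URL first and the final URL last. `URI.resolve` handles both absolute and relative Location values.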

Posted in Spider.my

Cross-domain javascript widget, no JSON, no AJAX

February 28th, 2010

I may be doing something terribly wrong. I hope the Internet will tell me. I wanted to make a widget so that I could showcase some AJAX-accessible shipping quotes I’d put on spider.my, but demonstrate accessibility from another site. I’d never written much javascript before this week, so I was very pleased that my first attempt worked so quickly – and then I discovered the Same Origin Policy (SOP). My first attempt was just some javascript that dynamically created the HTML (just thought – shouldn’t the widget markup match the document’s doctype?) elements of the widget, and then responded to user input by sending a request for a shipping quotation to the XML responder I wrote a few days ago.

That first attempt worked great – while it was included in a page from the same server that provided the shipping quote. The first time I tried to include it as a widget on this blog, it failed. According to the Inspect Element facility in Chromium, the problem is:

Uncaught Error: NETWORK_ERR: XMLHttpRequest Exception 101

when the send() function was invoked on the XMLHttpRequest object. At the same point Firefox says

Error: uncaught exception: [Exception… “Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIXMLHttpRequest.send]” nsresult: “0x80004005 (NS_ERROR_FAILURE)” location: “JS frame :: http://localhost:22791/static/js/posmalaysia5.js :: pos_spider_box_quote :: line 26” data: no]

And in IE8 the debugger tells me (on the open() method of XMLHttpRequest)

JScript debugger
Breaking on JScript runtime error – Access is denied

The request is received by my server from Firefox and Chromium, but not from IE8 (obviously, I suppose, if it’s failing on the open() method).

The Internet told me I was attempting Cross-domain scripting and it’s impossible unless you use version 4.8 of Firefox, pretend you’ve fixed it by using a proxy server, or defeat impossibility by using jQuery, flxHR or some other library whose developers presumably don’t understand the word ‘impossible’.

I decided that either this bloody thing is impossible or it isn’t, and that if it isn’t, why should I need a library to do what I wanted to do? For my specific application (or any I can currently imagine I might want to implement) I just want to exchange a very few items of data. If there is a way of doing it, how complex does it have to be?

Anyway, I didn’t have a clue a couple of days ago, so I started with what seemed to me to be an unusually straightforward article on cross-domain scripting at IBM:

Cross-domain communications with JSONP, Part 1: Combine JSONP and jQuery to quickly build powerful mashups

A quick read of the document led me to think that JSON wasn’t all that bad. If it solved my problem, I could probably bring myself to use it. The article ‘goes off on one’ about jQuery in the later stages, but I decided I would cross that domain when I came to it. I was of the opinion that JSON alone would give me what I want.

At the crucial part where the article says you have to dynamically create a script element in the document’s HEAD element, I thought I’d try a little experiment and just send some bare javascript assignments instead. It just worked! Before I work through an example, here is a rough run-down of what I had at the time it first worked.

The code I invite people to use to try out the widget is a DIV element with a SCRIPT element inside it whose SRC attribute is the URL of a text/javascript resource at spider.my. Here’s the widget-including fragment one more time:

<div id="pos.spider.box"><script type="text/javascript" src="http://spider.my/static/js/posmalaysia4.js"></script></div>

The script creates the widget content. It also sets a var, which has global scope because it is declared outside any function block, to the DIV which will hold the ‘result’ of a shipping quotation.

A text INPUT box and a SELECT element in the widget both have onchange handlers that call a function that’s part of the widget download. In that function, I assemble a URL from the values in the input box and the select element, and create a SCRIPT element in the document’s HEAD with the constructed URL as its SRC attribute.

The constructed URL is identical to the one I use to request an XML response for the one-shot shipping quotation AJAX script, but ends in “.js” instead of “.xml”. I modified my XML-producing server-side code to output javascript as a text/javascript response if the URL ends with “.js”, and it sends back a single javascript assignment (not a function) to set the contents of the ‘result area’ of the widget, using the reference prepared earlier by the widget code.

Here’s an example of the URL I use as the SRC attribute of the SCRIPT element the widget code creates in the document’s HEAD:

http://spider.my/pos-malaysia-shipping-quote-2/0.5-to-Australia.js

That’s for a 0.5kg shipment to Australia. Here’s the text/javascript response from spider.my:

pos_spider_box_quotes.innerHTML = 'RM60.00 <i>Pos Laju Document</i><br>RM75.00 <i>Pos Laju Parcel</i>';

See? Just a bare assignment to the var I prepared earlier. And lo and behold, the results are set in the widget! What is ‘impossible’ about that?
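The post describes the server side only in words. The hypothetical Java sketch below shows a responder that picks XML or a bare JavaScript assignment from the URL's extension, producing the response shown for Australia. The path pattern, the XML layout and all class and method names are assumptions. The JavaScript text is escaped so that a stray quote in a method name can't break the assignment.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one quote URL, two response formats chosen by its extension.
class QuoteResponder {
    // Matches paths like "0.5-to-Australia.js" or "0.5-to-Australia.xml".
    private static final Pattern PATH = Pattern.compile("(\\d+(?:\\.\\d+)?)-to-([A-Za-z]+)\\.(js|xml)");

    static final class Quote {
        final String price;
        final String method; // assumed to be a trusted, server-generated string
        Quote(String price, String method) { this.price = price; this.method = method; }
    }

    // Returns {weight, destination, format}, or null if the path doesn't match.
    static String[] parse(String path) {
        Matcher m = PATH.matcher(path);
        return m.matches() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    static String contentType(String format) {
        return format.equals("js") ? "text/javascript" : "text/xml";
    }

    static String body(List<Quote> quotes, String format) {
        if (format.equals("js")) {
            StringBuilder html = new StringBuilder();
            for (int i = 0; i < quotes.size(); i++) {
                if (i > 0) html.append("<br>");
                html.append(quotes.get(i).price).append(" <i>").append(quotes.get(i).method).append("</i>");
            }
            // A bare assignment to the global var the widget script set up earlier.
            return "pos_spider_box_quotes.innerHTML = '" + jsString(html.toString()) + "';";
        }
        StringBuilder xml = new StringBuilder("<quotes>");
        for (Quote q : quotes) {
            xml.append("<quote method=\"").append(xmlEscape(q.method)).append("\">")
               .append(xmlEscape(q.price)).append("</quote>");
        }
        return xml.append("</quotes>").toString();
    }

    // Escapes text for a single-quoted JavaScript string literal.
    static String jsString(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Keeping one URL scheme and switching only on the extension means the XML and JavaScript clients always see the same quotes.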

Now, this code started working bare hours ago, so I’m prepared to believe that there’s something dangerous, criminal or unholy about it. But presumably a bit of judicious editing will fix that, won’t it? The first thing that strikes me about this technique is that I’m dynamically adding things to the document and never removing them, so there’s a chance of morbid page obesity. Then again, it’s inline javascript that has no means of being executed more than once, so perhaps the clever javascript engine developers automatically dispose of it once it executes. Who knows? Not me, certainly! All I know is that my cross-domain javascript widget works. Despite it being impossible.

Update: after a night’s sleep, I’m wondering whether this is a cross-domain example or not. It seems to me that cross-domain should only refer to a script attempting to access a domain other than the one it was loaded from. In that case the XMLHttpRequest method should work. If ‘cross-domain’ means ‘not the site the page was loaded from’, then I can see why XMLHttpRequest shouldn’t work and why the dynamically inlined javascript would work (both the original and the dynamically created scripts being from the same domain). Just in case this is not the fantasy of a tired mind, I’m going to christen this Asynchronous Javascript InLining (AJIL, like ‘agile’). Two levels of asynchronism are an obvious problem, but maybe someone will find it useful.

Posted in Fixed, software

Pos Malaysia Shipping Widget for your website

February 26th, 2010

(Update 29th Dec 2010: The widget died while I was moving code around. It’s back again for a while until I put some pages together at spider.my explaining how the new API is going to solve all your shipping woes)

(Update 11th Jan 2011: Widget has been off and on for some time while I’ve been beating the API into shape. It should continue to work in its present form for the foreseeable future. Check out the spider.my API to see what the widget uses for shipping data.)

I’ve never written a widget before! This one hasn’t turned out bad – try changing the parcel weight and destination. See? It works! And you can have it*1 on your website too! All you need to do is to add this little bit of HTML:

…and next time you look at your webpage, it’ll have Pos Malaysia shipping rates on it, courtesy of spider.my and a very dodgy browser-security workaround to allow cross-domain … actually it isn’t really AJAX, it’s more like Asynchronous JavaScript And More JavaScript, but you’ll have to wait for the HowTo post to see how I do it (or just view my page source).

At the start of the week I said I’d demonstrate how a Pos Malaysia API (Application Programming Interface) should work. As a recap, it was to have two major parts: a simple, reliable, fast technique for obtaining a single quote in a format that can be readily used by any application, and a technique for providing a complete set of rating tables for applications that need higher performance or reliability, but also want easy access to updated rates.

It has actually been a very solid week’s work, but it’s now all done. On Monday, I demonstrated what the problem is. Pos Malaysia’s online shipping quotation page is OK*2 for humans to look at, but is too ‘fat’ and arbitrarily formatted for efficient and reliable use in e-commerce. I hinted at an AJAX replacement by caching quotes on my ‘old-technique’ page.

On Tuesday I started ‘ripping’ Pos Malaysia’s shipping rates. The whole problem is the difficulty of access to their rates, so having my own set would make the rest of the demo much easier. To avoid upsetting the Pos Malaysia sysadmin, I wrote a set of persistent classes for the rating data on spider.my, so that I would only have to make the hundreds of requests needed to rip the data just once.

On Wednesday I tidied up the server-side access to the ripped rates and provided a reliable AJAX interface which accepts a weight and a destination and returns quotes from all qualifying methods. I currently only have Pos Laju Document and Parcel rates, so you get only one quote for weights over 1kg and two quotes for anything lighter (there are more quotes now that the Pos Parcel rates have been added). At this stage, part 1 of the job is finished: there’s now a simple technique for Pos Malaysia’s customers and partners to instantly acquire a single quote.
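The selection logic isn't shown in the post. A minimal Java sketch of "quotes from all qualifying methods" gives each method a sorted map from band upper limit to price, and offers a quote only when some band covers the requested weight. The 0.5kg prices match the Australia example in these posts. The other bands, and all class and method names, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: return a quote from every method whose rate table covers the weight.
class QuoteSelector {
    static final class Method {
        final String name;
        final NavigableMap<Double, Double> priceByMaxKg = new TreeMap<>(); // band upper limit -> RM
        Method(String name) { this.name = name; }
        Method band(double maxKg, double priceRm) { priceByMaxKg.put(maxKg, priceRm); return this; }
    }

    static List<String> quotes(List<Method> methods, double weightKg) {
        List<String> result = new ArrayList<>();
        for (Method m : methods) {
            // Smallest band limit >= the weight; null when the weight exceeds every band.
            Double bandLimit = m.priceByMaxKg.ceilingKey(weightKg);
            if (bandLimit != null) {
                result.add(String.format(Locale.ROOT, "RM%.2f %s", m.priceByMaxKg.get(bandLimit), m.name));
            }
        }
        return result;
    }
}
```

With a Document table that stops at 1kg and a Parcel table that goes further, a 0.5kg request yields two quotes and a 1.5kg request yields one, matching the behaviour described above.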

The method for complete rating data updates was quite straightforward to provide once the server-side persistence was finished. On Thursday I provided a rates download facility which can give full rating data for all countries and all methods, all methods for one country, or all countries for a single method. I have an e-commerce project ‘on a back burner’ at the moment which will use stored shipping rates to simplify and speed up its checkout. With Pos Malaysia’s current online facility that’s practically impossible. If Pos provided this kind of API it would be straightforward.

As a demonstration piece, I started writing a ‘widget’ that can be used on anybody’s blog or website on Thursday night. I’ve never written much in the way of javascript before this week, so I’ve been spending a lot of time at w3schools.com! I thought I’d finished the widget last night, but discovered the “Same Origin Policy” which prevents javascript on a page from one website loading data from another website. There seem to be a lot of very complicated solutions for this problem, but this afternoon I stumbled on a simple one.

*1 – Yes, you can put it on your blog or website. It’s not very pretty, but if you’d like to try it out I would be very pleased if you did. Let me know what you think of it functionally. One last thing: this is very experimental, so it might stop working from time to time (might, but might also keep going forever), and watch out for the rates: they don’t include Pos’ silly random surcharges.

*2 – OK… but it needs some basic maintenance! There are very many quotes which claim to allow shipments up to 999kg. It looks like Pos has never got round to setting the maximum values for all their countries. Also the country names: some of them are oddly styled, like ‘Ukraine (Kiev)’ and ‘Serbia & Montenegro’. The latter hasn’t existed since 2006, according to Wikipedia! The technique Pos uses to request a quote from their server doesn’t actually work with names like these (try getting a quote for shipping to ‘Serbia & Montenegro’ from Pos’ website). The names form part of the URL for the quote, but contain characters that aren’t valid in a URL unless they are encoded, so some countries return errors instead of quotes!
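We don't know exactly how Pos builds its quote URLs. Assuming the country name goes into a query-string parameter, encoding it with `java.net.URLEncoder` (application/x-www-form-urlencoded, where a space becomes ‘+’) would keep ‘&’ and the parentheses from breaking the request. Below is a minimal sketch with a hypothetical `CountryParam` helper. For a path segment rather than a query parameter, a space would need to be `%20` instead of ‘+’.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: make a free-format country name safe to use as a query-string value.
class CountryParam {
    static String encode(String countryName) {
        try {
            return URLEncoder.encode(countryName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this, ‘Serbia & Montenegro’ becomes `Serbia+%26+Montenegro` and ‘Ukraine (Kiev)’ becomes `Ukraine+%28Kiev%29`.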

Posted in Fixed, software
