Archive for the ‘Spider.my’ Category

LA Times reports Michael Jackson dead, HEAD missing

Friday, June 26th, 2009

(Image: Michael Jackson - from Wikipedia)

I was manually adding some reports of Michael Jackson's death to the crawl queue at spider.my this morning, when I noticed that one of the machines doing indexing had choked on a page. It wasn't long ago that I added some code ...

64-bit Java on slamd64 (64-bit Slackware)

Sunday, April 5th, 2009

(Image: slamd64 - 64-bit Slackware)

slamd64 is Fred Emmott's 64-bit Slackware project. I recently installed it on a couple of servers that provide the bulk of the processing power behind spider.my. Just like installing Slackware, installing slamd64 on a server is a matter of downloading the first CD ...

Streamyx down in the Port Dickson area

Thursday, March 5th, 2009

Well, that didn't take long. Since 1pm today (0500 GMT), our phones have been dead. I walked around to the neighbours: their phones were dead. I popped into the local Internet cafes: their phones were dead too. At the 3rd Internet cafe, the staff looked surprised to find that their ...

What not to GET. Limiting what robots will request.

Monday, September 8th, 2008

I tested the spider at spider.my a few times recently. It was previously restricted to just a few sites that their respective admins had kindly volunteered. One of the immediate problems I noticed with releasing the spider in the wild was the number of pages I was mangling between storage ...
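The excerpt stops before describing how the spider decides what to skip, so the following is only a hypothetical Python sketch of one common approach, not the method from the post: filter URLs by file extension before issuing a GET, and index a response only if its Content-Type header says it is HTML. The names `SKIP_EXTENSIONS`, `should_get` and `is_indexable`, and the extension list itself, are illustrative assumptions.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical list of extensions a text-indexing spider might not want to GET.
SKIP_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip", ".exe", ".mp3", ".avi"}


def should_get(url: str) -> bool:
    """Return False if the URL's path ends in an extension we choose to skip."""
    path = urlparse(url).path  # query string and fragment are ignored here
    ext = posixpath.splitext(path)[1].lower()
    return ext not in SKIP_EXTENSIONS


def is_indexable(content_type: Optional[str]) -> bool:
    """Accept only HTML responses, judged by the Content-Type header value."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing the media type.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ("text/html", "application/xhtml+xml")


if __name__ == "__main__":
    print(should_get("http://example.com/photo.JPG"))  # False
    print(should_get("http://example.com/index.html"))  # True
    print(is_indexable("text/html; charset=UTF-8"))  # True
```

A filename extension is only a hint: servers can return HTML from `.php` or `.asp` URLs and images from extension-less ones, so checking the Content-Type header remains useful as a second filter.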

Damerau-Levenshtein algorithm: Levenshtein with transpositions

Wednesday, August 27th, 2008

I'm still working away slowly at Spider.my, and spotted a funny loop in the search suggestions:

(Image: Search for Teusday - how about Thursday?)

The helpful hint is "maybe 'thursday' would get more results?". I'm using a simple Levenshtein distance algorithm to provide hints when only a few results ...
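As a hedged illustration of the idea in the title (not the code running on spider.my), here is a Python sketch of the restricted Damerau-Levenshtein variant, usually called optimal string alignment (OSA) distance: the usual Levenshtein insertions, deletions and substitutions, plus a swap of two adjacent characters at a cost of 1. The function name `edit_distance` and its `transpositions` flag are hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance, optionally counting adjacent swaps as one edit (OSA)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and the first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


if __name__ == "__main__":
    for word in ("tuesday", "thursday"):
        # word, plain Levenshtein, OSA
        print(word, edit_distance("teusday", word, transpositions=False),
              edit_distance("teusday", word))
    # prints:
    # tuesday 2 1
    # thursday 2 2
```

With plain Levenshtein, "teusday" is 2 edits from both "tuesday" and "thursday", so the distance alone cannot prefer "tuesday". Counting the swapped "eu" as a single edit makes "tuesday" the clear winner at distance 1. Note that OSA never edits the same substring twice, so the unrestricted Damerau-Levenshtein distance can be smaller in a few cases (for "ca" to "abc", OSA gives 3 while unrestricted Damerau-Levenshtein gives 2).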