{"id":261,"date":"2009-03-30T00:40:30","date_gmt":"2009-03-29T16:40:30","guid":{"rendered":"http:\/\/blog.lolyco.com\/sean\/?p=261"},"modified":"2009-07-28T19:42:16","modified_gmt":"2009-07-28T11:42:16","slug":"java-damerau-levenshtein-update","status":"publish","type":"post","link":"https:\/\/blog.lolyco.com\/sean\/2009\/03\/30\/java-damerau-levenshtein-update\/","title":{"rendered":"Java Damerau Levenshtein update"},"content":{"rendered":"<p>There&#8217;s currently a fairly serious issue with character encoding at <a title=\"My spider\" href=\"http:\/\/spider.my\">spider.my<\/a> &#8211; the knock-on effect of which is far too many entries in the keyword table. The <a title=\"MySQL Damerau Levenshtein\" href=\"http:\/\/blog.lolyco.com\/sean\/2008\/08\/27\/damerau-levenshtein-algorithm-levenshtein-with-transpositions\/\">MySQL Damerau Levenshtein UDF<\/a> I wrote is fast, but the disks in my database server are not. A spelling suggestion was taking several seconds to compute &#8211; far too slow for an interactive application.<\/p>\n<p>I wrote a quick Java RMI server to load all the keywords into memory, and used the Java code I had written earlier (but never actually used) to scan through them for the best match. It&#8217;s still far from ideal, but it seems to execute in less than 10% of the time the database takes to do the same thing. This isn&#8217;t an example of Java&#8217;s performance supremacy over C++, but of the advantage of writing a dedicated application to do a single task.<\/p>\n<p>I originally wrote the Java code to help me understand the C code the UDFs were based on. When I came to use the Java methods to get the spelling suggestions back online, I realised there was a fairly serious bug in them. Because of a transposition of string lengths, an example such as damlevlim(&#8220;speling&#8221;, &#8220;spelling&#8221;, 3) was returning 3 instead of 1. 
That bug is now squashed.<\/p>\n<p>If you want to test the code, it&#8217;s now the Java damlevlim(&#8230;) method that&#8217;s <a title=\"Search for 'spdier'\" href=\"http:\/\/spider.my\/search?q=spdier\">providing the spelling suggestions at spider.my<\/a>.<\/p>\n<p>Here is a class file containing the static methods. There&#8217;s no license, expressed or implied, on these, so take whichever method you want and embed it in your code after testing it with the provided main() method.<\/p>\n<p><span style=\"text-decoration: line-through;\"><a href=\"http:\/\/blog.lolyco.com\/sean\/wp-content\/uploads\/2009\/03\/levenshtein.java\">Java Levenshtein \/ Damerau-Levenshtein methods<\/a><\/span><\/p>\n<p>2009 Apr 16 update &#8211; see <a title=\"Damerau Levenshtein page\" href=\"http:\/\/blog.lolyco.com\/sean\/damerau-levenshtein\/\">the Damerau Levenshtein page<\/a> for the latest version.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There&#8217;s currently a fairly serious issue with character encoding at spider.my &#8211; the knock-on effect of which is far too many entries in the keyword table. The MySQL Damerau Levenshtein UDF I wrote is fast, but the disks in my database server are not. 
A spelling suggestion was taking several seconds to compute &#8211; far [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-261","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/posts\/261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/comments?post=261"}],"version-history":[{"count":6,"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/posts\/261\/revisions"}],"predecessor-version":[{"id":314,"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/posts\/261\/revisions\/314"}],"wp:attachment":[{"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/media?parent=261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/categories?post=261"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.lolyco.com\/sean\/wp-json\/wp\/v2\/tags?post=261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}