Damerau-Levenshtein in Java

October 3rd, 2008 | by Sean |

When I wrote about the Damerau Levenshtein MySQL UDF (User Defined Function) I explained that I prototyped the functions in Java first. I had a request for the Java code, but when I looked at it, realised that I only used it half way, and then made finishing touches to the C code directly.

Here is my Levenshtein.java, a class containing 3 static methods: lev(String, String) - the standard Levenshtein function, levlim(String, String, int) - an optimised Levenshtein that returns as soon as the distance between the two words grows beyond a limit, and damlev(String, String) - Damerau-Levenshtein distance between the two words.

The class has a main function, so you can run it directly. Besides the limit optimisation on levlim(), there’s no real optimisation done anywhere, so there should be plenty to go at. In particular, a new rectangular array is created on each invocation of any of the methods.

Possibly the most important thing to warn is this: I’ve only used this code for testing ideas, and it has never been used in any application!

So, fill your boots, here is Levenshtein.java.


Share and Enjoy:
  • Digg
  • del.icio.us
  • Fark
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati
  • E-mail this story to a friend!
  • Facebook
  • Furl
  • Mixx
  • Print this article!
  • Sphinn
  1. 2 Responses to “Damerau-Levenshtein in Java”

  2. By Adam Koprowski on Oct 17, 2008 | Reply

    Your code seems buggy (or at least not coherent with standard specification) as:
    lev(”1″, “Blah blah blah blah blah blah”)) = 1

  3. By Sean on Oct 17, 2008 | Reply

    Well spotted! How embarrassing. I had never used the levenshtein code, only the damerau version, which doesn’t have this bug in it. I transcribed Josh Drew’s code by hand. The two string input parameters are s and t. I mistyped the code so that I took the length of s, and the length of s again. I’ve updated the Java code linked in the article, and this bug seems to be squashed.

    Thanks for reporting.

Post a Comment