Damerau-Levenshtein in Java

October 3rd, 2008 | by Sean |

When I wrote about the Damerau Levenshtein MySQL UDF (User Defined Function) I explained that I prototyped the functions in Java first. I had a request for the Java code, but when I looked at it, realised that I only used it half way, and then made finishing touches to the C code directly.

Here is my Levenshtein.java, a class containing 3 static methods: lev(String, String) – the standard Levenshtein function, levlim(String, String, int) – an optimised Levenshtein that returns as soon as the distance between the two words grows beyond a limit, and damlev(String, String) – Damerau-Levenshtein distance between the two words.

The class has a main function, so you can run it directly. Besides the limit optimisation on levlim(), there’s no real optimisation done anywhere, so there should be plenty to go at. In particular, a new rectangular array is created on each invocation of any of the methods.

Possibly the most important thing to warn is this: I’ve only used this code for testing ideas, and it has never been used in any application!

So, fill your boots, here is Levenshtein.java.

Updated with a bugfix March 2009

2009 Apr 16 update – see the Damerau Levenshtein page for latest version

  1. 4 Responses to “Damerau-Levenshtein in Java”

  2. By Adam Koprowski on Oct 17, 2008 | Reply

    Your code seems buggy (or at least not coherent with standard specification) as:
    lev(“1”, “Blah blah blah blah blah blah”)) = 1

  3. By Sean on Oct 17, 2008 | Reply

    Well spotted! How embarrassing. I had never used the levenshtein code, only the damerau version, which doesn’t have this bug in it. I transcribed Josh Drew’s code by hand. The two string input parameters are s and t. I mistyped the code so that I took the length of s, and the length of s again. I’ve updated the Java code linked in the article, and this bug seems to be squashed.

    Thanks for reporting.

  1. 2 Trackback(s)

  2. Apr 16, 2009: Sean’s blog » Blog Archive » Damerau-Levenshtein algorithm: Levenshtein with transpositions
  3. Jan 4, 2011: Search suggestions with MYSQL Levenshtein distance

Post a Comment