Damerau-Levenshtein in Java
October 3rd, 2008 | by Sean |When I wrote about the Damerau Levenshtein MySQL UDF (User Defined Function) I explained that I prototyped the functions in Java first. I had a request for the Java code, but when I looked at it, realised that I only used it half way, and then made finishing touches to the C code directly.
Here is my Levenshtein.java, a class containing 3 static methods: lev(String, String) – the standard Levenshtein function, levlim(String, String, int) – an optimised Levenshtein that returns as soon as the distance between the two words grows beyond a limit, and damlev(String, String) – Damerau-Levenshtein distance between the two words.
The class has a main function, so you can run it directly. Besides the limit optimisation on levlim(), there’s no real optimisation done anywhere, so there should be plenty to go at. In particular, a new rectangular array is created on each invocation of any of the methods.
Possibly the most important thing to warn is this: I’ve only used this code for testing ideas, and it has never been used in any application!
So, fill your boots, here is Levenshtein.java.
Updated with a bugfix March 2009
2009 Apr 16 update – see the Damerau Levenshtein page for latest version
4 Responses to “Damerau-Levenshtein in Java”
By Adam Koprowski on Oct 17, 2008 | Reply
Your code seems buggy (or at least not coherent with standard specification) as:
lev(“1”, “Blah blah blah blah blah blah”)) = 1
By Sean on Oct 17, 2008 | Reply
Well spotted! How embarrassing. I had never used the levenshtein code, only the damerau version, which doesn’t have this bug in it. I transcribed Josh Drew’s code by hand. The two string input parameters are s and t. I mistyped the code so that I took the length of s, and the length of s again. I’ve updated the Java code linked in the article, and this bug seems to be squashed.
Thanks for reporting.