Levenshtein Distance

This online calculator measures the Levenshtein distance between two strings.

Michele

Created: 2011-12-11 09:06:35, Last updated: 2021-09-28 08:25:58

Levenshtein distance (or edit distance) between two strings is the number of deletions, insertions, or substitutions required to transform source string into target string. For example, if the source string is "book" and the target string is "back," to transform "book" to "back," you will need to change first "o" to "a," second "o" to "c," without additional deletions and insertions. Thus, the Levenshtein distance between "book" and "back" will be 2.

Levenshtein Distance

Levenshtein Distance

Levenshtein Distance Algorithms and Applications

The Levenshtein distance between two strings a, b (of length |a| and |b| respectively) is given by lev(a,b) where
${\displaystyle \qquad \operatorname {lev} (a,b)={\begin{cases}|a|&{\text{ if }}|b|=0,\\|b|&{\text{ if }}|a|=0,\\\operatorname {lev} (\operatorname {tail} (a),\operatorname {tail} (b))&{\text{ if }}a[0]=b[0]\\1+\min {\begin{cases}\operatorname {lev} (\operatorname {tail} (a),b)\\\operatorname {lev} (a,\operatorname {tail} (b))\\\operatorname {lev} (\operatorname {tail} (a),\operatorname {tail} (b))\\\end{cases}}&{\text{ otherwise.}}\end{cases}}}$.

The Levenshtein distance is named after the Russian scientist Vladimir Levenshtein, who devised the metric in 1965. There are several algorithms to compute the Levenshtein distance:

• Recursive; the straightforward algorithm, which follows the definition
• Iterative with full matrix; the one used in the calculator above
• Iterative with two matrix rows

More details and pseudocode implementations for all algorithms can be found in Wikipedia article Levenshtein distance

It has been shown1 that the Levenshtein distance cannot be computed in strongly subquadratic time, which makes its usage for comparing long strings impractical, because computation cost will be proportional to the product of string lengths. However, the edit distance can be used to find matches of a short string, for example, taken from the dictionary, in a long string. This is useful for spell checkers, correction systems for optical character recognition, and similar products.

1. Backurs, Arturs; Indyk, Piotr (2015). Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false). Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC).

URL copied to clipboard
PLANETCALC, Levenshtein Distance