Largest common subsequence

This online calculator is designed to find the largest common subsequence of two sequences given as strings.

The calculator below solves the problem of finding the largest common subsequence. The problem is solved by dynamic programming, using an iterative algorithm, or bottom-up algorithm. The result is a matrix of solutions to the subproblem of finding the lengths of subsequences, which is then used to form the largest common subsequence. The calculator outputs the subsequence, its length and the solution matrix. You can read more about the problem and its solution algorithm under the calculator.

Largest common subsequence

Largest common subsequence

Length of the longest common subsequence

Solution matrix

Numerical solution matrix

Most common subsequence

A subsequence is a sequence that can be obtained from an original sequence after removing some set of elements (including an empty one) from it. We should distinguish a subsequence from a substring. For example, if our original sequence is ABCDEF, then ACE is a subsequence obtained by removing elements B, D, and F, and ABC is a substring (which is also a subsequence). In general, all substrings are subsequences, but not all subsequences are substrings.

The problem of finding the largest common subsequence is the problem of finding the largest subsequence contained in all sequences of a given set. Often the largest common subsequence is searched for in only two sequences, for example, the largest common subsequence between a short string and a long string. That is, in fact, they want to find out whether the letters of the short string appear in the same order, but possibly at different distances, in the long string. The problem is a classical computer science problem, its algorithms belong to string algorithms, and practically it is used for data comparison, for example, in bioinformatics and computational linguistics.

Finding the largest common sequence

Consider an iterative algorithm for finding the largest common sequence between two strings. The easiest way to obtain such a sequence is to form a matrix of solutions to the problem of finding the length of a common subsequence.

Algorithm for filling the solution matrix

Note: We will assume that characters in the string are numbered from index 0, as in most programming languages.

1. Form an empty matrix of size m+1 rows by n+1 columns, where m is the length of the first row, n is the length of the second row.
2. Let's traverse all the elements of this matrix, starting from the lower right corner, moving first from right to left (index j), then, after reaching index 0, from bottom to top (index i).
3. For each element with index i,j we calculate its value according to the following rule
• if the current index of row i is m or the current index of column j is n, assign the element the value 0
• if the symbol with index i of the first row is equal to the symbol with index j of the second row, assign the value 1 + the value of the element with index i+1, j+1 (the value of the element one position below and to the right of the current one) to the element.
• if the symbols are not equal, assign to the element the maximum value of its neighbors to the right and bottom.

Having passed all elements in this way, in the element with index 0,0 we will get the length of the maximum common subsequence. Using the obtained matrix, we can form the subsequence itself.