Skip to content Skip to sidebar Skip to footer

Finding The Longest Common Subsequence In A Variable Amount Of Sets With No Repeating Characters?

I'm trying to come up with an efficient algorithm that would work in JavaScript for the longest common subsequence problem. However, there are two main differences between my probl

Solution 1:

Not sure how efficient an adaption of the normal LCS algorithm would be, so this may be inefficient, but since there aren't many eligible letters, it's not too horrible.

Make a successor matrix. For each string s, for each pair of letters (s[i],s[j]) with i < j, increment matrix[s[i]][s[j]]. A pair of letters can only be part of a longest common subsequence if matrix[a][b] = n = number of strings. Build a directed graph from the letters participating in such a pair, with an edge from a to b iff matrix[a][b] = n. Find a longest path in that graph.

Solution 2:

What you are trying to do is called multiple sequence alignment. The algorithm 'could' be rewritten to be n-dimensional, but that would increase the space complexity to Length ^ Dimensions.

Since none of the letters repeat, is there a more efficient algorithm I should consider rather than the one described in the Wikipedia article?

Perhaps. First create a mapping (a,b) such that a < b. For example the mappings for the first three characters of each sets is as follows

--- Mapping for character[0]
Z - BA - C    A - N    Z - IA        N        IA
    N        I        C        N
    IB        O        C
    C        Z        T        O
    O        I        Z        T
    T        O        BB

--- Mapping for character[1]B - A    C - N    N - II - A
    N        I        C        N
    IB        O        C
    C        Z        T        O
    O        I        Z        T
    T        O        BB

--- Mapping for character[2]A - N    N - I    N - C    I - N
    IB        O        C
    C        Z        T        O
    O        I        Z        T
    T        O        BB

Then retain only mappings that are common on all sets. Here's an example of following this process, traversing Set[A] (ZBANICOT), which will give us a set of relations that are common to all sets.

So with the letter Z you will observe that in Set[C], the only character that follows Z is B, so you can eliminate just about every mapping from Z that is not to B. But to cut a long story short, every mapping for Z is removed. Note, only mappings from Z are removed, not mappings to Z.

Also note that since Set[D] ends with B, we can safely eliminate all mappings from B. By following the process one can observe that A maps to N, C, O and T. Next is N, which maps to T and O.

I maps to O (honestly, "I/O") while C maps to O and T (how cute). And funnily enough O maps to nothing!!! How befitting.

Finally there is one more character, and T maps to nothing too (since ZBANICOT ends with T).

In the end you are left with the following mappings:

A - C    C - O    I - O    N - O
    N        TT
    O
    T

Now all you have to do is start from each of the keys (A, C, I and N) and search for the longest sequence without any repeating nodes (characters in this case).

Here are all of the common subsequences (all using the simple algorithm):

ACO
ACT
ANO
ANT
AO
AT
CO
CT
IO
NT
NO

Solution 3:

What if I stored character mapping for each set of characters and then incremented this count each time...

For the 1st set (ZBANICOT), this would create the following character pairs (28 of them):

ZB, ZA, ZN, ZI, ZC, ZO, ZT
    BA, BN, BI, BC, BO, BT
        AN, AI, AC, AO, AT
            NI, NC, NO, NT
                IC, IO, IT
                    CO, CT
                        OT

For the 2nd set (ACNTBZIO), I would have 28 pairs again:

AC, AN, AT, AB, AZ, AI, AO
    CN, CT, CB, CZ, CI, CO
        NT, NB, NZ, NI, NO
            TB, TZ, TI, TO
                BZ, BI, BO
                    ZI, ZO
                        IO

For the last 2 sets, I would have:

AN, AI, AC, AO, AT, AZ, AB
    NI, NC, NO, NT, NZ, NB
        IC, IO, IT, IZ, IB
            CO, CT, CZ, CB
                OT, OZ, OB
                    TZ, TB
                        ZB

ZI, ZA, ZN, ZC, ZO, ZT, ZB
    IA, IN, IC, IO, IT, IB
        AN, AC, AO, AT, AB
            NC, NO, NT, NB
                CO, CT, CB
                    OT, OB
                        TB

(So to figure out the iteration count so far, let N equal the number of sets, and M be the number of characters in each set, we have N*(M)((M-1)/2) or 4*28=112 in this example)

Next, I would count the number of times each pair exists. For example,

PairCounts["ZB"] == 3
PairCounts["ZA"] == 2// ...
PairCounts["AO"] == 4
PairCounts["AT"] == 4// ...
PairCounts["AN"] == 4
PairCounts["AC"] == 4
PairCounts["CT"] == 4
PairCounts["NO"] == 4
PairCounts["NT"] == 4

Then, since we have 4 pairs, I would discard all pairs which don't have a count of 4. And then I would try concatting each pair in order to form the longest string possible.

Post a Comment for "Finding The Longest Common Subsequence In A Variable Amount Of Sets With No Repeating Characters?"