Martin Fürer, W. Miller
Mar 1, 1996
Citations
0
Citations
Journal
Int. J. Found. Comput. Sci.
Abstract
An alignment of k given sequences is a k-rowed matrix frequently used by molecular biologists to display correspondences between entries from each sequence. Under one approach, an alignment is represented by a matrix of ‘x’ and ’-’ characters, where each x in row r indicates the position of an entry of sequence r. It is sometimes efficient to store only the run-length encoding of each row of this bit-matrix. A natural class of commands for editing one such row into another consists of operations of the form: “Move the d dashes that begin at position i of row r to position j of that row,” for relevant values of r, d, i and j. We show that the problem of determining a shortest sequence of such operations that converts one given alignment to another is NP-hard and give a polynomial-time algorithm that always comes within a factor 5/4 of optimality. An application of these ideas to alignments of long DNA sequences is discussed.