Key Takeaway: This paper presents a polynomial-time algorithm for converting one alignment to another using "move gap" operations, showing that the problem is NP-hard and demonstrates its applicability to long DNA sequence alignments.

Abstract

An alignment of k given sequences is a k-rowed matrix frequently used by molecular biologists to display correspondences between entries from each sequence. Under one approach, an alignment is represented by a matrix of ‘x’ and ’-’ characters, where each x in row r indicates the position of an entry of sequence r. It is sometimes efficient to store only the run-length encoding of each row of this bit-matrix. A natural class of commands for editing one such row into another consists of operations of the form: “Move the d dashes that begin at position i of row r to position j of that row,” for relevant values of r, d, i and j. We show that the problem of determining a shortest sequence of such operations that converts one given alignment to another is NP-hard and give a polynomial-time algorithm that always comes within a factor 5/4 of optimality. An application of these ideas to alignments of long DNA sequences is discussed.