By changing the way we represent multiple sequence alignments...
We can change the way we construct multiple sequence alignments...
Yielding a faster and more accurate multiple sequence alignment algorithm...POA.
POA can accurately align 5,000 EST sequences in 4 hours on a Pentium II.
Discoveries Made Using POA
Modrek, B., Resch, A., Grasso, C., Lee, C. (2001) Genome-wide analysis of alternative splicing using human expressed sequence data. Nucleic Acids Research 29: 2850-2859.
General Idea:
Progressive multiple sequence alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and to gap scoring artifacts. We present a graph representation of an MSA, called a PO-MSA, that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (POA: partial order alignment) to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, this algorithm introduces a new edit operator, homologous recombination, which is important for multidomain sequences. The algorithm is faster than existing MSA algorithms (its running time grows linearly with the number of sequences), enabling construction of massive and complex alignments (e.g. an alignment of 5,000 sequences in 4 hours on a Pentium II). The program writes alignments not only in the standard PIR and CLUSTAL formats, but also in PO format. PO-MSAs stored as PO files may be used in lieu of consensus sequences in genome and proteome databases, making it possible to store many different PO-MSAs relating the sequences from many different genomes and proteomes in a single database. Additionally, the speed with which PO-MSA data structures may be reconstructed from flat text files and written back to flat text files facilitates the use of PO formatted files by many different programs accessing the database. PO-MSAs may also be used effectively in genome and proteome analysis programs and protocols. Their utility relies not only on the ease with which they may be visualized using POVIS, the new PO-MSA visualizer, but also on the ease with which features such as domains, SNPs, and alternative splicing may be extracted from them. This has been essential for our high-throughput annotation of both SNPs and splice sites in human EST data, for which we developed two POA library functions, report_snp and report_splice.
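To illustrate how a sequence can be aligned directly against a graph rather than a linear profile, here is a minimal sketch in Python. It is not the POA source code: the function name align_to_graph, the predecessor-map representation, and the linear gap scores are all assumptions chosen for brevity. It computes only the global alignment score, and it omits graph fusion, traceback, and the scoring matrices a real implementation would use. The one idea it tries to show is that each dynamic programming cell looks back at every predecessor of a node, not just the previous column.

```python
from typing import Dict, List

# Illustrative scores only (assumed, not POA's defaults).
MATCH, MISMATCH, GAP = 1, -1, -2


def align_to_graph(labels: List[str], preds: Dict[int, List[int]], seq: str) -> int:
    """Score a global alignment of seq against a partial order graph.

    labels[i] is the residue at node i; preds[i] lists the predecessors of
    node i. Nodes are assumed to be numbered in topological order.
    """
    n, m = len(labels), len(seq)
    if n == 0:
        return m * GAP
    # Row for a virtual start node: only sequence residues consumed so far.
    start = [j * GAP for j in range(m + 1)]
    rows: List[List[int]] = []
    for i in range(n):
        # A node with no predecessor follows the virtual start node.
        prev_rows = [rows[p] for p in preds.get(i, [])] or [start]
        row = [max(r[0] for r in prev_rows) + GAP]
        for j in range(1, m + 1):
            sub = MATCH if labels[i] == seq[j - 1] else MISMATCH
            row.append(max(
                max(r[j - 1] for r in prev_rows) + sub,  # node i aligned to seq[j-1]
                max(r[j] for r in prev_rows) + GAP,      # node i left unaligned
                row[j - 1] + GAP,                        # seq[j-1] left unaligned
            ))
        rows.append(row)
    # The alignment must end at a node that has no successor.
    has_successor = {p for ps in preds.values() for p in ps}
    ends = [i for i in range(n) if i not in has_successor]
    return max(rows[i][m] for i in ends)


# A graph holding two sequences, ACGT and ATGT, that branch at position 2:
#   A -> C -> G -> T
#    \-> T ->/
labels = ["A", "C", "T", "G", "T"]
preds = {1: [0], 2: [0], 3: [1, 2], 4: [3]}
for s in ("ACGT", "ATGT", "AGT"):
    print(s, align_to_graph(labels, preds, s))
```

Because both branches are kept in the graph, ACGT and ATGT each align to their own path with no mismatch, which a single consensus sequence for the column could not offer to both.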
The partial order alignment program, POA, and the PO-MSA visualizer, POVIZ, are currently available. The report_snp and report_splice library functions will be made available at this website shortly.