Lecture notes: Structural biochemistry and bioinformatics 2001

Lecture 12 Nov 2001, Per Kraulis

How do we generate a multiple alignment? There is an obvious solution:
**Given a pairwise alignment, just add the third, then the
fourth, and so on**, until all have been aligned. But is this
as straightforward as it looks? Consider a case with three short fake
sequence that are pair-wise homologous in the following way:

dront ...AGAC... t-rex ...--AC... dront ...AGAC... unicorn ...AG--... t-rex ...AC... unicorn ...AG...

It is not self-evident how these sequences are to be aligned together. Here are some possibilities:

dront ...AGAC... t-rex ...--AC... unicorn ...AG--... dront ...AGAC... t-rex ...AC--... unicorn ...AG--... dront ...AGAC... t-rex ...--AC... unicorn ...--AG...

A multiple alignment depends in a non-trivial way on the various
parameters (insertion/deletion penalties, substitution
coefficients,...), and, importantly, also **on the
order** in which sequences are added to the growing multiple
alignment.

In pairwise alignments, one has a two-dimensional matrix with the sequences on each axis, and the elements in the matrix are initially the substitution coefficients, which are then operated on (as you have done previously) to locate the best "path" through the matrix. The number of operations required to do this is approximately proportional to the product of the lengths of the two sequences.

A possible general method would be to **extend the pairwise
alignment method into a simultaneous N-wise alignment**, using
a complete dynamical-programming algorithm in N
dimensions. Technically, it is not difficult to implement such an algorithm.

In the case of three sequences to be aligned, one can visualize this reasonable easily: One would set up a three-dimensional matrix (a cube) instead of the two-dimensional matrix for the pairwise comparison. Then one basically performs the same procedure as for the two-sequence case. This time, the result is a path that goes diagonally through the cube from one corner to the opposite.

The problem here is that the time to compute this N-wise alignment
becomes prohibitive as the number of sequences grows. The
**algorithmic complexity is something like
O(c ^{2n})**, where c is a constant, and n is the number
of sequences. (See the next section for an explanation of this
notation.) This is disastrous, as may be seen in a simple example: if
a pairwise alignment of two sequences takes 1 second, then four
sequences would take 10

Copyright © 2001 Per Kraulis $Date: 2001/11/09 14:35:31 $