10. Local Alignment: Smith-Waterman

Local alignments are like global alignments, but instead of aligning the sequences end to end they pick out "islands": the areas that have the greatest similarity. This is helpful when the two sequences are dissimilar overall but are suspected to contain domains or small regions of similarity. The BLAST algorithm uses local alignment.

Local alignments differ from global alignments in a few ways:

  • There is no penalty for starting at an internal position.
  • The alignment does not necessarily extend to the ends of the sequences.
  • No negative values are allowed in the matrix; zeros are used instead.

The power of end gap penalties

The Smith-Waterman algorithm is very much like the Needleman-Wunsch algorithm used in global alignments - the hallmark difference is in the scoring methodology.

Unlike global alignment, local alignments have no end gap penalties, allowing small interior alignments to rank higher when scored.

Let's take a quick look at the effects of end gap penalties. The following pair of sequences is aligned globally, with high end gap penalties.

M   -   N   A   L   S   D   R   T
M   G   S   D   R   T   T   E   T 
6 -12   1   0  -3   1   0  -1   3 = -5

In this next example, the same two sequences are aligned locally. Notice how the small region in the middle aligns quite nicely.

M  N  A  L  S  D  R  T  -  -  -
-  -  M  G  S  D  R  T  T  E  T
0  0 -1 -4  2  4  6  3  0  0  0 = 10

Without the end gap penalty, the Smith-Waterman alignment algorithm is able to find the best locally matching sequence.
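
As a quick check of the arithmetic, the per-column scores from the two alignments above can simply be added up. This is only a sketch: the lists are copied by hand from the score rows shown, and the variable names are ours.

```python
# Per-column scores copied from the two example alignments above.
global_scores = [6, -12, 1, 0, -3, 1, 0, -1, 3]
local_scores = [0, 0, -1, -4, 2, 4, 6, 3, 0, 0, 0]

global_total = sum(global_scores)
local_total = sum(local_scores)

print(global_total)  # -5
print(local_total)   # 10
```

The single -12 gap column is what drags the global total below zero, while the unpenalized end gaps (the 0 columns) leave the local total untouched.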

Example

Let's compare two sequences - CGTTCTA and AACGTTGG.

1) Set up a 2d matrix

Set up a 2d matrix, as we did earlier in the Needleman-Wunsch example.

[Figure: Smith-Waterman 2D matrix. Setting up our matrix.]

2) Decide on a scoring system

We need separate scores for matches, mismatches and gaps.

[Figure: Smith-Waterman formula]

Any cell that would have a negative value is given 0 instead.
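
The rule for a single cell is commonly written as H(i,j) = max(0, H(i-1,j-1) + s(a_i, b_j), H(i-1,j) + g, H(i,j-1) + g), where s is the match/mismatch score and g is the gap penalty. Here is a minimal sketch of that rule. The values +1, -1 and -2 are assumptions chosen for illustration, not necessarily the scores used in the figure, and the function names are ours.

```python
MATCH = 1      # assumed score for identical characters
MISMATCH = -1  # assumed score for differing characters
GAP = -2       # assumed (linear) gap penalty


def substitution(a, b):
    """Score for aligning character a against character b."""
    return MATCH if a == b else MISMATCH


def cell_score(diag, up, left, a, b):
    """Score of one cell given its three neighbours.

    diag, up and left are the already-computed scores of the cells
    diagonally up-left, directly above, and directly to the left.
    The 0 in max() is what replaces any negative value with zero.
    """
    return max(0, diag + substitution(a, b), up + GAP, left + GAP)


print(cell_score(0, 0, 0, "C", "C"))  # 1  (match from a zero cell)
print(cell_score(0, 0, 0, "C", "A"))  # 0  (would be negative, so clamped)
print(cell_score(3, 0, 0, "T", "T"))  # 4  (extends a run of matches)
```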

3) Fill out primary values

Start by giving every cell in the first row and first column a value of 0. Then mark the cells that indicate a match.

[Figure: Smith-Waterman 2D matrix. Setting up our matrix.]
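
The setup for our two example sequences can be sketched as follows. We assume here that CGTTCTA runs down the rows and AACGTTGG across the columns (the figure may be laid out the other way round), and the helper names are hypothetical.

```python
seq_a = "CGTTCTA"   # assumed to label the rows
seq_b = "AACGTTGG"  # assumed to label the columns


def init_matrix(a, b):
    """Build a (len(a)+1) x (len(b)+1) grid filled with zeros.

    The extra first row and first column stay at 0 throughout.
    """
    return [[0] * (len(b) + 1) for _ in range(len(a) + 1)]


def match_cells(a, b):
    """List the (row, column) positions where the characters match."""
    return [
        (i, j)
        for i in range(1, len(a) + 1)
        for j in range(1, len(b) + 1)
        if a[i - 1] == b[j - 1]
    ]


H = init_matrix(seq_a, seq_b)
matches = match_cells(seq_a, seq_b)

print(len(H), len(H[0]))  # 8 9
print(len(matches))       # 13
print((1, 3) in matches)  # True: the first C of seq_a meets the C of seq_b
```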

4) Fill out rest of table

Now we fill the rest of our table out. Make sure to keep track of where each cell value came from, as we need this to trace back our optimal alignment.

[Figure: Smith-Waterman 2D matrix. Filling out the rest of the values.]

Note that a mismatch or a match can only come from the cell diagonally up to the left of the current cell. Additionally, gaps may only come from the top or left of the current cell.
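
Filling the table while remembering where each value came from might look like the sketch below. The scores (+1, -1, -2) are again assumptions, the `origin` table and its labels are our own naming, and ties are broken in favour of the diagonal, which is one choice among several.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed scores, for illustration only


def fill(a, b):
    """Fill the score matrix H and record the origin of each cell."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    origin = [[None] * cols for _ in range(rows)]  # None marks a 0 cell

    for i in range(1, rows):
        for j in range(1, cols):
            # A match or mismatch can only come from the diagonal cell.
            diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Gaps can only come from the cell above or the cell to the left.
            up = H[i - 1][j] + GAP
            left = H[i][j - 1] + GAP

            best = max(0, diag, up, left)
            H[i][j] = best
            if best == 0:
                origin[i][j] = None
            elif best == diag:
                origin[i][j] = "diag"
            elif best == up:
                origin[i][j] = "up"
            else:
                origin[i][j] = "left"
    return H, origin


H, origin = fill("CGTTCTA", "AACGTTGG")
print(H[4][6], origin[4][6])  # 4 diag
print(H[4][7], origin[4][7])  # 2 left
```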

5) Trace the optimal path

Now all we need to do is retrace our steps. First, find the cell with the highest score.

[Figure: Smith-Waterman 2D matrix. Tracing back the optimal alignment.]

Now we trace back until we reach a cell with a value of 0. Thus, our optimal local alignment becomes:

--CGTTCTA
AACGTTGG-
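
Putting the steps together, a compact sketch of the whole procedure is shown below. It assumes scores of +1 for a match, -1 for a mismatch and -2 for a gap, and instead of storing pointers it re-derives each cell's origin during the traceback. The function name is ours. Note that it returns only the aligned region itself, not the unaligned overhangs shown in the layout above.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed scores, for illustration only


def smith_waterman(a, b):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best_score, best_pos = 0, (0, 0)

    # Fill the matrix, remembering the highest-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            H[i][j] = max(0, diag, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best_score:
                best_score, best_pos = H[i][j], (i, j)

    # Trace back from the highest score until a cell with 0 is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
        if H[i][j] == diag:                  # match or mismatch
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:   # gap in b
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:                                # gap in a
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best_score, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("CGTTCTA", "AACGTTGG"))  # (4, 'CGTT', 'CGTT')
```

Under these assumed scores the best-scoring island is the shared CGTT; the remaining characters of both sequences fall outside the local alignment.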

Conclusion: Global vs. Local alignments

Thus, we may say that global alignments, which align the sequences along their entire length, tend to show a higher % identity but many small interior gaps. Local alignments, which focus on the best-matching regions, tend to show a lower % identity but fewer interior gaps and longer end gaps.
