09. Global Alignment Needleman-Wunsch

The purpose of global alignment (aka optimal matching algorithm) is to align two sequences from start to end, and make as many matches as possible. Algorithms do this by inserting gaps within the letters of each sequence to maximize the number of matches. This technique is often used when comparing sequences of similar length.


The algorithm for global alignment was first published by Saul Needleman and Christian Wunsch (1970), then modified over the next few years by other scientists. It aligns sequences from beginning to end and finds the best alignment that maximizes the overall score. The score is calculated based on matches, mismatches, and gaps.

Dynamic Programming

The algorithm is based on an algorithmic technique known as dynamic programming, where the optimal alignment is reduced into a series of smaller problem. The solution to these smaller problems are then used to calculate the solution to the larger (or overall) problem.


The best way to go through the algorithm is to show an example, so let's compare two sequences - IDENTITY and SIMILARITY.

1) Set up a 2d matrix

The first step is to set up a 2d matrix with the first sequence on the y-axis, and the second sequence on the x-axis.

Simple explanation of needleman-wunsch algorithm

2) Decide on a scoring system

Now we need to fill this grid with numerical values, which we will later use to score the matches. Let's come up with a formula to find the score per cell, which we can do with pre-determined values.

Simple explanation of needleman-wunsch algorithm
  1. In the first option, if a match is present (cell will be colored green), s(xi, yi) will be given a score of +1. If it is a mismatch, then the score will be -2.
  2. In the second option, a gap will be inserted in the first sequence. Thus, we take the cell left of the current cell and -2 from it.
  3. In the second option, a gap will be inserted in the second sequence. We take the cell on top of the current cell and -2 from it.

The maximum of these values will be taken and placed as the current cell's score.

3) Fill out the primary values

Start with filling a 0 at the top-left corner. Each cell along the row will be -2 since the only value it looks at is the cell to the left. Each cell along the first column will be -2 from its top, since that's the only value it has to look.

After coloring each matching letter green, you will have a table that looks like this:

Simple explanation of needleman-wunsch algorithm

4) Fill the rest of the table

We can see that for green cells (match), we take the maximum value of the left, top-left (diagonal), or top, and +1 to it. For any mismatches or gap insertions, we -2 from the largest value of the left, top-left (diagonal) and top cells.

Filling in the table for needleman-wunsch algorithm

A completed matrix gives the final score of the alignment on the bottom-right most cell. In this case, it's -7.

Simple explanation of needleman-wunsch algorithm

5) Trace the optimal path

Before we find trace the optimal path, let's go over some conventions.

Our purpose is to find the optimal path from the top left going to the bottom right. A diagonal line denotes that the letter is aligned (can be match or mismatch). A perfect match will have a diagonal going from top left to bottom right. Mismatches can still generate this diagonal, but their scores will be different.

Simple explanation of needleman-wunsch algorithm

Horizontal and vertical lines denote the presence of a gap in the first and second sequence, respectively. Gaps may be of any length, and may occur within (internal) or at the ends (terminal) of sequences.

Simple explanation of needleman-wunsch algorithm

For our example, we start from the bottom-right cell, then move our way up to the top-left.

Simple explanation of needleman-wunsch algorithm

With this, we get the alignment:

There are other alignments that give the same score. Can you trace them out?

6) Varying parameters

Depending on what kind of alignment you're looking for, you can vary the parameters. For example, some sequence alignment algorithms vary gap penalties in two subcategories. An opening gap would be given a more severe penalty than each continuing gap. In our example case, all gaps were treated equally with a penalty of -2.

Adding a harsher mismatch penalty can force more gaps to be created. However, this is non-ideal as it's good to have near-identical sequences with few deletions.

Additionally, global alignments usually use an end gap penalty. This is the distinguishing factor between global and local alignment, which we'll see next.

Take your Linux skills to the next level!

The Linux Command Line

Take your Linux skills to the next level! Try Linux & UNIX

The Linux Command Line takes you from your very first terminal keystrokes to writing full programs in Bash, the most popular Linux shell. Along the way you'll learn the timeless skills handed down by generations of gray-bearded, mouse-shunning gurus: file navigation, environment configuration, command chaining, pattern matching with regular expressions, and more.

$ Check price
39.9539.95Amazon 4.5 logo(274+ reviews)

More Linux & UNIX resources

Become a Bioinformatics Whiz!

Introduction to Bioinformatics Vol. 1

Become a Bioinformatics Whiz! Try Bioinformatics

If you're looking for a fun and easy entry point into bioinformatics algorithms, this book it just for you! Filled with graphics, and written in a light-hearted and humorous story-telling persona, Bioinformatics Algorithms guides you through the intricacies of the problems faced in biology, and the clever solutions used to solve them.

$ Check price
49.9949.99Amazon 4.5 logo(4+ reviews)

More Bioinformatics resources