01. Pairwise Alignment Introduction

What is Pairwise Alignment?

Pairwise alignment is the process of aligning two DNA, RNA or protein sequences such that the regions of similarity are maximized. This is often performed to find functional, structural or evolutionary commonalities.

In most cases, scientists compare two protein sequences to quantify how related they are (homology). With this, they are able to identify common domains and motifs, and to infer sequence ancestry.
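One simple way to put a number on relatedness is percent identity: the share of alignment columns in which both sequences carry the same residue. The sketch below assumes the two sequences have already been aligned and padded with `-` gap characters. The function name `percent_identity` and the two example fragments are made up for illustration, and dividing by the full alignment length is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return the percentage of alignment columns holding identical residues.

    Assumes both strings are rows of the same alignment (equal length,
    gaps written as `gap`). Columns containing a gap never count as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    # Convention chosen here: divide by the total number of columns.
    return 100.0 * matches / len(aligned_a)


# Two short, made-up protein fragments that are already aligned.
seq_a = "HEAGAWGHE-E"
seq_b = "-PA--WSHEAE"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Percent identity ignores how similar two different residues are, which is one reason scoring schemes that account for biochemical properties are usually preferred for proteins.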

Domains and Motifs

Domains are parts of a DNA or amino acid strand that code for a physicochemically similar feature as found in other sequences and proteins. Domains refer to specific functionalities. For example, you could have an ATP-binding domain or a polar domain.

Motifs are similar, but reference the structural characteristics rather than functional regions. Motifs are often found in domains, although that's not always the case.

Protein vs. DNA sequence alignment

Protein amino acid sequences are preferred over DNA sequences for several reasons.

  • Protein residues are more informative: a change in DNA (especially at the third position of a codon) does not necessarily change the amino acid.
  • There are 20 amino acids but only four nucleotides, so chance matches are less likely and significant similarity is easier to detect.
  • Some amino acids share related biochemical properties, which can be accounted for when scoring pairwise alignments.
  • Protein sequence comparisons can reach back over a billion years, whereas DNA sequence comparisons can only go back about 600 million years. Thus, protein sequences are far better for evolutionary studies.
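The first point in the list above can be seen with a few codons from the standard genetic code. This is a minimal sketch: `CODON_TO_AA` is a deliberately partial table covering only the codons used here, and the three DNA strings are invented examples.

```python
# Partial codon table (standard genetic code), limited to the codons used below.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GAT": "D", "GAC": "D",                          # aspartic acid
    "GAA": "E", "GAG": "E",                          # glutamic acid
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon using the partial table above."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


original = "GCTGATGAA"  # Ala-Asp-Glu
silent = "GCCGACGAG"    # third position of every codon changed
changed = "GCTGAAGAA"   # third position of the second codon changed (GAT -> GAA)

# Count the positions at which the two DNA strings differ.
dna_differences = sum(a != b for a, b in zip(original, silent))

print(dna_differences, translate(original), translate(silent))
print(translate(changed))
```

Here `original` and `silent` differ at three DNA positions yet translate to the same protein, while the single third-position change in `changed` swaps aspartic acid for glutamic acid. That is why the text says a DNA change does not necessarily change the amino acid.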

However, there are some obvious instances when DNA alignments are needed.

  • When confirming the identity of cDNA (forensic sequencing).
  • When studying noncoding regions of DNA. These regions evolve at a faster rate than coding DNA, while mitochondrial noncoding DNA evolves even faster.
  • When studying DNA mutations.
  • When studying very closely related organisms such as Neanderthals and modern humans.

Biochemistry 101 Review

Before we move on, let's do a quick review of some elementary biochemistry and notation.

Nucleotide Codes

We're all familiar with the four nucleotide bases. However, there are additional symbols for positions where the nucleotide is ambiguous.

Symbol   Meaning                Explanation
A        A                      Adenine
C        C                      Cytosine
G        G                      Guanine
T        T                      Thymine
R        A or G                 puRine
Y        C or T                 pYrimidine
M        A or C                 aMino
K        G or T                 Keto
S        C or G                 Strong interaction (3 bonds)
W        A or T                 Weak interaction (2 bonds)
H        A, C or T (not G)      H is after G
B        C, G or T (not A)      B is after A
V        A, C or G (not T)      V is after T and U
D        A, G or T (not C)      D is after C
N        A, C, G or T           aNything
CG DNA interaction vs. AT interaction
A CG base pair is stronger than an AT base pair because it has one more hydrogen bond (three instead of two). Source: Wikipedia
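The ambiguity codes map naturally onto a lookup table. The following sketch stores each symbol with the bases it may stand for, checks whether a concrete sequence fits a degenerate pattern, and counts hydrogen bonds using the 3-versus-2 rule described above. The names `IUPAC`, `matches`, and `hydrogen_bonds` are illustrative and do not come from any particular library.

```python
# Each nucleotide symbol mapped to the bases it may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base in `sequence` is allowed by the
    ambiguity symbol at the same position in `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[symbol] for symbol, base in zip(pattern, sequence))


def hydrogen_bonds(sequence: str) -> int:
    """Count hydrogen bonds formed when an unambiguous strand pairs with
    its complement: 3 per C or G, 2 per A or T."""
    total = 0
    for base in sequence:
        if base not in "ACGT":
            raise ValueError(f"ambiguous or unknown base: {base!r}")
        total += 3 if base in "CG" else 2
    return total


print(matches("RYN", "ACG"))   # purine, pyrimidine, anything
print(matches("RYN", "CCG"))   # C is not a purine
print(hydrogen_bonds("ATGC"))
```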

Amino Acid Residue Codes

Amino acids can be represented with one or three letters. Take some time to review these.

1-letter   3-letter   Amino Acid
A          Ala        Alanine
C          Cys        Cysteine
D          Asp        Aspartic acid
E          Glu        Glutamic acid
F          Phe        Phenylalanine
G          Gly        Glycine
H          His        Histidine
I          Ile        Isoleucine
K          Lys        Lysine
L          Leu        Leucine
M          Met        Methionine
N          Asn        Asparagine
O          Pyl        Pyrrolysine
P          Pro        Proline
Q          Gln        Glutamine
R          Arg        Arginine
S          Ser        Serine
T          Thr        Threonine
U          Sec        Selenocysteine
V          Val        Valine
W          Trp        Tryptophan
X          Xaa        Undetermined
Y          Tyr        Tyrosine
Z          Glx        Glutamic acid or glutamine
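As a quick way to practice, the table can be written as a dictionary and used to spell out a one-letter sequence in three-letter codes. This is only a sketch: `ONE_TO_THREE` and `to_three_letter` are hypothetical names, and `Z` is written as `Glx`, the standard three-letter code for "glutamic acid or glutamine".

```python
# One-letter to three-letter amino acid codes, following the table above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "O": "Pyl", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "U": "Sec", "V": "Val",
    "W": "Trp", "X": "Xaa", "Y": "Tyr", "Z": "Glx",
}


def to_three_letter(protein: str, sep: str = "-") -> str:
    """Spell out a one-letter protein sequence using three-letter codes."""
    codes = []
    for residue in protein.upper():
        if residue not in ONE_TO_THREE:
            raise ValueError(f"no three-letter code for {residue!r}")
        codes.append(ONE_TO_THREE[residue])
    return sep.join(codes)


print(to_three_letter("MKVLA"))  # Met-Lys-Val-Leu-Ala
```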

Amino Acids license plate game

A good way to memorize these is to play the amino acids license plate game! Keep a printout of the code table handy. When you and your cool friends are out for a drive, try to translate each license plate letter into an amino acid. It sounds nerdy, but it is very effective for learning. Bonus points for knowing the properties and/or structures!

Amino acids grouping

There are several ways to group amino acids, depending on their functionalities and biochemical properties.

Amino acids and their biochemical properties. From Wikipedia.

With nonpolar (hydrophobic) side chains: alanine, valine, leucine, isoleucine, proline, methionine, phenylalanine, tryptophan

With uncharged polar side chains: tyrosine, asparagine, glutamine, glycine, serine, threonine, cysteine

With positively charged side chains: histidine, lysine, arginine

With negatively charged side chains: aspartic acid, glutamic acid
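These groups are what make it possible to treat some substitutions as milder than others when comparing proteins. The sketch below encodes the four groups exactly as listed above and labels a substitution as conservative when both residues fall in the same group. The names `GROUPS`, `group_of`, and `is_conservative` are illustrative, and real scoring schemes are more fine-grained than this yes/no test.

```python
# The four side-chain groups listed above, written with one-letter codes.
GROUPS = {
    "nonpolar": set("AVLIPMFW"),
    "uncharged polar": set("YNQGSTC"),
    "positively charged": set("HKR"),
    "negatively charged": set("DE"),
}


def group_of(residue: str) -> str:
    """Return the name of the group a standard amino acid belongs to."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"residue {residue!r} is not in any group")


def is_conservative(a: str, b: str) -> bool:
    """Treat a substitution as conservative if both residues share a group."""
    return group_of(a) == group_of(b)


print(is_conservative("D", "E"))  # both negatively charged
print(is_conservative("L", "D"))  # nonpolar vs. negatively charged
```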
