01. Pairwise Alignment Introduction

What is Pairwise Alignment?

Pairwise alignment is the process of aligning two DNA, RNA or protein sequences such that the regions of similarity are maximized. This is often performed to find functional, structural or evolutionary commonalities.

In most cases, scientists compare two protein sequences to quantify how related they are (homology). From this, they can identify common domains and motifs, and infer sequence ancestry.
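To make "maximizing the regions of similarity" concrete, here is a minimal sketch of a global alignment using the classic Needleman-Wunsch dynamic programming scheme. The article does not name an algorithm, so this choice is an assumption, as are the function name `global_align` and the scoring values (match +1, mismatch -1, gap -2), which are illustrative only.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    The scoring values are illustrative assumptions, not recommended settings.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(x, y):
        return match if x == y else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align the two residues
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row_a, row_b = global_align("ACGT", "AGT")
    print(total)
    print(row_a)
    print(row_b)
```

With these toy settings, aligning ACGT against AGT places a single gap opposite the C, so the three remaining positions match. Real tools use substitution matrices and more refined gap penalties instead of a flat match/mismatch score.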

Domains and Motifs

Domains are regions of a DNA or amino acid sequence that encode a physicochemically similar feature also found in other sequences and proteins. Each domain corresponds to a specific function. For example, a protein could have an ATP-binding domain or a polar domain.

Motifs are similar, but refer to structural characteristics rather than functional regions. Motifs are often found within domains, although that's not always the case.

Protein vs. DNA sequence alignment

Protein amino acid sequences are preferred over DNA sequences for several reasons.

  • Protein residues are more informative: a change in DNA (especially at the third position of a codon) does not necessarily change the amino acid (AA).
  • The amino acid alphabet is larger than the nucleotide alphabet (20 letters versus 4), which makes it easier to find significance.
  • Some amino acids share related biochemical properties, which can be accounted for when scoring pairwise alignments.
  • Protein sequence comparisons can reach back over a billion years, whereas DNA sequence comparisons can only go back up to 600 million years (600 mya). Thus, protein sequences are far better for evolutionary studies.
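The first point in the list above can be checked directly against the standard genetic code. The sketch below builds the 64-codon table using the common TCAG ordering and measures, for each codon position, what fraction of single-base changes leave the encoded residue unchanged. The names `CODON_TABLE`, `is_synonymous` and `synonymous_fraction` are hypothetical helpers introduced for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one residue per codon in TCAG order; "*" marks a stop codon.
RESIDUES = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), RESIDUES)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons translate to the same residue (or both are stops)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that do not change the translated residue, over all 64 codons."""
    same = total = 0
    for codon in CODON_TABLE:
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutated = codon[:position] + base + codon[position + 1:]
            total += 1
            same += is_synonymous(codon, mutated)
    return same / total


if __name__ == "__main__":
    # CTT and CTC differ only at the third position and both encode leucine.
    print(is_synonymous("CTT", "CTC"))
    for pos in range(3):
        print(f"position {pos + 1}: {synonymous_fraction(pos):.2f}")
```

Third-position changes come out synonymous far more often than first- or second-position changes, which is why a DNA difference frequently disappears at the protein level.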

However, there are some obvious instances when DNA alignments are needed.

  • When confirming the identity of cDNA (forensic sequencing).
  • When studying noncoding regions of DNA. These regions evolve at a faster rate than coding DNA, while mitochondrial noncoding DNA evolves even faster.
  • When studying DNA mutations.
  • When studying very closely related organisms, such as Neanderthals and modern humans.

Biochemistry 101 Review

Before we move on, let's quickly review some elementary biochemistry and notation.

Nucleotide Codes

We're all familiar with the four nucleotide bases - however, there are other symbols used for more ambiguous nucleotides.

Symbol  Meaning               Explanation
A       A                     Adenine
C       C                     Cytosine
G       G                     Guanine
T       T                     Thymine
R       A or G                puRine
Y       C or T                pYrimidine
M       A or C                aMino
K       G or T                Keto
S       C or G                Strong interaction (3 bonds)
W       A or T                Weak interaction (2 bonds)
H       A, C or T (not G)     H is after G
B       C, G or T (not A)     B is after A
V       A, C or G (not T)     V is after T and U
D       A, G or T (not C)     D is after C
N       A, C, G or T          aNything

CG DNA interaction vs. AT interaction
A CG base pair is stronger than an AT base pair because it forms three hydrogen bonds instead of two. Source: Wikipedia
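
One practical use of the ambiguity codes is matching a degenerate pattern against a concrete sequence. The following is a small sketch, assuming the pattern and sequence use only the symbols in the table above; `IUPAC`, `to_regex` and `matches` are hypothetical names chosen for this example.

```python
import re

# Ambiguity codes from the nucleotide table: symbol -> bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def to_regex(pattern):
    """Translate an IUPAC pattern into a regular expression string."""
    parts = []
    for symbol in pattern.upper():
        bases = IUPAC[symbol]  # raises KeyError for symbols outside the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def matches(pattern, sequence):
    """True if the whole sequence is compatible with the IUPAC pattern."""
    return re.fullmatch(to_regex(pattern), sequence.upper()) is not None


if __name__ == "__main__":
    print(to_regex("GAYTN"))
    print(matches("GAYTN", "GATTC"))  # Y allows T, N allows C
    print(matches("GAYTN", "GAATC"))  # Y does not allow A
```

Because each code expands to a fixed set of bases, a character class per symbol is enough; no lookahead or backtracking tricks are needed.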

Amino Acid Residue Codes

Amino acids can be represented with one or three letters. Take some time to review these.

1-letter  3-letter  Amino acid
A         Ala       Alanine
C         Cys       Cysteine
D         Asp       Aspartic acid
E         Glu       Glutamic acid
F         Phe       Phenylalanine
G         Gly       Glycine
H         His       Histidine
I         Ile       Isoleucine
K         Lys       Lysine
L         Leu       Leucine
M         Met       Methionine
N         Asn       Asparagine
O         Pyl       Pyrrolysine
P         Pro       Proline
Q         Gln       Glutamine
R         Arg       Arginine
S         Ser       Serine
T         Thr       Threonine
U         Sec       Selenocysteine
V         Val       Valine
W         Trp       Tryptophan
X         Xaa       Undetermined
Y         Tyr       Tyrosine
Z         Glx       Glutamic acid or glutamine
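
As a quick way to practice the two notations, the sketch below converts a one-letter protein string into three-letter codes. The dictionary mirrors the table above (with Z written as Glx, the conventional three-letter code for "glutamic acid or glutamine"); `ONE_TO_THREE` and `to_three_letter` are hypothetical names, and the example peptide is made up.

```python
# One-letter -> three-letter amino acid codes, following the table above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "O": "Pyl", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "U": "Sec", "V": "Val",
    "W": "Trp", "X": "Xaa", "Y": "Tyr", "Z": "Glx",
}


def to_three_letter(sequence, sep="-"):
    """Convert a one-letter protein sequence to three-letter codes."""
    return sep.join(ONE_TO_THREE[residue] for residue in sequence.upper())


if __name__ == "__main__":
    print(to_three_letter("MKV"))  # a made-up tripeptide
```

Letters without an entry in the table raise a `KeyError`, which is a deliberate choice here so that typos in a sequence are not silently ignored.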

Amino Acids license plate game

A good way to memorize these is to play the amino acids license plate game! Keep a printout of the amino acid code table. When you and your cool friends are out for a drive, try to translate each license plate letter into an amino acid. It sounds nerdy, but it is a very effective way to learn. Bonus points for knowing the properties and/or structures!

Amino acids grouping

There are several ways to group amino acids, depending on their functionalities and biochemical properties.

Amino acids and their biochemical properties. From Wikipedia.

With nonpolar (hydrophobic) side chains: alanine, valine, leucine, isoleucine, proline, methionine, phenylalanine, tryptophan

With uncharged polar side chains: tyrosine, asparagine, glutamine, glycine, serine, threonine, cysteine

With positively charged side chains: histidine, lysine, arginine

With negatively charged side chains: aspartic acid, glutamic acid
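
These groups are what make it possible to treat some substitutions as milder than others when scoring an alignment. The sketch below encodes the four groups exactly as listed above and checks whether two residues fall in the same one. Treating "same group" as a stand-in for a conservative substitution is a simplifying assumption made for illustration; `GROUPS`, `GROUP_OF` and `same_group` are hypothetical names.

```python
# The four side-chain groups listed above, written with one-letter codes.
GROUPS = {
    "nonpolar": "AVLIPMFW",
    "polar uncharged": "YNQGSTC",
    "positive": "HKR",
    "negative": "DE",
}

# Reverse lookup: residue -> name of the group it belongs to.
GROUP_OF = {
    residue: name
    for name, residues in GROUPS.items()
    for residue in residues
}


def same_group(a, b):
    """True if both residues belong to the same side-chain group."""
    return GROUP_OF[a.upper()] == GROUP_OF[b.upper()]


if __name__ == "__main__":
    print(same_group("D", "E"))  # aspartic acid vs. glutamic acid, both negative
    print(same_group("K", "D"))  # lysine is positive, aspartic acid is negative
```

Other groupings exist, so a different table would give different answers for borderline residues such as glycine or tyrosine.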
