Pairwise alignment is the process of aligning two DNA, RNA or protein sequences such that the regions of similarity are maximized. This is often performed to find functional, structural or evolutionary commonalities.
In most cases, scientists align two protein sequences to quantify how related they are (homology). From the alignment, they can identify common domains and motifs and infer sequence ancestry.
Domains are parts of a DNA or amino acid strand that code for a physicochemically similar feature found in other sequences and proteins. Domains refer to specific functionalities. For example, you could have an ATP-binding domain or a polar domain.
Motifs are similar, but refer to structural characteristics rather than functional regions. Motifs are often found in domains, although that's not always the case.
Protein (amino acid) sequences are generally preferred over DNA sequences for several reasons.
However, there are some obvious instances when DNA alignments are needed.
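To make "aligning two sequences so that similarity is maximized" concrete, here is a minimal sketch of a global pairwise alignment using the classic Needleman-Wunsch dynamic programming approach. The function name `global_alignment` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration; real tools typically use substitution matrices and more refined gap penalties.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scoring values are illustrative assumptions, not recommended defaults.
    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = global_alignment("ACGT", "AGT")
print(score)   # 1
print(top)     # ACGT
print(bottom)  # A-GT
```

With these scores, the three matches (+3) and the single gap (-2) give a total of 1. Changing the penalties changes which alignment is considered best, which is why the choice of scoring scheme matters.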
Before we move on, let's do a quick review of some elementary biochemistry and notation.
We're all familiar with the four nucleotide bases. However, there are additional symbols for ambiguous nucleotides, where a position could be any one of several bases.
| Symbol | Bases | Mnemonic |
|---|---|---|
| R | A or G | puRine |
| Y | C or T | pYrimidine |
| M | A or C | aMino |
| K | G or T | Keto |
| S | C or G | Strong interaction (3 bonds) |
| W | A or T | Weak interaction (2 bonds) |
| H | A, C or T (not G) | H is after G |
| B | C, G or T (not A) | B is after A |
| V | A, C or G (not T) | V is after T and U |
| D | A, G or T (not C) | D is after C |
| N | A, C, G or T | aNything |
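The ambiguity codes above map naturally onto a lookup table. The sketch below is one possible way to use them in Python; the names `IUPAC`, `compatible`, and `expand` are hypothetical helpers introduced here for illustration, not part of any particular library.

```python
from itertools import product

# Each symbol maps to the set of concrete bases it may stand for
# (taken from the table above, plus the four unambiguous bases).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def compatible(x, y):
    """True if two symbols could represent the same base."""
    return bool(set(IUPAC[x.upper()]) & set(IUPAC[y.upper()]))


def expand(seq):
    """List every concrete sequence an ambiguous sequence could stand for."""
    choices = (IUPAC[symbol] for symbol in seq.upper())
    return ["".join(bases) for bases in product(*choices)]


print(compatible("R", "A"))  # True: R covers A and G
print(compatible("R", "Y"))  # False: purines and pyrimidines do not overlap
print(expand("ARY"))         # ['AAC', 'AAT', 'AGC', 'AGT']
```

Note that `expand` grows exponentially with the number of ambiguous positions, so it is only practical for short sequences such as primers.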
Amino acids can be represented with one or three letters. Take some time to review these.
| Z | Glx | Glutamic acid or glutamine |
A good way to memorize these is to play the amino acid license plate game! Keep a printout of the following table. When you and your cool friends are out for a drive, try to translate each license plate letter into an amino acid. It sounds nerdy, but it is a very effective way to learn. Bonus points for knowing the properties and/or structures!
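If you want to check your answers after the drive, the license plate game can be sketched in a few lines of Python. The function name `translate_plate` is a hypothetical helper; the one-letter to three-letter mapping covers the 20 standard amino acids plus the B, Z and X placeholders. Any other letter is reported as `?`, since this sketch does not cover it.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Placeholder codes: B = Asp or Asn, Z = Glu or Gln, X = any residue.
AMBIGUOUS = {"B": "Asx", "Z": "Glx", "X": "Xaa"}


def translate_plate(plate):
    """Translate the letters of a license plate into three-letter codes.

    Digits, spaces and punctuation are skipped; letters not covered by
    the tables above are shown as '?'.
    """
    codes = []
    for ch in plate.upper():
        if ch in ONE_TO_THREE:
            codes.append(ONE_TO_THREE[ch])
        elif ch in AMBIGUOUS:
            codes.append(AMBIGUOUS[ch])
        elif ch.isalpha():
            codes.append("?")
    return "-".join(codes)


print(translate_plate("KLM 472"))  # Lys-Leu-Met
print(translate_plate("7ZWJ123"))  # Glx-Trp-?
```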
There are several ways to group amino acids, depending on their functionalities and biochemical properties.
With nonpolar (hydrophobic) side chains: alanine, valine, leucine, isoleucine, proline, methionine, phenylalanine, tryptophan
With uncharged polar side chains: tyrosine, asparagine, glutamine, glycine, serine, threonine, cysteine
With positively charged side chains: histidine, lysine, arginine
With negatively charged side chains: aspartic acid, glutamic acid
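The four groups above can be encoded as a small lookup, which is handy for asking whether a substitution keeps a residue within its group. This is a minimal sketch that uses the grouping exactly as listed here (other textbooks place some residues, such as glycine, differently). The names `GROUPS`, `group_composition`, and `same_group` are hypothetical.

```python
from collections import Counter

# Side-chain groups as listed above, written with one-letter codes.
GROUPS = {
    "nonpolar": "AVLIPMFW",
    "uncharged polar": "YNQGSTC",
    "positively charged": "HKR",
    "negatively charged": "DE",
}

# Invert the table: residue -> group name.
RESIDUE_GROUP = {
    residue: group for group, residues in GROUPS.items() for residue in residues
}


def group_composition(seq):
    """Count how many residues of a protein sequence fall in each group."""
    return Counter(RESIDUE_GROUP.get(r, "other") for r in seq.upper())


def same_group(x, y):
    """True if two residues share a side-chain group under this scheme."""
    gx, gy = RESIDUE_GROUP.get(x.upper()), RESIDUE_GROUP.get(y.upper())
    return gx is not None and gx == gy


print(group_composition("MKDEW"))
print(same_group("D", "E"))  # True: both negatively charged
print(same_group("K", "D"))  # False: opposite charges
```

Substitutions within a group tend to be less disruptive than substitutions across groups, which is part of the reasoning behind the scoring schemes used for protein alignments.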