FASTA (pronounced "fast-A") is a simple text-based format that bioinformaticians use to represent nucleotide or protein sequences. Because it is plain text, processing tools can parse the data easily. The generic file extension is .fas.
The FASTA file format originated with the FASTA sequence alignment software package, the successor to the protein-comparison program FASTP created in the mid-1980s. The format allows you to precede each sequence with a comment.
Each record has two parts: 1) a single header line holding the identifier (plus any comments or annotations) and 2) the sequence itself, which may wrap across several lines.
Before we dig into a FASTA sequence, let's see what one looks like. Here is an example of a standard FASTA format. Pretty simple, right?
>gi|13959657|sp|Q9PTU8|VSP3_BOTJA Venom serine proteinase A precursor
MVLIRVIANLLILQLSNAQKSSELVIGGDECNITEHRFLVEIFNSSGLFCGGTLIDQEWVLSAAHCDMRN
MRIYLGVHNEGVQHADQQRRFAREKFFCLSSRNYTKWDKDIMLIRLNRPVNNSEHIAPLSLPSNPPSVGS
VCRIMGWGTITSPNATFPDVPHCANINLFNYTVCRGAHAGLPATSRTLCAGVLQGGIDTCGGDSGGPLIC
NGTFQGIVSWGGHPCAQPGEPALYTKVFDYLPWIQSIIAGNTTATCPP
The top line holds information about the sequence below it and always begins with a ">". Without this header line, we would just have a raw sequence.
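As a minimal sketch of how the header and the sequence can be separated, the Python below splits one record into its parts. The function names `split_record` and `split_header` are hypothetical, chosen for illustration, and the sketch assumes the identifier is the first whitespace-delimited word of the header. The sample record is a shortened copy of the example above.

```python
def split_record(record_text):
    """Split one FASTA record into (header, sequence)."""
    lines = [line.strip() for line in record_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    header = lines[0][1:]          # drop the leading '>'
    sequence = "".join(lines[1:])  # join the wrapped sequence lines
    return header, sequence


def split_header(header):
    """Separate the identifier (first word) from the free-text description."""
    identifier, _, description = header.partition(" ")
    return identifier, description


# First two sequence lines of the example record (shortened for the sketch).
record = """>gi|13959657|sp|Q9PTU8|VSP3_BOTJA Venom serine proteinase A precursor
MVLIRVIANLLILQLSNAQKSSELVIGGDECNITEHRFLVEIFNSSGLFCGGTLIDQEWVLSAAHCDMRN
MRIYLGVHNEGVQHADQQRRFAREKFFCLSSRNYTKWDKDIMLIRLNRPVNNSEHIAPLSLPSNPPSVGS
"""

header, sequence = split_record(record)
identifier, description = split_header(header)
print(identifier)             # the pipe-delimited identifier
print(identifier.split("|"))  # its individual fields
print(description)            # the free-text description
```

Splitting the identifier on "|" gives the individual database fields, which is usually the first step before looking a record up elsewhere.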
When the FASTA sequence comes from a biological database, the identifier indicates the source database. Here is a list of major database sequence identifiers:
The lines immediately following the identifier hold the raw sequence. For both DNA and proteins, standard nucleic acid and amino acid IUB/IUPAC codes are used.
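One way to check that a nucleotide sequence uses only the allowed codes is sketched below. It assumes the usual IUPAC nucleotide alphabet (the four bases, U for RNA, the ambiguity codes, and "-" for a gap); the names `NUCLEOTIDE_CODES` and `invalid_characters` are hypothetical. A protein check would work the same way with the amino acid alphabet.

```python
# Assumed alphabet: IUPAC nucleotide codes plus '-' for a gap of indeterminate length.
NUCLEOTIDE_CODES = set("ACGTURYKMSWBDHVN-")


def invalid_characters(sequence, alphabet=NUCLEOTIDE_CODES):
    """Return the sorted list of characters in `sequence` that are not in `alphabet`."""
    # Lower-case letters are treated as their upper-case equivalents.
    return sorted(set(sequence.upper()) - alphabet)


print(invalid_characters("ACGTNNRYacgt"))  # [] : every character is allowed
print(invalid_characters("ACGT?X"))        # ['?', 'X']
```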
Additionally, there are a few more notes to consider:
Here is a list of the standard IUB/IUPAC nucleic acid codes.
Here's a list of the 24 amino acid codes and 3 special symbols.
The generic form of FASTA file has the .fas extension. For more specific types, we can use the following:
If we simply append multiple sequences in FASTA format, we get multi-FASTA format. This is a single file holding several sequences, and it is often used as input for multiple-alignment programs like ClustalW or multialign.
To get a FASTA-formatted sequence from the NCBI GenBank database, click the display option near the top of the record and choose FASTA.
Keep in mind that there are programs out there, like READSEQ, that allow you to convert formats to and from FASTA.
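Reading a multi-FASTA file comes down to starting a new record at every ">" line. The generator below is a minimal sketch of that idea, not the parser of any particular tool. The name `read_fasta` and the toy records are made up for illustration, and an in-memory `io.StringIO` stands in for an open file.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of text lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # another wrapped line of the current sequence
        else:
            raise ValueError("sequence data found before the first '>' line")
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Toy multi-FASTA input; with a real file, pass open("some_file.fas") instead.
example = io.StringIO(
    ">seq1 first toy record\nACGT\nACGT\n>seq2 second toy record\nGGCC\n"
)
for header, sequence in read_fasta(example):
    print(header, len(sequence))
```

Because the function is a generator, it holds only one record in memory at a time, which matters for large multi-FASTA files.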
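The simplest conversion, from a raw sequence to FASTA, only needs a header line and wrapped sequence lines. The sketch below is an illustration of that step and not how READSEQ works; the function name `to_fasta` and the default width of 70 characters (the line length used in the example record above) are assumptions.

```python
def to_fasta(header, sequence, width=70):
    """Format a raw sequence as one FASTA record with fixed-width lines."""
    lines = [">" + header]
    # Slice the sequence into chunks of at most `width` characters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(to_fasta("toy_sequence made-up example", "ACGTACGTAC", width=4), end="")
```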