04. A quick look at BLAST

BLAST (Basic Local Alignment Search Tool) serves two purposes:

  1. Align two sequences to look for homology.
  2. Search a database with a sequence to find similar and related sequences.

Without diving into too much detail about BLAST (which we will cover in a later series), let's perform a simple query to get a feel for how to use it.

There are several types of BLAST, and which one you use depends on what your query sequence is (DNA or protein) and what you want to match it against. For this run, let's stick with blastp, in which you enter a protein sequence and BLAST matches it against protein sequences in a database.
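Choosing a program boils down to a lookup on two molecule types: the query's and the database's. The Python sketch below is a simplification assumed for illustration only. `BLAST_PROGRAMS` and `choose_program` are hypothetical names, and `tblastx` (translated nucleotide query against a translated nucleotide database) is left out because it shares the nucleotide/nucleotide combination with `blastn`.

```python
# Hypothetical lookup table: (query type, database type) -> BLAST program.
# tblastx is omitted because it also covers nucleotide vs nucleotide.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",  # DNA query vs DNA database
    ("protein", "protein"): "blastp",        # protein query vs protein database
    ("nucleotide", "protein"): "blastx",     # translated DNA query vs protein database
    ("protein", "nucleotide"): "tblastn",    # protein query vs translated DNA database
}


def choose_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} vs {database_type!r}"
        ) from None


print(choose_program("protein", "protein"))  # blastp, the program used in this walkthrough
```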


The first thing to do is to go to the NCBI page for BLAST. From here, click protein blast (blastp), which is located under "basic BLAST."

You should get a window that looks like this:

BLAST website for protein BLAST (blastp).

1) Running a query against a database

We can search entire databases with a query. You can enter the query as an accession number or a gi number (think of both as IDs for a specific protein sequence), or paste the sequence itself in FASTA format.

What is FASTA?

In FASTA format, the first line begins with a > followed by a description of the sequence, and all following lines hold the sequence itself. For example:

Try inputting the above FASTA sequence.

  • Do not check the "Align two or more sequences" option.
  • Select the "non-redundant protein sequences (nr)" for the database.
  • For the organism name, use "human (taxid:9606)".
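Because FASTA is plain text, it is also easy to read programmatically. Here is a minimal Python sketch of a FASTA reader that follows the rule described above. `parse_fasta` is a hypothetical helper (libraries such as Biopython ship full-featured parsers), and the toy record is made up; it is not the query sequence used in this walkthrough.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (description, sequence) pairs.

    A line starting with '>' begins a new record; every following line,
    up to the next '>' line, is part of that record's sequence.
    """
    records = []
    description = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()  # drop the '>' marker
            chunks = []
        else:
            if description is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Toy record (hypothetical, for illustration only).
example = """>toy_protein hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""
print(parse_fasta(example))
```

Note that the sequence lines are joined, so a sequence wrapped over several lines comes back as one string.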

You'll notice that there are other types of BLAST you can perform: PSI-BLAST, PHI-BLAST and DELTA-BLAST. We'll cover these advanced BLAST variations in a later lesson.

At the bottom there is also an "Algorithm parameters" section, where you can adjust the scoring matrix, the gap penalties and more. For now, though, just click the big BLAST button to run your search!

A quick BLAST run.

Once your query has been processed, you'll see the results page. Great, you just ran a BLAST search! It looks like you've found the human ortholog of a mouse protein.

Scroll down to the Descriptions panel to see all the matches that are similar to your query.

Results of the best-scoring matches will be on top.

Scroll further down to see the actual alignments, each shown with its Identities score and its similarity score (labeled Positives).

Scroll down to see the top-scoring alignment with its identity and similarity (positives) scores.
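The Identities and Positives figures can be recovered from the alignment itself. In BLAST's pairwise text view, the middle line between the Query and Sbjct rows shows the residue letter where both sequences are identical, a + where the substitution scores positively (a similar residue), and a blank otherwise. The sketch below assumes that convention. `alignment_stats` is a hypothetical helper, the example alignment is a hand-made toy, and percentages are rounded to whole numbers here.

```python
def alignment_stats(query: str, midline: str, subject: str) -> dict[str, str]:
    """Summarize a BLAST-style alignment block (hypothetical helper).

    Midline convention assumed: letter = identical residue,
    '+' = positive-scoring substitution, space = mismatch or gap.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("query, midline and subject must have the same length")
    length = len(query)  # alignment length, including gap columns
    identities = sum(1 for c in midline if c.isalpha())
    # Positives count identical positions as well as '+' substitutions.
    positives = identities + midline.count("+")
    # Each gap column has a '-' in exactly one of the two sequences.
    gaps = query.count("-") + subject.count("-")

    def fmt(n: int) -> str:
        return f"{n}/{length} ({round(100 * n / length)}%)"

    return {"Identities": fmt(identities), "Positives": fmt(positives), "Gaps": fmt(gaps)}


# Toy alignment: T/S is a similar pair (+), K/W is a plain mismatch (blank).
print(alignment_stats("MKTAYIAKQR",
                      "MK+AYIA QR",
                      "MKSAYIAWQR"))
```

Because Positives includes the identical positions, it is always at least as large as Identities.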

2) Running a pairwise comparison

The other use of BLAST is for pairwise comparisons. This means you aren't querying a database, but just inputting two sequences and seeing how well they match up. To switch to pairwise comparison mode, click the "Align two or more sequences" option.

For the two sequences here, let's use gi|293651548 and gi|158256336.

A simple pairwise alignment with two proteins, given by their GIs.

Click the big BLAST button once again and wait for your query to be processed. Then scroll down and check your results.

The Descriptions section lists just one match, so why are there several entries in the Alignments section? BLAST performs local alignment, so it can find several separate regions where the two sequences align well, and it reports each of them. The top-scoring alignments are listed first and lower-scoring ones follow. For the most part, you'll want to look at the top-most result.

Results for a pairwise alignment run.
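Ordering those alignments is simply a sort on score, best first. Here is a minimal Python sketch, assuming each local alignment is reduced to the query and subject regions it covers plus a single score. `Alignment` and `rank_alignments` are hypothetical names, and the numbers are invented for illustration; they are not taken from the run above.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One local alignment between the two sequences (hypothetical record)."""
    query_range: tuple[int, int]    # first and last aligned query positions
    subject_range: tuple[int, int]  # first and last aligned subject positions
    score: float                    # higher means a better alignment


def rank_alignments(alignments: list[Alignment]) -> list[Alignment]:
    """Return the alignments best-first, the order in which BLAST lists them."""
    return sorted(alignments, key=lambda a: a.score, reverse=True)


# Invented example: three separate regions where the sequences align.
alignments = [
    Alignment((1, 40), (5, 44), 35.0),
    Alignment((60, 200), (70, 210), 150.2),
    Alignment((220, 260), (230, 270), 42.7),
]
for a in rank_alignments(alignments):
    print(a.score, a.query_range, a.subject_range)
```

Each entry covers a different stretch of the two proteins, which is why one pair of sequences can yield several alignments.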

Wondering how the scoring system works? We'll cover that in the next few pages!
