04. A quick look at BLAST

BLAST (Basic Local Alignment Search Tool) serves two purposes:

  1. Align two sequences and look for homology.
  2. Search a database with a query sequence to find similar and related sequences.

Without diving into too much detail about BLAST (which we will cover in a later series), let's perform a simple query to get a feel for how to use it.

There are several types of BLAST, and which one you use depends on what your query sequence is (DNA or protein) and what you want to match it against. For this run, let's stick with blastp, in which you enter a protein sequence and it is matched against the protein sequences in a database.
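As a rough illustration of how the query type and the database type decide which program to use, here is a minimal Python sketch. The helper name choose_program and the type labels are made up for this example; only the program names themselves (blastn, blastp, blastx, tblastn) come from BLAST.

```python
# Standard BLAST programs, keyed by (query type, database type).
# A fifth program, tblastx, also takes nucleotide vs. nucleotide input but
# translates both sides to protein first; it is left out of this simple lookup.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated before searching
    ("protein", "nucleotide"): "tblastn",  # database is translated before searching
}


def choose_program(query_type, database_type):
    """Return the BLAST program name for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no program for query={query_type!r}, database={database_type!r}"
        )


# The run in this tutorial: a protein query against a protein database.
program = choose_program("protein", "protein")
```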

Using BLAST

The first thing to do is to go to the NCBI page for BLAST. From here, click protein blast (blastp), which is located under "basic BLAST."

You should get a window that looks like this:

BLAST website for protein BLAST (blastp).

1) Running a query against a database

We can search entire databases with a query. The query can be entered as an accession number, a GI number (think of these as IDs for a specific protein sequence), or a sequence in FASTA format.

What is FASTA?

In FASTA format, the first line begins with a > and describes the sequence. Any following lines are the protein sequence itself. For example:

Try inputting the above FASTA sequence.
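If you want to see how that layout is read by a program, here is a minimal Python sketch of a FASTA parser. The function name parse_fasta is made up for this example, and the record it parses is a made-up placeholder used only to show the layout; it is not a real protein and not the sequence used for this search.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence lines are joined into one string.
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Made-up placeholder record (hypothetical ID and residues).
example = """>placeholder_id hypothetical example protein
MKVLAAGIVG
LLLAQPSS
"""

records = parse_fasta(example)
description, sequence = records[0]
```

The description is everything after the > on the first line, and the sequence is the remaining lines joined together.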

  • Do not check the "Align two or more sequences" option.
  • Select "non-redundant protein sequences (nr)" for the database.
  • For the organism name, use "human (taxid:9606)".
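The same three settings can also be written down as request parameters. The Python sketch below only builds the form data and sends nothing. It assumes the parameter names of NCBI's BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) as I understand them, so check NCBI's documentation before relying on it. The function name build_blastp_request and the placeholder query are made up for this example.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST web service (nothing is sent in this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_request(fasta_query, database="nr", taxid=9606):
    """Return (params, encoded_body) for a blastp database search.

    Parameter names are assumed from NCBI's BLAST URL API.
    """
    params = {
        "CMD": "Put",                          # submit a new search
        "PROGRAM": "blastp",                   # protein query vs. protein database
        "DATABASE": database,                  # nr = non-redundant protein sequences
        "QUERY": fasta_query,                  # FASTA text, accession, or GI
        "ENTREZ_QUERY": f"txid{taxid}[ORGN]",  # restrict hits to one organism
    }
    return params, urlencode(params)


# Placeholder query text; substitute your own FASTA sequence.
params, body = build_blastp_request(">placeholder_id\nMKVLAAGIVG")
```

Leaving out any pairwise option corresponds to not checking "Align two or more sequences", and taxid 9606 is the human taxonomy ID shown in the organism field.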

You'll notice that there are different types of BLAST you can perform - PSI-BLAST, PHI-BLAST and DELTA-BLAST. We'll cover these advanced BLAST variations in a later lesson.

There is also another panel at the bottom, Algorithm parameters, where you can adjust the scoring matrix, gap penalties and more. But for now, click the big BLAST button to run your sequence!

A quick BLAST run.

Once your query has been processed... Great! You just ran a BLAST search! It looks like you just found the human ortholog of a mouse protein.

Scroll down to the Descriptions panel, where you can see all the matches that are similar to your query.

Results of the best-scoring matches will be on top.

You can scroll further down to see the actual alignments, with their identity and similarity scores (labeled Identities and Positives) next to them.

Scroll down to see the top-scoring alignment with its identity and similarity (positives) scores.
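To get a feel for what the Identities and Positives counts mean, here is a minimal Python sketch that counts them over a toy alignment. Identities are positions where both residues are the same. Positives also include substitutions that score above zero in the scoring matrix. The set of positive pairs below is only a small illustrative subset of such substitutions in BLOSUM62, and the two aligned strings are made up.

```python
# Small illustrative subset of residue pairs that score above zero in BLOSUM62.
# A real calculation would look every pair up in the full matrix.
POSITIVE_PAIRS = {
    frozenset(pair)
    for pair in [("I", "V"), ("I", "L"), ("L", "M"), ("K", "R"),
                 ("D", "E"), ("F", "Y"), ("S", "T")]
}


def alignment_stats(query, subject):
    """Count identities, positives and gaps in two aligned, equal-length strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1        # a gap is neither an identity nor a positive
        elif q == s:
            identities += 1
            positives += 1   # identical residues also count as positives
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            positives += 1   # similar, but not identical
    return {
        "length": len(query),
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
    }


# Made-up aligned pair; "-" marks a gap.
stats = alignment_stats("MKV-LIDE", "MRVALLDD")
```

For this toy pair the counts are 4 identities, 7 positives and 1 gap over 8 aligned positions, which BLAST would display as 4/8 and 7/8.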

2) Running a pairwise comparison

The other use of BLAST is for pairwise comparisons. This means you aren't querying a database, but just inputting two sequences and seeing how well they match up. To switch to pairwise comparison mode, click the "Align two or more sequences" option.

For the two sequences here, let's use gi|293651548 and gi|158256336.

A simple pairwise alignment with two proteins, given by their GIs.

Click the big BLAST button once again and wait for your query to be processed. Then scroll down and check your results.

In the Descriptions section there is just one alignment... but why are there multiple in the Alignments section? This is simply because there are several ways that BLAST can align your sequences. The top-scoring alignments are at the top, while lower-scoring ones are at the bottom. For the most part, you'll want to look at the topmost result.

Results for a pairwise alignment run.
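The ordering of the Alignments section can be pictured as a simple sort by score. The Python sketch below uses a made-up Alignment record and made-up scores and ranges; it only illustrates why the best alignment ends up on top, not how BLAST finds the alignments in the first place.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One alignment between the two sequences (hypothetical record layout)."""
    query_range: tuple    # (start, end) positions in the first sequence
    subject_range: tuple  # (start, end) positions in the second sequence
    score: float          # alignment score; higher is better


def rank_alignments(alignments):
    """Return the alignments ordered from highest to lowest score."""
    return sorted(alignments, key=lambda a: a.score, reverse=True)


# Made-up alignments between the same two sequences.
candidates = [
    Alignment((120, 160), (300, 340), 35.0),
    Alignment((1, 250), (1, 248), 410.0),
    Alignment((200, 230), (10, 40), 22.5),
]

ranked = rank_alignments(candidates)
best = ranked[0]  # the one you will usually want to look at
```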

Wondering how the scoring system works? We'll see that in the next few pages!
