01. Introduction to Next-Generation Sequencing (NGS) Technology

In the previous lesson on basic DNA sequencing techniques, we covered a variety of sequencing methods that were used from the mid-1980s to the early 2000s.

Although these techniques allowed us to sequence the first human genome, they were too costly and time-intensive for routine use. As a result, only a limited number of human genomes were available to base genetic studies on, making it difficult to establish robust genotype-phenotype correlations.

It wasn't until pyrosequencing and other NGS techniques arrived that the price of sequencing a genome could begin dropping toward $1,000. By 2008, consumer genomics began to take hold, with hard data showing genetic mutations correlating with specific diseases.

Genome sequencers from the early 2000's.

Resequencing vs. de novo sequencing

Before we talk about how much cheaper DNA sequencing has gotten, let's look at these two terms used to describe sequencing methodologies: resequencing and de novo sequencing.

Resequencing is the term for sequencing an organism whose genome has already been sequenced. Because a reference genome exists, we only need to align our reads to it. Thus, our reads need only be a few hundred base pairs long. Many Next-Generation Sequencing platforms provide short reads that can be aligned to a reference genome.
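To make the idea of aligning reads to a reference concrete, here is a minimal sketch in Python. It assumes error-free reads and uses exact string matching only; the function name `align_reads` and the toy sequences are made up for illustration. Real aligners use indexed data structures and tolerate mismatches, so treat this as a picture of the concept rather than a working aligner.

```python
def align_reads(reference, reads):
    """Return, for each read, every 0-based position where it matches the reference exactly."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Keep searching one base further along to catch repeated matches.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


# Toy data (hypothetical): a 20 bp "reference genome" and three short reads.
reference = "ACGTTAGCCGATACGTTAGG"
reads = ["GTTAG", "CCGAT", "TTTTT"]

alignments = align_reads(reference, reads)
for read, hits in alignments.items():
    print(read, hits)
```

Notice that a read can map to more than one position when the reference contains repeats, and a read with no match simply gets an empty list.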

De novo sequencing, on the other hand, is the term for sequencing a genome from scratch, with no reference genome to align against. It is much more costly, time-intensive, and limited to select techniques. Reads must be at least 1,000 bp long. The first human genome sequenced relied on these methods, which is one of the reasons it was so costly and time-intensive.
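Without a reference, reads have to be pieced together from their overlaps alone. The sketch below shows one simple way to picture this: repeatedly merge the two reads with the longest suffix-prefix overlap. The function names, the `min_len` threshold, and the toy reads are all assumptions for illustration. Production assemblers use more robust approaches (such as overlap-layout-consensus or de Bruijn graphs) and must cope with sequencing errors and repeats, which this greedy toy ignores.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of sequences until no overlaps remain."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy reads (hypothetical) that tile a 16 bp sequence with 4 bp overlaps.
reads = ["ACGTTAGC", "TAGCCGAT", "CGATACGG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The shorter the reads, the shorter and more ambiguous the overlaps become, which gives some intuition for why de novo work favors longer reads.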

Forces behind NGS

Throughout the 2000s, scientists came up with a class of novel techniques to lower the cost of DNA sequencing. These methods were successful not only because of the new chemistries available, but also because computing became cheaper and more powerful.

Some may argue that computers allowed for the emergence of NGS technology, as faster processors allowed genomes to be assembled at a much higher rate than before. Additionally, affordable data storage allowed genomes to be stored and made accessible through public databases, and novel algorithms delivered analysis and results far more quickly.

The problem we face in bioinformatics is now not the lack of information, but the wealth of it! Scientists simply have too much data and not enough time to curate it all. This is why many biological databases are separated into primary (unfiltered) and secondary (curated) databases. To make sense of all this data, we need well-trained and knowledgeable bioinformaticians.

What exactly is NGS?

The standard definition

The textbook definition of Next-Generation Sequencing is a high-throughput DNA sequencing methodology that makes use of parallelization to process up to half a million sequences concurrently. The process of running thousands of analytes at a time is known as multiplexing.

A new era

NGS can also be used to describe a new era, one in which sequencing one's genome becomes commonplace. Imagine going to the doctor's office with some illness or concern, and simply ordering a genetic test. The process will be affordable and easy, just like getting an MRI or a blood test. An era where this is commonplace is what some say NGS refers to.

Time magazine's cover of 2012 - The genetic revolution.

Distinguishing factors of NGS

A simpler library preparation

A commonality of Next-Generation Sequencing methods is the simplified workflow used to prepare DNA for sequencing. With the advent of PCR and its variations, DNA fragments no longer need to be transformed into bacterial cells to be replicated. Library preparation includes the following:

  1. Fragmenting the DNA (through sonication, enzymatic cleavage, or any other method).
  2. Ligating an adapter sequence, barcode, and primer.
  3. Selecting fragments by size.
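The three steps above can be sketched as a toy simulation. Everything here is hypothetical: the adapter and barcode strings are short placeholders rather than sequences from any real kit, random cutting stands in for physical fragmentation, and the size window is arbitrary. The point is only to show how the steps chain together.

```python
import random

# Hypothetical placeholder sequences, not taken from any real library prep kit.
ADAPTER_5 = "AAAC"
BARCODE = "TCA"
ADAPTER_3 = "GGGT"


def fragment(dna, min_len, max_len, rng):
    """Step 1: cut the DNA into consecutive pieces of random length."""
    pieces, pos = [], 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[pos:pos + size])
        pos += size
    return pieces


def ligate(piece):
    """Step 2: attach an adapter and barcode to one end and an adapter to the other."""
    return ADAPTER_5 + BARCODE + piece + ADAPTER_3


def size_select(molecules, low, high):
    """Step 3: keep only molecules whose length falls inside the window."""
    return [m for m in molecules if low <= len(m) <= high]


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))

fragments = fragment(genome, 10, 40, rng)
ligated = [ligate(piece) for piece in fragments]
library = size_select(ligated, 30, 45)

print(len(fragments), "fragments,", len(library), "kept after size selection")
```

Because the barcode travels with each fragment, samples tagged with different barcodes could later be pooled and told apart again after sequencing.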

Short reads limit de novo sequencing

Previous methods relied on capillary electrophoresis, which could only read up to 96 wells at a time. NGS's massively parallel approach allows millions of reads to run simultaneously; however, most reads are short unless additional techniques such as mate-pair sequencing are used.

Two types of PCR

Instead of conventional PCR or amplification through bacterial species, NGS techniques use two different flavors of PCR to set the stage for sequencing.

The library can be prepared in one of two ways: through emulsion PCR (ePCR) or bridge PCR.

With ePCR, we have technologies such as Ion Torrent semiconductor sequencing, Roche 454 pyrosequencing, and SOLiD sequencing by ligation.

With bridge PCR, we have Illumina's sequencing by synthesis.

We'll first cover ePCR and the technologies that use it, then move on to bridge PCR.

Emulsion PCR is done within a water-in-oil emulsion, while bridge PCR is conducted on a flow cell.

Sequencing by...

There are alternative methods used to sequence the actual DNA. We have seen sequencing by synthesis already, where base calls are made as each nucleotide is added. Another technique, called sequencing by ligation, is one we'll see soon.

NGS terms

Here are some important NGS terms you should familiarize yourself with.

Read
A raw sequence that comes from a sequencing machine. Usually 300-800 bp long.
Tag
Several reads coming from the same sequence can be merged into one tag.
Sequencing Depth
Total number of sequences, reads, or base pairs generated in a single sequencing experiment.
Coverage
Total number of bases generated / size of genome sequenced.
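As a quick worked example of the coverage formula, here is a small Python sketch. The read count, read length, and genome size are made-up numbers chosen to keep the arithmetic easy, and `coverage` is just an illustrative helper name.

```python
def coverage(read_lengths, genome_size):
    """Coverage = total number of bases generated / size of the genome sequenced."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Hypothetical experiment: 50,000 reads of 400 bp each over a 5,000,000 bp genome.
read_lengths = [400] * 50_000
genome_size = 5_000_000

depth = coverage(read_lengths, genome_size)
print(f"{depth:.1f}x coverage")  # 20,000,000 bases / 5,000,000 bp = 4.0x
```

A result of 4x means each base in the genome is covered by four reads on average; individual positions can still be covered more or less than that.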

This is just the beginning...

The term Next-Generation Sequencing is somewhat of a misnomer, since it implies a technology of the future. As you go through this lesson, note the limitations NGS still has. Third-Generation Sequencing, the next-Next-Generation of sequencing platforms, is meant to improve upon these limitations. We will cover it in a future lesson.

References

  1. Emulsion PCR figure adapted and used with permission from Andy Vierstraete.
  2. Time Magazine - The DNA Dilemma: A Test That Could Change Your Life.
  3. Wikipedia - DNA sequencer
