In the previous lesson on basic DNA sequencing techniques, we covered a variety of sequencing methods used from the mid-1980s to the early 2000s.
Although these techniques allowed us to sequence the first human genome, they were too costly and time-intensive for routine use. As a result, only a limited number of human genomes were available to base genetic studies on, which made it difficult to establish robust genotype-phenotype correlations.
It was only with pyrosequencing and the other NGS techniques that followed that costs began to fall dramatically, eventually approaching the long-sought $1,000 genome. By 2008, consumer genomics had begun to take hold, backed by hard data correlating specific genetic mutations with specific diseases.
Before we discuss how much cheaper DNA sequencing has become, let's define two terms used to describe sequencing methodologies: resequencing and de novo sequencing.
Resequencing is the term for sequencing an organism whose genome has already been sequenced, so a reference genome exists. We only need to align our reads to that reference, so the reads need be only a few hundred base pairs long. Many Next-Generation Sequencing platforms produce short reads suited to this kind of alignment.
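To make the idea of aligning reads to a reference concrete, here is a minimal seed-and-extend sketch in Python. It assumes a tiny made-up reference, tolerates only substitutions (no insertions or deletions), and seeds on each read's first k bases; `build_seed_index` and `align_read` are hypothetical helpers for illustration, not the method of any particular aligner.

```python
from collections import defaultdict


def build_seed_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer (seed) in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 1) -> list[int]:
    """Hypothetical helper: seed on the read's first k bases, then extend by
    checking the whole read against the reference (substitutions only)."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) == len(read) and hamming(read, window) <= max_mismatches:
            hits.append(pos)
    return hits


# Toy data, made up for illustration.
reference = "ACGTTGCATGCCATGGATCCAGTTACG"
index = build_seed_index(reference, k=4)
for read in ["GCATGCC", "GGATCGAG", "TTTTTTT"]:
    print(read, align_read(read, reference, index, k=4))
```

Because only the first k bases serve as a seed, a read with an error inside its seed is missed; real short-read aligners use more robust indexing and multiple seeds to avoid this.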
De novo sequencing, on the other hand, is the term for sequencing a genome from scratch, with no reference to guide assembly. It is much more costly and time-intensive, and is limited to select techniques. It typically relies on long reads, ideally 1,000 bp or more, so that overlapping reads can be pieced together. The first human genome was sequenced this way, which is one of the reasons it was so costly and time-intensive.
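De novo assembly has no reference to align against, so overlapping reads must be stitched together directly. The sketch below is a toy greedy overlap assembler, assuming error-free reads from a short genome with no repeats; `overlap` and `greedy_assemble` are hypothetical names, and real assemblers typically use overlap graphs or de Bruijn graphs rather than this greedy strategy.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap is at least `min_len` bases long."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler (hypothetical helper): repeatedly merge the pair
    of contigs with the longest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing left to merge; return separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Error-free toy reads covering a short made-up sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

Greedy merging is easily misled by repeats longer than the reads, which is one reason longer reads make de novo assembly so much easier.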
Throughout the 2000s, scientists developed a class of novel techniques to lower the cost of DNA sequencing. These methods succeeded not only because of new chemistries, but also because of cheaper, more powerful computers.
Some argue that computers enabled the emergence of NGS technology, as faster processors let them assemble genomes far more quickly than before. Additionally, affordable data storage allowed genomes to be stored and made accessible through public databases, and new algorithms made rapid analysis possible.
The problem we now face in bioinformatics is not a lack of information, but a wealth of it! Scientists simply have too much data and not enough time to curate it. This is why many biological databases are divided into primary (unfiltered) and secondary (curated) databases. To make sense of all this data, we need well-trained, knowledgeable bioinformaticians.
The textbook definition of Next-Generation Sequencing is a high-throughput DNA sequencing methodology that uses parallelization to process anywhere from hundreds of thousands to millions of sequences concurrently. Running many samples in the same run, each tagged with a short barcode sequence so its reads can be separated afterward, is known as multiplexing.
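After a multiplexed run, each read has to be assigned back to its sample (demultiplexing) using the barcode attached to that sample's DNA. This sketch assumes, for simplicity, that the barcode is the first four bases of each read and must match exactly; the barcode table and sample names are hypothetical, and many platforms instead read the barcode (index) in a separate step.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real library kits define their own.
SAMPLE_BARCODES = {"ACGT": "patient_A", "TGCA": "patient_B"}


def demultiplex(reads: list[str], barcodes: dict[str, str],
                barcode_len: int = 4) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading (inline) barcode.
    Assumes exact barcode matches; unmatched reads go to 'undetermined'."""
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        tag = read[:barcode_len]
        sample = barcodes.get(tag)
        if sample is None:
            bins["undetermined"].append(read)  # keep the whole read
        else:
            bins[sample].append(read[barcode_len:])  # strip the barcode
    return dict(bins)


pooled_reads = ["ACGTGGGAAA", "TGCACCCTTT", "GGGGAAAA"]
print(demultiplex(pooled_reads, SAMPLE_BARCODES))
```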
NGS can also describe a new era, one in which sequencing a person's genome becomes commonplace. Imagine going to the doctor's office with an illness or concern and simply ordering a genetic test. The process would be affordable and easy, just like getting an MRI or a blood test. Some say it is this era that NGS refers to.
A commonality of Next-Generation Sequencing methods is the simplified workflow used to prepare DNA for sequencing. With the advent of PCR and its variations, there is no longer any need to clone DNA fragments into bacterial cells to replicate them. Library preparation includes the following:
Previous methods relied on capillary electrophoresis, which could read at most 96 samples (one per capillary) at a time. NGS's massively parallel approach allows millions of reads to run simultaneously; however, the individual reads are short. Techniques such as mate-pair sequencing help compensate by linking reads that lie a known distance apart in the genome.
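A quick calculation shows why massive parallelism matters even though individual reads are short. The sketch uses the average-coverage formula C = N × L / G (number of reads × read length / genome size); the read counts and read lengths below are illustrative assumptions, not the specifications of any particular instrument.

```python
# Approximate size of the haploid human genome, in base pairs.
HUMAN_GENOME_BP = 3_100_000_000


def bases_per_run(reads_in_parallel: int, read_length_bp: int) -> int:
    """Total bases produced when all reads are sequenced in one run."""
    return reads_in_parallel * read_length_bp


def fold_coverage(total_bases: int, genome_size_bp: int = HUMAN_GENOME_BP) -> float:
    """Average coverage C = N * L / G: how many times, on average,
    each base of the genome is sequenced."""
    return total_bases / genome_size_bp


# Illustrative assumptions, not instrument specifications.
capillary = bases_per_run(96, 800)          # 96 capillaries, ~800 bp reads
ngs = bases_per_run(100_000_000, 100)       # 100 million reads of 100 bp
print(f"capillary run: {capillary:,} bp, {fold_coverage(capillary):.1e}x coverage")
print(f"NGS run:       {ngs:,} bp, {fold_coverage(ngs):.1f}x coverage")
```

Even with reads roughly eight times shorter in this example, the parallel run yields about five orders of magnitude more sequence.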
Instead of conventional PCR or amplification in bacteria, NGS techniques prepare the library for sequencing with one of two flavors of PCR: emulsion PCR (ePCR) or bridge PCR.
Technologies that use ePCR include Ion Torrent semiconductor sequencing, Roche 454 pyrosequencing, and SOLiD sequencing by ligation.
Bridge PCR is used by Illumina's sequencing by synthesis.
We'll first cover ePCR and the technologies that use it, then move on to bridge PCR.
There are also different methods for reading the DNA sequence itself. We have already seen sequencing by synthesis, in which a base is called each time a nucleotide is incorporated. Another technique, called sequencing by ligation, is covered shortly.
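As an illustration of sequencing by synthesis, the sketch below mimics pyrosequencing base calling: nucleotides are flowed one type at a time, and the light signal from each flow is roughly proportional to how many bases were incorporated. It assumes a noise-free simulator, a cyclic flow order of T, A, C, G, and that we read the synthesized strand directly; `simulate_flowgram` and `call_bases` are hypothetical names.

```python
# Assumed cyclic order in which nucleotides are flowed over the template.
FLOW_ORDER = "TACG"


def simulate_flowgram(seq: str, n_flows: int, flow_order: str = FLOW_ORDER) -> list[float]:
    """Ideal, noise-free pyrosequencing signal for the strand being synthesized:
    each flow's signal equals the number of identical bases incorporated."""
    signals = []
    pos = 0
    for f in range(n_flows):
        base = flow_order[f % len(flow_order)]
        count = 0
        # All consecutive copies of this base are incorporated in a single flow.
        while pos < len(seq) and seq[pos] == base:
            count += 1
            pos += 1
        signals.append(float(count))
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Call bases by rounding each flow's signal to a whole number of bases."""
    return "".join(
        flow_order[f % len(flow_order)] * round(signal)
        for f, signal in enumerate(signals)
    )


noisy = [2.1, 0.9, 0.1, 1.8, 0.2, 0.0, 1.1, 0.0]  # made-up signal values
print(call_bases(noisy))
```

Rounding works for small deviations, but long homopolymers produce signals that are hard to tell apart (say, 7 versus 8 bases), a well-known weakness of pyrosequencing.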
Here are some important NGS terms you should familiarize yourselves with.
The term Next-Generation Sequencing is something of a misnomer, since it implies a technology of the future, whereas NGS is already in widespread use. As you go through this lesson, note the limitations of NGS as they currently exist. Third-Generation Sequencing, meant to be the next "next generation" of sequencing platforms, aims to improve on these limitations; we will cover it in a future lesson.