Shotgun sequencing is a type of de novo sequencing, meaning it can assemble a genome that has never been sequenced before, without relying on a reference sequence.
Shotgun sequencing is used to analyze DNA sequences longer than 1,000 base pairs, up to entire chromosomes. The basic method is to break many copies of the same genome at random places and then reassemble the pieces based on their overlapping regions.
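To illustrate why breaking several copies of the genome produces overlaps, here is a minimal toy model in Python. It assumes a genome given as a plain string and cuts each copy into pieces of random length; the function name `shotgun_fragments` and its parameters are hypothetical and do not model real shearing physics.

```python
import random


def shotgun_fragments(genome: str, copies: int, min_len: int, max_len: int,
                      seed: int = 0) -> list[str]:
    """Cut several copies of one genome at random positions (a toy model).

    Each copy is cut independently, so the breakpoints differ between
    copies and fragments from different copies overlap one another.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    fragments = []
    for _ in range(copies):
        pos = 0
        while pos < len(genome):
            length = rng.randint(min_len, max_len)
            fragments.append(genome[pos:pos + length])  # last piece may be shorter
            pos += length
    rng.shuffle(fragments)  # the original order of the pieces is lost
    return fragments


genome = "ATTAGACCTGCCGGAATAC"
frags = shotgun_fragments(genome, copies=4, min_len=5, max_len=9)
print(frags)
```

Because every fragment is a substring of the genome and the copies are cut at different places, a fragment from one copy typically spans a breakpoint in another copy; those shared stretches are what the assembly step relies on.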
Genomic DNA is fragmented by sonication or hydrodynamic shearing.
Fragments with overhanging (sticky) ends are made blunt with T4 DNA polymerase: its polymerase activity fills in 5' overhangs, and its 3'→5' exonuclease activity removes 3' overhangs.
T4 polynucleotide kinase is added so that 5' ends are phosphorylated.
The fragments are separated by size into small (~1 kb), medium (~8 kb), and large (~40 kb) fragments.
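A separate library is created for each size class by cloning the fragments into vectors (plasmids for the smaller sizes; ~40 kb inserts typically require fosmid vectors), which are then transformed into E. coli cells.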
The transformed cells are grown, which amplifies the vectors, and vector DNA is then purified from each library.
Each insert is sequenced: a primer that binds the vector just upstream of the insert is attached, and then any sequencing-by-synthesis method can be used.
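A computer program called a base caller converts the raw sequencing signal into nucleotide calls and assigns each call a quality score, so that poor calls can be filtered out.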
An assembler program then finds overlapping reads and merges them into long contiguous stretches of sequence called contigs.
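The last two computational steps, quality filtering and overlap-based assembly, can be sketched as follows. This is a toy illustration under several assumptions: qualities are Phred scores in the Phred+33 encoding used by FASTQ files, a read is kept or dropped by its mean score (real base callers score each base from the instrument signal, and real pipelines often trim rather than discard), and contigs are built by greedy merging of the longest suffix/prefix overlap. Greedy merging is one simple strategy, not necessarily what any particular assembler does; it also ignores sequencing errors and reverse complements. All function names are hypothetical.

```python
def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33, as in FASTQ files)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(reads: list[tuple[str, str]], min_mean_q: float) -> list[str]:
    """Keep reads whose mean quality reaches the threshold.

    Q = -10 * log10(P_error), so Q20 means about a 1% chance a call is wrong.
    """
    return [seq for seq, qual in reads if mean_phred(qual) >= min_mean_q]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def drop_contained(seqs: list[str]) -> list[str]:
    """Remove duplicates and sequences lying entirely inside another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = drop_contained(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: each remaining sequence is a contig
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs = drop_contained(rest + [merged])


raw = [
    ("ATTAGACCTG", "IIIIIIIIII"),  # 'I' encodes Q40
    ("AGACCTGCCG", "IIIIIIIIII"),
    ("CCTGCCGGAA", "IIIIIIIIII"),
    ("GCCGGAATAC", "IIIIIIIIII"),
    ("TTTTTTTTTT", "##########"),  # '#' encodes Q2: filtered out
]
reads = filter_reads(raw, min_mean_q=20)
print(greedy_assemble(reads, min_len=3))  # ['ATTAGACCTGCCGGAATAC']
```

Always taking the single longest overlap is fast and easy to follow, but it is exactly the kind of decision that can join reads whose overlap is coincidental or comes from a repeat.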
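Statistically, some contigs will be false: the assembler may join reads whose overlap occurred by chance (or that come from different copies of a repeated sequence) rather than from adjacent regions of the genome. This can be corrected with paired-end or mate-pair sequencing. Because the two reads of a pair come from opposite ends of the same fragment, they must lie roughly a known distance apart, which constrains how reads and contigs can be joined.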
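Both points can be made concrete in a short Python sketch. The first function is a rough estimate that assumes independent, uniformly random bases, so two read ends agree over k bases with probability (1/4)^k; real genomes contain repeats, so misleading overlaps are far more common than this estimate suggests. The second function checks whether a read pair supports a contig, assuming inward-facing (forward/reverse) mates as in typical paired-end libraries; mate-pair protocols can use a different orientation. The names `expected_chance_overlaps`, `Placement`, and `pair_supports_contig` are hypothetical.

```python
from dataclasses import dataclass


def expected_chance_overlaps(n_reads: int, k: int) -> float:
    """Rough expected number of ordered read pairs whose ends agree over one
    fixed length k purely by chance, assuming independent, uniform bases."""
    return n_reads * (n_reads - 1) * 0.25 ** k


@dataclass
class Placement:
    contig: str
    start: int     # 0-based leftmost position of the read on the contig
    forward: bool  # True if the read aligns to the contig's forward strand


def pair_supports_contig(a: Placement, b: Placement, read_len: int,
                         insert_size: int, tolerance: int) -> bool:
    """True if two mates land on the same contig, face each other,
    and span roughly the expected insert size."""
    if a.contig != b.contig:
        return False  # split pairs may link contigs, but say nothing about this one
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not (left.forward and not right.forward):
        return False  # assumes inward-facing (forward/reverse) mates
    span = right.start + read_len - left.start
    return abs(span - insert_size) <= tolerance


print(expected_chance_overlaps(2, 1))  # 0.5
ok = pair_supports_contig(Placement("c1", 100, True), Placement("c1", 400, False),
                          read_len=100, insert_size=400, tolerance=50)
bad = pair_supports_contig(Placement("c1", 100, True), Placement("c1", 2000, False),
                           read_len=100, insert_size=400, tolerance=50)
print(ok, bad)  # True False
```

A contig in which many pairs fail this check (mates too far apart, too close, or facing the wrong way) is a candidate for having been joined incorrectly.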
Additionally, transforming bacterial cells and growing the libraries can take a long time.