04. SAM, BAM and CRAM

Before we talk about SAM, BAM and CRAM, we must discuss SAMtools, the software most closely associated with these formats.

What is SAMtools?

SAMtools is a suite of utilities for efficient post-processing of short DNA sequence read alignments. It includes several command-line programs, such as view, sort, and index, for processing next-generation sequencing data.
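As a rough illustration of how view, sort, and index fit together, the sketch below builds the three command lines that convert a SAM file into a sorted, indexed BAM file. The file names and the helper functions `samtools_pipeline` and `run_pipeline` are hypothetical, and the commands are only executed if you call `run_pipeline` on a machine where samtools is installed.

```python
import shutil
import subprocess


def samtools_pipeline(sam_path, prefix):
    """Build samtools commands: SAM -> BAM -> sorted BAM -> index.

    The output file names are an assumption made for this sketch.
    """
    unsorted_bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # view -b writes BAM output; -o names the output file
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        # sort orders alignments by reference position
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # index creates an index file next to the sorted BAM
        ["samtools", "index", sorted_bam],
    ]


def run_pipeline(commands):
    """Run each command in order; requires samtools on the PATH."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on the PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = samtools_pipeline("sample.sam", "sample")
for cmd in commands:
    print(" ".join(cmd))
```

Building the commands as lists, rather than one shell string, avoids quoting problems when file names contain spaces.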

The SAM and BAM file formats were introduced together with SAMtools. CRAM was developed later and can also be read and written by SAMtools.

What is the SAM format?

The name SAM comes from Sequence Alignment/Map. In addition to the sequence reads themselves, SAM includes alignment data that link short reads to a reference sequence. This makes SAM a common choice of format when visualizing short-read alignments in genome browsers such as IGV (Integrative Genomics Viewer).

IGV (Integrative Genomics Viewer) displaying short-read alignments to a reference sequence. Image from Illumina's BaseSpace blog.
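To make the structure of a SAM file concrete, here is a minimal sketch of a parser for a single alignment line. It relies on two facts from the SAM specification: header lines start with `@`, and each alignment line has 11 mandatory tab-separated fields. The `SamRecord` class, the `parse_sam_line` function, and the example read are made up for illustration; optional tag fields after the eleventh column are ignored.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields, in file order."""
    qname: str  # read name
    flag: int   # bitwise flag
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # alignment description
    rnext: str  # reference name of the mate
    pnext: int  # position of the mate
    tlen: int   # observed template length
    seq: str    # read sequence
    qual: str   # base qualities


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one alignment line; return None for '@' header lines."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]),
        int(fields[4]), fields[5], fields[6], int(fields[7]),
        int(fields[8]), fields[9], fields[10],
    )


# Hypothetical read aligned to chr1 at position 100
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
```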

What are BAM and CRAM?

The SAM format is simple to parse, generate and check for errors. However, SAM files are large (often several gigabytes or more), which gets in the way of efficiency. BAM addresses this: it is a compressed binary version of SAM that can still be manipulated directly. BAM is an indexable representation of nucleotide sequence alignments, which allows for intensive data processing in production pipelines.
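The effect of compression on SAM-like text can be sketched with Python's standard `gzip` module. BAM itself uses BGZF, a blocked variant of gzip, together with a binary record encoding, so plain gzip here is only a stand-in and the numbers it prints do not reflect real BAM sizes. The `make_sam_text` helper and the reads it generates are invented for this example.

```python
import gzip


def make_sam_text(n_reads):
    """Generate a small, artificial SAM file as a string."""
    lines = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:1000000",
    ]
    for i in range(n_reads):
        lines.append(
            f"read{i}\t0\tchr1\t{100 + i}\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
        )
    return "\n".join(lines) + "\n"


sam_bytes = make_sam_text(10_000).encode("ascii")
compressed = gzip.compress(sam_bytes)

print("uncompressed bytes:", len(sam_bytes))
print("gzip-compressed bytes:", len(compressed))

# Compression is lossless: the original text is fully recoverable
restored = gzip.decompress(compressed)
```

Artificial reads like these are far more repetitive than real sequencing data, so the ratio printed here is more favorable than what you should expect in practice.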

CRAM is a restructured, column-oriented alternative to BAM that is designed to compress alignment data further.
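Column orientation means that values of the same field are stored together, instead of storing each record as one row. The toy sketch below shows only that idea: it transposes a few made-up alignment rows into per-field columns and compresses both layouts with `zlib` so the sizes can be compared. It is not the CRAM encoding, and the helper names `to_columns` and `from_columns` are hypothetical.

```python
import zlib

# Made-up records: (read name, flag, reference, position, MAPQ, CIGAR)
rows = [
    (f"read{i}", "0", "chr1", str(100 + i), "60", "8M")
    for i in range(5000)
]


def to_columns(rows):
    """Transpose row-oriented records into one list per field."""
    return [list(col) for col in zip(*rows)]


def from_columns(columns):
    """Rebuild the original rows from the per-field columns."""
    return [tuple(values) for values in zip(*columns)]


columns = to_columns(rows)

# Row layout: one record per line, fields separated by tabs
row_bytes = "\n".join("\t".join(r) for r in rows).encode("ascii")
row_size = len(zlib.compress(row_bytes))

# Column layout: each field is compressed as its own stream
col_size = sum(
    len(zlib.compress("\n".join(col).encode("ascii"))) for col in columns
)

print("row-oriented compressed bytes:", row_size)
print("column-oriented compressed bytes:", col_size)
```

Grouping similar values together lets a format pick an encoding suited to each field, which is the motivation for a column-oriented layout.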

References

For more reading on SAM and BAM, head over to the Center for Statistical Genetics.
