Before we talk about SAM, BAM and CRAM, we must discuss the software, SAMtools, from which these formats originate.
SAMtools is a suite of utilities for efficient post-processing of short DNA sequence read alignments. The suite includes several command-line programs, such as index, for processing next-generation sequencing data.
SAM, BAM and CRAM are the file formats that SAMtools reads and writes.
The name SAM comes from Sequence Alignment/Map. In addition to the sequence reads themselves, a SAM file includes alignment data that link each short read to a position on a reference sequence. This makes SAM the format of choice for visualizing short-read alignments in genome browsers such as IGV (Integrative Genomics Viewer).
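To make the "alignment data" concrete, here is a minimal sketch of reading one SAM alignment line in Python. It relies on the 11 mandatory tab-separated fields of the SAM format (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The `SamRecord` class, the `parse_sam_line` helper and the example read are made up for illustration and are not part of SAMtools.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    """One alignment line: the 11 mandatory SAM fields plus optional tags."""
    qname: str   # read name
    flag: int    # bitwise flags (e.g. 4 = read unmapped)
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. "8M"
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)
    tags: list   # optional TAG:TYPE:VALUE fields, kept as raw strings


def parse_sam_line(line: str) -> SamRecord:
    """Split a non-header SAM line on tabs and convert the numeric fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=fields[11:],
    )


# Hypothetical read aligned to chr1 at position 100.
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)  # chr1 100 8M
```

The reference name and position (`rname`, `pos`) are what tie the read back to the reference sequence; header lines, which start with `@`, would need to be skipped before calling a parser like this.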
The SAM format is plain text, so it is simple to parse, generate and check for errors. However, its large file size (around 10 GB on average) makes it inefficient to store and process. Thus, researchers found a way to compress it into a binary format, BAM, without losing the ability to manipulate it. BAM is an indexable representation of nucleotide sequence alignments, which allows for intensive data processing in production pipelines.
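As a rough illustration of why compression pays off, the sketch below gzips a block of repetitive SAM-like text and reports the size ratio. This is only an approximation of the idea: real BAM files use a binary record encoding inside BGZF, a block-based gzip variant, whereas this example uses plain gzip from the Python standard library. The records and the `compression_ratio` helper are made up for illustration.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    raw = text.encode("ascii")
    packed = gzip.compress(raw)
    return len(raw) / len(packed)


# 1000 hypothetical alignment lines that differ only in read name and position.
lines = [
    f"read{i}\t0\tchr1\t{100 + i}\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
    for i in range(1000)
]
sam_text = "\n".join(lines) + "\n"

ratio = compression_ratio(sam_text)
print(f"text is {ratio:.1f}x larger than its gzip-compressed form")

# Compression is lossless: decompressing gives back the original text.
restored = gzip.decompress(gzip.compress(sam_text.encode("ascii"))).decode("ascii")
print(restored == sam_text)
```

The ratio here is inflated because the made-up records are nearly identical; real alignment data compresses less dramatically.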
CRAM is a restructured, column-oriented alternative to the binary BAM format.
For more reading on SAM and BAM, head over to the Center for Statistical Genetics.