06. Wig and BigWig

The Wiggle format (.wig) is an efficient way to store dense, continuous blocks of data. It is primarily used to store values such as GC percentage, probability scores and transcriptome data. Instead of specifying a value for each nucleotide position, wig allows you to bind values to entire regions that follow a certain pattern.

BigWig

Just as SAM has BAM, wig has an indexed binary equivalent called bigWig. This allows for efficient data handling, because only the parts of the file needed for a particular region are extracted and processed when that region is viewed in a genome browser. To convert a wig file to bigWig, use the wigToBigWig program.
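
As a rough sketch of the conversion step, the snippet below builds the wigToBigWig command line from Python. It assumes the UCSC wigToBigWig utility is installed and on your PATH, and that you have a chromosome sizes file for your assembly, which the tool takes as its second argument. The file names are hypothetical placeholders.

```python
import subprocess

def wig_to_bigwig_command(wig_path, chrom_sizes_path, bigwig_path):
    """Return the argument list: wigToBigWig in.wig chrom.sizes out.bw"""
    return ["wigToBigWig", wig_path, chrom_sizes_path, bigwig_path]

def convert(wig_path, chrom_sizes_path, bigwig_path):
    """Run the conversion; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(
        wig_to_bigwig_command(wig_path, chrom_sizes_path, bigwig_path),
        check=True,
    )

# Hypothetical file names, shown only to illustrate the argument order.
command = wig_to_bigwig_command("signal.wig", "hg38.chrom.sizes", "signal.bw")
print(" ".join(command))
```

Calling convert() with real paths would run the tool; the example only prints the command so that it works without the utility installed.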

Characteristics

A .wig file contains one or more blocks. At the top of each block is the track definition line, which describes the data that follow through a number of options.

Track definition line

There are several options we can place on the first line to characterize that particular block of information. Each option should be formatted as a key=value pair.

name
Name of block.
description
Describes the region in detail.
priority
Integer describing the order to display tracks.
color
Color per track in RGB or hexadecimal.
graphType
Bar or point graph.
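
To make the key=value layout concrete, here is a minimal sketch of a parser for such a line. It assumes the line begins with the word track and that values containing spaces are wrapped in double quotes, as in the UCSC convention. The example line and all of its option values are made up for illustration.

```python
import shlex

def parse_track_line(line):
    """Split a track definition line into a dict of its key=value options."""
    tokens = shlex.split(line)  # shlex keeps quoted values with spaces together
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track definition line")
    options = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        options[key] = value
    return options

# Hypothetical example line; the option values are invented for this sketch.
example = (
    'track type=wiggle_0 name="GC percent" '
    'description="GC content per window" priority=10 '
    "color=0,0,255 graphType=bar"
)
options = parse_track_line(example)
print(options["name"], options["priority"], options["graphType"])
```

All values come back as strings; converting priority to an integer or color to a tuple is left to the caller.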

The two main formatting options per block are variableStep and fixedStep.

variableStep

variableStep is the more common of the two. Each data line gives a chromosome position in the first column and a data value in the second.

variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13

The declaration line names the chromosome and may carry an optional parameter, span, which sets the number of bases each value covers.

The span parameter can help us save space. The following block is equivalent to the data block above, but is much shorter.

variableStep chrom=chr4 span=5
400001 13
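
The equivalence of the two blocks can be checked with a small sketch that expands a variableStep block into one value per base. The function name is hypothetical, and the parser handles only the chrom and span parameters discussed here.

```python
def expand_variable_step(block):
    """Expand a variableStep block into (chrom, {position: value}), one entry per base."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    if header[0] != "variableStep":
        raise ValueError("block must start with a variableStep declaration")
    params = dict(field.split("=", 1) for field in header[1:])
    span = int(params.get("span", 1))  # span defaults to 1 when omitted
    per_base = {}
    for ln in lines[1:]:
        start, value = ln.split()
        # Each data line covers `span` consecutive bases beginning at `start`.
        for pos in range(int(start), int(start) + span):
            per_base[pos] = float(value)
    return params["chrom"], per_base

without_span = """variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13
"""

with_span = """variableStep chrom=chr4 span=5
400001 13
"""

print(expand_variable_step(without_span) == expand_variable_step(with_span))  # True
```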

fixedStep

If your data points fall at regular intervals, you can use the fixedStep option. The start position and the interval length (step) go on the declaration line, so the data lines need only one column, holding the values.

fixedStep chrom=chr4 start=400001 step=100
13
14
15

The block above assigns the value 13 to position 400001 on chromosome 4, the value 14 to position 400101, and the value 15 to position 400201.
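
The position arithmetic can be sketched as follows: the i-th value (counting from zero) starts at start + i * step. The function name is hypothetical, and the optional span parameter is assumed to default to 1, meaning one base per value.

```python
def fixed_step_intervals(declaration, values):
    """Return (chrom, first_base, last_base, value) for each value in a fixedStep block."""
    fields = declaration.split()
    if fields[0] != "fixedStep":
        raise ValueError("declaration must start with fixedStep")
    params = dict(field.split("=", 1) for field in fields[1:])
    start = int(params["start"])
    step = int(params["step"])
    span = int(params.get("span", 1))  # one base per value unless span is given
    intervals = []
    for i, value in enumerate(values):
        first = start + i * step   # positions advance by `step` for each value
        last = first + span - 1    # inclusive end of the covered bases
        intervals.append((params["chrom"], first, last, value))
    return intervals

for interval in fixed_step_intervals(
    "fixedStep chrom=chr4 start=400001 step=100", [13, 14, 15]
):
    print(interval)
# ('chr4', 400001, 400001, 13)
# ('chr4', 400101, 400101, 14)
# ('chr4', 400201, 400201, 15)
```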

You may also specify a span, indicating the number of bases each value covers.

fixedStep chrom=chr4 start=400001 step=100 span=5
13
14
15

This is similar, but each value covers five nucleotides instead of just one. Thus we have 13 for 400001-400005, 14 for 400101-400105, and 15 for 400201-400205.

References

Ensembl WIG File Format - Definition and support options
