06. Wig and BigWig

The Wiggle format (.wig) is an efficient way to store dense, continuous blocks of data. It is typically used for values such as GC percentage, probability scores, and transcriptome data. Instead of listing every nucleotide position individually, wig lets you assign values to whole regions that follow a regular pattern.

BigWig

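Just as BAM is the binary counterpart of SAM, wig has an indexed binary equivalent called bigWig. Because bigWig files are indexed, a genome browser only needs to read and process the parts of the file that cover the region being viewed, rather than the whole file. To convert a wig file, use the UCSC wigToBigWig program, which also needs a file listing chromosome sizes: wigToBigWig input.wig chrom.sizes output.bw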

Characteristics

A .wig file starts with a track definition line, which sets display options for the track. One or more data blocks follow it. Each block begins with a declaration line (variableStep or fixedStep) that describes how the data lines beneath it are laid out.

Track definition line

The track definition line begins with the word track, followed by options that characterize the track. Each option is written as a key=value pair, and values containing spaces are enclosed in quotes. Common options include:

name
Short name of the track.
description
Longer description of the track.
priority
Integer describing the order to display tracks.
color
Track color, in RGB or hexadecimal.
graphType
Bar or point graph.
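To make the key=value structure concrete, here is a minimal Python sketch that splits a track definition line into a dictionary. It assumes the UCSC convention that the line starts with the word track and that values containing spaces are wrapped in double quotes. The function name parse_track_line is hypothetical and not part of any library. The sample line, including type=wiggle_0 and its option values, is also illustrative.

```python
import shlex


def parse_track_line(line):
    """Parse a wig track definition line into a dict of option strings.

    Assumes the line starts with 'track' and that values containing spaces
    are double-quoted, e.g. description="GC percent in 5 bp windows".
    """
    # shlex.split honours the quotes, so quoted values stay in one token
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track definition line")
    options = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        options[key] = value
    return options


# Illustrative track line; the option values are made up for this example
line = ('track type=wiggle_0 name="GC" '
        'description="GC percent in 5 bp windows" '
        'priority=1 color=0,0,255 graphType=bar')
print(parse_track_line(line))
```

All values come back as strings. A caller would convert priority to an integer, or color to a tuple, as needed.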

Each data block uses one of two declaration formats: variableStep or fixedStep.

variableStep

The variableStep format suits data at irregularly spaced positions. Each data line holds a chromosome position in the first column and its value in the second.

variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13
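The layout is simple enough to generate directly. The following Python sketch formats (position, value) pairs as a variableStep block; write_variable_step is a hypothetical helper. It assumes 1-based positions and, following the UCSC specification's requirement that positions be in numerical order, rejects any position that does not strictly increase.

```python
def write_variable_step(chrom, points):
    """Format (position, value) pairs as a variableStep block.

    Positions are 1-based; this sketch requires them to strictly increase.
    """
    lines = [f"variableStep chrom={chrom}"]
    previous = 0  # positions start at 1, so any valid first position passes
    for position, value in points:
        if position <= previous:
            raise ValueError("positions must be strictly increasing")
        lines.append(f"{position} {value}")
        previous = position
    return "\n".join(lines)


# Reproduces the chr4 block shown above
print(write_variable_step("chr4", [(400001 + i, 13) for i in range(5)]))
```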

The declaration line names the chromosome (chrom) and can take an optional span parameter: the number of bases each value covers (1 if omitted).

Using span can save space when consecutive bases share the same value. The following block describes exactly the same data as the one above, but needs only a single data line.

variableStep chrom=chr4 span=5
400001 13
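One way to check that the two blocks really are equivalent is to expand each into per-base values. This Python sketch parses a single variableStep block and applies span (default 1) to every data line. parse_variable_step is a hypothetical helper, and it assumes one block per string with well-formed "position value" data lines.

```python
def parse_variable_step(text):
    """Expand one variableStep block into a {(chrom, position): value} dict.

    Each data line is 'position value'; the value covers `span` bases
    starting at that 1-based position (span defaults to 1).
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    if header[0] != "variableStep":
        raise ValueError("block must start with a variableStep line")
    params = dict(token.split("=", 1) for token in header[1:])
    chrom = params["chrom"]
    span = int(params.get("span", "1"))
    values = {}
    for position, value in data:
        start = int(position)
        for base in range(start, start + span):
            values[(chrom, base)] = float(value)
    return values


per_base = """variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13"""

with_span = """variableStep chrom=chr4 span=5
400001 13"""

print(parse_variable_step(per_base) == parse_variable_step(with_span))  # True
```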

fixedStep

If positions are separated by a regular interval, you can use fixedStep instead. The declaration line then carries the start position and the step (the distance between consecutive positions). As a result, each data line needs only one column: the value.

fixedStep chrom=chr4 start=400001 step=100
13
14
15

In the block above, position 400001 on chromosome 4 has the value 13, position 400101 has the value 14, and position 400201 has the value 15.
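Each position follows directly from start and step: the n-th value (counting from 0) lies at start + n * step. Here is a minimal Python sketch of that rule, assuming a single well-formed fixedStep block without span. parse_fixed_step is a hypothetical name.

```python
def parse_fixed_step(text):
    """Return (chrom, position, value) tuples for one fixedStep block.

    The n-th value (counting from 0) sits at start + n * step.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    if header[0] != "fixedStep":
        raise ValueError("block must start with a fixedStep line")
    params = dict(token.split("=", 1) for token in header[1:])
    start = int(params["start"])
    step = int(params["step"])
    return [(params["chrom"], start + n * step, float(value))
            for n, value in enumerate(lines[1:])]


block = """fixedStep chrom=chr4 start=400001 step=100
13
14
15"""

for record in parse_fixed_step(block):
    print(record)
# ('chr4', 400001, 13.0)
# ('chr4', 400101, 14.0)
# ('chr4', 400201, 15.0)
```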

You can also add span to set how many bases each value covers.

fixedStep chrom=chr4 start=400001 step=100 span=5
13
14
15

This is similar, but each value now covers five nucleotides instead of one. Thus we have 13 for 400001-400005, 14 for 400101-400105, and 15 for 400201-400205.
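Adding span turns each point into an interval. The sketch below computes 1-based, inclusive intervals from the declaration parameters. fixed_step_intervals is a hypothetical helper that takes the values as a list rather than parsing the text.

```python
def fixed_step_intervals(chrom, start, step, values, span=1):
    """Return 1-based, inclusive (chrom, first, last, value) intervals.

    The n-th value starts at start + n * step and covers `span` bases.
    """
    intervals = []
    for n, value in enumerate(values):
        first = start + n * step
        intervals.append((chrom, first, first + span - 1, value))
    return intervals


for interval in fixed_step_intervals("chr4", 400001, 100, [13, 14, 15], span=5):
    print(interval)
# ('chr4', 400001, 400005, 13)
# ('chr4', 400101, 400105, 14)
# ('chr4', 400201, 400205, 15)
```

With the default span=1, each interval collapses to a single base, matching the earlier fixedStep example.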
