06. Wig and BigWig

The Wiggle format (.wig) is an efficient way to store dense, continuous blocks of data. It is primarily used for values such as GC percentage, probability scores, and transcriptome data. Instead of listing a value for every nucleotide position separately, wig lets you assign one value to a whole region, and lets positions follow a regular pattern so they do not all have to be written out.

BigWig

Like SAM and BAM, wig has an indexed binary equivalent called bigWig. This allows for efficient data handling: when you view a particular region in a genome browser, only the parts of the file covering that region are read and processed. To convert a wig file, use UCSC's wigToBigWig program, which also takes a chrom.sizes file listing the length of each chromosome.
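As a rough sketch of the conversion step, the Python below (standard library only) writes the chrom.sizes file wigToBigWig needs, one tab-separated chromosome name and length per line, and then runs the tool with its three positional arguments: the input wig, the chrom.sizes file, and the output bigWig. The helper names are our own, and the chromosome lengths you pass in must be the real ones for your genome assembly.

```python
import shutil
import subprocess
from pathlib import Path


def format_chrom_sizes(sizes: dict[str, int]) -> str:
    """Render chromosome lengths as chrom.sizes text: 'chrom<TAB>length' per line."""
    return "".join(f"{chrom}\t{length}\n" for chrom, length in sizes.items())


def wig_to_bigwig_cmd(wig: Path, chrom_sizes: Path, out: Path) -> list[str]:
    """Argument list for UCSC wigToBigWig, which takes: in.wig chrom.sizes out.bw."""
    return ["wigToBigWig", str(wig), str(chrom_sizes), str(out)]


def convert(wig: Path, sizes: dict[str, int], out: Path) -> None:
    """Write a chrom.sizes file next to the wig file, then run wigToBigWig."""
    sizes_path = wig.with_suffix(".chrom.sizes")
    sizes_path.write_text(format_chrom_sizes(sizes))
    if shutil.which("wigToBigWig") is None:
        raise FileNotFoundError("wigToBigWig is not on PATH")
    subprocess.run(wig_to_bigwig_cmd(wig, sizes_path, out), check=True)
```

Building the command as a list (rather than one shell string) lets subprocess pass file names containing spaces through unchanged.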

Characteristics

A .wig file contains one or more blocks of data. The file usually begins with a track definition line, which configures the track with a number of options, and each data block then starts with its own declaration line.

Track definition line

There are several options we can place on the track definition line to characterize the data that follows. The line starts with the word track (UCSC also expects type=wiggle_0), and each option is written as a key=value pair.

name
Short name of the track.
description
Longer description of the track.
priority
Integer setting the order in which tracks are displayed.
color
Track color, given as RGB values or a hexadecimal code.
graphType
Bar or point graph (bar or points).
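A minimal Python sketch of assembling such a line follows. The function name track_line is hypothetical, and it assumes the UCSC convention of starting with track type=wiggle_0 and wrapping any value that contains spaces in double quotes so the line still splits cleanly into key=value pairs.

```python
def track_line(**options: object) -> str:
    """Build a wig track definition line from key=value options.

    Values containing spaces are wrapped in double quotes.
    """
    parts = ["track", "type=wiggle_0"]
    for key, value in options.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)
```

For example, track_line(name="GC", description="GC percent", priority=20, color="0,0,255", graphType="bar") gives track type=wiggle_0 name=GC description="GC percent" priority=20 color=0,0,255 graphType=bar.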

Each data block starts with one of two declaration lines: variableStep or fixedStep.

variableStep

variableStep is the more general of the two formats, since positions do not need to be evenly spaced. Each data line has two columns: the chromosome position and the value at that position.

variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13
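To make the two-column layout concrete, here is a small Python sketch that reads one variableStep block into its declaration options and a list of (position, value) pairs. The function name is hypothetical, and it assumes a single well-formed block with no track line or comments.

```python
def parse_variable_step(text: str) -> tuple[dict[str, str], list[tuple[int, float]]]:
    """Parse one variableStep block into (declaration options, [(position, value), ...])."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    keyword, *fields = lines[0].split()
    if keyword != "variableStep":
        raise ValueError(f"expected a variableStep line, got {keyword!r}")
    # Declaration fields are key=value pairs, e.g. chrom=chr4.
    options = dict(field.split("=", 1) for field in fields)
    records = []
    for line in lines[1:]:
        pos, value = line.split()  # column 1: position, column 2: value
        records.append((int(pos), float(value)))
    return options, records
```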

The declaration line names the chromosome with chrom and may include an optional parameter known as span, which tells us how many bases each value covers (1 by default).

Using the span parameter can save a lot of space. The following block describes exactly the same data as the one above, but needs only one data line instead of five.

variableStep chrom=chr4 span=5
400001 13
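One way to check that the two blocks really are equivalent is to expand each into a per-base map of position to value. The Python sketch below does this under the assumption that span defaults to 1 when omitted; the function name is hypothetical.

```python
def expand_variable_step(text: str) -> dict[int, float]:
    """Map every covered base to its value, honouring the optional span (default 1)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = dict(field.split("=", 1) for field in lines[0][1:])
    span = int(header.get("span", "1"))
    coverage: dict[int, float] = {}
    for pos, value in lines[1:]:
        start = int(pos)
        for base in range(start, start + span):  # each value covers `span` bases
            coverage[base] = float(value)
    return coverage


one_per_base = """variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13"""

with_span = """variableStep chrom=chr4 span=5
400001 13"""

# Both blocks cover bases 400001-400005 with the value 13.
assert expand_variable_step(one_per_base) == expand_variable_step(with_span)
```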

fixedStep

If your data points are evenly spaced, you can use the fixedStep option instead. The start position and the interval between points (step) go on the declaration line, so each data line needs only one column: the value.

fixedStep chrom=chr4 start=400001 step=100
13
14
15

The block above assigns chromosome 4, position 400001 the value 13, position 400101 the value 14, and position 400201 the value 15.
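The position arithmetic is simply start + i × step for the i-th data line, counting from zero. A short Python sketch (hypothetical function name, assuming one well-formed fixedStep block) recovers the explicit pairs:

```python
def parse_fixed_step(text: str) -> list[tuple[int, float]]:
    """Turn a fixedStep block into (position, value) pairs.

    The i-th data line (counting from 0) sits at start + i * step.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = dict(field.split("=", 1) for field in lines[0][1:])
    start, step = int(header["start"]), int(header["step"])
    return [(start + i * step, float(row[0])) for i, row in enumerate(lines[1:])]
```

Applied to the block above, it returns [(400001, 13.0), (400101, 14.0), (400201, 15.0)].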

You may also specify a span, giving the number of bases each value covers.

fixedStep chrom=chr4 start=400001 step=100 span=5
13
14
15

This is similar, but each value now covers five nucleotides instead of just one. Thus we have 13 for 400001-400005, 14 for 400101-400105, and 15 for 400201-400205.
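Adding span to the fixedStep arithmetic gives each value an inclusive interval from start + i × step to start + i × step + span − 1 (positions are 1-based). The Python sketch below, with a hypothetical function name, assumes span defaults to 1 when it is not given.

```python
def fixed_step_intervals(text: str) -> list[tuple[int, int, float]]:
    """Return (first_base, last_base, value) for each fixedStep data line.

    Positions are 1-based and inclusive; span defaults to 1 when absent.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = dict(field.split("=", 1) for field in lines[0][1:])
    start, step = int(header["start"]), int(header["step"])
    span = int(header.get("span", "1"))
    intervals = []
    for i, row in enumerate(lines[1:]):
        first = start + i * step
        intervals.append((first, first + span - 1, float(row[0])))
    return intervals
```

For the span=5 block above, this yields (400001, 400005, 13.0), (400101, 400105, 14.0) and (400201, 400205, 15.0).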
