The Wiggle format (.wig) is an efficient way to store dense, continuous blocks of data. It is primarily used to store values such as GC percentage, probability scores and transcriptome data. Instead of specifying a value for each nucleotide position, wig allows you to bind values to entire regions that follow a certain pattern.
Much as BAM is the binary counterpart of SAM, wig has an indexed binary equivalent called bigWig. Because bigWig is indexed, a genome browser only needs to extract and process the parts of the file that cover the region being viewed, which makes large datasets efficient to handle. To convert a wig file to bigWig, use the UCSC wigToBigWig program.
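If you run the conversion from a Python pipeline, a minimal sketch might look like the following. It assumes the UCSC wigToBigWig binary is on your PATH and that you have a chrom.sizes file (chromosome name and length, one per line) for your assembly. The helper names and file names are hypothetical.

```python
import shutil
import subprocess


def wig_to_bigwig_command(wig_path, chrom_sizes_path, bigwig_path):
    """Build the argument list: wigToBigWig <in.wig> <chrom.sizes> <out.bw>."""
    return ["wigToBigWig", wig_path, chrom_sizes_path, bigwig_path]


def convert(wig_path, chrom_sizes_path, bigwig_path):
    """Run the conversion, failing early if wigToBigWig is not installed."""
    cmd = wig_to_bigwig_command(wig_path, chrom_sizes_path, bigwig_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("wigToBigWig not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(cmd, check=True)


print(wig_to_bigwig_command("in.wig", "hg38.chrom.sizes", "out.bw"))
```

Keeping the command construction separate from execution makes it easy to log or dry-run the exact call before converting large files.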
A .wig file contains one or more blocks. Each block begins with a declaration line, which describes the data that follows through a number of options.
These options characterize that particular block of data, and each one is written as a key=value pair.
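Because every option is a key=value pair, a declaration line is simple to split apart. The sketch below uses a hypothetical parse_declaration helper: it treats the first word as the block type and everything after it as options, and it rejects any field that lacks an equals sign.

```python
def parse_declaration(line):
    """Split a wig declaration line into its block type and a dict of options,
    e.g. 'fixedStep chrom=chr4 start=400001 step=100'."""
    fields = line.split()
    step_type = fields[0]
    options = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {field!r}")
        options[key] = value  # values stay strings; convert as needed
    return step_type, options


step_type, opts = parse_declaration("fixedStep chrom=chr4 start=400001 step=100")
print(step_type, opts)
# fixedStep {'chrom': 'chr4', 'start': '400001', 'step': '100'}
```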
The two main formats for a block are variableStep and fixedStep.
The variableStep format is the more common of the two. Each data line gives a position on the chromosome in the first column and its value in the second.
variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13
The declaration line names the chromosome (chrom) and can carry an optional parameter called span, which sets the number of bases each value covers (1 by default).
Using span can save a lot of space. The following block is equivalent to the one above but is far more compact:
variableStep chrom=chr4 span=5
400001 13
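One way to confirm that the two variableStep blocks carry the same data is to expand each into one value per base and compare. The sketch below assumes a single well-formed block per string; expand_variable_step is a hypothetical helper, not part of any wig library. Positions are 1-based, as in wig.

```python
def expand_variable_step(text):
    """Expand one variableStep block into {position: value}, one entry per base.
    Each value covers `span` bases starting at its position (span defaults to 1)."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    fields = lines[0].split()
    if fields[0] != "variableStep":
        raise ValueError("not a variableStep block")
    opts = dict(f.split("=", 1) for f in fields[1:])
    span = int(opts.get("span", 1))
    per_base = {}
    for ln in lines[1:]:
        pos_str, val_str = ln.split()
        pos, value = int(pos_str), float(val_str)
        for p in range(pos, pos + span):
            per_base[p] = value
    return per_base


long_form = """variableStep chrom=chr4
400001 13
400002 13
400003 13
400004 13
400005 13
"""
short_form = """variableStep chrom=chr4 span=5
400001 13
"""
print(expand_variable_step(long_form) == expand_variable_step(short_form))  # True
```

The span version stores one data line instead of five, which is where the space saving comes from.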
If the positions in a block are evenly spaced, you can use the fixedStep format instead. Its declaration line holds the start position and the interval between positions (step), so each data line needs only one column: the value.
fixedStep chrom=chr4 start=400001 step=100
13
14
15
In this block, position 400001 on chromosome 4 has the value 13, position 400101 has the value 14, and position 400201 has the value 15.
You can also specify a span, which sets how many bases each value covers:
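The rule behind the fixedStep block above is that the i-th value (counting from 0) sits at start + i * step. A minimal parser for a single fixedStep block, using a hypothetical parse_fixed_step helper and assuming well-formed input, could look like this:

```python
def parse_fixed_step(text):
    """Parse one fixedStep block into (position, value) pairs.
    The i-th data line (counting from 0) belongs at start + i * step (1-based)."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    fields = lines[0].split()
    if fields[0] != "fixedStep":
        raise ValueError("not a fixedStep block")
    opts = dict(f.split("=", 1) for f in fields[1:])
    start, step = int(opts["start"]), int(opts["step"])
    return [(start + i * step, float(v)) for i, v in enumerate(lines[1:])]


block = """fixedStep chrom=chr4 start=400001 step=100
13
14
15
"""
print(parse_fixed_step(block))
# [(400001, 13.0), (400101, 14.0), (400201, 15.0)]
```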
fixedStep chrom=chr4 start=400001 step=100 span=5
13
14
15
This is the same block, except that each value now covers five nucleotides instead of one: 13 applies to 400001-400005, 14 to 400101-400105, and 15 to 400201-400205.
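With a span, each value covers the bases from its start position to start position + span - 1. The sketch below (fixed_step_intervals is a hypothetical helper) turns the fixedStep parameters into inclusive, 1-based intervals, matching the ranges listed above.

```python
def fixed_step_intervals(start, step, values, span=1):
    """Return (first_base, last_base, value) for each value in a fixedStep block.
    Coordinates are 1-based and inclusive; span defaults to 1."""
    intervals = []
    for i, v in enumerate(values):
        first = start + i * step
        intervals.append((first, first + span - 1, v))
    return intervals


print(fixed_step_intervals(400001, 100, [13, 14, 15], span=5))
# [(400001, 400005, 13), (400101, 400105, 14), (400201, 400205, 15)]
```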