07. GFF and GTF formats

GFF, or the General Feature Format, is used to describe genes and other features of DNA, RNA and protein sequences. GFF files typically use the .gff extension.

What exactly is GFF?

GFF extends a basic record made up of name, start and end parameters (NSE). For example, the NSE (Chromosome2,2000,4000) specifies roughly two kilobases on chromosome 2. GFF allows these segments to be annotated.

Name, start and end parameters (NSE).

GFF allows users to perform common operations on features, such as intersection, exclusion, union, filtration, sorting, transformation and dereferencing.
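As an illustration of one of these operations, the sketch below intersects two NSE segments in Python. The NSE type and the intersect function are hypothetical names introduced for this example, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple, Optional


class NSE(NamedTuple):
    """A name/start/end segment; coordinates assumed 1-based, inclusive."""
    name: str
    start: int
    end: int


def intersect(a: NSE, b: NSE) -> Optional[NSE]:
    """Return the overlap of two segments, or None if they do not overlap."""
    if a.name != b.name:
        return None  # segments on different sequences never overlap
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start > end:
        return None
    return NSE(a.name, start, end)


first = NSE("Chromosome2", 2000, 4000)
second = NSE("Chromosome2", 3500, 5000)
overlap = intersect(first, second)
```

Union and exclusion can be built in the same way from comparisons of the start and end values.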

What types of software use GFF?

Several types of bioinformatics software use GFF. These include genome viewers such as GBrowse, Jalview and IGB.

Different versions

There are several versions of GFF. The ones used today are GFF2, GTF and GFF3.

GFF2 (General Feature Format version 2) was limited in that it could only handle two-level feature hierarchies, not three-level ones such as gene -> transcript -> exon. The Sequence Ontology and GMOD projects therefore expanded on it with GFF3, which can represent deeper feature hierarchies.

GTF (General Transfer Format, also called Gene Transfer Format) has also been known as GFF version 2.5, since it improves on version 2 but not as much as version 3 does.
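The gene -> transcript -> exon hierarchy mentioned above can be pictured as a small tree. In GFF3, child features point to their parent through ID and Parent attributes. The following Python sketch uses made-up feature identifiers and a hypothetical depth helper to count the levels in such a tree.

```python
from collections import defaultdict

# Hypothetical (ID, Parent) pairs, as they might be read from GFF3 attributes.
features = [
    ("gene1", None),
    ("mRNA1", "gene1"),
    ("exon1", "mRNA1"),
    ("exon2", "mRNA1"),
]

# Map each parent ID to the IDs of its direct children.
children = defaultdict(list)
for feature_id, parent_id in features:
    if parent_id is not None:
        children[parent_id].append(feature_id)


def depth(feature_id: str) -> int:
    """Number of levels in the hierarchy rooted at feature_id."""
    kids = children.get(feature_id, [])
    return 1 + max((depth(kid) for kid in kids), default=0)


levels = depth("gene1")
```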


A GFF file consists of one line per feature, each containing 9 columns of data. The columns are separated by tabs, making it a tab-delimited file.

Optional track lines

Within the file, we can also include optional track definition lines. These go at the beginning of the list of features they are to affect.
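To show how track lines relate to the features that follow them, here is a minimal Python sketch that groups feature lines under the most recent track line. It assumes that a track definition line starts with the word "track" followed by a space, as in the Ensembl and UCSC conventions. The group_by_track function and the sample data are hypothetical.

```python
def group_by_track(lines):
    """Group feature lines under the most recent track line.

    Assumes a track definition line is the word 'track', optionally
    followed by a space and settings. Features that appear before any
    track line are grouped under None.
    """
    groups = []            # list of (track_line, [feature_lines])
    current = (None, [])
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "track" or line.startswith("track "):
            if current[0] is not None or current[1]:
                groups.append(current)
            current = (line, [])
        else:
            current[1].append(line)
    if current[0] is not None or current[1]:
        groups.append(current)
    return groups


# Made-up example data: two tracks, three features.
sample = [
    'track name=genes description="Example genes"',
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    'track name=variants description="Example variants"',
    "chr1\texample\tvariation\t150\t150\t.\t+\t.\tID=var1",
    "chr1\texample\tvariation\t300\t300\t.\t+\t.\tID=var2",
]
groups = group_by_track(sample)
```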


The nine columns, in order, are:

1. seqname (refseq name): Name of the chromosome or scaffold. Chromosomes can be given without the 'chr' prefix. For use with Ensembl, the name must be one used within Ensembl.
2. source: Source of the annotation, such as the name of the program that generated this feature.
3. feature: Feature type name, for example gene, variation or similarity.
4. start: Start position of the feature, with sequence numbering starting at 1.
5. end: End position of the feature, with sequence numbering starting at 1.
6. score: Floating point value, used for scores such as similarity or identity.
7. strand: '+' for forward and '-' for reverse.
8. frame: Either 0, 1 or 2. 0 indicates that the first base of the feature is the first base of a codon, 1 that the second base of the feature is the first base of a codon, and so on.
9. attribute: Semicolon-separated list of tag-value pairs, providing additional information about each feature.
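The column layout can be made concrete with a short parser. The Python sketch below splits one feature line into its nine fields. The Feature type, parse_attributes and parse_line are hypothetical names, the sample line is made up, and the use of '.' for a missing score or frame is an assumption based on common GFF practice.

```python
from typing import Dict, NamedTuple, Optional


class Feature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: Optional[int]
    attributes: Dict[str, str]


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse 'tag=value' (GFF3) or 'tag "value"' (GFF2/GTF) pairs.

    Simplification: semicolons inside quoted values are not handled.
    """
    result = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        eq, sp = part.find("="), part.find(" ")
        if eq != -1 and (sp == -1 or eq < sp):
            tag, value = part.split("=", 1)
        else:
            tag, _, value = part.partition(" ")
        result[tag.strip()] = value.strip().strip('"')
    return result


def parse_line(line: str) -> Feature:
    """Split one tab-delimited feature line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return Feature(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),   # '.' assumed to mean "no score"
        strand,
        None if frame == "." else int(frame),     # '.' assumed to mean "no frame"
        parse_attributes(attrs),
    )


parsed = parse_line("chr2\texample\tgene\t2000\t4000\t0.9\t+\t0\tID=gene1;Name=demo")
```

Here parsed.start and parsed.end hold the 1-based coordinates as integers, and parsed.attributes is a dictionary of the tag-value pairs from the ninth column.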


Validators allow us to ensure that a file is formatted properly. To validate a GFF3 file, go to the GFF3 validator.
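Before submitting a file to a validator, a few basic checks can be run locally. The Python sketch below is not a substitute for the GFF3 validator. It only checks the column count, the coordinates, the strand and the frame of a single line. The check_line function is a hypothetical helper, and accepting '.' as a placeholder for an unknown strand or frame is an assumption.

```python
def check_line(line: str) -> list:
    """Return a list of problems found in one feature line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, got {len(fields)}"]
    problems = []
    start, end, strand, frame = fields[3], fields[4], fields[6], fields[7]
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(end) < int(start):
        problems.append("require 1 <= start <= end")
    if strand not in {"+", "-", "."}:       # '.' assumed to mean "unknown"
        problems.append("strand must be '+', '-' or '.'")
    if frame not in {"0", "1", "2", "."}:   # '.' assumed to mean "no frame"
        problems.append("frame must be 0, 1, 2 or '.'")
    return problems


good = "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1"
bad = "chr1\texample\tgene\t500\t100\t.\t+\t.\tID=gene2"
good_problems = check_line(good)
bad_problems = check_line(bad)
```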



Wellcome Trust Sanger Institute. GFF: an exchange format for feature description.

