Tuesday, December 30, 2014

Technically Speaking: Converting Between Glimmer3 .predict and .gff3 Gene Annotations

Example screenshot of open reading frame annotation within the Geneious program.

Predicting open reading frames (ORFs) within genomic sequences is one of the most basic yet important tasks in bioinformatics and sequencing analysis. Given an organism's genomic sequence, or a section of it, we predict which regions are potential genes. At its most basic level, this can be done by looking for sequence regions between start and stop codons (the sequence signals for the beginning and end of a gene). While there are many programs for predicting open reading frames, I often use the common Glimmer3 toolkit. This program works great overall, but one drawback is that it can be hard to visualize its predicted open reading frames on your genome or genomic region (using Geneious or the Integrative Genomics Viewer), because it does not give you a '.gff3' formatted file, which these programs commonly use. In this technical post, I will go over the file types you get from Glimmer3, explain the .gff3 file type, and leave you with a Perl script to convert between the two.
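
To make the "between start and stop codons" idea concrete, here is a minimal Python sketch of that most basic approach. It is only an illustration and not how Glimmer3 works internally (Glimmer3 scores candidates with interpolated Markov models). The function name `find_orfs`, the `min_codons` parameter, the choice of ATG as the only start codon, and the forward-strand-only scan are all my own simplifying assumptions.

```python
# Naive ORF scan: forward strand only, ATG start, standard stop codons.
# All names here are hypothetical and chosen for illustration.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) pairs, 1-based and inclusive of the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    for start, end in find_orfs("CCATGAAATAGCC"):
        print(start, end)
```

A real gene finder also has to handle the reverse strand, alternative start codons, and overlapping candidates, which is a big part of why I use a dedicated tool like Glimmer3 instead of something this simple.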