GAZE

GAZE is a tool for the integration of gene prediction signal and content sensor information into complete gene structures.

It is fully configurable: both the signal and content data, and the model of gene structure against which assemblies are validated and scored, are external to the system and supplied by the user.


Introduction

There are two primary inputs to the system. The first is a list of Features (corresponding to the results of signal sensors) and Segments (the results of content sensors), supplied in one or more files in the General Feature Format (GFF).
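To make the first input concrete, the following is a minimal Python sketch of reading one GFF line into a record. It relies only on the standard GFF layout of tab-separated columns (seqname, source, feature, start, end, score, strand, frame, and an optional attributes column). The names GffRecord and parse_gff_line are hypothetical and chosen for illustration; this is not how GAZE itself reads its input.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    """One GFF line (hypothetical helper type, not part of GAZE)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the score column is "."
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse a single GFF line; return None for blank and comment lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated GFF columns")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    # The ninth (attributes) column is optional.
    attributes = fields[8] if len(fields) > 8 else ""
    return GffRecord(
        seqname, source, feature,
        int(start), int(end),
        None if score == "." else float(score),
        strand, frame, attributes,
    )
```

A signal sensor result would typically cover a short span with a score, while a content sensor result would cover a longer region; both fit the same nine-column record.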

The second is an XML configuration file that controls how candidate gene structure assemblies are validated and scored. An important element of the configuration file is the gene structure model, which describes how the gene features relate to one another and fit together into complete gene structures. The GAZE configuration file is described in more detail on the "Configuration" tab.

GAZE was written by Kevin Howe, a member of the research group of Richard Durbin.

Configuration

About the GAZE configuration file

The configuration file controls how GAZE works and has two main roles. First, it defines the space of features and segments that GAZE works with, and how instances of them are created from the given GFF and DNA files. Second, it contains the gene structure model, which dictates how candidate gene structure assemblies are validated and scored.

Top level anatomy

The file contains five sections.

declarations
a list containing a declaration for each of the features, segments and length penalty functions referred to later in the model. This provides GAZE with a namespace for the configuration, which helps in detecting errors in the model.
gff2gaze
a set of directives for converting entries from a list of GFF files into instances of these features and segments.
dna2gaze
a set of directives for extracting further features and segments from the raw DNA sequence, if supplied by the user.
lengthfunctions
each length penalty referred to in the model is defined as a list of (distance->penalty) pairs, with linear interpolation being used to derive penalties for distances not defined.
model
The model of gene structure is defined by listing for each target feature which source features can lie immediately upstream.
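
The lengthfunctions section describes penalties as (distance->penalty) pairs with linear interpolation in between. The following Python sketch illustrates that idea only; the function name length_penalty is hypothetical. The behaviour for distances outside the listed range is not stated here, so the sketch assumes the penalty is held at the nearest listed point, which may differ from what GAZE does.

```python
from bisect import bisect_right
from typing import List, Tuple


def length_penalty(points: List[Tuple[float, float]], distance: float) -> float:
    """Linearly interpolate a penalty from (distance, penalty) pairs.

    Assumption (not taken from GAZE): outside the listed range the
    penalty of the nearest listed point is returned.
    """
    if not points:
        raise ValueError("at least one (distance, penalty) pair is required")
    pts = sorted(points)
    xs = [d for d, _ in pts]
    if distance <= xs[0]:
        return pts[0][1]
    if distance >= xs[-1]:
        return pts[-1][1]
    # Find the pair of listed points that brackets the distance.
    i = bisect_right(xs, distance)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (distance - x0) / (x1 - x0)
```

For example, with the pairs (0, 0.0) and (100, 10.0), a distance of 50 falls halfway between the two listed points and so receives half of the larger penalty.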

An example of a GAZE configuration file containing a simple model of gene structure can be found on the "Software" tab.
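
As a rough illustration of the model section, the gene structure model can be thought of as a mapping from each target feature to the source features allowed immediately upstream of it. The Python sketch below checks a candidate assembly against such a mapping. The feature names and the names EXAMPLE_MODEL and is_valid_assembly are hypothetical, scoring is left out entirely, and the real model is expressed in the XML configuration file rather than in Python.

```python
from typing import Dict, Sequence, Set

# Hypothetical toy model: for each target feature, the set of source
# features that may lie immediately upstream of it.
EXAMPLE_MODEL: Dict[str, Set[str]] = {
    "splice5": {"start_codon", "splice3"},
    "splice3": {"splice5"},
    "stop_codon": {"start_codon", "splice3"},
}


def is_valid_assembly(features: Sequence[str], model: Dict[str, Set[str]]) -> bool:
    """Return True if every adjacent (source, target) pair is allowed."""
    for source, target in zip(features, features[1:]):
        if source not in model.get(target, set()):
            return False
    return True
```

Under this toy model, the ordering start_codon, splice5, splice3, stop_codon is accepted, while start_codon followed directly by splice3 is rejected because start_codon is not listed as a permitted source for splice3.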

Reference

If you use GAZE in your publication, please cite:

  • GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.

    Howe KL, Chothia T and Durbin R

    Genome Research 2002;12;9;1418-27

* quick link - http://q.sanger.ac.uk/23e4kfzj