Introduction
The agp (A Golden Path) file format maps a path through a tiling set of genome sequences based on their accession and sequence version in the public nucleotide databases (EMBL and GenBank).

| Chromosome | Chromosome start | Chromosome stop | Ordinal | Type | EMBL data | EMBL accession start | EMBL accession stop | Strand |
|---|---|---|---|---|---|---|---|---|
| X | 1 | 2649 | 1 | F | AL031272.2 | 1 | 2649 | + |
| X | 2650 | 63549 | 2 | N |   |   |   |   |
| X | 63550 | 93313 | 3 | F | Z83097.1 | 1 | 29764 | + |
The chromosome coordinates are non-overlapping absolute coordinates for each DNA sequence in the tiling path.
The ordinal gives the order of each DNA sequence in the tiling path; it starts at 1 and increments up to the total number of sequences in the path.
The type denotes whether the sequence is finished ('F') or is a padding gap ('N').
The EMBL data is the accession number and sequence version in the EMBL nucleotide database.
The EMBL accession coordinates are relative coordinates within the named EMBL entry.
The strand is the strand alignment for this tile in the path (usually '+' for forward).
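The column descriptions above can be turned into a small parser. The following is a minimal sketch, not the code used by any existing script: it assumes the fields are whitespace-separated in the order shown in the table, and that gap ('N') lines carry no EMBL fields. The names `AgpRow`, `parse_agp_line` and `check_rows` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgpRow:
    chromosome: str
    chrom_start: int
    chrom_stop: int
    ordinal: int
    kind: str                        # 'F' (finished) or 'N' (padding gap)
    accession: Optional[str] = None  # accession.version, e.g. AL031272.2
    acc_start: Optional[int] = None
    acc_stop: Optional[int] = None
    strand: Optional[str] = None


def parse_agp_line(line: str) -> AgpRow:
    """Parse one line, assuming whitespace-separated fields (an assumption)."""
    fields = line.split()
    row = AgpRow(fields[0], int(fields[1]), int(fields[2]),
                 int(fields[3]), fields[4])
    if row.kind == "F":
        # Only finished sequences are assumed to carry EMBL data.
        row.accession = fields[5]
        row.acc_start = int(fields[6])
        row.acc_stop = int(fields[7])
        row.strand = fields[8]
    return row


def check_rows(rows: List[AgpRow]) -> List[str]:
    """Return a list of problems found; an empty list means none."""
    problems = []
    prev_stop = 0
    for expected_ordinal, row in enumerate(rows, start=1):
        if row.ordinal != expected_ordinal:
            problems.append(f"ordinal {row.ordinal}: expected {expected_ordinal}")
        if row.chrom_start <= prev_stop:
            problems.append(f"ordinal {row.ordinal}: overlaps previous row")
        if row.kind == "F":
            chrom_len = row.chrom_stop - row.chrom_start + 1
            acc_len = row.acc_stop - row.acc_start + 1
            # Assumption: both spans describe the same number of bases.
            if chrom_len != acc_len:
                problems.append(f"ordinal {row.ordinal}: span lengths differ")
        prev_stop = row.chrom_stop
    return problems
```

Applied to the three example rows in the table, `check_rows` reports no problems: the ordinals run 1 to 3, the chromosome coordinates do not overlap, and each finished row spans the same number of bases on the chromosome and in the EMBL entry.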
CHROMOSOME_X Genomic_canonical Sequence 1 2649 . + . Sequence "CTEL7X" acc=AL031272 ver=2

CHROMOSOME_X Genomic_canonical Sequence 63550 93313 . + . Sequence "AC8" acc=Z83097 ver=1

make_agp_files.pl reads from this GFF file and constructs the non-overlapping 'golden' tile path, running from base 1 of the first clone 'tile' to the last non-redundant base before the overlap with the second clone. This process continues until the last clone in the tiling path, which is included in its entirety.

Note: this means that the relative coordinates for the sequence extracted from each clone will always begin at position 1 and continue until the last unique base before the overlap with the clone to the right. Compare the GFF file above with the agp file in the file format section.
Figure 1 - How to construct the golden path from the tiling path of genome sequences. Each 'tile' is included in the consensus from base 1 to the last base not overlapping the clone to the right.
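The rule in Figure 1 can be sketched in a few lines. This is an illustration only and is not taken from make_agp_files.pl: the function name `golden_path` and its input format (clone name, clone length, number of bases overlapping the clone to the right) are assumptions, and padding gaps are not handled.

```python
from typing import List, Tuple

# (name, chrom_start, chrom_stop, clone_start, clone_stop)
Tile = Tuple[str, int, int, int, int]


def golden_path(clones: List[Tuple[str, int, int]]) -> List[Tile]:
    """Build a non-overlapping path from (name, length, overlap_with_next).

    Each clone contributes bases 1 .. (length - overlap); the last clone
    is included in full, whatever overlap value it was given.
    """
    path = []
    chrom_pos = 1
    for i, (name, length, overlap) in enumerate(clones):
        is_last = i == len(clones) - 1
        used = length if is_last else length - overlap
        if used < 1:
            raise ValueError(f"{name}: overlap leaves no unique bases")
        path.append((name, chrom_pos, chrom_pos + used - 1, 1, used))
        chrom_pos += used
    return path


# Hypothetical clones, for illustration only.
example = [("cloneA", 1000, 200), ("cloneB", 500, 100), ("cloneC", 300, 0)]
for tile in golden_path(example):
    print(*tile)
```

Every tile's clone coordinates start at 1, and each chromosome start follows directly on from the previous chromosome stop, which is the behaviour described in the note above.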
Text in red indicates either the name of a script/program to run, or commands to be typed within interactive tace sessions. Where script names also exist as hyperlinks, they can be clicked to access the POD documentation for that script.
Text in blue represents comments/warnings that should be checked. Some of these may only be temporary comments and should possibly be removed if no longer valid/relevant.
Text in green refers to file or path names.
