The SGRP Genome Browser

The SGRP Genome Browser allows you to browse alignments, SNPs and indels from shotgun sequencing of multiple strains of S cerevisiae and S paradoxus, together with annotations from the Saccharomyces Genome Database.

Note: This browser is no longer actively maintained and so may at times be temporarily unavailable.

The SGRP browser is based on GBrowse2, the Generic Genome Browser. For help on GBrowse itself, click the "Help" button in the browser. Here, we describe SGRP-specific aspects. You will need to understand what follows to interpret what the browser shows you.

1. Types of display

You can view displays organized around regions of the reference genome, around SNPs, or around particular read pairs.

1.1 Reference genome display

The chromosomes are named chr01 ... chr16, with (in S cerevisiae only) the mitochondrion being chrm and the two-micron circle scplasm1. To see a region of one of these, such as positions 50000 to 70000 of chr01, enter something like chr01:50000..70000 into the "Landmark or Region" box and click "Search".
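If you want to generate region strings for the "Landmark or Region" box programmatically, the following is a minimal sketch. The function names are hypothetical, and the 1-based start is an assumption; only the chr01:50000..70000 format and the chr01 ... chr16 naming come from the text above.

```python
import re

# Pattern for strings such as "chr01:50000..70000" (format taken from the text above).
REGION_RE = re.compile(r"^(?P<seq>[A-Za-z0-9_]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def chromosome_name(number):
    """Return the zero-padded nuclear chromosome name, e.g. 1 -> "chr01"."""
    if not 1 <= number <= 16:
        raise ValueError("nuclear chromosomes are numbered 1 to 16")
    return f"chr{number:02d}"


def format_region(seq_name, start, end):
    """Build a region string; positions are assumed to be 1-based."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{seq_name}:{start}..{end}"


def parse_region(text):
    """Split a region string back into (sequence name, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    return match["seq"], int(match["start"]), int(match["end"])
```

For example, format_region(chromosome_name(1), 50000, 70000) gives the string used in the example above.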

There are several kinds of track which can be switched on and off in the "Tracks" section. Tracks are organized by strain, with additional sets of "General" tracks which apply to all strains for the current species, and "Across strains" tracks which compare S cerevisiae and S paradoxus. The per-strain tracks are:

Alignment tracks
These show how reads from the strain in question are aligned to the region you are looking at. The colour scheme is described below.
Level tracks
These show, at each point in the genome, the coverage level for the strain, i.e. the number of different reads from that strain aligning to that point.
Indel tracks
A deletion in a strain with respect to the reference sequence is indicated by a black rectangle extending over the length of the deleted region, and an insertion by a short black rectangle with an annotation over it such as chr04_lar106, where lar stands for "local alignment region". Clicking on an insertion rectangle will take you to a display of the region in question. Local alignment regions are typically a few kilobases to a few tens of kilobases in length, and have coordinates organized to include inserted sequences from all strains, so that strains (including the reference sequence) missing a particular insertion are shown as having a deletion over that region.
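The coverage level shown in a Level track can be illustrated with a short sketch. This is not the browser's own code. It assumes each alignment is given as a (start, end) pair in 1-based inclusive reference coordinates, and the function name is hypothetical.

```python
def coverage_levels(alignments, region_start, region_end):
    """Count reads covering each position of region_start..region_end (inclusive).

    alignments is an iterable of (start, end) pairs, 1-based and inclusive;
    this coordinate convention is an assumption made for the sketch.
    """
    length = region_end - region_start + 1
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (length + 1)
    for start, end in alignments:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo > hi:
            continue  # read lies entirely outside the region
        delta[lo - region_start] += 1
        delta[hi - region_start + 1] -= 1
    # A running sum of the differences gives the depth at each position.
    levels = []
    depth = 0
    for change in delta[:length]:
        depth += change
        levels.append(depth)
    return levels
```

With reads aligned at 1..4 and 3..6, positions 3 and 4 have a level of 2 and the other positions in 1..6 have a level of 1.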

The "General" tracks are:

  • Tracks for protein-coding genes and transposable elements defined in the SGD (Saccharomyces Genome Database) annotation;
  • A reference DNA sequence track (showing GC content for large regions);
  • A 6-frame translation track for the DNA sequence;
  • SNP Density track. This shows the approximate density of SNPs through each chromosome. The density is defined as the proportion of polymorphic positions, using a Gaussian window with standard deviation 100. Only polymorphisms which were confirmed by imputation are counted. Multiple-nucleotide polymorphisms are only counted once;
  • A SNPs track, showing all SNPs (single nucleotide polymorphisms) and short indels (we use the term "SNPs" to refer to both in this section) that occur in any strain. Most SNPs are shared across multiple strains, so having only one symbol per SNP greatly reduces the volume of information to be displayed.

    Each SNP is displayed with a lightning bolt symbol; you can click on it for full details (described in 1.2 below). Larger symbols are for SNPs with a global effect on protein sequence; smaller ones are for those that only have a local or no effect. Specifically:

    Large, red outline:
    SNP creates a stop codon, truncating the protein.
    Large, green outline:
    SNP destroys a stop codon, extending the protein.
    Large, orange outline:
    "SNP" (in this case an indel) inserts or deletes nucleotides in coding sequence, changing the reading frame.
    Small, blue outline:
    Non-synonymous SNP, changing one amino acid.
    Small, grey outline:
    Synonymous SNP: nucleotide change but no resulting amino acid change.
    Small, brown outline:
    SNP in non-coding sequence.

    Independently of the size and outline colour, the interior of the symbol can have two colours:

    Grey:
    an apparent polymorphism was sequenced, but imputation across all the available strains using Margarita and the Felsenstein algorithm (Carter, Minichiello and Durbin, Genome Informatics, 2006) concluded this was a sequencing error, not a true polymorphism.
    Black:
    imputation confirmed the apparent polymorphism as genuine.

    In the strain-specific tracks, the black/grey distinction means the sequenced polymorphism was or was not confirmed for that strain; in the general SNP track, it means the polymorphism was or was not confirmed for some strain.
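The symbol scheme for the SNPs track can be summarized as a lookup table. In the sketch below the sizes and colours are transcribed from the list above, while the effect keys (such as "stop_gained") and the function name are hypothetical labels chosen for illustration, not identifiers used by the browser.

```python
# Outline styles for SNP symbols: effect label -> (size, outline colour).
# The effect labels are hypothetical; sizes and colours follow the text.
SYMBOL_STYLES = {
    "stop_gained": ("large", "red"),       # SNP creates a stop codon
    "stop_lost": ("large", "green"),       # SNP destroys a stop codon
    "frameshift": ("large", "orange"),     # indel changes the reading frame
    "nonsynonymous": ("small", "blue"),    # one amino acid changed
    "synonymous": ("small", "grey"),       # no amino acid change
    "noncoding": ("small", "brown"),       # SNP in non-coding sequence
}


def symbol_style(effect, confirmed):
    """Describe the lightning-bolt symbol for a SNP.

    confirmed is True when imputation confirmed the polymorphism
    (black interior) and False when it was discounted (grey interior).
    """
    size, outline = SYMBOL_STYLES[effect]
    interior = "black" if confirmed else "grey"
    return {"size": size, "outline": outline, "interior": interior}
```

Keeping the interior colour separate from the table reflects the point made above: it is independent of size and outline colour.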

The "across strains" tracks show alignments (determined by ssahaSNP) between different regions of the same species and between the two species. The percentage identity of each match is shown in its annotation, and the bar representing the match has a colour ranging from yellow (low-quality matches) to red (near-identity). It is possible to click through to matches within the same species, but not yet between species.

1.2 SNP tables

Clicking on any of the lightning-bolt symbols in the SNP tracks will take you to a table giving some details of the SNP in question. Each table covers all strains for which the SNP was sequenced and/or imputed, whether you click to it from the all-strain SNP track or from a strain-specific one.
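The Name row of this table follows a simple rule: take the chromosome or contig name, delete any leading "chr" or "c", shorten "scplasm1" to "pl1", then append a hyphen and the 1-based offset. A minimal sketch of that rule follows; the function name is hypothetical, and the contig name "c17" in the usage note is an invented example, not a real SGRP contig.

```python
def snp_name(seq_name, position):
    """Build a SNP name from a sequence name and a 1-based offset."""
    if position < 1:
        raise ValueError("offsets are counted from 1")
    if seq_name == "scplasm1":
        prefix = "pl1"            # the two-micron circle is shortened
    elif seq_name.startswith("chr"):
        prefix = seq_name[3:]     # "chr04" -> "04", "chrm" -> "m"
    elif seq_name.startswith("c"):
        prefix = seq_name[1:]     # contig names lose their leading "c"
    else:
        prefix = seq_name         # assumption: other names are left as-is
    return f"{prefix}-{position}"
```

For example, snp_name("chr04", 1234) gives "04-1234", and a hypothetical contig "c17" at offset 8 would give "17-8".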

The first few rows of the table are:

Name:
The SNP name is the chromosome or contig name (deleting any leading "chr" or "c", and shortening "scplasm1" to "pl1") followed by a hyphen and the offset (from 1) of the position of the SNP in the chromosome or contig reference sequence.
Class, Type, Source, Position, Length, Score:
not of interest.
Effect:
corresponds to the SNP outline colour in the graphical display (see above).
MAJ:
major allele nucleotide value at this position
Quality:
"Confirmed" means a polymorphism was sequenced for at least one strain and confirmed by imputation; "Discounted" means it was sequenced but imputation removed it.

Clicking on "Context" will take you to a row of a table giving per-strain details of the SNP in question, with neighbouring rows showing nearby SNPs, and each column representing a strain. The strain names are displayed vertically at the top of the file and at various positions below it. Within a row, the cells are:

  • Offset (in reference sequence) of nucleotide. Positions within an insertion are shown with a decimal point, e.g. 1234.567 is position 567 in an insertion after reference position 1234.
  • Consensus value: generalized nucleotide value, with lower case indicating that some strains do not have this position.
  • One cell per strain. Each cell contains four characters in a two-by-two arrangement. The left column is for sequenced values, and the right is for imputed (confirmed or corrected from other strains). The top row shows a nucleotide value, gap ("-" or "="), or, for sequenced values, a dot indicating nothing was sequenced there for the strain in question. The bottom row shows the tens digit (sequenced on the left, imputed on the right) of the PHRED quality score for the value above it. A PHRED score of N implies an error probability of 0.1 to the power N/10, so N=35 means an error probability of about 1 in 3,000, and would be represented by the digit 3 in the display; thus a bottom row reading "23" means the sequenced value has a quality between 20 and 29 inclusive, while the imputed value has a quality between 30 and 39 inclusive. The colour of the cell indicates the effect of any change from the reference. White is for no change from the reference, while the effect of variations is shown with the following colours:
     
    Intergenic SNP
    synonymous SNP
    nonsynonymous SNP
    indel
    change to/from stop codon
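The offset and quality conventions used in the per-strain table can be worked through in a few lines. This is only a sketch with hypothetical function names. It assumes scores below 100 and a bottom row made up of digits only; how the browser displays higher scores or missing values is not stated here.

```python
def phred_error_probability(score):
    """Error probability implied by a PHRED score N: 0.1 ** (N / 10)."""
    return 0.1 ** (score / 10)


def quality_digit(score):
    """Tens digit shown in the bottom row of a cell (assumes 0 <= score < 100)."""
    return score // 10


def digit_range(digit_char):
    """Inclusive range of PHRED scores represented by one displayed digit."""
    tens = int(digit_char)
    return 10 * tens, 10 * tens + 9


def parse_offset(text):
    """Split an offset such as "1234.567" into (reference position, insertion position).

    The offset is handled as a string, not a float, so that the digits
    after the decimal point are read as an integer position.
    """
    ref, _, inserted = text.partition(".")
    return int(ref), (int(inserted) if inserted else None)
```

For N=35 the error probability is about 1 in 3,162, and the displayed digit is 3. A bottom row of "23" decodes to the ranges 20 to 29 (sequenced) and 30 to 39 (imputed).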

2. The alignment colour scheme

The colour scheme for alignments is as follows.

2.1 SGD annotations

 
Protein-coding genes
Transposable elements

Both these elements are taken directly from the SGD annotation.

2.2 Alignments to displayed position

 
Dark brown:
Consistent read pair
Blue:
Inconsistent (often chimeric) insert
Light brown:
Unpaired read

These types of alignment are shown as thick bars in the reference display. They differ according to the alignment of the read mate. Dark brown means that this read and its mate are aligned consistently, i.e. they are a few kilobases apart and are oriented correctly. Blue means that they are aligned inconsistently, often to different chromosomes altogether. This usually indicates a chimeric insert, but may also be evidence for a rearrangement between the reference sequence and the current strain. Light brown means that the mate for this read either was not sequenced at all, or did not align anywhere (the most common reason for which is poor sequence quality).

When both members of a consistent (dark brown) pair of alignments are visible, they are shown joined by a dotted line.
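The colour rules above amount to a small decision procedure, sketched below. The Alignment record, the function names and the 10,000-base cut-off are all assumptions made for illustration: the text says only "a few kilobases apart" and "oriented correctly". Here correct orientation is taken to mean that the leftmost read is on the forward strand and its mate on the reverse strand.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PAIR_SPAN = 10_000  # assumed cut-off; the text only says "a few kilobases"


@dataclass
class Alignment:
    chrom: str
    start: int
    end: int
    forward: bool  # True if aligned to the forward strand


def is_consistent(read, mate, max_span=MAX_PAIR_SPAN):
    """Same chromosome, facing each other, and within the assumed span."""
    if read.chrom != mate.chrom:
        return False
    left, right = sorted((read, mate), key=lambda a: a.start)
    facing = left.forward and not right.forward
    span = right.end - left.start + 1
    return facing and span <= max_span


def alignment_colour(read, mate: Optional[Alignment]):
    """Return the bar colour for a read, given its mate's alignment (or None)."""
    if mate is None:
        return "light brown"  # mate not sequenced, or did not align anywhere
    return "dark brown" if is_consistent(read, mate) else "blue"
```

A read whose mate aligns to a different chromosome is coloured blue under this sketch, matching the chimeric-insert case described above.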

AUTHOR

David Carter <dmc@sanger.ac.uk>; last updated September 13th, 2008
