T. brucei GSS Sequence Data

In addition to sequencing the megabase chromosomes of the T. brucei genome, the Wellcome Trust Sanger Institute and TIGR have both carried out extensive genome survey sequencing (GSS).

TIGR has provided 47,000 single-pass reads of randomly selected clones. These derive from the ends of P1 and BAC genomic clones and from genomic DNA clones selected from a whole-genome sheared DNA library of T. brucei TREU927 GUTat 10.1 constructed by TIGR (average insert size 2-3 kb). These reads have proved an immensely useful resource to the research community for gene discovery. The end sequences of the P1 and BAC clones have also been used in physical mapping.

The Sanger Institute has in turn submitted more than 43,000 GSS sequences from the 2-kb sheared genomic DNA clones constructed by TIGR. These end sequences have since been clustered with ESTs available through public databases, and some preliminary automated analysis has been carried out. The sequences can be obtained from ftp.sanger.ac.uk/pub/databases/T.brucei_sequences/GSS/.

As an aid to the community, all GSS sequences were subjected to a BLASTX analysis against the Swiss-Prot/TrEMBL databases in February 2002. The summary data are shown below:

Applying a probability cut-off of 1e-10 to the BLAST output:

  • 8196 had a hit (~21 percent)

    of which, according to their description lines:

    The following now have html-linked sequences

  • 1095 were probably INGI-related (ORF 1, 2)
  • 441 were adenylate cyclases
  • 77 were described as ESAG
  • 632 were VSGs
  • 112 were ribosomal proteins
  • 66 were helicases
  • 1454 showed similarity to hypothetical proteins
  • 4170 did not fall into the above "classes"
  • 2025 had no hits at all.
  • species-by-species tally of top BLASTX hits
    (Note: T. brucei brucei and T. brucei are treated as separate items)
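
The tally above combines two steps: keep only hits at or below the 1e-10 cut-off, then assign each hit to a class from its description line. The sketch below illustrates that idea only; it is not the script used for the original analysis. The keyword rules, the `(query_id, evalue, description)` tuple layout and the function names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical keyword rules, loosely modelled on the classes listed above.
# The first matching rule wins, so the order of the rules matters.
RULES = [
    ("INGI-related", ("ingi",)),
    ("adenylate cyclase", ("adenylate cyclase", "adenylyl cyclase")),
    ("ESAG", ("esag", "expression site-associated")),
    ("VSG", ("vsg", "variant surface glycoprotein")),
    ("ribosomal protein", ("ribosomal protein",)),
    ("helicase", ("helicase",)),
    ("hypothetical protein", ("hypothetical",)),
]


def classify(description):
    """Return the first class whose keywords occur in the description line."""
    text = description.lower()
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "other"


def summarise(hits, all_queries, cutoff=1e-10):
    """Count classes over the top hit of each query that passes the cut-off.

    hits: iterable of (query_id, evalue, description), best hit first per query.
    all_queries: every query ID that was searched.
    """
    counts = Counter()
    seen = set()
    for query_id, evalue, description in hits:
        if evalue <= cutoff and query_id not in seen:
            seen.add(query_id)
            counts[classify(description)] += 1
    # Queries with no hit at or below the cut-off.
    counts["no significant hit"] = len(set(all_queries) - seen)
    return counts


# Made-up records, for illustration only.
example_hits = [
    ("g1", 1e-30, "Variant surface glycoprotein MITat 1.2"),
    ("g2", 1e-5, "DNA helicase"),
    ("g3", 1e-12, "Hypothetical protein"),
    ("g4", 0.0, "Tubulin beta chain"),
]
example_counts = summarise(example_hits, ["g1", "g2", "g3", "g4", "g5"])
```

Matching on description lines is crude: a hit described as both an ESAG and an adenylate cyclase lands in whichever rule comes first, which is one reason the published classes should be read as approximate.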

Each of these datasets is available, either by clicking on the above links or from the GSS ftp site. The entire set of Sanger GSS is also available as a FASTA database.
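
For readers who download the FASTA database, a minimal reader might look like the sketch below. It assumes the standard FASTA layout (a `>` header line followed by one or more sequence lines); the function name `read_fasta` and the example record names are hypothetical and do not come from the Sanger files.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-record example; real files would be opened with open(path).
example = io.StringIO(">gss_0001 example read\nACGTAC\nGTTA\n>gss_0002\nTTGACA\n")
records = list(read_fasta(example))
```

Because the function is a generator, it can walk a file of tens of thousands of reads without holding them all in memory.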

GSS and EST clustering

All T. brucei genome survey sequences plus approximately 5,500 EST/mRNA sequences were clustered using the sequence assembly programme phrap. The ESTs were retrieved from EMBL in February 2001 by searching with Trypanosoma brucei as the organism name. The set therefore includes EST data generated from different Trypanosoma brucei subspecies and strains. The dataset totalled 96,474 sequences (~45.87 Mb). Clustering generated 12,251 contigs, while 8,242 sequences could not be placed in a contig (singletons). The GSS/EST clusters have an estimated coverage of >95% of the T. brucei genome. They are accessible for similarity searching and a summary of top BLAST hits can be viewed here.
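
A few derived figures follow directly from the clustering totals quoted above. The short calculation below is a sketch that uses only those totals; it assumes that "Mb" means 10^6 bases, and the variable names are chosen for the example.

```python
TOTAL_SEQUENCES = 96_474   # GSS plus EST/mRNA sequences in the dataset
TOTAL_BASES = 45.87e6      # ~45.87 Mb, assuming 1 Mb = 10**6 bases
CONTIGS = 12_251
SINGLETONS = 8_242

# Mean length of an input sequence, in bases.
mean_read_length = TOTAL_BASES / TOTAL_SEQUENCES

# Sequences that were placed in a contig.
in_contigs = TOTAL_SEQUENCES - SINGLETONS

# Mean number of sequences per contig.
mean_per_contig = in_contigs / CONTIGS

# Fraction of the dataset left as singletons.
singleton_fraction = SINGLETONS / TOTAL_SEQUENCES
```

On these numbers the mean input sequence is roughly 475 bases long, 88,232 sequences fall into contigs (about seven per contig on average), and under 9% of the dataset remains as singletons.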


webmaster@sanger.ac.uk

Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

Last Modified Wed Aug 10 13:37:25 2005

Genome Research Limited is a charity registered in England with number 1021457
