The Wellcome Trust Sanger Institute Pathogen Genomics group is sequencing the genome of the fungal pathogen Candida dubliniensis in collaboration with Dr. Derek Sullivan, Dr. Gary Moran and Prof. David Coleman of the School of Dental Science, Trinity College, Dublin and Prof. Neil Gow of the Department of Molecular and Cell Biology, University of Aberdeen.
The Disease
The genus Candida contains a range of clinically important yeast-like fungi including the important human pathogen Candida albicans.
Diseases caused by Candida species include superficial infections of the oral cavity and vagina and deep-seated systemic infections, which have a mortality rate of 35-50%. In the majority of cases, these infections occur in immunocompromised individuals. In addition to C. albicans, a number of other species belonging to the genus Candida have increasingly been identified as important human pathogens. One of the most interesting of these species is C. dubliniensis.
The Organism
C. dubliniensis was first identified and described in 1995. Before then, C. dubliniensis isolates were misidentified as C. albicans because they produce germ tubes and chlamydospores, traits previously used for the definitive identification of C. albicans. The species is particularly associated with oral candidosis in HIV-infected individuals.
The C. dubliniensis genome is approximately 16 Mb in size.
The Project
The phylogenetic distance between the pathogenic Candida species does not correlate with their hierarchy of pathogenicity. For example, C. dubliniensis, by far the closest relative of C. albicans, is ranked only fifth or sixth in terms of serious invasive disease. In addition, comparison of the C. albicans and C. dubliniensis genomes by cross-hybridization on microarrays suggests that the two species share the majority of their genes but that there is a small but substantial set of highly diverged genes. These diverged genes seem to be enriched in genes related to yeast-hypha morphogenesis, a putative virulence factor of the group. Given these observations, C. dubliniensis is a sensible choice as the subject of a comparative genome sequencing project.
The C. albicans genome sequence has been completed (at Stanford Genome Technology Center) and is being annotated by an international consortium, involving the Wellcome Trust Sanger Institute in an advisory capacity. Virulence is a polygenic trait in Candida species; comparative genomics is therefore likely to play a major role in understanding virulence in this group as a whole. The genome of Candida glabrata is currently being sequenced by the Institut Pasteur (Genopole). However, C. albicans is as closely related to C. glabrata as it is to Saccharomyces cerevisiae. Sequencing the C. dubliniensis genome will therefore enable an informative comparison across the genomes from C. albicans through C. glabrata to S. cerevisiae. Comparison of these four genomes will provide invaluable information concerning the evolution of pathogenesis in yeasts as a whole.
More specifically, comparison of C. albicans and C. dubliniensis will enable direct questions to be addressed concerning the following:
- The virulence of C. albicans and C. dubliniensis.
- Drug resistance mechanisms in both species.
- The evolution of Candida species.
- Mechanisms of adhesion to human cell surfaces.
- Chromosomal stability in Candida species.
The strain being sequenced is the C. dubliniensis type strain CD36, which is the most intensively studied strain of the species and therefore the most appropriate for a sequencing project.
The shotgun was taken to 8x coverage, meaning that around 240,000 attempted sequence reads were required. The sequence was taken to finished quality and was fully annotated.
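The relationship between coverage, genome size and read count can be checked with a short calculation. The sketch below is illustrative only: it assumes the approximate 16 Mb genome size quoted above, treats every attempted read as contributing equally, and uses hypothetical function names that are not part of any project pipeline. Under those assumptions, 8x coverage from 240,000 reads implies an effective length of roughly 530 bases per read.

```python
import math


def implied_read_length(genome_size_bp, coverage, n_reads):
    """Average bases per read implied by a target coverage and read count."""
    return genome_size_bp * coverage / n_reads


def reads_required(genome_size_bp, coverage, mean_read_length_bp):
    """Number of reads needed to reach the target coverage (rounded up)."""
    return math.ceil(genome_size_bp * coverage / mean_read_length_bp)


# Figures taken from the text; the genome size is approximate.
GENOME_SIZE_BP = 16_000_000
COVERAGE = 8
ATTEMPTED_READS = 240_000

# Assumption: all attempted reads count equally towards coverage.
mean_length = implied_read_length(GENOME_SIZE_BP, COVERAGE, ATTEMPTED_READS)
print(f"Implied effective read length: {mean_length:.0f} bp")

# Going the other way: reads needed at that effective read length.
print(f"Reads required: {reads_required(GENOME_SIZE_BP, COVERAGE, mean_length)}")
```

Because the quoted figure counts attempted reads, some of which would have failed, the length computed here is an effective average across all attempts rather than the length of a typical successful read.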
Progress
The C. dubliniensis project is finished and has been published: Jackson et al. (2009), Comparative genomics of the fungal pathogens Candida dubliniensis and Candida albicans, Genome Res., epublication ahead of print.
There are currently 8 chromosomal contigs larger than 100 kb, containing 262,288 reads, with a total length of 14.6 Mb. These contigs represent the majority of a haploid assembly. Work is continuing to finalise the assembly and resolve the ribosomal gene repeats.
The sequence data are available via GenBank/EMBL/DDBJ with accession numbers FM992688 to FM992695, as well as via our FTP site. They are also available for searching via our BLAST server.



