Zinc fingers in Caenorhabditis elegans: Finding Families and Probing Pathways

Neil D. Clarke and Jeremy M. Berg

This site is an electronic appendix to a paper that appeared in Science on December 11, 1998, as part of the special issue on the genome sequence of C. elegans.

The sequence data and genome annotation used in this analysis were current as of June 25, 1998. Links to the data sets actually used are provided below, in case you need to refer to the original data.

Searches of the DNA sequence for matches to known binding sites were performed using the hidden Markov model (HMM) program HMMER v. 2.0. The HMMs used in these searches were constructed as described in the paper, and links to them are provided below.
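HMMER reports match scores in bits: the base-2 log-odds of the sequence under the model relative to a background (null) model. As a rough illustration of what a binding-site scan does, the sketch below slides a position-specific log-odds matrix along a DNA sequence and its reverse complement and reports windows above a bit threshold. This is a deliberate simplification, not the paper's method: a position weight matrix has no insert or delete states, so it is not a profile HMM. The GATA-like matrix, the uniform background, the 6-bit threshold, and the names `column`, `log_odds_matrix`, and `scan` are all hypothetical choices for illustration.

```python
import math

BASES = "ACGT"

def column(consensus_base, p=0.85):
    """Hypothetical matrix column: probability p for the consensus base,
    the remainder split evenly among the other three bases."""
    other = (1.0 - p) / 3
    return {b: (p if b == consensus_base else other) for b in BASES}

def log_odds_matrix(freqs, background=0.25):
    """Convert per-position base probabilities to log2-odds scores (bits)
    against a uniform background. Probabilities must be nonzero."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(seq, matrix, threshold):
    """Score every window on both strands and return (strand, start, bits)
    for windows scoring >= threshold. start is 0-based in forward-strand
    coordinates. seq must contain only A, C, G, T (others raise KeyError)."""
    width = len(matrix)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            bits = sum(matrix[j][s[i + j]] for j in range(width))
            if bits >= threshold:
                # A reverse-strand window at i covers forward positions
                # len(seq) - i - width .. len(seq) - i - 1.
                start = i if strand == "+" else len(seq) - i - width
                hits.append((strand, start, round(bits, 2)))
    return hits

# Toy GATA-like matrix (hypothetical, not one of the paper's HMMs).
gata = log_odds_matrix([column(b) for b in "GATA"])
print(scan("TTGATAAC", gata, threshold=6.0))  # [('+', 2, 7.06)]
```

The paper keeps a separate HMM for each strand (see the dnaHMM directory). For a position weight matrix, scoring the reverse complement of the sequence with one matrix, as here, is equivalent to scoring the forward strand with a second, reverse-complemented matrix.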

Searches of the inferred protein sequences were performed using an older, now unsupported version of HMMER (v. 1.8.4), which uses a different file format for HMMs. Most of the protein HMMs came from the Pfam database; you are probably better off getting newer versions of these HMMs from Pfam, since those work with HMMER v. 2.0. If you need the protein HMMs that we constructed ourselves (for the DM and Nup-C3H motifs), the old-format files are in the protHMM directory listed below.

All of the links below point to an anonymous FTP site rather than to HTML documents. If you prefer to go to the FTP site directly, it is at ftp.sanger.ac.uk; everything is under /pub/C.elegans_sequences/SCIENCE98/clarke_and_berg. The directory structure and contents are as follows.
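Given the base path above, a file's location is just the base path plus its directory and file name. The minimal sketch below builds those URLs and fetches one file with Python's standard `urllib.request.urlretrieve`, which logs in anonymously for `ftp://` URLs that carry no user name. It assumes the server still serves these files at the path given on this page; the helper names `ftp_url` and `fetch` are hypothetical.

```python
from urllib.request import urlretrieve

# Base path given on this page; assumes the server still serves it.
BASE = "ftp://ftp.sanger.ac.uk/pub/C.elegans_sequences/SCIENCE98/clarke_and_berg"

def ftp_url(*parts):
    """Join directory and file names under BASE."""
    return "/".join([BASE] + [p.strip("/") for p in parts])

def fetch(*parts, dest=None):
    """Download one file over anonymous FTP; returns the local file name."""
    local = dest or parts[-1]
    urlretrieve(ftp_url(*parts), local)
    return local

print(ftp_url("data", "allproteins.pep"))
# fetch("data", "allproteins.pep")  # requires network access
```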

If you have any questions, contact Neil Clarke.

  • data: directory with all the raw data used; all data were obtained from the Washington University Genome Sequencing Center.
    • Ce_dna.tar.Z: compressed set of DNA sequence files
    • Ce_feature.tar.Z: compressed set of feature files (putative exons, introns, etc.)
    • allproteins.pep: 16,626 ORFs; apparently a pre-release of wormpep14
    • header_lines.wp14: annotated names for the ORFs in allproteins.pep
  • protHMM: directory with ASCII versions of the old-format (v. 1.8.4) HMMs for the DM and Nup-C3H motifs; see Pfam for other protein HMMs
  • dnaHMM: each subdirectory contains a pair of HMMs for one DNA binding site, one for the site as conventionally defined and one for the site as it appears on the opposite strand
    • TRA-1
    • MAB-3
    • GATA
  • ORFhits: subdirectories are named according to the names used in the paper for the different zinc-binding motifs. Each subdirectory contains two files: (i) <motif>.hmm.results, a BLAST-style output of the HMMER search of allproteins.pep showing all hits above 0 bits, and (ii) <motif>.summary, a list of all genes that have at least one domain scoring higher than 10 bits, each followed by its number of hits and a list of the spacings between successive hits. Note that in a few cases an ORF has two partial matches to two different parts of the HMM, implying two domains where there is really only one.
    • Zinc finger
    • Hormone receptor
    • GATA
    • LIM
    • DM
    • Zinc cluster
    • C3H
    • RING finger
    • Nucleocapsid
  • bindsite: results of the DNA binding site searches; see the README file in the 'bindsite' directory for a description of the files in its subdirectories
    • TRA-1
    • MAB-3
    • GATA
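The <motif>.summary files described under ORFhits condense the HMMER hit list for each gene. The sketch below shows one way to derive such a summary from per-domain hits. It is not the authors' script, and several details are assumptions: the input layout `(gene, start, end, bits)` with 1-based inclusive coordinates, the function name `summarize`, counting every reported hit (above 0 bits) in the hit total, and defining a spacing as the number of residues between the end of one hit and the start of the next. The ORF names and scores in the example are hypothetical.

```python
from collections import defaultdict

def summarize(hits, report_threshold=10.0):
    """Build per-gene summaries from HMMER-style domain hits.

    hits: iterable of (gene, start, end, bits), 1-based inclusive coordinates.
    Returns {gene: (n_hits, [spacings])} for every gene with at least one
    domain scoring higher than report_threshold bits.
    """
    by_gene = defaultdict(list)
    for gene, start, end, bits in hits:
        if bits > 0:  # only hits above 0 bits are reported
            by_gene[gene].append((start, end, bits))
    summary = {}
    for gene, domains in by_gene.items():
        if max(b for _, _, b in domains) <= report_threshold:
            continue
        domains.sort()  # order hits by start position along the ORF
        # Residues between the end of one hit and the start of the next.
        spacings = [nxt[0] - prev[1] - 1 for prev, nxt in zip(domains, domains[1:])]
        summary[gene] = (len(domains), spacings)
    return summary

hits = [
    ("orfA", 10, 32, 25.1),  # hypothetical ORF names and scores
    ("orfA", 38, 60, 22.4),
    ("orfA", 66, 88, 8.0),
    ("orfB", 5, 27, 6.5),    # no domain above 10 bits, so omitted
]
print(summarize(hits))  # {'orfA': (3, [5, 5])}
```

Because the count is taken directly from the hit list, an ORF whose single domain matches as two partial hits (the caveat noted for the real files) would also be counted twice by this sketch.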

    webmaster@sanger.ac.uk

    Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

    Last Modified Thu Nov 1 16:20:07 2001
