Zinc fingers in Caenorhabditis elegans: Finding Families and Probing Pathways
Neil D. Clarke and Jeremy M. Berg
This site is an electronic appendix to a paper that appeared in Science on December 11, 1998 as part of the special issue on the genome sequence of C. elegans.
The sequence data and genome annotation used in this analysis were current as of June 25, 1998. Links to the data sets actually used are provided below, in case reference to the original data is useful.
Searches of the DNA sequence for matches to known binding sites were performed using the Hidden Markov Model (HMM) program HMMER v. 2.0. The HMMs used in these searches were constructed as described in the paper, and links to these HMMs are provided below.
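HMMER itself is not reproduced here. As a rough illustration of what a binding-site search computes, the sketch below scores every window of a DNA sequence in bits (log2 odds against a uniform background) and reports windows above a threshold on both strands. It assumes an ungapped site model: a profile HMM with no insert or delete states scores a fixed-width window essentially like this position-specific log-odds matrix. The model `GAT_MODEL`, the helper names, and the threshold are hypothetical and are not the HMMs used in the paper. The sketch scans the reverse complement with a single model, which is equivalent to scanning the forward sequence with a second, opposite-strand model; that is the role of the paired HMMs in the 'dnaHMM' directory.

```python
import math

# Uniform background base frequencies (an assumption for this sketch).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3' (uppercase ACGT expected)."""
    return seq.translate(COMPLEMENT)[::-1]


def window_score(window: str, model: list[dict[str, float]]) -> float:
    """Bits: sum over positions of log2(P(base | model) / P(base | background))."""
    return sum(math.log2(col[base] / BACKGROUND[base]) for base, col in zip(window, model))


def scan(seq: str, model: list[dict[str, float]], threshold: float) -> list[tuple[int, str, float]]:
    """Return (start, strand, bits) for each window scoring at least `threshold` bits.

    Minus-strand hits are found on the reverse complement, and their start is
    converted back to a forward-strand coordinate.
    """
    width = len(model)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if not set(window) <= set("ACGT"):
                continue  # skip windows containing N or other ambiguity codes
            bits = window_score(window, model)
            if bits >= threshold:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, bits))
    return hits


def column(preferred: str) -> dict[str, float]:
    """One hypothetical model position: 0.7 for the preferred base, 0.1 otherwise.

    Every probability is nonzero, so log2 is always defined.
    """
    return {b: (0.7 if b == preferred else 0.1) for b in "ACGT"}


# Hypothetical 3-position site model favouring "GAT".
GAT_MODEL = [column("G"), column("A"), column("T")]
```

For example, `scan("CCGATCC", GAT_MODEL, 4.0)` reports one hit starting at position 2 on the "+" strand (GAT) and one starting at position 3 on the "-" strand (ATC, whose opposite strand reads GAT), each scoring about 4.46 bits.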
Searches of the inferred protein sequences were performed using an older, now unsupported version of HMMER (v. 1.8.4), which uses a different format for its HMMs. Most of the protein sequence HMMs came from the Pfam database; you are probably better off getting newer versions of these HMMs from Pfam, since those work with HMMER v. 2.0. If you need the protein sequence HMMs that we constructed ourselves (for the DM and Nup-C3H motifs), the old-format versions are in the directory listed below.
All of the links below point to an anonymous ftp site rather than to HTML documents. If you prefer to go to the ftp site directly, it is at ftp.sanger.ac.uk; everything is under /pub/C.elegans_sequences/SCIENCE98/clarke_and_berg. The directory structure and contents are as indicated below.
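If you would rather script the download, Python's standard `ftplib` module can log in anonymously. This is only a sketch: the host and directory are the ones given above, the site may no longer be reachable at that address, and the helper names `remote_path`, `list_remote`, and `download` are our own.

```python
import ftplib
import posixpath

# Host and directory as given on this page; the server may no longer serve them.
FTP_HOST = "ftp.sanger.ac.uk"
REMOTE_DIR = "/pub/C.elegans_sequences/SCIENCE98/clarke_and_berg"


def remote_path(*parts: str) -> str:
    """Join path components below REMOTE_DIR using '/' separators."""
    return posixpath.join(REMOTE_DIR, *parts)


def list_remote(subdir: str = "") -> list[str]:
    """Log in anonymously and return the names listed in REMOTE_DIR/subdir."""
    with ftplib.FTP(FTP_HOST) as ftp:
        ftp.login()  # no arguments: anonymous login
        ftp.cwd(remote_path(subdir) if subdir else REMOTE_DIR)
        return ftp.nlst()


def download(remote_file: str, local_file: str) -> None:
    """Fetch one file (path relative to REMOTE_DIR) in binary mode."""
    with ftplib.FTP(FTP_HOST) as ftp:
        ftp.login()
        with open(local_file, "wb") as out:
            ftp.retrbinary("RETR " + remote_path(remote_file), out.write)
```

For example, `list_remote()` returns the top-level names; pass a path relative to clarke_and_berg, as it appears in such a listing, to `download`.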
If you have any questions, contact Neil Clarke.

| Directory contents |
|---|
| directory with all the raw data used; all data was obtained from the Washington University Genome Sequencing Center |
| compressed set of DNA sequence files |
| compressed set of feature files (putative exons, introns, etc.) |
| 16,626 ORFs; apparently a pre-release of wormpep14? |
| annotated names for ORFs in allproteins.pep |
| directory with ASCII versions of the old-format (v. 1.8.4) HMMs for the DM and Nup-C3H motifs; see Pfam for other protein HMMs |
| subdirectories of 'dnaHMM' contain pairs of HMMs for the DNA binding sites: one for the site as conventionally defined and one for the site as it appears on the opposite strand |
| subdirectories are named according to the names used in the paper for the different zinc binding motifs. Each subdirectory contains two files: (i) `<motif>.hmm.results`, a BLAST-style output of the HMMER search of allproteins.pep showing all hits above 0 bits, and (ii) `<motif>.summary`, a list of all genes that have at least one domain scoring higher than 10 bits, each followed by the number of hits and then by a list of the spacings between consecutive hits. Please note that in a few cases an ORF has partial matches to two different parts of the HMM, implying two domains where there is really only one. |
| results of DNA binding site searches; see the README file in the 'bindsite' directory to understand the contents of the files in the 'bindsite' subdirectories |
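The `<motif>.summary` files are plain text. The sketch below parses them assuming one whitespace-separated record per line (gene name, number of hits, then the spacings as integers); that layout is our reading of the description above, not a documented format, so check it against an actual file first. The consistency check that a gene with n hits has n - 1 spacings is likewise an assumption, and the class and function names are hypothetical. Keep in mind that hit counts can be inflated by the split-domain case noted above.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryEntry:
    gene: str                  # ORF name as listed in the summary file
    n_hits: int                # number of hits reported for this gene
    spacings: list[int] = field(default_factory=list)  # gaps between consecutive hits

    def looks_consistent(self) -> bool:
        # Assumption: n hits should come with n - 1 spacings.
        return len(self.spacings) == max(self.n_hits - 1, 0)


def parse_summary(text: str) -> list[SummaryEntry]:
    """Parse <motif>.summary text, assuming 'gene n_hits spacing1 spacing2 ...' per line.

    Raises ValueError if a count or spacing field is not an integer.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        gene, n_hits, *rest = fields
        entries.append(SummaryEntry(gene, int(n_hits), [int(x) for x in rest]))
    return entries


def multi_domain_genes(entries: list[SummaryEntry], min_hits: int = 2) -> list[str]:
    """Names of genes reported with at least `min_hits` hits."""
    return [e.gene for e in entries if e.n_hits >= min_hits]
```

For a hypothetical input `"geneA 3 28 30\ngeneB 1\n"`, `parse_summary` yields two entries, and `multi_domain_genes` returns only `geneA`.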



