Sanger Institute - Publications 1994

Number of papers published in 1994: 2

  • A workbench for large-scale sequence homology analysis.

    Sonnhammer EL and Durbin R

    Sanger Centre, Hinxton Hall, Cambridge, UK.

    When routinely analysing very long stretches of DNA sequences produced by genome sequencing projects, detailed analysis of database search results becomes exceedingly time consuming. To reduce the tedious browsing of large quantities of protein similarities, two programs, MSPcrunch and Blixem, were developed, which assist in processing the results from the database search programs in the BLAST suite. MSPcrunch removes biased composition and redundant matches while keeping weak matches that are consistent with a larger gapped alignment. This makes BLAST searching in practice more sensitive and reduces the risk of overlooking distant similarities. Blixem is a multiple sequence alignment viewer for X-windows which makes it significantly easier to scan and evaluate the matches ratified by MSPcrunch. In Blixem, matches to the translated DNA query sequence are simultaneously aligned in three frames. Also, the distribution of matches over the whole DNA query is displayed. Examples of usage are drawn from 36 C. elegans cosmid clones totalling 1.2 megabases, to which these tools were applied.

    Funded by: Wellcome Trust

    Computer applications in the biosciences : CABIOS 1994;10;3;301-7

  • 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans.

    Wilson R, Ainscough R, Anderson K, Baynes C, Berks M, Bonfield J, Burton J, Connell M, Copsey T, Cooper J et al.

    Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110.

    As part of our effort to sequence the 100-megabase (Mb) genome of the nematode Caenorhabditis elegans, we have completed the nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III. Analysis of the finished sequence has indicated an average density of about one gene per five kilobases; comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three. In addition, the genomic sequence contains several intriguing features, including putative gene duplications and a variety of other repeats with potential evolutionary implications.

    Nature 1994;368;6466;32-8