NestedMICA

A new motif-finder by Thomas Down and Tim Hubbard

NestedMICA is a method for discovering over-represented short motifs in large sets of strings. Typical applications include finding candidate transcription factor binding sites in DNA sequences.


Information

NestedMICA works by optimizing a probabilistic model that treats the input data as a mixture of interesting motifs and background sequence. It uses a new and robust inference technique called nested sampling, together with a novel mosaic background model, to achieve extremely high sensitivity. More information about the algorithm, and some quantitative performance comparisons, are given in the papers:

  • NestedMICA as an ab initio protein motif discovery tool.

    Doğruel M, Down TA and Hubbard TJ

    BMC bioinformatics 2008;9;19

  • NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence.

    Down TA and Hubbard TJ

    Nucleic acids research 2005;33;5;1445-53
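
The mixture idea described above can be illustrated with a small sketch. This is not NestedMICA's implementation: it assumes a simple order-0 background instead of the mosaic background model, allows at most one motif occurrence per sequence, and evaluates the likelihood directly rather than using nested sampling. All function and parameter names here are hypothetical.

```python
import math

def log_background(seq, bg):
    """Log-probability of seq under an order-0 (mononucleotide) background."""
    return sum(math.log(bg[base]) for base in seq)

def log_motif(window, pwm):
    """Log-probability of a window under a weight matrix (one dict per column)."""
    return sum(math.log(col[base]) for col, base in zip(pwm, window))

def log_mixture_likelihood(seq, pwm, bg, p_motif):
    """Log-likelihood of seq under a two-component mixture.

    With probability p_motif the sequence holds one motif occurrence at a
    uniformly chosen position; otherwise it is pure background.
    """
    width = len(pwm)
    n_positions = len(seq) - width + 1
    log_bg = log_background(seq, bg)
    if n_positions <= 0:
        # Sequence is too short to hold the motif: background only.
        return log_bg
    terms = [math.log(1.0 - p_motif) + log_bg]
    for start in range(n_positions):
        window = seq[start:start + width]
        # Swap the window's background contribution for the motif's.
        log_with_motif = log_bg - log_background(window, bg) + log_motif(window, pwm)
        terms.append(math.log(p_motif) - math.log(n_positions) + log_with_motif)
    # Log-sum-exp over the mixture components for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

A sequence that contains a good match to the weight matrix scores a higher likelihood than a same-length sequence without one, which is the signal a motif-finder optimizes over.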

As well as sensitivity improvements, NestedMICA has a number of features that make it particularly suited to discovering multiple motifs ('regulatory vocabularies') in large datasets - up to and including whole-genome promoter/enhancer sets.

  • The ability to find many motifs in a single run (most motif-finders discover one motif at a time, then mask out all its occurrences and restart the search to find the next most significant motif).
  • CPU time scales linearly with the amount of input data.
  • Good support for parallel processing in both SMP and clustered environments.

Recent Applications of NestedMICA

  • Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Down TA, Bergman CM, Su J and Hubbard TJ

    PLoS computational biology 2007;3;1;e7

  • The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells.

    Loh YH, Wu Q, Chew JL, Vega VB, Zhang W, Chen X, Bourque G, George J, Leong B, Liu J, Wong KY, Sung KW, Lee CW, Zhao XD, Chiu KP, Lipovich L, Kuznetsov VA, Robson P, Stanton LW, Wei CL, Ruan Y, Lim B and Ng HH

    Nature genetics 2006;38;4;431-40

Downloads

Getting NestedMICA

NestedMICA was developed on Linux (ia32) and Mac OS X (powerpc) systems. The main program is written in Java, with a small amount of C code. It should run on any Unix-like platform with a good Java implementation. As of version 0.7.0, NestedMICA requires a Java 5 platform (Sun JDK 1.5.0 or later).

FTP Download

If you use NestedMICA, we suggest you join the nmica-users mailing list to receive information about new releases.

Test datasets

Related programs

We now recommend that you use MotifExplorer to examine motif-sets learned using NestedMICA.

Change log

NestedMICA 0.8.0

  • Major changes to the build system. On supported platforms, it should now be possible to compile NestedMICA in a single step using the ANT tool.
  • NestedMICA now requires ANT 1.7.0 or later (http://ant.apache.org)
  • Many programs in the NestedMICA suite have been renamed to be more distinctive (there were too many motifscanners in this world!). All new names start with the prefix "nm". Existing tools have been renamed as follows:
    • motiffinder -> nminfer
    • makemosaicbg -> nmmakebg
    • motifscanner -> nmscan
    • dlepnode -> nmworker

    Also, a few old tools have been removed. The only one that most users might notice is motifviewer, which has been replaced by the MotifExplorer tool.

  • Several command line switches have been renamed. If in doubt, check the manual or run the command with the -help switch.
  • When running on Java versions prior to 6.0, NestedMICA now requires two new libraries: stax-api.jar and wstx.jar. Suitable versions are included with the download.
  • The background model format has changed to a new XML-based format. Old background models can be updated using the nmconvertbg tool.
  • The concept of "order" when applied to background models has changed: mononucleotide models are now considered to be order 0 (in common with most literature in the field) rather than order 1.
  • If you run nminfer without the -backgroundModel option, it will automatically build a background model using the supplied sequences.
  • nminfer can now automatically infer motif lengths. To take advantage of this feature, you need to specify a length range using the -minLength and -maxLength options.
  • When running in distributed mode, it is now possible to do some processing on the master node as well by specifying "-threads N" where N is greater than zero. Please consult the manual for more details.
  • Improved support for "counted" motif models, including performance improvements and one serious bug-fix. Note that the default (uncounted) model is still recommended for most purposes.
  • Many smaller performance improvements.
  • KNOWN ISSUE: nminfer startup can be very slow when it is run with many short sequences and a large value of the -numMotifs parameter. This should be fixed in the next release. For now, we suggest removing all sequences shorter than about 50 bases when running nminfer with a large -numMotifs value.
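
The workaround for the known issue above (dropping sequences shorter than about 50 bases) could be scripted along the following lines. This is a minimal sketch, not part of the NestedMICA distribution: it assumes the input sequences are in FASTA format, and the function names and the min_length parameter are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def drop_short_sequences(lines, min_length=50):
    """Return FASTA text keeping only records with at least min_length bases."""
    kept = []
    for header, seq in read_fasta(lines):
        if len(seq) >= min_length:
            kept.append(f">{header}\n{seq}\n")
    return "".join(kept)
```

Passing an open file handle as lines and writing the returned text to a new file gives a filtered dataset that can then be supplied to nminfer in place of the original.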

NestedMICA 0.7.3

  • Support for Intel Macs
  • Preliminary support for finding motifs in protein sequences (thanks to Mutlu Dogruel)
  • Much faster motifscanner
  • Better default options when searching for large numbers of motifs
  • The -workerThreads option has been renamed -threads

NestedMICA 0.7.2

  • Thread-safety bugfixes
  • Added links to MotifExplorer

NestedMICA 0.7.1

  • Many small bug-fixes, including checkpoint-restarting and command-line option processing
  • Added a Solaris port (thanks to Mikhail Velikanov).

NestedMICA 0.7.0

  • Now requires a Java 5 platform
  • New required library: bjv2-core. A suitable snapshot is included in the lib/ directory
  • Improved command-line handling, and cleaned up the command lines for main tools. All tools now print a list of options in response to the '-help' switch.
  • Significant (typically a factor of 2) performance improvements when finding multiple motifs.
  • Better criteria for automatic termination. This is still intentionally somewhat conservative, but is much more usable than previous versions.
  • Uses more efficient synchronization code when running on multiple processors.

NestedMICA 0.06

  • Now searches JAVA_HOME for a Java runtime
  • Added the printmosaicbg program to inspect background model files

NestedMICA 0.05

  • Wrapper scripts and build process have been updated.
  • Required libraries (biojava, bytecode, changeless) are now included in the default distribution
  • Memory usage for large datasets has been substantially reduced
  • Further improvements to the -distributed mode. This now includes a simple load-balancing system which allows operation on heterogeneous clusters.

NestedMICA 0.04

  • Default motif output format is now XML-based (.xms)
  • Minor performance improvements
  • Distributed mode is now more robust, and uses various strategies to reduce network traffic and avoid packet-loss problems.

NestedMICA 0.03

  • More bug-fixes
  • Better defaults
  • Added some code for testing different background models

NestedMICA 0.02

  • Fixed a bug restarting checkpoints
  • Minor performance improvements
  • Added wrapper scripts for launching the main programs
  • Added termination conditions
  • More command line cleanups

NestedMICA 0.01

  • First public release

MotifExplorer

MotifExplorer is a graphical Java program for viewing and manipulating collections of short sequence motifs -- typically transcription factor binding sites. It is designed to work well with NestedMICA and uses XMS (NestedMICA's output format) as its native file format.

Users of Windows and Linux systems with a reasonably up-to-date Java version installed (1.5 or later) should be able to start MotifExplorer via Java Web Start by clicking on the link below.

Java Web Start

Notes

Note that you'll see a warning about the certificate that was used to digitally sign this MotifExplorer distribution. Hopefully future releases will be signed using a trusted certificate.

The Web Start package currently doesn't work well on Mac OS X. Please use this downloadable Mac-specific version instead (requires Mac OS X 10.4.0 or later and Java 1.5 or later).

Contacts

Questions and comments about NestedMICA should be sent to Thomas Down at thomas.down@gurdon.cam.ac.uk.
