Cookie Monster

Overview

Cookie Monster is a tool for triaging large volumes of sequencing (and related) data by their metadata, drawn from various sources, for opportunistic or proactive downstream processing by HGI.

Sequencing pipelines generate a lot of data, and the volume will only increase over time. These data require further processing before being delivered to analysts. However, at the current rate of growth, there is too much for one person (or even a team of people) to deal with efficiently. Given how long data can take to process, an untenable backlog builds, exacerbating the problem further.

Cookie Monster is an automated system that constantly monitors data pushed into iRODS (in our implementation), via its metadata and how that metadata changes over time. A sequence of customisable rules is applied to each potential piece of data, either to further enrich its metadata from different sources (e.g., Sequencescape or, in the case of BAM/CRAM files, the file headers, etc.) or, ultimately, to decide what to do with it. That could mean disregarding the file altogether (which will apply to the majority of data), pushing it back upstream for reprocessing or correction, or pushing it downstream into our own processing pipelines.
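The rule-and-enrichment cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cookie Monster's actual API: the names Decision, DataItem, triage, the example rules, the enricher and the metadata keys ("manual_qc", "sample", "study") are all hypothetical, and the enricher is a stand-in for a real lookup against a source such as Sequencescape.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Decision(Enum):
    """The three outcomes described in the text (names are hypothetical)."""
    DISREGARD = "disregard"
    PUSH_UPSTREAM = "push upstream"
    PUSH_DOWNSTREAM = "push downstream"


class DataItem:
    """A piece of data, known only by its path and the metadata seen so far."""

    def __init__(self, path: str, metadata: Optional[Dict[str, str]] = None) -> None:
        self.path = path
        self.metadata = dict(metadata or {})


# A rule returns a decision, or None when it has no opinion on the item.
Rule = Callable[[DataItem], Optional[Decision]]
# An enricher returns extra metadata to merge in (possibly an empty dict).
Enricher = Callable[[DataItem], Dict[str, str]]


def triage(item: DataItem, rules: List[Rule], enrichers: List[Enricher]) -> Decision:
    """Apply rules in order; if none decides, enrich the metadata and retry.

    When no rule decides and no enrichers remain, the item is disregarded.
    """
    remaining = list(enrichers)
    while True:
        for rule in rules:
            decision = rule(item)
            if decision is not None:
                return decision
        if not remaining:
            return Decision.DISREGARD
        # Merge in metadata from the next source, then run the rules again.
        item.metadata.update(remaining.pop(0)(item))


def not_alignment_file(item: DataItem) -> Optional[Decision]:
    # Hypothetical rule: only BAM/CRAM files are of interest.
    if not item.path.endswith((".bam", ".cram")):
        return Decision.DISREGARD
    return None


def failed_qc(item: DataItem) -> Optional[Decision]:
    # Hypothetical rule: send data that failed QC back upstream.
    if item.metadata.get("manual_qc") == "0":
        return Decision.PUSH_UPSTREAM
    return None


def in_study_of_interest(item: DataItem) -> Optional[Decision]:
    # Hypothetical rule: process data belonging to a study we care about.
    if item.metadata.get("study") == "example-study":
        return Decision.PUSH_DOWNSTREAM
    return None


def lookup_study(item: DataItem) -> Dict[str, str]:
    # Stand-in for a query to an external source; returns canned data here.
    if "sample" in item.metadata:
        return {"study": "example-study"}
    return {}


RULES = [not_alignment_file, failed_qc, in_study_of_interest]
ENRICHERS = [lookup_study]

if __name__ == "__main__":
    for item in (
        DataItem("/seq/1234/notes.txt"),
        DataItem("/seq/1234/a.cram", {"sample": "s1"}),
        DataItem("/seq/1234/b.cram", {"manual_qc": "0"}),
    ):
        print(item.path, "->", triage(item, RULES, ENRICHERS).value)
```

In this sketch, rules are kept cheap and side-effect free, and enrichment only happens when the existing metadata is insufficient for any rule to reach a decision, so that most data can be disregarded without consulting further sources.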

Cookie Monster is written in Python 3.5 and is designed to be used as a general purpose module, using CouchDB as a backend persistence layer and InfluxDB for performance metric logging. The HGI implementation (also Python 3.5) then takes that module and plumbs it into various services appropriate to our needs -- not least, our downstream pipelines -- and provides a set of rules which are applied against metadata collections that match the needs of the Human Genetics Programme and its research interests.

Download and Installation

Cookie Monster is implemented as a Python 3.5 (or higher) module. Full source, documentation and installation instructions can be found in the project's GitHub repository. The HGI implementation of Cookie Monster, with a production ruleset, etc., can be found in its own GitHub repository.

Cookie Monster on GitHub
The HGI Monster on GitHub

License and Citation

Copyright © 2015, 2016 Genome Research Ltd.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contact

Issues and bug reports can be filed via the project's GitHub repository:

GitHub Issues

Authors

Sanger Contributors