MascotPercolator

MascotPercolator is a software package that interfaces the proteomics spectral identification algorithm Mascot (Matrix Science) with Percolator, a well-performing machine learning algorithm for rescoring search results. We have demonstrated that it is suitable for both low- and high-accuracy mass spectrometry data, outperforming all available Mascot scoring schemes and providing reliable significance measures.

Sound scoring methods for identifying mass spectra using sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. Here we present a software package that interfaces Mascot with Percolator, a well-performing machine learning method for rescoring database search results, and demonstrate that it is suitable for both low- and high-accuracy mass spectrometry data, outperforming all available Mascot scoring schemes and providing reliable significance measures. MascotPercolator can be readily used as a standalone tool or integrated into existing data analysis pipelines.

Further information

To run MascotPercolator, enter the following in a command prompt or Unix shell:

java -cp MascotPercolator.jar cli.MascotPercolator [options…]

Parameters (replacing the “[options…]” expression; each is passed with a leading hyphen, e.g. “-target”):

  • target VAL (required) – Log ID [1] or path/file name of the Mascot target results .dat file
  • decoy VAL (required) – Log ID [1] or path/file name of the Mascot decoy results .dat file. Note: if Mascot’s ‘auto-decoy’ mode was used, use the same log ID/file as for the target parameter.
  • out VAL (required) – Results path and file name prefix (without extension). Will be used as prefix for output files.
  • overwrite (optional) – If result files already exist, this option forces them to be overwritten
  • validate FILE (optional) – File with a list of correct peptides/proteins (sequences simply concatenated or alternatively one sequence per line without identifiers)
  • rankdelta N (optional) – Maximum allowed Mascot score difference between a peptide hit and the top hit match of the same spectrum. Default = -1, which strictly reports only the top hit match of a spectrum. If set to 1, all peptide hit ranks that have a delta score of < 1 to the top hit match are processed.
  • newDat (optional/flag) – Write a new Mascot .dat file in which the Mascot scores are replaced by scores derived from Percolator’s posterior error probabilities: newMascotScore = -10 × log10(PosteriorErrorProbability). The Mascot Identity Threshold is then set to 13 (scores of 13 or above correspond to posterior error probabilities <= 0.05). This option does not replace the existing .dat files.
  • rt (optional/flag) – Enables the use of retention time; it only takes effect when retention times are available in the input data. Default: off. Largely untested.
  • xml (optional/flag) – Write supplemental XML output as defined here: http://noble.gs.washington.edu/proj/percolator/model/percolator_out.xsd
  • features (optional/flag) – Write out feature file with results
  • chargefeature (optional/flag) – Switch to using a single value feature to represent precursor charge state rather than the standard 4 feature format
  • highcharge (optional/flag) – Calculates series-specific features for higher (up to 5+) fragment charge states
  • nofilter (optional/flag) – Switches off the filter that ignores spectra with fewer than 15 fragment peaks
  • u (optional/flag) – Switches Percolator between PSM mode and unique peptide mode. With the latest versions of Percolator, using this option makes Percolator, and hence MascotPercolator, report all PSMs rather than unique peptides. With earlier versions of Percolator (pre v2.0), it does the opposite and forces Percolator and MascotPercolator to report only unique peptides. (Only available in MascotPercolator v2.02 onwards.)
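
The score conversion described for the newDat option can be checked with a few lines of Python. This is only an illustrative sketch of the stated formula, not MascotPercolator's own code; the function name `pep_to_score` and the constant `IDENTITY_THRESHOLD` are hypothetical names chosen for this example.

```python
import math

# Threshold stated in the newDat description (assumed constant name).
IDENTITY_THRESHOLD = 13


def pep_to_score(pep: float) -> float:
    """Convert a posterior error probability to -10 * log10(PEP)."""
    if not 0.0 < pep <= 1.0:
        raise ValueError("posterior error probability must be in (0, 1]")
    return -10.0 * math.log10(pep)


if __name__ == "__main__":
    # Smaller error probabilities give larger scores.
    for pep in (0.5, 0.05, 0.01, 0.001):
        score = pep_to_score(pep)
        passes = score >= IDENTITY_THRESHOLD
        print(f"PEP={pep:<6} score={score:6.2f} passes threshold: {passes}")
```

A posterior error probability of 0.05 maps to a score of roughly 13.01, which is why a threshold of 13 corresponds to posterior error probabilities <= 0.05.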

Example:

java -cp MascotPercolator.jar cli.MascotPercolator -rankdelta 1 -newDat -u -target 11083 -decoy 11084 -out 11083-11084
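
The example above sets -rankdelta 1. One plausible reading of that parameter is sketched below in Python: for each spectrum, the top hit is always kept, and lower-ranked hits are kept only if their Mascot score is within the given delta of the top hit. This is an assumption based on the parameter description, not MascotPercolator's actual implementation, and `select_hits` is a hypothetical helper.

```python
def select_hits(scores, rankdelta=-1):
    """Return the Mascot scores of the hits kept for one spectrum.

    scores    -- Mascot scores of all peptide hits of one spectrum (any order)
    rankdelta -- maximum allowed score difference to the top hit;
                 -1 (the documented default) keeps only the top hit
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    # The top hit is always kept. A lower-ranked hit is kept only if its
    # score difference to the top hit is strictly below rankdelta, so a
    # negative rankdelta never admits any additional hit.
    return [top] + [s for s in ranked[1:] if top - s < rankdelta]


if __name__ == "__main__":
    hits = [45.2, 44.6, 43.9, 30.0]
    print(select_hits(hits, rankdelta=1))
    print(select_hits(hits))
```

With `rankdelta=1`, the hit scoring 44.6 is kept alongside the top hit (difference 0.6), while 43.9 (difference 1.3) and 30.0 are dropped.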

MascotPercolator extracts all necessary data from the Mascot .dat file(s), trains Percolator and writes the results to the specified summary file. MascotPercolator requires both target and decoy search results, which can be obtained in two ways:

  1. A Mascot search is performed with the Mascot auto-decoy option enabled. In this case, the “-target” and “-decoy” parameters refer to the same log ID or results file.
  2. Two independent searches against a target and decoy database are performed, using identical search parameter settings. The “-target” and “-decoy” parameters are set accordingly.
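
When MascotPercolator is called from a pipeline, the two cases above differ only in the value passed to “-decoy”. The following Python sketch assembles the argument list for either case; `build_command` is a hypothetical helper written for this example, and it uses only the flags documented on this page.

```python
def build_command(target, out, decoy=None, extra_flags=()):
    """Assemble a MascotPercolator command line as a list of arguments.

    target      -- log ID or path of the target results .dat file
    out         -- results path and file name prefix
    decoy       -- log ID or path of the decoy results; None means the
                   search used Mascot's auto-decoy mode, so the target
                   value is reused
    extra_flags -- further documented options, e.g. ["-rankdelta", "1"]
    """
    if decoy is None:
        decoy = target  # auto-decoy: same log ID/file for both parameters
    cmd = ["java", "-cp", "MascotPercolator.jar", "cli.MascotPercolator"]
    cmd += list(extra_flags)
    cmd += ["-target", str(target), "-decoy", str(decoy), "-out", str(out)]
    return cmd


if __name__ == "__main__":
    # Case 1: auto-decoy search, one log ID for target and decoy.
    print(" ".join(build_command(11083, "11083")))
    # Case 2: two independent searches with identical settings.
    print(" ".join(build_command(11083, "11083-11084", decoy=11084,
                                 extra_flags=["-rankdelta", "1"])))
```

The returned list can be handed to `subprocess.run` once the jar file is on the working path; the sketch itself only prints the command.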

Notes

[1] If the Mascot results are in the default results folder as specified in the config file, the ‘log ID’ is the integer part of the file name of the Mascot result file of interest. Example: if /mascot/results/ is the root folder of the Mascot results and /mascot/results/20090330/F001234.dat is the results file of interest, the ‘log ID’ is 1234.
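
As a small illustration of this note, the log ID can be derived from a result file path in Python. The sketch assumes that result files are always named `F` followed by digits and `.dat`, as in the single example above; `log_id_from_path` is a hypothetical helper, not part of MascotPercolator.

```python
import re
from pathlib import PurePosixPath


def log_id_from_path(path: str) -> int:
    """Return the integer part of a Mascot result file name.

    Assumes the file name pattern F<digits>.dat seen in the example.
    """
    name = PurePosixPath(path).name
    match = re.fullmatch(r"F(\d+)\.dat", name)
    if match is None:
        raise ValueError(f"not a recognised Mascot result file name: {name}")
    # int() drops the leading zeros, e.g. "001234" becomes 1234.
    return int(match.group(1))


if __name__ == "__main__":
    print(log_id_from_path("/mascot/results/20090330/F001234.dat"))
```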

Contact

If you need help or have any queries, please contact us using the details below.

Contact James Wright (james.wright@sanger.ac.uk)


Sanger Institute Contributors

Previous contributors

Dr James Christopher Wright

Principal Bioinformatician

Jyoti Choudhary

Former Head of Mass Spectrometry