Computational Biology Lab

Download SHRiMP:


Mailing List Sign-up:

The mailing list is closed.


About:


SHRiMP is a software package for aligning genomic reads against a target genome. It was developed primarily with the large volumes of short reads produced by next-generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.

SHRiMP was originally designed and written by Michael Brudno and Stephen M. Rumble, with considerable input and testing by the SidowLab. Since then, Adrian Dalca, Marc Fiume and Vladimir Yanovsky have made considerable contributions to probability calculations and 2-pass SMS mapping algorithms. The original SHRiMP publication can be found here.

Matei David joined the project with the 1.3.0 release. The 2.0 release included significant contributions by Daniel Lister and Michael Dzamba. The SHRiMP2 publication can be found here.

The authors may be contacted via e-mail at: shrimp at cs.toronto.edu. However, see the note below.

Additional information is available in the README file.


News:


2014: END OF SUPPORT

With all of the people involved having moved on to other projects, we are no longer able to develop or support SHRiMP. The code remains available on this page, but messages to the support list above will only be answered very sporadically.

July, 2012

Version 2.2.3 is now available.

In this version, we fixed a colour-space bug in which a FW/BW base call that differed from the preceding SW base call was not propagated to the output. To clarify, suppose SW calls a base T at some location. FW/BW then computes the probability that the base is each of A/C/G/T, and it decides that the probability pr_A (that it's an A) is greater than the probability pr_T (that it's a T). Prior to this fix, the output would be T, with the base call qv derived from pr_T. With this fix, the output is A, with the base call qv derived from pr_A. The score and qv of the mapping are unaffected by this change, so they were already correct in earlier versions.
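The corrected behaviour can be illustrated with a minimal sketch. This is not SHRiMP's code: the function name `call_base`, the cap `max_qv`, and the use of the standard Phred formula qv = -10 * log10(1 - p) are assumptions made for illustration only.

```python
import math

def call_base(posteriors, max_qv=60):
    """Pick the most probable base and derive its quality value.

    posteriors: dict mapping 'A', 'C', 'G', 'T' to probabilities.
    max_qv is a hypothetical cap, used when the error probability is 0.
    """
    base = max(posteriors, key=posteriors.get)
    p_error = 1.0 - posteriors[base]
    if p_error <= 0.0:
        return base, max_qv
    # Standard Phred scaling (an assumption about how the qv is derived).
    qv = min(max_qv, int(round(-10.0 * math.log10(p_error))))
    return base, qv

# SW called T here, but FW/BW finds A more probable (pr_A > pr_T).
posteriors = {"A": 0.90, "C": 0.01, "G": 0.01, "T": 0.08}
base, qv = call_base(posteriors)
print(base, qv)
```

In this example the reported base is A with a qv derived from pr_A, matching the post-fix behaviour described above, rather than T with a qv derived from pr_T.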

We also stabilized the output, removing a part of the code which performed sorting on pointers.

We will not be posting 32-bit (i686) compiled binaries any more. You can still try compiling directly from the source.

December 12, 2011

A new release is available, version 2.2.2.
A bug in mergesam has been corrected, and a default parameter for SHRiMP has been changed.
See HISTORY for changes and README for details.

October 31, 2011

A new release is available, version 2.2.1.
With this version of SHRiMP there is a dramatic speed increase in mapping reads to a large number of relatively small contigs.
See HISTORY for changes and README for details.

August 17, 2011

A new release is available, version 2.2.0. The major new functionality is the calculation of both mapping qualities and base call qualities, in both letter and colour space. We have also significantly improved mergesam, the program used to merge mappings of (potentially) several read chunks to (potentially) several reference chunks. We have also tweaked some of the default settings. As always, see HISTORY for changes and README for details.

NOTE: As of v2.2.0, SHRiMP by default drops reads with an average base/colour quality value (qv) of less than 10 when given fastq input. This threshold can be changed, and setting it to -1 disables it altogether. One complication is that various platforms encode qvs differently: we've seen SOLiD data using PHRED+33, and different sets of Illumina data using PHRED+33 and PHRED+64. The default SHRiMP behaviour is to use PHRED+33 for colour space data, and PHRED+64 for letter space data. However, this might be incorrect, e.g. for Illumina data using PHRED+33. Incorrect settings can result in many reads being dropped (reported at the end of the run), or in undesirable mappings due to the use of wrong probabilities in the analysis (harder to notice on its own). To help diagnose such issues early, SHRiMP now exits with an informative error message if it ever sees a base/colour qv less than -10 or greater than 50. See README for options affecting this behaviour.
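The interaction between the encoding offset, the average-qv filter, and the range check can be sketched as follows. This is an illustrative approximation, not SHRiMP's implementation; the function names and parameters (`decode_qvs`, `keep_read`, `min_avg_qv`, `lo`, `hi`) are hypothetical.

```python
def decode_qvs(quality_string, offset):
    """Decode a fastq quality string using a PHRED offset (33 or 64)."""
    return [ord(ch) - offset for ch in quality_string]

def keep_read(quality_string, offset, min_avg_qv=10, lo=-10, hi=50):
    """Return True if the read passes the average-qv filter.

    Raises ValueError if any qv falls outside [lo, hi], which usually
    means the wrong offset was chosen for the data.
    """
    qvs = decode_qvs(quality_string, offset)
    for qv in qvs:
        if qv < lo or qv > hi:
            raise ValueError(
                "qv %d outside [%d, %d]; wrong PHRED offset?" % (qv, lo, hi)
            )
    if min_avg_qv == -1 or not qvs:
        # A threshold of -1 disables the filter altogether.
        return True
    return sum(qvs) / len(qvs) >= min_avg_qv

# 'I' is qv 40 under PHRED+33 but only qv 9 under PHRED+64.
print(keep_read("IIII", 33))  # read is kept
print(keep_read("IIII", 64))  # read is dropped: average qv below 10
```

The second call shows how PHRED+33 data read as PHRED+64 can be dropped without any error. A string such as "5555" decodes to qv -11 under PHRED+64 and would trigger the range check instead.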

February 8, 2011 -

A new release is available, version 2.1.1. See HISTORY for changes and README for details.

December 28, 2010 -

A new release is available, version 2.1.0. See HISTORY for changes and README for details.

November 19, 2010 -

A new release is available, version 2.0.4. See HISTORY for changes and README for details.

October 28, 2010 -

A new release is available, version 2.0.3. See HISTORY for changes and README for details.

September 20, 2010 -

A new release is available, version 2.0.2. See HISTORY for changes and README for details. Most importantly, we changed the default score thresholds to increase the (default) sensitivity on reads of lengths smaller than 70bp.

May 18, 2010 -

SHRiMP2 is now available. This release includes a major design change since 1.3.2: we now index the genome instead of the reads. As a result, in order to run SHRiMP2 on machines with limited memory, you might have to split the genome into smaller chunks. As benefits of this design change, SHRiMP2 now supports multi-threaded operation (via OpenMP), as well as mate-pair/paired-end mapping. Another important feature we added is native support for the SAM output format. For further details, see the README file.

January 28, 2010 -

SHRiMP 1.3.2 is now available. This version fixes a couple of bugs, as explained in HISTORY. We also introduced the '-ungapped' option which tells rmapper to perform ungapped alignment between the queries and the database, as well as a '-M mirna' mode which loads some settings that were tested in the context of micro RNA analysis. For more details, see the README file. (Thank you Alessandro Guffanti.)

November 24, 2009 -

SHRiMP 1.3.1 is available. This version fixes two bugs. One of them, related to memory management, caused rmapper to allocate more memory than needed, unnecessarily increasing its virtual memory footprint. The other bug caused rmapper to miss some kmers at the beginning of each database sequence (contig). The fix should positively affect the sensitivity of rmapper on databases of small sequences (e.g., miRNA). We'd like to thank Alessandro Guffanti and Hossein Farahani for the detailed feedback that led to the identification of these bugs.

October 6, 2009 -

SHRiMP 1.3.0 is now available. As a major upgrade, 'rmapper' now includes proper support for multiple spaced seeds. Using the new default 4 spaced seeds of weight 12 as opposed to the old default of 1 spaced seed of weight 8 decreases the running time by a factor of 5 or even more in certain settings, all without affecting sensitivity. 'rmapper' now comes equipped with several pre-defined sets of parameters, designed for different (approximate) read lengths (see the description of the '-M' option in the README file).
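For readers unfamiliar with spaced seeds, the sketch below shows the general idea: a seed is a mask of 1s (positions that must match) and 0s (positions that are ignored), and its weight is the number of 1s. The mask used here is a made-up example of weight 12, not one of SHRiMP's default seeds, and the function names are hypothetical.

```python
def seed_weight(seed):
    """Weight of a spaced seed: the number of positions that must match."""
    return seed.count("1")

def spaced_kmers(sequence, seed):
    """Yield (start, key) for every window of the sequence.

    The key keeps only the letters under '1' positions of the seed.
    """
    span = len(seed)
    keep = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        yield start, "".join(window[i] for i in keep)

seed = "111101110111011"  # hypothetical mask, weight 12, span 15
print(seed_weight(seed))

reference = "ACGTACGTACGTACGTA"
# Same sequence with a mismatch at index 4, a '0' position of the seed.
read = "ACGTTCGTACGTACG"
ref_key = next(spaced_kmers(reference, seed))[1]
read_key = next(spaced_kmers(read, seed))[1]
print(ref_key == read_key)
```

Because the mismatch falls under a 0 of the mask, both windows produce the same key. Using several masks with different 0 positions increases the chance that at least one of them tolerates the mismatches in a given read.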

June 30, 2009 -

SHRiMP 1.2.1 is out. This update primarily includes support for multiple spaced seeds and 'shrimp_var', a new utility for detailing the variations detected for specific hits. Many minor bug fixes and features were added, including the -T and -U flags for rmapper (see the README). Please note that all default Smith-Waterman parameters were divided by 10 in this release in order to accommodate longer reads in our vectorized filter. The algorithms are unchanged, but the alignment scores will be off by an order of magnitude compared to previous versions. Check out the HISTORY file for more information on changes in this release.

For those interested in SHRiMP's inner workings, we have a recent publication in PLoS Computational Biology.

February 27, 2009 -

SHRiMP 1.2.0 has been released. This version incorporates a mate-pair version of probcalc, substantial bug fixes and enhancements to probcalc, Solaris support, significantly reduced memory usage in rmapper, and support for long spaced seeds up to 128 positions in length. As usual, check the HISTORY file for more information. New features are documented in the README.

July 19, 2008 -

Concurrent with the ISMB Short-SIG, we're pleased to announce the release of SHRiMP 1.1.0. This release features a number of important changes, namely the addition of an experimental aligner for Helicos 2-pass reads, significant improvements and bugfixes in probcalc, a new, more concise output format with succinct edit string alignment representations, speed enhancements and a number of bugfixes. As always, check out the HISTORY file for more information. Source and ICC builds for Linux (x86 and x86_64) are available for download. OS X builds should be appearing shortly.

March 25, 2008 -

SHRiMP 1.0.5 is out. Bug fixes include better handling of input files with strange characters (e.g. MSDOS newlines), proper wobble code complementation, and some GCC 4 compiler warning fixes. Default parameters have changed slightly: letter space seeds were increased to better deal with the longer reads of 454 and Illumina/Solexa machines, kmer pruning is disabled by default again, and gap open and crossover penalties were increased. A new feature for better handling of non-uniform read lengths has been added as well: the -h, -v and -w flags can now take relative arguments. For example, '-w 120%' specifies a window size of 120% of each read's length. Please read the HISTORY file for more information.
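As an illustration of how a relative argument such as '120%' might be resolved per read, here is a minimal sketch. The function name `resolve_length_arg` and the rounding rule are assumptions; the README describes the actual semantics.

```python
def resolve_length_arg(arg, read_length):
    """Resolve an absolute ('50') or relative ('120%') argument.

    Relative values are taken as a percentage of the read's length.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    arg = arg.strip()
    if arg.endswith("%"):
        percent = float(arg[:-1])
        return int(round(percent * read_length / 100.0))
    return int(arg)

# With '-w 120%', a 35-base read gets a window of 42 positions,
# while a 50-base read gets a window of 60 positions.
for read_length in (35, 50):
    print(read_length, resolve_length_arg("120%", read_length))
```

An absolute argument such as '50' is returned unchanged regardless of the read length.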

January 24, 2008 -

SHRiMP 1.0.4 is now available (1.0.3 was an internal development version). Some important bug fixes were made regarding colourspace alignment and various useful usability features were added. Please read the HISTORY file for more information.

November 6, 2007 -

SHRiMP 1.0.2 is now available. An incorrect assertion bug has been fixed and various documentation updates were made.

November 2, 2007 -

SHRiMP 1.0.1, the first public release, is now available. Use the links on the left to download in source form, or pre-compiled static binaries for i686 and x86_64 platforms (Linux 2.6).