Phusion2 is a pipeline for de novo genome assembly from NGS data. It is based on a strategy called read clustering. The pipeline starts with a k-mer frequency analysis, which allows a reasonable k-mer size to be selected. K-mers from the raw reads are then merged and sorted into a table, so that k-mer words occurring in more than one read can be used to link those reads. A relation matrix records the number of k-mer words shared between reads. Given a minimum threshold on the number of shared k-mers, the whole set of reads can be clustered into groups using the sharing information in the relation matrix. Once small read clusters of controllable size have been obtained, a local assembler can be used to produce contigs from each cluster.
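
The clustering step described above can be illustrated with a small Python sketch. This is not Phusion2's actual code: the function names (kmers, cluster_reads), the parameters (k, min_shared), the dictionary used as a sparse relation matrix, and the union-find grouping are all assumptions chosen for illustration. It is meant for toy inputs only and leaves out the k-mer frequency analysis and the control of cluster size.

```python
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """Return the set of distinct k-mers in a read (hypothetical helper)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def cluster_reads(reads, k, min_shared):
    """Group read indices that share at least min_shared k-mers.

    Illustrative only: names and data structures are assumptions,
    not the Phusion2 implementation.
    """
    # Table: k-mer word -> indices of the reads that contain it.
    table = defaultdict(set)
    for idx, read in enumerate(reads):
        for word in kmers(read, k):
            table[word].add(idx)

    # Sparse relation matrix: (read a, read b) -> number of shared k-mers.
    shared = defaultdict(int)
    for owners in table.values():
        for a, b in combinations(sorted(owners), 2):
            shared[(a, b)] += 1

    # Union-find over reads; link pairs that meet the threshold.
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    # Collect the members of each cluster.
    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    toy_reads = ["ACGTACGGA", "GTACGGATT", "TTTTCCCCA", "TTCCCCAGG"]
    print(cluster_reads(toy_reads, k=4, min_shared=2))
```

In this sketch, raising min_shared splits the reads into smaller groups and lowering it merges them, which is one simple way to picture how the threshold influences cluster size. Each resulting group would then be handed to a local assembler.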
The Phusion2 pipeline has been used to assemble a number of genomes, including Gorilla, Zebrafish, Tasmanian Devil, Bamboo and Miscanthus.