LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis

H. Husi, R.J. Skipworth, K.C.H. Fearon, J.A. Ross

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Many sequence analysis and multiple sequence alignment tools are widely used in biological research and are well described. However, large-scale, proteome-wide analysis that identifies potential splice variants, describes their sequence differences relative to a progenitor sequence, and clusters the sequences into individual groups for further analysis is difficult with the tools available. A desktop-based, stand-alone search engine able to align and cluster thousands of sequences and present the output in a condensed format has been lacking. We have developed novel software named LSCluster (Large-Scale CLUSTERing), which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping and can be used specifically to detect splice variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is its ability to display the alignment output as a condensed string that lists only the differences between aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.

Biological significance: Large-scale, proteome-wide analysis that identifies potential splice variants, describes their sequence differences relative to a progenitor sequence, and clusters the sequences into individual groups for further analysis is difficult with the tools presently available. This work introduces a desktop-based, stand-alone search engine able to align and cluster thousands of sequences and present the output in a condensed format. The software, named LSCluster (Large-Scale CLUSTERing), allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and can be used specifically to detect splice variants and other pairs of sequences sharing identical fragments. One of its unique features is the ability to display the alignment output as a condensed string that lists only the differences between aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.
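
The idea of reporting only the differences between a progenitor sequence and a variant can be illustrated with a minimal sketch. It is not LSCluster's algorithm or its output format, neither of which is described here. The function name `diff_only`, the notation it emits, and the toy sequences are all assumptions made for illustration, and Python's standard `difflib` matcher stands in for a proper sequence aligner.

```python
from difflib import SequenceMatcher


def diff_only(reference: str, variant: str) -> str:
    """Describe how `variant` differs from `reference`, omitting identical runs.

    Hypothetical notation (not LSCluster's): positions are 1-based and refer
    to the reference sequence.
    """
    # autojunk=False stops difflib from ignoring very frequent characters,
    # which matters for long sequences over a small alphabet.
    matcher = SequenceMatcher(None, reference, variant, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical stretches are not reported
        if tag == "delete":
            parts.append(f"{i1 + 1}-{i2}:del({reference[i1:i2]})")
        elif tag == "insert":
            # i1 is the number of reference residues preceding the insertion
            parts.append(f"{i1}:ins({variant[j1:j2]})")
        else:  # "replace"
            parts.append(f"{i1 + 1}-{i2}:{reference[i1:i2]}>{variant[j1:j2]}")
    return ";".join(parts) if parts else "identical"


if __name__ == "__main__":
    # Toy sequences: the variant lacks residues 5-7 of the reference,
    # as an exon-skipping splice variant might.
    print(diff_only("MKTWYIAKQR", "MKTWKQR"))  # prints 5-7:del(YIA)
```

Skipping the identical stretches keeps the summary short even for long sequences, which is the property that makes a difference-only listing useful when comparing many variants against one progenitor.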

Original language: English
Pages (from-to): 185-189
Number of pages: 5
Journal: Journal of Proteomics
Volume: 84
Publication status: Published - 1 Jun 2013
