Show simple item record

dc.contributor.authorZhong, Cuncong
dc.contributor.authorYang, Youngik
dc.contributor.authorYooseph, Shibu
dc.date.accessioned2017-09-22T17:00:23Z
dc.date.available2017-09-22T17:00:23Z
dc.date.issued2016-08-31
dc.identifier.citationZhong, C., Yang, Y., & Yooseph, S. (2016). GRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assembly. BMC Bioinformatics, 17(Suppl 8), 283. http://doi.org/10.1186/s12859-016-1119-1en_US
dc.identifier.urihttp://hdl.handle.net/1808/24999
dc.description.abstractBackground Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism.

Results The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates.

Conclusions In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/.

Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1119-1) contains supplementary material, which is available to authorized users.
en_US
dc.publisherBioMed Centralen_US
dc.rights© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.en_US
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/ http://creativecommons.org/publicdomain/zero/1.0/en_US
dc.titleGRASPx: efficient homolog-search of short peptide metagenome database through simultaneous alignment and assemblyen_US
dc.typeArticleen_US
kusw.kuauthorZhong, Cuncong
kusw.kudepartmentElectrical Engineering and Computer Scienceen_US
dc.identifier.doi10.1186/s12859-016-1119-1en_US
kusw.oaversionScholarly/refereed, publisher versionen_US
kusw.oapolicyThis item meets KU Open Access policy criteria.en_US
dc.identifier.pmidPMC5009819en_US
dc.rights.accessrightsopenAccess


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record

© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Except where otherwise noted, this item's license is described as: © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.