
dc.contributor.author: Thippabhotla, Sirisha
dc.contributor.author: Liu, Ben
dc.contributor.author: Podgorny, Adam
dc.contributor.author: Yooseph, Shibu
dc.contributor.author: Yang, Youngik
dc.contributor.author: Zhang, Jun
dc.contributor.author: Zhong, Cuncong
dc.date.accessioned: 2023-04-10T18:11:15Z
dc.date.available: 2023-04-10T18:11:15Z
dc.date.issued: 2023-03-11
dc.identifier.citation: Sirisha Thippabhotla, Ben Liu, Adam Podgorny, Shibu Yooseph, Youngik Yang, Jun Zhang, Cuncong Zhong, Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data, NAR Genomics and Bioinformatics, Volume 5, Issue 1, March 2023, lqad023, https://doi.org/10.1093/nargab/lqad023 [en_US]
dc.identifier.uri: https://hdl.handle.net/1808/34083
dc.description.abstract: Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP. [en_US]
dc.publisher: Oxford University Press [en_US]
dc.rights: Copyright The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License. [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [en_US]
dc.title: Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data [en_US]
dc.type: Article [en_US]
kusw.kuauthor: Thippabhotla, Sirisha
kusw.kuauthor: Liu, Ben
kusw.kuauthor: Podgorny, Adam
kusw.kuauthor: Zhong, Cuncong
kusw.kudepartment: Electrical Engineering and Computer Science [en_US]
kusw.kudepartment: Center for Computational Biology [en_US]
dc.identifier.doi: 10.1093/nargab/lqad023 [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-8777-082X [en_US]
kusw.oaversion: Scholarly/refereed, publisher version [en_US]
kusw.oapolicy: This item meets KU Open Access policy criteria. [en_US]
dc.rights.accessrights: openAccess [en_US]


