dc.contributor.author: Yu, Qiang
dc.contributor.author: Huo, Hongwei
dc.contributor.author: Zhao, Ruixing
dc.contributor.author: Feng, Dazheng
dc.contributor.author: Vitter, Jeffrey Scott
dc.contributor.author: Huan, Jun
dc.date.accessioned: 2017-12-06T21:20:15Z
dc.date.available: 2017-12-06T21:20:15Z
dc.date.issued: 2015-11-12
dc.identifier.citation: Yu, Q., Huo, H., Zhao, R., Feng, D., Vitter, J. S., & Huan, J. (2016). RefSelect: a reference sequence selection algorithm for planted (l, d) motif search. BMC bioinformatics, 17(9), 266. [en_US]
dc.identifier.uri: http://hdl.handle.net/1808/25596
dc.description.abstract: Background: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without any selection process, and thus their running time may fluctuate sharply, especially for large alphabets.

Results: In this paper, we formulate the reference sequence selection problem and propose a method named RefSelect to solve it quickly by evaluating the number of candidate motifs for the reference sequences. RefSelect brings a practical running-time improvement to state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily and efficiently, (2) in particular, lets them achieve a speedup of up to about 100× on protein data, and (3) is also suitable for large data sets that contain hundreds of sequences or more.

Conclusions: RefSelect addresses the execution time instability that many pattern-driven PMS algorithms exhibit. RefSelect requires a small amount of storage space and selects reference sequences efficiently and effectively. A parallel version of RefSelect is also provided for handling large data sets.
[en_US]
dc.publisher: BioMed Central [en_US]
dc.rights: © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en_US]
dc.subject: Planted (l, d) motif search [en_US]
dc.subject: Pattern-driven [en_US]
dc.subject: Reference sequences [en_US]
dc.title: RefSelect: a reference sequence selection algorithm for planted (l, d) motif search [en_US]
dc.type: Article [en_US]
kusw.kuauthor: Huan, Jun
kusw.kudepartment: Electrical Engineering and Computer Science [en_US]
dc.identifier.doi: 10.1186/s12859-016-1130-6 [en_US]
kusw.oaversion: Scholarly/refereed, publisher version [en_US]
kusw.oapolicy: This item meets KU Open Access policy criteria. [en_US]
dc.rights.accessrights: openAccess [en_US]