RefSelect: a reference sequence selection algorithm for planted (l, d) motif search
dc.contributor.author | Yu, Qiang | |
dc.contributor.author | Huo, Hongwei | |
dc.contributor.author | Zhao, Ruixing | |
dc.contributor.author | Feng, Dazheng | |
dc.contributor.author | Vitter, Jeffrey Scott | |
dc.contributor.author | Huan, Jun | |
dc.date.accessioned | 2017-12-06T21:20:15Z | |
dc.date.available | 2017-12-06T21:20:15Z | |
dc.date.issued | 2015-11-12 | |
dc.identifier.citation | Yu, Q., Huo, H., Zhao, R., Feng, D., Vitter, J. S., & Huan, J. (2016). RefSelect: a reference sequence selection algorithm for planted (l, d) motif search. BMC Bioinformatics, 17(9), 266. | en_US |
dc.identifier.uri | http://hdl.handle.net/1808/25596 | |
dc.description.abstract | Background: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets. Results: In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences. Conclusions: The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large data sets. | en_US |
dc.publisher | BioMed Central | en_US |
dc.rights | © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en_US |
dc.subject | Planted (l, d) motif search | en_US |
dc.subject | Pattern-driven | en_US |
dc.subject | Reference sequences | en_US |
dc.title | RefSelect: a reference sequence selection algorithm for planted (l, d) motif search | en_US |
dc.type | Article | en_US |
kusw.kuauthor | Huan, Jun | |
kusw.kudepartment | Electrical Engineering and Computer Science | en_US |
dc.identifier.doi | 10.1186/s12859-016-1130-6 | en_US |
kusw.oaversion | Scholarly/refereed, publisher version | en_US |
kusw.oapolicy | This item meets KU Open Access policy criteria. | en_US |
dc.rights.accessrights | openAccess | en_US |
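
Illustrative note (not part of the original record): the abstract describes pattern-driven PMS algorithms that generate candidate motifs from reference sequences. The sketch below is a minimal brute-force illustration of that idea with a single reference sequence (k = 1), assuming the standard definition that an l-mer is an (l, d) motif if every input sequence contains an l-mer within Hamming distance d of it. It is not the RefSelect algorithm and not any of the algorithms tested in the article; the function names (`hamming`, `occurs_within`, `neighbors`, `pattern_driven_pms`) and the `ref_index` parameter are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if seq contains an l-mer within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def neighbors(lmer, d, alphabet):
    """All strings over alphabet within Hamming distance d of lmer."""
    result = set()
    for r in range(d + 1):
        for positions in combinations(range(len(lmer)), r):
            for letters in product(alphabet, repeat=r):
                cand = list(lmer)
                for p, c in zip(positions, letters):
                    cand[p] = c
                result.add("".join(cand))
    return result


def pattern_driven_pms(seqs, l, d, ref_index=0, alphabet="ACGT"):
    """Brute-force pattern-driven search with one reference sequence.

    Candidates are the d-neighbors of every l-mer in the chosen reference
    sequence (an assumption made for this sketch); each candidate is then
    verified against all remaining sequences.
    Returns (set of motifs, number of candidates generated).
    """
    ref = seqs[ref_index]
    candidates = set()
    for i in range(len(ref) - l + 1):
        candidates |= neighbors(ref[i:i + l], d, alphabet)
    others = [s for j, s in enumerate(seqs) if j != ref_index]
    motifs = {m for m in candidates
              if all(occurs_within(m, s, d) for s in others)}
    return motifs, len(candidates)


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GACGTT"]
    motifs, n_candidates = pattern_driven_pms(seqs, l=3, d=0)
    print(motifs, n_candidates)
```

In this sketch the set of motifs found does not depend on which sequence is used as the reference; only the number of candidates that must be generated and verified can change, which is the quantity the abstract says RefSelect evaluates when choosing reference sequences.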