Compressed Index for Dictionary Matching with One Error
View/ Open
Issue Date
2011-03-29Author
Hon, Wing-Kai
Lam, Tak-Wah
Shah, Rahul
Siu-Lung, Tam
Vitter, Jeffrey Scott
Publisher
Institute of Electrical and Electronics Engineers
Type
Article
Article Version
Scholarly/refereed, author accepted manuscript
Metadata
Show full item recordAbstract
Given a set D of d patterns of total length n, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. This problem can be solved in optimal O(|T|+occ) time by the classical AC automaton (Aho and Corasick, 1975) where occ denotes the number of occurrences. The space requirement is O(n) words. In the {approximate} dictionary matching problem with one error, we consider a substring of T[i..j] an occurrence of P whenever the edit distance between T[i..j] and P is at most one. For this problem, the best known indexes are by Cole et al. (2004), which requires O(n+ dlog{d}) words of space and reports all occurrences in O(|T|log{d}log{log{d}}+occ) time, and by Ferragina et al. (1999), which requires O(n^{1+epsilon}) words of space and reports all occurrences in O(|T|loglog n + occ) time. Recently, there have been successes in compressing the dictionary matching index while keeping the query time optimal (Belazzougui, 2010, Hon et al., 2010). However, a compressed index for approximate dictionary matching problem is still open. In this paper, we propose the first such index which requires an optimal nH_k+O(n)+o(nlogsigma)-bit index space, where H_k denotes the kth-order empirical entropy of D, and sigma is the size of alphabet set from which all the characters in D and T are drawn. The query time of our index is O(σ|T|log3 n log log n + occ).
Description
DOI: 10.1109/DCC.2011.18
Collections
Citation
W.-K. Hon, Tsung-Han Ku, R. Shah, S. V. Thankachan, and J. S. Vitter. “Compressed Dictionary Matching With One Error,” in preparation. An extended abstract appears in Proceedings of the 2011 IEEE Data Compression Conference (DCC ’11), Snowbird, UT, March 2011. http://dx.doi.org/10.1109/DCC.2011.18
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
We want to hear from you! Please share your stories about how Open Access to this item benefits YOU.
Related items
Showing items related by title, author, creator and subject.
-
The Efficacy of Propensity Score Matching in Bias Reduction with Limited Sample Sizes
Howarter, Stephani (University of Kansas, 2015-12-31)The current literature on propensity score matching is missing imperative information for educational researchers regarding the practical implications of utilizing this method with limited sample sizes. The purpose of this ... -
On the local interaction of money and credit
Jin, Yi; Temzelides, Ted (ACADEMIC PRESS INC ELSEVIER SCIENCE, 2004-01)We study the coexistence of monetary and credit transactions in a model where exchange is decentralized. Agents belong to different locations which are informationally separated. The equilibrium mix of monetary and credit ... -
Examining sensory ability, feature matching, and assessment-based adaptation for a brain-computer interface using the steady-state visually evoked potential
Brumberg, Jonathan S.; Nguyen, Anh; Pitt, Kevin M.; Lorenz, Sean D. (Taylor & Francis, 2018-01-31)PURPOSE:We investigated how overt visual attention and oculomotor control influence successful use of a visual feedback brain-computer interface (BCI) for accessing augmentative and alternative communication (AAC) devices ...