Show simple item record

dc.contributor.authorHon, Wing-Kai
dc.contributor.authorLam, Tak-Wah
dc.contributor.authorShah, Rahul
dc.contributor.authorSiu-Lung, Tam
dc.contributor.authorVitter, Jeffrey Scott
dc.date.accessioned2011-07-25T21:23:21Z
dc.date.available2011-07-25T21:23:21Z
dc.date.issued2011-03-29
dc.identifier.citationW.-K. Hon, Tsung-Han Ku, R. Shah, S. V. Thankachan, and J. S. Vitter. “Compressed Dictionary Matching With One Error,” in preparation. An extended abstract appears in Proceedings of the 2011 IEEE Data Compression Conference (DCC ’11), Snowbird, UT, March 2011. http://dx.doi.org/10.1109/DCC.2011.18
dc.identifier.urihttp://hdl.handle.net/1808/7806
dc.descriptionDOI: 10.1109/DCC.2011.18
dc.description.abstractGiven a set D of d patterns of total length n, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. This problem can be solved in optimal O(|T|+occ) time by the classical AC automaton (Aho and Corasick, 1975) where occ denotes the number of occurrences. The space requirement is O(n) words. In the {approximate} dictionary matching problem with one error, we consider a substring of T[i..j] an occurrence of P whenever the edit distance between T[i..j] and P is at most one. For this problem, the best known indexes are by Cole et al. (2004), which requires O(n+ dlog{d}) words of space and reports all occurrences in O(|T|log{d}log{log{d}}+occ) time, and by Ferragina et al. (1999), which requires O(n^{1+epsilon}) words of space and reports all occurrences in O(|T|loglog n + occ) time. Recently, there have been successes in compressing the dictionary matching index while keeping the query time optimal (Belazzougui, 2010, Hon et al., 2010). However, a compressed index for approximate dictionary matching problem is still open. In this paper, we propose the first such index which requires an optimal nH_k+O(n)+o(nlogsigma)-bit index space, where H_k denotes the kth-order empirical entropy of D, and sigma is the size of alphabet set from which all the characters in D and T are drawn. The query time of our index is O(σ|T|log3 n log log n + occ).
dc.language.isoen_US
dc.publisherInstitute of Electrical and Electronics Engineers
dc.subjectApproximation algorithms
dc.subjectAutomata
dc.subjectComputers
dc.subjectDictionaries
dc.subjectEntropy
dc.subjectPattern matching
dc.subjectComputational complexity
dc.subjectData compression
dc.subjectIndex
dc.subjectQuery processing
dc.subjectString matching
dc.subjectAC automaton
dc.subjectO(n) word
dc.subjectCompressed dictionary matching
dc.subjectKth order empirical entropy
dc.subjectQuery text
dc.subjectApproximate matching
dc.subjectCompressed indexes
dc.subjectDictionary matching
dc.subjectSuffix tree
dc.titleCompressed Index for Dictionary Matching with One Error
dc.typeArticle
kusw.kuauthorVitter, Jeffrey Scott
kusw.kudepartmentOffice of the Provost
kusw.oastatusfullparticipation
dc.identifier.doi10.1109/DCC.2011.18
kusw.oaversionScholarly/refereed, author accepted manuscript
kusw.oapolicyThis item meets KU Open Access policy criteria.
dc.rights.accessrightsopenAccess


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record