Loading...
Compressed Index for Dictionary Matching (extended abstract)
Hon, Wing-Kai ; Lam, Tak-Wah ; Shah, Rahul ; Siu-Lung, Tam ; Vitter, Jeffrey Scott
Hon, Wing-Kai
Lam, Tak-Wah
Shah, Rahul
Siu-Lung, Tam
Vitter, Jeffrey Scott
Citations
Altmetric:
Abstract
The past few years have witnessed several exciting results on compressed represen-
tation of a string T that supports e±cient pattern matching, and the space complexity
has been reduced to jTjHk(T)+o(jTj log ¾) bits [8, 10], where Hk(T) denotes the kth-
order empirical entropy of T, and ¾ is the size of the alphabet. In this paper we study
compressed representation for another classical problem of string indexing, which is
called dictionary matching in the literature. Precisely, a collection D of strings (called
patterns) of total length n is to be indexed so that given a text T, the occurrences of the
patterns in T can be found e±ciently. In this paper we show how to exploit a sampling
technique to compress the existing O(n)-word index to an (nHk(D) + o(n log ¾))-bit
index with only a small sacri¯ce in search time.
Description
(c) 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Date
2008
Journal Title
Journal ISSN
Volume Title
Publisher
IEEE
Research Projects
Organizational Units
Journal Issue
Keywords
Citation
W.-K. Hon, T.-W. Lam, R. Shah, S.-L. Tam, and J. S. Vitter. “Compressed Index for Dictionary Matching,” in preparation. An extended abstract appears in Proceedings of the 2008 IEEE Data Compression Conference (DCC ’08), Snowbird, UT, March 2008, 23-32. http://dx.doi.org/10.1109/DCC.2008.62