Loading...
Thumbnail Image
Publication

Compressed Index for Dictionary Matching (extended abstract)

Hon, Wing-Kai
Lam, Tak-Wah
Shah, Rahul
Siu-Lung, Tam
Vitter, Jeffrey Scott
Citations
Altmetric:
Abstract
The past few years have witnessed several exciting results on compressed represen- tation of a string T that supports e±cient pattern matching, and the space complexity has been reduced to jTjHk(T)+o(jTj log ¾) bits [8, 10], where Hk(T) denotes the kth- order empirical entropy of T, and ¾ is the size of the alphabet. In this paper we study compressed representation for another classical problem of string indexing, which is called dictionary matching in the literature. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that given a text T, the occurrences of the patterns in T can be found e±ciently. In this paper we show how to exploit a sampling technique to compress the existing O(n)-word index to an (nHk(D) + o(n log ¾))-bit index with only a small sacri¯ce in search time.
Description
(c) 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Date
2008
Journal Title
Journal ISSN
Volume Title
Publisher
IEEE
Research Projects
Organizational Units
Journal Issue
Keywords
Citation
W.-K. Hon, T.-W. Lam, R. Shah, S.-L. Tam, and J. S. Vitter. “Compressed Index for Dictionary Matching,” in preparation. An extended abstract appears in Proceedings of the 2008 IEEE Data Compression Conference (DCC ’08), Snowbird, UT, March 2008, 23-32. http://dx.doi.org/10.1109/DCC.2008.62
Embedded videos