We present a novel implementation of compressed su~x arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet E, where each symbol is encoded by lg ]E I bits. We show that compressed su1~x arrays use just
nHh + O(n lglg n~ lgl~ I n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(mlg [E[ + polylog(n)) time. The term Hh < lg IEI denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H~ = o(1) and the alphabet size is small, we obtain a text index with
o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper.
R. Grossi, A. Gupta, and J. S. Vitter. “High-Order Entropy-Compressed Text Indexes,” being partitioned with more recent work for submission to journals. An extended abstract appears in Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA ’03), Baltimore, MD, January 2003, 841–850.
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
We want to hear from you! Please
share your stories
about how Open Access to this item benefits YOU.
The University of Kansas prohibits discrimination on the basis of race, color, ethnicity, religion, sex, national origin, age, ancestry, disability, status as a veteran, sexual orientation, marital status, parental status, gender identity, gender expression and genetic information in the University’s programs and activities. The following person has been designated to handle inquiries regarding the non-discrimination policies: Director of the Office of Institutional Opportunity and Access, IOA@ku.edu, 1246 W. Campus Road, Room 153A, Lawrence, KS, 66045, (785)864-6414, 711 TTY.