dc.contributor.author | Grossi, Roberto | |
dc.contributor.author | Gupta, Ankur | |
dc.contributor.author | Vitter, Jeffrey Scott | |
dc.date.accessioned | 2011-03-18T19:01:12Z | |
dc.date.available | 2011-03-18T19:01:12Z | |
dc.date.issued | 2003 | |
dc.identifier.citation | R. Grossi, A. Gupta, and J. S. Vitter. “High-Order Entropy-Compressed Text Indexes,” being partitioned with more recent work for submission to journals. An extended abstract appears in Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA ’03), Baltimore, MD, January 2003, 841–850. | |
dc.identifier.uri | http://hdl.handle.net/1808/7192 | |
dc.description.abstract | We present a novel implementation of compressed su~x arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet E, where each symbol is encoded by lg ]E I bits. We show that compressed su1~x arrays use just
nHh + O(n lglg n~ lgl~ I n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(mlg [E[ + polylog(n)) time. The term Hh < lg IEI denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H~ = o(1) and the alphabet size is small, we obtain a text index with
o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper. | |
dc.language.iso | en_US | |
dc.publisher | Society for Industrial and Applied Mathematics Philadelphia | |
dc.relation.isversionof | http://portal.acm.org/citation.cfm?id=644250 | |
dc.title | High-Order Entropy-Compressed Text Indexes | |
dc.type | Article | |
kusw.kuauthor | Vitter, Jeffrey Scott | |
kusw.oastatus | fullparticipation | |
kusw.oaversion | Scholarly/refereed, publisher version | |
kusw.oapolicy | This item meets KU Open Access policy criteria. | |
dc.rights.accessrights | openAccess | |