ATTENTION: The software behind KU ScholarWorks is being upgraded to a new version. Starting July 15th, users will not be able to log in to the system, add items, nor make any changes until the new version is in place at the end of July. Searching for articles and opening files will continue to work while the system is being updated.
If you have any questions, please contact Marianne Reed at mreed@ku.edu .
High-Order Entropy-Compressed Text Indexes
dc.contributor.author | Grossi, Roberto | |
dc.contributor.author | Gupta, Ankur | |
dc.contributor.author | Vitter, Jeffrey Scott | |
dc.date.accessioned | 2011-03-18T19:01:12Z | |
dc.date.available | 2011-03-18T19:01:12Z | |
dc.date.issued | 2003 | |
dc.identifier.citation | R. Grossi, A. Gupta, and J. S. Vitter. “High-Order Entropy-Compressed Text Indexes,” being partitioned with more recent work for submission to journals. An extended abstract appears in Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA ’03), Baltimore, MD, January 2003, 841–850. | |
dc.identifier.uri | http://hdl.handle.net/1808/7192 | |
dc.description.abstract | We present a novel implementation of compressed su~x arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet E, where each symbol is encoded by lg ]E I bits. We show that compressed su1~x arrays use just nHh + O(n lglg n~ lgl~ I n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(mlg [E[ + polylog(n)) time. The term Hh < lg IEI denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H~ = o(1) and the alphabet size is small, we obtain a text index with o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper. | |
dc.language.iso | en_US | |
dc.publisher | Society for Industrial and Applied Mathematics Philadelphia | |
dc.relation.isversionof | http://portal.acm.org/citation.cfm?id=644250 | |
dc.title | High-Order Entropy-Compressed Text Indexes | |
dc.type | Article | |
kusw.kuauthor | Vitter, Jeffrey Scott | |
kusw.oastatus | fullparticipation | |
kusw.oaversion | Scholarly/refereed, publisher version | |
kusw.oapolicy | This item meets KU Open Access policy criteria. | |
dc.rights.accessrights | openAccess |