ATTENTION: The software behind KU ScholarWorks is being upgraded to a new version. Starting July 15th, users will not be able to log in to the system, add items, nor make any changes until the new version is in place at the end of July. Searching for articles and opening files will continue to work while the system is being updated. If you have any questions, please contact Marianne Reed at mreed@ku.edu .

Show simple item record

dc.contributor.authorGrossi, Roberto
dc.contributor.authorGupta, Ankur
dc.contributor.authorVitter, Jeffrey Scott
dc.date.accessioned2011-03-18T19:01:12Z
dc.date.available2011-03-18T19:01:12Z
dc.date.issued2003
dc.identifier.citationR. Grossi, A. Gupta, and J. S. Vitter. “High-Order Entropy-Compressed Text Indexes,” being partitioned with more recent work for submission to journals. An extended abstract appears in Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA ’03), Baltimore, MD, January 2003, 841–850.
dc.identifier.urihttp://hdl.handle.net/1808/7192
dc.description.abstractWe present a novel implementation of compressed su~x arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet E, where each symbol is encoded by lg ]E I bits. We show that compressed su1~x arrays use just nHh + O(n lglg n~ lgl~ I n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(mlg [E[ + polylog(n)) time. The term Hh < lg IEI denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H~ = o(1) and the alphabet size is small, we obtain a text index with o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper.
dc.language.isoen_US
dc.publisherSociety for Industrial and Applied Mathematics Philadelphia
dc.relation.isversionofhttp://portal.acm.org/citation.cfm?id=644250
dc.titleHigh-Order Entropy-Compressed Text Indexes
dc.typeArticle
kusw.kuauthorVitter, Jeffrey Scott
kusw.oastatusfullparticipation
kusw.oaversionScholarly/refereed, publisher version
kusw.oapolicyThis item meets KU Open Access policy criteria.
dc.rights.accessrightsopenAccess


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record