KUKU

KU ScholarWorks

  • myKU
  • Email
  • Enroll & Pay
  • KU Directory
    • Login
    View Item 
    •   KU ScholarWorks
    • Office of the Provost
    • Provost Office Published Articles
    • View Item
    •   KU ScholarWorks
    • Office of the Provost
    • Provost Office Published Articles
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    High-Order Entropy-Compressed Text Indexes

    Thumbnail
    View/Open
    vitter_2003.pdf (1.082Mb)
    Issue Date
    2003
    Author
    Grossi, Roberto
    Gupta, Ankur
    Vitter, Jeffrey Scott
    Publisher
    Society for Industrial and Applied Mathematics Philadelphia
    Type
    Article
    Article Version
    Scholarly/refereed, publisher version
    Published Version
    http://portal.acm.org/citation.cfm?id=644250
    Metadata
    Show full item record
    Abstract
    We present a novel implementation of compressed su~x arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet E, where each symbol is encoded by lg ]E I bits. We show that compressed su1~x arrays use just nHh + O(n lglg n~ lgl~ I n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(mlg [E[ + polylog(n)) time. The term Hh < lg IEI denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H~ = o(1) and the alphabet size is small, we obtain a text index with o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper.
    URI
    http://hdl.handle.net/1808/7192
    Collections
    • Provost Office Published Articles [95]
    • Electrical Engineering and Computer Science Scholarly Works [288]
    • Distinguished Professors Scholarly Works [918]
    Citation
    R. Grossi, A. Gupta, and J. S. Vitter. “High-Order Entropy-Compressed Text Indexes,” being partitioned with more recent work for submission to journals. An extended abstract appears in Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA ’03), Baltimore, MD, January 2003, 841–850.

    Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.


    We want to hear from you! Please share your stories about how Open Access to this item benefits YOU.


    Contact KU ScholarWorks
    785-864-8983
    KU Libraries
    1425 Jayhawk Blvd
    Lawrence, KS 66045
    785-864-8983

    KU Libraries
    1425 Jayhawk Blvd
    Lawrence, KS 66045
    Image Credits
     

     

    Browse

    All of KU ScholarWorksCommunities & CollectionsThis Collection

    My Account

    LoginRegister

    Statistics

    View Usage Statistics

    Contact KU ScholarWorks
    785-864-8983
    KU Libraries
    1425 Jayhawk Blvd
    Lawrence, KS 66045
    785-864-8983

    KU Libraries
    1425 Jayhawk Blvd
    Lawrence, KS 66045
    Image Credits
     

     

    The University of Kansas
      Contact KU ScholarWorks
    Lawrence, KS | Maps
     
    • Academics
    • Admission
    • Alumni
    • Athletics
    • Campuses
    • Giving
    • Jobs

    The University of Kansas prohibits discrimination on the basis of race, color, ethnicity, religion, sex, national origin, age, ancestry, disability, status as a veteran, sexual orientation, marital status, parental status, gender identity, gender expression and genetic information in the University’s programs and activities. The following person has been designated to handle inquiries regarding the non-discrimination policies: Director of the Office of Institutional Opportunity and Access, IOA@ku.edu, 1246 W. Campus Road, Room 153A, Lawrence, KS, 66045, (785)864-6414, 711 TTY.

     Contact KU
    Lawrence, KS | Maps