In this paper, we present a nearly tight analysis of the encoding length of the Burrows-Wheeler Transform (bwt) that is motivated by the text indexing setting.For a text T of n symbols drawn from an alphabet , our encoding scheme achieves bounds in terms of the hth-order empirical entropy Hh of the text, and takes linear time for encoding and decoding. We also describe a lower bound on the encoding length of the bwt that constructs an in nite (non-trivial) class of texts that are among the hardest to compress using the bwt. We then show that our upper bound encoding length is nearly tight with this lower bound
for the class of texts we described.
In designing our bwt encoding and its lower
bound, we also address the t-subset problem; here,
the goal is to store a subset of t items drawn from a
R. Grossi, A. Gupta, and J. S. Vitter. “Nearly Tight Bounds on the Encoding Length of the Burrows-Wheeler Transform,” in preparation. An extended abstract appears in Proceedings of the 5th Workshop on Analytical Algorithmics and Combinatorics (ANALCO ’08), San Francisco, CA, January 2008, 191–202
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
We want to hear from you! Please
share your stories
about how Open Access to this item benefits YOU.
The University of Kansas prohibits discrimination on the basis of race, color, ethnicity, religion, sex, national origin, age, ancestry, disability, status as a veteran, sexual orientation, marital status, parental status, gender identity, gender expression and genetic information in the University’s programs and activities. The following person has been designated to handle inquiries regarding the non-discrimination policies: Director of the Office of Institutional Opportunity and Access, IOA@ku.edu, 1246 W. Campus Road, Room 153A, Lawrence, KS, 66045, (785)864-6414, 711 TTY.