Show simple item record

dc.contributor.authorChiu, Sheng-Yuan
dc.contributor.authorHon, Wing-Kai
dc.contributor.authorShah, Rahul
dc.contributor.authorVitter, Jeffrey Scott
dc.date.accessioned2011-03-21T21:37:51Z
dc.date.available2011-03-21T21:37:51Z
dc.date.issued2010
dc.identifier.citationS.-Y. Chiu, W.-K. Hon, R. Shah, and J. S. Vitter. “I/O-efficient Compressed Text Indexes: From Theory to Practice,” in preparation. An extended abstract appears in Proceedings of the 2010 IEEE Data Compression Conference (DCC ’10), Snowbird, UT, March 2010. http://dx.doi.org/10.1109/DCC.2010.45
dc.identifier.urihttp://hdl.handle.net/1808/7234
dc.description(c) 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
dc.description.abstractPattern matching on text data has been a fundamental field of Computer Science for nearly 40 years. Databases supporting full-text indexing functionality on text data are now widely used by biologists. In the theoretical literature, the most popular internal-memory index structures are the suffix trees and the suffix arrays, and the most popular external-memory index structure is the string B-tree. However, the practical applicability of these indexes has been limited mainly because of their space consumption and I/O issues. These structures use a lot more space (almost 20 to 50 times more) than the original text data and are often disk-resident. Ferragina and Manzini (2005) and Grossi and Vitter (2005) gave the first compressed text indexes with efficient query times in the internal-memory model. Recently, Chien et al (2008) presented a compact text index in the external memory based on the concept of Geometric Burrows-Wheeler Transform. They also presented lower bounds which suggested that it may be hard to obtain a good index structure in the external memory. In this paper, we investigate this issue from a practical point of view. On the positive side we show an external-memory text indexing structure (based on R-trees and KD-trees) that saves space by about an order of magnitude as compared to the standard String B-tree. While saving space, these structures also maintain a comparable I/O efficiency to that of String B-tree. We also show various space vs I/O efficiency trade-offs for our structures.
dc.language.isoen_US
dc.publisherIEEE
dc.titleI/O-efficient Compressed Text Indexes: From Theory to Practice (extended abstract)
dc.typeArticle
kusw.kuauthorVitter, Jeffrey Scott
kusw.oastatusfullparticipation
dc.identifier.doi10.1109/DCC.2010.45
kusw.oaversionScholarly/refereed, author accepted manuscript
kusw.oapolicyThis item meets KU Open Access policy criteria.
dc.rights.accessrightsopenAccess


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record