Efficient Bundle Sorting

View/ Open
Issue Date
2006Author
Matias, Yossi
Segal, Eran
Vitter, Jeffrey Scott
Publisher
Society for Industrial and Applied Mathematics
Type
Article
Article Version
Scholarly/refereed, publisher version
Metadata
Show full item recordAbstract
Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting
in external memory, which requires at most c(N/B) logM/B k disk accesses, where N is the number
of keys, M is the size of internal memory, k is the number of distinct keys, B is the transfer block
size, and 2 < c < 4. For moderately sized k, this bound circumvents the Θ((N/B) logM/B(N/B))
I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a
matching lower bound for bundle sorting. The improved running time of bundle sorting over general
sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed “in-place,” requiring no additional disk space.
Description
AMS subject classification. 68W01
DOI. 10.1137/S0097539704446554
Collections
Citation
Y. Matias, E. Segal, and J. S. Vitter. “Efficient Bundle Sorting,” SIAM Journal on Computing, 36(2), 2006, 394–410. An extended abstract appears in Proceedings of the 11th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA ’00), San Francisco, CA, January 2000, 839–848. http://dx.doi.org/10.1137/S0097539704446554
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
We want to hear from you! Please share your stories about how Open Access to this item benefits YOU.