Loading...
Thumbnail Image
Publication

Efficient Bundle Sorting

Matias, Yossi
Segal, Eran
Vitter, Jeffrey Scott
Citations
Altmetric:
Abstract
Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting in external memory, which requires at most c(N/B) logM/Bk disk accesses, where N is the number of keys, M is the size of internal memory, k is the number of distinct keys, B is the transfer block size, and 2 < c < 4. For moderately sized k, this bound circumvents the Theta((N/B) logM/B (N/B)) I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a matching lower bound for bundle sorting. The improved running time of bundle sorting over general sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed "in-place," requiring no additional disk space.
Description
This is the published version. Copyright © 2006 Society for Industrial and Applied Mathematics
Date
2006
Journal Title
Journal ISSN
Volume Title
Publisher
Society for Industrial and Applied Mathematics
Research Projects
Organizational Units
Journal Issue
Keywords
Citation
Matias, Yossi, Eran Segal, and Jeffrey Scott Vitter. "Efficient Bundle Sorting." SIAM J. Comput. SIAM Journal on Computing 36.2 (2006): 394-410. http://dx.doi.org/10.1137/S0097539704446554
Embedded videos