ATTENTION: The software behind KU ScholarWorks is being upgraded to a new version. Starting July 15th, users will not be able to log in to the system, add items, nor make any changes until the new version is in place at the end of July. Searching for articles and opening files will continue to work while the system is being updated.
If you have any questions, please contact Marianne Reed at mreed@ku.edu .
A Simple and Efficient Parallel Disk Mergesort
dc.contributor.author | Barve, Rakesh | |
dc.contributor.author | Vitter, Jeffrey Scott | |
dc.date.accessioned | 2011-03-21T20:00:35Z | |
dc.date.available | 2011-03-21T20:00:35Z | |
dc.date.issued | 2002 | |
dc.identifier.citation | R. D. Barve and J. S. Vitter. “A Simple and Efficient Parallel Disk Mergesort,” invited paper in special issue on parallel algorithms and architectures in Theory of Computing Systems, 35(2), March/April 2002, 189–215. An extended abstract appears in Proceedings of the 11th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’99), St. Malo, France, June 1999, 232–241. http://dx.doi.org/10.1007/s00224-002-1031-0 | |
dc.identifier.uri | http://hdl.handle.net/1808/7223 | |
dc.description | The original publication is available at www.springerlink.com | |
dc.description.abstract | External sorting|the process of sorting a le that is too large to t into the computer's internal memory and must be stored externally on disks|is a fundamental subroutine in database sys- tems [Gra93, IBM90]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV97] is the rst parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [Knu98, Section 5.4.9] recently identi ed SRM (which he calls \randomized striping") as the method of choice for sorting with parallel disks. In this paper we present an eÆcient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and ush technique for bu er management. Our techniques amount to a signi cant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs eÆciently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment [TPI99]; whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel. We compare the performance of SRM over a wide range of input sizes with that of disk-striped mergesort (DSM), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together signi cantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often signi cantly faster than with DSM (since SRM requires fewer passes). The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a le into several other les. Since both parallel disk merging and multimedia processing deal with streams that get \consumed" at nonuniform and partially predictable rates, our techniques for lookahead based upon forecasting data may have relevance in video server applications. | |
dc.language.iso | en_US | |
dc.publisher | Springer Verlag | |
dc.title | A Simple and Efficient Parallel Disk Mergesort | |
dc.type | Article | |
kusw.kuauthor | Vitter, Jeffrey Scott | |
kusw.oastatus | fullparticipation | |
dc.identifier.doi | 10.1007/s00224-002-1031-0 | |
kusw.oaversion | Scholarly/refereed, author accepted manuscript | |
kusw.oapolicy | This item meets KU Open Access policy criteria. | |
dc.rights.accessrights | openAccess |