A Simple and Efficient Parallel Disk Mergesort

Barve, Rakesh; Vitter, Jeffrey Scott

dc.contributor.author	Barve, Rakesh
dc.contributor.author	Vitter, Jeffrey Scott
dc.date.accessioned	2011-03-21T20:00:35Z
dc.date.available	2011-03-21T20:00:35Z
dc.date.issued	2002
dc.identifier.citation	R. D. Barve and J. S. Vitter. “A Simple and Efficient Parallel Disk Mergesort,” invited paper in special issue on parallel algorithms and architectures in Theory of Computing Systems, 35(2), March/April 2002, 189–215. An extended abstract appears in Proceedings of the 11th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’99), St. Malo, France, June 1999, 232–241. http://dx.doi.org/10.1007/s00224-002-1031-0
dc.identifier.uri	http://hdl.handle.net/1808/7223
dc.description	The original publication is available at www.springerlink.com
dc.description.abstract	External sorting\|the process of sorting a le that is too large to t into the computer's internal memory and must be stored externally on disks\|is a fundamental subroutine in database sys- tems [Gra93, IBM90]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV97] is the rst parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [Knu98, Section 5.4.9] recently identi ed SRM (which he calls \randomized striping") as the method of choice for sorting with parallel disks. In this paper we present an eÆcient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and ush technique for bu er management. Our techniques amount to a signi cant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs eÆciently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment [TPI99]; whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel. We compare the performance of SRM over a wide range of input sizes with that of disk-striped mergesort (DSM), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together signi cantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often signi cantly faster than with DSM (since SRM requires fewer passes). The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a le into several other les. Since both parallel disk merging and multimedia processing deal with streams that get \consumed" at nonuniform and partially predictable rates, our techniques for lookahead based upon forecasting data may have relevance in video server applications.
dc.language.iso	en_US
dc.publisher	Springer Verlag
dc.title	A Simple and Efficient Parallel Disk Mergesort
dc.type	Article
kusw.kuauthor	Vitter, Jeffrey Scott
kusw.oastatus	fullparticipation
dc.identifier.doi	10.1007/s00224-002-1031-0
kusw.oaversion	Scholarly/refereed, author accepted manuscript
kusw.oapolicy	This item meets KU Open Access policy criteria.
dc.rights.accessrights	openAccess

Files in this item

Name:: BaV00.SRM.pdf
Size:: 328.8Kb
Format:: PDF

View/Open

This item appears in the following Collection(s)

The University of Kansas prohibits discrimination on the basis of race, color, ethnicity, religion, sex, national origin, age, ancestry, disability, status as a veteran, sexual orientation, marital status, parental status, gender identity, gender expression and genetic information in the University’s programs and activities. The following person has been designated to handle inquiries regarding the non-discrimination policies: Director of the Office of Institutional Opportunity and Access, IOA@ku.edu, 1246 W. Campus Road, Room 153A, Lawrence, KS, 66045, (785)864-6414, 711 TTY.