Bulk Operations for Space-Partitioning Trees

View/ Open
Issue Date
2004Author
Ghanem, Thanaa M.
Shah, Rahul
Mokbel, Mohamed F.
Aref, Walid
Vitter, Jeffrey Scott
Publisher
IEEE
Type
Article
Article Version
Scholarly/refereed, author accepted manuscript
Metadata
Show full item recordAbstract
The emergence of extensible index structures, e.g.,
GiST (Generalized Search Tree) [25] and SP-GiST
(Space-Partitioning Generalized Search Tree) [3], calls
for a set of extensible algorithms to support different
operations (e.g., insertion, deletion, and search). Extensible
bulk operations (e.g., bulk loading and bulk insertion)
are of the same importance and need to be supported
in these index engines. In this paper, we propose
two extensible buffer-based algorithms for bulk operations
in the class of space-partitioning trees; a class of
hierarchical data structures that recursively decompose
the space into disjoint partitions. The main idea of
these algorithms is to build an in-memory tree of the
target space-partitioning index. Then, data items are
recursively partitioned into disk-based buffers using the
in-memory tree. Although the second algorithm is designed
for bulk insertion, it can be used in bulk loading
as well. The proposed extensible algorithms are implemented
inside SP-GiST; a framework for supporting
the class of space-partitioning trees. Both algorithms
have I/O bound 0(NH/B), where N is the number of
data items to be bulk loaded/inserted, B is the number
of tree nodes that can fit in one disk page, H is the tree
height in terms of pages after applying a clustering algorithm.
Experimental results are provided to show the
scalability and applicability of the proposed algorithms
for the class of space-partitioning trees. A comparison
of the two proposed algorithms shows that the first algorithm
performs better in case of bulk loading. However
the second algorithm is more general and can be used
for efficient bulk insertion.
Description
(c) 2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Collections
Citation
T. M. Ghanem, R. Shah, M. F. Mokbel, W. G. Aref, and J. S. Vitter. “Bulk Operations for Space-Partitioning Trees,” Proceedings of the 20th Annual IEEE International Conference on Data Engineering (ICDE ’04), Boston, March–April 2004, 29–41.
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
We want to hear from you! Please share your stories about how Open Access to this item benefits YOU.