How Bandwidth Selection Algorithms Impact Exploratory Data Analysis Using Kernel Density Estimation
Issue Date
2013-05-31
Author
Harpole, Jared Kenneth
Publisher
University of Kansas
Format
48 pages
Type
Thesis
Degree Level
M.A.
Discipline
Psychology
Rights
This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
Abstract
Exploratory data analysis (EDA) is important yet often overlooked in the social and behavioral sciences. Graphical analysis of one's data is central to EDA. A viable method of estimating and graphing the underlying density in EDA is kernel density estimation (KDE). A problem with using KDE is specifying the bandwidth correctly so that the estimate accurately represents the density. The purpose of the present study is to empirically evaluate how the choice of bandwidth in KDE influences recovery of the true density. Simulations were carried out that compared five bandwidth selection methods [Sheather-Jones plug-in (SJDP), Normal rule of thumb (NROT), Silverman's rule of thumb (SROT), Least squares cross-validation (LSCV), and Biased cross-validation (BCV)], using four true density shapes (Standard Normal, Positively Skewed, Bimodal, and Skewed Bimodal), and eight sample sizes (25, 50, 75, 100, 250, 500, 1,000, 2,000). Results indicated that SJDP performed best overall. However, this held specifically for sample sizes from 250 to 2,000. For smaller samples (N = 25 to 100), SROT performed best. Thus, either SJDP or SROT is recommended, depending on the sample size.
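To make the role of the bandwidth concrete, the following is a minimal Python sketch of two of the simpler selectors named in the abstract. The abstract does not give formulas, so the code assumes the commonly cited forms: h = 1.06 · s · n^(-1/5) for the Normal rule of thumb and h = 0.9 · min(s, IQR/1.34) · n^(-1/5) for Silverman's rule of thumb, with a Gaussian kernel. The function names, the random seed, and the bimodal test sample are hypothetical illustrations and are not taken from the thesis.

```python
import numpy as np


def nrot_bandwidth(x):
    """Normal rule of thumb (assumed form): h = 1.06 * s * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-0.2)


def srot_bandwidth(x):
    """Silverman's rule of thumb (assumed form):
    h = 0.9 * min(s, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


def gaussian_kde(x, grid, h):
    """Evaluate a Gaussian-kernel density estimate with bandwidth h on grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every observation.
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))


# Hypothetical bimodal sample of size 100 (illustration only).
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])

grid = np.linspace(-10.0, 10.0, 1001)
h_nrot = nrot_bandwidth(sample)
h_srot = srot_bandwidth(sample)
dens_nrot = gaussian_kde(sample, grid, h_nrot)
dens_srot = gaussian_kde(sample, grid, h_srot)
```

Under these assumed formulas, Silverman's rule never yields a larger bandwidth than the Normal rule for the same data, so its estimate is smoothed less and is less likely to blur separate modes together. The plug-in and cross-validation selectors in the study are data-driven optimizations and are not sketched here.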
Collections
- Psychology Dissertations and Theses [459]
- Theses [3828]