Loading...
Thumbnail Image
Publication

How Duplicates Affect the Error Rate of Datasets During Validation

Gargeshwari Mahadevaswamy, NAGABHUSHANA
Citations
Altmetric:
Abstract
In data mining, duplicate data plays a huge role in deciding the set of rules. In this project, an analysis has been made on finding the impact of duplicates in the input data set on the rule set. The effect of duplicates is being analyzed using the error rate factor. Error rate is calculated by comparing the obtained rule set against the testing part of input data. The results of experiments have shown decrement of error rate with the increase of percentage of duplicates in the input data set, which demonstrates that the duplicate data plays a crucial role in validation process of machine learning. LEM2 algorithm and rule checker application have been implemented as a part of project. LEM2 algorithm is used to induce the rule set for the given input data set and rule checker application is used to calculate the error rate.
Description
This work was submitted to the graduate degree program in Electrical Engineering and Computer Science and the Graduate Faculty of the University of Kansas in partial fulfillment of the requirements for the degree of Master of Science.
Date
2016-05-31
Journal Title
Journal ISSN
Volume Title
Publisher
University of Kansas
Research Projects
Organizational Units
Journal Issue
Keywords
Duplicates, Data mining, Data
Citation
DOI
Embedded videos