Publication

A Comparison of Four Approaches to Discretization Based on Entropy

Grzymala-Busse, Jerzy W.
Abstract
Using the C4.5 decision tree generation system, we compare four discretization methods, all based on entropy: the original C4.5 approach to discretization, two globalized methods known as equal interval width and equal frequency per interval, and a relatively new method called multiple scanning. The main objective of our research is to compare the quality of these four methods using two criteria: the error rate, evaluated by ten-fold cross-validation, and the size of the decision tree generated by C4.5. Our results show that multiple scanning is the best discretization method in terms of error rate, and that decision trees generated from datasets discretized by multiple scanning are simpler than decision trees generated directly by C4.5 or from datasets discretized by either of the two globalized methods.
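Entropy-based discretization commonly scores a candidate cut point for a numerical attribute by the conditional entropy of the decision that the cut induces. The following is a minimal Python sketch of that step for a single attribute, assuming candidate cuts are midpoints between consecutive distinct values and that the best cut is the one minimizing the weighted entropy of the two resulting subsets. The function names `entropy` and `best_cut_point` are hypothetical; this is a generic illustration of the idea, not the authors' implementation of C4.5, the globalized methods, or multiple scanning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision values."""
    n = len(labels)
    # sum of p * log2(1/p) over the decision classes present
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, conditional_entropy) for the cut that minimizes the
    weighted entropy of the decision after splitting the attribute at cut.

    Candidate cuts are midpoints between consecutive distinct values
    (an assumption of this sketch). Returns None if the attribute has
    fewer than two distinct values, since no cut is possible.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in pairs if x < cut]
        right = [y for x, y in pairs if x >= cut]
        # conditional entropy of the decision given the two intervals
        h = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or h < best[1]:
            best = (cut, h)
    return best


if __name__ == "__main__":
    values = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_cut_point(values, labels))  # (2.5, 0.0)
```

Here the cut at 2.5 separates the two classes perfectly, so its conditional entropy is zero. A full discretizer would apply this step repeatedly, for example within intervals or across attributes, with some stopping criterion; how each of the four compared methods does so is described in the paper itself.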
Date
2016-02-25
Publisher
MDPI
Keywords
Data mining, Discretization, Numerical attributes, Entropy
Citation
Grzymala-Busse, J. W., & Mroczek, T. (2016). A comparison of four approaches to discretization based on entropy. Entropy, 18(3), 69.