dc.date.accessioned: 2019-11-15T15:14:34Z
dc.date.available: 2019-11-15T15:14:34Z
dc.date.issued: 2018-11-16
dc.identifier.citation: Grzymala-Busse, J.W.; Mroczek, T. Merging of Numerical Intervals in Entropy-Based Discretization. Entropy 2018, 20, 880. [en_US]
dc.identifier.uri: http://hdl.handle.net/1808/29764
dc.description.abstract: As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is a process of converting numerical values of the data records into discrete values associated with numerical intervals defined over the domains of the data records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in discretized datasets as a kind of postprocessing. Our objective is to check how the error rate, measured by tenfold cross validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. As a result of the Friedman rank sum test (5% significance level) we concluded that the differences between all three approaches are statistically insignificant. There is no universally best approach. Then, we repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (with a 1% significance level). In some cases, the smaller error rate is associated with no merging, in some cases the smaller error rate is associated with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. So, our final conclusion was that there are highly significant differences between no merging and merging, depending on the dataset. The best approach should be chosen by trying all three approaches. [en_US]
dc.publisher: MDPI [en_US]
dc.rights: © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.subject: data mining [en_US]
dc.subject: discretization [en_US]
dc.subject: numerical attributes [en_US]
dc.subject: entropy [en_US]
dc.title: Merging of Numerical Intervals in Entropy-Based Discretization [en_US]
dc.type: Article [en_US]
kusw.kuauthor: Grzymala-Busse, Jerzy W.
kusw.kudepartment: Electrical Engineering and Computer Science [en_US]
dc.identifier.doi: 10.3390/e20110880 [en_US]
kusw.oaversion: Scholarly/refereed, author accepted manuscript [en_US]
kusw.oapolicy: This item meets KU Open Access policy criteria. [en_US]
dc.rights.accessrights: openAccess [en_US]
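
The abstract in this record describes discretization as mapping numerical values to intervals, with merging of neighboring intervals as a postprocessing step. The Python sketch below illustrates that general idea only. It is not the authors' multiple-scanning algorithm: the function names (entropy, discretize, conditional_entropy, merge_neighbors), the toy data, and the choice of conditional entropy of the class given the interval as the quality measure are all assumptions made for illustration. The record does not say how the smallest-entropy or biggest-entropy merging criterion is computed.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def discretize(values, cut_points):
    """Map each numerical value to the index of its interval.

    cut_points must be sorted. With cut points c0 < c1 < ... the intervals
    are (-inf, c0), [c0, c1), ..., [ck, +inf).
    """
    return [bisect_right(cut_points, v) for v in values]


def conditional_entropy(values, labels, cut_points):
    """Entropy of the class labels within each interval, weighted by interval size."""
    groups = {}
    for idx, label in zip(discretize(values, cut_points), labels):
        groups.setdefault(idx, []).append(label)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def merge_neighbors(cut_points, i):
    """Merge intervals i and i + 1 by dropping the cut point between them."""
    return cut_points[:i] + cut_points[i + 1:]


# Hypothetical toy attribute with a two-class decision.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]
cuts = [2.5, 3.5]

# Compare the unmerged discretization with each possible neighbor merge.
print("no merging:", conditional_entropy(values, labels, cuts))
for i in range(len(cuts)):
    merged = merge_neighbors(cuts, i)
    print("merge at cut", i, merged, conditional_entropy(values, labels, merged))
```

In this toy case, dropping the first cut point keeps every interval pure in one class, while dropping the second mixes classes in one interval and raises the conditional entropy. That difference is the kind of signal an entropy-based merging rule could use, under the assumptions stated above.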

