Merging of Numerical Intervals in Entropy-Based Discretization
dc.date.accessioned | 2019-11-15T15:14:34Z | |
dc.date.available | 2019-11-15T15:14:34Z | |
dc.date.issued | 2018-11-16 | |
dc.identifier.citation | Grzymala-Busse, J.W.; Mroczek, T. Merging of Numerical Intervals in Entropy-Based Discretization. Entropy 2018, 20, 880. | en_US |
dc.identifier.uri | http://hdl.handle.net/1808/29764 | |
dc.description.abstract | As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is a process of converting numerical values of the data records into discrete values associated with numerical intervals defined over the domains of the data records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in discretized datasets as a kind of postprocessing. Our objective is to check how the error rate, measured by tenfold cross validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. As a result of the Friedman rank sum test (5% significance level) we concluded that the differences between all three approaches are statistically insignificant. There is no universally best approach. Then, we repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (with a 1% significance level). In some cases, the smaller error rate is associated with no merging, in some cases the smaller error rate is associated with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. So, our final conclusion was that there are highly significant differences between no merging and merging, depending on the dataset. The best approach should be chosen by trying all three approaches. | en_US |
dc.publisher | MDPI | en_US |
dc.rights | © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
dc.subject | data mining | en_US |
dc.subject | discretization | en_US |
dc.subject | numerical attributes | en_US |
dc.subject | entropy | en_US |
dc.title | Merging of Numerical Intervals in Entropy-Based Discretization | en_US |
dc.type | Article | en_US |
kusw.kuauthor | Grzymala-Busse, Jerzy W. | |
kusw.kudepartment | Electrical Engineering and Computer Science | en_US |
dc.identifier.doi | 10.3390/e20110880 | en_US |
kusw.oaversion | Scholarly/refereed, author accepted manuscript | en_US |
kusw.oapolicy | This item meets KU Open Access policy criteria. | en_US |
dc.rights.accessrights | openAccess | en_US |
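The abstract describes discretization as replacing numerical values with intervals and then merging neighboring intervals according to entropy. The record does not spell out the merging criterion, so the following is only a minimal sketch under stated assumptions: it uses the standard conditional entropy of the class labels given a set of cut points, and it treats "merging two neighboring intervals" as removing the cut point between them. The function names (`class_entropy`, `conditional_entropy`, `merge_candidates`) and the toy data are hypothetical and are not taken from the article.

```python
import math
from bisect import bisect_right
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(values, labels, cut_points):
    """Entropy of the labels given the intervals induced by sorted cut_points."""
    buckets = {}
    for v, y in zip(values, labels):
        # Interval index = number of cut points less than or equal to v.
        idx = bisect_right(cut_points, v)
        buckets.setdefault(idx, []).append(y)
    total = len(values)
    return sum(len(b) / total * class_entropy(b) for b in buckets.values())


def merge_candidates(values, labels, cut_points):
    """Map each cut point to the conditional entropy obtained by removing it.

    Removing a cut point merges the two neighboring intervals it separates.
    Assumption: this is one plausible reading of entropy-based merging,
    not necessarily the procedure used in the article.
    """
    return {
        c: conditional_entropy(values, labels, [p for p in cut_points if p != c])
        for c in cut_points
    }


# Hypothetical toy attribute with two cut points.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "b", "b", "a", "a"]
cuts = [2.5, 4.5]

before = conditional_entropy(values, labels, cuts)
after = merge_candidates(values, labels, cuts)
smallest = min(after, key=after.get)  # candidate for "smallest entropy" merging
biggest = max(after, key=after.get)   # candidate for "biggest entropy" merging
```

In this sketch, choosing the cut point whose removal yields the lowest or the highest conditional entropy corresponds loosely to the "smallest entropy" and "biggest entropy" options named in the abstract; the error-rate comparison itself (tenfold cross validation within C4.5) is not reproduced here.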
Except where otherwise noted, this item's license is described as: © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).