Loading...
Thumbnail Image
Publication

A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm

Nelakurthi, Venkata Siva Pavan Kumar Kumar
Citations
Altmetric:
Abstract
In data mining, rule induction is a process of extracting formal rules from decision tables, where the later are the tabulated observations, which typically consist of few attributes, i.e., independent variables and a decision, i.e., a dependent variable. Each tuple in the table is considered as a case, and there could be n number of cases for a table specifying each observation. The efficiency of the rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., ruleset. There are different rule induction algorithms, such as LEM1, LEM2, MLEM2. In the real world, datasets will be imperfect, inconsistent, and incomplete. MLEM2 is an efficient algorithm to deal with such sorts of data, but the quality of rule induction largely depends on the chosen classification strategy. We tried to compare the 16 classification strategies of rule induction using MLEM2 on incomplete data. For this, we implemented MLEM2 for inducing rulesets based on the selection of the type of approximation, i.e., singleton, subset or concept, and the value of alpha for calculating probabilistic approximations. A program called rule checker is used to calculate the error rate based on the classification strategy specified. To reduce the anomalies, we used ten-fold cross-validation to measure the error rate for each classification. Error rates for the above strategies are being calculated for different datasets, compared, and presented.
Description
Date
2020-05-31
Journal Title
Journal ISSN
Volume Title
Publisher
University of Kansas
Research Projects
Organizational Units
Journal Issue
Keywords
Computer science
Citation
DOI
Embedded videos