A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm

Nelakurthi, Venkata Siva Pavan Kumar Kumar

dc.contributor.advisor	Busse, Jerzy Grzymala
dc.contributor.author	Nelakurthi, Venkata Siva Pavan Kumar Kumar
dc.date.accessioned	2020-08-14T20:24:38Z
dc.date.available	2020-08-14T20:24:38Z
dc.date.issued	2020-05-31
dc.date.submitted	2020
dc.identifier.other	http://dissertations.umi.com/ku:17172
dc.identifier.uri	http://hdl.handle.net/1808/30607
dc.description.abstract	In data mining, rule induction is the process of extracting formal rules from decision tables. A decision table is a tabulated set of observations, typically consisting of a few attributes (independent variables) and a decision (a dependent variable). Each tuple in the table is a case, and a table may contain any number of cases, one per observation. The efficiency of rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., the ruleset. There are several rule induction algorithms, such as LEM1, LEM2, and MLEM2. Real-world datasets are often imperfect, inconsistent, and incomplete. MLEM2 is an efficient algorithm for such data, but the quality of rule induction largely depends on the chosen classification strategy. We compared 16 classification strategies of rule induction using MLEM2 on incomplete data. For this, we implemented MLEM2 to induce rulesets based on the selected type of approximation (singleton, subset, or concept) and the value of alpha used for calculating probabilistic approximations. A program called rule checker calculates the error rate for the specified classification strategy. To reduce anomalies, we used ten-fold cross-validation to measure the error rate for each classification strategy. Error rates for these strategies were calculated for different datasets, compared, and presented.
dc.format.extent	45 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	Copyright held by the author.
dc.subject	Computer science
dc.title	A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm
dc.type	Thesis
dc.contributor.cmtemember	Wang, Guanghui
dc.contributor.cmtemember	Kulkarni, Prasad
dc.thesis.degreeDiscipline	Electrical Engineering & Computer Science
dc.thesis.degreeLevel	M.S.
dc.identifier.orcid	https://orcid.org/0000-0002-5829-6946
dc.rights.accessrights	openAccess
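
The abstract refers to singleton, subset, and concept probabilistic approximations controlled by a threshold alpha. The sketch below illustrates the commonly used rough-set definitions of these three approximations; it is not taken from the thesis. It assumes the characteristic set K(x) of every case x has already been computed and is passed in as a dictionary. The names `char_sets`, `conditional_probability`, and `probabilistic_approximation` are hypothetical.

```python
from fractions import Fraction


def conditional_probability(concept, k_set):
    """Pr(X | K(x)) = |X intersect K(x)| / |K(x)|, for a non-empty K(x)."""
    return Fraction(len(concept & k_set), len(k_set))


def probabilistic_approximation(kind, concept, char_sets, alpha):
    """Return the singleton, subset, or concept probabilistic approximation.

    char_sets maps each case x to its characteristic set K(x) (assumed given).
    concept is the set of cases X sharing one decision value.
    """
    if kind == "singleton":
        # Cases x whose own K(x) meets the threshold.
        return {
            x
            for x, k_set in char_sets.items()
            if conditional_probability(concept, k_set) >= alpha
        }
    if kind == "subset":
        candidates = char_sets.keys()  # every case in the universe
    elif kind == "concept":
        candidates = concept  # only cases that belong to the concept
    else:
        raise ValueError(f"unknown approximation type: {kind}")

    # Union of the characteristic sets that meet the threshold.
    result = set()
    for x in candidates:
        if conditional_probability(concept, char_sets[x]) >= alpha:
            result |= char_sets[x]
    return result


# Hypothetical toy data: four cases and one concept.
char_sets = {1: {1, 2}, 2: {2}, 3: {1, 3, 4}, 4: {3, 4}}
concept = {1, 2}
alpha = Fraction(3, 10)
singleton = probabilistic_approximation("singleton", concept, char_sets, alpha)
subset = probabilistic_approximation("subset", concept, char_sets, alpha)
concept_approx = probabilistic_approximation("concept", concept, char_sets, alpha)
```

On this toy data the three approximations differ: case 3 passes the threshold with probability 1/3, so it enters the singleton approximation and contributes its whole characteristic set to the subset approximation, while the concept approximation ignores it because case 3 is not in the concept.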

Files in this item

Name: Nelakurthi_ku_0099M_17172_DATA ...
Size: 320.0Kb
Format: PDF

