dc.contributor.advisor: Grzymala-Busse, Jerzy W.
dc.contributor.author: Rao, H. Shanker
dc.date.accessioned: 2015-09-07T20:58:34Z
dc.date.available: 2015-09-07T20:58:34Z
dc.date.issued: 2014-12-31
dc.date.submitted: 2014
dc.identifier.other: http://dissertations.umi.com/ku:13689
dc.identifier.uri: http://hdl.handle.net/1808/18382
dc.description.abstract: Rapid development of high-throughput technologies and database management systems has made it possible to produce and store large amounts of data. However, making sense of big data and discovering knowledge from it is an increasingly difficult challenge. Generally, data mining techniques search for information in data sets and express the gained knowledge in the form of trends, regularities, patterns, or rules. Rules are frequently identified automatically by rule induction, one of the most important techniques in data mining and machine learning, which was developed primarily to handle symbolic data. However, real-life data often contain numerical attributes. Therefore, to fully utilize the power of rule induction techniques, data mining employs an essential preprocessing step called discretization, which converts numerical data into symbolic data. Here we present two entropy-based discretization techniques, known as the dominant attribute approach and the multiple scanning approach. These approaches were implemented as two explicit algorithms in the Java programming language, and experiments were conducted by applying each algorithm separately to seventeen well-known numerical data sets. The resulting discretized data sets were used for rule induction by the LEM2 (Learning from Examples Module 2) algorithm. For each data set in the multiple scanning approach, experiments were repeated with an increasing number of scans until the interval counts stabilized. Preliminary results from this study indicated that the multiple scanning approach performed better than the dominant attribute approach in terms of producing comparatively smaller and simpler rule sets.
dc.format.extent: 81 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: Copyright held by the author.
dc.subject: Computer science
dc.title: DOMINANT ATTRIBUTE AND MULTIPLE SCANNING APPROACHES FOR DISCRETIZATION OF NUMERICAL ATTRIBUTES
dc.type: Thesis
dc.contributor.cmtemember: Caragea, Doina
dc.contributor.cmtemember: Alexander, Perry
dc.thesis.degreeDiscipline: Electrical Engineering & Computer Science
dc.thesis.degreeLevel: M.S.
dc.rights.accessrights: openAccess

