Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views

Quanz, Brian

dc.contributor.advisor	Huan, Jun
dc.contributor.author	Quanz, Brian
dc.date.accessioned	2012-10-27T10:23:13Z
dc.date.available	2012-10-27T10:23:13Z
dc.date.issued	2012-08-31
dc.date.submitted	2012
dc.identifier.other	http://dissertations.umi.com/ku:12347
dc.identifier.uri	http://hdl.handle.net/1808/10207
dc.description.abstract	The focus of this thesis is on learning approaches for what we call "low-quality data" and, in particular, data for which only a small amount of labeled target data is available. The first part provides background discussion on low-quality data issues, followed by a preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning. Multi-view learning generally refers to the case of learning with data that has multiple natural views, or sets of features, associated with it. Multi-view semi-supervised learning methods try to exploit the combination of multiple views along with large amounts of unlabeled data in order to learn better predictive functions when limited labeled data is available. However, lack of complete view data limits the applicability of multi-view semi-supervised learning to real-world data. Commonly, one data view is readily and cheaply available, but additional views may be costly or only available in some cases. This thesis work aims to make multi-view semi-supervised learning approaches more applicable to real-world data, specifically by addressing the issue of missing views through both feature generation and active learning, and by addressing the issue of model selection for semi-supervised learning with limited labeled data. This thesis introduces a unified approach for handling missing view data in multi-view semi-supervised learning tasks, which applies both to data with completely missing additional views and to data missing views only in some instances. The idea is to learn a feature generation function mapping one view to another, with the mapping biased to encourage the generated features to be useful for multi-view semi-supervised learning algorithms. The mapping is then used to fill in views as pre-processing.
Unlike previously proposed single-view multi-view learning approaches, the proposed approach is able to take advantage of additional view data when available, and for the case of partial view presence it is the first feature-generation approach specifically designed to take into account the multi-view semi-supervised learning aspect. The next component of this thesis is the analysis of an active view completion scenario. In some tasks, it is possible to obtain missing view data for a particular instance, but with some associated cost. Recent work has shown that an active selection strategy can be more effective than a random one. In this thesis, a better understanding of active approaches is sought, and it is demonstrated that the effectiveness of an active selection strategy over a random one can depend on the relationship between the views. Finally, an important component of making multi-view semi-supervised learning applicable to real-world data is the task of model selection, an open problem that is often avoided entirely in previous work. For cases of very limited labeled training data, the commonly used cross-validation approach can become ineffective. This thesis introduces a re-training alternative to method-dependent approaches, similar in motivation to cross-validation, that involves generating new training and test data by sampling from the large amount of unlabeled data and the estimated conditional probabilities for the labels. The proposed approaches are evaluated on a variety of multi-view semi-supervised learning data sets, and the experimental results demonstrate their efficacy.
dc.format.extent	220 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject	Computer science
dc.subject	Data mining
dc.subject	Low-quality data
dc.subject	Machine learning
dc.subject	Missing data
dc.subject	Multi-view learning
dc.subject	Semi-supervised learning
dc.title	Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views
dc.type	Dissertation
dc.contributor.cmtemember	Chen, Xue-wen
dc.contributor.cmtemember	Frost, Victor
dc.contributor.cmtemember	Luo, Bo
dc.contributor.cmtemember	Potetz, Brian
dc.contributor.cmtemember	Talata, Zsolt
dc.thesis.degreeDiscipline	Electrical Engineering & Computer Science
dc.thesis.degreeLevel	Ph.D.
kusw.oastatus	na
kusw.oapolicy	This item does not meet KU Open Access policy criteria.
kusw.bibid	8085773
dc.rights.accessrights	openAccess

Files in this item

Name: Quanz_ku_0099D_12347_DATA_1.pdf
Size: 1.864Mb
Format: PDF
