
dc.contributor.advisor: Huan, Jun
dc.contributor.author: Fei, Hongliang
dc.date.accessioned: 2012-11-26T22:35:58Z
dc.date.available: 2012-11-26T22:35:58Z
dc.date.issued: 2012-08-31
dc.date.submitted: 2012
dc.identifier.other: http://dissertations.umi.com/ku:12438
dc.identifier.uri: http://hdl.handle.net/1808/10466
dc.description.abstract: Structured data is accumulating rapidly in many applications, e.g. bioinformatics, cheminformatics, social network analysis, natural language processing and text mining. Designing and analyzing algorithms for handling these large collections of structured data, in both the input and the output domain, has received significant interest in the data mining and machine learning communities. However, it is nontrivial to adapt traditional machine learning algorithms, e.g. SVM and linear regression, to structured data. For one thing, the structural information in the input and output domains is ignored when the standard algorithms are applied to structured data. For another, the major challenge in learning from high-dimensional structured data is that the input/output domain can contain tens of thousands of features and labels, or even more. With a high-dimensional structured input space and/or structured output space, learning a low-dimensional and consistent structured predictive function is important for both the robustness and the interpretability of the model. In this dissertation, I present several machine learning models that learn from data with structured input features and structured output tasks. For learning from data with structured input features, I have developed structured sparse boosting for graph classification and structured joint sparse PCA for anomaly detection and localization. Besides learning from structured input, I also investigated the interplay between structured input and output in the context of multi-task learning. In particular, I designed a multi-task learning algorithm that performs structured feature selection and task relationship inference. I demonstrate the applications of these structured models to subgraph-based graph classification, networked data stream anomaly detection and localization, multiple cancer type prediction, neuron activity prediction and social behavior prediction.
Finally, drawing on my internship work at IBM T.J. Watson Research, I demonstrate how to leverage structural information from mobile data (e.g. call detail records and GPS data) to derive important places in people's daily lives for transit optimization and urban planning.
dc.format.extent: 185 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject: Computer science
dc.subject: Information science
dc.subject: Anomaly detection
dc.subject: Classification
dc.subject: Data mining
dc.subject: Machine learning
dc.subject: Structural sparsity
dc.subject: Structured data
dc.title: Learning from Structured Data with High Dimensional Structured Input and Output Domain
dc.type: Dissertation
dc.contributor.cmtemember: Luo, Bo
dc.contributor.cmtemember: Potetz, Brian
dc.contributor.cmtemember: Agah, Arvin
dc.contributor.cmtemember: Xu, Hongguo
dc.thesis.degreeDiscipline: Electrical Engineering & Computer Science
dc.thesis.degreeLevel: Ph.D.
kusw.oastatus: na
kusw.oapolicy: This item does not meet KU Open Access policy criteria.
kusw.bibid: 8085805
dc.rights.accessrights: openAccess

