dc.contributor.advisor: Chen, Xue-wen
dc.contributor.advisor: Luo, Bo
dc.contributor.author: Jeong, Jong Cheol
dc.date.accessioned: 2013-07-14T15:47:07Z
dc.date.available: 2013-07-14T15:47:07Z
dc.date.issued: 2013-05-31
dc.date.submitted: 2013
dc.identifier.other: http://dissertations.umi.com/ku:12808
dc.identifier.uri: http://hdl.handle.net/1808/11467
dc.description.abstract: With 1.2 billion Facebook users, 17 million Wikipedia articles, and 190 million tweets per day, the demand for information processing over the Internet has increased significantly in recent years. Similarly, the life sciences and bioinformatics face the challenge of processing big data, owing to the explosion of publicly available genomic information resulting from the Human Genome Project (HGP) and the increasing use of high-throughput technology. The HGP was completed in 2003; it identified 20,000-25,000 genes in human DNA and determined the sequence of three billion human base pairs. This information requires a huge amount of storage and is difficult to process with on-hand database management tools or traditional data processing applications. This thesis introduces a new method, the Biological and Statistical Mean (BSM) score, for calculating functional similarity between gene products (GPs); it can help extract biologically relevant and statistically robust information from large-scale biomedical, genomic, and proteomic data sources. The BSM score is defined by 16 scoring matrices derived from the principles of multi-view learning in machine learning and from five databases: Gene Ontology, UniProt, SCOP, CATH, and KUPS. The proposed method also shows how diverse databases and principles from machine learning theory can be integrated into a simple scoring function, and how this simple concept can have a significant impact on studies in the biomedical and human life sciences. Comprehensive evaluations and performance comparisons with conventional methods show that the BSM score clearly outperforms them in sensitivity when clustering similar functional groups and in coverage when identifying related genes.
As potential applications that handle large amounts of diverse data in the medical domain, this thesis introduces similarity-based drug target identification and disease networks built on BSM scores. An application of the BSM score is freely available at http://www.ittc.ku.edu/chenlab/goal/
dc.format.extent: 273 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject: Bioinformatics
dc.subject: Computer science
dc.subject: Information science
dc.subject: Disease relation
dc.subject: Drug target identification
dc.subject: Functional similarity
dc.subject: Gene ontology
dc.subject: Machine learning
dc.subject: Semantic distance
dc.title: New Methodology for Measuring Semantic Functional Similarity Based on Bidirectional Integration
dc.type: Dissertation
dc.contributor.cmtemember: Luo, Bo
dc.contributor.cmtemember: Chen, Xue-wen
dc.contributor.cmtemember: Agah, Arvin
dc.contributor.cmtemember: Grzymala-Busse, Jerzy
dc.contributor.cmtemember: Huan, Jun
dc.contributor.cmtemember: Im, Wonpil
dc.thesis.degreeDiscipline: Bioengineering
dc.thesis.degreeLevel: Ph.D.
kusw.oastatus: na
dc.identifier.orcid: https://orcid.org/0000-0002-5024-2927
kusw.oapolicy: This item does not meet KU Open Access policy criteria.
kusw.bibid: 8086022
dc.rights.accessrights: openAccess