dc.contributor.advisor: Chen, Xue-wen
dc.contributor.author: Jeong, Jong Cheol
dc.date.accessioned: 2012-03-01T20:16:35Z
dc.date.available: 2012-03-01T20:16:35Z
dc.date.issued: 2011-12-31
dc.date.submitted: 2011
dc.identifier.other: http://dissertations.umi.com/ku:11862
dc.identifier.uri: http://hdl.handle.net/1808/8783
dc.description.abstract: Genome-sequencing projects using advanced technologies have rapidly increased the number of known protein sequences, and demand for identifying protein interaction sites has grown significantly because of its impact on understanding cellular processes, biochemical events, and drug design. However, current wet-laboratory techniques cannot keep pace with the exponentially growing volume of protein sequence data; sequence-based predictive methods for identifying protein interaction sites have therefore drawn increasing interest. This work proposes a new predictive model that can serve as a first step in guiding experimental investigation of protein-protein interactions and in localizing the specific interface residues. The proposed method extracts a wide range of features from protein sequences. A random forests framework is redesigned to use these features effectively and to address the imbalanced data classification problem commonly encountered in binding site prediction. The method is evaluated on 2,829 interface residues and 24,616 non-interface residues extracted from 99 polypeptide chains in the Protein Data Bank. The experimental results show that the proposed method performs significantly better than two other conventional predictive methods and can reliably predict residues involved in protein interaction sites. As blind tests, the proposed method predicts interaction sites and constructs three protein complexes: the DnaK molecular chaperone system, 1YUW, and 1DKG, which provide new insight into the sequence-function relationship. Finally, the robustness of the proposed method is assessed by evaluating the performance of four different ensemble methods.
dc.format.extent: 114 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject: Bioinformatics
dc.subject: Computer science
dc.subject: Biomedical engineering
dc.subject: Interface residues
dc.subject: Machine learning
dc.subject: Properties of amino acids
dc.subject: Protein binding
dc.subject: Protein-protein interactions
dc.subject: Protein sequence analysis
dc.title: A NEW METHODOLOGY FOR IDENTIFYING INTERFACE RESIDUES INVOLVED IN BINDING PROTEIN COMPLEXES
dc.type: Thesis
dc.contributor.cmtemember: Huan, Luke
dc.contributor.cmtemember: Luo, Bo
dc.thesis.degreeDiscipline: Electrical Engineering & Computer Science
dc.thesis.degreeLevel: M.S.
kusw.oastatus: na
dc.identifier.orcid: https://orcid.org/0000-0002-5024-2927
kusw.oapolicy: This item does not meet KU Open Access policy criteria.
kusw.bibid: 7643363
dc.rights.accessrights: openAccess
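
The abstract reports 2,829 interface residues against 24,616 non-interface residues, which is the class imbalance it says the redesigned random forests framework has to handle. The record does not describe how the thesis handles it, so the sketch below is only an illustration of one common approach, not the author's method: drawing a class-balanced bootstrap sample for each tree. The function names `imbalance_ratio` and `balanced_bootstrap` are hypothetical, and the label encoding (1 for interface, 0 for non-interface) is an assumption.

```python
import random

# Class counts as reported in the abstract.
N_INTERFACE = 2829
N_NON_INTERFACE = 24616


def imbalance_ratio(n_minority, n_majority):
    """Return how many majority-class examples exist per minority-class example."""
    return n_majority / n_minority


def balanced_bootstrap(labels, rng):
    """Draw a bootstrap sample of indices with equal counts from each class.

    Hypothetical helper: each class is sampled with replacement, and the
    sample size per class equals the size of the smaller class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    sample = [rng.choice(pos) for _ in range(n)]
    sample += [rng.choice(neg) for _ in range(n)]
    rng.shuffle(sample)
    return sample


# Assumed encoding: 1 = interface residue, 0 = non-interface residue.
labels = [1] * N_INTERFACE + [0] * N_NON_INTERFACE
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = balanced_bootstrap(labels, rng)

print(f"non-interface residues per interface residue: "
      f"{imbalance_ratio(N_INTERFACE, N_NON_INTERFACE):.2f}")
print(f"balanced bootstrap size: {len(sample)}")
```

In a full pipeline, each tree of the ensemble would be trained on one such sample so that no tree is dominated by non-interface residues; whether the thesis uses this or another strategy is not stated in this record.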

