A NEW METHODOLOGY FOR IDENTIFYING INTERFACE RESIDUES INVOLVED IN BINDING PROTEIN COMPLEXES

Jeong, Jong Cheol

dc.contributor.advisor	Chen, Xue-wen
dc.contributor.author	Jeong, Jong Cheol
dc.date.accessioned	2012-03-01T20:16:35Z
dc.date.available	2012-03-01T20:16:35Z
dc.date.issued	2011-12-31
dc.date.submitted	2011
dc.identifier.other	http://dissertations.umi.com/ku:11862
dc.identifier.uri	http://hdl.handle.net/1808/8783
dc.description.abstract	Genome-sequencing projects using advanced technologies have rapidly increased the number of known protein sequences, and demand for identifying protein interaction sites has grown significantly because of their importance for understanding cellular processes and biochemical events and for drug design. However, current wet-laboratory techniques cannot keep pace with the exponentially growing volume of protein sequence data; sequence-based methods for predicting protein interaction sites have therefore drawn increasing interest. This work proposes a new predictive model that can serve as a first step in guiding experimental investigation of protein-protein interactions and in localizing the specific interface residues. The proposed method extracts a wide range of features from protein sequences. The random forests framework is redesigned to use these features effectively and to address the imbalanced-data classification problem commonly encountered in binding site prediction. The method is evaluated on 2,829 interface residues and 24,616 non-interface residues extracted from 99 polypeptide chains in the Protein Data Bank. The experimental results show that the proposed method performs significantly better than two other conventional predictive methods and can reliably predict residues involved in protein interaction sites. In blind tests, the proposed method predicts interaction sites and constructs three protein complexes: the DnaK molecular chaperone system, 1YUW, and 1DKG, which provide new insight into the sequence-function relationship. Finally, the robustness of the proposed method is assessed by evaluating the performance of four different ensemble methods.
dc.format.extent	114 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject	Bioinformatics
dc.subject	Computer science
dc.subject	Biomedical engineering
dc.subject	Interface residues
dc.subject	Machine learning
dc.subject	Properties of amino acids
dc.subject	Protein binding
dc.subject	Protein-protein interactions
dc.subject	Protein sequence analysis
dc.title	A NEW METHODOLOGY FOR IDENTIFYING INTERFACE RESIDUES INVOLVED IN BINDING PROTEIN COMPLEXES
dc.type	Thesis
dc.contributor.cmtemember	Huan, Luke
dc.contributor.cmtemember	Luo, Bo
dc.thesis.degreeDiscipline	Electrical Engineering & Computer Science
dc.thesis.degreeLevel	M.S.
kusw.oastatus	na
dc.identifier.orcid	https://orcid.org/0000-0002-5024-2927
kusw.oapolicy	This item does not meet KU Open Access policy criteria.
kusw.bibid	7643363
dc.rights.accessrights	openAccess

Files in this item

Name: JEONG_ku_0099M_11862_DATA_1.pdf
Size: 13.59Mb
Format: PDF


