Text Mining for Protein-Protein Docking

Badal, Varsha Dave

dc.contributor.advisor	Vakser, Ilya A.
dc.contributor.advisor	Kundrotas, Petras J.
dc.contributor.author	Badal, Varsha Dave
dc.date.accessioned	2019-05-18T19:05:48Z
dc.date.available	2019-05-18T19:05:48Z
dc.date.issued	2018-08-31
dc.date.submitted	2018
dc.identifier.other	http://dissertations.umi.com/ku:16105
dc.identifier.uri	http://hdl.handle.net/1808/27985
dc.description.abstract	Scientific publications are a rich but underutilized source of structural and functional information on proteins and protein interactions. Although scientific literature is intended for a human audience, text mining makes it amenable to algorithmic processing. Text mining can focus on extracting information relevant to protein binding modes, providing specific residues that are likely to be at the binding site for a given pair of proteins. The knowledge of such residues is a powerful guide for the structural modeling of protein-protein complexes. This work combines and extends two well-established areas of research: the non-structural identification of protein-protein interactors, and the structure-based detection of functional (small-ligand) sites on proteins. Text-mining-based constraints for protein-protein docking are a unique research direction, which had not been explored prior to this study. Although text mining by itself is unlikely to produce docked models, it is useful in scoring the docking predictions. Our results show that, despite the presence of false positives, text mining significantly improves the docking quality. To purge false positives in the mined residues, this work explores, along with basic text mining, enhanced text-mining techniques that use various language processing tools, from simple dictionaries to WordNet (a generic word ontology), parse trees, word vectors, and deep recursive neural networks. The results significantly increase confidence in the generated docking constraints and provide guidelines for the future development of this modeling approach. With the rapid growth of the body of publicly available biomedical literature and newly evolving text-mining methodologies, the approach will become more powerful and adequate to the needs of the biomedical community.
dc.format.extent	151 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	Copyright held by the author.
dc.subject	Bioinformatics
dc.subject	Deep learning
dc.subject	Natural language processing
dc.subject	Protein docking
dc.subject	Protein-protein interaction
dc.subject	Structural bioinformatics
dc.subject	Text mining
dc.title	Text Mining for Protein-Protein Docking
dc.type	Dissertation
dc.contributor.cmtemember	Deeds, Eric J.
dc.contributor.cmtemember	Ray, Christian J.
dc.contributor.cmtemember	Slusky, Joanna S.G.
dc.contributor.cmtemember	Miao, Yinglong
dc.contributor.cmtemember	Kuczera, Krzysztof
dc.thesis.degreeDiscipline	Molecular Biosciences
dc.thesis.degreeLevel	Ph.D.
dc.identifier.orcid
dc.rights.accessrights	openAccess

Files in this item

Name: Badal_ku_0099D_16105_DATA_1.pdf
Size: 4.113Mb
Format: PDF

