dc.contributor.advisor | Vakser, Ilya A. | |
dc.contributor.advisor | Kundrotas, Petras J. | |
dc.contributor.author | Badal, Varsha Dave | |
dc.date.accessioned | 2019-05-18T19:05:48Z | |
dc.date.available | 2019-05-18T19:05:48Z | |
dc.date.issued | 2018-08-31 | |
dc.date.submitted | 2018 | |
dc.identifier.other | http://dissertations.umi.com/ku:16105 | |
dc.identifier.uri | http://hdl.handle.net/1808/27985 | |
dc.description.abstract | Scientific publications are a rich but underutilized source of structural and functional information on proteins and protein interactions. Although scientific literature is intended for a human audience, text mining makes it amenable to algorithmic processing. It can focus on extracting information relevant to protein binding modes, providing specific residues that are likely to be at the binding site for a given pair of proteins. The knowledge of such residues is a powerful guide for the structural modeling of protein-protein complexes. This work combines and extends two well-established areas of research: the non-structural identification of protein-protein interactors, and structure-based detection of functional (small-ligand) sites on proteins. Text-mining-based constraints for protein-protein docking are a unique research direction, which has not been explored prior to this study. Although text mining by itself is unlikely to produce docked models, it is useful in scoring the docking predictions. Our results show that, despite the presence of false positives, text mining significantly improves the docking quality. To purge false positives in the mined residues, along with basic text mining, this work explores enhanced text-mining techniques using various language processing tools, from simple dictionaries to WordNet (a generic word ontology), parse trees, word vectors, and deep recursive neural networks. The results significantly increase confidence in the generated docking constraints and provide guidelines for the future development of this modeling approach. With the rapid growth of the body of publicly available biomedical literature and new, evolving text-mining methodologies, the approach will become more powerful and adequate to the needs of the biomedical community. | |
dc.format.extent | 151 pages | |
dc.language.iso | en | |
dc.publisher | University of Kansas | |
dc.rights | Copyright held by the author. | |
dc.subject | Bioinformatics | |
dc.subject | Deep learning | |
dc.subject | Natural language processing | |
dc.subject | Protein docking | |
dc.subject | Protein-protein interaction | |
dc.subject | Structural bioinformatics | |
dc.subject | Text mining | |
dc.title | Text Mining for Protein-Protein Docking | |
dc.type | Dissertation | |
dc.contributor.cmtemember | Deeds, Eric J. | |
dc.contributor.cmtemember | Ray, Christian J. | |
dc.contributor.cmtemember | Slusky, Joanna S.G. | |
dc.contributor.cmtemember | Miao, Yinglong | |
dc.contributor.cmtemember | Kuczera, Krzysztof | |
dc.thesis.degreeDiscipline | Molecular Biosciences | |
dc.thesis.degreeLevel | Ph.D. | |
dc.identifier.orcid | | |
dc.rights.accessrights | openAccess | |