dc.contributor.advisor: Vakser, Ilya A.
dc.contributor.advisor: Kundrotas, Petras J.
dc.contributor.author: Badal, Varsha Dave
dc.date.accessioned: 2019-05-18T19:05:48Z
dc.date.available: 2019-05-18T19:05:48Z
dc.date.issued: 2018-08-31
dc.date.submitted: 2018
dc.identifier.other: http://dissertations.umi.com/ku:16105
dc.identifier.uri: http://hdl.handle.net/1808/27985
dc.description.abstract: Scientific publications are a rich but underutilized source of structural and functional information on proteins and protein interactions. Although scientific literature is intended for a human audience, text mining makes it amenable to algorithmic processing. It can focus on extracting information relevant to protein binding modes, providing specific residues that are likely to be at the binding site for a given pair of proteins. The knowledge of such residues is a powerful guide for the structural modeling of protein-protein complexes. This work combines and extends two well-established areas of research: the non-structural identification of protein-protein interactors, and structure-based detection of functional (small-ligand) sites on proteins. Using text-mining-based constraints for protein-protein docking is a unique research direction that had not been explored prior to this study. Although text mining by itself is unlikely to produce docked models, it is useful in scoring the docking predictions. Our results show that, despite the presence of false positives, text mining significantly improves the docking quality. To purge false positives in the mined residues, this work explores, along with basic text mining, enhanced text-mining techniques that use various language processing tools, from simple dictionaries to WordNet (a generic word ontology), parse trees, word vectors and deep recursive neural networks. The results significantly increase confidence in the generated docking constraints and provide guidelines for the future development of this modeling approach. With the rapid growth of the body of publicly available biomedical literature and newly evolving text-mining methodologies, the approach will become more powerful and better suited to the needs of the biomedical community.
dc.format.extent: 151 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: Copyright held by the author.
dc.subject: Bioinformatics
dc.subject: Deep learning
dc.subject: Natural language processing
dc.subject: Protein docking
dc.subject: Protein-protein interaction
dc.subject: Structural bioinformatics
dc.subject: Text mining
dc.title: Text Mining for Protein-Protein Docking
dc.type: Dissertation
dc.contributor.cmtemember: Deeds, Eric J.
dc.contributor.cmtemember: Ray, Christian J.
dc.contributor.cmtemember: Slusky, Joanna S.G.
dc.contributor.cmtemember: Miao, Yinglong
dc.contributor.cmtemember: Kuczera, Krzysztof
dc.thesis.degreeDiscipline: Molecular Biosciences
dc.thesis.degreeLevel: Ph.D.
dc.identifier.orcid:
dc.rights.accessrights: openAccess

