
dc.contributor.advisor: Karanicolas, John
dc.contributor.advisor: Vakser, Ilya
dc.contributor.author: Malhotra, Shipra
dc.date.accessioned: 2020-01-17T22:43:20Z
dc.date.available: 2020-01-17T22:43:20Z
dc.date.issued: 2019-05-31
dc.date.submitted: 2019
dc.identifier.other: http://dissertations.umi.com/ku:16583
dc.identifier.uri: http://hdl.handle.net/1808/29888
dc.description.abstract: A critical step in the target identification phase of drug discovery is evaluating druggability, i.e., whether a protein can be targeted with high affinity using drug-like ligands. The overarching goal of my PhD thesis is to build a machine learning model that predicts the binding affinity that can be attained when addressing a given protein surface. I begin by examining the lead optimization phase of drug development, where I find that in a test set of 297 examples, 41 (14%) change binding mode when the ligand is elaborated. My analysis shows that while certain ligand physicochemical properties predispose changes in binding mode, particularly those properties that define fragments, simple structure-based modeling proves far more effective for identifying substitutions that alter the binding mode. My proposed measure, RMAC (RMSD after minimization of the aligned complex), can help determine whether a given ligand can be reliably elaborated without changing binding mode, thus enabling straightforward interpretation of the resulting structure-activity relationships. Next, I note that a very popular machine learning algorithm for regression tasks, random forest, has a systematic bias in the predictions it generates; this bias is present in both real-world datasets and synthetic datasets. To address this, I define a numerical transformation that can be applied to the output of random forest models. This transformation fully removes the bias in the resulting predictions and yields improved predictions across all datasets. Finally, taking advantage of this improved machine learning approach, I describe a model that predicts the “attainable binding affinity” for a given binding pocket on a protein surface. This model uses 13 physicochemical and structural features calculated from the protein structure, without any information about the ligand. While details of the ligand must (of course) contribute somewhat to the binding affinity, I find that this model still recapitulates the binding affinity for 848 different protein-ligand complexes (across 230 different proteins) with correlation coefficient 0.57. I further find that this model is not limited to “traditional” drug targets, but rather that it works just as well for emerging “non-traditional” drug targets such as inhibitors of protein-protein interactions. Collectively, I anticipate that the tools and insights generated in the course of my PhD research will play an important role in facilitating the key target selection phase of drug discovery projects.
dc.format.extent: 143 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: Copyright held by the author.
dc.subject: Molecular biology
dc.subject: Computational Biology
dc.subject: Molecule Design
dc.title: Predicting the Most Tractable Protein Surfaces in the Human Proteome for Developing New Therapeutics
dc.type: Dissertation
dc.contributor.cmtemember: Ray, Christian
dc.contributor.cmtemember: Slusky, Joanna
dc.contributor.cmtemember: Miao, Yinglong
dc.contributor.cmtemember: De Guzman, Roberto
dc.contributor.cmtemember: Rafferty, Michael
dc.thesis.degreeDiscipline: Molecular Biosciences
dc.thesis.degreeLevel: Ph.D.
dc.identifier.orcid: https://orcid.org/0000-0002-2234-191X
dc.rights.accessrights: openAccess