dc.contributor.advisor: Swint-Kruse, Liskin
dc.contributor.author: Parente, Daniel Joseph
dc.date.accessioned: 2017-05-07T20:31:12Z
dc.date.available: 2017-05-07T20:31:12Z
dc.date.issued: 2014-08-31
dc.date.submitted: 2014
dc.identifier.other: http://dissertations.umi.com/ku:13473
dc.identifier.uri: http://hdl.handle.net/1808/23958
dc.description.abstract: Revolutionary advances in sequencing technology have dramatically expanded the set of known, naturally occurring protein sequences. Protein sequences arise from an evolutionary process, and during evolution proteins experience pressure to maintain and diversify their functions via mutation. Some mutations arise merely from neutral drift, but other changes enable organisms to adapt to their unique niches. Positions that are important for structure or function are expected to be mutationally constrained during evolution. Accordingly, many algorithms have been devised to identify mutational constraints in the evolutionary record in order to predict the locations of functionally important sites. Accurate prediction of functionally important positions would have important practical implications. For example, each individual human carries about 10,000 exomic sequence polymorphisms; which of these are functionally and/or clinically significant? Similarly, protein engineers may target such sites for mutagenesis to derive variant functions. To detect these constraints, homologous proteins must first be sorted into protein families based on sequence similarity, which typically indicates structural and functional similarity. Protein family multiple sequence alignments (MSAs) can then be computationally analyzed to understand the family in light of its evolutionary history. MSA analyses have detected various evolutionary patterns that are thought to signal functional significance. For example, positions that are absolutely conserved across a family are commonly inferred to play important structural or functional roles and, consequently, to be intolerant to mutation. Other analyses attempt to identify important non-conserved positions, some of which must be functionally significant for the family to evolve functional variations. One important example is “co-evolutionary” analysis, which seeks pairs of positions that vary in a coordinated manner across evolution.
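The two MSA patterns described above (absolute conservation and coordinated variation between pairs of positions) can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the method used in the dissertation: it assumes a toy, gap-free alignment of equal-length strings, scores conservation with per-column Shannon entropy, and uses plain mutual information as a stand-in for the various co-evolution algorithms the study compares. All function names and the toy alignment are hypothetical; real tools also add sequence weighting, pseudocounts, and gap handling.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return alignment column i as a list of residues (one per sequence)."""
    return [seq[i] for seq in msa]


def shannon_entropy(col):
    """Shannon entropy (bits) of the residue distribution in one column.

    0.0 means the position is absolutely conserved; higher values mean
    the position tolerates more variation.
    """
    n = len(col)
    return sum((c / n) * math.log2(n / c) for c in Counter(col).values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two columns.

    It is high when the residue at one position predicts the residue at
    the other, i.e. when the two positions vary in a coordinated manner.
    """
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    mi = 0.0
    for (a, b), count in Counter(zip(col_a, col_b)).items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def coevolution_scores(msa):
    """Score every pair of columns (i < j) with mutual information."""
    length = len(msa[0])
    return {
        (i, j): mutual_information(column(msa, i), column(msa, j))
        for i, j in combinations(range(length), 2)
    }


# Hypothetical toy alignment: column 0 is invariant, columns 1 and 3
# co-vary (L pairs with V, I pairs with I), and column 2 varies independently.
toy_msa = ["ALKV", "ALRV", "AIKI", "AIRI"]

print([shannon_entropy(column(toy_msa, i)) for i in range(4)])
# -> [0.0, 1.0, 1.0, 1.0]
scores = coevolution_scores(toy_msa)
print(max(scores, key=scores.get))  # -> (1, 3)
```

Note that columns 1, 2, and 3 are equally variable by entropy; only the pairwise score separates the co-varying pair (1, 3) from the independently varying column 2, which is why non-conserved positions need a different kind of analysis than conserved ones.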
MSA analyses make a number of simplifying assumptions to abstract away the full complexity of real proteins. Here, we have (1) assessed the validity of some of these assumptions, (2) investigated strategies to maximize the usefulness of existing tools for identifying functionally important positions in light of their limitations, and (3) evaluated the ability of existing tools to identify positions known to be significant. To that end, we have applied MSA analyses to the LacI/GalR family of bacterial transcription regulators as our primary model system. Our studies have proceeded in three phases. First, preceding work indicated that published predictions based on a small LacI/GalR MSA fail to identify several functionally significant positions in the 18-amino-acid linker of LacI/GalR proteins. We have investigated whether making better use of these tools, by expanding the set of sequences in the LacI/GalR MSA and sorting the family based on external experimental knowledge, can improve predictive accuracy. Interestingly, comparison of existing predictions to all available experimental data also suggests that, contrary to a common assumption, functionally neutral positions may be much rarer than previously thought. Second, LacI/GalR proteins exhibit substantial functional diversity, even though their structures are extremely similar. This raises the question: how can a common structure support high levels of functional diversity? We have used conservation and co-evolutionary analyses to determine whether (a) functionally significant positions are dictated by the tertiary structure, an assumption of most MSA analyses, or (b) the structure serves as an accommodating scaffold by permitting multiple subfamily-specific networks of functionally significant positions. Finally, alternative co-evolutionary algorithms disagree about which pairs of positions are evolutionarily linked.
However, we have analyzed alternative co-evolution networks using graph theory and have observed that eigenvector network centrality (a) improves agreement among diverse analyses and (b) can identify functionally significant positions in protein families. Thus, eigenvector centrality may be a useful framework for interpreting co-evolution analyses. Taken together, our studies provide tools to make the best use of existing MSA analyses and indicate that future tools should avoid making several common assumptions.
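The eigenvector-centrality idea can be sketched in Python as follows. This is a minimal illustration, not the dissertation's pipeline: it assumes a co-evolution network is already available as a symmetric matrix of non-negative pair scores (for instance, the output of any co-evolution algorithm, with positions as nodes), and it computes the leading eigenvector by power iteration. The function name and the toy network are hypothetical.

```python
import math


def eigenvector_centrality(weights, max_iter=1000, tol=1e-12):
    """Eigenvector centrality of a weighted, undirected network.

    weights[i][j] is a non-negative co-evolution score between positions
    i and j (symmetric, zero diagonal). A position scores highly when it is
    strongly linked to other positions that themselves score highly.
    Returns scores scaled so the most central position has value 1.0.
    """
    n = len(weights)
    x = [1.0] * n
    for _ in range(max_iter):
        # Power iteration on (W + I): the identity shift leaves the leading
        # eigenvector unchanged but prevents oscillation on bipartite graphs.
        y = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    top = max(x)
    return [v / top for v in x]


# Hypothetical co-evolution network over four positions: position 0 is
# linked to all others, positions 1 and 2 are also linked to each other,
# and position 3 has a single link.
network = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
centrality = eigenvector_centrality(network)
ranking = sorted(range(len(network)), key=lambda i: -centrality[i])
print(ranking)  # -> [0, 1, 2, 3]
```

Because each position's score aggregates the scores of all its neighbours rather than relying on any single pair, a centrality ranking summarizes a whole network in one number per position. For larger networks, a graph library such as NetworkX (`networkx.eigenvector_centrality`) could be used instead of a hand-written loop.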
dc.format.extent: 306 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: Copyright held by the author.
dc.subject: Biochemistry
dc.subject: Bioinformatics
dc.subject: Co-evolution
dc.subject: LacI/GalR
dc.subject: Protein evolution
dc.subject: Protein sequence analysis
dc.title: Mining Evolutionary Data to Reveal the Layered Architecture of Protein Function
dc.type: Dissertation
dc.contributor.cmtemember: Fenton, Aron
dc.contributor.cmtemember: Karanicolas, John
dc.contributor.cmtemember: Zhu, Hao
dc.contributor.cmtemember: Fontes, Joseph
dc.thesis.degreeDiscipline: Biochemistry & Molecular Biology
dc.thesis.degreeLevel: Ph.D.
dc.rights.accessrights: openAccess