Mining Evolutionary Data to Reveal the Layered Architecture of Protein Function

Parente, Daniel Joseph

dc.contributor.advisor	Swint-Kruse, Liskin
dc.contributor.author	Parente, Daniel Joseph
dc.date.accessioned	2017-05-07T20:31:12Z
dc.date.available	2017-05-07T20:31:12Z
dc.date.issued	2014-08-31
dc.date.submitted	2014
dc.identifier.other	http://dissertations.umi.com/ku:13473
dc.identifier.uri	http://hdl.handle.net/1808/23958
dc.description.abstract	Revolutionary advances in sequencing technology have dramatically expanded the set of known, naturally-occurring protein sequences. Protein sequences arise from an evolutionary process, and during evolution proteins experience pressure to maintain and diversify their functions via mutation. Some mutations arise merely from neutral drift, but other changes enable organisms to adapt to their unique niche. Positions that are important for structure or function are expected to be mutationally constrained during evolution. To that end, many algorithms have been devised to identify mutational constraints in the evolutionary record in order to predict the location of functionally important sites. Accurate prediction of functionally important positions would have important practical implications. For example, individual humans each carry about 10,000 exomic sequence polymorphisms. Which of these are functionally and/or clinically significant? Similarly, protein engineers may target such sites for mutagenesis to derive variant functions. To detect these constraints, homologous proteins must first be sorted into protein families, based on sequence similarity, which typically indicates structural and functional similarity. Protein family multiple sequence alignments (MSAs) can then be computationally analyzed to understand the family in light of its evolutionary history. MSA analyses have detected various evolutionary patterns that are thought to reflect functional significance. For example, positions that are absolutely conserved across a family are commonly inferred to play important structural or functional roles and, consequently, to be intolerant to mutation. Other analyses attempt to identify important non-conserved positions, some of which must be functionally significant for the family to evolve functional variations. One important example is “co-evolutionary” analysis, which seeks pairs of positions that vary in a coordinated manner across evolution.
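As a rough illustration of the two kinds of MSA analysis described above, the following sketch scores conservation with per-column Shannon entropy and scores co-variation with mutual information between column pairs. These are generic textbook measures chosen for illustration; the record does not state which algorithms the dissertation used. The alignment `toy_msa` and all function names are hypothetical.

```python
import math
from collections import Counter

def column(msa, index):
    """Return the residues found at one alignment position."""
    return [sequence[index] for sequence in msa]

def shannon_entropy(symbols):
    """Entropy in bits; 0.0 means the column is absolutely conserved."""
    total = len(symbols)
    counts = Counter(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def mutual_information(col_a, col_b):
    """MI in bits between two columns: H(A) + H(B) - H(A, B)."""
    joint = list(zip(col_a, col_b))
    return shannon_entropy(col_a) + shannon_entropy(col_b) - shannon_entropy(joint)

# Hypothetical four-sequence alignment (not real LacI/GalR data).
toy_msa = ["ACDK", "ACEK", "AGDR", "AGER"]

conservation = [shannon_entropy(column(toy_msa, i)) for i in range(4)]
covarying = mutual_information(column(toy_msa, 1), column(toy_msa, 3))
independent = mutual_information(column(toy_msa, 1), column(toy_msa, 2))
print(conservation, covarying, independent)
```

In this toy alignment, position 0 never varies, positions 1 and 3 change together, and positions 1 and 2 vary independently. Real analyses typically also correct for phylogenetic relatedness and sampling bias, which this sketch ignores.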
MSA analyses make a number of simplifying assumptions to abstract away the full complexity of real proteins. Here, we have (1) assessed the validity of some of these assumptions, (2) investigated strategies to maximize the usefulness of existing tools in identifying functionally important positions, in light of their limitations, and (3) evaluated the ability of existing tools to identify known-significant positions. To that end, we have applied MSA analyses to the LacI/GalR bacterial transcription regulator family as our primary model system. Our studies have proceeded in three phases. First, preceding work indicated that published predictions based on a small LacI/GalR MSA fail to identify several functionally significant positions in the 18-amino acid linker of LacI/GalR proteins. We have investigated whether making better use of these tools (by expanding the set of sequences in the LacI/GalR MSA and sorting the family based on external experimental knowledge) can improve predictive accuracy. Interestingly, comparison of existing predictions to all available experimental data also suggests that, contrary to a common assumption, functionally neutral positions may be much rarer than previously thought. Second, LacI/GalR proteins exhibit substantial functional diversity, even though their structures are extremely similar. One question is: how can a common structure support high levels of functional diversity? We have used conservation and co-evolutionary analyses to determine whether (a) functionally significant positions are dictated by the tertiary structure, an assumption of most MSA analyses, or (b) the structure serves as an accommodating scaffold by permitting multiple subfamily-specific networks of functionally significant positions. Finally, alternative co-evolutionary algorithms disagree about which pairs of positions are evolutionarily linked. However, we have analyzed alternative co-evolution networks using graph theory and have observed that the eigenvector network centrality (a) improves agreement between diverse analyses, and (b) can identify functionally significant positions in protein families. Thus, eigenvector centrality may be a useful framework for interpreting co-evolution analyses. Taken together, our studies provide tools to make best use of existing MSA analyses and indicate that future tools should avoid making several common assumptions.
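To make the eigenvector centrality idea concrete, here is a minimal sketch that treats alignment positions as nodes and pairwise co-evolution scores as edge weights, then ranks positions by power iteration. It is an illustration under stated assumptions, not the dissertation's implementation: the weight matrix `toy_weights`, the function name, and the iteration settings are all hypothetical. The iteration multiplies by A + I rather than A, a standard shift that leaves the eigenvectors unchanged but avoids oscillation on bipartite networks.

```python
import math

def eigenvector_centrality(weights, max_iterations=500, tolerance=1e-10):
    """Power iteration on (A + I) for a symmetric, non-negative weight matrix.

    Returns a unit-length score vector; larger scores mark positions that
    are strongly linked to other well-linked positions.
    """
    n = len(weights)
    scores = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iterations):
        # One multiplication by (A + I): keep the old score, add neighbor scores.
        updated = [
            scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(value * value for value in updated))
        updated = [value / norm for value in updated]
        change = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tolerance:
            break
    return scores

# Hypothetical co-evolution network over four positions (unit weights):
# position 0 is linked to 1, 2, and 3; positions 1 and 2 are also linked.
toy_weights = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

centrality = eigenvector_centrality(toy_weights)
ranking = sorted(range(len(centrality)), key=lambda i: centrality[i], reverse=True)
print(centrality, ranking)
```

Because the score of each position depends on the scores of its partners rather than on any single pairwise value, two algorithms that disagree on individual pairs could still agree on which positions are central, which is the kind of agreement the abstract describes.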
dc.format.extent	306 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	Copyright held by the author.
dc.subject	Biochemistry
dc.subject	Bioinformatics
dc.subject	Co-evolution
dc.subject	LacI/GalR
dc.subject	Protein evolution
dc.subject	Protein sequence analysis
dc.title	Mining Evolutionary Data to Reveal the Layered Architecture of Protein Function
dc.type	Dissertation
dc.contributor.cmtemember	Fenton, Aron
dc.contributor.cmtemember	Karanicolas, John
dc.contributor.cmtemember	Zhu, Hao
dc.contributor.cmtemember	Fontes, Joseph
dc.thesis.degreeDiscipline	Biochemistry & Molecular Biology
dc.thesis.degreeLevel	Ph.D.
dc.rights.accessrights	openAccess

Files in this item

Name: Parente_ku_0099D_13473_DATA_1.pdf
Size: 116.7Mb
Format: PDF
