A topological algorithm for identification of structural domains of proteins

Frank Emmert-Streib; Arcady Mushegian

dc.contributor.author	Frank Emmert-Streib	en_US
dc.contributor.author	Arcady Mushegian	en_US
dc.date.accessioned	2009-05-05T16:13:10Z
dc.date.available	2009-05-05T16:13:10Z
dc.date.issued	2006-02-17	en_US
dc.identifier.citation	Frank Emmert-Streib; Arcady Mushegian: A topological algorithm for identification of structural domains of proteins. BMC Bioinformatics 2007, 8(1):237.	en_US
dc.identifier.uri	http://hdl.handle.net/2271/586	en_US
dc.description.abstract	BACKGROUND: Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. The algorithmic methods developed so far for dissecting proteins of known structure into domains are based on an examination of multiple geometrical, physical and topological features. Successful as many of these approaches are, they employ many heuristics, and it is not clear whether they illuminate any deep underlying principles of protein domain organization. Other well-performing domain dissection methods rely on comparative sequence analysis. These methods are applicable to sequences with known and unknown structure alike, and their success highlights a fundamental principle of protein modularity, but this does not directly improve our understanding of protein spatial structure. RESULTS: We present a novel graph-theoretical algorithm for the identification of domains in proteins with known three-dimensional structure. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements and whose edges represent physical proximity of at least one pair of alpha carbon atoms from two elements. Domains are identified as constrained partitions of the graph, corresponding to sets of vertices obtained by the maximization of the cycle distributions found in the graph. The decision to accept or reject a tentative cut position is based on a specific classifier. When a partition is accepted, the algorithm is applied iteratively to each of the resulting subgraphs, and it terminates automatically when partitions are no longer accepted. The distribution of cycles is the only type of information on which the decision about protein dissection is based. Despite the bare-bones simplicity of the approach, our algorithm approaches the best heuristic algorithms in accuracy. CONCLUSION: Our graph-theoretical algorithm uses only topological information present in the protein structure itself to find the domains and does not rely on any geometrical or physical information about the protein molecule. Perhaps unexpectedly, these drastic constraints on resources, which result in a seemingly approximate description of protein structures and leave only a handful of parameters available for analysis, do not lead to any significant deterioration of algorithm accuracy. It appears that protein structures can be rigorously treated as topological rather than geometrical objects and that the majority of information about protein domains can be inferred from the coarse-grained measure of pairwise proximity between secondary structure elements.	en_US
dc.language	en	en_US
dc.language.iso	en_US	en_US
dc.publisher	BioMed Central	en_US
dc.relation.isversionof	http://www.biomedcentral.com/1471-2105/8/237	en_US
dc.relation.hasversion	http://www.biomedcentral.com/content/pdf/1471-2105-8-237.pdf	en_US
dc.rights	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.	en_US
dc.rights.uri	http://creativecommons.org/licenses/by/2.0	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Base Sequence	en_US
dc.subject.mesh	Binding Sites	en_US
dc.subject.mesh	Chromosome Mapping/ methods	en_US
dc.subject.mesh	Computer Simulation	en_US
dc.subject.mesh	DNA-Binding Proteins/ genetics	en_US
dc.subject.mesh	Internet	en_US
dc.subject.mesh	Models, Genetic	en_US
dc.subject.mesh	Molecular Sequence Data	en_US
dc.subject.mesh	Online Systems	en_US
dc.subject.mesh	Protein Binding	en_US
dc.subject.mesh	Sequence Alignment/ methods	en_US
dc.subject.mesh	Sequence Analysis, DNA/ methods	en_US
dc.subject.mesh	Sequence Homology, Nucleic Acid	en_US
dc.subject.mesh	Software	en_US
dc.subject.mesh	Transcription Factors/ genetics	en_US
dc.title	A topological algorithm for identification of structural domains of proteins	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1186/1471-2105-8-237	en_US
dc.identifier.pmid	PMC16503993	en_US
dc.rights.accessrights	openAccess	en_US
dc.date.captured	2009-04-27	en_US
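
The graph representation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The 7.0 angstrom cutoff, the function names, the restriction of cuts to positions along the chain, and the use of the circuit rank (edges minus vertices plus connected components) as a stand-in for the paper's cycle-distribution score are all assumptions made for illustration. The classifier that accepts or rejects a cut is not modelled.

```python
from itertools import combinations
from math import dist


def build_sse_graph(sse_coords, cutoff=7.0):
    """Return an adjacency dict: SSE index -> set of neighbouring SSE indices.

    sse_coords: list of secondary structure elements (SSEs), each a list of
                (x, y, z) alpha-carbon positions.
    cutoff: distance threshold in angstroms (assumed value, not from the paper).
    """
    graph = {i: set() for i in range(len(sse_coords))}
    for i, j in combinations(range(len(sse_coords)), 2):
        # One close pair of alpha carbons is enough to connect two SSEs.
        if any(dist(a, b) <= cutoff for a in sse_coords[i] for b in sse_coords[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def circuit_rank(graph, nodes):
    """Count independent cycles (E - V + C) in the subgraph induced by nodes."""
    nodes = set(nodes)
    edges = sum(len(graph[n] & nodes) for n in nodes) // 2
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first search over one connected component
            current = stack.pop()
            for neighbour in graph[current] & nodes:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return edges - len(nodes) + components


def best_sequential_cut(graph):
    """Score each cut along the chain by the cycles kept inside the two parts.

    Assumed scoring rule for illustration only.
    """
    n = len(graph)
    scores = {
        k: circuit_rank(graph, range(k)) + circuit_rank(graph, range(k, n))
        for k in range(1, n)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy input: six one-residue "SSEs" forming two triangles joined by one contact.
coords = [[(0, 0, 0)], [(4, 0, 0)], [(0, 4, 0)],
          [(10, 0, 0)], [(14, 0, 0)], [(10, 4, 0)]]
graph = build_sse_graph(coords)
print(best_sequential_cut(graph))  # (3, 2)
```

In the toy input, cutting between the third and fourth element keeps both triangles intact, so that position scores highest. A recursive version would apply the same search to each accepted part until no cut is accepted.
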

Files in this item

Name: 1471-2105-8-237.pdf
Size: 904.8Kb
Format: PDF

Name: license_rdf
Size: 11.77Kb
Format: Unknown

Name: license_text
Size: 0 bytes
Format: Unknown

Name: license_url
Size: 43 bytes
Format: Unknown
