Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity

Jia, Yi; Huan, Luke; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N.

dc.contributor.author	Jia, Yi
dc.contributor.author	Huan, Luke
dc.contributor.author	Buhr, Vincent
dc.contributor.author	Zhang, Jintao
dc.contributor.author	Carayannopoulos, Leonidas N.
dc.date.accessioned	2015-11-13T20:20:39Z
dc.date.available	2015-11-13T20:20:39Z
dc.date.issued	2009-01-30
dc.identifier.citation	Jia, Yi, Jun Huan, Vincent Buhr, Jintao Zhang, and Leonidas N. Carayannopoulos. "Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the 'Twilight Zone' of Sequence Dissimilarity." BMC Bioinformatics 10.Suppl 1 (2009): n. pag. http://dx.doi.org/10.1186/1471-2105-10-S1-S46	en_US
dc.identifier.uri	http://hdl.handle.net/1808/18905
dc.description.abstract	Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion: We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty.	en_US
dc.publisher	BioMed Central	en_US
dc.rights	Copyright © 2009 Jia et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.uri	http://creativecommons.org/licenses/by/2.0/
dc.title	Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity	en_US
dc.type	Article
kusw.kuauthor	Huan, Luke
kusw.kudepartment	Electrical Engr & Comp Science	en_US
dc.identifier.doi	10.1186/1471-2105-10-S1-S46
kusw.oaversion	Scholarly/refereed, publisher version
kusw.oapolicy	This item meets KU Open Access policy criteria.
dc.rights.accessrights	openAccess

Files in this item

Name: Jia_structural_motif2009.pdf
Size: 1.216Mb
Format: PDF
