
dc.contributor.author: Jia, Yi
dc.contributor.author: Huan, Luke
dc.contributor.author: Buhr, Vincent
dc.contributor.author: Zhang, Jintao
dc.contributor.author: Carayannopoulos, Leonidas N.
dc.date.accessioned: 2015-11-13T20:20:39Z
dc.date.available: 2015-11-13T20:20:39Z
dc.date.issued: 2009-01-30
dc.identifier.citation: Jia, Yi, Jun Huan, Vincent Buhr, Jintao Zhang, and Leonidas N. Carayannopoulos. "Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the 'Twilight Zone' of Sequence Dissimilarity." BMC Bioinformatics 10.Suppl 1 (2009): n. pag. http://dx.doi.org/10.1186/1471-2105-10-S1-S46 [en_US]
dc.identifier.uri: http://hdl.handle.net/1808/18905
dc.description.abstract: Background

Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight" or "midnight" zones, where pairwise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail.

Results

Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins that mediate viral evasion of host immune defenses. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method.

Conclusion

We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge (such as the substitution matrices studied here), and devise an efficient algorithm to identify approximately matched frequent subgraphs. In doing so, we significantly expand the analytical power of sophisticated data mining algorithms for large volumes of complex, noisy protein structure data. Moreover, because the compatibility matrix can be chosen to suit the problem, the method can readily be applied in other domains where subgraph labels carry some uncertainty.
[en_US]
dc.publisher: BioMed Central [en_US]
dc.rights: Copyright © 2009 Jia et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.uri: http://creativecommons.org/licenses/by/2.0/
dc.title: Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity [en_US]
dc.type: Article
kusw.kuauthor: Huan, Luke
kusw.kudepartment: Electrical Engr & Comp Science [en_US]
dc.identifier.doi: 10.1186/1471-2105-10-S1-S46
kusw.oaversion: Scholarly/refereed, publisher version
kusw.oapolicy: This item meets KU Open Access policy criteria.
dc.rights.accessrights: openAccess
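The abstract's Conclusion describes finding frequent subgraphs whose node labels need only match approximately, with label agreement scored through a substitution (compatibility) matrix. The sketch below is a minimal, brute-force illustration of that idea under stated assumptions; it is not the authors' algorithm. It assumes protein graphs whose nodes are residues labeled by one-letter amino acid codes and whose edges are contacts, a hypothetical compatibility matrix `COMPAT` with made-up values in [0, 1] (not a published substitution matrix), a match score equal to the product of label compatibilities, and a hypothetical threshold `tau`. The helper names `make_graph`, `best_match_score`, and `support` are likewise illustrative.

```python
"""Brute-force sketch of approximate subgraph matching with a label
compatibility matrix (illustrative only, not the paper's algorithm)."""

# Hypothetical compatibility matrix: COMPAT[a][b] in [0, 1] is how acceptable
# it is for pattern residue a to match observed residue b. Values are made up.
COMPAT = {
    "L": {"L": 1.0, "I": 0.6, "V": 0.5},
    "I": {"I": 1.0, "L": 0.6, "V": 0.7},
    "V": {"V": 1.0, "I": 0.7, "L": 0.5},
    "K": {"K": 1.0, "R": 0.6},
    "R": {"R": 1.0, "K": 0.6},
}


def compat(a, b):
    """Compatibility of pattern label a with target label b (0.0 if unlisted)."""
    return COMPAT.get(a, {}).get(b, 0.0)


def make_graph(labels, edges):
    """Build (node -> label, set of undirected edges) from residue letters and contact pairs."""
    return dict(enumerate(labels)), {frozenset(e) for e in edges}


def best_match_score(pattern, target):
    """Highest product of label compatibilities over all injective,
    edge-preserving mappings of pattern nodes into target nodes; 0.0 if none."""
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    p_nodes = list(p_labels)
    best = 0.0

    def extend(i, mapping, score):
        nonlocal best
        # Compatibilities are <= 1, so the product can only shrink: prune.
        if score <= best:
            return
        if i == len(p_nodes):
            best = score
            return
        u = p_nodes[i]
        used = set(mapping.values())
        for v in t_labels:
            c = compat(p_labels[u], t_labels[v])
            if v in used or c == 0.0:
                continue
            # Every pattern edge from u to an already-mapped node must exist in the target.
            if all(frozenset((mapping[w], v)) in t_edges
                   for w in mapping if frozenset((u, w)) in p_edges):
                mapping[u] = v
                extend(i + 1, mapping, score * c)
                del mapping[u]

    extend(0, {}, 1.0)
    return best


def support(pattern, database, tau):
    """Fraction of database graphs containing a match scoring at least tau."""
    hits = sum(1 for g in database if best_match_score(pattern, g) >= tau)
    return hits / len(database)


# Toy example: a three-residue contact path L-V-K searched in three graphs.
pattern = make_graph("LVK", [(0, 1), (1, 2)])
database = [
    make_graph("LVK", [(0, 1), (1, 2)]),  # exact match
    make_graph("IIR", [(0, 1), (1, 2)]),  # conservative substitutions only
    make_graph("LKV", [(0, 1), (1, 2)]),  # same residues, different arrangement
]
for tau in (0.2, 0.5):
    print(f"tau={tau}: support={support(pattern, database, tau):.3f}")
```

With these made-up values the second graph scores 0.6 × 0.7 × 0.6 = 0.252, so it counts toward the pattern's support at tau = 0.2 but not at tau = 0.5, while the third graph never matches because V cannot sit between L and K. Exhaustive backtracking is exponential in pattern size; it stands in here only to make the scoring idea concrete and does not reproduce the efficient algorithm the abstract refers to.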
