
dc.contributor.author: Wang, Xiaohong
dc.contributor.author: Huan, Luke
dc.contributor.author: Smalter, Aaron Matthew
dc.contributor.author: Lushington, Gerald H.
dc.date.accessioned: 2015-11-13T20:09:31Z
dc.date.available: 2015-11-13T20:09:31Z
dc.date.issued: 2010
dc.identifier.citation: Wang, Xiaohong, Jun Huan, Aaron Smalter, and Gerald H. Lushington. "Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases." 2009 IEEE International Conference on Bioinformatics and Biomedicine (2009). http://dx.doi.org/10.1186/1471-2105-11-S3-S8
dc.identifier.uri: http://hdl.handle.net/1808/18903
dc.description.abstract:
Background

Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases because of their high computational complexity and the difficulty of indexing them for similarity search over large databases.

Results

To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measure, developed in our team, to measure the similarity of chemicals represented as graphs. In our method, we use a hash table to support a new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measure and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing than state-of-the-art indexing methods such as Daylight fingerprints, C-tree, and GraphGrep.

Conclusions

Efficient similarity query processing for large chemical databases is challenging because it must balance running-time efficiency against similarity search accuracy. Our previously developed similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. An experimental study validates the utility of G-hash in chemical databases.
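The Results section describes storing graph-represented chemicals in a hash table so that a kernel-based similarity can be evaluated quickly and used for k-NN search. The Python sketch below illustrates that general idea under simplifying assumptions; it is not the G-hash algorithm itself. It assumes heavy-atom graphs given as atom labels plus an edge list, a hypothetical node key of (atom label, sorted neighbour labels), a delta kernel on node keys summed over all node pairs, and the kernel-induced distance sqrt(k(Q,Q) + k(G,G) - 2k(Q,G)). All names (`node_keys`, `kernel`, `HashIndex`, `knn`, `classify`) and the toy molecules are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical graph encoding (an assumption of this sketch): atom labels plus
# an undirected edge list of (i, j) index pairs; hydrogens are omitted.
Graph = tuple[list[str], list[tuple[int, int]]]


def node_keys(graph: Graph) -> Counter:
    """Map every node to a hashable key: (own label, sorted neighbour labels).

    This neighbourhood feature is an illustrative choice, not necessarily
    the feature set used by G-hash.
    """
    labels, edges = graph
    neighbours: list[list[str]] = [[] for _ in labels]
    for i, j in edges:
        neighbours[i].append(labels[j])
        neighbours[j].append(labels[i])
    return Counter((labels[n], tuple(sorted(neighbours[n]))) for n in range(len(labels)))


def kernel(a: Counter, b: Counter) -> int:
    """Delta node kernel summed over all node pairs, which equals the dot
    product of the two key-count vectors (missing keys count as 0)."""
    return sum(count * b[key] for key, count in a.items())


class HashIndex:
    """Hash table from node key to {graph_id: count}, so the cross-kernel
    k(Q, G) is accumulated only for graphs sharing a key with the query."""

    def __init__(self) -> None:
        self.table: defaultdict = defaultdict(dict)
        self.self_kernel: dict[str, int] = {}
        self.labels: dict[str, str] = {}

    def add(self, gid: str, graph: Graph, label: str) -> None:
        keys = node_keys(graph)
        for key, count in keys.items():
            self.table[key][gid] = count
        self.self_kernel[gid] = kernel(keys, keys)  # precomputed k(G, G)
        self.labels[gid] = label

    def knn(self, graph: Graph, k: int) -> list[str]:
        q = node_keys(graph)
        cross: defaultdict = defaultdict(int)  # k(Q, G) per database graph
        for key, count in q.items():
            for gid, c in self.table.get(key, {}).items():
                cross[gid] += count * c
        kqq = kernel(q, q)
        # Kernel-induced distance sqrt(k(Q,Q) + k(G,G) - 2 k(Q,G)); ties by id.
        dist = {gid: sqrt(kqq + kgg - 2 * cross[gid])
                for gid, kgg in self.self_kernel.items()}
        return sorted(dist, key=lambda gid: (dist[gid], gid))[:k]

    def classify(self, graph: Graph, k: int) -> str:
        """Majority vote over the labels of the k nearest graphs."""
        votes = Counter(self.labels[gid] for gid in self.knn(graph, k))
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    index = HashIndex()
    index.add("ethanol", (["C", "C", "O"], [(0, 1), (1, 2)]), "alcohol")
    index.add("methanol", (["C", "O"], [(0, 1)]), "alcohol")
    index.add("propane", (["C", "C", "C"], [(0, 1), (1, 2)]), "alkane")
    index.add("butane", (["C", "C", "C", "C"], [(0, 1), (1, 2), (2, 3)]), "alkane")
    propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
    print(index.knn(propanol, 2))       # ['ethanol', 'propane']
    print(index.classify(propanol, 1))  # alcohol
```

In this sketch k(G, G) is computed once at insertion time, and the cross term is gathered only from the hash buckets the query's node keys hit, so the remaining per-graph work at query time is a constant-time lookup. The actual G-hash node features and kernel may differ.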
dc.publisher: BioMed Central
dc.rights: Copyright © 2010 Huan et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.uri: http://creativecommons.org/licenses/by/2.0/
dc.title: Application of kernel functions for accurate similarity search in large chemical databases
dc.type: Article
kusw.kuauthor: Huan, Luke
kusw.kudepartment: Electrical Engr & Comp Science
dc.identifier.doi: 10.1186/1471-2105-11-S3-S8
kusw.oaversion: Scholarly/refereed, publisher version
kusw.oapolicy: This item meets KU Open Access policy criteria.
dc.rights.accessrights: openAccess


