Application of kernel functions for accurate similarity search in large chemical databases

Wang, Xiaohong; Huan, Jun; Smalter, Aaron Matthew; Lushington, Gerald H.

dc.contributor.author	Wang, Xiaohong
dc.contributor.author	Huan, Jun
dc.contributor.author	Smalter, Aaron Matthew
dc.contributor.author	Lushington, Gerald H.
dc.date.accessioned	2014-01-31T23:13:47Z
dc.date.available	2014-01-31T23:13:47Z
dc.date.issued	2010-04-29
dc.identifier.citation	Wang, Xiaohong, Jun Huan, Aaron Smalter, and Gerald H Lushington. 2010. “Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases.” BMC Bioinformatics 11 Suppl 3 (Suppl 3): S8. http://dx.doi.org/10.1186/1471-2105-11-S3-S8.
dc.identifier.uri	http://hdl.handle.net/1808/12913
dc.description.abstract	Background: Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases because of their high computational complexity and the difficulty of indexing similarity search over large databases. Results: To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support a new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing time than state-of-the-art indexing methods such as Daylight fingerprints, C-tree, and GraphGrep. Conclusions: Designing an efficient similarity query processing method for large chemical databases is challenging because it must balance running-time efficiency against similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. An experimental study validates the utility of G-hash in chemical databases.
dc.publisher	BioMed Central
dc.rights	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.uri	http://creativecommons.org/licenses/by/2.0
dc.title	Application of kernel functions for accurate similarity search in large chemical databases
dc.type	Article
kusw.kuauthor	Wang, Xiaohong
kusw.kuauthor	Huan, Jun
kusw.kuauthor	Smalter, Aaron
kusw.kuauthor	Lushington, Gerald H.
kusw.kudepartment	Electrical Engineering and Computer Science
kusw.oastatus	fullparticipation
dc.identifier.doi	10.1186/1471-2105-11-S3-S8
kusw.oaversion	Scholarly/refereed, publisher version
kusw.oapolicy	This item meets KU Open Access policy criteria.
dc.rights.accessrights	openAccess
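
The abstract describes G-hash only at a high level: a hash table that supports a graph kernel, compact storage, and fast k-NN search. The sketch below is a hypothetical illustration of that general idea, not the authors' G-hash. It assumes each atom is summarized by its label plus the sorted labels of its neighbors, indexes those keys in a hash table, and scores database molecules with a linear kernel (the dot product of key-count vectors) accumulated from the table's posting lists. The names `node_features` and `HashIndex` and the toy molecules are invented for illustration.

```python
from collections import Counter, defaultdict
import heapq


def node_features(graph):
    """Hypothetical node descriptor: (atom label, sorted neighbor labels), counted per graph.

    A graph is assumed to be a (labels, edges) pair: a list of atom labels and an
    undirected edge list of (i, j) index pairs.
    """
    labels, edges = graph
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(labels[v])
        neighbors[v].append(labels[u])
    return Counter((labels[i], tuple(sorted(neighbors[i]))) for i in range(len(labels)))


class HashIndex:
    """Hash table from node-feature key to postings of (graph id, count)."""

    def __init__(self):
        self.table = defaultdict(list)
        self.classes = {}

    def add(self, gid, graph, label):
        # Store each feature key once per graph, with its multiplicity.
        for key, count in node_features(graph).items():
            self.table[key].append((gid, count))
        self.classes[gid] = label

    def knn(self, graph, k):
        # Linear kernel: sum over shared keys of query count * database count.
        # Only graphs sharing at least one key with the query are ever touched.
        scores = defaultdict(int)
        for key, count in node_features(graph).items():
            for gid, db_count in self.table.get(key, []):
                scores[gid] += count * db_count
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

    def classify(self, graph, k):
        # Majority vote over the k highest-scoring database graphs.
        votes = Counter(self.classes[gid] for gid, _ in self.knn(graph, k))
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    index = HashIndex()
    index.add("ethanol", (["C", "C", "O"], [(0, 1), (1, 2)]), "alcohol")
    index.add("methanol", (["C", "O"], [(0, 1)]), "alcohol")
    index.add("propane", (["C", "C", "C"], [(0, 1), (1, 2)]), "alkane")
    propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
    print(index.classify(propanol, k=3))  # alcohol
```

The unnormalized dot product favors larger molecules; a practical index would likely normalize scores (for example, to a cosine similarity) and use richer atom descriptors, both of which are design choices this sketch leaves out.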

Files in this item

Name: Wang_2010.pdf
Size: 661.6Kb
Format: PDF