Document similarity based on concept tree distance

Lakkaraju, Praveen
Abstract
The Web is fast moving from an era of search engines to an era of discovery engines. Discovery engines help you find things that you never knew existed or did not know how to ask for. One of the ways this can be done is by automatically computing and displaying objects that are similar to the object in which the user is currently expressing interest. In this paper, we present a new approach to computing interdocument similarity that is based on a tree-matching algorithm. We represent each document as a concept tree using the concept associations obtained from a classifier. We use a tree-matching measure called the tree edit distance to compute similarities between these concept trees. Experiments on a subset of documents from the CiteSeer collection showed that our algorithm performed better than document similarity based on the traditional vector space model.
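As an illustration of the idea in the abstract, the following is a minimal sketch of a tree edit distance between two small concept trees. It is not the thesis's implementation: the concept labels, the unit edit costs, the helper names (`tree`, `forest_distance`, `tree_edit_distance`, `similarity`), and the normalization used to turn a distance into a similarity are all assumptions made for this example. It uses the plain memoized recursion over ordered forests rather than an optimized algorithm, so it is only suitable for small trees.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


def tree_size(t):
    """Count the nodes in a tree."""
    _, children = t
    return 1 + sum(tree_size(c) for c in children)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees).

    Unit costs are assumed for insert, delete, and rename.
    """
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, g_kids = g[-1]
        return forest_distance(f, g[:-1] + g_kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, f_kids = f[-1]
        return forest_distance(f[:-1] + f_kids, g) + 1

    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Match the two rightmost roots: compare their subtrees and the rest.
    rename = (
        forest_distance(f_kids, g_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1)
    )
    return min(delete, insert, rename)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def similarity(t1, t2):
    """Map distance to [0, 1]; this normalization is an assumption."""
    return 1.0 - tree_edit_distance(t1, t2) / (tree_size(t1) + tree_size(t2))


# Hypothetical concept trees for two documents.
doc_a = tree("root", tree("Computer Science",
                          tree("Information Retrieval"),
                          tree("Machine Learning")))
doc_b = tree("root", tree("Computer Science",
                          tree("Information Retrieval"),
                          tree("Databases")))

print(tree_edit_distance(doc_a, doc_b))  # 1 (one rename)
print(similarity(doc_a, doc_b))          # 0.875
```

With unit costs the distance can never exceed the combined node count of the two trees (delete everything, then insert everything), which is why dividing by that sum keeps the similarity between 0 and 1. A real system would likely weight edits by concept depth or classifier confidence; those choices are outside what the abstract states.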
Description
Thesis (M.S.)--University of Kansas, Electrical Engineering and Computer Science, 2007.
Date
2007-05-31
Publisher
University of Kansas
Keywords
Applied sciences