Document similarity based on concept tree distance
Issue Date
2007-05-31
Author
Lakkaraju, Praveen
Publisher
University of Kansas
Type
Thesis
Degree Level
M.S.
Discipline
Electrical Engineering and Computer Science
Rights
This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
Abstract
The Web is fast moving from an era of search engines to an era of discovery engines. Discovery engines help you find things that you never knew existed or did not know how to ask for. One way to do this is to automatically compute and display objects that are similar to the object in which the user is currently expressing interest. In this paper, we present a new approach to computing inter-document similarity that is based on a tree-matching algorithm. We represent each document as a concept tree using the concept associations obtained from a classifier. We then use a tree-matching algorithm, the tree edit distance, to compute similarities between these concept trees. Experiments on a subset of documents from the CiteSeer collection showed that our algorithm performed better than document similarity based on the traditional vector space model.
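To make the idea of comparing concept trees concrete, the following is a minimal sketch of an ordered tree edit distance with unit costs for inserting, deleting, and relabeling a node. It is an illustration only, not the implementation used in the thesis: the `node` helper, the toy concept labels, the unit costs, and the normalization in `similarity` are all assumptions made for this example.

```python
from functools import lru_cache


def node(label, *children):
    """Hypothetical helper: a tree is a (label, children) pair of nested tuples."""
    return (label, tuple(children))


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests, unit cost per operation."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every remaining node of g
    if not g:
        return size(f)  # delete every remaining node of f
    v_label, v_children = f[-1]  # rightmost root of f
    w_label, w_children = g[-1]  # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map v to w: match their subtrees and the remaining forests separately.
    relabel = (
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1)
    )
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def similarity(t1, t2):
    """Assumed normalization: 1.0 for identical trees, 0.0 for maximal distance."""
    total = size((t1,)) + size((t2,))
    return 1.0 - tree_edit_distance(t1, t2) / total


# Toy concept trees (labels are made up for illustration).
doc_a = node("root", node("AI", node("ML")), node("IR"))
doc_b = node("root", node("AI", node("ML")), node("DB"))
doc_c = node("root", node("AI"))

print(tree_edit_distance(doc_a, doc_b))  # one relabel: IR -> DB
print(tree_edit_distance(doc_a, doc_c))  # two deletions: ML and IR
print(similarity(doc_a, doc_b))
```

The memoized recursion is easy to read but slower than dedicated algorithms such as Zhang-Shasha on large trees; for small concept trees like the ones above it is sufficient to show how structural differences translate into a distance.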
Description
Thesis (M.S.)--University of Kansas, Electrical Engineering and Computer Science, 2007.
Collections
- Theses [3901]