Document similarity based on concept tree distance

Lakkaraju, Praveen

View/Open

Lakkaraju_Praveen_2007_5349267.pdf (363.3Kb)

Issue Date

2007-05-31

Author

Lakkaraju, Praveen

Publisher

University of Kansas

Type

Thesis

Degree Level

M.S.

Discipline

Electrical Engineering and Computer Science

Rights

This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.

Abstract

The Web is moving quickly from an era of search engines to an era of discovery engines. Discovery engines help users find things that they never knew existed or did not know how to ask for. One way to do this is to automatically compute and display objects similar to the one in which the user is currently expressing interest. In this thesis, we present a new approach to computing inter-document similarity that is based on a tree-matching algorithm. We represent each document as a concept tree, using the concept associations obtained from a classifier. We then use the tree edit distance, a tree-matching measure, to compute similarities between these concept trees. Experiments on a subset of documents from the CiteSeer collection showed that our algorithm performed better than document similarity based on the traditional vector space model.
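To make the idea in the abstract concrete, the following is a minimal sketch of an ordered tree edit distance between two small concept trees. It is not the thesis's implementation: the unit costs for insert, delete, and relabel, the normalization into a similarity score, the function names, and the concept labels are all assumptions chosen for illustration, and the thesis may use weighted concepts or a different algorithm variant.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so results can be cached.


def forest_size(forest):
    """Count the nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests, unit costs (an assumption)."""
    if not f:
        return forest_size(g)  # insert every remaining node of g
    if not g:
        return forest_size(f)  # delete every remaining node of f
    (f_label, f_children), (g_label, g_children) = f[-1], g[-1]
    # Delete the root of the rightmost tree in f; its children move up.
    delete = forest_distance(f[:-1] + f_children, g) + 1
    # Insert the root of the rightmost tree in g.
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Match the two rightmost roots, relabeling if the concepts differ.
    match = (
        forest_distance(f_children, g_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(a, b):
    """Edit distance between two concept trees."""
    return forest_distance((a,), (b,))


def tree_similarity(a, b):
    """Map distance to [0, 1]; this normalization is an assumption."""
    total = forest_size((a,)) + forest_size((b,))
    return 1.0 - tree_edit_distance(a, b) / total


# Hypothetical concept trees for two documents.
doc_a = ("Computer Science", (
    ("Artificial Intelligence", (("Machine Learning", ()),)),
    ("Databases", ()),
))
doc_b = ("Computer Science", (
    ("Artificial Intelligence", (
        ("Machine Learning", ()),
        ("Natural Language Processing", ()),
    )),
))

print(tree_edit_distance(doc_a, doc_b))
print(tree_similarity(doc_a, doc_b))
```

Unlike a vector space comparison, which treats concepts as independent dimensions, this measure is sensitive to where a concept sits in the hierarchy: here the two documents differ by one deleted concept and one inserted concept, even though they share the same upper levels of the tree.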

Description

Thesis (M.S.)--University of Kansas, Electrical Engineering and Computer Science, 2007.

URI

http://hdl.handle.net/1808/32041

Collections

Theses [4088]
