
dc.contributor.advisor: Luo, Bo
dc.contributor.advisor: Chen, Xue-wen
dc.contributor.author: Park, Meeyoung
dc.date.accessioned: 2014-06-18T03:40:21Z
dc.date.available: 2014-06-18T03:40:21Z
dc.date.issued: 2013-12-31
dc.date.submitted: 2013
dc.identifier.other: http://dissertations.umi.com/ku:13091
dc.identifier.uri: http://hdl.handle.net/1808/14202
dc.description.abstract: It is well recognized that healthcare information is growing exponentially and becoming more available to the public. Frequent users such as medical professionals and patients depend heavily on web sources to obtain appropriate information promptly. However, the trustworthiness of information on the web is always questionable because of the fast-growing, ever-expanding nature of the Internet. Most search engines return pages relevant to the given keywords, but the results may contain unreliable or biased information. Consequently, a significant challenge associated with the information explosion is ensuring that information is used effectively. One way to improve search results is to accurately identify more trustworthy data. Surprisingly, although the trustworthiness of sources is essential for a great number of daily users, little work has been done so far on healthcare information sources. In this dissertation, I propose a new system named HealthTrust, which automatically assesses the trustworthiness of healthcare information on the Internet. In the first phase, unsupervised clustering based on graph topology is applied to our data collection. The goal is to identify a relatively large and reliable set of trusted websites as a seed set without much human effort. After that, a new ranking algorithm for structure-based assessment is adopted. The basic hypothesis is that trustworthy pages are more likely to link to trustworthy pages. In this way, the original set of positive and negative seeds propagates over the Web graph. With the credibility-based discriminators, the global scoring is biased towards trusted websites and away from untrusted websites. Next, in the second phase, the content consistency between general healthcare-related webpages and trusted sites is evaluated, using information retrieval techniques to assess the content semantics of each webpage with respect to medical topics.
In addition, graph modeling is employed to generate a content-based ranking for each page based on the sentences in the seed pages. Finally, to integrate the two components, an iterative approach is used that combines the credibility assessments from the structure-based and content-based methods into a final verdict: a HealthTrust score for each webpage. Through HealthTrust, I present the first attempt to integrate structure-based and content-based approaches to automatically evaluate the credibility of online healthcare information, and I make fundamental contributions to both the information retrieval and healthcare informatics communities.
dc.format.extent: 120 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject: Computer science
dc.subject: Healthcare informatics
dc.subject: Hidden Markov model
dc.subject: Machine learning
dc.subject: Natural language processing
dc.subject: Topic modeling
dc.subject: Trustworthiness
dc.title: HealthTrust: Assessing the Trustworthiness of Healthcare Information on the Internet
dc.type: Dissertation
dc.contributor.cmtemember: Agah, Arvin
dc.contributor.cmtemember: Huan, Luke
dc.contributor.cmtemember: Kulkarni, Prasad
dc.contributor.cmtemember: Wang, Michael
dc.thesis.degreeDiscipline: Electrical Engineering & Computer Science
dc.thesis.degreeLevel: Ph.D.
kusw.bibid: 8086457
dc.rights.accessrights: openAccess

