Show simple item record

dc.contributor.advisor: Agah, Arvin
dc.contributor.author: Gibbons, John William
dc.date.accessioned: 2014-07-05T19:04:47Z
dc.date.available: 2014-07-05T19:04:47Z
dc.date.issued: 2014-05-31
dc.date.submitted: 2014
dc.identifier.other: http://dissertations.umi.com/ku:13237
dc.identifier.uri: http://hdl.handle.net/1808/14603
dc.description.abstract: Online Social Networks (OSNs) are integrated into business, entertainment, politics, and education; they touch nearly every facet of our everyday lives. They have played essential roles in everything from milestones for humanity, such as the social revolutions in certain countries, to more day-to-day activities, such as streaming entertaining or educational materials. Not surprisingly, social networks are a subject of study not only for computer scientists but also for economists, sociologists, political scientists, and psychologists, among others. In this dissertation, we build a model that is used to classify content on the OSNs of Reddit, 4chan, Flickr, and YouTube according to the type of lifespan the content has and the popularity tier that the content reaches. The proposed model is evaluated using 10-fold cross-validation with four data mining algorithms: Sequential Minimal Optimization (SMO), which is a support vector machine algorithm; Decision Table; Naïve Bayes; and Random Forest. The runtimes and accuracies are compared across OSNs, models, and data mining algorithms. The peak/death category of Reddit content can be classified with 64% accuracy. The peak/death category of 4chan content can be classified with 76% accuracy. The peak/death category of Flickr content can be classified with 65% accuracy. We also used 10-fold cross-validation to measure the accuracy with which the popularity tier of content can be classified. The popularity tier of content on Reddit can be classified with 84% accuracy. The popularity tier of content on 4chan can be classified with 70% accuracy. The popularity tier of content on Flickr can be classified with 66% accuracy. The popularity tier of content on YouTube can be classified with only 48% accuracy.
Our experiments compared the runtimes and accuracy of SMO, Naïve Bayes, Decision Table, and Random Forest in classifying the lifespan of content on Reddit, 4chan, and Flickr, as well as in classifying the popularity tier of content on Reddit, 4chan, Flickr, and YouTube. The experimental results indicate that SMO is capable of outperforming the other algorithms in runtime across all OSNs. Decision Table has the longest observed runtimes, in some cases failing to complete its analysis before the system crashed. The statistical analysis indicates, with 95% confidence, that there is no statistically significant difference in accuracy between the algorithms across all OSNs. Reddit was shown, with 95% confidence, to be the OSN whose content is least likely to be misclassified. All other OSNs were shown to have no statistically significant difference in how likely their content is to be misclassified when compared pairwise with each other.
dc.format.extent: 111 pages
dc.language.iso: en
dc.publisher: University of Kansas
dc.rights: This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject: Information science
dc.subject: Content
dc.subject: Data mining
dc.subject: Lifespan
dc.subject: Online social networks
dc.title: Modeling Content Lifespan in Online Social Networks Using Data Mining
dc.type: Dissertation
dc.contributor.cmtemember: Miller, James
dc.contributor.cmtemember: Perry, Alexander
dc.contributor.cmtemember: Grzymala-Busse, Jerzy
dc.contributor.cmtemember: Dhar, Prajna
dc.thesis.degreeDiscipline: Electrical Engineering & Computer Science
dc.thesis.degreeLevel: Ph.D.
dc.rights.accessrights: openAccess

