dc.contributor.author: Fang, Yaping
dc.contributor.author: Fang, Jianwen
dc.date.accessioned: 2017-06-22T21:22:19Z
dc.date.available: 2017-06-22T21:22:19Z
dc.date.issued: 2013-04-05
dc.identifier.citation: Fang, Y., & Fang, J. (2013). Discrimination of soluble and aggregation-prone proteins based on sequence information. Molecular bioSystems, 9(4), 806–811. http://doi.org/10.1039/c3mb70033j [en_US]
dc.identifier.uri: http://hdl.handle.net/1808/24586
dc.description.abstract: Understanding the factors governing protein solubility is a key to grasp the mechanisms of protein solubility and may provide insight into protein aggregation and misfolding related diseases such as Alzheimer’s disease. In this work, we attempt to identify factors important to protein solubility using feature selection. Firstly, we calculate 1438 features including physicochemical properties and statistics for each protein. Random Forest algorithm is used to select the most informative and the minimal subset of features based on their predictive performance. A predictive model is built based on 17 selected features. Compared with previous models, our model achieves better performance with a sensitivity of 0.82, specificity 0.85, ACC 0.84, AUC 0.91 and MCC 0.67. Furthermore, a model using redundancy-reduced dataset (sequence identity <= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at http://shark.abl.ku.edu/ProS/. [en_US]
dc.publisher: Royal Society of Chemistry [en_US]
dc.rights: © Royal Society of Chemistry [en_US]
dc.subject: Protein solubility [en_US]
dc.subject: Aggregation [en_US]
dc.subject: Random Forest [en_US]
dc.subject: Classification [en_US]
dc.subject: Feature selection [en_US]
dc.title: Discrimination of soluble and aggregation-prone proteins based on sequence information [en_US]
dc.type: Article [en_US]
kusw.kuauthor: Fang, Yaping
kusw.kuauthor: Fang, Jianwen
kusw.kudepartment: Molecular Structures Group [en_US]
dc.identifier.doi: 10.1039/c3mb70033j [en_US]
kusw.oaversion: Scholarly/refereed, author accepted manuscript [en_US]
kusw.oapolicy: This item meets KU Open Access policy criteria. [en_US]
dc.identifier.pmid: PMC3627541 [en_US]
dc.rights.accessrights: openAccess
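
Note on the metrics quoted in the abstract: sensitivity, specificity, ACC and MCC are all functions of the four confusion-matrix counts of a binary classifier. The sketch below shows the standard definitions only; it is not the authors' code. The function name binary_metrics and the example counts are hypothetical, chosen so that sensitivity and specificity equal the reported 0.82 and 0.85, and they are not the paper's actual confusion matrix. AUC is left out because it requires per-protein prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (100 positives, 100 negatives).
metrics = binary_metrics(tp=82, fp=15, tn=85, fn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With balanced classes, as assumed in these made-up counts, accuracy is the mean of sensitivity and specificity; on an imbalanced dataset the relationship between the three would differ.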

