Discrimination of soluble and aggregation-prone proteins based on sequence information
Issue Date
2013-04-05
Author
Fang, Yaping
Fang, Jianwen
Publisher
Royal Society of Chemistry
Type
Article
Article Version
Scholarly/refereed, author accepted manuscript
Rights
© Royal Society of Chemistry
Abstract
Understanding the factors that govern protein solubility is key to understanding its underlying mechanisms and may provide insight into diseases related to protein aggregation and misfolding, such as Alzheimer’s disease. In this work, we attempt to identify factors important to protein solubility using feature selection. First, we calculate 1438 features for each protein, including physicochemical properties and statistics. A Random Forest algorithm is then used to select a minimal subset of the most informative features based on their predictive performance. A predictive model is built on the 17 selected features. Compared with previous models, our model achieves better performance, with a sensitivity of 0.82, specificity of 0.85, accuracy (ACC) of 0.84, area under the ROC curve (AUC) of 0.91, and Matthews correlation coefficient (MCC) of 0.67. Furthermore, a model built on a redundancy-reduced dataset (sequence identity <= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at http://shark.abl.ku.edu/ProS/.
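For readers unfamiliar with the performance measures quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, ACC, and MCC are derived from the counts of a binary confusion matrix. The function name `classification_metrics` and the counts used in the example are hypothetical, chosen only for illustration; they are not the authors' code or the paper's dataset.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, ACC and MCC from confusion-matrix counts.

    tp/fn: positive-class proteins predicted correctly/incorrectly
    tn/fp: negative-class proteins predicted correctly/incorrectly
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")

    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # Matthews correlation coefficient; defined as 0 when the denominator is 0
    # (for example, when every sample is assigned to the same class).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (assumption: 100 proteins per class), for illustration only.
metrics = classification_metrics(tp=82, fn=18, tn=85, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included in this sketch because it cannot be computed from a single confusion matrix: it requires the classifier's continuous scores so that samples can be ranked across all decision thresholds.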
Citation
Fang, Y., & Fang, J. (2013). Discrimination of soluble and aggregation-prone proteins based on sequence information. Molecular BioSystems, 9(4), 806–811. http://doi.org/10.1039/c3mb70033j
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.