Discrimination of soluble and aggregation-prone proteins based on sequence information

Fang, Yaping; Fang, Jianwen

dc.contributor.author	Fang, Yaping
dc.contributor.author	Fang, Jianwen
dc.date.accessioned	2017-06-22T21:22:19Z
dc.date.available	2017-06-22T21:22:19Z
dc.date.issued	2013-04-05
dc.identifier.citation	Fang, Y., & Fang, J. (2013). Discrimination of soluble and aggregation-prone proteins based on sequence information. Molecular bioSystems, 9(4), 806–811. http://doi.org/10.1039/c3mb70033j	en_US
dc.identifier.uri	http://hdl.handle.net/1808/24586
dc.description.abstract	Understanding the factors that govern protein solubility is key to grasping the mechanisms of protein solubility and may provide insight into diseases related to protein aggregation and misfolding, such as Alzheimer’s disease. In this work, we attempt to identify factors important to protein solubility using feature selection. First, we calculate 1438 features, including physicochemical properties and statistics, for each protein. The Random Forest algorithm is then used to select the minimal, most informative subset of features based on their predictive performance. A predictive model is built on the 17 selected features. Compared with previous models, our model achieves better performance, with a sensitivity of 0.82, specificity of 0.85, accuracy (ACC) of 0.84, area under the ROC curve (AUC) of 0.91 and Matthews correlation coefficient (MCC) of 0.67. Furthermore, a model built on a redundancy-reduced dataset (sequence identity ≤ 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at http://shark.abl.ku.edu/ProS/.	en_US
dc.publisher	Royal Society of Chemistry	en_US
dc.rights	© Royal Society of Chemistry	en_US
dc.subject	Protein solubility	en_US
dc.subject	Aggregation	en_US
dc.subject	Random Forest	en_US
dc.subject	Classification	en_US
dc.subject	Feature selection	en_US
dc.title	Discrimination of soluble and aggregation-prone proteins based on sequence information	en_US
dc.type	Article	en_US
kusw.kuauthor	Fang, Yaping
kusw.kuauthor	Fang, Jianwen
kusw.kudepartment	Molecular Structures Group	en_US
dc.identifier.doi	10.1039/c3mb70033j	en_US
kusw.oaversion	Scholarly/refereed, author accepted manuscript	en_US
kusw.oapolicy	This item meets KU Open Access policy criteria.	en_US
dc.identifier.pmid	PMC3627541	en_US
dc.rights.accessrights	openAccess
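The abstract reports sensitivity, specificity, ACC, AUC and MCC. The sketch below shows the standard definitions of these metrics for a binary soluble/aggregation-prone classifier. It is not the authors' code. Treating soluble proteins as the positive class is an assumption, and the counts and scores in the usage lines are hypothetical values chosen only to illustrate the arithmetic, not data from the paper. AUC is computed separately because it needs ranked prediction scores rather than a single confusion matrix.

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    Assumption: the soluble class is the positive class.
    """
    sensitivity = tp / (tp + fn)                 # true-positive rate (recall)
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # ACC: fraction classified correctly
    # MCC: correlation between predicted and true labels, in [-1, 1]
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ACC": accuracy, "MCC": mcc}


def roc_auc(scores_pos: list, scores_neg: list) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation of ROC AUC).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=40, fn=10, tn=45, fp=5))
# Hypothetical classifier scores for positives and negatives.
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5]))
```

MCC is often preferred alongside ACC because it uses all four confusion-matrix cells and stays informative when the two classes are unbalanced.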

Files in this item

Name:: Fang_2013.pdf
Size:: 570.4Kb
Format:: PDF
