The accurate prediction of protein stability upon sequence mutation is an important but unsolved challenge in protein engineering. Large mutational datasets are required to train computational predictors, but traditional methods for collecting stability data are either low-throughput or measure protein stability indirectly. Here, we develop an automated method to generate thermodynamic stability data for nearly every single mutant in a small 56-residue protein. Analysis reveals that most single mutants have a neutral effect on stability, that mutational sensitivity is largely governed by residue burial, and that, unexpectedly, hydrophobics are the best-tolerated amino acid type. Correlating the output of various stability-prediction algorithms against our data shows that nearly all perform better on boundary and surface positions than on core positions, and that they are better at predicting large-to-small mutations than small-to-large ones. We show that the most stable variants in the single-mutant landscape are better identified using combinations of two prediction algorithms, and that including more algorithms can yield diminishing returns. In most cases, poor in silico predictions were tied to compositional differences between the data being analyzed and the datasets used to train the algorithm. Finally, we find that strategies to extract stabilities from high-throughput fitness data, such as deep mutational scanning, are promising, and that data produced by these methods may be applicable toward training future stability-prediction tools.
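The comparison described in the abstract, correlating predicted against measured stabilities separately for each burial class, can be sketched as follows. This is only an illustration: the record layout, the class labels, and the function name `correlation_by_class` are hypothetical, and the numbers are made up rather than taken from this dataset.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_by_class(records):
    """Group (burial_class, measured, predicted) records and correlate each group."""
    groups = defaultdict(lambda: ([], []))
    for burial_class, measured, predicted in records:
        groups[burial_class][0].append(measured)
        groups[burial_class][1].append(predicted)
    return {cls: pearson(m, p) for cls, (m, p) in groups.items()}


# Made-up illustrative values (kcal/mol), NOT data from this study.
toy_records = [
    ("core", -2.0, -1.0), ("core", -1.0, -2.5), ("core", -3.0, -2.0),
    ("surface", 0.1, 0.2), ("surface", -0.5, -0.4), ("surface", 0.4, 0.5),
]
result = correlation_by_class(toy_records)
```

Splitting by class before correlating keeps a well-predicted class from masking a poorly predicted one, which a single pooled coefficient could hide.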
Submitter: Connie Wang
Submission Date: Sept. 10, 2019, 9:59 p.m.
This is an updated version of study gwoS2haU3.
|Number of data points||18861|
|Proteins||Immunoglobulin G-binding protein G|
|Assays/Quantities/Protocols||Experimental Assay: Cm ; Experimental Assay: m-value ; Experimental Assay: dG(H2O)_mean ; Derived Quantity: SD of ddG(deepseq)_Wu_median ; Derived Quantity: ddG(deepseq)_Wu_median ; Derived Quantity: ddG(deepseq)_Wu_mean ; Derived Quantity: ddG(deepseq)_Olson ; Derived Quantity: ddG_lit_fromOlson ; Derived Quantity: SD of ddG(mAvg)_mean ; Derived Quantity: ddG(mAvg)_mean ; Derived Quantity: SD of dG(mAvg)_mean ; Derived Quantity: dG(mAvg)_mean ; Derived Quantity: SD of dG(H2O)_mean ; Derived Quantity: SD of m-value ; Derived Quantity: SD of Cm ; Computational Protocol: Rosetta NoMin ; Computational Protocol: Rosetta SomeMin ; Computational Protocol: Rosetta SomeMin_ddG ; Computational Protocol: FullMin|
|Libraries||∆∆G Predictors ; Deep mutational scanning ∆∆G comparisons ; ∆∆G domain-wide mutagenesis|
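The assay list above includes Cm, m-value, and dG(H2O). Under the widely used linear extrapolation model for chemical denaturation, these quantities are related by dG(H2O) = m × Cm. The sketch below shows that arithmetic. Whether the values in this record were derived in exactly this way is an assumption, as is the sign convention for ddG; the function names and the numbers are hypothetical.

```python
def dg_h2o(m_value, cm):
    """Unfolding free energy in water under linear extrapolation: dG(H2O) = m * Cm.

    m_value: denaturant dependence of dG, in kcal/mol/M
    cm: denaturant concentration at the unfolding midpoint, in M
    """
    return m_value * cm


def ddg(dg_mutant, dg_wild_type):
    """Stability change on mutation.

    Assumed convention: ddG = dG(mutant) - dG(wild type),
    so a negative value means the mutant is less stable.
    """
    return dg_mutant - dg_wild_type


# Hypothetical numbers for illustration only, not taken from this dataset.
dg_wt = dg_h2o(m_value=2.0, cm=2.5)   # 5.0 kcal/mol
dg_mut = dg_h2o(m_value=2.0, cm=2.0)  # 4.0 kcal/mol
change = ddg(dg_mut, dg_wt)           # -1.0 kcal/mol
```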
|Structure ID||Release Date||Resolution (Å)||Structure Title|
|6L9D||2019-11-08||1.73||X-ray structure of synthetic GB1 domain with mutations K10(DVA), T11S|
|6L9B||2019-11-08||1.95||X-ray structure of synthetic GB1 domain with mutations K10(DVA), T11A|
|1MPE||2002-09-12||0||Ensemble of 20 structures of the tetrameric mutant of the B1 domain of streptococcal protein G|
|1PN5||2003-06-12||0||NMR structure of the NALP1 Pyrin domain (PYD)|
|3MP9||2010-04-26||1.2||Structure of Streptococcal protein G B1 domain at pH 3.0|
|2IGH||1992-08-26||0||DETERMINATION OF THE SOLUTION STRUCTURES OF DOMAINS II AND III OF PROTEIN G FROM STREPTOCOCCUS BY 1H NMR|
|2N7J||2015-09-12||0||Sidechain chi1 distribution in B3 domain of protein G from extensive sets of residual dipolar couplings|
|6CNE||2018-03-08||1.2||Selenomethionine variant (V29SeM) of protein GB1|
|2GB1||1991-05-15||0||A NOVEL, HIGHLY STABLE FOLD OF THE IMMUNOGLOBULIN BINDING DOMAIN OF STREPTOCOCCAL PROTEIN G|
|2K0P||2008-02-11||0||Determination of a Protein Structure in the Solid State from NMR Chemical Shifts|