Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis.


Abstract

Accurately predicting how sequence mutations change protein stability is an important but unsolved challenge in protein engineering. Large mutational datasets are required to train computational predictors, but traditional methods for collecting stability data are either low-throughput or measure protein stability only indirectly. Here, we develop an automated method to generate thermodynamic stability data for nearly every single mutant of a small 56-residue protein. Analysis reveals that most single mutants have a neutral effect on stability, that mutational sensitivity is largely governed by residue burial, and that, unexpectedly, hydrophobic residues are the best-tolerated amino acid type. Correlating the output of various stability-prediction algorithms against our data shows that nearly all perform better on boundary and surface positions than on core positions, and that they predict large-to-small mutations better than small-to-large ones. We show that the most stable variants in the single-mutant landscape are better identified using combinations of two prediction algorithms, and that including more algorithms can yield diminishing returns. In most cases, poor in silico predictions were tied to compositional differences between the data being analyzed and the datasets used to train the algorithm. Finally, we find that strategies for extracting stabilities from high-throughput fitness data, such as deep mutational scanning, are promising and that data produced by these methods may be applicable to training future stability-prediction tools.
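The kind of comparison described in the abstract (correlating predicted against measured stability changes, split by burial class, and then trying pairs of predictors) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the records are synthetic placeholder values, the predictor names `A`, `B`, and `C` are hypothetical, Pearson correlation is assumed as the agreement metric, and averaging two predictions is only one simple way to combine algorithms.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Synthetic placeholder records (NOT data from the study):
# (label, burial class, measured value, {predictor name: predicted value}).
# Measured and predicted values are assumed to share one sign convention.
records = [
    ("m1", "core", -2.0, {"A": -1.5, "B": -0.5, "C": -1.0}),
    ("m2", "core", -0.5, {"A": -1.0, "B": -1.5, "C": -0.2}),
    ("m3", "core", -3.0, {"A": -1.2, "B": -2.5, "C": -2.0}),
    ("m4", "surface", 0.1, {"A": 0.0, "B": 0.3, "C": -0.1}),
    ("m5", "surface", -0.4, {"A": -0.5, "B": -0.2, "C": -0.6}),
    ("m6", "surface", 0.5, {"A": 0.4, "B": 0.6, "C": 0.1}),
]


def correlation_by_class(records, predictor):
    """Correlate one predictor with measurements within each burial class."""
    grouped = {}
    for _, burial, measured, predicted in records:
        pairs = grouped.setdefault(burial, ([], []))
        pairs[0].append(predicted[predictor])
        pairs[1].append(measured)
    return {burial: pearson(p, m) for burial, (p, m) in grouped.items()}


def pair_correlations(records, predictors):
    """Correlate the average of each predictor pair with the measurements."""
    measured = [rec[2] for rec in records]
    results = {}
    for a, b in combinations(predictors, 2):
        averaged = [(rec[3][a] + rec[3][b]) / 2 for rec in records]
        results[(a, b)] = pearson(averaged, measured)
    return results


if __name__ == "__main__":
    for name in ("A", "B", "C"):
        print(name, correlation_by_class(records, name))
    print(pair_correlations(records, ["A", "B", "C"]))
```

With real data, the per-class correlations would show whether a given predictor agrees better with experiment at surface or core positions, and the pairwise table would show whether any two-predictor combination outperforms its members. Identifying the most stable variants specifically would need a ranking-based metric rather than a global correlation.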

Submission Details

ID: 3xESLyS9

Submitter: Connie Wang

Submission Date: Sept. 10, 2019, 9:59 p.m.

Version: 2

Publication Details

Nisthal A, Wang CY, Ary ML, Mayo SL. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc Natl Acad Sci U S A (2019). PMID: 31371509

Additional Information

This is an updated version of study gwoS2haU3.

Relevant PDB Entries

Structure ID | Release Date | Resolution (Å) | Structure Title
1EM7 | 2000-03-16 | 2.0 | HELIX VARIANT OF THE B1 DOMAIN FROM STREPTOCOCCAL PROTEIN G
1GB1 | 1991-05-15 | 0 | A NOVEL, HIGHLY STABLE FOLD OF THE IMMUNOGLOBULIN BINDING DOMAIN OF STREPTOCOCCAL PROTEIN G
1IGC | 1994-08-05 | 2.6 | IGG1 FAB FRAGMENT (MOPC21) COMPLEX WITH DOMAIN III OF PROTEIN G FROM STREPTOCOCCUS
1IGD | 1994-08-05 | 1.1 | THE THIRD IGG-BINDING DOMAIN FROM STREPTOCOCCAL PROTEIN G: AN ANALYSIS BY X-RAY CRYSTALLOGRAPHY OF THE STRUCTURE ALONE AND IN A COMPLEX WITH FAB
1LE3 | 2002-04-09 | 0 | NMR Structure of Tryptophan Zipper 4: A Stable Beta-Hairpin Peptide Based on the C-terminal Hairpin of the B1 Domain of Protein G
1MPE | 2002-09-12 | 0 | Ensemble of 20 structures of the tetrameric mutant of the B1 domain of streptococcal protein G
1MVK | 2002-09-25 | 2.5 | X-ray structure of the tetrameric mutant of the B1 domain of streptococcal protein G
1PGA | 1993-11-23 | 2.07 | TWO CRYSTAL STRUCTURES OF THE B1 IMMUNOGLOBULIN-BINDING DOMAIN OF STREPTOCOCCAL PROTEIN G AND COMPARISON WITH NMR
1PGB | 1993-11-23 | 1.92 | TWO CRYSTAL STRUCTURES OF THE B1 IMMUNOGLOBULIN-BINDING DOMAIN OF STREPTOCOCCAL PROTEIN G AND COMPARISON WITH NMR
1PGX | 1992-04-03 | 1.66 | THE 1.66 ANGSTROMS X-RAY STRUCTURE OF THE B2 IMMUNOGLOBULIN-BINDING DOMAIN OF STREPTOCOCCAL PROTEIN G AND COMPARISON TO THE NMR STRUCTURE OF THE B1 DOMAIN

Relevant UniProtKB Entries

Percent Identity | Matching Chains | Protein | Accession | Entry Name
100.0 | | Immunoglobulin G-binding protein G | P06654 | SPG1_STRSG
100.0 | | Immunoglobulin G-binding protein G | P19909 | SPG2_STRSG