Protein model discrimination using mutational sensitivity derived from deep sequencing.


A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with residue depth, and to identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4 Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain the relative activity of each mutant in a large pool and to derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout.
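The idea of using a per-residue score that correlates with residue depth to pick out good models can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the toy numbers, and the choice of Spearman rank correlation are all assumptions made for illustration, and the sketch assumes that a higher score goes with a more deeply buried residue (flip the sort order if the convention is the opposite).

```python
import math

def _ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / math.sqrt(vx * vy)

def rank_models(residue_scores, model_depths):
    """Sort candidate models by how well per-residue depth tracks the score."""
    corr = {name: spearman(residue_scores, depths)
            for name, depths in model_depths.items()}
    return sorted(corr.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-residue scores and depths (in Å) for two candidate models.
residue_scores = [1, 5, 3, 8, 2]
model_depths = {
    "model_a": [3.5, 6.0, 4.8, 9.1, 4.0],
    "model_b": [9.0, 4.0, 6.5, 3.6, 5.0],
}
ranked = rank_models(residue_scores, model_depths)
for name, rho in ranked:
    print(name, round(rho, 3))
```

In practice the depths would be computed from each candidate structure, and a correlation cutoff or a combination with other terms would be needed to decide which models to accept; neither is specified in this record.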

Submission Details

ID: oRfaxKZo

Submitter: Shu-Ching Ou

Submission Date: March 1, 2019, 4:11 p.m.

Version: 1

Publication Details
Adkar BV, Tripathi A, Sahoo A, Bajaj K, Goswami D, Chakrabarti P, Swarnkar MK, Gokhale RS, Varadarajan R. Protein model discrimination using mutational sensitivity derived from deep sequencing. Structure (2012). PMID: 22325784
Additional Information

The ccdB gene from each pooled set of plasmids and from the master pool was amplified using forward and reverse primers that contained multiplex identifier (MID) sequences (10 bases long) unique to each condition. Amounts of purified PCR product: ∼122 ng for MID 1 and ∼61 ng each for MID 2–8.

Relevant UniProtKB Entries

Percent Identity  Matching Chains  Protein     Accession  Entry Name
100.0                              Toxin CcdB  Q52042     CCDB3_ECOLX
100.0                              Toxin CcdB  Q52043     CCDB4_ECOLX
100.0                              Toxin CcdB  P62555     CCDB_ECO57
100.0                              Toxin CcdB  P62554     CCDB_ECOLI
98.0                               Toxin CcdB  Q46996     CCDB2_ECOLX