Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization.


Abstract

There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. An MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane are prerequisites for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We used these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well.
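As a rough illustration of the kind of GP regression the abstract describes, the sketch below predicts a property of a chimera from a handful of measured chimeras. It is a minimal sketch under stated assumptions, not the authors' implementation: each chimera is written as a string of parent indices (one character per recombination block), the kernel is a simple fraction-of-matching-blocks similarity swapped in for whatever kernel the study actually used, the prior mean is zero, and the training values are made-up toy numbers. The names `hamming_kernel`, `solve`, and `gp_posterior_mean` are hypothetical.

```python
def hamming_kernel(a, b):
    """Fraction of blocks at which two chimeras inherit from the same parent.

    Assumption: this simple similarity stands in for the study's kernel.
    """
    if len(a) != len(b):
        raise ValueError("chimeras must have the same number of blocks")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior_mean(train_x, train_y, test_x, noise=0.1):
    """Zero-mean GP regression posterior mean: k_*^T (K + noise * I)^{-1} y."""
    n = len(train_x)
    K = [
        [hamming_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(K, train_y)
    return [
        sum(hamming_kernel(t, x) * a for x, a in zip(train_x, alpha))
        for t in test_x
    ]


# Toy data (hypothetical): 4-block chimeras of parents 0, 1, 2 with
# made-up measurement values, for illustration only.
train_x = ["0000", "1111", "2222", "0011"]
train_y = [1.0, 0.2, 0.5, 0.7]
predictions = gp_posterior_mean(train_x, train_y, ["0001", "2211"])
```

In practice, a library such as scikit-learn or GPyTorch would handle kernel hyperparameters and predictive variances; the hand-rolled solver here only keeps the sketch dependency-free.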

Submission Details

ID: uzNZojBD3

Submitter: Kevin Yang

Submission Date: Sept. 6, 2018, 10:45 a.m.

Version: 1

Publication Details

Bedbrook CN, Rice AJ, Yang KK, Ding X, Chen S, LeProust EM, Gradinaru V, Arnold FH. Structure-guided SCHEMA recombination generates diverse chimeric channelrhodopsins. Proc Natl Acad Sci U S A (2017). PMID: 28283661

Bedbrook CN, Yang KK, Rice AJ, Gradinaru V, Arnold FH. Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization. PLoS Comput Biol (2017). PMID: 29059183
