Start a new submission by selecting Submission Tools and then clicking the Create a new submission icon on your Home page. You first select whether the data to be submitted comes from published or unpublished work, and then enter basic information about the study (title, authors, year, abstract, etc.). The Summary/Links page is then displayed; it summarizes the information entered so far and provides links to additional entry forms. These forms are used to enter details on the protein that was mutated, the type(s) of assay data and quantities reported, and the mutant sequences and their data values.
Data entry forms must be filled out for the protein that was mutated, for each assay or reported quantity, and for each data set of mutant sequences and data values.
A SAVE button is provided at the bottom of each data entry page; it saves the entries on that page and returns you to the Summary/Links page. If errors are detected or a required field is empty, you are warned and/or directed to the item that must be completed before the page can be saved.
Sections B and C describe how to create a CSV file, how to fill out each of the data entry forms, and how to determine whether a reported quantity is experimental, derived, or computational, among other topics.
Note: Only entry fields marked "required" must be filled in. However, we strongly recommend completing the other fields as well.
User interface
Summaries of the information entered so far are displayed on the Summary/Links page so that you can check that your entries are correct. As described above, the Edit and Delete icons at the right allow you to make any needed changes or delete the associated entries entirely, and the View icon allows you to check mutant library data. The total # of Data Points for each data set/library is also shown so that you can confirm that all the data was included.
Editing in-progress submissions
You can leave ProtaBank at any time during the submission process and return later to finish the entries or make changes (all saved entries are retained). Click the See my in progress submissions icon under Submission Tools on your Home page, then select the desired Study ID from the list. You can also delete an in-progress submission by clicking the Delete icon in the In Progress Submissions list.
Submitting the study and specifying when it will be available to the public
After completing and reviewing the entries for all the data sets, you can submit your study by clicking the Submit Study to Database button at the bottom of the Summary/Links page. You will be asked to specify the date on which the study will become publicly available. This date can be at most six months from the current date. If you want the embargo to extend beyond this period, contact ProtaBank support.
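If you want to check a planned release date against the six-month limit before submitting, the sketch below shows one way to do it. It is a minimal illustration, assuming that "six months" means six calendar months from today, clamped to the last day of a shorter month; ProtaBank's own date handling may differ, and the function names are hypothetical.

```python
from datetime import date
import calendar


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to that month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def within_embargo_limit(release: date, today: date, max_months: int = 6) -> bool:
    """True if `release` is no later than `max_months` calendar months after `today`.

    Only the upper limit stated in the guide is checked here.
    """
    return release <= add_months(today, max_months)
```

For example, starting from 31 August, the latest date this sketch accepts is 28 February of the following year (29 February in a leap year).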
Validating the data
Automated tests are then performed to ensure data integrity, and the submitter is warned immediately if any errors are detected. ProtaBank developers also check studies manually and send potential errors back to the submitter for review. If needed, you can provide the developers with feedback or additional information. Once studies pass these validation and curation steps, they are included in the ProtaBank database and made available to other users for viewing, searching, etc.
Note: Once a study has been submitted, it cannot be revised.
Before filling out the entry forms, it is useful to organize your data (i.e., decide how many data sets will be entered and what information will be included in each), and get it into the proper format (i.e., prepare a CSV file for each data set).
Identifying the data sets
Many studies describe their results in a set of tables, with each table reporting results from a different set of assays or from experiments or analyses designed to answer different questions. Frequently, data is obtained for a large set of variants, and then additional or more extensive experiments are carried out on a subset of these (e.g., the hits). Thus, it often makes sense to provide the data in each table as a separate data set or library. Decide which data sets you will submit before you begin, because a separate entry form must be filled out for each.
The reported quantities
The reported quantities are protein properties that were: (1) obtained experimentally (raw data or parameters fit to raw data), (2) obtained from computational modeling/simulations, or (3) derived from other reported quantities (e.g., via subtraction or division). Typically, each quantity corresponds to one of the columns in a results table or spreadsheet and must be matched with the data in your CSV file. An entry form must be filled out for each quantity to specify the property and to provide details on the experimental techniques and conditions used, to describe the computational protocols employed, or to indicate how the quantity was derived. This information will be useful in comparing and analyzing ProtaBank data.
Preparing the CSV file(s)
Tabular data stored in a spreadsheet can be saved in CSV (comma separated values) file format for upload to ProtaBank. Unless you only have a handful of variants, uploading a CSV file is preferred over entering the data manually. Create a separate CSV file for each data set.
File format: A new line is used for each mutant sequence, with each of the data values separated by a comma. Typically the mutant is specified first (in column 1), followed by its associated experimental assay/computational protocol/derived data (in columns 2, 3, 4, etc.). You may keep comments, labels, headers etc. in your CSV file in additional columns.
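As a minimal sketch of this layout, the Python code below uses the standard csv module to write a header row, one line per mutant with the mutant in column 1, data columns after it, and an extra comment column. The column names, values, and "A23G"-style mutant labels are hypothetical placeholders, not a required ProtaBank syntax.

```python
import csv
import io

# Hypothetical example: mutant first, then one column per assay/derived
# quantity/protocol (here Tm and its standard error), then a free-text comment.
header = ["Mutant", "Tm", "Tm SE", "Comment"]
rows = [
    ["A23G", "55.2", "0.4", "hit from screen"],
    ["L45P", "unfolded", "NA", ""],
]


def to_csv_text(header: list, rows: list) -> str:
    """Serialize a header plus data rows to CSV text, one mutant per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(header, rows)
```

To write a file instead of a string, open it with `open("dataset1.csv", "w", newline="")` (a hypothetical file name) and pass the file object to `csv.writer`.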
Data can be numerical, given as a range or limit (e.g., 20–30, >99), or qualitative (e.g., text such as "unfolded" can indicate that the protein was unfolded, and "ND" or "NA" can indicate that a value was not determined or does not apply). Such "negative data" is informative and is encouraged, rather than leaving the data field blank. If your data includes standard errors or standard deviations, enter these in a separate data column in the CSV file. See FAQs for details and CSV file examples.
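Before uploading, it can help to scan each data cell and flag blanks, since the guide encourages an explicit "ND" or "NA" instead. The sketch below is an illustrative local check, not a ProtaBank validator: the category names are hypothetical, and it assumes ranges are written with a hyphen or en dash and limits with <, >, <=, or >=.

```python
import math
import re

# Two numbers joined by a hyphen or en dash, e.g. "20-30" or "20–30".
RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*[-\u2013]\s*(-?\d+(?:\.\d+)?)\s*$")
# An inequality followed by a number, e.g. ">99" or "<=0.5".
LIMIT = re.compile(r"^\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def classify_value(cell: str) -> str:
    """Label a data cell as 'blank', 'number', 'range', 'limit', or 'qualitative'."""
    text = cell.strip()
    if not text:
        return "blank"
    try:
        value = float(text)
    except ValueError:
        value = None
    # float() also accepts "nan"/"inf"; treat those as text rather than numbers.
    if value is not None and math.isfinite(value):
        return "number"
    if RANGE.match(text):
        return "range"
    if LIMIT.match(text):
        return "limit"
    return "qualitative"
```

Any cell labeled "blank" is a candidate for an explicit "ND" or "NA" entry.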
Information on the study is requested when creating a new submission. You must select one of two buttons to specify whether the data being submitted is from a published study or from unpublished work, then fill in the details on the data entry form that comes up.
For published studies, you can enter the PubMed ID and then click the Fetch Publication Details by ID button to have all the details entered automatically, or you can fill in each of the fields by hand (Title, Authors, Journal, Year, Abstract, etc.). The Abstract should describe the major goals and results of the study.
For unpublished work, a title is required, along with the investigators who worked on the study, the laboratory or organization where the work was done, and the date when the data was collected. An abstract is recommended.
A Study ID is automatically assigned, and the Submission Date, Submitter (login name), and Version are recorded. When the study is submitted, the Version becomes 1; before that, it reads "not submitted." Submitting the study is what allows it to become searchable in the database.
Protein Details
Use the Add Protein link to bring up the entry form describing the protein that was mutated. If you enter the UniProt ID or the PDB ID in the field at the top, you can then click the appropriate Fetch by … button to have the rest of the details entered automatically, or you can fill in the fields by hand. The common name of the protein and/or the domain or fragment that was engineered should be entered under Protein Name. The Organism refers to the species the protein is found in (not the host it may have been expressed in). If one or more protein structures were used in the design, enter the PDB ID and optional Chain identifier for each, separated by commas (no spaces). The UniProt accession number and protein Sequence can be included if desired, but are not needed.
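A list of structures copied from elsewhere often contains stray spaces. The sketch below tidies such a list into the comma-separated, no-spaces form described above. It assumes classic four-character PDB IDs (a digit 1–9 followed by three letters or digits); any chain identifier after the ID is kept exactly as typed because its exact format is not specified here, and the function name is hypothetical.

```python
import re

# Assumption: classic 4-character PDB ID, a digit 1-9 followed by 3 alphanumerics.
# Anything after the ID (e.g. a chain identifier) is left unchecked.
PDB_PREFIX = re.compile(r"^[1-9][A-Za-z0-9]{3}")


def format_pdb_field(raw: str) -> str:
    """Join entries with commas and no surrounding spaces; reject entries without a PDB ID."""
    entries = [part.strip() for part in raw.split(",") if part.strip()]
    for entry in entries:
        if not PDB_PREFIX.match(entry):
            raise ValueError(f"not a PDB ID: {entry!r}")
    return ",".join(entries)
```

For example, `"1ABC, 2XYZ ,"` becomes `"1ABC,2XYZ"` (placeholder IDs).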
Note: If you use the UniProt ID to fetch the data, all the PDB IDs associated with the protein will be listed. Only those used in your study should be kept, so delete the rest. Also, the fetched sequence will be for the whole protein. If your mutations were made on a particular chain, domain, or fragment, list only the sequence of that portion.
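Trimming the fetched sequence to the mutated portion is a simple slice, but it is easy to be off by one. A minimal sketch, assuming the domain boundaries are given as 1-based, inclusive residue numbers counted from the first residue of the fetched sequence (the function name and example sequence are hypothetical):

```python
def extract_fragment(full_sequence: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) of a protein sequence."""
    if not 1 <= first <= last <= len(full_sequence):
        raise ValueError("residue range lies outside the sequence")
    # Python slices are 0-based and exclude the end index, hence first - 1 and last.
    return full_sequence[first - 1:last]
```

For example, `extract_fragment("MKTAYIAKQR", 3, 6)` returns `"TAYI"`.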
Sequence Display. If a sequence was entered in the Sequence field, it will also be displayed below in a form that makes it easier to check for mistakes (with numbers indicating residue positions and residues color-coded by amino acid type); a grey slider bar is provided that can be moved to view different segments along the sequence.
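If you prefer to proofread a sequence offline before pasting it in, a numbered plain-text layout gives a similar check (without the color coding). The sketch below is a hypothetical helper, not part of ProtaBank; it prints the sequence in blocks, with each line prefixed by the position of its first residue, and works equally for DNA.

```python
def numbered_blocks(sequence: str, block: int = 10, per_line: int = 5, start: int = 1) -> str:
    """Format a sequence in fixed-size blocks, each line led by its first residue number."""
    lines = []
    width = block * per_line
    for offset in range(0, len(sequence), width):
        chunk = sequence[offset:offset + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{start + offset:>6} " + " ".join(blocks))
    return "\n".join(lines)
```

The `start` parameter lets the numbering begin at a residue other than 1, for example when only a fragment is shown.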
Multiple proteins: If you are submitting mutation data for more than one protein in this study, you should fill out a separate entry form for each. The mutational data for each protein must also be specified in a separate data set.
Expression Details
Clicking the icon under Expression Details brings up a form for entering this type of information. Details on what's needed in each field are provided in the hover text. The only required field is DNA Sequence, which holds the DNA sequence encoding the expressed protein, including tags. The DNA sequence can be entered automatically by entering the GenBank accession number in the field at the top and clicking Fetch by GenBank Acc. No. The DNA sequence is also displayed at the bottom in a numbered, color-coded format that makes it easier to check for mistakes, similar to that described above for protein sequences.
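One way to catch a mistyped DNA Sequence is to translate it and compare the result with the expressed protein sequence (including tags). This is a swapped-in local check, not a ProtaBank feature; the sketch assumes the standard genetic code and a sequence that starts in frame, and `translate` is a hypothetical helper name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in codon order TTT, TTC, TTA, TTG, TCT, ..., GGG; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding DNA sequence into one-letter amino acid codes."""
    seq = "".join(dna.split()).upper()  # drop whitespace and normalize case
    if len(seq) % 3:
        raise ValueError("length is not a multiple of 3")
    bad = set(seq) - set(BASES)
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))
```

A mismatch between the translation and the protein sequence you entered usually points to a frame shift, a missing tag, or a typo in one of the two sequences.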
Although useful, this information is currently not required by ProtaBank search and analysis tools, which are based solely on the protein sequence (without any expression or other tags attached).
Assays (Experimental, Computational, or Derived)
Use the Add Assay link to bring up the entry form describing the quantity being submitted, which includes the property reported, as well as the units, source, techniques, and conditions used to obtain the quantity. A separate form must be filled out for each quantity submitted. The forms differ somewhat depending on the source (experimental measurement, computational/simulated result, or derived from a previously reported assay). Fill out assay forms for experimental and computational/simulated quantities first, as this information will be required to specify any derived quantities.
For all entry forms, a Fill in Details from Previously Entered Assay field appears at the top that lists all the assays already entered for the study (if any). This can be used in conjunction with the Fetch Assay Details button to autofill the fields with information from a previously entered assay, which can be useful if the information overlaps significantly.
A separate entry form must be filled out for each data set, and links to two different entry forms are provided. The Add/Edit/Remove Individual Sequence Data link is intended primarily for de novo designed sequences; each sequence is listed in full (referred to as individual sequence data). The Add Mutational Data link is for mutants obtained by mutating a given protein (referred to as mutational data). In this case, you can list just the amino acids at the position(s) that were mutated, or enter the entire sequence of each mutant instead if desired.
Generating a CSV file template. ProtaBank can automatically generate a comma separated values (CSV) file template from the assay information that you have entered (the Name of each assay heads a column). Click the here link in the Mutational Data section on the Summary/Links page to download the template.
Add Mutational Data link. Entries are required for all fields on this entry form; details on what's needed in each field are provided in the hover text.
Fields include: what's in the data set (Description); the name of the Protein mutated; the sequence that was mutated (Starting Sequence, not always WT); the Syntax used to specify the mutant sequences; if the Range or List syntax is used, the positions mutated (Mutated Residues); and the Index of the Initial Residue.
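The relationship between the Starting Sequence, the mutated positions, and the Index of the Initial Residue can be illustrated by rebuilding a full variant sequence. The sketch below assumes that the Index of the Initial Residue is the number assigned to the first residue of the Starting Sequence, and it uses a hypothetical "K5A"-style notation (wild-type residue, position, new residue) that is not necessarily one of ProtaBank's Syntax options.

```python
import re

# Hypothetical notation for illustration only: wild-type residue, position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(starting_sequence: str, mutations: list[str], initial_index: int = 1) -> str:
    """Return the full variant sequence; positions are numbered from initial_index."""
    residues = list(starting_sequence)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if not match:
            raise ValueError(f"cannot parse {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        i = position - initial_index  # convert residue number to a 0-based list index
        if not 0 <= i < len(residues):
            raise ValueError(f"position {position} is outside the starting sequence")
        if residues[i] != wild_type:
            raise ValueError(f"{mutation}: starting sequence has {residues[i]} at {position}")
        residues[i] = new
    return "".join(residues)
```

Checking the wild-type residue against the Starting Sequence is a cheap way to detect a wrong Index of the Initial Residue, since an offset error usually makes several mutations fail at once.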
Additional fields are provided that allow you to specify the actual mutant sequences and their associated data values. These fields vary depending on whether you are uploading your data from a CSV file (recommended) or entering it manually. Click the Upload a CSV file button or the Manually input mutant data button to display these fields.
The CSV file is uploaded using the Choose File button. The fields at the bottom (Column and Assay/Derived Quantity/Protocol) are then used to match the columns in the CSV file to the appropriate quantity (Name of the experimental assay, derived quantity, or computational protocol) entered previously. Click add another as needed to specify the data in each column. If a CSV template file was used without modification (the assay names are still in the same columns), you can use the Fetch Template Description button to populate these fields automatically.
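If you edited the template or built your own file, you can work out the column-to-assay matching before filling in the form. The sketch below assumes the first row of the CSV file holds the assay Names exactly as entered (as in the generated template) and that columns are counted from 1; both are assumptions, and `match_columns` is a hypothetical helper.

```python
import csv
import io


def match_columns(csv_text: str, assay_names: list[str]) -> dict[str, int]:
    """Map each assay Name to the 1-based column whose header matches it exactly."""
    header = next(csv.reader(io.StringIO(csv_text)))
    positions = {name.strip(): index for index, name in enumerate(header, start=1)}
    missing = [name for name in assay_names if name not in positions]
    if missing:
        raise ValueError(f"no column headed {missing}")
    return {name: positions[name] for name in assay_names}
```

A `ValueError` here usually means a header was renamed or mistyped, which is also the case where Fetch Template Description cannot be relied on.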
Checking for errors. Clicking SAVE at the bottom of the input form brings up a Library Details window that summarizes the data entered so you can check for errors. This includes a library data table for the first 100 variants, which lists the mutations (Mutant Description), data value (Data), Units, name of the assay (Assay), and Full Sequence for each variant. If errors are detected, you can return to the input form by clicking on the continue editing link at the top of the window. If no errors are noted, click the Finished, Return to Study Page link in the top right corner.
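Because the Library Details table lists only the first 100 variants, a local count of filled cells per column gives an independent figure to compare with the # of Data Points on the Summary/Links page. This sketch assumes a header row and the mutant in column 1; whether ProtaBank counts entries such as "ND" as data points is an assumption here (the sketch counts every non-blank cell), and comment columns should be ignored when comparing.

```python
import csv
import io


def count_data_points(csv_text: str) -> dict[str, int]:
    """Count non-blank cells under each header, skipping the first (mutant) column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    counts = {name: 0 for name in header[1:]}
    for row in body:
        # zip stops at the shorter list, so short or empty rows contribute fewer cells.
        for name, cell in zip(header[1:], row[1:]):
            if cell.strip():
                counts[name] += 1
    return counts
```

Summing the counts for the columns you matched to assays gives the total to compare against the displayed value.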