Help

General Advice

PDBMD2CD is the updated version of our web server for calculating circular dichroism (CD) spectra of proteins from their atomic coordinates. It uses a combination of a least squares approach and linear fitting of reference spectra of known secondary structure. The web server accepts PDB codes, PDB files or compressed archives of PDB files as input. Multiple structures or codes can be submitted at once. NMR structures can also be analysed on a model-by-model basis.

Example run

The button below will run a prediction for 383 structures that include a lysozyme molecule. This total includes NMR structures, whose models will be assessed individually. The input can be found here.

When you get to the results page, the Compare to Experiment tool has an example spectrum file available - this spectrum is of hen egg lysozyme in water. Try it with the above example data.

Server Input

- Filename: If you have your own PDB file/s or archive (.tar.gz/.zip/.bz2) of PDB files, click on browse and upload your file to the server.

- Give a PDB Code/s: Provide the 4-character PDB code/s of the protein/s you want to predict. To submit multiple codes, enter the PDB codes separated by commas, like so: "XXXX, YYYY, ZZZZ".

- Split NMR models?: Tick this box to analyse each model of an NMR structure (or of any PDB file with multiple MODEL entries, e.g. a PDB MD trajectory created using GROMACS) individually. If left unticked, only the first model in the file will be used in the analysis. The individual models split out of the structure will be analysed as PDB format files with their model number (starting from 0) appended to the file name. For example, XXXX.pdb with 10 models will be referred to as XXXX_0.pdb, XXXX_1.pdb, and so on up to XXXX_9.pdb.
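The naming of split models can be illustrated with a short Python sketch. This is not the server's own code: the helper name `split_models` is hypothetical, and it assumes models are delimited by standard MODEL/ENDMDL records.

```python
from pathlib import Path


def split_models(pdb_text, filename):
    """Split a multi-MODEL PDB string into one entry per model.

    Hypothetical helper: keys follow the XXXX_0.pdb, XXXX_1.pdb naming
    described above (zero-based model index appended to the file stem).
    """
    stem = Path(filename).stem
    models = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []                  # start collecting a new model
        elif record == "ENDMDL":
            if current is not None:
                name = f"{stem}_{len(models)}.pdb"
                models[name] = "\n".join(current) + "\n"
            current = None
        elif current is not None:
            current.append(line)          # coordinate line inside a model
    return models


# Two tiny, made-up models, just to show the naming
example = (
    "MODEL        1\nATOM      1  N   MET A   1\nENDMDL\n"
    "MODEL        2\nATOM      1  N   MET A   1\nENDMDL\n"
)
names = list(split_models(example, "XXXX.pdb"))
```

Here `names` holds one file name per model, numbered from 0 in the order the models appear in the file.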

If you have many structures to process, for best performance upload an archive of files rather than a list of codes. The biggest bottleneck in speed for PDBMD2CD is fetching files from the RCSB.
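One way to prepare such an archive locally is sketched below in Python, using only the standard library. The function name `bundle_pdb_files` and the default archive name are illustrative choices, not part of PDBMD2CD.

```python
import tarfile
from pathlib import Path


def bundle_pdb_files(directory, archive_name="structures.tar.gz"):
    """Pack every .pdb file in `directory` into a gzip-compressed tar archive.

    Returns the sorted list of file names that were added.
    """
    pdb_paths = sorted(Path(directory).glob("*.pdb"))
    with tarfile.open(archive_name, "w:gz") as archive:
        for path in pdb_paths:
            # arcname stores the bare file name, without local directories
            archive.add(path, arcname=path.name)
    return [path.name for path in pdb_paths]
```

Calling `bundle_pdb_files("my_structures")` would write `structures.tar.gz` to the current directory, ready to upload through the Filename field.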

Results Page

The results page has three tabs - "Results", "Clustering" and "Compare to Experiment".

Results Tab

This is the main results page for your job. It is split into four panes:

- Top Left: A plot of the predicted spectra. If more than one structure is submitted, the individual spectra are shown in blue and the mean spectrum in red. Mousing over any of the spectra will show you the name of the structure it refers to and the wavelength and CD signal at that point. Tools to save the figure as a PNG, zoom, pan and reset the plot view can be found on the right of the plot. Below the plot is a link to download a ZIP file containing the spectra for all the files uploaded as well as the mean spectrum of the ensemble.

- Top Right: An interactive 3D representation of the structure whose predicted spectrum is closest to the mean spectrum.

- Bottom Left: A summary of the run, e.g. number of files, date, average RMSD. The average secondary structure makeup of the uploaded structures is also tabulated. A link to the representative structure (i.e. the structure whose spectrum is closest to the mean of the ensemble) is also given. If only one structure is submitted, the representative structure will, unsurprisingly, have the same spectrum as the mean.

- Bottom Right: The secondary structure map of the reference data set is shown as blue circles on the scatter plot and the input proteins as orange circles. Mousing over any of the data points will show you the name of the structure and its alpha/beta content as percentages. The plot can be saved as a PNG from the tool menu to the right of the plot.
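The mean spectrum and the representative structure described above can be sketched in a few lines of Python. This is an illustration only: it assumes all spectra are sampled at the same wavelengths and that "closest" is measured by RMSD, which this page does not state explicitly. The function names and the CD values are made up.

```python
import math


def mean_spectrum(spectra):
    """Point-by-point mean of spectra sampled at the same wavelengths."""
    curves = list(spectra.values())
    return [sum(values) / len(curves) for values in zip(*curves)]


def rmsd(a, b):
    """Root-mean-square deviation between two equally sampled spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def representative(spectra):
    """Name of the structure whose spectrum is closest to the ensemble mean."""
    mean = mean_spectrum(spectra)
    return min(spectra, key=lambda name: rmsd(spectra[name], mean))


# Made-up CD values at three shared wavelengths
spectra = {
    "XXXX_0.pdb": [1.0, -2.0, -4.0],
    "XXXX_1.pdb": [2.0, -3.0, -5.0],
    "XXXX_2.pdb": [6.0, -1.0, -3.0],
}
ensemble_mean = mean_spectrum(spectra)
closest = representative(spectra)
```

With a single structure the mean is that structure's own spectrum, so it is trivially its own representative.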

Clustering Tab

If more than 50 structures are uploaded, we provide a tool for clustering the predictions using the k-means algorithm. This partitions the data into k clusters by minimising the distortion, which (in our chosen implementation) is the average Euclidean distance between the observations and their corresponding centroids. This can allow the identification of groups in your data that correspond to structural differences, e.g. ligand-bound or apo protein, or folded vs unfolded populations.
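The distortion as defined above can be written out directly. The Python sketch below is illustrative rather than the server's implementation: it treats each observation's "corresponding" centroid as its nearest one, and the observations are made-up 2D points standing in for spectra.

```python
import math


def distortion(observations, centroids):
    """Mean Euclidean distance from each observation to its nearest centroid."""
    total = 0.0
    for obs in observations:
        total += min(math.dist(obs, c) for c in centroids)
    return total / len(observations)


# Two obvious groups of made-up points
observations = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = distortion(observations, [(5.0, 1.0)])
two_clusters = distortion(observations, [(0.0, 1.0), (10.0, 1.0)])
```

Placing one centroid in each group gives a much smaller distortion than a single centroid between them, which is the quantity k-means drives down.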

To assist the user in picking a value of k, we provide an elbow plot, which plots the distortion for values of k from 1 to 6. A clear "elbow" in the shape of the plot suggests that the value of k where it occurs might be a good number of clusters to choose for the analysis. Conversely, an elbow plot with no clear elbow, or one that is linear throughout, suggests the data is not suitable for clustering with k in the range 1 to 6. We would like to stress that clustering can be a complicated analysis which is very sensitive to your specific dataset, and it is easy to ascribe meaning to meaningless groupings - so examine the clusters this tool produces and ensure that the information you are getting out makes sense in light of your specific data and predictions.
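To make the elbow idea concrete, the sketch below computes the distortion for k from 1 to 6 on a small made-up dataset. It is a simplified stand-in, not the server's code: it uses plain Lloyd's k-means with a deterministic farthest-point seeding (an assumption made here so the example is reproducible), and all function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans(observations, k, iterations=50):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [observations[0]]
    while len(centroids) < k:
        # seed the next centroid at the observation farthest from existing ones
        centroids.append(max(
            observations,
            key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in observations:
            groups[nearest(point, centroids)].append(point)
        # move each centroid to the mean of its group (keep it if group is empty)
        updated = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else old
            for group, old in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def elbow_curve(observations, k_values=range(1, 7)):
    """Distortion (mean distance to the nearest centroid) for each k."""
    curve = {}
    for k in k_values:
        centroids = kmeans(observations, k)
        curve[k] = sum(
            min(math.dist(p, c) for c in centroids) for p in observations
        ) / len(observations)
    return curve


# Two well-separated groups of made-up points
observations = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
curve = elbow_curve(observations)
```

For this dataset the distortion falls sharply between k=1 and k=2 and only slightly afterwards, which is the "elbow" shape that points to two clusters.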

At the very top of this tab, a drop down menu is available which allows you to select the value of k you are interested in. Changing this value will automatically change the information displayed below.

Below the dropdown, the CD spectra of the predictions can be seen, coloured according to the cluster they belong to. Mousing over the traces will provide information on the data shown. To the right of this plot the elbow plot described above is shown. The current k selection is indicated by a red circle on this plot.

Below these plots, information about individual clusters will be shown. Each cluster info panel will have a title indicating which cluster it refers to. On the left hand side a plot of the predicted spectra belonging to the cluster is shown, coloured according to the colour scheme used in the CD spectra plot above. The mean prediction will also be shown on this plot. As with the main results page, statistics about the cluster members, including average secondary structure content, are shown. To the right an interactive 3D visualisation of the structure with the prediction closest to the mean prediction is shown. For k>1, subsequent cluster information panels will be shown in list form, one after another down the page. The clusters will always be listed in size order, i.e. the cluster with the largest number of members will be shown first. For each cluster, the predictions and cluster members can be downloaded as a .csv file.

Compare to Experiment

This page allows the predicted spectra to be compared with an experimental spectrum uploaded by the user. The user file can be uploaded in a two-column format (Wavelength and CD data), but many of the standard text-based output formats from the main CD manufacturers and SRCD beamlines can also be interpreted. Uploaded spectra can be in either Delta Epsilon units (the default) or Mean Residue Ellipticity, chosen by selecting a radio button. Spectra uploaded as Mean Residue Ellipticity are converted into Delta Epsilon units, as the predictions are generated in these units.
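For users preparing their own files, the sketch below reads a two-column spectrum and converts Mean Residue Ellipticity to Delta Epsilon. It relies on the standard relation [θ] = 3298.2 × Δε (with [θ] in deg cm² dmol⁻¹ and Δε in M⁻¹ cm⁻¹). The exact factor and parsing rules used by the server are not stated on this page, so treat both functions as assumptions for illustration.

```python
# [theta]_MRE (deg cm^2 dmol^-1) = 3298.2 * delta epsilon (M^-1 cm^-1)
MRE_PER_DELTA_EPSILON = 3298.2


def read_two_column(text):
    """Parse 'wavelength value' rows, skipping blank and non-numeric lines."""
    rows = []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        try:
            rows.append((float(parts[0]), float(parts[1])))
        except (IndexError, ValueError):
            continue  # header, comment or empty line
    return rows


def mre_to_delta_epsilon(mre_values):
    """Convert Mean Residue Ellipticity values to Delta Epsilon units."""
    return [value / MRE_PER_DELTA_EPSILON for value in mre_values]


# Made-up two-column file with a header row
rows = read_two_column("Wavelength CD\n190 -32982\n191 6596.4\n")
delta_epsilon = mre_to_delta_epsilon([cd for _, cd in rows])
```

The wavelengths are left untouched; only the CD column is rescaled.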

The RMSD between the experimental spectrum and each predicted CD spectrum is used as the measure for comparison. The user may select a representative set of predictions closest to the experimental spectrum either by an RMSD threshold or by a number of predicted spectra. The RMSD threshold, which may be modified by the user, is initially set to 0.5, or to half the total range if the maximum RMSD value is smaller than 0.5. Whichever selection approach is chosen, the other associated value is updated as each comparison is made. Each comparison generates a subset of the predicted spectra as its result.
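The two selection modes can be sketched as follows. This Python example is an illustration under stated assumptions: predicted and experimental spectra share the same wavelength grid, the function names are hypothetical, and the values are made up. It also shows how choosing one value (threshold or count) determines the other.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equally sampled spectra."""
    squared = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared / len(experimental))


def rank_by_rmsd(predictions, experimental):
    """(name, RMSD) pairs sorted from closest to furthest from experiment."""
    scored = [(name, rmsd(values, experimental))
              for name, values in predictions.items()]
    return sorted(scored, key=lambda item: item[1])


def subset_by_threshold(ranked, threshold):
    """All predictions whose RMSD is at or below the threshold."""
    return [item for item in ranked if item[1] <= threshold]


def subset_by_count(ranked, count):
    """The `count` predictions closest to the experimental spectrum."""
    return ranked[:count]


experimental = [1.0, -2.0, -4.0]
predictions = {
    "a.pdb": [1.0, -2.0, -4.0],
    "b.pdb": [1.3, -2.3, -4.3],
    "c.pdb": [2.0, -3.0, -5.0],
}
ranked = rank_by_rmsd(predictions, experimental)

by_threshold = subset_by_threshold(ranked, 0.5)
matching_count = len(by_threshold)          # count implied by the threshold

by_count = subset_by_count(ranked, 1)
matching_threshold = by_count[-1][1]        # threshold implied by the count
```

In both modes the subset is a prefix of the ranked list, so the largest RMSD in the subset and the number of members always describe the same selection.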

Below this section, on the left, is a plot displaying the currently chosen subset of spectra together with the experimental spectrum in red, the mean prediction of the subset in blue and the predicted spectrum closest to the experimental spectrum in green. To the right of this is a pane showing a histogram of the RMSD values of the predicted spectra from the experimental spectrum. This can be displayed in either of two forms: as the counts of spectra in binned blocks of RMSD, or as the cumulative sum of these counts. A solid red line indicates the position of the maximum RMSD value in the currently selected subset. Moving the cursor over the plot generates a dashed red line at the cursor position, with the values of RMSD, count and cumulative count displayed as the cursor moves, allowing a user to select a new threshold RMSD value for subset selection. Clicking the left mouse button then sets the new comparison position, and the solid red line moves to it.
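The two histogram forms are related as in the Python sketch below. The bin width of 0.25 and the RMSD values are arbitrary choices for illustration. The binning used by the server is not described on this page.

```python
from itertools import accumulate


def rmsd_histogram(rmsd_values, bin_width=0.25):
    """Counts per RMSD bin and their running (cumulative) total."""
    n_bins = int(max(rmsd_values) // bin_width) + 1
    counts = [0] * n_bins
    for value in rmsd_values:
        counts[int(value // bin_width)] += 1   # bin i covers [i*w, (i+1)*w)
    return counts, list(accumulate(counts))


# Made-up RMSD values of five predictions from the experimental spectrum
counts, cumulative = rmsd_histogram([0.05, 0.12, 0.18, 0.33, 0.95])
```

The cumulative form at a given RMSD tells you how many predicted spectra would fall inside the subset if that RMSD were chosen as the threshold.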

Below this pane the following details are listed: the experimental file name; the number of members in the subset of spectra currently displayed; the maximum RMSD threshold matching that subset; the mean RMSD of the subset; the name of the structure file in the subset that gives the predicted spectrum closest to the experimental one, and its RMSD; and the name of the structure file in the subset that gives the predicted spectrum furthest from the experimental one, and its RMSD. To the left of these details, and below the plot of the subset, is the mean 7-state secondary structure content of the current subset as percentages, with associated standard deviations. The names and predicted spectra of the structures in the subset can be downloaded as a .csv file.

Cite

If you use PDBMD2CD, please cite: TBD

In using PDB2CD please cite: Mavridis, L. and Janes, R.W. (2017) PDB2CD: A web-based application for the generation of circular dichroism spectra from protein atomic coordinates. Bioinformatics 33: 56-63.

Contact

Dr. Robert W. Janes - r.w.janes@qmul.ac.uk

Dr. Elliot D. Drew - e.drew@qmul.ac.uk
