PyHDX web application
This section describes a typical workflow using the main web interface application. Detailed information on each parameter can be found in the Web Application Reference. The web application consists of a sidebar with controls and inputs, divided into sections, and a main view area with graphs and visualizations. We will go through the functionality of the web interface section by section.
Peptide d-uptake data can be added into the web application either in ‘Batch’ or ‘Manual’ mode. Use Input Mode to switch between input modes.
When using ‘Batch’, multiple measurements can be added quickly through a YAML file specification. An example of a YAML file can be found in ‘tests/test-data’ on GitHub.
When using ‘Manual’, each measurement and its experimental metadata must be entered manually. Use the Browse button to select peptide data files to upload. These should be ‘peptide master tables’: long-format data in which each entry has at least the following fields:
start (inclusive residue number at which the detected peptide starts, first residue = 1)
stop (inclusive residue number at which the detected peptide stops)
sequence (sequence of the peptide in one letter amino acid codes)
exposure (time of exposure to deuterated solution)
uptake (amount/mass (g/mol) of deuterium taken up by the peptide)
state (identifier of the ‘state’ the peptide/protein is in, i.e. ligands, experimental conditions)
Currently the only data format accepted is exported ‘state data’ from Waters DynamX, which is in .csv format. Exposure time units are assumed to be minutes. Support for other data formats can be added on request (e.g. HDExaminer).
Multiple files can be selected after which these files will be combined. Make sure there are no overlaps/clashes between ‘state’ entries when combining multiple files.
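As a pre-upload check, the required fields and the ‘state’ clash rule above can be sketched in plain Python. This is only an illustration: the column names follow the list above (real DynamX headers may be spelled differently), and `read_peptide_table` and `clashing_states` are hypothetical helpers, not part of PyHDX.

```python
import csv
import io

# Field names taken from the list above; actual export headers may differ.
REQUIRED_COLUMNS = {"start", "stop", "sequence", "exposure", "uptake", "state"}


def read_peptide_table(text):
    """Parse long-format peptide CSV text and check the required columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def clashing_states(tables):
    """Return the state names that occur in more than one table."""
    seen, clashes = set(), set()
    for rows in tables:
        states = {row["state"] for row in rows}
        clashes |= states & seen
        seen |= states
    return clashes


header = "start,stop,sequence,exposure,uptake,state\n"
file_a = header + "1,8,MKTAYIAK,0.5,2.1,apo\n"
file_b = header + "1,8,MKTAYIAK,0.5,1.7,apo\n"
tables = [read_peptide_table(file_a), read_peptide_table(file_b)]
print(clashing_states(tables))  # {'apo'}
```

Here both files use the state name ‘apo’, so one of them would need to be renamed before the files are combined.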
Choose which method of back-exchange correction to use. The options are to use a fully deuterated (FD) sample or to set a fixed back-exchange percentage for all peptides. The latter method should only be used if no FD sample is available. The percentage to set here can be obtained by running a back-exchange control once on your setup.
When selecting FD Sample, use the fields FD State and FD Exposure to choose which peptides from the input should be used as FD control. Note that these peptides will be matched to the ones in the experiment and peptides without control will not be included.
Use the field Experiment State to choose the ‘state’ of your experiment. In ‘Experiment Exposures’ you can select which exposure times to include in the dataset.
In the Drop first entry you can choose the number of N-terminal residues of each peptide that should be ignored when calculating the maximum uptake for that peptide, as they are considered to fully exchange back. Prolines are ignored by default as they do not have exchangeable amide hydrogens.
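The effect of Drop first and of prolines on the maximum uptake can be illustrated with a short sketch. The function name `count_exchangeable` is hypothetical, and it only counts residues as described above; any further scaling (for example by the deuterium percentage) is deliberately left out.

```python
def count_exchangeable(sequence, drop_first=1):
    """Count the residues that contribute to the maximum uptake.

    The first `drop_first` residues are skipped, and prolines are skipped
    because they have no exchangeable amide hydrogen.
    """
    return sum(1 for residue in sequence[drop_first:] if residue != "P")


print(count_exchangeable("AKPLEPR", drop_first=1))  # 4
print(count_exchangeable("AKPLEPR", drop_first=2))  # 3
```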
Next, specify the percentage of deuterium in the labelling solution in the field Deuterium percentage. This percentage should be as high as possible, typically >90%.
Use the fields Temperature (K) and pH read to specify the temperature and pH at which the D-labelling was done. The pH is the value as read from the pH meter without any correction.
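Because the temperature field expects Kelvin, a labelling temperature noted in degrees Celsius needs converting first. A trivial helper (hypothetical name, not part of PyHDX) could look like this:

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature in degrees Celsius to Kelvin."""
    return temp_c + 273.15


# Labelling on ice versus at room temperature.
print(celsius_to_kelvin(0.0))
print(celsius_to_kelvin(25.0))
```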
The next fields, N term and C term, specify the residue number indices of the N-terminal and C-terminal residues, respectively. For the N-terminus this value is typically equal to 1, but if N-terminal affinity tags are used for purification it might be a negative number. The value specified should match the residue indices used in the input .csv file. The C-term value tells the software at which index the C-terminus of the protein is. This is needed because the protein may extend beyond the last residue included in any peptide, and the C-terminus exhibits different intrinsic rates of exchange, which must be taken into account. A sequence for the full protein (in the N-term to C-term range as specified) can be added to provide additional sequence information, but this is optional.
Finally, specify a name for the dataset (by default equal to the ‘state’ value) and press ‘Add dataset’ to add the dataset. Datasets currently cannot be removed; to start over, press the browser ‘refresh’ button.
The ‘Coverage’ figure in the main application area shows rectangles corresponding to the peptides of a single timepoint. Peptides are only included if they are present in all timepoints as well as in the fully deuterated control sample.
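The inclusion rule for peptides can be pictured as a set intersection. The sketch below assumes that a peptide is identified by its (start, stop, sequence) tuple; how PyHDX matches peptides internally is not specified here, and `common_peptides` is a hypothetical name.

```python
def common_peptides(timepoints, fd_control):
    """Return the peptides present in every timepoint and in the FD control.

    Each peptide is identified by a (start, stop, sequence) tuple.
    """
    shared = set(fd_control)
    for peptides in timepoints.values():
        shared &= set(peptides)
    return sorted(shared)


# Exposure time (minutes) -> peptides detected at that timepoint.
timepoints = {
    0.5: [(1, 8, "MKTAYIAK"), (5, 12, "YIAKQRQI")],
    5.0: [(1, 8, "MKTAYIAK"), (5, 12, "YIAKQRQI"), (10, 15, "RQISFV")],
    30.0: [(1, 8, "MKTAYIAK"), (10, 15, "RQISFV")],
}
fd_control = [(1, 8, "MKTAYIAK"), (5, 12, "YIAKQRQI"), (10, 15, "RQISFV")]
print(common_peptides(timepoints, fd_control))  # [(1, 8, 'MKTAYIAK')]
```

Only the first peptide survives, because the other two are each missing from one timepoint.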
By hovering the mouse over the peptides in the graph, more information is shown about each peptide:
peptide_id: Index of the peptide per timepoint starting at the first peptide at 0
start, end: Inclusive, exclusive interval of residue numbers in this peptide (taking N-terminal residues into account)
RFU: Relative fraction uptake of the peptide
D(corrected): Absolute D-uptake, corrected by FD control
sequence: FASTA sequence of the peptide. Non-exchanging N-terminal residues are marked as ‘x’ and prolines are in lower case.
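The sequence notation in the hover information can be reproduced with a small sketch. It assumes that dropped N-terminal residues take precedence over the proline marking; `mark_sequence` is a hypothetical helper, not a PyHDX function.

```python
def mark_sequence(sequence, drop_first=1):
    """Mark dropped N-terminal residues as 'x' and prolines as lower case."""
    marked = []
    for index, residue in enumerate(sequence):
        if index < drop_first:
            marked.append("x")  # considered to fully exchange back
        elif residue == "P":
            marked.append("p")  # no exchangeable amide hydrogen
        else:
            marked.append(residue)
    return "".join(marked)


print(mark_sequence("AKPLEPR", drop_first=1))  # xKpLEpR
```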
As a first step in the fitting procedure, initial guesses for the exchange kinetics need to be derived. This can be done through two options (Fitting model): ‘Half-life’ (fast but less accurate), or ‘Association’ (slower but more accurate).
Using the ‘Association’ procedure is recommended. This model fits two time constants the the weighted-averaged uptake kinetics of each residue. At Lower bound and Upper bound the bounds of these rate constants can be specified but in most cases the autosuggested bounds are sufficient. The bounds can be changed per dataset by using the Dataset field or for all datasets at the same time by ticking the Global bounds checkbox. Rarely issues might arise when the initial guess rates are close to the specified bounds at which point the bounds should be moved to contain a larger interval. This can be checked by comparing the fitted rates k1 and k2 ( ) Both rates and associated amplitudes are converted to a single rate value used for initial guesses. To calcualte guesses, select the model in the drop-down menu, assign a name to these initial guesses and the press ‘Calculate Guesses’. The fitting is done in the background. When the fitting is done, the obtained rate is shown in the main area in the tab ‘Rates’. Note that these rates are merely an guesstimate of HDX rates and these rates should not be used for any interpretation whatsoever but should only function to provide the global fit with initial guesses.
After the initial guesses are calculated we can move on the the global fit of the data. Details of the fitting equation can be found the PyHDX publication (currently `_ACS`_).
At Initial guess, select which dataset to use for initial guesses (typically ‘Guess_1’). Both previous fits (ΔG values) or estimated HX rates can be used as initial guesses. The initial guesses can be applied as ‘One-to-one’, where each protein state gets initial guesses derived from that state, or ‘One-to-many’, where one protein state is use as initial guesses for all states. Users can switch between both modes using Guess mode.
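The difference between the two guess modes comes down to which state each set of initial guesses is taken from. The sketch below uses made-up state names and numbers; `assign_guesses` and its `source_state` argument are hypothetical and only illustrate the mapping.

```python
def assign_guesses(guesses, states, mode, source_state=None):
    """Map each protein state to the initial guesses it should start from."""
    if mode == "One-to-one":
        # Every state starts from guesses derived from that same state.
        return {state: guesses[state] for state in states}
    if mode == "One-to-many":
        # A single state provides the guesses for all states.
        return {state: guesses[source_state] for state in states}
    raise ValueError(f"unknown guess mode: {mode}")


guesses = {"apo": [0.8, 1.2], "holo": [0.3, 0.9]}
states = ["apo", "holo"]
print(assign_guesses(guesses, states, "One-to-one"))
print(assign_guesses(guesses, states, "One-to-many", source_state="apo"))
```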
At Fit mode, users can choose either ‘Batch’ or ‘Single’ fitting. If only one dataset is loaded, only ‘Single’ is available. If ‘Single’ is selected, PyHDX will fit ΔG values for each dataset individually using the specified settings. In ‘Batch’ mode all data enters the fitting process at the same time. This allows for the use of a second regularizer between datasets. Note that when using ‘Batch’ mode, the relative magnitudes of the mean squared error losses and the regularizer might be different, such that ‘Batch’ fitting with r2 at zero is not identical to ‘Single’ fits.
The fields Stop loss and Stop patience control the fitting termination. If the loss improvement is less than Stop loss for Stop patience epochs (fit iterations), the fitting will terminate. Learning rate controls the step size per epoch. For a typical dataset with 62 peptides over 6 timepoints, the learning rate should be 50-100. Smaller datasets require larger learning rates and vice versa.
Momentum and Nesterov are advanced settings for the PyTorch optimizer.
The maximum number of epochs or fit iterations is set in the field Epochs.
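The interplay between Stop loss, Stop patience and Epochs can be sketched as a patience-based stopping rule. This is a simplified illustration that assumes the epochs with too little improvement must be consecutive; it is not the PyHDX fitting loop, and `run_until_converged` is a hypothetical name.

```python
def run_until_converged(losses, stop_loss, stop_patience, epochs):
    """Return the number of epochs run before the fit terminates.

    `losses` is an iterable with one loss value per epoch.
    """
    stalled = 0      # consecutive epochs with too little improvement
    previous = None
    completed = 0
    for completed, loss in enumerate(losses, start=1):
        if previous is not None:
            if previous - loss < stop_loss:
                stalled += 1
            else:
                stalled = 0
        previous = loss
        if stalled >= stop_patience or completed >= epochs:
            break
    return completed


losses = [10.0, 5.0, 2.0, 1.9, 1.89, 1.889, 1.8889]
print(run_until_converged(losses, stop_loss=0.05, stop_patience=2, epochs=100))  # 6
print(run_until_converged(losses, stop_loss=0.05, stop_patience=2, epochs=3))    # 3
```

In the first call the fit stops after two consecutive epochs that improve the loss by less than 0.05; in the second call the epoch limit is reached first.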
Finally, the fields Regularizer 1 and Regularizer 2 control the magnitude of the regularizers. Please refer to our ACS publication for more details. In short, r1 acts along consecutive residues and acts as a ‘smoothing’ along the primary structure. Higher values give a more smoothed result. This prevents overfitting and mitigates the ‘non-identifiability’ issue: in unresolved regions (no residue-level overlap) the correct kinetic components can be found (ΔGs of residues, given a correct choice of timepoints), but they cannot be confidently assigned to individual residues because resolution is lacking. The regularizer r1 biases the fit result towards the residue assignment with the lowest variation along the primary structure. Typical values range from 0.01 to 0.5, depending on the size of the input data.
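To give an intuition for r1, the sketch below computes a penalty from the differences between ΔG values of consecutive residues. The exact functional form used by PyHDX is given in the publication; the mean absolute difference used here is an assumption chosen for illustration, and the ΔG values are in arbitrary units.

```python
def smoothness_penalty(dg_values, r1):
    """Penalty that grows with the variation between consecutive residues."""
    diffs = [abs(b - a) for a, b in zip(dg_values, dg_values[1:])]
    return r1 * sum(diffs) / len(diffs)


smooth = [20.0, 21.0, 22.0, 23.0]
jagged = [20.0, 30.0, 18.0, 29.0]
print(smoothness_penalty(smooth, r1=0.1))
print(smoothness_penalty(jagged, r1=0.1))
```

The jagged profile receives a much larger penalty than the smooth one, so a fit with a higher r1 is pushed towards the smoother residue assignment.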
r2 acts between samples, minimizing variability between them. This is used in differential HDX where users are interested in ΔG differences (ΔΔG). When measuring HD exchange with differing experimental conditions, such as differences in peptides detected, timepoints used or D-labelling temperature and pH, the datasets obtained will have different resolution, both ‘spatially’ (degree of resolved residues) and ‘temporally’ (range/accuracy of ΔGs). This can lead to artefactual differences in the final ΔΔG result, as features might be resolved in one dataset and not in the other, which will show up as ΔΔG. The penalty from r2 can be calculated either with respect to a selected reference state (
Specify a unique name at Fit name and press Do Fitting to start the fit. The Info log in the bottom right corner displays information on when the fit started and finished. The fitting runs in the background, and multiple jobs can be executed at the same time when processing multiple protein states with Fit mode set to ‘Single’. However, please take into account that these fits are computationally intensive, and that too many jobs submitted by multiple users might currently overwhelm our/your server.
The output ΔG values are shown in the ‘ΔG’ graph.
See also the Fitting example section for more details on fitting and the effect of regularizers.
This control panel can be used to generate differential HDX datasets. Select the fit to use with Fit_ID, then choose which state should be the reference state with Reference state. Assign a name to the new comparison and then click Add comparison to calculate ΔΔG values. The values are calculated by taking each state and subtracting the reference from it (Test - Reference). Therefore, if the test is more flexible (lower ΔG) than the reference, ΔΔG values are negative and appear at the top of the ΔΔG figure, by default colored green. Rigid parts are colored purple and are at the bottom of the graph (note that the y axis is inverted, as for the ΔG figure). When adding a comparison, ΔRFU values are automatically calculated, independent of the selected Fit_ID.
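The sign convention can be checked with a small sketch of the subtraction. The residue numbers and ΔG values are made up, and `delta_delta_g` is a hypothetical helper that only considers residues present in both states.

```python
def delta_delta_g(test, reference):
    """Return per-residue ΔΔG = ΔG(test) - ΔG(reference)."""
    return {
        residue: test[residue] - reference[residue]
        for residue in sorted(test.keys() & reference.keys())
    }


# Residue number -> ΔG (arbitrary units).
reference = {10: 25.0, 11: 30.0, 12: 28.0}
test = {10: 20.0, 11: 30.0, 12: 31.0}
print(delta_delta_g(test, reference))  # {10: -5.0, 11: 0.0, 12: 3.0}
```

Residue 10 has a lower ΔG in the test state, so its ΔΔG is negative (more flexible), while residue 12 is more rigid in the test state.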
The color transform panel can be used to update color transforms for each data quantity (rfu, drfu, dG, ddG). Select which quantity to update with Target Quantity. When selecting data quantities, the name of the current color map is shown below the selector.
Mode can be used to select between the available color modes: Colormap, Continuous and Discrete. Discrete splits the ΔG values into n categories, each of which is assigned a single color. When using Continuous, n color ‘nodes’ can be defined, and color values are interpolated between these nodes. Colormap allows users to choose a colormap from either the PyHDX defaults, user defined color maps, or from
The number of categories can be set with Number of colours. When using Discrete coloring, the thresholds of the categories can be automatically determined by pressing the Otsu button (using Otsu’s method). Use the button Linear to distribute threshold values automatically with equal distances between them, and the extrema at the largest/smallest data values. A color for residues which are not covered by peptides can be chosen at No coverage.
Assign a unique name using Color transform name and press Update color transform to create or update the color transform.
The colors for the color groups or nodes can be chosen at the bottom of the controllers, as well as the exact position of the thresholds. These values must be input such that they are always in decreasing order.
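The behaviour of the Linear button and the ordering requirement can be sketched as follows. How many threshold values belong to a given number of colours is an assumption here (the sketch simply returns `n` values); `linear_thresholds` and `is_decreasing` are hypothetical names.

```python
def linear_thresholds(values, n):
    """Return n equally spaced thresholds from max(values) down to min(values)."""
    if n < 2:
        raise ValueError("need at least two thresholds")
    high, low = max(values), min(values)
    step = (high - low) / (n - 1)
    return [high - i * step for i in range(n)]


def is_decreasing(thresholds):
    """Check that the thresholds are in strictly decreasing order."""
    return all(a > b for a, b in zip(thresholds, thresholds[1:]))


thresholds = linear_thresholds([12.0, 35.0, 20.0, 40.0, 16.0], n=3)
print(thresholds)                 # [40.0, 26.0, 12.0]
print(is_decreasing(thresholds))  # True
```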
Selected datasets can be directly visualized on a protein structure using the built-in PDBeMolStar protein viewer. Use the selector Input mode to either directly download a PDB file from the RCSB PDB (specify Pdb id) or to upload a local .pdb file from your computer.
The Table selector can be used to choose which of the data tables to use to assign colors to the 3D structure (RFU, ΔRFU, ΔG or ΔΔG values). Visual Style and Lighting can be used to tweak the appearance.
Use the buttons and menu on the protein viewer itself to export the current image to .png format.
This section is used to control which dataset is currently shown in the graphs. Use the selector Fit id to switch between fit results. The selector State name is used to switch between experimental states, and exposure to switch between exposure times. The selector peptide_id is used to choose which peptide uptake curve and corresponding fit to show in the Peptide graph. All corresponding graphs and selector options will update when changing these settings, including the protein view.
We can use these controls to inspect the quality of the fit obtained. First, at Losses (bottom right) the progress of the fit can be inspected. This should show a rapid decrease of the ‘mse’ loss, followed by a mostly flat plateau. If this is not the case, let the fit run for more epochs (adjust Epochs, or Stop loss and Stop patience) or increase Learning rate.
The graph ‘Peptide MSE’ shows the total mean squared error of all timepoints per peptide. The color scale adjusts automatically, so yellow colors do not necessarily reflect a poor fit but highlight the worst-fitted peptides in your dataset. Hover over the peptide with the mouse to find the index of the peptide and select the peptide with Peptide index.
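As an illustration of what the ‘Peptide MSE’ graph summarizes, the sketch below computes a mean squared error per peptide from measured and fitted uptake values and picks the worst-fitted peptide. The numbers are made up and `peptide_mse` is a hypothetical helper; the exact normalization used by PyHDX may differ.

```python
def peptide_mse(measured, fitted):
    """Mean squared error over all timepoints for one peptide."""
    squared = [(m - f) ** 2 for m, f in zip(measured, fitted)]
    return sum(squared) / len(squared)


# Peptide index -> D-uptake per timepoint.
measured = {0: [1.0, 2.0, 3.0], 1: [0.5, 1.5, 4.0]}
fitted = {0: [1.0, 2.0, 3.0], 1: [1.5, 1.5, 2.0]}

errors = {pid: peptide_mse(measured[pid], fitted[pid]) for pid in measured}
worst = max(errors, key=errors.get)
print(errors)
print(worst)  # 1
```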
All tables which underlie the graphs in the PyHDX web application can be downloaded directly. Choose the desired dataset with Target dataset. The data can be exported as machine-readable .csv files or as a human-readable .txt (pprint) file by setting Export format. Make sure to download at least the .csv file for further processing.
When selecting a dataset with an assigned color transform, the data can be downloaded not only as a .csv file but also as (a zip file of) .pml files, which contain PyMOL scripts to directly apply the color map to a structure in PyMOL, or as .csv/.txt files with hexadecimal color codes.
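To show how hexadecimal color codes relate to a PyMOL script, the sketch below turns a residue-to-color mapping into `set_color` and `color` commands. The layout of the .pml files exported by PyHDX may well differ; the color names and both helper functions are hypothetical.

```python
def hex_to_rgb(hex_code):
    """Convert '#rrggbb' to a tuple of floats between 0 and 1."""
    hex_code = hex_code.lstrip("#")
    return tuple(int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))


def pymol_color_script(residue_colors):
    """Build PyMOL commands that color each residue with its hex color."""
    lines = []
    for residue, hex_code in sorted(residue_colors.items()):
        r, g, b = hex_to_rgb(hex_code)
        name = f"res_color_{residue}"  # illustrative color name
        lines.append(f"set_color {name}, [{r:.3f}, {g:.3f}, {b:.3f}]")
        lines.append(f"color {name}, resi {residue}")
    return "\n".join(lines)


print(pymol_color_script({10: "#ff0000", 11: "#0000ff"}))
```

The resulting text can be saved with a .pml extension and run in PyMOL after loading the structure.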
This panel can be used to export publication quality figures of ΔG or ΔΔG values. Figure options are scatterplot, linear bars or rainbowclouds and export filetypes can be .png, .pdf, .svg or .eps.
Use the selector Reference to set a reference state. This will then export the figures with ΔΔG values. If set to None, figures are exported with ΔG values.
Some parameters of the output figure format (number of columns, aspect ratio, figure width) can be tuned before generating the figure.
From here .zip files can be downloaded which contain all underlying data tables used in the current view. Click Export session to generate the .zip file. This file can then later be uploaded to recover the current session. At the moment, this only reproduces the data in the figures. It is not possible to calculate additional ΔG fits after reloading a session. However, exporting figures is possible. Use Browse, select your PyHDX session .zip file and click Load session to reload your session.
The button Reset session can be used to clear all data, but it is probably better to just use the refresh button in the browser (F5).