PyHDX basics#
[1]:
from pyhdx import read_dynamx, HDXMeasurement
from pyhdx.process import filter_peptides, apply_control, correct_d_uptake
from pyhdx.plot import peptide_coverage
import proplot as pplt
from pathlib import Path
C:\Users\jhsmi\Miniconda3\envs\py38_pyhdx_01\lib\site-packages\tqdm\auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
We can use the read_dynamx
function to read the input DynamX state file. In these fildes, exposure times in the .csv file are in units of minutes, which are converted to seconds using the time_conversion
argument.
Any space in column names are replaced with underscores and an additional column ‘stop’ is added, which is equal to 'end' + 1
such that python-standard inclusive: exclusive
interval indexing can be used.
This function returns a pandas
DataFrame where each entry corresponds to one peptide, in this example 567 peptides.
[2]:
fpath = Path() / '..' / '..' / 'tests' / 'test_data' / 'input' / 'ecSecB_apo.csv'
data = read_dynamx(fpath, time_conversion = ("min", "s"))
data.head()
[2]:
protein | start | end | stop | sequence | modification | fragment | maxuptake | mhp | state | exposure | center | center_sd | uptake | uptake_sd | rt | rt_sd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Accession | 9 | 17 | 18 | MTFQIQRIY | NaN | NaN | 8 | 1199.6241 | Full deuteration control | 0.00 | 1200.412304 | 0.001798 | 0.000000 | 0.000000 | 5.510866 | 0.003846 |
1 | Accession | 9 | 17 | 18 | MTFQIQRIY | NaN | NaN | 8 | 1199.6241 | Full deuteration control | 10.02 | 1205.485704 | 0.019962 | 5.073400 | 0.020042 | 5.519758 | 0.002944 |
2 | Accession | 9 | 17 | 18 | MTFQIQRIY | NaN | NaN | 8 | 1199.6241 | SecB WT apo | 0.00 | 1200.411174 | 0.023301 | 0.000000 | 0.000000 | 5.513296 | 0.010889 |
3 | Accession | 9 | 17 | 18 | MTFQIQRIY | NaN | NaN | 8 | 1199.6241 | SecB WT apo | 10.02 | 1202.897618 | 0.016323 | 2.486444 | 0.028450 | 5.513900 | 0.010976 |
4 | Accession | 9 | 17 | 18 | MTFQIQRIY | NaN | NaN | 8 | 1199.6241 | SecB WT apo | 30.00 | 1203.268315 | 0.029992 | 2.857141 | 0.037979 | 5.510626 | 0.008419 |
This dataframe contains the peptides for both the fully deuterated control example as well as the experimental peptides measured over multiple deuterium exposure times. The filter_peptides
function can be used for to separate out peptides by their ‘state’ and ‘exposure’ fields.
[3]:
# Filter out peptides for the full deuteration control
fd_df = filter_peptides(data, state="Full deuteration control", exposure=60*0.167)
fd_df['state'].unique(), fd_df['exposure'].unique()
[3]:
(array(['Full deuteration control'], dtype=object), array([10.02]))
Additionally, the query
keyword argument can be used to pass a list of additional queries to be applied to the DataFrame. The query strings are passed to pandas.DataFrame.query and the example below removes all 0 exposure times peptides.
[4]:
peptides = filter_peptides(data, state="SecB WT apo", query=['exposure > 0.'])
peptides['state'].unique(), peptides['exposure'].unique()
[4]:
(array(['SecB WT apo'], dtype=object),
array([ 10.02 , 30. , 60. , 300. , 600. ,
6000.00048]))
Next, the FD control is applied with the apply_control
function. This takes the intersection of peptides between the experimental and control peptides and subsequently adds the columns: ‘fd_uptake’, ‘fd_uptake_sd’, ‘nd_uptake’ , ‘nd_uptake_sd’, ‘rfu’ and ‘rfu_sd’. The ‘rfu’ fields are Relative Fractional Uptake and are caluculated as:
Where \(U(t)\) are deuterium uptake values per peptide as a function of their D-exposure time. Supplying a Non-deuteration control is optional, and is otherwise assumed to be zero.
[5]:
peptides_control = apply_control(peptides, fd_df, nd_control=None)
peptides_control.head()
[5]:
start | end | stop | sequence | state | exposure | uptake | maxuptake | fd_uptake | fd_uptake_sd | ... | protein | modification | fragment | mhp | center | center_sd | uptake_sd | rt | rt_sd | rfu sd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9 | 17 | 18 | MTFQIQRIY | SecB WT apo | 10.02 | 2.486444 | 8 | 5.0734 | 0.020042 | ... | Accession | NaN | NaN | 1199.6241 | 1202.897618 | 0.016323 | 0.028450 | 5.513900 | 0.010976 | 0.005932 |
1 | 9 | 17 | 18 | MTFQIQRIY | SecB WT apo | 30.00 | 2.857141 | 8 | 5.0734 | 0.020042 | ... | Accession | NaN | NaN | 1199.6241 | 1203.268315 | 0.029992 | 0.037979 | 5.510626 | 0.008419 | 0.007809 |
2 | 9 | 17 | 18 | MTFQIQRIY | SecB WT apo | 60.00 | 3.145738 | 8 | 5.0734 | 0.020042 | ... | Accession | NaN | NaN | 1199.6241 | 1203.556913 | 0.040432 | 0.046666 | 5.516562 | 0.012371 | 0.009519 |
3 | 9 | 17 | 18 | MTFQIQRIY | SecB WT apo | 300.00 | 3.785886 | 8 | 5.0734 | 0.020042 | ... | Accession | NaN | NaN | 1199.6241 | 1204.197061 | 0.008898 | 0.024942 | 5.505509 | 0.004363 | 0.005732 |
4 | 9 | 17 | 18 | MTFQIQRIY | SecB WT apo | 600.00 | 4.082950 | 8 | 5.0734 | 0.020042 | ... | Accession | NaN | NaN | 1199.6241 | 1204.494124 | 0.027490 | 0.036037 | 5.522181 | 0.011556 | 0.007782 |
5 rows × 23 columns
Next, the measured d-uptake is corrected for backexchange and the percentage of deuterium in solution. For this, the amount of N-terminal residues which are assumed to fully back exchange must be specified (typically best set to 2).
This step modified the ‘sequence’ field by marking back-exchanging amino acids with ‘x’, and adds the columns ‘ex_residues’, which is the total number of exchanging residues given back exchange, prolines and the percentage deuterium, and the column ‘uptake_corrected’, which is the final corrected D-uptake for the peptide.
[6]:
peptides_corrected = correct_d_uptake(peptides_control, drop_first=2, d_percentage=90.)
peptides_corrected.iloc[0]
[6]:
start 9
end 17
stop 18
sequence xxFQIQRIY
state SecB WT apo
exposure 10.02
uptake 2.486444
maxuptake 8
fd_uptake 5.0734
fd_uptake_sd 0.020042
nd_uptake 0
nd_uptake_sd 0
rfu 0.490094
protein Accession
modification NaN
fragment NaN
mhp 1199.6241
center 1202.897618
center_sd 0.016323
uptake_sd 0.02845
rt 5.5139
rt_sd 0.010976
rfu sd 0.005932
_sequence MTFQIQRIY
_start 11
_stop 18
ex_residues 6.3
uptake_corrected 3.087594
Name: 0, dtype: object
This data array can now be used to create an HDXMeasurement
object, the main data object in PyHDX. Experimental metadata such as labelling pH and temperature (in Kelvin) can be specified. These quantities are required for calculating intrinsic exchange rates and ΔG values. The pH values are uncorrected values are measured by the pH meter (ie p(H, D) values)
[7]:
sequence = "MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA"
temperature, pH = 273.15 + 30, 8.0
hdxm = HDXMeasurement(
peptides_corrected, sequence=sequence, pH=pH, temperature=temperature, name='My HDX measurement'
)
hdxm
[7]:
HDX Measurement: My HDX measurementNumber of peptides: 63Number of residues: 145 (11 - 156)Number of timepoints: 6Timepoints: 10.02, 30.00, 60.00, 300.00, 600.00, 6000.00 secondsCoverage Percentage: 88.39Average redundancy: 5.04 Average peptide length: 11.89Repeatability (mean std): 0.05 DaTemperature: 303.15 KpH: 8.0
Iterating over a HDXMeasurement
object returns a set of HDXTimepoint
each with their own attributes describing the topology of the coverage. When creating the object, peptides which are not present in all timepoints are removed, such that all timepoints and HDXTimepoint
have identical coverage.
Note that the internal time units in PyHDX are seconds.
[8]:
fig, ax = pplt.subplots(figsize=(10, 5))
i = 0
peptide_coverage(ax, hdxm[i].data, 20, cbar=True)
t = ax.set_title(f'Peptides t = {hdxm.timepoints[i]}')
l = ax.set_xlabel('Residue number')

[9]:
fig, ax = pplt.subplots(figsize=(10, 5))
i = 4
peptide_coverage(ax, hdxm[i].data, 20, cbar=True)
t = ax.set_title(f'Peptides t = {hdxm.timepoints[i]}')
l = ax.set_xlabel('Residue number')

The data in an HDXMeasurement
object can be saved to and reloaded from disk (with associated metadata) in .csv format.
[10]:
from pyhdx.fileIO import csv_to_hdxm
hdxm.to_file('My_HDX_file.csv')
hdx_load = csv_to_hdxm('My_HDX_file.csv')
hdx_load
[10]:
HDX Measurement: My HDX measurementNumber of peptides: 63Number of residues: 145 (11 - 156)Number of timepoints: 6Timepoints: 10.02, 30.00, 60.00, 300.00, 600.00, 6000.00 secondsCoverage Percentage: 88.39Average redundancy: 5.04 Average peptide length: 11.89Repeatability (mean std): 0.05 DaTemperature: 303.15 KpH: 8.0