Module Documentation#

This page contains the full API documentation of PyHDX.

Models#

class pyhdx.models.Coverage(data, n_term=1, c_term=None, sequence=None)[source]#

Object describing layout and coverage of peptides and generating the corresponding matrices. Peptides should all belong to the same state and have the same exposure time.

Parameters
  • data – DataFrame with input peptides

  • n_term – Residue index of the N-terminal residue. Default value is 1; can be negative to accommodate N-terminal purification tags

  • c_term – Residue index number of the C-terminal residue (where first residue has index number 1)

  • sequence – Amino acid sequence of the protein in one-letter FASTA encoding. Optional, if not specified the amino acid sequence from the peptide data is used to (partially) reconstruct the sequence. Supplied amino acid sequence must be compatible with sequence information in the peptides.

property Np: int#

Number of peptides.

Return type

int

property Nr: int#

Total number of residues spanned by the peptides.

Return type

int

X: numpy.ndarray#

Np x Nr matrix (peptides x residues). Values are 1 where residue j is in peptide i.
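As a rough illustration of this layout, the following sketch builds such a matrix from plain (start, end) pairs using only the standard library. The helper name `coverage_matrix` and the inclusive end convention are assumptions made for this example; they are not part of the PyHDX API.

```python
def coverage_matrix(peptides):
    """Return (r_numbers, X) for peptides given as (start, end) pairs, end inclusive."""
    first = min(start for start, _ in peptides)
    last = max(end for _, end in peptides)
    r_numbers = list(range(first, last + 1))
    # X[i][j] is 1 when residue r_numbers[j] lies inside peptide i, else 0
    X = [[1 if start <= r <= end else 0 for r in r_numbers] for start, end in peptides]
    return r_numbers, X


r_numbers, X = coverage_matrix([(1, 4), (3, 6)])
```

Each row corresponds to one peptide and each column to one residue in the interval spanned by all peptides.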

property X_norm#

X coefficient matrix normalized column wise.

Type

ndarray

Z: numpy.ndarray#

Np x Nr matrix (peptides x residues). Values are 1/(ex_residues) where residue j is in peptide i.

property Z_norm#

Z coefficient matrix normalized column wise.

Type

ndarray

apply_interval(array_or_series)[source]#

Returns the section of array_or_series in the interval

Given a Numpy array or Pandas series with a length equal to the full protein, returns the section of the array equal to the covered region. Returned series length is equal to number of columns in the X matrix

Parameters

array_or_series – Input data object to crop to interval

Returns

Input object cropped to the interval spanned by the peptides

Return type

Series

property block_length: numpy.ndarray#

Lengths of unique blocks of residues in the peptides map, along the r_number axis

Return type

ndarray


get_sections(gap_size=-1)[source]#

Get the intervals of independent sections of coverage.

Intervals are (inclusive, exclusive). Gaps are defined by gap_size: adjacent peptides separated by a distance larger than this value are considered not to overlap. Set to -1 to treat touching peptides as belonging to the same section.

Parameters

gap_size – The size which defines a gap

property index: pandas.core.indexes.range.RangeIndex#

Pandas index numbers corresponding to the part of the protein covered by peptides

Return type

RangeIndex

property percent_coverage: float#

Percentage of residues covered by peptides

Return type

float

property r_number: pandas.core.indexes.range.RangeIndex#

Pandas index numbers corresponding to the part of the protein covered by peptides

Return type

RangeIndex

property redundancy: float#

Average redundancy of peptides in regions with at least 1 peptide

Return type

float
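The two summary statistics above can be sketched from a 0/1 peptide x residue matrix. This is a minimal standard-library illustration, assuming that coverage is counted over the columns of the matrix and that redundancy is the mean number of peptides per covered residue; the function name `coverage_stats` is hypothetical.

```python
def coverage_stats(X):
    """Return (percent_coverage, redundancy) for a 0/1 peptide x residue matrix."""
    n_residues = len(X[0])
    # Number of peptides covering each residue (column sums)
    counts = [sum(row[j] for row in X) for j in range(n_residues)]
    covered = [c for c in counts if c > 0]
    percent_coverage = 100.0 * len(covered) / n_residues
    # Average only over residues that have at least one peptide
    redundancy = sum(covered) / len(covered) if covered else 0.0
    return percent_coverage, redundancy


X = [
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
percent_coverage, redundancy = coverage_stats(X)
```

In this toy matrix five of six residues are covered, and the covered residues carry seven peptide assignments in total.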

class pyhdx.models.CoverageSet(hdxm_list)[source]#

Coverage object for multiple HDXMeasurement objects.

This object finds the minimal interval of residue numbers which fits all HDXMeasurement objects.

Parameters

hdxm_list – List of input HDXMeasurement objects.

apply_interval(array_or_series)[source]#

Given a Numpy array or Pandas series with a length equal to the full protein, returns the section of the array equal to the covered region. Returned series length is equal to number of columns in the X matrix

get_masks()[source]#

Mask of shape Ns x Nr with True entries where residues are covered by HDX measurements (excluding gaps).

property index: pandas.core.indexes.range.RangeIndex#

Index of residue numbers

Return type

RangeIndex

property s_r_mask: numpy.ndarray#

Sample-residue mask

Boolean array where entries ij are True if residue j is covered by peptides of sample i (coverage gaps not taken into account).

Return type

ndarray

class pyhdx.models.HDXMeasurement(data, **metadata)[source]#

Main HDX data object.

This object holds peptide data of a single state with multiple timepoints. Timepoint data is split into HDXTimepoint objects, one for each timepoint. Supplied data is made ‘uniform’ such that all timepoints have the same peptides.

Parameters
  • data – Dataframe with all peptides belonging to a single state.

  • **metadata – Dictionary of optional metadata. By default, holds the temperature and pH parameters.

property Np: int#

Number of peptides.

Return type

int

property Nr: int#

Total number of residues spanned by the peptides.

Return type

int

property Nt: int#

Number of timepoints.

Return type

int

coverage: Coverage#

Coverage object describing peptide layout

property d_exp: pandas.core.frame.DataFrame#

D-uptake values (corrected for back-exchange).

Shape of the returned DataFrame is Np (rows) x Nt (columns)

Return type

DataFrame

data: pd.DataFrame#

Dataframe with all peptides

get_tensors(exchanges=False, dtype=None)[source]#

Returns a dictionary of tensor variables for fitting HD kinetics.

Tensor variables are (shape):

  • Temperature (1 x 1)

  • X (Np x Nr)

  • k_int (Nr x 1)

  • timepoints (1 x Nt)

  • d_exp (D) (Np x Nt)

Parameters
  • exchanges – If True, only returns tensor data describing residues which exchange (i.e. have peptides and are not prolines)

  • dtype – Optional Torch data type. Use torch.float32 for faster fitting of large data sets, possibly at the expense of accuracy

Returns

Dictionary with tensors

guess_deltaG(rates, correct_c_term=True)[source]#

Obtain ΔG initial guesses from apparent H/D exchange rates.

Units of rates are per second. As the intrinsic rate of exchange of the C-terminal residue is ~100-fold lower, guess values for PF and ΔG are also much lower. Use the option correct_c_term to set the C-terminal guess value equal to the value of the residue preceding it.

Parameters
  • rates – Apparent exchange rates (units s^-1). Series index is protein residue number.

  • correct_c_term – If True, sets the guess value of the C-terminal residue to the value of the residue preceding it.

Returns

ΔG guess values (units kJ/mol)

Return type

Series
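One common way to turn apparent rates into ΔG guesses is through the protection factor, PF = k_int / k_obs, with ΔG = R·T·ln(PF). The sketch below assumes this relation; it is not taken from the PyHDX source, and the function name `guess_delta_g` and the choice of J/mol as output unit are assumptions of the example (divide by 1000 for kJ/mol).

```python
import math

R = 8.3144598  # gas constant, J / (mol K)


def guess_delta_g(k_int, k_obs, temperature):
    """Return a Gibbs free energy guess in J/mol from intrinsic and observed rates."""
    # Protection factor: how much slower the residue exchanges than in a random coil
    pf = k_int / k_obs
    return R * temperature * math.log(pf)


dg = guess_delta_g(k_int=100.0, k_obs=1.0, temperature=300.0)
```

A residue that exchanges at its intrinsic rate has PF = 1 and therefore ΔG = 0 under this relation; slower observed exchange yields larger ΔG values.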

property name: str#

HDX Measurement name

Return type

str

property pH: Optional[float]#

pH of the H/D exchange reaction.

Return type

Optional[float]

peptides: list[HDXTimepoint]#

List of HDXTimepoint, one per exposure timepoint

property rfu_peptides: pandas.core.frame.DataFrame#

Relative fractional uptake per peptide.

Shape of the returned DataFrame is Np (rows) x Nt (columns)

Return type

DataFrame

property rfu_residues: pandas.core.frame.DataFrame#

Relative fractional uptake per residue.

Shape of the returned DataFrame is Nr (rows) x Nt (columns)

Return type

DataFrame

state: str#

Protein state label for this HDX measurement

property temperature: Optional[float]#

Temperature of the H/D exchange reaction (K).

Return type

Optional[float]

timepoints: np.ndarray#

Deuterium exposure times

to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#

Write the data in this HDXMeasurement to file.

Parameters
  • file_path – File path to create and write to.

  • include_version – Set True to include PyHDX version and current time/date

  • fmt – Formatting to use, options are ‘csv’ or ‘pprint’

  • include_metadata – If True, the objects’ metadata is included

  • **kwargs – Optional additional keyword arguments passed to df.to_csv

Return type

None

class pyhdx.models.HDXMeasurementSet(hdxm_list)[source]#

Set of multiple HDXMeasurement objects.

Parameters

hdxm_list – Input list of HDXMeasurement

add_alignment(alignment, first_r_numbers=None)[source]#
Parameters
  • alignment – list

  • first_r_numbers – Default is [1, 1, …]; specify here if alignments do not all start at residue 1

Returns

d_exp: numpy.ndarray#

Array with measured D-uptake values, shape is Ns x Np x Nt, padded with zeros.

property exchanges: numpy.ndarray#

Boolean mask True where there are residues which exchange

Shape of the returned array is Ns x Np

Return type

ndarray

get(name)[source]#

Find an HDXMeasurement by name.

Return type

HDXMeasurement

get_tensors(dtype=None)[source]#

Returns a dictionary of tensor variables for fitting HD kinetics.

Tensor variables are (shape):

  • Temperature (Ns x 1 x 1)

  • X (Ns x Np x Nr)

  • k_int (Ns x Nr)

  • timepoints (Ns x 1 x Nt)

  • d_exp (D) (Ns x Np x Nt)

Returns

Dictionary with tensors

guess_deltaG(rates_df, **kwargs)[source]#

Obtain ΔG initial guesses from apparent H/D exchange rates.

Parameters
  • rates_df – Pandas DataFrame with apparent exchange rates (units s^-1). Column names must correspond to HDX measurement names.

  • **kwargs – Additional keyword arguments passed to HDXMeasurement.guess_deltaG()

Returns

ΔG guess values (units kJ/mol)

Return type

DataFrame

property rfu_residues: pandas.core.frame.DataFrame#

Relative fractional uptake per residue.

Shape of the returned DataFrame is Nr (rows) x Ns*Nt (columns) and is multiindexed by columns (state, exposure, quantity)

Return type

DataFrame

timepoints: numpy.ndarray#

Array with timepoints, shape is Ns x Nt, padded with zeros in case of samples with unequal number of timepoints

to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#

Write the data in this HDXMeasurementSet to file.

Parameters
  • file_path – File path to create and write to.

  • include_version – Set True to include PyHDX version and current time/date

  • fmt – Formatting to use, options are ‘csv’ or ‘pprint’

  • include_metadata – If True, the objects’ metadata is included

  • **kwargs – Optional additional keyword arguments passed to df.to_csv

Return type

None

class pyhdx.models.HDXTimepoint(data, **kwargs)[source]#

Class with subset of peptides corresponding to only one state and exposure

Parameters
  • data – Dataframe with input data

  • **kwargs

calc_rfu(residue_rfu)[source]#

Calculates RFU per peptide given an array of individual residue scores

Parameters

residue_rfu (ndarray) – Array of rfu per residue of length prot_len

Returns

rfu – Array of rfu per peptide

Return type

ndarray


property d_exp: pandas.core.series.Series#

Experimentally measured D-values (corrected)

Return type

Series

exposure: float#

Deuterium exposure time for this HDX timepoint (units seconds)

property name: str#

Name of this HDX timepoint

Format is <state>_<exposure>

Return type

str

property rfu_peptides: pandas.core.series.Series#

Relative fractional uptake per peptide

Return type

Series

property rfu_residues: pandas.core.series.Series#

Relative fractional uptake (RFU) per residue.

RFU values are obtained by weighted averaging, weight value is the length of each peptide

Return type

Series

state: str#

Protein state label for this HDX timepoint

weighted_average(field)[source]#

Calculate per-residue weighted average of values in data column

Parameters

field – Data field (column) to calculate the weighted average of

Returns

The weighted averaging result

Return type

Series
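The per-residue weighted averaging used for RFU values can be sketched as follows. This is only an illustration using the standard library: the function signature, the plain-list coverage matrix, and passing the weights in explicitly are assumptions of this example. The text above states that the weights derive from peptide length, so lengths are used here.

```python
def weighted_average(values, X, weights):
    """Per-residue weighted average of per-peptide values.

    values  : one number per peptide
    X       : 0/1 peptide x residue coverage matrix
    weights : one weight per peptide (an assumption of this sketch)
    """
    n_residues = len(X[0])
    result = []
    for j in range(n_residues):
        num = sum(w * v * row[j] for v, row, w in zip(values, X, weights))
        den = sum(w * row[j] for row, w in zip(X, weights))
        # Residues without any covering peptide have no defined average
        result.append(num / den if den else float("nan"))
    return result


X = [[1, 1, 1, 0], [0, 1, 1, 1]]
lengths = [sum(row) for row in X]  # peptide lengths, used as weights here
per_residue = weighted_average([0.2, 0.6], X, lengths)
```

Residues covered by a single peptide inherit that peptide's value, while residues in the overlap receive a blend of both.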

class pyhdx.models.PeptideMasterTable(data, drop_first=1, ignore_prolines=True, d_percentage=100.0, sort=True, remove_nan=True)[source]#

Main peptide input object.

The input DataFrame data must have the following entries for each peptide:

  • start: Residue number of the first amino acid in the peptide

  • end: Residue number of the last amino acid in the peptide (inclusive)

  • sequence: Amino acid sequence of the peptide (one letter code)

  • exposure: The time the sample was exposed to a deuterated solution. Units are seconds.

  • state: String describing to which state (experimental conditions) the peptide belongs

  • uptake: Number of deuteriums the peptide has taken up

The following fields are added to the data array upon initialization:

  • _start: Unmodified copy of initial start field

  • _end: Unmodified copy of initial end field

  • _sequence: Unmodified copy of initial sequence

  • ex_residues: Number of residues that undergo deuterium exchange. This number is calculated using the drop_first, ignore_prolines, and d_percentage parameters.

N-terminal residues which are removed because they are either within drop_first or they are N-terminal prolines are marked with ‘x’ in the sequence field. Prolines which are removed because they are in the middle of a peptide are marked with a lower case ‘p’ in the sequence field.
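The marking scheme can be sketched in a few lines. This is a hedged illustration rather than the PyHDX implementation: the function name `mark_sequence`, the order of operations (drop first, then N-terminal prolines, then internal prolines), and scaling the residue count by d_percentage / 100 are assumptions of this example.

```python
def mark_sequence(sequence, drop_first=1, ignore_prolines=True, d_percentage=100.0):
    """Return (marked_sequence, ex_residues) for one peptide sequence."""
    chars = list(sequence)
    # Residues within drop_first are marked 'x'
    n_drop = min(drop_first, len(chars))
    for i in range(n_drop):
        chars[i] = "x"
    if ignore_prolines:
        # Prolines directly following the dropped residues count as N-terminal
        i = n_drop
        while i < len(chars) and chars[i] == "P":
            chars[i] = "x"
            i += 1
        # Remaining prolines are internal and marked with a lower case 'p'
        chars = ["p" if c == "P" else c for c in chars]
    n_exchanging = sum(1 for c in chars if c not in ("x", "p"))
    return "".join(chars), n_exchanging * d_percentage / 100.0


marked, ex_residues = mark_sequence("PAPLE")
```

With the defaults, the first residue of "PAPLE" is dropped and the internal proline is excluded, leaving three exchanging residues.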

The field scores is used in calculating exchange rates and can be set by either the set_backexchange or set_control methods.

Parameters
  • data – Pandas dataframe with peptide entries

  • drop_first – Number of N-terminal amino acids to ignore

  • d_percentage – Percentage of deuterium in the labelling solution

  • ignore_prolines – Toggle ignoring of proline residues. Should always be set to True

  • sort – Set to True to sort the input. Sort order is ‘start’, ‘end’, ‘sequence’, ‘exposure’, ‘state’.

  • remove_nan – Set to True to remove NaN entries in the ‘uptake’ column

property exposures#

Array with unique exposures

Type

ndarray

get_data(state, exposure)[source]#

Get all peptides matching state and exposure.

Parameters
  • state (str) – Measurement state

  • exposure (float) – Measurement exposure time

Returns

output_data – DataFrame with selected peptides

Return type

DataFrame

get_state(state)[source]#

Returns entries in the table with state ‘state’. Rows with NaN entries for ‘uptake_corrected’ are removed.

Parameters

state (str) – Name of the ‘state’ entries to select

Return type

DataFrame

Returns

Dataframe of peptides from specified ‘state’

select(**kwargs)[source]#

Select data based on column values.

Parameters

kwargs (dict) – Column name, value pairs to select

Returns

output_data – DataFrame with selected peptides

Return type

DataFrame

set_backexchange(back_exchange)[source]#

Sets the normalized percentage of uptake through a fixed backexchange value for all peptides.

Parameters

back_exchange (float) – Percentage of back exchange

Return type

None

set_control(control_1, control_0=None)[source]#

Apply a control dataset to this object. The column ‘RFU’ is added to the object by normalizing its uptake value with respect to the control uptake value, such that the control corresponds to one. Optionally, control_0 can be specified, which is a dataset whose uptake value is used to zero the uptake.

Nonmatching peptides are set to NaN


Parameters
  • control_1 – Control whose uptake values are used to normalize uptake to one

  • control_0 – Tuple with (state, exposure) for peptides to use for zeroing uptake values (ND control)
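A plausible reading of this normalization is RFU = (uptake - uptake_0) / (uptake_1 - uptake_0), where uptake_1 and uptake_0 are the uptake values of the matching peptide in the two controls and uptake_0 is taken as zero when no zeroing control is given. The sketch below assumes this formula; the function name and the dictionaries keyed by (start, end) are hypothetical and chosen only to keep the example self-contained.

```python
import math


def relative_fractional_uptake(uptake, control_1, control_0=None):
    """Return RFU per peptide; peptides missing from a control yield NaN.

    All arguments are dicts keyed by (start, end) holding uptake values.
    """
    rfu = {}
    for key, value in uptake.items():
        full = control_1.get(key)
        zero = 0.0 if control_0 is None else control_0.get(key)
        if full is None or zero is None or full == zero:
            rfu[key] = math.nan  # nonmatching (or degenerate) peptide
        else:
            rfu[key] = (value - zero) / (full - zero)
    return rfu


uptake = {(1, 5): 2.0, (3, 9): 3.0}
rfu = relative_fractional_uptake(uptake, control_1={(1, 5): 4.0})
```

Peptide (3, 9) has no counterpart in the control and therefore receives NaN, in line with the note on nonmatching peptides.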

property states#

Array with unique states

Type

ndarray

class pyhdx.models.Protein(data, index=None, **metadata)[source]#

Object describing a protein

Protein objects are based on pandas DataFrames with added functionality.

Parameters
  • data (ndarray or dict or DataFrame) – data object to initiate the protein object from

  • index (str, optional) – Name of the column with the residue number (index column)

  • **metadata – Dictionary of optional metadata.

get_k_int(temperature, pH, **kwargs)[source]#

Calculates the intrinsic rate of the sequence. Residues without coverage and prolines are assigned a value of -1. The rates are calculated from the first residue (1) up to the last residue that is covered by peptides.

When the previous residue is unknown, the current residue is also assigned a value of -1.

Parameters
  • temperature (float) – Temperature of the labelling reaction (Kelvin)

  • pH (float) – pH of the labelling reaction

Returns

k_int – Array of intrinsic exchange rates

Return type

ndarray

to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#

Write Protein data to file.

Parameters
  • file_path (str) – File path to create and write to.

  • include_version (bool) – Set True to include PyHDX version and current time/date

  • fmt (str) – Formatting to use, options are ‘csv’ or ‘pprint’

  • include_metadata (bool) – If True, the objects’ metadata is included

  • **kwargs (dict, optional) – Optional additional keyword arguments passed to df.to_csv

Return type

None

pyhdx.models.array_intersection(arrays, fields)[source]#

Find and return the intersecting entries in multiple arrays.

Parameters
  • arrays – Iterable of input structured arrays

  • fields – Iterable of fields to use to decide if entries are intersecting

Returns

Output iterable of arrays with only intersecting entries.

Return type

selected
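The idea of intersecting on a set of fields can be sketched with plain dictionaries instead of structured arrays. The function name `intersect_records` and the list-of-dicts representation are assumptions of this example and do not mirror the PyHDX signature.

```python
def intersect_records(tables, fields):
    """Keep only records whose `fields` values occur in every table."""

    def key(record):
        return tuple(record[f] for f in fields)

    # Keys that are present in all tables
    common = set.intersection(*(set(key(r) for r in table) for table in tables))
    return [[r for r in table if key(r) in common] for table in tables]


a = [{"start": 1, "end": 5, "uptake": 1.0}, {"start": 3, "end": 9, "uptake": 2.0}]
b = [{"start": 3, "end": 9, "uptake": 2.5}, {"start": 8, "end": 12, "uptake": 0.7}]
a_out, b_out = intersect_records([a, b], fields=["start", "end"])
```

Only the peptide spanning residues 3 to 9 occurs in both inputs, so each output keeps just that record with its own uptake value.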

pyhdx.models.contiguous_regions(condition)[source]#

Finds contiguous True regions of the boolean array “condition”. Returns a 2D array where the first column is the start index of the region and the second column is the end index.
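A standalone sketch of the same idea, written without NumPy, is shown below. The name `true_runs` is hypothetical, and treating the end index as exclusive is an assumption; the description above does not state which convention the library uses.

```python
def true_runs(condition):
    """Return (start, stop) index pairs, stop exclusive, for each run of True values."""
    regions = []
    start = None
    for i, flag in enumerate(condition):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            regions.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        regions.append((start, len(condition)))  # run extends to the end
    return regions


regions = true_runs([False, True, True, False, True])
```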

pyhdx.models.hdx_intersection(hdx_list, fields=None)[source]#

Finds the intersection between peptides.

Peptides are supplied as HDXMeasurement objects. After the intersection of peptides is found, new objects are returned where all peptides (coverage, exposure) between the measurements are identical.

Optionally intersections by custom fields can be made.

Parameters
  • hdx_list – Input list of HDXMeasurement

  • fields – By which fields to take the intersections. Default is [‘_start’, ‘_end’, ‘exposure’]

Returns

Output list of HDXMeasurement

Return type

hdx_out

Fitting#

class pyhdx.fitting.EmptyResult(chi_squared, params)#
chi_squared#

Alias for field number 0

params#

Alias for field number 1

class pyhdx.fitting.GenericFitResult(output, fit_function, name)[source]#
class pyhdx.fitting.KineticsFitResult(hdxm, intervals, results, models)[source]#

Fit result object. Generally used for initial guess results.

Parameters
  • hdxm (HDXMeasurement) – HDX measurement object to fit

  • intervals (list) – List of tuples with intervals (inclusive, exclusive) describing which residues results and models refer to

  • results (list) – List of FitResults

  • models (list) – List of KineticsModel

get_d(t)[source]#

Calculate D at timepoint t. Only for lsqkinetics type fitting results (scores per peptide).

get_p(t)[source]#

Calculate P at timepoint t. Only for wt average type fitting results

get_param(name)[source]#

Get an array of the parameter named name from the fit result. The length of the array is equal to the number of amino acids.

Parameters

name (str) – Name of the parameter to extract

Returns

par_arr – Array with parameter values

Return type

ndarray

property output#

Dataframe with fitted rates per residue

Type

Dataframe

property rate#

Returns an array with the exchange rates

property tau#

Returns an array with the exchange rates

class pyhdx.fitting.RatesFitResult(results)[source]#

Accumulates multiple GenericFitResult and KineticsFitResult objects.

pyhdx.fitting.check_bounds(fit_result)[source]#

Check if the obtained fit result is within bounds

pyhdx.fitting.fit_gibbs_global(hdxm, initial_guess, r1=1, epochs=200000, patience=50, stop_loss=5e-06, optimizer='SGD', callbacks=None, **optimizer_kwargs)[source]#

Fit Gibbs free energies globally to all D-uptake data in the supplied hdxm

Parameters
  • hdxm (HDXMeasurement) – Input HDX measurement

  • initial_guess (Series or ndarray) – Gibbs free energy initial guesses (shape Nr, units J/mol)

  • r1 (float) – Regularizer value r1 (along residues)

  • epochs (int) – Maximum number of fitting iterations

  • patience (int) – Number of epochs to wait until termination when progress between epochs is below stop_loss

  • stop_loss (float) – Threshold for difference in loss between epochs when an epoch is considered to make no more progress.

  • optimizer (str) – Which optimizer to use. Default is Stochastic Gradient Descent. See PyTorch documentation for information.

  • callbacks (list or None) – List of callback objects. Call signature is callback(epoch, model, optimizer)

  • **optimizer_kwargs – Additional keyword arguments passed to the optimizer.

Returns

result

Return type

TorchSingleFitResult

pyhdx.fitting.fit_gibbs_global_batch(hdx_set, initial_guess, r1=1, r2=1, r2_reference=False, epochs=200000, patience=50, stop_loss=5e-06, optimizer='SGD', callbacks=None, **optimizer_kwargs)[source]#

Fit Gibbs free energies globally to all D-uptake data in multiple HDX measurements

Parameters
  • hdx_set (HDXMeasurementSet) – Input HDX measurements

  • initial_guess (Series or DataFrame or ndarray) – Gibbs free energy initial guesses (shape Ns x Nr or Nr, units J/mol)

  • r1 (float) – Regularizer value r1 (along residues)

  • r2 (float) – Regularizer value r2 (along protein states/samples)

  • r2_reference (bool) – If True, the first dataset is used as a reference to calculate r2 differences; otherwise the mean is used

  • epochs (int) – Maximum number of fitting iterations

  • patience (int) – Number of epochs to wait until termination when progress between epochs is below stop_loss

  • stop_loss (float) – Threshold for difference in loss between epochs when an epoch is considered to make no more progress.

  • optimizer (str) – Which optimizer to use. Default is Stochastic Gradient Descent. See PyTorch documentation for information.

  • callbacks (list or None) – List of callback objects. Call signature is callback(epoch, model, optimizer)

  • **optimizer_kwargs – Additional keyword arguments passed to the optimizer.

Returns

result

Return type

TorchBatchFitResult

pyhdx.fitting.fit_gibbs_global_batch_aligned(hdx_set, initial_guess, r1=1, r2=1, epochs=200000, patience=50, stop_loss=5e-06, optimizer='SGD', callbacks=None, **optimizer_kwargs)[source]#

Batch fit Gibbs free energies to two HDX measurements. The supplied HDXMeasurementSet must have alignment information (supplied by HDXMeasurementSet.add_alignment).

Parameters
  • hdx_set (HDXMeasurementSet) – Input HDX measurements

  • initial_guess (Series or DataFrame or ndarray) – Gibbs free energy initial guesses (shape Ns x Nr or Nr, units J/mol)

  • r1 (float) – Regularizer value r1 (along residues)

  • r2 (float) – Regularizer value r2 (along protein states/samples)

  • epochs (int) – Maximum number of fitting iterations

  • patience (int) – Number of epochs to wait until termination when progress between epochs is below stop_loss

  • stop_loss (float) – Threshold for difference in loss between epochs when an epoch is considered to make no more progress.

  • optimizer (str) – Which optimizer to use. Default is Stochastic Gradient Descent. See PyTorch documentation for information.

  • callbacks (list or None) – List of callback objects. Call signature is callback(epoch, model, optimizer)

  • **optimizer_kwargs – Additional keyword arguments passed to the optimizer.

Returns

result

Return type

TorchBatchFitResult

pyhdx.fitting.fit_kinetics(t, d, model, chisq_thd=100)[source]#

Fit time kinetics with two time components and corresponding relative amplitude.

Parameters
  • t (ndarray) – Array of time points

  • d (ndarray) – Array of uptake values

  • model (KineticsModel) –

  • chisq_thd (float) – Threshold chi squared above which the fitting is repeated with the Differential Evolution algorithm.

Returns

res – Symfit fitresults object.

Return type

FitResults

pyhdx.fitting.fit_rates(hdxm, method='wt_avg', **kwargs)[source]#

Fit observed rates of exchange to HDX-MS data in hdxm

Parameters
  • hdxm (HDXMeasurement) –

  • method (str) – Method to use to determine rates of exchange

  • kwargs – Additional kwargs passed to fitting

Returns

fit_result

Return type

KineticsFitResult

pyhdx.fitting.fit_rates_half_time_interpolate(hdxm)[source]#

Calculates exchange rates based on weighted averaging followed by interpolation to determine the half-time, which is then converted to rates.

Parameters

hdxm (HDXMeasurement) –

Returns

output – dataclass with fit result

Return type

dataclass
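The half-time approach can be illustrated as follows, assuming single-exponential uptake, for which the rate equals ln(2) divided by the half-time. The function name, linear interpolation on the time axis, and the use of fractional uptake values between 0 and 1 are assumptions of this sketch and may differ from what the library does.

```python
import math


def rate_from_half_time(times, rfu):
    """Estimate an exchange rate from the time at which rfu crosses 0.5."""
    points = list(zip(times, rfu))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 <= 0.5 <= y1 and y1 > y0:
            # Linear interpolation between the two bracketing timepoints
            t_half = t0 + (0.5 - y0) * (t1 - t0) / (y1 - y0)
            return math.log(2) / t_half if t_half > 0 else math.inf
    return math.nan  # uptake never crosses one half


rate = rate_from_half_time([0.0, 10.0, 20.0], [0.0, 0.5, 1.0])
```

When the uptake never reaches one half within the measured timepoints, the sketch returns NaN instead of extrapolating.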

pyhdx.fitting.fit_rates_weighted_average(hdxm, bounds=None, chisq_thd=0.2, model_type='association', client=None, pbar=None)[source]#

Fit a model specified by ‘model_type’ to D-uptake kinetics. D-uptake is weighted averaged across peptides per timepoint to obtain residue-level D-uptake.

Parameters
  • hdxm (HDXMeasurement) –

  • bounds (tuple, optional) – Tuple of lower and upper bounds of rate constants in the model used.

  • chisq_thd (float) – Threshold of chi squared result, values above will trigger a second round of fitting using DifferentialEvolution

  • model_type (str) – Missing docstring

  • client – Controls delegation of fitting tasks to Dask clusters. Options are:

    None: Do not use Dask; fitting is done in the local thread in a for loop.

    Dask Client: Uses the supplied Dask client to schedule fitting tasks.

    worker_client: The function was run by a Dask worker and the additional fitting tasks created are scheduled on the same cluster.

  • pbar – Not implemented

Returns

fit_result

Return type

KineticsFitResult

pyhdx.fitting.get_bounds(times)[source]#

Estimate default bounds for rate fitting from a series of timepoints.

Parameters

times (array_like) –

Returns

bounds – lower and upper bounds

Return type

tuple

pyhdx.fitting.run_optimizer(inputs, output_data, optimizer_klass, optimizer_kwargs, model, criterion, regularizer, epochs=200000, patience=50, stop_loss=5e-06, callbacks=None, tqdm=True)[source]#

Runs optimization/fitting of PyTorch model.

Parameters
  • inputs (list) – List of input Tensors

  • output_data (Tensor) – comparison data to model output

  • optimizer_klass (optim) –

  • optimizer_kwargs (dict) – kwargs to pass to pytorch optimizer

  • model (Module) – pytorch model

  • criterion (callable) – loss function

  • regularizer (callable) – Regularizer function

  • epochs (int) – Max number of epochs

  • patience (int) – Number of epochs with less progress than stop_loss before terminating optimization

  • stop_loss (float) – Threshold of optimization value below which no progress is made

  • callbacks (list or None) – List of callback functions

  • tqdm (bool) – Toggle tqdm progress bar
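The interplay of epochs, patience and stop_loss can be sketched without PyTorch. In this illustration a hypothetical `step` callable stands in for one optimizer step and returns the current loss; counting consecutive stalled epochs (rather than a cumulative total) is an assumption about the stopping rule.

```python
def run_with_early_stopping(step, epochs=200000, patience=50, stop_loss=5e-6):
    """Call step() once per epoch and stop early when progress stalls.

    step : callable returning the loss after one optimization step
    Returns the list of losses, one per completed epoch.
    """
    losses = []
    stalled = 0
    previous = None
    for _ in range(epochs):
        loss = step()
        losses.append(loss)
        if previous is not None and previous - loss < stop_loss:
            stalled += 1  # this epoch made too little progress
        else:
            stalled = 0  # progress resumed, reset the counter
        if stalled >= patience:
            break
        previous = loss
    return losses


# A loss that never improves triggers termination after `patience` stalled epochs
losses = run_with_early_stopping(lambda: 1.0, epochs=100, patience=3)
```

A larger patience tolerates longer plateaus before giving up, at the cost of extra epochs when the fit has already converged.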

Fitting PyTorch#

class pyhdx.fitting_torch.DeltaGFit(dG)[source]#
forward(temperature, X, k_int, timepoints)[source]#
Inputs, list of:

  • temperatures: scalar (1,)

  • X: (N_peptides, N_residues)

  • k_int: (N_peptides, 1)
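A forward model of this kind is often written as PF = exp(ΔG / RT) and D_i(t) = Σ_j X_ij · (1 − exp(−k_int,j · t / (1 + PF_j))). The sketch below assumes this form and uses plain Python lists instead of tensors; it is an illustration only, not a transcription of the DeltaGFit code, and the function name `d_uptake` is hypothetical.

```python
import math

R = 8.3144598  # gas constant, J / (mol K)


def d_uptake(dG, temperature, X, k_int, timepoints):
    """Return peptide x timepoint D-uptake for per-residue dG values (J/mol)."""
    # Protection factor per residue from the Gibbs free energy
    pfact = [math.exp(g / (R * temperature)) for g in dG]
    result = []
    for row in X:
        uptake_row = []
        for t in timepoints:
            # Each covered residue contributes its fractional uptake at time t
            total = sum(
                x * (1.0 - math.exp(-k / (1.0 + pf) * t))
                for x, k, pf in zip(row, k_int, pfact)
            )
            uptake_row.append(total)
        result.append(uptake_row)
    return result


d = d_uptake(dG=[20000.0, 30000.0], temperature=300.0, X=[[1, 1]],
             k_int=[1.0, 2.0], timepoints=[0.0, 10.0, 1000.0])
```

Higher ΔG values give larger protection factors and therefore slower predicted uptake for the affected residues.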

class pyhdx.fitting_torch.TorchFitResult(hdxm_set, model, losses=None, **metadata)[source]#

PyTorch Fit result object.

Parameters
property dG#

output dG as Series or as DataFrame

index is residue numbers

eval(timepoints)[source]#

evaluate the model at timepoints and return dataframe

static generate_output(hdxm, dG)[source]#
Parameters
get_dcalc(timepoints=None)[source]#

Returns the calculated D-uptake for optional timepoints. If no timepoints are given, a default set of logarithmically spaced timepoints is generated.

get_peptide_mse()[source]#

Get a dataframe with mean squared error per peptide (i.e. per-peptide squared error averaged over time).

get_residue_mse()[source]#

Get a dataframe with residue mean squared errors

Errors are from peptide MSE which is subsequently reduced to residue level by weighted averaging

get_squared_errors()[source]#

Returns the squared error per peptide per timepoint. Output shape is Ns x Np x Nt.

Return type

ndarray

property mse_loss#

Losses from mean squared error part of Lagrangian


Type

float

property reg_loss#

Losses from regularization part of Lagrangian

Type

float

property regularization_percentage#

Percentage part of the total loss that is regularization loss

Type

float

to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#

Save only the output to file.

property total_loss#

Total loss value of the Lagrangian


Type

float

class pyhdx.fitting_torch.TorchFitResultSet(results)[source]#

Set of multiple TorchFitResults

to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#

Save only the output to file.

pyhdx.fitting_torch.estimate_errors(hdxm, dG)[source]#

Calculate covariances and uncertainty (perr, experimental)

Parameters

FileIO#

pyhdx.fileIO.csv_to_dataframe(filepath_or_buffer, comment='#', **kwargs)[source]#

Reads a .csv file or buffer into a pandas DataFrame object. Comment lines are parsed, and JSON dictionaries marked by tags are read from them. The JSON dict marked <pandas_kwargs> is used as kwargs for pd.read_csv. The JSON dict marked <metadata> is stored in the returned DataFrame as df.attrs[‘metadata’].

Parameters
  • filepath_or_buffer (str, pathlib.Path or io.StringIO) – Filepath or StringIO buffer to read.

  • comment (str) – Indicates which lines are comments.

  • kwargs – Optional additional keyword arguments passed to pd.read_csv

Returns

df

Return type

pd.DataFrame

pyhdx.fileIO.csv_to_hdxm(filepath_or_buffer, comment='#', **kwargs)[source]#

Reads a pyhdx .csv file or buffer into a pyhdx.models.HDXMeasurement or pyhdx.models.HDXMeasurementSet object.

Parameters
  • filepath_or_buffer (str or pathlib.Path or io.StringIO) – Filepath or StringIO buffer to read.

  • comment (str) – Indicates which lines are comments.

  • **kwargs (dict, optional) – Optional additional keyword arguments passed to pd.read_csv

Returns

protein – Resulting HDXMeasurement object with r_number as index

Return type

pyhdx.models.HDXMeasurement

pyhdx.fileIO.csv_to_protein(filepath_or_buffer, comment='#', **kwargs)[source]#

Reads a .csv file or buffer into a pyhdx.models.Protein object. Comment lines are parsed, and JSON dictionaries marked by tags are read from them. The JSON dict marked <pandas_kwargs> is used as kwargs for pd.read_csv. The JSON dict marked <metadata> is stored in the returned object as df.attrs[‘metadata’].

Parameters
  • filepath_or_buffer (str or pathlib.Path or io.StringIO) – Filepath or StringIO buffer to read.

  • comment (str) – Indicates which lines are comments.

  • **kwargs (dict, optional) – Optional additional keyword arguments passed to pd.read_csv

Returns

protein – Resulting Protein object with r_number as index

Return type

pyhdx.models.Protein

pyhdx.fileIO.dataframe_to_file(file_path, df, fmt='csv', include_metadata=True, include_version=False, **kwargs)[source]#

Save a pd.DataFrame to a file. Kwargs to read the resulting .csv file with pd.read_csv to get the original pd.DataFrame back are included in the comments. Optionally, additional metadata or the version of PyHDX used can be included in the comments.

Parameters
  • file_path (str or pathlib.Path) – File path of the target file to write.

  • df (pd.DataFrame) – The pandas dataframe to write to the file.

  • fmt (str) – Specify the formatting of the output. Options are ‘csv’ (machine readable) or ‘pprint’ (human readable)

  • include_metadata (bool or dict) – If True, the metadata in df.attrs[‘metadata’] is included. If dict, this dictionary is used as the metadata. If False, no metadata is included.

  • include_version (bool) – True to include PyHDX version information.

  • **kwargs (dict, optional) – Optional additional keyword arguments passed to df.to_csv

Returns

sio – Resulting io.StringIO object.

Return type

io.StringIO

pyhdx.fileIO.dataframe_to_stringio(df, sio=None, fmt='csv', include_metadata=True, include_version=True, **kwargs)[source]#

Save a pd.DataFrame to an io.StringIO object. Kwargs to read the resulting .csv object with pd.read_csv to get the original pd.DataFrame back are included in the comments. Optionally additional metadata or the version of PyHDX used can be included in the comments.

Parameters
  • df (pd.DataFrame) – The pandas dataframe to write to the io.StringIO object.

  • sio (io.StringIO, optional) – The io.StringIO object to write to. If None, a new io.StringIO object is created.

  • fmt (str) – Specify the formatting of the output. Options are ‘csv’ (machine readable) or ‘pprint’ (human readable)

  • include_metadata (bool or dict) – If True, the metadata in df.attrs[‘metadata’] is included. If dict, this dictionary is used as the metadata. If False, no metadata is included.

  • include_version (bool) – True to include PyHDX version information.

  • **kwargs (dict, optional) – Optional additional keyword arguments passed to df.to_csv

Returns

sio – Resulting io.StringIO object.

Return type

io.StringIO
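The idea of storing metadata in comment lines ahead of the CSV body can be illustrated with the standard library alone. The sketch below is not PyHDX's implementation or file format: the helper names `write_with_comments` and `read_with_comments`, the `#` comment prefix and the JSON encoding of the metadata are all assumptions made for illustration.

```python
import csv
import io
import json


def write_with_comments(rows, fieldnames, metadata=None, sio=None):
    """Write an optional '#'-prefixed metadata line, then CSV rows (hypothetical helper)."""
    sio = io.StringIO() if sio is None else sio
    if metadata is not None:
        # Assumption: metadata is stored as one JSON-encoded comment line
        sio.write("# " + json.dumps(metadata) + "\n")
    writer = csv.DictWriter(sio, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    sio.seek(0)  # rewind so the caller can read from the start
    return sio


def read_with_comments(sio):
    """Split comment lines from CSV lines and parse both (hypothetical helper)."""
    metadata = None
    data_lines = []
    for line in sio:
        if line.startswith("#"):
            metadata = json.loads(line[1:].strip())
        else:
            data_lines.append(line)
    rows = list(csv.DictReader(data_lines))
    return rows, metadata


rows = [{"start": "1", "end": "10"}, {"start": "5", "end": "12"}]
sio = write_with_comments(rows, ["start", "end"], metadata={"state": "folded"})
restored_rows, restored_metadata = read_with_comments(sio)
```

Keeping the metadata inside comment lines means the file stays a valid CSV for any reader that is told to skip lines starting with the comment character.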

pyhdx.fileIO.load_fitresult(fit_dir)[source]#

Load a fitresult into a fitting_torch.TorchSingleFitResult or TorchBatchFitResult object

The fit result must be in the format as generated by saving a fit result with save_fitresult.

Parameters

fit_dir (str or pathlib.Path) – Fit result directory.

pyhdx.fileIO.read_dynamx(*file_paths, intervals=('inclusive', 'inclusive'), time_unit='min')[source]#

Reads one or more DynamX .csv files and returns the peptide data as a pandas DataFrame

Parameters
  • file_paths (iterable) – File paths of the .csv files, or StringIO objects

  • intervals (tuple) – Format of how the start and end of the peptide intervals are specified in the file; the default treats both as inclusive.

  • time_unit (str) – Time unit of the field ‘exposure’. Options are ‘h’, ‘min’ or ‘s’

Returns

full_df – Peptides as a pandas DataFrame

Return type

DataFrame
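How an interval convention affects residue indices can be sketched as follows. The helper `to_half_open` is hypothetical and not part of PyHDX; it assumes each entry of `intervals` is either 'inclusive' or 'exclusive' and normalises a peptide's start and end to an inclusive start and an exclusive end.

```python
def to_half_open(start, end, intervals=("inclusive", "inclusive")):
    """Convert (start, end) to inclusive start, exclusive end (hypothetical helper)."""
    for mode in intervals:
        if mode not in ("inclusive", "exclusive"):
            raise ValueError(f"Unknown interval mode: {mode!r}")
    start_mode, end_mode = intervals
    if start_mode == "exclusive":
        start += 1  # first residue that belongs to the peptide
    if end_mode == "inclusive":
        end += 1  # one past the last residue that belongs to the peptide
    return start, end


# A peptide covering residues 3..7 with both ends inclusive
half_open = to_half_open(3, 7)
```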

pyhdx.fileIO.save_fitresult(output_dir, fit_result, log_lines=None)[source]#

Save a fit result object to the specified directory with associated metadata

Output directory contents:
  • dG.csv/.txt – Fit output result (dG, covariance, k_obs, pfact)

  • losses.csv/.txt – Losses per epoch

  • log.txt – Log file with additional metadata (number of epochs, final losses, pyhdx version, time/date)

Parameters
  • output_dir (pathlib.Path or str) – Output directory to save fitresult to

  • fit_result (pyhdx.fitting_torch.TorchFitResult) – fit result object to save

  • log_lines (list) – Optional additional lines to write to log file.

Output#

class pyhdx.output.FitReport(fit_result, title=None, doc=None, add_date=True, temp_dir=None, **kwargs)[source]#

Create .pdf output of a fit result

class pyhdx.output.LocalThreadExecutor[source]#

shutdown(wait=True)[source]#

Clean-up the resources associated with the Executor.

It is safe to call this method several times. Otherwise, no other methods can be called after this one.

Parameters

wait – If True then shutdown will not return until all running futures have finished executing and the resources used by the executor have been reclaimed.

submit(f, *args, **kwargs)[source]#

Submits a callable to be executed with the given arguments.

Schedules the callable to be executed as f(*args, **kwargs) and returns a Future instance representing the execution of the callable.

Returns

A Future representing the given call.
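The submit and shutdown methods described above follow the interface of `concurrent.futures.Executor`. The sketch below shows that interface using the standard library's `ThreadPoolExecutor` as a stand-in; it does not use `LocalThreadExecutor` itself, and the function `add` is a made-up example.

```python
from concurrent.futures import ThreadPoolExecutor


def add(a, b=0):
    """Example callable; any function with positional and keyword arguments works."""
    return a + b


executor = ThreadPoolExecutor(max_workers=1)

# submit schedules add(2, b=3) and immediately returns a Future
future = executor.submit(add, 2, b=3)
result = future.result()  # blocks until the call has finished

# shutdown waits for running futures; calling it twice is harmless
executor.shutdown(wait=True)
executor.shutdown(wait=True)
```

After shutdown, further calls to submit on the standard-library executor raise a RuntimeError.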

Support#

pyhdx.support.autowrap(start, end, margin=4, step=5)[source]#

Automatically finds wrap value for coverage to not have overlapping peptides within margin

Parameters
  • start

  • end

  • margin

pyhdx.support.colors_to_pymol(r_number, color_arr, c_term=None, no_coverage='#8c8c8c')[source]#

Converts colors (hexadecimal format) and corresponding residue numbers to a pml script for coloring structures in PyMOL. Residue ranges in the output are inclusive, inclusive.

c_term:

Optional residue number of the C-terminus, for when the last peptide doesn't cover the C-terminus
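One way to turn per-residue colors into a script of this kind is to define each color once and then color runs of consecutive residues that share a color. The sketch below is an illustration only: the function name `residues_to_pml`, the color naming scheme and the exact line layout are assumptions and may differ from what PyHDX emits. It relies on the PyMOL commands `set_color` and `color`, and assumes positive residue numbers.

```python
def residues_to_pml(r_number, colors):
    """Build PyMOL script lines from residue numbers and hex colors (hypothetical helper)."""
    lines = []
    # Define every distinct color once, as 0-255 RGB values
    for color in sorted(set(colors)):
        h = color.lstrip("#")
        r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
        lines.append(f"set_color color_{h}, [{r}, {g}, {b}]")

    # Group consecutive residues with the same color into inclusive ranges
    run_start = prev = run_color = None
    for resi, color in zip(r_number, colors):
        if run_color == color and prev is not None and resi == prev + 1:
            prev = resi
            continue
        if run_color is not None:
            lines.append(f"color color_{run_color.lstrip('#')}, resi {run_start}-{prev}")
        run_start = prev = resi
        run_color = color
    if run_color is not None:
        lines.append(f"color color_{run_color.lstrip('#')}, resi {run_start}-{prev}")
    return "\n".join(lines)


script = residues_to_pml([1, 2, 3, 4], ["#ff0000", "#ff0000", "#0000ff", "#0000ff"])
```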

pyhdx.support.gen_subclasses(cls)[source]#

Recursively find all subclasses of cls
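A recursive subclass search can be written as a generator over `cls.__subclasses__()`. This is a minimal sketch of the technique under the hypothetical name `iter_subclasses`; the order in which PyHDX's own function yields classes is not documented here and may differ.

```python
def iter_subclasses(cls):
    """Yield all direct and indirect subclasses of cls (hypothetical helper)."""
    for sub in cls.__subclasses__():
        # Descend first so that subclasses of subclasses are found as well
        yield from iter_subclasses(sub)
        yield sub


class Base:
    pass


class Child(Base):
    pass


class GrandChild(Child):
    pass


found = set(iter_subclasses(Base))
```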

pyhdx.support.grouper(3, 'abcdefg', 'x') --> ('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')[source]#

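The example in the entry above matches the behaviour of the `grouper` recipe from the itertools documentation. A minimal sketch, assuming the argument order (group size, iterable, pad value) implied by that example; the name `grouper_sketch` and the parameter names are assumptions, and this is not necessarily how PyHDX implements it.

```python
from itertools import zip_longest


def grouper_sketch(n, iterable, padvalue=None):
    """Collect items into tuples of length n, padding the last one with padvalue."""
    # n references to the same iterator, so zip_longest pulls n items per tuple
    iterators = [iter(iterable)] * n
    return zip_longest(*iterators, fillvalue=padvalue)


groups = list(grouper_sketch(3, "abcdefg", "x"))
```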
pyhdx.support.hex_to_rgb(h)[source]#

Converts a hexadecimal color string h to RGB; returns rgb as int 0-255

pyhdx.support.make_color_array(rates, colors, thds, no_coverage='#8c8c8c')[source]#

Parameters
  • rates – array of rates

  • colors – list of colors (slow to fast)

  • thds – list of thresholds

  • no_coverage – color value for no coverage
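Mapping rates to colors by thresholds can be sketched with a binary search over the threshold list. The sketch below makes several assumptions that the entry above does not confirm: thresholds are sorted in ascending order, there is one more color than there are thresholds, a rate equal to a threshold falls into the lower bin, and missing coverage is marked by NaN. The name `color_array_sketch` is hypothetical.

```python
import math
from bisect import bisect_left


def color_array_sketch(rates, colors, thds, no_coverage="#8c8c8c"):
    """Assign a color to each rate based on ascending thresholds (hypothetical helper)."""
    if len(colors) != len(thds) + 1:
        raise ValueError("Expected one more color than thresholds")
    output = []
    for rate in rates:
        if math.isnan(rate):
            # Assumption: NaN marks residues without coverage
            output.append(no_coverage)
        else:
            # Number of thresholds strictly below the rate selects the bin
            output.append(colors[bisect_left(thds, rate)])
    return output


colors = ["#0000ff", "#00ff00", "#ff0000"]
assigned = color_array_sketch([0.5, 1.0, 5.0, 100.0, float("nan")], colors, [1.0, 10.0])
```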

pyhdx.support.make_monomer(input_file, output_file)[source]#

Reads the input_file PDB file and removes all water molecules and all chains except chain A

pyhdx.support.multi_otsu(*rates, classes=3)[source]#

Global Otsu thresholding of multiple rate arrays in log space

Parameters
  • rates (iterable) – iterable of numpy structured arrays with a ‘rate’ field

  • classes (int) – Number of classes to divide the data into

Returns

thds – tuple with thresholds

Return type

tuple
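To illustrate the principle, the sketch below implements the simpler two-class case of Otsu's method rather than the multi-class version: it log-transforms the rates and picks the split that maximises the between-class variance. The name `otsu_two_class_log` is hypothetical, and returning the threshold in linear (not log) units is an assumption.

```python
import math


def otsu_two_class_log(rates):
    """Two-class Otsu threshold on log10(rates), returned in linear units (sketch)."""
    values = sorted(math.log10(r) for r in rates)
    n = len(values)
    if n < 2:
        raise ValueError("Need at least two rates")
    total = sum(values)
    best_score = -1.0
    best_threshold = None
    lower_sum = 0.0
    for i in range(1, n):
        # Lower class holds the i smallest values, upper class the rest
        lower_sum += values[i - 1]
        w0, w1 = i / n, (n - i) / n
        mu0 = lower_sum / i
        mu1 = (total - lower_sum) / (n - i)
        score = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if score > best_score:
            best_score = score
            best_threshold = (values[i - 1] + values[i]) / 2
    return 10 ** best_threshold


threshold = otsu_two_class_log([0.01, 0.015, 0.02, 10.0, 15.0, 20.0])
```

Working in log space keeps rates that span several orders of magnitude from being dominated by the largest values.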

pyhdx.support.pprint_df_to_file(df, file_path_or_obj)[source]#

Pretty print (human-readable) a dataframe to a file

Parameters
  • df

  • file_path_or_obj

pyhdx.support.reduce_inter(args, gap_size=-1)[source]#

Reduce overlapping intervals to their non-overlapping interval parts

Author: Brent Pedersen

Source: https://github.com/brentp/interlap/blob/3c4a5923c97a5d9a11571e0c9ea5bb7ea4e784ee/interlap.py#L224

Parameters

gap_size (int) – Maximum gap between adjacent peptides for which they are still merged into one interval. A value of -1 means that peptides with exactly zero overlap are separated. With gap_size=0 peptides with exactly zero overlap are not separated, and larger values tolerate larger gap sizes.

>>> reduce_inter([(2, 4), (4, 9)])
[(2, 4), (4, 9)]
>>> reduce_inter([(2, 6), (4, 10)])
[(2, 10)]
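The merging rule described above can be sketched as a single pass over the sorted intervals. This is an independent illustration under the hypothetical name `merge_intervals`, not the interlap or PyHDX source; it assumes the gap between two intervals is measured as the next start minus the current end.

```python
def merge_intervals(intervals, gap_size=-1):
    """Merge intervals whose gap (next start - current end) is at most gap_size."""
    if not intervals:
        return []
    ordered = sorted(intervals)
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        if start - last_end <= gap_size:
            # Close enough: extend the current interval
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


touching = merge_intervals([(2, 4), (4, 9)])              # kept apart with gap_size=-1
touching_merged = merge_intervals([(2, 4), (4, 9)], gap_size=0)
overlapping = merge_intervals([(2, 6), (4, 10)])
```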

pyhdx.support.rgb_to_hex(rgb_a)[source]#

Converts RGB(A) input to a hexadecimal color string. Input values are in the range [0, 255].

Alpha is set to zero.

Returns the color as a string in the format ‘#000000’.
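Conversion between hexadecimal color strings and 0-255 RGB values, in both directions, can be sketched in plain Python. The names `hex_to_rgb_sketch` and `rgb_to_hex_sketch` are hypothetical; the sketch handles a single color at a time and simply ignores any alpha channel, which may differ from how PyHDX treats alpha or array input.

```python
def hex_to_rgb_sketch(h):
    """Convert '#rrggbb' to a tuple of three ints in the range 0-255."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex_sketch(rgb):
    """Convert an (r, g, b) or (r, g, b, a) sequence of 0-255 ints to '#rrggbb'."""
    r, g, b = rgb[:3]  # assumption: a fourth (alpha) value is dropped
    return f"#{r:02x}{g:02x}{b:02x}"


grey = hex_to_rgb_sketch("#8c8c8c")
round_trip = rgb_to_hex_sketch(grey)
```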

pyhdx.support.scale(x, out_range=(-1, 1))[source]#

Rescale input array x to the range out_range
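Rescaling to an output range is commonly done by a linear min-max mapping. The sketch below works on a plain list instead of an array and uses the hypothetical name `scale_values`; treatment of NaN values and of constant input is not specified above, so the sketch simply rejects constant input.

```python
def scale_values(x, out_range=(-1, 1)):
    """Linearly map the values in x so that min(x) and max(x) hit out_range."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("Cannot rescale constant input")
    out_lo, out_hi = out_range
    factor = (out_hi - out_lo) / (hi - lo)
    return [out_lo + (value - lo) * factor for value in x]


scaled = scale_values([0, 5, 10])
unit = scale_values([0, 5, 10], out_range=(0, 1))
```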

pyhdx.support.series_to_pymol(pd_series)[source]#

Converts a pandas Series to a PyMOL script to color protein structures in PyMOL. The Series must have hexadecimal color values and residue numbers as index.

Parameters

pd_series (Series) – Series of hexadecimal color values, indexed by residue number

Returns

s_out

Return type

str

pyhdx.support.try_wrap(start, end, wrap, margin=4)[source]#

Check for a given coverage if the value of wrap is high enough to not have peptides overlapping within margin

start, end interval is inclusive, exclusive
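The check described above, and the search for a suitable wrap value built on it, can be sketched as follows. The sketch assumes that peptides are assigned to rows cyclically in input order and that a search starts at `step` and increases by `step`; neither is confirmed by the entries on this page. The names `fits_in_rows` and `smallest_wrap` are hypothetical.

```python
from itertools import cycle


def fits_in_rows(start, end, wrap, margin=4):
    """Return True if no two peptides in the same row come within margin residues."""
    rows = [set() for _ in range(wrap)]
    # Assumption: peptide i goes to row i % wrap
    for row, s, e in zip(cycle(rows), start, end):
        # end is exclusive; the margin extends the occupied span to the right
        span = set(range(s, e + margin))
        if row & span:
            return False
        row |= span
    return True


def smallest_wrap(start, end, margin=4, step=5):
    """Increase wrap in increments of step until the peptides fit (sketch)."""
    wrap = step
    # Terminates: once wrap >= number of peptides, every row holds at most one peptide
    while not fits_in_rows(start, end, wrap, margin=margin):
        wrap += step
    return wrap


starts, ends = [1, 3, 5], [10, 12, 14]
wrap_value = smallest_wrap(starts, ends, margin=4, step=1)
```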