Module Documentation#
This page contains the full API docs of PyHDX
Models#
- class pyhdx.models.Coverage(data, n_term=None, c_term=None, sequence=None)[source]#
Object describing layout and coverage of peptides and generating the corresponding matrices. Peptides should all belong to the same state and have the same exposure time.
- Parameters
data – DataFrame with input peptides
weight_exponent – Value of the exponent for weighted averaging used in converting peptide RFU values to residue-level RFU values. Default value is 1., corresponding to weighted averaging where weights are the inverse of peptide length. Generally, weights are 1/(peptide_length)**exponent.
n_term – Residue index of the N-terminal residue. Default value is 1; can be negative to accommodate N-terminal purification tags.
c_term – Residue index number of the C-terminal residue (where first residue has index number 1)
sequence – Amino acid sequence of the protein in one-letter FASTA encoding. Optional, if not specified the amino acid sequence from the peptide data is used to (partially) reconstruct the sequence. Supplied amino acid sequence must be compatible with sequence information in the peptides.
- X: numpy.ndarray#
Np x Nr matrix (peptides x residues). Values are 1 where residue j is in peptide i.
- property X_norm: numpy.ndarray#
X coefficient matrix normalized column wise.
- Z: numpy.ndarray#
Np x Nr matrix (peptides x residues). Values are 1/(ex_residues) where residue j is in peptide i.
- property Z_norm: numpy.ndarray#
Z coefficient matrix normalized column wise.
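The X and Z matrices and their column-wise normalization can be illustrated with a small NumPy sketch. This is not the PyHDX implementation: the peptide intervals and RFU values are made up, intervals are assumed to be (inclusive, exclusive), and every residue in a peptide is assumed to exchange, so that ex_residues equals the peptide length.

```python
import numpy as np

# Hypothetical peptides as (start, end) residue numbers; end is exclusive (assumption).
peptides = [(1, 5), (3, 8), (6, 10)]

r_first = min(start for start, _ in peptides)
r_last = max(end for _, end in peptides)  # exclusive
n_residues = r_last - r_first

# X: Np x Nr, 1 where residue j is in peptide i
X = np.zeros((len(peptides), n_residues))
for i, (start, end) in enumerate(peptides):
    X[i, start - r_first:end - r_first] = 1.0

# Z: 1 / ex_residues where residue j is in peptide i.
# Assumption: all residues exchange, so ex_residues is the peptide length.
ex_residues = X.sum(axis=1, keepdims=True)
Z = X / ex_residues

# Column-wise normalization: every residue column sums to 1
X_norm = X / X.sum(axis=0, keepdims=True)
Z_norm = Z / Z.sum(axis=0, keepdims=True)

# Residue-level values as a weighted average of (made-up) peptide values
peptide_rfu = np.array([0.2, 0.5, 0.8])
residue_rfu = Z_norm.T @ peptide_rfu
```

With Z_norm as weights, shorter peptides contribute more to the residues they cover, which corresponds to weights of 1/peptide_length.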
- apply_interval(array_or_series)[source]#
Returns the section of array_or_series in the interval
Given a Numpy array or Pandas series with a length equal to the full protein, returns the section of the array equal to the covered region. Returned series length is equal to number of columns in the X matrix
- Parameters
array_or_series – Input data object to crop to interval
- Returns
Input object cropped to the interval spanned by the peptides
- Return type
- property block_length: numpy.ndarray#
Lengths of unique blocks of residues in the peptides map, along the r_number axis
- Return type
- Type
ndarray
- get_sections(gap_size=- 1)[source]#
Get the intervals of independent sections of coverage.
Intervals are inclusive, exclusive. Gaps are defined with gap_size, adjacent peptides with distances bigger than this value are considered not to overlap. Set to -1 to treat touching peptides as belonging to the same section.
- Parameters
gap_size – The size which defines a gap
- property index: pandas.core.indexes.range.RangeIndex#
Pandas index numbers corresponding to the part of the protein covered by peptides
- Return type
- property r_number: pandas.core.indexes.range.RangeIndex#
Pandas index numbers corresponding to the part of the protein covered by peptides
- Return type
- class pyhdx.models.CoverageSet(hdxm_list)[source]#
Coverage object for multiple HDXMeasurement objects.
This object finds the minimal interval of residue numbers which fits all HDXMeasurement objects.
- Parameters
hdxm_list – List of input HDXMeasurement objects.
- apply_interval(array_or_series)[source]#
Given a Numpy array or Pandas series with a length equal to the full protein, returns the section of the array equal to the covered region. Returned series length is equal to number of columns in the X matrix
- get_masks()[source]#
Mask of shape Ns x Nr with True entries where residues are covered by HDX measurements (excluding gaps)
- property index: pandas.core.indexes.range.RangeIndex#
Index of residue numbers
- Return type
- property s_r_mask: numpy.ndarray#
Sample-residue mask
Boolean array where entries ij are True if residue j is covered by peptides of sample i (coverage gaps are not taken into account).
- Return type
- class pyhdx.models.HDXMeasurement(data, **metadata)[source]#
Main HDX data object.
This object has peptide data of a single state with multiple timepoints. Timepoint data is split into PeptideMeasurements objects for each timepoint. Supplied data is made ‘uniform’ such that all timepoints have the same peptides.
- Parameters
data – Dataframe with all peptides belonging to a single state.
**metadata – Dictionary of optional metadata. By default, holds the temperature and pH parameters.
- property d_exp: pandas.core.frame.DataFrame#
D-uptake values (corrected for back-exchange).
Shape of the returned DataFrame is Np (rows) x Nt (columns)
- Return type
- data: pd.DataFrame#
Dataframe with all peptides
- get_tensors(exchanges=False, dtype=None)[source]#
Returns a dictionary of tensor variables for fitting HD kinetics.
Tensor variables are (shape): Temperature (1 x 1), X (Np x Nr), k_int (Nr x 1), timepoints (1 x Nt), d_exp (D) (Np x Nt)
- Parameters
exchanges – If True, only returns tensor data describing residues which exchange (i.e. have peptides and are not prolines).
dtype – Optional Torch data type. Use torch.float32 for faster fitting of large data sets, possibly at the expense of accuracy.
- Returns
Dictionary with tensors
- guess_deltaG(rates, correct_c_term=True)[source]#
Obtain ΔG initial guesses from apparent H/D exchange rates.
Units of rates are per second. As the intrinsic rate of exchange of the c-terminal residue is ~100 fold lower, guess values for PF and ΔG are also much lower. Use the option correct_c_term to set the c-terminal guess value equal to the value of the residue preceding it.
- Parameters
rates – Apparent exchange rates (units s^-1). Series index is protein residue number.
correct_c_term – If True, sets the guess value of the C-terminal residue to the value of the residue preceding it.
- Returns
ΔG guess values (units kJ/mol)
- Return type
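The conversion from apparent rates to ΔG guesses can be sketched as follows. This is not the PyHDX implementation: it assumes the commonly used relations PF = k_int / k_obs and ΔG = RT ln(PF), and the helper name, the rate values and the temperature are hypothetical. Values are returned in kJ/mol to match the Returns field above.

```python
import numpy as np

R = 8.3145  # gas constant in J/(mol K)

def guess_dg_sketch(k_obs, k_int, temperature, correct_c_term=True):
    """Hypothetical helper: dG guesses (kJ/mol) from apparent and intrinsic rates."""
    k_obs = np.asarray(k_obs, dtype=float)
    k_int = np.asarray(k_int, dtype=float)
    pf = k_int / k_obs                            # protection factor (assumed relation)
    dg = R * temperature * np.log(pf) / 1000.0    # J/mol -> kJ/mol
    if correct_c_term:
        # Copy the guess of the preceding residue to the C-terminal residue
        dg[-1] = dg[-2]
    return dg

# Made-up rates (s^-1); the last residue has a ~100-fold lower intrinsic rate
k_int = np.array([10.0, 10.0, 0.1])
k_obs = np.array([0.01, 0.1, 0.01])

dg = guess_dg_sketch(k_obs, k_int, temperature=300.0)
dg_raw = guess_dg_sketch(k_obs, k_int, temperature=300.0, correct_c_term=False)
```

Without the correction, the low intrinsic rate of the last residue yields a much lower guess than for its neighbor, which is the situation correct_c_term is meant to avoid.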
- peptides: list[HDXTimepoint]#
List of HDXTimepoint objects, one per exposure timepoint
- property rfu_peptides: pandas.core.frame.DataFrame#
Relative fractional uptake per peptide.
Shape of the returned DataFrame is Np (rows) x Nt (columns)
- Return type
- property rfu_residues: pandas.core.frame.DataFrame#
Relative fractional uptake per residue.
Shape of the returned DataFrame is Nr (rows) x Nt (columns)
- Return type
- property rfu_residues_sd: pandas.core.frame.DataFrame#
Standard deviations of relative fractional uptake per residue.
Shape of the returned DataFrame is Nr (rows) x Nt (columns)
- Return type
- timepoints: np.ndarray#
Deuterium exposure times
- to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#
Write the data in this HDXMeasurement to file.
- Parameters
file_path – File path to create and write to.
include_version – Set True to include PyHDX version and current time/date.
fmt – Formatting to use, options are ‘csv’ or ‘pprint’.
include_metadata – If True, the object’s metadata is included.
**kwargs – Optional additional keyword arguments passed to df.to_csv.
- Return type
- class pyhdx.models.HDXMeasurementSet(hdxm_list)[source]#
Set of multiple HDXMeasurement objects.
- Parameters
hdxm_list – Input list of HDXMeasurement objects.
- add_alignment(alignment, first_r_numbers=None)[source]#
- Parameters
alignment – list
first_r_numbers – Default is [1, 1, …]; specify here if alignments do not all start at residue 1.
- Returns
- d_exp: numpy.ndarray#
Array with measured D-uptake values, shape is Ns x Np x Nt, padded with zeros.
- property exchanges: numpy.ndarray#
Boolean mask, True where there are residues which exchange.
Shape of the returned array is Ns x Np
- Return type
- get_tensors(dtype=None)[source]#
Returns a dictionary of tensor variables for fitting HD kinetics.
Tensor variables are (shape): Temperature (Ns x 1 x 1), X (Ns x Np x Nr), k_int (Ns x Nr), timepoints (Ns x 1 x Nt), d_exp (D) (Ns x Np x Nt)
- Returns
Dictionary with tensors
- guess_deltaG(rates_df, **kwargs)[source]#
Obtain ΔG initial guesses from apparent H/D exchange rates.
- Parameters
rates_df – Pandas DataFrame with apparent exchange rates (units s^-1). Column names must correspond to HDX measurement names.
**kwargs – Additional keyword arguments passed to HDXMeasurement.guess_deltaG().
- Returns
ΔG guess values (units kJ/mol)
- Return type
- property rfu_residues: pandas.core.frame.DataFrame#
Relative fractional uptake per residue.
Shape of the returned DataFrame is Nr (rows) x Ns*Nt (columns) and is multiindexed by columns (state, exposure, quantity)
- Return type
- timepoints: numpy.ndarray#
Array with timepoints, shape is Ns x Nt, padded with zeros in case of samples with unequal number of timepoints
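Zero-padding of samples with unequal numbers of timepoints can be sketched as below. The exposure times are made up and the variable names are hypothetical; this only illustrates the resulting Ns x Nt layout.

```python
import numpy as np

# Made-up exposure times (s) for two samples with unequal numbers of timepoints
timepoints_per_sample = [
    np.array([10.0, 30.0, 60.0, 300.0]),
    np.array([10.0, 60.0]),
]

Ns = len(timepoints_per_sample)
Nt = max(len(t) for t in timepoints_per_sample)

# Ns x Nt array; unused trailing entries remain zero
timepoints = np.zeros((Ns, Nt))
for i, t in enumerate(timepoints_per_sample):
    timepoints[i, :len(t)] = t
```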
- to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#
Write the data in this HDXMeasurementSet to file.
- Parameters
file_path – File path to create and write to.
include_version – Set True to include PyHDX version and current time/date.
fmt – Formatting to use, options are ‘csv’ or ‘pprint’.
include_metadata – If True, the object’s metadata is included.
**kwargs – Optional additional keyword arguments passed to df.to_csv.
- Return type
- class pyhdx.models.HDXTimepoint(data, **kwargs)[source]#
Class with subset of peptides corresponding to only one state and exposure
- Parameters
data – Dataframe with input data
**kwargs –
- calc_rfu(residue_rfu)[source]#
Calculates RFU per peptide given an array of individual residue scores
- property d_exp: pandas.core.series.Series#
Experimentally measured D-values (corrected)
- Return type
- propagate_errors(field)[source]#
Propagate errors on field when calculating per-residue weighted average values.
- Parameters
field – Data field (column) of errors to propagate.
- Returns
Propagated errors per residue.
- Return type
- property rfu_peptides: pandas.core.series.Series#
Relative fractional uptake per peptide
- Return type
- property rfu_residues: pandas.core.series.Series#
Relative fractional uptake (RFU) per residue.
RFU values are obtained by weighted averaging of peptide RFU values; by default, weights are the inverse of the peptide length (see weight_exponent of Coverage).
- Return type
- property rfu_residues_sd: pandas.core.series.Series#
Error propagated standard deviations of RFU per residue.
- Return type
- class pyhdx.models.Protein(data, index=None, **metadata)[source]#
Object describing a protein
Protein objects are based on pandas DataFrames with added functionality
- Parameters
- get_k_int(temperature, pH, **kwargs)[source]#
Calculates the intrinsic rate of the sequence. Residues without coverage and prolines are assigned a value of -1. The rates are calculated from the first residue (1) up to the last residue that is covered by peptides.
When the previous residue is unknown, the current residue is also assigned a value of -1.
- to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#
Write Protein data to file.
- Parameters
file_path (str) – File path to create and write to.
include_version (bool) – Set True to include PyHDX version and current time/date.
fmt (str) – Formatting to use, options are ‘csv’ or ‘pprint’.
include_metadata (bool) – If True, the object’s metadata is included.
**kwargs (dict, optional) – Optional additional keyword arguments passed to df.to_csv.
- Return type
None
- pyhdx.models.contiguous_regions(condition)[source]#
Finds contiguous True regions of the boolean array “condition”. Returns a 2D array where the first column is the start index of the region and the second column is the end index.
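A minimal sketch of finding contiguous True regions with NumPy is shown below. The function name is hypothetical and this is not necessarily how contiguous_regions is implemented; in this sketch the end index is exclusive, which is an assumption.

```python
import numpy as np

def find_true_runs(condition):
    """Hypothetical helper: (start, end) indices of runs of True, end exclusive."""
    condition = np.asarray(condition, dtype=bool)
    # Pad with False on both sides so runs touching the array edges are detected
    padded = np.concatenate(([False], condition, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    ends = np.flatnonzero(edges == -1)    # True -> False transitions
    return np.column_stack((starts, ends))

mask = np.array([False, True, True, False, True, False, True, True, True])
runs = find_true_runs(mask)
# runs is [[1, 3], [4, 5], [6, 9]]
```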
- pyhdx.models.hdx_intersection(hdx_list, fields=None)[source]#
Finds the intersection between peptides.
Peptides are supplied as HDXMeasurement objects. After the intersection of peptides is found, new objects are returned where all peptides (coverage, exposure) between the measurements are identical.
Optionally, intersections by custom fields can be made.
- Parameters
hdx_list – Input list of HDXMeasurement objects.
fields – By which fields to take the intersections. Default is [‘_start’, ‘_end’, ‘exposure’]
- Returns
Output list of HDXMeasurement objects.
- Return type
hdx_out
Fitting#
- class pyhdx.fitting.DUptakeFitResult(result, mse_loss, reg_loss, hdx_obj, metadata)[source]#
- result: numpy.ndarray#
Array with raw D-uptake fit values, including (guessed/interpolated) D-uptake at residues without coverage.
- class pyhdx.fitting.EmptyResult(chi_squared, params)#
- chi_squared#
Alias for field number 0
- params#
Alias for field number 1
- class pyhdx.fitting.KineticsFitResult(hdxm, intervals, results, models)[source]#
Fit result object. Generally used for initial guess results.
- Parameters
hdxm (HDXMeasurement) – HDX measurement object to fit.
intervals (list) – List of tuples with intervals (inclusive, exclusive) describing which residues results and models refer to.
results (list) – List of FitResults.
models (list) – List of KineticsModel.
- get_d(t)[source]#
Calculate d at timepoint t. Only applies to lsqkinetics (refactor glocal) type fitting results (scores per peptide).
- get_param(name)[source]#
Get an array of parameter with name name from the fit result. The length of the array is equal to the number of amino acids.
- property output#
Dataframe with fitted rates per residue
- Type
Dataframe
- property rate#
Returns an array with the exchange rates
- property tau#
Returns an array with the exchange rates
- class pyhdx.fitting.RatesFitResult(results)[source]#
Accumulates multiple Generic/KineticsFit Results
- pyhdx.fitting.d_uptake_cost_func(x, A, b, d)[source]#
Cost function for residue-level D-uptake.
Calculates ||Ax - b|| + d*regularization, where the regularization is the sum of the absolute value of the gradient of x:
\[\frac{d}{N-1}\sum^{N-1}_i |x_i - x_{i+1}|\]
- Parameters
x – D-uptake values per residue.
A – Coupling matrix (‘X’), connecting peptides to residues
b – D-uptake values per peptide
d – regularization parameter
- Returns
Value of the cost function
- Return type
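The cost function described above can be sketched in NumPy as follows. This is an illustration and not the PyHDX source: the function name is hypothetical, the norm is assumed to be the Euclidean norm, and the matrices are made-up toy values.

```python
import numpy as np

def cost_sketch(x, A, b, d):
    """Hypothetical re-statement of ||Ax - b|| + d/(N-1) * sum(|x_i - x_{i+1}|)."""
    residual = np.linalg.norm(A @ x - b)  # Euclidean norm (assumption)
    n = len(x)
    regularization = d / (n - 1) * np.sum(np.abs(np.diff(x)))
    return residual + regularization

# Toy coupling matrix: 2 peptides x 3 residues
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x = np.array([0.2, 0.4, 0.9])  # residue-level values
b = A @ x                      # peptide-level values that x reproduces exactly

cost_no_reg = cost_sketch(x, A, b, d=0.0)    # only the residual term
cost_with_reg = cost_sketch(x, A, b, d=1.0)  # adds the smoothness penalty
```

Because b is generated from x, the residual term vanishes here and the remaining cost comes only from the regularization, which penalizes differences between neighboring residues.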
- pyhdx.fitting.fit_d_uptake(hdx_obj, guess=None, r1=1.0, bounds=True, repeats=10, verbose=True, client=None)[source]#
Fit residue-level D-uptake to a HDX measurement of multiple timepoints or a single HDX timepoint.
- Parameters
hdx_obj – Input HDX object, either HDXMeasurement or HDXTimepoint.
guess – Optional guess array of D-uptake values.
r1 – Value for r1 regularizer.
bounds – Optional bounds. Default is True, which are bounds [0, 1] for all elements. Set to False or None to disable. Custom bounds can be supplied as list of tuples or scipy bounds object.
repeats – Number of times to repeat the fit.
verbose – Show/hide progress bar
- Returns
D-Uptake fit result object.
- pyhdx.fitting.fit_gibbs_global(hdxm, initial_guess, r1=1, epochs=200000, patience=50, stop_loss=5e-06, optimizer='SGD', callbacks=None, **optimizer_kwargs)[source]#
Fit Gibbs free energies globally to all D-uptake data in the supplied hdxm
- Parameters
hdxm (HDXMeasurement) – Input HDX measurement.
initial_guess (Series or ndarray) – Gibbs free energy initial guesses (shape Nr, units J/mol).
r1 (float) – Regularizer value r1 (along residues).
epochs (int) – Maximum number of fitting iterations.
patience (int) – Number of epochs to wait until termination when progress between epochs is below stop_loss.
stop_loss (float) – Threshold for difference in loss between epochs when an epoch is considered to make no more progress.
optimizer (str) – Which optimizer to use. Default is Stochastic Gradient Descent. See PyTorch documentation for information.
callbacks (list or None) – List of callback objects. Call signature is callback(epoch, model, optimizer).
**optimizer_kwargs – Additional keyword arguments passed to the optimizer.
- Returns
result
- Return type
TorchSingleFitResult
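How epochs, patience and stop_loss might interact can be sketched in plain Python. This is an assumption about the stopping rule based on the parameter descriptions above, not the actual optimizer loop; the function name and the loss values are hypothetical.

```python
def epochs_until_stop(losses, patience=50, stop_loss=5e-6):
    """Hypothetical helper: number of epochs run before early stopping triggers."""
    stalled = 0      # consecutive epochs with progress below stop_loss
    previous = None
    epoch = 0
    for epoch, loss in enumerate(losses, start=1):
        if previous is not None and abs(previous - loss) < stop_loss:
            stalled += 1
            if stalled >= patience:
                break
        else:
            stalled = 0  # progress was made, reset the counter (assumption)
        previous = loss
    return epoch

# Made-up loss trajectory which plateaus at 0.25
losses = [1.0, 0.5, 0.25, 0.25, 0.25, 0.25, 0.1]
n_epochs = epochs_until_stop(losses, patience=3, stop_loss=1e-3)
```

In this toy trajectory the loss stops changing after the third epoch, so with patience=3 the loop ends at epoch 6 and the final value of 0.1 is never reached.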
- pyhdx.fitting.fit_gibbs_global_batch(hdx_set, initial_guess, r1=1, r2=1, r2_reference=False, epochs=200000, patience=50, stop_loss=5e-06, optimizer='SGD', callbacks=None, **optimizer_kwargs)[source]#
Fit Gibbs free energies globally to all D-uptake data in multiple HDX measurements
- Parameters
hdx_set (HDXMeasurementSet) – Input HDX measurements.
initial_guess (Series or DataFrame or ndarray) – Gibbs free energy initial guesses (shape Ns x Nr or Nr, units J/mol).
r1 (float) – Regularizer value r1 (along residues).
r2 (float) – Regularizer value r2 (along protein states/samples).
r2_reference (bool) – If True, the first dataset is used as a reference to calculate r2 differences; otherwise the mean is used.
epochs (int) – Maximum number of fitting iterations.
patience (int) – Number of epochs to wait until termination when progress between epochs is below stop_loss.
stop_loss (float) – Threshold for difference in loss between epochs when an epoch is considered to make no more progress.
optimizer (str) – Which optimizer to use. Default is Stochastic Gradient Descent. See PyTorch documentation for information.
callbacks (list or None) – List of callback objects. Call signature is callback(epoch, model, optimizer).
**optimizer_kwargs – Additional keyword arguments passed to the optimizer.
- Returns
result
- Return type
TorchBatchFitResult
- pyhdx.fitting.fit_gibbs_global_batch_aligned(hdx_set, initial_guess, r1=1, r2=1, epochs=200000, patience=50, stop_loss=5e-06, optimizer='SGD', callbacks=None, **optimizer_kwargs)[source]#
Batch fit Gibbs free energies to two HDX measurements. The supplied HDXMeasurementSet must have alignment information (supplied by HDXMeasurementSet.add_alignment).
- Parameters
hdx_set (HDXMeasurementSet) – Input HDX measurements.
initial_guess (Series or DataFrame or ndarray) – Gibbs free energy initial guesses (shape Ns x Nr or Nr, units J/mol).
r1 (float) – Regularizer value r1 (along residues).
r2 (float) – Regularizer value r2 (along protein states/samples).
epochs (int) – Maximum number of fitting iterations.
patience (int) – Number of epochs to wait until termination when progress between epochs is below stop_loss.
stop_loss (float) – Threshold for difference in loss between epochs when an epoch is considered to make no more progress.
optimizer (str) – Which optimizer to use. Default is Stochastic Gradient Descent. See PyTorch documentation for information.
callbacks (list or None) – List of callback objects. Call signature is callback(epoch, model, optimizer).
**optimizer_kwargs – Additional keyword arguments passed to the optimizer.
- Returns
result
- Return type
TorchBatchFitResult
- pyhdx.fitting.fit_kinetics(t, d, model, chisq_thd=100)[source]#
Fit time kinetics with two time components and corresponding relative amplitude.
- pyhdx.fitting.fit_rates(hdxm, method='wt_avg', **kwargs)[source]#
Fit observed rates of exchange to HDX-MS data in hdxm
- Parameters
hdxm (HDXMeasurement) –
method (str) – Method to use to determine rates of exchange.
kwargs – Additional kwargs passed to fitting.
- Returns
fit_result
- Return type
- pyhdx.fitting.fit_rates_half_time_interpolate(hdxm)[source]#
Calculates exchange rates based on weighted averaging followed by interpolation to determine half-time, which is then converted to rates.
- Parameters
hdxm (HDXMeasurement) –
- Returns
output – dataclass with fit result
- Return type
dataclass
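The half-time interpolation step can be sketched for a single residue as below. This is not the PyHDX implementation: it assumes residue-level RFU values are already available, uses linear interpolation with np.interp, and converts the half-time to a rate with k = ln(2) / t_half, which holds for single-exponential uptake. All values are synthetic.

```python
import numpy as np

# Synthetic single-exponential uptake curve for one residue
timepoints = np.array([0.0, 10.0, 30.0, 100.0, 300.0])  # exposure times in s
k_true = 0.02                                           # made-up rate in s^-1
rfu = 1.0 - np.exp(-k_true * timepoints)

# Interpolate the time at which RFU reaches 0.5.
# np.interp needs increasing x values; rfu increases monotonically here.
t_half = np.interp(0.5, rfu, timepoints)

# Convert half-time to a rate (single-exponential assumption)
rate = np.log(2) / t_half
```

With only a few timepoints, linear interpolation gives an approximate half-time, so the recovered rate is close to but not identical to the rate used to generate the data.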
- pyhdx.fitting.fit_rates_weighted_average(hdxm, bounds=None, chisq_thd=0.2, model_type='association', client=None, pbar=None)[source]#
Fit a model specified by ‘model_type’ to D-uptake kinetics. D-uptake is weighted averaged across peptides per timepoint to obtain residue-level D-uptake.
- Parameters
hdxm (HDXMeasurement) –
bounds (tuple, optional) – Tuple of lower and upper bounds of rate constants in the model used.
chisq_thd (float) – Threshold of chi squared result, values above will trigger a second round of fitting using DifferentialEvolution.
model_type (str) – Missing docstring
client – Controls delegation of fitting tasks to Dask clusters. Options are:
None: Do not use Dask; fitting is done in the local thread in a for loop.
Dask Client: Uses the supplied Dask client to schedule fitting tasks.
worker_client: The function was run by a Dask worker and the additional fitting tasks created are scheduled on the same cluster.
pbar – Not implemented
- Returns
fit_result
- Return type
- pyhdx.fitting.get_bounds(times)[source]#
Estimate default bounds for rate fitting from a series of timepoints
- Parameters
times (array_like) –
- Returns
bounds – lower and upper bounds
- Return type
- pyhdx.fitting.run_optimizer(inputs, output_data, optimizer_klass, optimizer_kwargs, model, criterion, regularizer, epochs=200000, patience=50, stop_loss=5e-06, callbacks=None, verbose=True)[source]#
Runs optimization/fitting of PyTorch model.
- Parameters
inputs (list) – List of input Tensors.
output_data (Tensor) – Comparison data to model output.
optimizer_klass (optim) –
optimizer_kwargs (dict) – kwargs to pass to the PyTorch optimizer.
model (Module) – PyTorch model.
criterion (callable) – Loss function.
regularizer (callable) – Regularizer function.
epochs (int) – Maximum number of epochs.
patience (int) – Number of epochs with less progress than stop_loss before terminating optimization.
stop_loss (float) – Threshold of optimization value below which no progress is made.
callbacks (list or None) – List of callback functions.
verbose (bool) – Toggle progress bar.
Fitting PyTorch#
- class pyhdx.fitting_torch.TorchFitResult(hdxm_set, model, losses=None, **metadata)[source]#
PyTorch Fit result object.
- Parameters
hdxm_set (HDXMeasurementSet) –
model –
**metadata –
- static generate_output(hdxm, dG)[source]#
- Parameters
hdxm (HDXMeasurement) –
dG (Series with r_number as index) –
- get_dcalc(timepoints=None)[source]#
Returns calculated D-uptake for optional timepoints. If no timepoints are given, a default set of logarithmically spaced timepoints is generated.
- get_peptide_mse()[source]#
Get a dataframe with mean squared error per peptide (ie per peptide squared error averaged over time)
- get_residue_mse()[source]#
Get a dataframe with residue mean squared errors
Errors are from peptide MSE which is subsequently reduced to residue level by weighted averaging
- get_squared_errors()[source]#
np.ndarray: Returns the squared error per peptide per timepoint. Output shape is Ns x Np x Nt
- Return type
- property mse_loss#
Losses from mean squared error part of Lagrangian
- Type
float
- property regularization_percentage#
Percentage part of the total loss that is regularization loss
- Type
- to_file(file_path, include_version=True, include_metadata=True, fmt='csv', **kwargs)[source]#
Save only the output to file
- property total_loss#
Total loss value of the Lagrangian
- Type
float
- pyhdx.fitting_torch.estimate_errors(hdxm, dG)[source]#
Calculate covariances and uncertainty (perr, experimental)
- Parameters
hdxm (HDXMeasurement) –
dG (ndarray) – Array with dG values.
FileIO#
- pyhdx.fileIO.csv_to_dataframe(filepath_or_buffer, comment='#', **kwargs)[source]#
Reads a .csv file or buffer into a pandas DataFrame object. Comment lines are parsed, where JSON dictionaries marked by tags are read. The <pandas_kwargs> marked JSON dict is used as kwargs for pd.read_csv. The <metadata> marked JSON dict is stored in the returned dataframe object as df.attrs[‘metadata’].
- pyhdx.fileIO.csv_to_hdxm(filepath_or_buffer, comment='#', **kwargs)[source]#
Reads a pyhdx .csv file or buffer into a pyhdx.models.HDXMeasurement or pyhdx.models.HDXMeasurementSet object.
- Parameters
- Returns
protein – Resulting HDXMeasurement object with r_number as index
- Return type
- pyhdx.fileIO.csv_to_protein(filepath_or_buffer, comment='#', **kwargs)[source]#
Reads a .csv file or buffer into a pyhdx.models.Protein object. Comment lines are parsed, where JSON dictionaries marked by tags are read. The <pandas_kwargs> marked JSON dict is used as kwargs for pd.read_csv. The <metadata> marked JSON dict is stored in the returned dataframe object as df.attrs[‘metadata’].
- Parameters
- Returns
protein – Resulting Protein object with r_number as index
- Return type
- pyhdx.fileIO.dataframe_to_file(file_path, df, fmt='csv', include_metadata=True, include_version=False, **kwargs)[source]#
Save a pd.DataFrame to a file. Kwargs to read the resulting .csv object with pd.read_csv to get the original pd.DataFrame back are included in the comments. Optionally additional metadata or the version of PyHDX used can be included in the comments.
- Parameters
file_path (str or pathlib.Path) – File path of the target file to write.
df (pd.DataFrame) – The pandas dataframe to write to the file.
fmt (str) – Specify the formatting of the output. Options are ‘csv’ (machine readable) or ‘pprint’ (human readable).
include_metadata (bool or dict) – If True, the metadata in df.attrs[‘metadata’] is included. If dict, this dictionary is used as the metadata. If False, no metadata is included.
include_version (bool) – True to include PyHDX version information.
**kwargs (dict, optional) – Optional additional keyword arguments passed to df.to_csv.
- Returns
sio – Resulting io.StringIO object.
- Return type
- pyhdx.fileIO.dataframe_to_stringio(df, sio=None, fmt='csv', include_metadata=True, include_version=True, **kwargs)[source]#
Save a pd.DataFrame to an io.StringIO object. Kwargs to read the resulting .csv object with pd.read_csv to get the original pd.DataFrame back are included in the comments. Optionally additional metadata or the version of PyHDX used can be included in the comments.
- Parameters
df (pd.DataFrame) – The pandas dataframe to write to the io.StringIO object.
sio (io.StringIO, optional) – The io.StringIO object to write to. If None, a new io.StringIO object is created.
fmt (str) – Specify the formatting of the output. Options are ‘csv’ (machine readable) or ‘pprint’ (human readable).
include_metadata (bool or dict) – If True, the metadata in df.attrs[‘metadata’] is included. If dict, this dictionary is used as the metadata. If False, no metadata is included.
include_version (bool) – True to include PyHDX version information.
**kwargs (dict, optional) – Optional additional keyword arguments passed to df.to_csv.
- Returns
sio – Resulting io.StringIO object.
- Return type
- pyhdx.fileIO.load_fitresult(fit_dir)[source]#
Load a fit result into a fitting_torch.TorchSingleFitResult or TorchBatchFitResult object.
The fit result must be in the format as generated by saving a fit result with save_fitresult.
- pyhdx.fileIO.read_dynamx(filepath_or_buffer, time_conversion=('min', 's'))[source]#
Reads DynamX .csv files and returns the resulting peptide table as a pandas DataFrame.
- Parameters
filepath_or_buffer – File path of the .csv file or StringIO object.
time_conversion – How to convert the time unit of the field ‘exposure’. Format is (‘<from>’, ‘<to>’). Unit options are ‘h’, ‘min’ or ‘s’.
- Returns
Peptide table as a pandas DataFrame.
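The effect of the time_conversion tuple can be sketched in plain Python. The helper and its lookup table are hypothetical and only illustrate the (‘<from>’, ‘<to>’) convention with the unit options listed above; they are not taken from read_dynamx itself.

```python
# Seconds per unit for the unit options 'h', 'min' and 's'
SECONDS_PER_UNIT = {"h": 3600.0, "min": 60.0, "s": 1.0}

def convert_time(values, time_conversion=("min", "s")):
    """Hypothetical helper: convert exposure times from one unit to another."""
    src, dst = time_conversion
    factor = SECONDS_PER_UNIT[src] / SECONDS_PER_UNIT[dst]
    return [v * factor for v in values]

# Made-up exposure times in minutes, converted to seconds
exposure_min = [0.5, 1.0, 10.0]
exposure_s = convert_time(exposure_min)
# exposure_s is [30.0, 60.0, 600.0]
```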
- pyhdx.fileIO.save_fitresult(output_dir, fit_result, log_lines=None)[source]#
Save a fit result object to the specified directory with associated metadata
Output directory contents:
dG.csv/.txt: Fit output result (dG, covariance, k_obs, pfact)
losses.csv/.txt: Losses per epoch
log.txt: Log file with additional metadata (number of epochs, final losses, pyhdx version, time/date)
Output#
- class pyhdx.output.FitReport(fit_result, title=None, doc=None, add_date=True, temp_dir=None, **kwargs)[source]#
Create .pdf output of a fit result
- class pyhdx.output.LocalThreadExecutor[source]#
- shutdown(wait=True)[source]#
Clean-up the resources associated with the Executor.
It is safe to call this method several times. Otherwise, no other methods can be called after this one.
- Parameters
wait – If True then shutdown will not return until all running futures have finished executing and the resources used by the executor have been reclaimed.
Support#
- pyhdx.support.array_intersection(arrays, fields)[source]#
Find and return the intersecting entries in multiple arrays.
- Parameters
arrays – Iterable of input structured arrays
fields – Iterable of fields to use to decide if entries are intersecting
- Returns
Output iterable of arrays with only intersecting entries.
- Return type
selected
- pyhdx.support.autowrap(start, end, margin=4, step=5)[source]#
Automatically finds wrap value for coverage to not have overlapping peptides within margin
- Parameters
start –
end –
margin –
- pyhdx.support.clean_types(d)[source]#
cleans up nested dict/list/tuple/other d for exporting as yaml
Converts library specific types to python native types, including numpy dtypes, OrderedDict, numpy arrays
- Return type
- pyhdx.support.colors_to_pymol(r_number, color_arr, c_term=None, no_coverage='#8c8c8c')[source]#
Converts colors (hexadecimal format) and corresponding residue numbers to a .pml script to color structures in PyMOL. Residue ranges in the output are inclusive, inclusive.
- c_term:
Optional residue number of the C-terminus, for use when the last peptide does not cover the C-terminus
- pyhdx.support.grouper(3, 'abcdefg', 'x') --> ('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')[source]#
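The example in the signature above matches the well-known "grouper" recipe from the itertools documentation. A minimal sketch of that recipe follows; whether pyhdx.support.grouper is implemented exactly like this is an assumption, and the function name used here is hypothetical.

```python
from itertools import zip_longest

def grouper_sketch(n, iterable, fillvalue=None):
    """Collect items into fixed-length groups, padding the last group with fillvalue."""
    # n references to the same iterator: each output tuple consumes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

groups = list(grouper_sketch(3, "abcdefg", "x"))
# groups is [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```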
- pyhdx.support.make_color_array(rates, colors, thds, no_coverage='#8c8c8c')[source]#
- Parameters
rates – array of rates
colors – list of colors (slow to fast)
thds – list of thresholds
no_coverage – color value for no coverage
- pyhdx.support.make_monomer(input_file, output_file)[source]#
reads input_file pdb file and removes all chains except chain A and all water
- pyhdx.support.multi_otsu(*rates, classes=3)[source]#
Global Otsu thresholding of multiple rate arrays in log space
- pyhdx.support.pbar_decorator(pbar)[source]#
Wraps a progress bar around a function, updating the progress bar with each function call
- pyhdx.support.pprint_df_to_file(df, file_path_or_obj)[source]#
Pretty print (human-readable) a dataframe to a file
- pyhdx.support.reduce_inter(args, gap_size=- 1)[source]#
Reduce overlapping intervals to their non-overlapping interval parts
Author: Brent Pedersen Source: https://github.com/brentp/interlap/blob/3c4a5923c97a5d9a11571e0c9ea5bb7ea4e784ee/interlap.py#L224
- Parameters
gap_size (int) – Gaps of this size between adjacent peptides are not considered to overlap. A value of -1 means that peptides with exactly zero overlap are separated. With gap_size=0 peptides with exactly zero overlap are not separated, and larger values tolerate larger gap sizes.
>>> reduce_inter([(2, 4), (4, 9)])
[(2, 4), (4, 9)]
>>> reduce_inter([(2, 6), (4, 10)])
[(2, 10)]
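The gap_size behavior described above can be reproduced with a short interval-merging sketch. This is a hypothetical re-implementation written to agree with the doctest examples and the gap_size description; it is not the referenced interlap code.

```python
def merge_intervals(intervals, gap_size=-1):
    """Hypothetical helper: merge (start, end) intervals, tolerating gaps up to gap_size."""
    if not intervals:
        return []
    ordered = sorted(intervals)
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        # With gap_size=-1 only truly overlapping intervals are merged;
        # gap_size=0 also merges intervals which exactly touch.
        if start < last_end + gap_size + 1:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged

touching = merge_intervals([(2, 4), (4, 9)])             # stays separated
overlapping = merge_intervals([(2, 6), (4, 10)])         # merged into one
touching_merged = merge_intervals([(2, 4), (4, 9)], gap_size=0)
```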
- pyhdx.support.rgb_to_hex(rgb_a)[source]#
Converts rgba values to hexadecimal color strings. Input values are in the range [0, 255].
Alpha is set to zero.
Returns colors formatted as ‘#000000’.
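For a single color, the conversion to a ‘#000000’-style string can be sketched as below. The helper is hypothetical and handles one tuple only; it simply ignores the alpha channel, which is an assumption about how alpha is treated.

```python
def rgb_to_hex_sketch(rgba):
    """Hypothetical helper: convert one (r, g, b[, a]) tuple with values in [0, 255]."""
    r, g, b = (int(v) for v in rgba[:3])  # alpha, if present, is ignored (assumption)
    return f"#{r:02x}{g:02x}{b:02x}"

hex_color = rgb_to_hex_sketch((255, 0, 128, 255))
# hex_color is '#ff0080'
```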
- pyhdx.support.select_config()[source]#
When the .pyhdx directory has multiple config files, prompts the users for which config to use and subsequently loads it.
- Return type