alignment
align_dataframes(dataframes, alignment, first_r_numbers=None)
Aligns dataframes based on an alignment.
The supplied dataframes should have the residue number as index. The returned dataframes are reindexed and their residue numbers are moved to the 'r_number' columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`dataframes` | `list` or `dict` | Input dataframes. | required |
`alignment` | `list` or `dict` | Alignment as list or dict (type must be equal to the type of `dataframes`). Values are strings of single-letter amino acid codes, with gaps written as '-'. | required |
`first_r_numbers` | `list` | List of residue numbers corresponding to the first residue in the alignment sequences. Use for N-terminally truncated proteins or proteins with fused purification tags. | `None` |
Returns:
Type | Description |
---|---|
`pd.DataFrame` | Aligned and concatenated dataframe. If `alignment` is given as a dict, the returned dataframe has a column MultiIndex. |
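To make the reindexing idea concrete, here is a minimal pure-Python sketch of how an aligned sequence with gaps can be mapped back to residue numbers. It is not the pyhdx implementation: the helper `alignment_r_numbers`, the protein names, and the sequences are all hypothetical, and it assumes that each entry in `first_r_numbers` is the residue number of the first non-gap residue in the corresponding sequence.

```python
from typing import Dict, List, Optional


def alignment_r_numbers(aligned_seq: str, first_r_number: int = 1) -> List[Optional[int]]:
    """Hypothetical helper: residue number per alignment column, None at gaps."""
    r_numbers: List[Optional[int]] = []
    current = first_r_number
    for char in aligned_seq:
        if char == "-":
            # A gap consumes an alignment column but no residue number.
            r_numbers.append(None)
        else:
            r_numbers.append(current)
            current += 1
    return r_numbers


# Made-up alignment; protein_b is assumed to be N-terminally truncated,
# so its first residue carries residue number 3.
alignment: Dict[str, str] = {
    "protein_a": "MKT-LLV",
    "protein_b": "--TALLV",
}
first_r_numbers = [1, 3]

mapping = {
    name: alignment_r_numbers(seq, first)
    for (name, seq), first in zip(alignment.items(), first_r_numbers)
}
```

Each list in `mapping` has one entry per alignment column, so rows taken from different proteins at the same position line up. This is the kind of shared index the aligned dataframes are described as having, with the original residue numbers kept in a separate column.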
Source code in pyhdx/alignment.py
parse_clustal_string(s, num_proteins, whitelines=2, offset=0)
Takes an input Clustal result and parses it to a dictionary.
Keys in the output dict are the IDs of the proteins as entered into Clustal. Values are aligned (containing '-' for gaps) and concatenated FASTA sequences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`s` | `str` | Input Clustal string. | required |
`num_proteins` | `int` | Number of aligned proteins in the Clustal result. | required |
`whitelines` | `int` | Number of white lines between each block of aligned proteins. Default value is 2. | `2` |
`offset` | `int` | Number of lines before alignment information starts. | `0` |
Returns:
Type | Description |
---|---|
`dict` | Dictionary with concatenated aligned result |
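The parsing described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the code in pyhdx/alignment.py: `parse_clustal_sketch` is a hypothetical name, the input string is made up, and it assumes each block holds `num_proteins` sequence lines followed by `whitelines` non-sequence lines (here one conservation line and one empty line).

```python
from typing import Dict


def parse_clustal_sketch(
    s: str, num_proteins: int, whitelines: int = 2, offset: int = 0
) -> Dict[str, str]:
    """Hypothetical sketch: join the per-block sequence chunks of each protein."""
    lines = s.split("\n")
    # Distance in lines between the same protein in consecutive blocks.
    spacing = num_proteins + whitelines
    alignment: Dict[str, str] = {}
    for i in range(num_proteins):
        # Every `spacing`-th line, starting at this protein's first line.
        protein_lines = [line for line in lines[offset + i :: spacing] if line.strip()]
        name = protein_lines[0].split()[0]
        # The second whitespace-separated field is the aligned sequence chunk.
        alignment[name] = "".join(line.split()[1] for line in protein_lines)
    return alignment


# Made-up two-block Clustal-style result for two proteins.
clustal = "\n".join(
    [
        "protein_a      MKT-LL",
        "protein_b      --TALL",
        "                 *  **",
        "",
        "protein_a      VGA",
        "protein_b      VG-",
        "               ** ",
    ]
)

parsed = parse_clustal_sketch(clustal, num_proteins=2)
```

In this sketch, `offset` would be raised to skip any header lines that precede the first block, which matches how the parameter is described in the table above.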
Source code in pyhdx/alignment.py