rstoolbox.utils.split_values(df, keys)
Reshape the data to aid plotting of multiple comparable scores.
Note
The reshaped data may contain the same decoy multiple times, once for each column that is split.
The dictionary provided to split the data container has three main keys:
keep
: Identifies the columns to keep (they cannot be any of the columns being split). If not provided, all columns are kept.
split
: List of the columns to split. Each position is a tuple. The first position of the tuple is the name of the column to split, and the remaining positions are the value names that will be used to identify it.
names
: Names of the new columns. The first one is the name of the column where the values will be assigned; the rest are the names of the columns that hold the identifiers.
Parameters: | df: Data container to reshape. keys: Dictionary with the keep, split and names keys described above. |
---|---|
Returns: | Altered data container. |
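The meaning of the three keys can be illustrated with a minimal sketch in plain pandas. This is not the rstoolbox source: the function name `split_values_sketch` is hypothetical, and the code only follows the behaviour described above, assuming the data container behaves like a pandas DataFrame.

```python
import pandas as pd


def split_values_sketch(df, keys):
    """Hypothetical stand-in that follows the documented key semantics."""
    split_columns = [entry[0] for entry in keys['split']]
    # Default for 'keep': every column that is not being split.
    keep = keys.get('keep', [c for c in df.columns if c not in split_columns])
    value_name, id_names = keys['names'][0], keys['names'][1:]

    pieces = []
    for column, *labels in keys['split']:
        if len(labels) != len(id_names):
            raise ValueError('each split tuple needs one label per identifier name')
        piece = df[keep].copy()
        # Values of the split column go into the shared value column.
        piece[value_name] = df[column].to_numpy()
        # Each label is repeated on every row of this piece.
        for id_name, label in zip(id_names, labels):
            piece[id_name] = label
        pieces.append(piece)
    # The original index is preserved, so it repeats once per split column.
    return pd.concat(pieces)


df = pd.DataFrame({
    'score': [-206.678, -214.362],
    'GRMSD2Target': [1.976, 2.659],
    'LRMSD2Target': [4.404, 4.469],
    'description': ['test_3lhp_binder_labeled_00001',
                    'test_3lhp_binder_labeled_00002'],
})
keys = {'split': [('GRMSD2Target', 'global', 'target'),
                  ('LRMSD2Target', 'local', 'target')],
        'names': ['rmsd', 'rmsd_type', 'rmsd_target']}
long_df = split_values_sketch(df, keys)
```

Each decoy appears once per split column in `long_df`, which is the repetition mentioned in the note.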
Example
In [1]: from rstoolbox.io import parse_rosetta_file
...: from rstoolbox.utils import split_values
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: pd.set_option('display.max_columns', 500)
...: ifile = '../rstoolbox/tests/data/input_2seq.minisilent.gz'
...: scorel = ['score', 'GRMSD2Target', 'GRMSD2Template', 'LRMSD2Target',
...:           'LRMSDH2Target', 'LRMSDLH2Target', 'description']
...: df = parse_rosetta_file(ifile, {'scores': scorel})
...: df
...:
Out[1]:
score GRMSD2Target GRMSD2Template LRMSD2Target LRMSDH2Target LRMSDLH2Target description
0 -206.678 1.976 1.927 4.404 4.055 2.490 test_3lhp_binder_labeled_00001
1 -214.362 2.659 2.417 4.469 4.124 2.730 test_3lhp_binder_labeled_00002
2 -203.582 2.026 1.607 5.208 4.598 2.907 test_3lhp_binder_labeled_00003
3 -213.779 2.407 2.047 5.728 4.866 3.002 test_3lhp_binder_labeled_00004
4 -213.972 2.245 1.907 3.787 3.258 2.692 test_3lhp_binder_labeled_00005
5 -195.138 2.581 2.453 5.021 4.127 2.473 test_3lhp_binder_labeled_00006
In [2]: split1 = {'split': [('GRMSD2Target', 'grmsdTr'), ('GRMSD2Template', 'grmsdTp'),
...:                     ('LRMSD2Target', 'lrmsdTp'), ('LRMSDH2Target', 'lrmsdh2'),
...:                     ('LRMSDLH2Target', 'lrmsdlh2')],
...:           'names': ['rmsd', 'rmsd_type']}
...: split_values(df, split1)
...:
Out[2]:
score description rmsd rmsd_type
0 -206.678 test_3lhp_binder_labeled_00001 1.976 grmsdTr
1 -214.362 test_3lhp_binder_labeled_00002 2.659 grmsdTr
2 -203.582 test_3lhp_binder_labeled_00003 2.026 grmsdTr
3 -213.779 test_3lhp_binder_labeled_00004 2.407 grmsdTr
4 -213.972 test_3lhp_binder_labeled_00005 2.245 grmsdTr
5 -195.138 test_3lhp_binder_labeled_00006 2.581 grmsdTr
0 -206.678 test_3lhp_binder_labeled_00001 1.927 grmsdTp
1 -214.362 test_3lhp_binder_labeled_00002 2.417 grmsdTp
2 -203.582 test_3lhp_binder_labeled_00003 1.607 grmsdTp
3 -213.779 test_3lhp_binder_labeled_00004 2.047 grmsdTp
4 -213.972 test_3lhp_binder_labeled_00005 1.907 grmsdTp
5 -195.138 test_3lhp_binder_labeled_00006 2.453 grmsdTp
0 -206.678 test_3lhp_binder_labeled_00001 4.404 lrmsdTp
1 -214.362 test_3lhp_binder_labeled_00002 4.469 lrmsdTp
2 -203.582 test_3lhp_binder_labeled_00003 5.208 lrmsdTp
3 -213.779 test_3lhp_binder_labeled_00004 5.728 lrmsdTp
4 -213.972 test_3lhp_binder_labeled_00005 3.787 lrmsdTp
5 -195.138 test_3lhp_binder_labeled_00006 5.021 lrmsdTp
0 -206.678 test_3lhp_binder_labeled_00001 4.055 lrmsdh2
1 -214.362 test_3lhp_binder_labeled_00002 4.124 lrmsdh2
2 -203.582 test_3lhp_binder_labeled_00003 4.598 lrmsdh2
3 -213.779 test_3lhp_binder_labeled_00004 4.866 lrmsdh2
4 -213.972 test_3lhp_binder_labeled_00005 3.258 lrmsdh2
5 -195.138 test_3lhp_binder_labeled_00006 4.127 lrmsdh2
0 -206.678 test_3lhp_binder_labeled_00001 2.490 lrmsdlh2
1 -214.362 test_3lhp_binder_labeled_00002 2.730 lrmsdlh2
2 -203.582 test_3lhp_binder_labeled_00003 2.907 lrmsdlh2
3 -213.779 test_3lhp_binder_labeled_00004 3.002 lrmsdlh2
4 -213.972 test_3lhp_binder_labeled_00005 2.692 lrmsdlh2
5 -195.138 test_3lhp_binder_labeled_00006 2.473 lrmsdlh2
In [3]: split2 = {'split': [('GRMSD2Target', 'global', 'target'),
...:                     ('GRMSD2Template', 'global', 'template'),
...:                     ('LRMSD2Target', 'local', 'target'),
...:                     ('LRMSDH2Target', 'local', 'helix2'),
...:                     ('LRMSDLH2Target', 'local', 'lhelix2')],
...:           'names': ['rmsd', 'rmsd_type', 'rmsd_target']}
...: split_values(df, split2)
...:
Out[3]:
score description rmsd rmsd_type rmsd_target
0 -206.678 test_3lhp_binder_labeled_00001 1.976 global target
1 -214.362 test_3lhp_binder_labeled_00002 2.659 global target
2 -203.582 test_3lhp_binder_labeled_00003 2.026 global target
3 -213.779 test_3lhp_binder_labeled_00004 2.407 global target
4 -213.972 test_3lhp_binder_labeled_00005 2.245 global target
5 -195.138 test_3lhp_binder_labeled_00006 2.581 global target
0 -206.678 test_3lhp_binder_labeled_00001 1.927 global template
1 -214.362 test_3lhp_binder_labeled_00002 2.417 global template
2 -203.582 test_3lhp_binder_labeled_00003 1.607 global template
3 -213.779 test_3lhp_binder_labeled_00004 2.047 global template
4 -213.972 test_3lhp_binder_labeled_00005 1.907 global template
5 -195.138 test_3lhp_binder_labeled_00006 2.453 global template
0 -206.678 test_3lhp_binder_labeled_00001 4.404 local target
1 -214.362 test_3lhp_binder_labeled_00002 4.469 local target
2 -203.582 test_3lhp_binder_labeled_00003 5.208 local target
3 -213.779 test_3lhp_binder_labeled_00004 5.728 local target
4 -213.972 test_3lhp_binder_labeled_00005 3.787 local target
5 -195.138 test_3lhp_binder_labeled_00006 5.021 local target
0 -206.678 test_3lhp_binder_labeled_00001 4.055 local helix2
1 -214.362 test_3lhp_binder_labeled_00002 4.124 local helix2
2 -203.582 test_3lhp_binder_labeled_00003 4.598 local helix2
3 -213.779 test_3lhp_binder_labeled_00004 4.866 local helix2
4 -213.972 test_3lhp_binder_labeled_00005 3.258 local helix2
5 -195.138 test_3lhp_binder_labeled_00006 4.127 local helix2
0 -206.678 test_3lhp_binder_labeled_00001 2.490 local lhelix2
1 -214.362 test_3lhp_binder_labeled_00002 2.730 local lhelix2
2 -203.582 test_3lhp_binder_labeled_00003 2.907 local lhelix2
3 -213.779 test_3lhp_binder_labeled_00004 3.002 local lhelix2
4 -213.972 test_3lhp_binder_labeled_00005 2.692 local lhelix2
5 -195.138 test_3lhp_binder_labeled_00006 2.473 local lhelix2
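For readers more familiar with plain pandas, the single-identifier case shown in In [2] resembles what `DataFrame.melt` produces. The sketch below is only a comparison, not how `split_values` is implemented; it uses a hand-built two-row table with values copied from the example above, and the `labels` dictionary is an illustrative helper.

```python
import pandas as pd

# Two decoys and two score columns copied from the example output above.
df = pd.DataFrame({
    'score': [-206.678, -214.362],
    'GRMSD2Target': [1.976, 2.659],
    'GRMSD2Template': [1.927, 2.417],
    'description': ['test_3lhp_binder_labeled_00001',
                    'test_3lhp_binder_labeled_00002'],
})

# Map each column to the identifier used in the 'split' tuples.
labels = {'GRMSD2Target': 'grmsdTr', 'GRMSD2Template': 'grmsdTp'}

melted = df.melt(id_vars=['score', 'description'],
                 value_vars=list(labels),
                 var_name='rmsd_type', value_name='rmsd')
# melt stores the original column names; swap them for the short labels.
melted['rmsd_type'] = melted['rmsd_type'].map(labels)
```

Two differences from the output shown above are worth noting: `melt` renumbers the index instead of repeating the original one, and it places `rmsd_type` before `rmsd`. It also handles only one identifier column, whereas the `split` tuples can carry several.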