rstoolbox.utils.split_values

rstoolbox.utils.split_values(df, keys)

Reshape the data to aid plotting of multiple comparable scores.

Note

The reshaped data can contain the same decoy several times: once for each column that is split.

The dictionary used to split the data container has three main keys:

  1. keep: Identifies the columns to keep (they cannot be among the columns being split). If not provided, all columns are kept.
  2. split: List of columns to split. Each entry is a tuple: the first element is the name of the column to split, and the remaining elements are the labels used to identify its values.
  3. names: Names of the new columns. The first is the name of the column that receives the values; the rest are the names of the columns that hold the identifiers, in the same order as in each split tuple.
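
The way the three keys interact can be sketched in plain Python, with each decoy represented as a dictionary. This is only an illustration of the semantics described above, not the rstoolbox implementation: `split_rows` is a hypothetical helper, and it assumes that columns left out of keep are dropped from the result.

```python
def split_rows(rows, keys):
    """Plain-Python illustration of the reshaping (hypothetical helper).

    rows: list of dicts, one per decoy.
    keys: dict with 'split', 'names' and, optionally, 'keep'.
    """
    split_cols = [entry[0] for entry in keys['split']]
    value_name, id_names = keys['names'][0], keys['names'][1:]
    out = []
    for entry in keys['split']:
        column, identifiers = entry[0], entry[1:]
        for row in rows:
            # Without 'keep', every column that is not being split is kept.
            keep = keys.get('keep', [c for c in row if c not in split_cols])
            new_row = {c: row[c] for c in keep}
            # The split column's value goes to the first name in 'names'...
            new_row[value_name] = row[column]
            # ...and its labels go to the remaining names, in order.
            new_row.update(zip(id_names, identifiers))
            out.append(new_row)
    return out


rows = [
    {'score': -206.678, 'GRMSD2Target': 1.976, 'LRMSD2Target': 4.404,
     'description': 'test_3lhp_binder_labeled_00001'},
    {'score': -214.362, 'GRMSD2Target': 2.659, 'LRMSD2Target': 4.469,
     'description': 'test_3lhp_binder_labeled_00002'},
]
keys = {'keep': ['description'],
        'split': [('GRMSD2Target', 'global', 'target'),
                  ('LRMSD2Target', 'local', 'target')],
        'names': ['rmsd', 'rmsd_type', 'rmsd_target']}

# Two decoys and two split columns give four rows: each decoy appears twice.
long_rows = split_rows(rows, keys)
```

Each tuple in split contributes one full copy of the decoys, which is why a decoy can appear multiple times in the result.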
Parameters:
  • df (DataFrame) – Data container.
  • keys (dict) – Selection of the columns to keep and split.
Returns:

Altered Data container.

Example

In [1]: from rstoolbox.io import parse_rosetta_file
   ...: from rstoolbox.utils import split_values
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: ifile = '../rstoolbox/tests/data/input_2seq.minisilent.gz'
   ...: scorel = ['score', 'GRMSD2Target', 'GRMSD2Template', 'LRMSD2Target',
   ...:           'LRMSDH2Target', 'LRMSDLH2Target', 'description']
   ...: df = parse_rosetta_file(ifile, {'scores': scorel})
   ...: df
   ...: 
Out[1]: 
     score  GRMSD2Target  GRMSD2Template  LRMSD2Target  LRMSDH2Target  LRMSDLH2Target                     description
0 -206.678  1.976         1.927           4.404         4.055          2.490           test_3lhp_binder_labeled_00001
1 -214.362  2.659         2.417           4.469         4.124          2.730           test_3lhp_binder_labeled_00002
2 -203.582  2.026         1.607           5.208         4.598          2.907           test_3lhp_binder_labeled_00003
3 -213.779  2.407         2.047           5.728         4.866          3.002           test_3lhp_binder_labeled_00004
4 -213.972  2.245         1.907           3.787         3.258          2.692           test_3lhp_binder_labeled_00005
5 -195.138  2.581         2.453           5.021         4.127          2.473           test_3lhp_binder_labeled_00006

In [2]: split1 = {'split': [('GRMSD2Target', 'grmsdTr'), ('GRMSD2Template', 'grmsdTp'),
   ...:                     ('LRMSD2Target', 'lrmsdTp'), ('LRMSDH2Target', 'lrmsdh2'),
   ...:                     ('LRMSDLH2Target', 'lrmsdlh2')],
   ...:           'names': ['rmsd', 'rmsd_type']}
   ...: split_values(df, split1)
   ...: 
Out[2]: 
     score                     description   rmsd rmsd_type
0 -206.678  test_3lhp_binder_labeled_00001  1.976  grmsdTr 
1 -214.362  test_3lhp_binder_labeled_00002  2.659  grmsdTr 
2 -203.582  test_3lhp_binder_labeled_00003  2.026  grmsdTr 
3 -213.779  test_3lhp_binder_labeled_00004  2.407  grmsdTr 
4 -213.972  test_3lhp_binder_labeled_00005  2.245  grmsdTr 
5 -195.138  test_3lhp_binder_labeled_00006  2.581  grmsdTr 
0 -206.678  test_3lhp_binder_labeled_00001  1.927  grmsdTp 
1 -214.362  test_3lhp_binder_labeled_00002  2.417  grmsdTp 
2 -203.582  test_3lhp_binder_labeled_00003  1.607  grmsdTp 
3 -213.779  test_3lhp_binder_labeled_00004  2.047  grmsdTp 
4 -213.972  test_3lhp_binder_labeled_00005  1.907  grmsdTp 
5 -195.138  test_3lhp_binder_labeled_00006  2.453  grmsdTp 
0 -206.678  test_3lhp_binder_labeled_00001  4.404  lrmsdTp 
1 -214.362  test_3lhp_binder_labeled_00002  4.469  lrmsdTp 
2 -203.582  test_3lhp_binder_labeled_00003  5.208  lrmsdTp 
3 -213.779  test_3lhp_binder_labeled_00004  5.728  lrmsdTp 
4 -213.972  test_3lhp_binder_labeled_00005  3.787  lrmsdTp 
5 -195.138  test_3lhp_binder_labeled_00006  5.021  lrmsdTp 
0 -206.678  test_3lhp_binder_labeled_00001  4.055  lrmsdh2 
1 -214.362  test_3lhp_binder_labeled_00002  4.124  lrmsdh2 
2 -203.582  test_3lhp_binder_labeled_00003  4.598  lrmsdh2 
3 -213.779  test_3lhp_binder_labeled_00004  4.866  lrmsdh2 
4 -213.972  test_3lhp_binder_labeled_00005  3.258  lrmsdh2 
5 -195.138  test_3lhp_binder_labeled_00006  4.127  lrmsdh2 
0 -206.678  test_3lhp_binder_labeled_00001  2.490  lrmsdlh2
1 -214.362  test_3lhp_binder_labeled_00002  2.730  lrmsdlh2
2 -203.582  test_3lhp_binder_labeled_00003  2.907  lrmsdlh2
3 -213.779  test_3lhp_binder_labeled_00004  3.002  lrmsdlh2
4 -213.972  test_3lhp_binder_labeled_00005  2.692  lrmsdlh2
5 -195.138  test_3lhp_binder_labeled_00006  2.473  lrmsdlh2

In [3]: split2 = {'split': [('GRMSD2Target', 'global', 'target'),
   ...:                     ('GRMSD2Template', 'global', 'template'),
   ...:                     ('LRMSD2Target', 'local', 'target'),
   ...:                     ('LRMSDH2Target', 'local', 'helix2'),
   ...:                     ('LRMSDLH2Target', 'local', 'lhelix2')],
   ...:           'names': ['rmsd', 'rmsd_type', 'rmsd_target']}
   ...: split_values(df, split2)
   ...: 
Out[3]: 
     score                     description   rmsd rmsd_type rmsd_target
0 -206.678  test_3lhp_binder_labeled_00001  1.976  global    target    
1 -214.362  test_3lhp_binder_labeled_00002  2.659  global    target    
2 -203.582  test_3lhp_binder_labeled_00003  2.026  global    target    
3 -213.779  test_3lhp_binder_labeled_00004  2.407  global    target    
4 -213.972  test_3lhp_binder_labeled_00005  2.245  global    target    
5 -195.138  test_3lhp_binder_labeled_00006  2.581  global    target    
0 -206.678  test_3lhp_binder_labeled_00001  1.927  global    template  
1 -214.362  test_3lhp_binder_labeled_00002  2.417  global    template  
2 -203.582  test_3lhp_binder_labeled_00003  1.607  global    template  
3 -213.779  test_3lhp_binder_labeled_00004  2.047  global    template  
4 -213.972  test_3lhp_binder_labeled_00005  1.907  global    template  
5 -195.138  test_3lhp_binder_labeled_00006  2.453  global    template  
0 -206.678  test_3lhp_binder_labeled_00001  4.404  local     target    
1 -214.362  test_3lhp_binder_labeled_00002  4.469  local     target    
2 -203.582  test_3lhp_binder_labeled_00003  5.208  local     target    
3 -213.779  test_3lhp_binder_labeled_00004  5.728  local     target    
4 -213.972  test_3lhp_binder_labeled_00005  3.787  local     target    
5 -195.138  test_3lhp_binder_labeled_00006  5.021  local     target    
0 -206.678  test_3lhp_binder_labeled_00001  4.055  local     helix2    
1 -214.362  test_3lhp_binder_labeled_00002  4.124  local     helix2    
2 -203.582  test_3lhp_binder_labeled_00003  4.598  local     helix2    
3 -213.779  test_3lhp_binder_labeled_00004  4.866  local     helix2    
4 -213.972  test_3lhp_binder_labeled_00005  3.258  local     helix2    
5 -195.138  test_3lhp_binder_labeled_00006  4.127  local     helix2    
0 -206.678  test_3lhp_binder_labeled_00001  2.490  local     lhelix2   
1 -214.362  test_3lhp_binder_labeled_00002  2.730  local     lhelix2   
2 -203.582  test_3lhp_binder_labeled_00003  2.907  local     lhelix2   
3 -213.779  test_3lhp_binder_labeled_00004  3.002  local     lhelix2   
4 -213.972  test_3lhp_binder_labeled_00005  2.692  local     lhelix2   
5 -195.138  test_3lhp_binder_labeled_00006  2.473  local     lhelix2
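
For readers familiar with pandas, the single-identifier case (as in split1) resembles a wide-to-long reshape with `DataFrame.melt` followed by a relabelling step. The sketch below uses only pandas and a two-decoy subset of the values shown in Out[1]. It is offered as a point of comparison under that assumption, not as a description of how split_values works internally; the `labels` mapping is introduced here purely for illustration.

```python
import pandas as pd

# Two decoys and two RMSD columns taken from the table in Out[1].
df = pd.DataFrame({
    'score': [-206.678, -214.362],
    'GRMSD2Target': [1.976, 2.659],
    'GRMSD2Template': [1.927, 2.417],
    'description': ['test_3lhp_binder_labeled_00001',
                    'test_3lhp_binder_labeled_00002'],
})

# Hypothetical mapping from column name to the label used as identifier.
labels = {'GRMSD2Target': 'grmsdTr', 'GRMSD2Template': 'grmsdTp'}

# Wide-to-long: one row per (decoy, split column) pair.
long_df = df.melt(id_vars=['score', 'description'],
                  value_vars=list(labels),
                  var_name='rmsd_type', value_name='rmsd')

# Replace the original column names with the shorter labels.
long_df['rmsd_type'] = long_df['rmsd_type'].map(labels)
```

Note that `melt` renumbers the index by default, whereas the output in Out[2] repeats the original index for every split column.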