One of the key advantages of rstoolbox is the ability to control the amount
and type of data that is loaded from a silent/score file. This control is managed
through a definition, a dictionary that describes the type of data that can be
loaded.
Note
definition is meant to be applied to parse_rosetta_file().
As of now, there are 12 different options that can be combined into a definition:
| definition term | description |
|---|---|
| scores | Basic selection of the scores to store. Default is all scores. |
| scores_ignore | Selection of specific scores to ignore. |
| scores_rename | Rename some score names to others. |
| scores_by_residue | Collect per-residue scores into a single array value. |
| scores_missing | Names of scores that might be missing in some decoys. |
| naming | Use the decoy identifier’s name to create extra score terms. |
| sequence | Pick sequence data from the silent file. |
| structure | Pick structural data from the silent file. |
| psipred | Pick PSIPRED data from the silent file. |
| dihedrals | Retrieve dihedral data from the silent file. |
| labels | Retrieve residue labels from the silent file. |
| graft_ranges | When using the MotifGraftMover, multi-columns will be created when more than one segment is grafted. Provide here the number of segments. |
Tip
definition can be passed directly as a dictionary or can be saved as a
JSON or YAML file and loaded from there.
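As a minimal sketch of the file-based route, the snippet below saves a definition to JSON and reads it back with the standard library only. The file name `definition.json` and the temporary directory are assumptions made for illustration. The final call to `parse_rosetta_file()` is left as a comment, since it needs rstoolbox and a silent file to run.

```python
import json
import os
import tempfile

# A definition is a plain dictionary; 'scores' is one of the terms in the table.
definition = {'scores': ['score', 'packstat', 'description']}

# Save it to a JSON file (hypothetical path, chosen only for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'definition.json')
with open(path, 'w') as handle:
    json.dump(definition, handle, indent=2)

# Load it back; the result is an equivalent dictionary.
with open(path) as handle:
    loaded = json.load(handle)

print(loaded)

# The loaded dictionary can then be used like any other definition, e.g.:
# df = parse_rosetta_file(ifile, loaded)
```

Keeping the definition in a file makes it easy to share the same selection across several analysis scripts.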
The scores term is the most basic parameter and refers to the regular scores in the
silent/score file. It selects only the scores that are wanted for the analysis.
There are three main ways to define scores. Provide a list naming the
scores of interest:
{'scores': ['score', 'packstat', 'description']}
pass an asterisk string if all scores are wanted (this is the default value for this parameter):
{'scores': '*'}
or pass a minus sign to ignore all scores:
{'scores': '-'}
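The three forms can be pictured with a small stand-alone sketch. The helper `select_scores` below is hypothetical and is not part of rstoolbox; it only mimics the selection rules described above on a plain list of score names, and it assumes that selected scores keep the order in which they appear in the file.

```python
def select_scores(available, selection='*'):
    """Hypothetical helper: return the score names kept by a 'scores' value.

    This is an illustration of the rules in the text, not rstoolbox code.
    """
    if selection == '*':
        # Asterisk (the default): keep every score.
        return list(available)
    if selection == '-':
        # Minus sign: ignore all scores.
        return []
    if isinstance(selection, str):
        raise ValueError("expected a list of names, '*' or '-'")
    # List of names: keep only the requested scores (assumed file order).
    wanted = set(selection)
    return [name for name in available if name in wanted]


# Example score names; 'fa_atr' is just a placeholder for some other score.
available = ['score', 'fa_atr', 'packstat', 'description']

print(select_scores(available, ['score', 'packstat', 'description']))
print(select_scores(available, '*'))
print(select_scores(available, '-'))
```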
Example
In [1]: from rstoolbox.io import parse_rosetta_file
...: import pandas as pd
...: pd.set_option('display.width', 1000)
...: ifile = '../rstoolbox/tests/data/input_2seq.minisilent.gz'
...: definition1 = {'scores': ['score', 'packstat', 'description']}
...: df = parse_rosetta_file(ifile, definition1)
...: df.head()
...:
Out[1]:
score packstat description
0 -206.678 0.633 test_3lhp_binder_labeled_00001
1 -214.362 0.577 test_3lhp_binder_labeled_00002
2 -203.582 0.568 test_3lhp_binder_labeled_00003
3 -213.779 0.614 test_3lhp_binder_labeled_00004
4 -213.972 0.591 test_3lhp_binder_labeled_00005
In [2]: definition2 = {'scores': '*'}
...: df1 = parse_rosetta_file(ifile, definition2)
...: df2 = parse_rosetta_file(ifile)
...: df1.head()
...: