rstoolbox.analysis.selector_percentage

rstoolbox.analysis.selector_percentage(df, seqID, key_residues, selection_name='selection')

Calculate the percentage coverage of a Selection over the sequence.

Requires that the data container holds sequence information for the seqID.

Adds a new column to the data container:

New Column                      Data Content
<selection_name>_<seqID>_perc   Coverage of the sequence by the key_residues, stored as a fraction between 0 and 1.
Parameters:
  • df (Union[DesignFrame, DesignSeries]) – Data container.
  • seqID (str) – Identifier of the sequence of interest.
  • key_residues (Union[int, list of int, str, Selection]) – Residues of interest.
  • selection_name (str) – Prefix of the name of the new column. Default is 'selection'.
Returns:

Union[DesignFrame, DesignSeries]

Raises:
  • NotImplementedError – if the data passed is not in Union[DesignFrame, DesignSeries].
  • KeyError – if there is no sequence information for chain seqID of the decoys.

Example

In [1]: from rstoolbox.io import parse_rosetta_file
   ...: from rstoolbox.analysis import selector_percentage
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = parse_rosetta_file("../rstoolbox/tests/data/input_ssebig.minisilent.gz",
   ...:                         {'scores': ['score'], 'sequence': 'C'})
   ...: df = selector_percentage(df, 'C', '1-15')
   ...: df.head()
   ...: 
Out[1]: 
    score                                                  sequence_C  selection_C_perc
0 -64.070  TTWIKFFAGGTLVEEFEYSSVNWEEIEKRAWKKLGRWKKAEEGDLMIVYPDGKVVSWA  0.258621        
1 -70.981  NTWSTNILNGHPKITLLVEERGAEEIHLEWLKKQGLRKKAEENVYTTKLPNGAVKVYG  0.258621        
2 -43.863  PRWFIAMGDGVIWEIVLGSEQNLEEIAKKGLKRRGLYKKAEESIYTIIYPDGIAHTFG  0.258621        
3 -75.847  PYEWVFIINGVPQTTWNHPPTKMEELEKFARKKGGSSKKAEEGKFAIIIWKGYFIVSD  0.258621        
4 -55.347  LREYLVEAGGYPQSSAWRTKTGLEEAMREILEKKGLMKKAEEGRDLRFLPKGIVRVQA  0.258621
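
The value 0.258621 in the output above is consistent with 15 selected residues out of a 58-residue sequence (15 / 58). The following is a minimal plain-Python sketch of that arithmetic, not the rstoolbox implementation: parse_range and coverage_fraction are hypothetical helpers introduced only for illustration, and the comma-separated range syntax they accept is an assumption of this sketch.

```python
def parse_range(spec):
    """Expand a selection string such as '1-15' or '3,7-9' into sorted
    1-based positions. Hypothetical helper, not part of rstoolbox."""
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(part))
    return sorted(positions)


def coverage_fraction(sequence, spec):
    """Fraction of the sequence covered by the selected positions.
    Positions outside the sequence are ignored (an assumption of this sketch)."""
    selected = [p for p in parse_range(spec) if 1 <= p <= len(sequence)]
    return len(selected) / len(sequence)


# First sequence from the example output above (58 residues).
sequence = "TTWIKFFAGGTLVEEFEYSSVNWEEIEKRAWKKLGRWKKAEEGDLMIVYPDGKVVSWA"
fraction = coverage_fraction(sequence, "1-15")
print(round(fraction, 6))  # 0.258621
```

Because every decoy in the example has the same sequence length, the same selection gives the same value in every row.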