rstoolbox.components.DesignFrame.score_by_pssm

DesignFrame.score_by_pssm(seqID, matrix)

Score sequences according to a provided PSSM matrix.

Generates a new column by applying the PSSM score to each position of the requested sequences:

New Column            Data Content
pssm_score_<seqID>    Score obtained by applying matrix
Parameters:
  • seqID (str) – Identifier of the sequence of interest.
  • matrix (DataFrame) – Positional frequency matrix. Columns are residue types; the index is the sequence position.
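
As an illustration of the matrix layout described above (one row per sequence position, one column per residue type), the following sketch builds a small frequency matrix with plain pandas. The toy alignment, the 20-letter alphabet ordering, and the 0-based default index are assumptions made for the example; they are not taken from rstoolbox.

```python
import pandas as pd

# Hypothetical toy alignment; a real matrix would span the full design length.
aligned = ["ACD", "ACE", "GCD", "ACD"]
alphabet = list("ACDEFGHIKLMNPQRSTVWY")

# One row per sequence position, one column per residue type.
rows = []
for position in range(len(aligned[0])):
    observed = [seq[position] for seq in aligned]
    rows.append([observed.count(aa) / len(observed) for aa in alphabet])

# Assumption: default 0-based integer index for the sequence positions.
matrix = pd.DataFrame(rows, columns=alphabet)
print(matrix[["A", "C", "D", "E", "G"]])
```

Each row sums to 1 because every position contributes one residue per aligned sequence.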
Returns:

Union[DesignSeries, DesignFrame] - Itself with the new column.

Raises:
  • NotImplementedError – if the data container is not DataFrame or Series.
  • ValueError – if matrix rows do not match sequence length.

Example

In [1]: from rstoolbox.io import parse_rosetta_file
   ...: from rstoolbox.tests.helper import random_frequency_matrix
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: df = parse_rosetta_file("../rstoolbox/tests/data/input_2seq.minisilent.gz",
   ...:                         {'scores': ['score', 'description'], 'sequence': 'B'})
   ...: matrix = random_frequency_matrix(len(df.get_sequence('B')[0]), 0)
   ...: df.score_by_pssm('B', matrix)
   ...: 
Out[1]: 
     score                     description                                                                                                            sequence_B  pssm_score_B
0 -206.678  test_3lhp_binder_labeled_00001  TRPEEARERAWRLAEIAMRKGWEEHEREWEWWKRASKGREERDMLPERMIAAALRAIGEIFNAEWQMRLEMEKERKNPNAGEEKMKEQKKEAWKIAYYWGLMAAYWIKQHREKERK  6.080453    
1 -214.362  test_3lhp_binder_labeled_00002  PKPEEAMREAYKLIKKYMLKAQKEAQEEWERMRRTDGTKEEKDMFPEKMIAQALRAIGEIFNAYYWAFLKLQEFKKYPSVRWEEQEEARKRLKIMMKIGAEWAREIAREMKERIKR  6.086099    
2 -203.582  test_3lhp_binder_labeled_00003  TKPEEMAREAYKRMLKALKQGEEEMKRMYEQMKKGVDSKEERDMEPEKMIAIALRAIGELFNAWMKALRHMKELRKLGTSGPKEEEKHWRWIFELHRWAGEEIQRAAEIQERKARW  5.296613    
3 -213.779  test_3lhp_binder_labeled_00004  TKPEEWARWAYKEHLKMAEKHRKEMEIEWEELKRRDGKEEEKDMWPERMIAMALRAIGELFNHHMYAEMRAKEEKKKPEAKTEEARRARREIMKYHHEAGRLIEEAMRRLMERHKK  6.766995    
4 -213.972  test_3lhp_binder_labeled_00005  KKWEEMMREAERQGKEYAQKAWKEALLEWKWMRKRPVTEEMKDMAPEWMIAAALRAIGEHFNIYWQQKLEHEKLRKIPNVPEEELEKGKEELKRIEEEAARMAEKYMQELRKKMES  5.507625    
5 -195.138  test_3lhp_binder_labeled_00006  PRPEEMARFAKEEMHKHEEKAYREFLLEYELAIRKNPTEEPKDMQPEWAIAAALRAIGEIFNQWMYHLLEIRKENGSSHTRYEEREKYRKLAKRLHEEAAKEIWKFMHEAMRRFES  4.778006
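
For readers who want to see what "applying the matrix" can mean in practice, here is a minimal pandas-only sketch. It assumes the score is the sum of the matrix value for the observed residue at each position, and that residues missing from the matrix columns are skipped; both are assumptions for illustration, not a statement of how rstoolbox implements the method. The helper name pssm_score and the two-residue toy matrix are hypothetical.

```python
import pandas as pd


def pssm_score(sequence, matrix):
    """Sum the matrix entry of the observed residue at every position."""
    if len(sequence) != matrix.shape[0]:
        raise ValueError("matrix rows do not match sequence length")
    score = 0.0
    for i, residue in enumerate(sequence):
        # Assumption: residue types absent from the matrix are ignored.
        if residue in matrix.columns:
            # Positional lookup, so the index labels themselves do not matter.
            score += matrix[residue].iloc[i]
    return score


# Hypothetical toy data: three positions, two residue types.
toy_matrix = pd.DataFrame([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]], columns=["A", "C"])
frame = pd.DataFrame({"sequence_B": ["ACA", "CCC"]})

# Mirror the naming convention of the new column: pssm_score_<seqID>.
frame["pssm_score_B"] = frame["sequence_B"].apply(lambda s: pssm_score(s, toy_matrix))
print(frame)
```

The length check mirrors the ValueError condition listed under Raises; a sequence shorter or longer than the matrix is rejected rather than partially scored.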