rstoolbox.components.SequenceFrame.to_bits

SequenceFrame.to_bits()

Convert the SequenceFrame from frequencies to bits. The bit calculation follows the method described in http://www.genome.org/cgi/doi/10.1101/gr.849004:

$$R_{seq} = S_{max} - S_{obs} = \log_2 N - \left(-\sum_{n=1}^{N} p_n \log_2 p_n\right)$$
Where:
  • N is the total number of options (4 for DNA/RNA; 20 for protein). It is taken automatically from the number of columns, so adding or deleting columns for any reason changes the maximum expected height.
  • p_n is the observed frequency of symbol n.
Returns: SequenceFrame - with the new scoring system
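
To make the formula above concrete, here is a minimal sketch of the per-position calculation in plain Python. It is not the rstoolbox implementation: the helper name `information_content` is hypothetical, and treating zero frequencies as contributing nothing to the sum is an assumption of this sketch.

```python
from math import log2


def information_content(freqs):
    """Return Rseq (in bits) for one position.

    `freqs` holds the observed frequency of every symbol at that
    position, one entry per column (hypothetical helper, not part
    of rstoolbox).
    """
    n_options = len(freqs)  # N: taken from the number of columns
    s_max = log2(n_options)
    # Assumption: a zero frequency adds nothing to the sum,
    # since p * log2(p) tends to 0 as p tends to 0.
    s_obs = -sum(p * log2(p) for p in freqs if p > 0)
    return s_max - s_obs


# Four columns (DNA/RNA): a fully conserved position reaches log2(4) bits,
# while a uniform position carries no information.
conserved = information_content([1.0, 0.0, 0.0, 0.0])
uniform = information_content([0.25, 0.25, 0.25, 0.25])

# Twenty columns (protein): the maximum height rises to log2(20) bits.
conserved_protein = information_content([1.0] + [0.0] * 19)
```

Because N is read from the number of entries, dropping or adding a column changes `s_max` and therefore the maximum attainable value, which is the behaviour the note on N describes.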