API reference

Components

This is the list of dedicated objects provided to manage design data. They can be called through rstoolbox.components.

`Selection`([selection])	Complex management of residue selection from a sequence.
`SelectionContainer`(*args)	Helper class to manage representation of selectors in `pandas`.
`DesignSeries`(*args, **kwargs)	The `DesignSeries` extends the `Series`, adding functionality to improve its usability in the analysis of a single design decoy.
`DesignFrame`(*args, **kwargs)	The `DesignFrame` extends the `DataFrame`, adding functionality to improve its usability in the analysis of sets of design decoys.
`SequenceFrame`(*args, **kwargs)	Per-position frequency of occurrence for a set of decoys.
`FragmentFrame`(*args, **kw)	Data container for fragment data.

IO: Sequence

Helper functions to read/write direct sequence information. They can be called through rstoolbox.io.

`read_fasta`(filename[, expand, multi, defchain])	Reads one or more FASTA files and returns a `DesignFrame` containing the requested data.
`write_fasta`(df, seqID[, separator, …])	Writes FASTA files of the selected decoys.
`write_clustalw`(df, seqID[, filename])	Write sequences of selected designs as a CLUSTALW alignment.
`write_mutant_alignments`(df, seqID[, filename])	Writes a text file containing only the positions changed with respect to the `reference_sequence`.
`read_hmmsearch`(filename)	Read output from `hmmsearch` or `hmmscan`.
`pymol_mutant_selector`(df)	Generate selectors for the mutations in target decoys.
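
The idea behind `write_mutant_alignments` and `pymol_mutant_selector`, keeping only the positions that differ from the `reference_sequence`, can be illustrated with a minimal standalone sketch. It assumes both sequences are already aligned and of equal length and numbers positions from 1; `list_mutations` and `pymol_resi_selection` are hypothetical helper names, not the rstoolbox implementation.

```python
def list_mutations(reference, sequence, first_position=1):
    """Return point mutations such as 'T3G' of `sequence` relative to `reference`."""
    # The sketch assumes both sequences are pre-aligned and of equal length.
    if len(reference) != len(sequence):
        raise ValueError("sequences must have the same length")
    mutations = []
    for offset, (ref_aa, seq_aa) in enumerate(zip(reference, sequence)):
        if ref_aa != seq_aa:
            # Position numbering starts at `first_position` (1 by default).
            mutations.append(f"{ref_aa}{offset + first_position}{seq_aa}")
    return mutations


def pymol_resi_selection(mutations):
    """Build a PyMOL-style residue selection string from mutation labels."""
    positions = [label[1:-1] for label in mutations]  # 'T3G' -> '3'
    return "resi " + "+".join(positions) if positions else "none"


mutations = list_mutations("MKTAYIA", "MKGAYLA")
print(mutations)                        # ['T3G', 'I6L']
print(pymol_resi_selection(mutations))  # resi 3+6
```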

IO: Structure

Helper functions to read/write outputs of programs based on protein structure. They can be called through rstoolbox.io.

`parse_master_file`(filename[, max_rmsd, …])	Load data obtained from a MASTER search.

IO: Rosetta

Helper functions to read/write data generated with Rosetta. They can be called through rstoolbox.io.

`parse_rosetta_file`(filename[, description, …])	Read a Rosetta score or silent file and return the design population in a `DesignFrame`.
`parse_rosetta_json`(filename)	Read a JSON-formatted Rosetta score file.
`parse_rosetta_pdb`(filename[, keep_weights, …])	Read the `POSE_ENERGIES_TABLE` from a Rosetta output PDB file.
`parse_rosetta_contacts`(filename)	Read a residue contact file as generated by ContactMapMover.
`parse_rosetta_fragments`(filename[, source])	Read a Rosetta fragment-file and return the appropriate `FragmentFrame`.
`write_rosetta_fragments`(df[, frag_size, …])	Writes a Rosetta fragment-file (new format) from an appropriate `FragmentFrame`.
`write_fragment_sequence_profiles`(df[, …])	Write a sequence profile from `FragmentFrame` to load into Rosetta’s SeqprofConsensus.
`get_sequence_and_structure`(pdbfile[, …])	Given a PDB file, runs a small RosettaScript to capture its sequence and structure.
`make_structures`(df[, outdir, tagsfilename, …])	Extract the selected decoys (if any).
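
As a rough illustration of what `parse_rosetta_file` has to handle, the sketch below reads the `SCORE:` lines of a plain Rosetta score file into a list of dictionaries. It assumes the first `SCORE:` line holds the column names (with `description` as the decoy tag) and ignores silent-file structure records entirely; `parse_score_lines` is a hypothetical name, not part of rstoolbox.

```python
def parse_score_lines(lines):
    """Parse the SCORE: lines of a Rosetta score file into a list of dicts."""
    header = None
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SCORE:":
            continue  # skip SEQUENCE:, REMARK and any other non-score lines
        values = fields[1:]
        if header is None:
            header = values  # the first SCORE: line holds the column names
            continue
        if values == header:
            continue  # repeated header, e.g. from concatenated score files
        record = {}
        for name, raw in zip(header, values):
            try:
                record[name] = float(raw)
            except ValueError:
                record[name] = raw  # non-numeric fields such as `description`
        records.append(record)
    return records


example = """SEQUENCE:
SCORE: total_score fa_atr description
SCORE: -120.5 -300.25 design_0001
SCORE: -98.0 -250.0 design_0002
"""
for record in parse_score_lines(example.splitlines()):
    print(record["description"], record["total_score"])
```

A list of such dictionaries can be passed to the `pandas.DataFrame` constructor; the `DesignFrame` listed above extends that class.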

IO: Experiments

Helper functions to read/write data generated through wedlab experiments. They can be called through rstoolbox.io.

`read_SPR`(filename)	Reads Surface Plasmon Resonance data.
`read_CD`(dirname[, prefix, invert_temp, …])	Read Circular Dichroism data for multiple temperatures.
`read_MALS`(filename[, mmfile])	Read Multi-Angle Light Scattering data.
`read_fastq`(filename[, seqID])	Reads a FASTQ file and stores the ID together with the sequence.
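
A FASTQ record spans four lines: an `@` identifier line, the sequence, a `+` separator and a quality string of the same length as the sequence. The minimal sketch below pairs each identifier with its sequence in the spirit of `read_fastq`; it assumes unwrapped (single-line) records, and `parse_fastq` is a hypothetical name rather than the library function.

```python
def parse_fastq(lines):
    """Return (identifier, sequence) pairs from FASTQ-formatted lines."""
    # Drop empty lines; assumes each record occupies exactly four lines.
    lines = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    records = []
    for start in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[start:start + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {start + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch at line {start + 1}")
        # Keep only the first word of the header as the identifier.
        records.append((header[1:].split()[0], sequence))
    return records


data = ["@read1 sample=A\n", "ACGT\n", "+\n", "IIII\n",
        "@read2\n", "GGCC\n", "+\n", "####\n"]
print(parse_fastq(data))  # [('read1', 'ACGT'), ('read2', 'GGCC')]
```

With a real file, the open handle can be passed directly, e.g. `with open("reads.fastq") as handle: parse_fastq(handle)`, where the filename is only a placeholder.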

Analysis

Helper functions for sequence analysis. They can be called through rstoolbox.analysis.

`sequential_frequencies`(df, seqID[, query, …])	Generates a `SequenceFrame` for the frequencies of the sequences in the `DesignFrame` with `seqID` identifier.
`sequence_similarity`(df, seqID[, …])	Evaluate the sequence similarity between each decoy and the `reference_sequence` for a given `seqID`.
`positional_sequence_similarity`(df[, seqID, …])	Per position identity and similarity against a `reference_sequence`.
`binary_similarity`(df, seqID[, key_residues, …])	Binary profile for each design sequence against the `reference_sequence`.
`binary_overlap`(df, seqID[, key_residues, matrix])	Overlap the binary similarity representation of all decoys in a `DesignFrame`.
`positional_enrichment`(df, other, seqID)	Calculates per-residue enrichment from sequences in the first `DesignFrame` with respect to the second.
`positional_structural_count`(df[, seqID, …])	Percentage of secondary structure types for each sequence position of all decoys.
`positional_structural_identity`(df[, seqID, …])	Per position evaluation of how many times the provided data matches the expected `reference_structure`.
`secondary_structure_percentage`(df, seqID[, …])	Calculate the percentage of the different secondary structure types.
`selector_percentage`(df, seqID, key_residues)	Calculate the percentage coverage of a `Selection` over the sequence.
`label_percentage`(df, seqID, label)	Calculate the percentage coverage of a `label` over the sequence.
`label_sequence`(df, seqID, label[, complete])	Gets the sequence of a `label`.
`cumulative`(values[, bins, max_count, …])	Generates, for a given list of values, its cumulative distribution values.
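
The per-position frequencies stored in a `SequenceFrame`, and the identity against a `reference_sequence` reported by `positional_sequence_similarity`, essentially come down to counting residues column by column over aligned sequences. The minimal standalone sketch below assumes equal-length sequences and plain identity only (no substitution matrix, hence no positives); `positional_frequencies` and `positional_identity` are hypothetical names.

```python
from collections import Counter


def positional_frequencies(sequences):
    """Per-position residue frequencies for a list of aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    table = []
    for column in zip(*sequences):  # one tuple of residues per position
        counts = Counter(column)
        table.append({aa: n / len(column) for aa, n in counts.items()})
    return table


def positional_identity(sequences, reference):
    """Fraction of sequences matching the reference residue at each position."""
    if len(reference) != len(sequences[0]):
        raise ValueError("reference must have the same length as the sequences")
    return [freqs.get(ref_aa, 0.0)
            for freqs, ref_aa in zip(positional_frequencies(sequences), reference)]


designs = ["MKT", "MKA", "MRA", "MKA"]
print(positional_frequencies(designs)[1])  # {'K': 0.75, 'R': 0.25}
print(positional_identity(designs, "MKT"))  # [1.0, 0.75, 0.25]
```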

Plot

Once the data is loaded in the different components, it can be used with any plotting library; in addition, some special plotting alternatives are offered through rstoolbox.plot.

`multiple_distributions`(df, fig, grid[, …])	Automatically plot boxplot distributions for multiple score types of the decoy population.
`sequence_frequency_plot`(df, seqID, ax[, …])	Makes a heatmap subplot into the provided axis showing the sequence distribution of each residue type for each position.
`logo_plot`(df, seqID[, refseq, key_residues, …])	Generates full figure classic LOGO plots.
`logo_plot_in_axis`(df, seqID, ax[, refseq, …])	Generates classic LOGO plot in a given axis.
`positional_sequence_similarity_plot`(df, ax)	Generates a plot covering the number of identity and positive matches from a population of designs to a reference sequence according to a substitution matrix.
`per_residue_matrix_score_plot`(df, seqID, ax)	Plot a linear representation of the scoring obtained by applying a substitution matrix.
`positional_structural_similarity_plot`(df, ax)	Generates a bar plot for positional prevalence of secondary structure elements.
`plot_fragments`(small_frags, large_frags, …)	Plot RMSD quality of a pair of `FragmentFrame` in two provided axes.
`plot_fragment_profiles`(fig, small_frags, …)	Plots a full summary of a `FragmentFrame` quality with sequence and expected secondary structure match.
`plot_alignment`(df, seqID, ax[, line_break, …])	Make an image representing the alignment of sequences with highlights on mutant positions.
`plot_ramachandran`(df, seqID, fig[, grid, …])	Generates a Ramachandran plot in RAMPAGE style.
`plot_ramachandran_single`(df, seqID, ax[, …])	Plot only one of the 4 Ramachandran plots in RAMPAGE format.
`plot_dssp_vs_psipred`(df, seqID, ax)	Generates a horizontal heatmap showing differences between PSIPRED predictions and DSSP assignments.

Plot: Experiments

Plot data obtained from experimental procedures. Accessible through rstoolbox.plot.

`plot_96wells`([cdata, sdata, bdata, bcolors, …])	Plot data of a 96-well plate into an equivalent-shaped plot.
`plot_thermal_melt`(df, ax[, linecolor, …])	Plot Thermal Melt data.
`plot_MALS`(df, ax[, uvcolor, lscolor, …])	Plot Multi-Angle Light Scattering data.
`plot_CD`(df, ax[, color, wavelengths, sample])	Plot Circular Dichroism data.
`plot_SPR`(df, ax[, datacolor, fitcolor, …])	Plot Surface Plasmon Resonance data.

Utils: Plot

Special functions to help personalise your plot easily can be loaded through rstoolbox.utils.

`format_Ipython`()	Ensure `monospace` representation of `DataFrame` in Jupyter Notebooks.
`highlight`(row, selection[, color, …])	Highlight rows in Jupyter Notebooks that match the given index.
`use_qgrid`(df, **kwargs)	Create a `QgridWidget` object from the qgrid library in Jupyter Notebooks.
`add_left_title`(ax, title, **kwargs)	Add a centered title on the left of the selected axis.
`add_right_title`(ax, title, **kwargs)	Add a centered title on the right of the selected axis.
`add_top_title`(ax, title, **kwargs)	Add a centered title on top of the selected axis.
`edit_legend_text`(ax, labels[, title])	Change the labels and title of a legend.
`add_white_to_cmap`([color, cmap, n_colors])	Generate a new colormap with white as first instance.
`color_variant`(color[, brightness_offset])	Make a color darker or lighter.
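
One common way to make a color darker or lighter, as `color_variant` describes, is to add the same offset to each RGB channel of a hex color and clamp the result to the 0 to 255 range. The sketch below assumes `#RRGGBB` input and that a negative offset darkens; `shift_hex_color` is a hypothetical name, and rstoolbox may implement `color_variant` differently.

```python
def shift_hex_color(color, brightness_offset=1):
    """Return `color` ('#RRGGBB') with every RGB channel shifted by an offset."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color such as '#RRGGBB'")
    channels = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    # Positive offsets lighten, negative offsets darken; clamp to 0..255.
    shifted = [min(255, max(0, value + brightness_offset)) for value in channels]
    return "#" + "".join(f"{value:02x}" for value in shifted)


print(shift_hex_color("#336699", 20))   # #477aad
print(shift_hex_color("#336699", -60))  # #002a5d
```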

Utils: Contextualize

Functions aimed at helping to assess a design population in the context of known protein structures.

`load_refdata`(ref[, homology])	Load the predefined reference data from `cath`, `scop`, `scop2` or `chain`.
`make_redundancy_table`([precalculated, select])	Query the PDB to retrieve the pre-calculated homology tables.
`plot_in_context`(df, fig, grid, refdata[, …])	Plot the position of decoys in a background reference dataset.
`distribution_quality`(df, refdata, values, …)	Locate the quantile position of each putative `DesignSeries` in a list of score distributions.
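
The quantile position used by `distribution_quality` can be understood as the fraction of a reference distribution that lies at or below a design's score. The sketch below computes that fraction with the standard `bisect` module; it assumes plain lists of reference values per score, and `quantile_position` and `score_positions` are hypothetical names. For Rosetta energies, where lower is better, a small value means the design ranks among the best of the reference set.

```python
from bisect import bisect_right


def quantile_position(reference_values, value):
    """Fraction of reference values that are less than or equal to `value`."""
    if not reference_values:
        raise ValueError("reference distribution is empty")
    ordered = sorted(reference_values)
    return bisect_right(ordered, value) / len(ordered)


def score_positions(design, reference):
    """Quantile position of each score of one design in its reference distribution."""
    return {name: quantile_position(reference[name], value)
            for name, value in design.items() if name in reference}


reference = {"score": [-3.0, -2.0, -1.0, 0.0], "packstat": [0.5, 0.6, 0.7, 0.8]}
print(score_positions({"score": -2.0, "packstat": 0.7}, reference))
# {'score': 0.5, 'packstat': 0.75}
```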

Utils: Transforms

Special functions to help transform your data can be loaded through rstoolbox.utils.

`add_column`(df, name, value)	Adds a new column to the DataFrame with the given value.
`split_values`(df, keys)	Reshape the data to aid plotting of multiple comparable scores.
`split_dataframe_rows`(df, column_selectors[, …])	Given a `DataFrame` in which certain columns contain lists, split these lists into new rows of the same `DataFrame`.
`report`(df)	Cast basic sequence count into PDB count for the appropriate columns.
`concat_fragments`(fragment_list)	Combine multiple `FragmentFrame`.
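
The reshaping done by `split_dataframe_rows` can be pictured on plain Python dictionaries: every row whose selected columns hold equal-length lists becomes one row per list element, with the other columns copied. This is a hypothetical standalone sketch (`split_list_rows` is not an rstoolbox function); recent pandas versions offer a comparable operation through `DataFrame.explode`.

```python
def split_list_rows(rows, columns):
    """Expand rows whose `columns` hold equal-length lists into one row per element."""
    result = []
    for row in rows:
        lengths = {len(row[column]) for column in columns}
        if len(lengths) != 1:
            raise ValueError("list columns of a row must have the same length")
        for index in range(lengths.pop()):
            new_row = dict(row)  # copy the non-list columns unchanged
            for column in columns:
                new_row[column] = row[column][index]
            result.append(new_row)
    return result


rows = [{"description": "design_0001", "position": [3, 6], "mutant": ["G", "L"]}]
for row in split_list_rows(rows, ["position", "mutant"]):
    print(row)
```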

Utils: RosettaScript

The RosettaScripts called by different functions of the library can be retrieved through rstoolbox.utils.

`baseline`([minimize])	RosettaScript to calculate DSSP secondary structure and phi-psi angles.
`mutations`([seqID])	RosettaScript to execute a RESFILE.

Utils: Experiments

Special functions to help obtain data from multiple Next Generation Sequencing (NGS) files. They can be loaded through rstoolbox.utils.

`translate_dna_sequence`(sequence)	Translates DNA to protein.
`translate_3frames`(sequence[, matches])	Translates DNA to protein trying all possible frames.
`adapt_length`(seqlist, start, stop[, inclusive])	Pick only the sequence between the provided pattern tags.
`sequencing_enrichment`(indata[, enrichment, …])	Retrieve data from multiple NGS files.
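
Translation in the three forward reading frames, as performed by `translate_dna_sequence` and `translate_3frames`, can be sketched with the standard genetic code. The block below assumes the standard code, marks stops with `*` and ambiguous codons with `X`, and adds a hypothetical `trim_between` helper in the spirit of `adapt_length`; none of these names are part of rstoolbox.

```python
import itertools

# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA (or RNA) in frame 0; incomplete trailing codons are dropped."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    # Codons with ambiguous bases (e.g. 'N') are translated as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def translate_frames(dna):
    """Translate the three forward reading frames."""
    return [translate(dna[frame:]) for frame in range(3)]


def trim_between(protein, start, stop, inclusive=False):
    """Return the part of `protein` between the first `start` and the next `stop`."""
    begin = protein.find(start)
    if begin < 0:
        return None
    end = protein.find(stop, begin + len(start))
    if end < 0:
        return None
    if inclusive:
        return protein[begin:end + len(stop)]
    return protein[begin + len(start):end]


print(translate("ATGGCCTAA"))                  # MA*
print(translate_frames("AATGGCC"))             # ['NG', 'MA', 'W']
print(trim_between("GSMKTAYLEH", "GS", "LE"))  # MKTAY
```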

Internals: Functions for Developers

These functions are only of interest if you plan on writing new functionalities in rstoolbox.

`io.open_rosetta_file`(filename[, multi, …])	Internal function; reads through a Rosetta silent file and yields only the lines that the library knows how to parse.
`components.get_selection`(key_residues, seqID)	Internal function; global management and casting of `Selection`.
`utils.make_rosetta_app_path`(application)	Provided the expected Rosetta application, add path and suffix.
`tests.helper.random_frequency_matrix`(size[, …])	Generate a random frequency matrix.
`tests.helper.random_proteins`(size, count)	Generate random protein sequences.
`tests.helper.random_fastq`(sequence, …)	Generate a requested number of fastq files.
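
Test helpers such as `random_proteins` and `random_frequency_matrix` can be approximated with the standard `random` module. The sketch below is hypothetical (the function names and the `seed` parameter are assumptions, not the rstoolbox signatures); seeding a private `random.Random` instance keeps test data reproducible without touching global state.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def random_protein_sequences(size, count, seed=None):
    """Return `count` random protein sequences of length `size`."""
    rng = random.Random(seed)  # private generator: reproducible, no global state
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(size))
            for _ in range(count)]


def random_frequency_matrix(size, seed=None):
    """Return `size` rows of random residue frequencies, each summing to 1."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(size):
        weights = [rng.random() for _ in AMINO_ACIDS]
        total = sum(weights)
        matrix.append([weight / total for weight in weights])
    return matrix


print(random_protein_sequences(10, 3, seed=42))
```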