This is the list of dedicated objects provided to manage design data. They can be called through rstoolbox.components.
Selection([selection]) |
Complex management of residue selection from a sequence. |
SelectionContainer(*args) |
Helper class to manage representation of selectors in pandas. |
DesignSeries(*args, **kwargs) |
The DesignSeries extends the Series, adding some functionalities to improve its usability in the analysis of a single design decoy. |
DesignFrame(*args, **kwargs) |
The DesignFrame extends the DataFrame, adding some functionalities to improve its usability in the analysis of sets of design decoys. |
SequenceFrame(*args, **kwargs) |
Per position frequency occurrence for a set of decoys. |
FragmentFrame(*args, **kw) |
Data container for Fragment data. |
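To make "per position frequency occurrence for a set of decoys" concrete, here is a minimal standard-library sketch of the kind of table a SequenceFrame holds. It does not use rstoolbox, and `positional_frequencies` is a hypothetical helper written for this illustration; the library's own container is pandas-based and may differ in layout.

```python
from collections import Counter


def positional_frequencies(sequences):
    """Return one {residue: fraction} dict per sequence position.

    Hypothetical helper (not part of rstoolbox). Assumes every sequence
    has the same length, as decoys of a single design would.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences)
    table = []
    # zip(*sequences) walks the sequences column by column.
    for column in zip(*sequences):
        counts = Counter(column)
        table.append({res: n / total for res, n in counts.items()})
    return table


# Four made-up decoy sequences of length 4.
decoys = ["MKTA", "MKSA", "MRTA", "MKTG"]
freqs = positional_frequencies(decoys)
for position, row in enumerate(freqs, start=1):
    print(position, row)
```

Each entry of the result corresponds to one sequence position, which is the same per-position view that the frequency-based analysis and plotting functions work with.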
Helper functions to read/write direct sequence information. They can be called through rstoolbox.io.
read_fasta(filename[, expand, multi, defchain]) |
Reads one or more FASTA files and returns the appropriate object containing the requested data: the DesignFrame. |
write_fasta(df, seqID[, separator, …]) |
Writes FASTA files of the selected decoys. |
write_clustalw(df, seqID[, filename]) |
Write sequences of selected designs as a CLUSTALW alignment. |
write_mutant_alignments(df, seqID[, filename]) |
Writes a text file containing only the positions changed with respect to the reference_sequence. |
read_hmmsearch(filename) |
Read output from hmmsearch or hmmscan. |
pymol_mutant_selector(df) |
Generate selectors for the mutations in target decoys. |
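As a rough picture of the data these sequence readers deal with, the following standard-library sketch turns FASTA text into an identifier-to-sequence mapping. `parse_fasta_text` is a hypothetical helper for illustration only; it is not how rstoolbox reads files, and it ignores options such as chain expansion.

```python
def parse_fasta_text(text):
    """Map each FASTA identifier to its full sequence.

    Hypothetical helper (not part of rstoolbox). The identifier is taken
    as the first whitespace-delimited word after '>'.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            # Sequence lines may be wrapped; collect all chunks.
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


example = ">decoy_1 first design\nMKTA\nYIAK\n>decoy_2\nMKSAYIAR\n"
sequences = parse_fasta_text(example)
print(sequences)
```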
Helper functions to read/write outputs of programs based on protein structure. They can be called through rstoolbox.io.
parse_master_file(filename[, max_rmsd, …]) |
Load data obtained from a MASTER search. |
Helper functions to read/write data generated with Rosetta. They can be called through rstoolbox.io.
parse_rosetta_file(filename[, description, …]) |
Read a Rosetta score or silent file and return the design population in a DesignFrame. |
parse_rosetta_json(filename) |
Read a JSON-formatted Rosetta score file. |
parse_rosetta_pdb(filename[, keep_weights, …]) |
Read the POSE_ENERGIES_TABLE from a Rosetta output PDB file. |
parse_rosetta_contacts(filename) |
Read a residue contact file as generated by ContactMapMover. |
parse_rosetta_fragments(filename[, source]) |
Read a Rosetta fragment-file and return the appropriate FragmentFrame. |
write_rosetta_fragments(df[, frag_size, …]) |
Writes a Rosetta fragment-file (new format) from an appropriate FragmentFrame. |
write_fragment_sequence_profiles(df[, …]) |
Write a sequence profile from FragmentFrame to load into Rosetta’s SeqprofConsensus. |
get_sequence_and_structure(pdbfile[, …]) |
Provided a PDB file, it will run a small RosettaScript to capture its sequence and structure, i.e. |
make_structures(df[, outdir, tagsfilename, …]) |
Extract the selected decoys (if any). |
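For readers unfamiliar with Rosetta score files, the sketch below shows the general idea of turning `SCORE:` lines into one record per decoy. It is a standard-library illustration under assumptions, not the parser used by rstoolbox: `read_score_text` is a hypothetical helper, the sample text is made up, and real score or silent files carry more line types and columns than are handled here.

```python
def read_score_text(text):
    """Return a list of {column: value} dicts, one per decoy.

    Hypothetical helper (not part of rstoolbox). Assumes the first
    'SCORE:' line holds the column names and later ones hold values.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other line type
        fields = line.split()[1:]
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue  # malformed line, ignored in this sketch
        row = {}
        for name, value in zip(header, fields):
            if name == "description":
                row[name] = value  # decoy identifier stays a string
                continue
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value
        rows.append(row)
    return rows


sample = (
    "SEQUENCE:\n"
    "SCORE: total_score fa_atr description\n"
    "SCORE: -10.5 -20.25 decoy_0001\n"
    "SCORE: -12.0 -21.5 decoy_0002\n"
)
population = read_score_text(sample)
print(population)
```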
Helper functions to read/write data generated through wet lab experiments. They can be called through rstoolbox.io.
read_SPR(filename) |
Reads Surface Plasmon Resonance data. |
read_CD(dirname[, prefix, invert_temp, …]) |
Read Circular Dichroism data for multiple temperatures. |
read_MALS(filename[, mmfile]) |
Read Multi-Angle Light Scattering data. |
read_fastq(filename[, seqID]) |
Reads a FASTQ file and stores the ID together with the sequence. |
Helper functions for sequence analysis. They can be called through rstoolbox.analysis.
sequential_frequencies(df, seqID[, query, …]) |
Generates a SequenceFrame for the frequencies of the sequences in the DesignFrame with seqID identifier. |
sequence_similarity(df, seqID[, …]) |
Evaluate the sequence similarity between each decoy and the reference_sequence for a given seqID. |
positional_sequence_similarity(df[, seqID, …]) |
Per position identity and similarity against a reference_sequence. |
binary_similarity(df, seqID[, key_residues, …]) |
Binary profile for each design sequence against the reference_sequence. |
binary_overlap(df, seqID[, key_residues, matrix]) |
Overlap the binary similarity representation of all decoys in a DesignFrame. |
positional_enrichment(df, other, seqID) |
Calculates per-residue enrichment from sequences in the first DesignFrame with respect to the second. |
positional_structural_count(df[, seqID, …]) |
Percentage of secondary structure types for each sequence position of all decoys. |
positional_structural_identity(df[, seqID, …]) |
Per position evaluation of how many times the provided data matches the expected reference_structure. |
secondary_structure_percentage(df, seqID[, …]) |
Calculate the percentage of the different secondary structure types. |
selector_percentage(df, seqID, key_residues) |
Calculate the percentage coverage of a Selection over the sequence. |
label_percentage(df, seqID, label) |
Calculate the percentage coverage of a label over the sequence. |
label_sequence(df, seqID, label[, complete]) |
Gets the sequence of a label. |
cumulative(values[, bins, max_count, …]) |
Generates, for a given list of values, its cumulative distribution values. |
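Several of the functions above compare each decoy with a reference sequence. As a minimal standard-library sketch of that idea, the code below computes the fraction of identical positions and lists the mutated positions. Both helpers are hypothetical and written for this illustration; the rstoolbox functions additionally handle substitution matrices, key residues and DesignFrame input, none of which is reproduced here.

```python
def sequence_identity(sequence, reference):
    """Fraction of positions at which sequence matches reference.

    Hypothetical helper (not part of rstoolbox); requires equal lengths.
    """
    if len(sequence) != len(reference):
        raise ValueError("sequence and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(a == b for a, b in zip(sequence, reference))
    return matches / len(reference)


def mutated_positions(sequence, reference):
    """1-based positions at which sequence differs from reference."""
    if len(sequence) != len(reference):
        raise ValueError("sequence and reference must have the same length")
    return [i for i, (a, b) in enumerate(zip(sequence, reference), start=1)
            if a != b]


reference = "MKTAYIAK"
decoy = "MKSAYIAR"
print(sequence_identity(decoy, reference))
print(mutated_positions(decoy, reference))
```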
Once the data is loaded into the different components, it is ready to use with any
plotting library, but some special plotting alternatives are offered through rstoolbox.plot.
multiple_distributions(df, fig, grid[, …]) |
Automatically plot boxplot distributions for multiple score types of the decoy population. |
sequence_frequency_plot(df, seqID, ax[, …]) |
Makes a heatmap subplot into the provided axis showing the sequence distribution of each residue type for each position. |
logo_plot(df, seqID[, refseq, key_residues, …]) |
Generates full figure classic LOGO plots. |
logo_plot_in_axis(df, seqID, ax[, refseq, …]) |
Generates classic LOGO plot in a given axis. |
positional_sequence_similarity_plot(df, ax) |
Generates a plot covering the number of identity and positive matches from a population of designs to a reference sequence according to a substitution matrix. |
per_residue_matrix_score_plot(df, seqID, ax) |
Plot a linear representation of the scoring obtained by applying a substitution matrix. |
positional_structural_similarity_plot(df, ax) |
Generates a bar plot for positional prevalence of secondary structure elements. |
plot_fragments(small_frags, large_frags, …) |
Plot RMSD quality of a pair of FragmentFrame in two provided axes. |
plot_fragment_profiles(fig, small_frags, …) |
Plots a full summary of a FragmentFrame's quality with sequence and expected secondary structure match. |
plot_alignment(df, seqID, ax[, line_break, …]) |
Make an image representing the alignment of sequences with highlights on mutant positions. |
plot_ramachandran(df, seqID, fig[, grid, …]) |
Generates a Ramachandran plot in RAMPAGE style. |
plot_ramachandran_single(df, seqID, ax[, …]) |
Plot only one of the 4 Ramachandran plots in RAMPAGE format. |
plot_dssp_vs_psipred(df, seqID, ax) |
Generates a horizontal heatmap showing differences in psipred predictions to dssp assignments. |
Plot data obtained from experimental procedures. Accessible through rstoolbox.plot.
plot_96wells([cdata, sdata, bdata, bcolors, …]) |
Plot data of a 96 well plate into an equivalent-shaped plot. |
plot_thermal_melt(df, ax[, linecolor, …]) |
Plot Thermal Melt data. |
plot_MALS(df, ax[, uvcolor, lscolor, …]) |
Plot Multi-Angle Light Scattering data. |
plot_CD(df, ax[, color, wavelengths, sample]) |
Plot Circular Dichroism data. |
plot_SPR(df, ax[, datacolor, fitcolor, …]) |
Plot Surface Plasmon Resonance data. |
Special functions to help personalise your plot easily can be loaded through rstoolbox.utils.
format_Ipython() |
Ensure monospace representation of DataFrame in Jupyter Notebooks. |
highlight(row, selection[, color, …]) |
Highlight rows in Jupyter Notebooks that match the given index. |
use_qgrid(df, **kwargs) |
Create a QgridWidget object from the qgrid library in Jupyter Notebooks. |
add_left_title(ax, title, **kwargs) |
Add a centered title on the left of the selected axis. |
add_right_title(ax, title, **kwargs) |
Add a centered title on the right of the selected axis. |
add_top_title(ax, title, **kwargs) |
Add a centered title on top of the selected axis. |
edit_legend_text(ax, labels[, title]) |
Change the labels and title of a legend. |
add_white_to_cmap([color, cmap, n_colors]) |
Generate a new colormap with white as first instance. |
color_variant(color[, brightness_offset]) |
Make a color darker or lighter. |
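To illustrate what making a color darker or lighter can mean in practice, here is a standard-library sketch that shifts every RGB channel of a hex color by a fixed offset and clamps the result. `shift_brightness` is a hypothetical helper for this illustration; it is an assumption about the general approach, not the implementation behind color_variant.

```python
def shift_brightness(hex_color, offset=1):
    """Return hex_color with each RGB channel shifted by offset.

    Hypothetical helper (not part of rstoolbox). Positive offsets
    lighten, negative offsets darken; channels are clamped to 0-255.
    """
    if len(hex_color) != 7 or not hex_color.startswith("#"):
        raise ValueError("expected a color such as '#1f77b4'")
    channels = [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]
    shifted = [min(255, max(0, c + offset)) for c in channels]
    return "#" + "".join(f"{c:02x}" for c in shifted)


lighter = shift_brightness("#808080", 16)
darker = shift_brightness("#808080", -16)
print(lighter, darker)
```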
Functions aimed to help assess a design population in the context of known protein structures.
load_refdata(ref[, homology]) |
Load the predefined reference data from cath, scop, scop2 or chain. |
make_redundancy_table([precalculated, select]) |
Query into the PDB to retrieve the pre-calculated homology tables. |
plot_in_context(df, fig, grid, refdata[, …]) |
Plot position of decoys in a background reference dataset. |
distribution_quality(df, refdata, values, …) |
Locate the quantile position of each putative DesignSeries in a list of score distributions. |
Special functions to help transform your data can be loaded through rstoolbox.utils.
add_column(df, name, value) |
Adds a new column to the DataFrame with the given value. |
split_values(df, keys) |
Reshape the data to aid plotting of multiple comparable scores. |
split_dataframe_rows(df, column_selectors[, …]) |
Given a dataframe in which certain columns are lists, it splits these lists making new rows in the DataFrame out of itself. |
report(df) |
Cast basic sequence count into PDB count for the appropriate columns. |
concat_fragments(fragment_list) |
Combine multiple FragmentFrame. |
Get the RosettaScripts that are called by different functions of the library with rstoolbox.utils.
baseline([minimize]) |
RosettaScript to calculate DSSP secondary structure and phi-psi angles. |
mutations([seqID]) |
RosettaScript to execute a RESFILE. |
Special functions to help process Next Generation Sequencing data. They can be loaded through rstoolbox.utils.
translate_dna_sequence(sequence) |
Translates DNA to protein. |
translate_3frames(sequence[, matches]) |
Translates DNA to protein trying all possible frames. |
adapt_length(seqlist, start, stop[, inclusive]) |
Pick only the sequence between the provided pattern tags. |
sequencing_enrichment(indata[, enrichment, …]) |
Retrieve data from multiple NGS files. |
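The translation helpers above turn DNA reads into protein sequences. The sketch below shows the underlying idea with the standard genetic code and the three forward reading frames, using only the standard library. `translate` and `translate_frames` are hypothetical names for this illustration; they do not reproduce the rstoolbox signatures or options such as match filtering.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna):
    """Translate DNA in frame 0; '*' marks stops, 'X' unknown codons.

    Hypothetical helper (not part of rstoolbox). Trailing bases that do
    not complete a codon are dropped.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def translate_frames(dna):
    """Translate the three forward reading frames of a DNA sequence."""
    return [translate(dna[offset:]) for offset in range(3)]


frames = translate_frames("ATGGCCTAA")
print(frames)
```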
These functions are only of interest if you plan on writing new functionalities in rstoolbox.
io.open_rosetta_file(filename[, multi, …]) |
Internal function; reads through a Rosetta silent file and yields only the lines that the library knows how to parse. |
components.get_selection(key_residues, seqID) |
Internal function; global management and casting of Selection. |
utils.make_rosetta_app_path(application) |
Provided the expected Rosetta application, add path and suffix. |
tests.helper.random_frequency_matrix(size[, …]) |
Generate a random frequency matrix. |
tests.helper.random_proteins(size, count) |
Generate random protein sequences. |
tests.helper.random_fastq(sequence, …) |
Generate a requested number of FASTQ files. |