rstoolbox.io.read_fastq

rstoolbox.io.read_fastq(filename, seqID='A')

Reads a FASTQ file and stores the identifier of each entry together with its sequence.

The default generated DesignFrame will contain two columns:

Column Name         Data Content
description         Sequence identifier.
sequence_<chain>    Sequence content.

Parameters:
  • filename (str) – FASTQ filename.
  • seqID (str) – Identifier of the sequence of interest; it is used as the <chain> suffix of the sequence column (sequence_A by default).
Returns:

DesignFrame
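
For orientation, a FASTQ record normally spans four lines: an `@`-prefixed identifier, the sequence, a `+` separator, and a quality string. The block below is a minimal sketch of how such a file could be reduced to the two columns listed above. It is not the rstoolbox implementation: the helper name `fastq_to_columns` is hypothetical, it returns a plain dictionary of lists rather than a DesignFrame, and it assumes every record occupies exactly four lines and that the identifier is the first whitespace-delimited token of the header.

```python
import gzip


def fastq_to_columns(filename, seqID="A"):
    """Hypothetical helper (not part of rstoolbox): collect the identifier
    and sequence of every record in a 4-line-per-record FASTQ file."""
    # Assumption: compressed files are recognised by their .gz suffix.
    opener = gzip.open if filename.endswith(".gz") else open
    descriptions, sequences = [], []
    with opener(filename, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:            # end of file
                break
            if not header.strip():    # tolerate blank lines between records
                continue
            if not header.startswith("@"):
                raise ValueError("Expected '@' header, got: %r" % header)
            sequence = handle.readline().strip()
            handle.readline()         # '+' separator line, ignored
            handle.readline()         # quality string, ignored
            # Assumption: keep only the first token after '@' as identifier.
            descriptions.append(header[1:].split()[0])
            sequences.append(sequence)
    # Column names mirror the table above: description, sequence_<chain>.
    return {"description": descriptions, "sequence_" + seqID: sequences}
```

The returned dictionary can be passed to `pandas.DataFrame` to obtain a table with the same two-column layout; the quality strings are discarded, in line with the columns documented here.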

Example

In [1]: from rstoolbox.io import read_fastq
   ...: import pandas as pd
   ...: pd.set_option('display.width', 1000)
   ...: pd.set_option('display.max_columns', 500)
   ...: df = read_fastq("../rstoolbox/tests/data/cdk2_rand_001.fasq.gz")
   ...: df.head(8)
   ...: 
Out[1]: 
  description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        sequence_A
0  cdk2_v0008  GGTGCGTCGTACTTTATGCAGATCCCCCATAGGCGCATGTCAGTATTCGGTATCGCCAAAGTGCACGCTCGTCACAAGCACTTAACAGGTGAGGTGGTAGCTCTTAAGAAAATACGCCTGTTCCAACCAGAACCAGGGCCGATCATGGTCAAGCCGAATATGTGTCCCTACTACTATGAATGGATTGGAAAGCGTAATCAACTGGATTCCTTTGCGCCCTGCATATCGTGTAAGATAAAGAAACGTGACACAAAGGTGAGGGGGGTTTGTTTTCATAATAGCGCAATACATTGTAAAAGTTATCGGTGCGTCGATCAAATCTTCTGCGGTTGTATAAAATGGATGATGATGGGCCGCGATTGTGAGGGGCAGGGGGAATCTCAGAATAATACGGATATAGGGGGTCCAACGGGATGTGATATCAATTGGCGAACATGTCATTTTACAGAACTTCGACATGACTGTGAAAACTGGCAAAGCGTCATCTGCAGTACTCATCACATATGTACGATGGGCCATATCGACCAGACTTCTGCTTCGGAGACCCAGGACTGGGATTCCTTTCAATGGGTGATGCTCCGATACATCCACGGCGAACAGAAGAAATATAGCATTCAGTTGGGCAATTGGGATGCTAAACAGGCAGTCAACATGCATAGACAGGAGCTGAAGGTGCTTGTGAAGAAGCGCCACGAGGAAGGCAAGATTTGTGCATGCTGCGTAATGTCACACATCGGTGTCGAAATTTCATTCTTTGGCAAGCGCTCACAGAGATTTCAGAGCGAATTTATGCAACATTGGGTGGCAAACTTCGCTATGAAGTTCAAATTTAGGAATATAGGTTGGCCACACACATCGTGGACCCAGCTCGCTGCACTGGGGGGTTGGGAGGGCTGGCACAAACCCGGGACT
1  cdk2_v0004  CATGGGATGCCAATCACAAACTGCCCGAGCGACCGATATGACCGACTTGAGCACATGTGTGTCCGCACATATCTGACTGGGGAGGTGGTGGCACTTAAAAAGATTCGGCTCCACGTGCAAGACATGGCCCATACGTTGGATCATACATTAGACCATATGAAGTGGGCGCAGTCTTTCCGTAACGGGTTGATGTACTCTGAACATCGGGGGCACTGTGCCTATCCTGTATGCTCCCTGAGATCCTCGACCGTAGTCAGGTGGACGATGGTTGTAGAATACCCCTTTTGGCACACCGCCTTATGGAAGCCCATTCAAGGCACGAAGGTGTTAATGATCGGGACGCGTAAAAACTGCGTGATCCAAATGTTAATGAGGTTCGAAACGAGGGCAAACGAAAACACAGCCTGTCCCAATACTAACTTTACTGATGGTGGCGAACGTTGTTGGTGTTGTGCTTGTCGGTTTTGTAAGCATGAGATGCTGCAGCATATAGAGGAGAAACAGATAGATATCACAGATTGGTGCCTGTTTATGAGTCAACGACAAGTAAGATTCAAATGGGTTGTACTCAGGCTCTGGTTAGATACTCCTATAAAGACAAGTTCAGCCGTAGGTATCGGCTCGACTAACGGGGCAACCGACAATTTCGAGGGGTGCAGTTGGGACACGATGGCCCTTGAGTATGGATCGCAAGAGCATAATAATTGCCCCGTTGACATTAGAGATAGACTGGAGTTTCAAGACGATGGCGGGCTGAGGAACCTAAATCCTAGTACTGACATATATCCCTACGAAATGACCCTTTTCTTGTTTATGATTAAAAAGTATACCTTTGTAAGATGTGAGGTTAATCTTGATTGCCAGATGAGACCAGAATGGATTGGTGATGCCTTG                  
2  cdk2_v0001  GGTGCGTCGAAATATTGTCCTCGGGCGCGGATTCAGTGCGAGCATTACCAGGAAGCGTTCGTTTGTCAGACGATAATAACATTAACTGGGGAGGTCGTTGCACTGAAAAAAATCAGACTGATGTTCTTTGAACAAAGCGCCGAGATGCTAAAACAAAGAATGCACGGGCATCATATGGGAGATGATCGAAGGGGCTGGGAATATGTTTCGTGCTGGTGGTGCTACGCCATCCATCGGTGGATCCATCATTCTCACTTTCATGAAATTCGTCAAGAAACTGTAACAATACTGGGGGAATACATTAGAATCACGTGCGATCAATATTTGTGCAAGTTCAAGTTTGCGGAGGTTATTCGAGATGCGTTTGTGGGGATGGAATGTATCACCGCGAAGAAAAAGTCGCAGAACAAAAGAAACGGAATACAGTATATGACTACAGCCAGTGTCGCGCTAACGCAATGGCACCAAGTAGGACTTTTCACTAACGTTAACCAACTTGACATTAATCAAATGACCGATTCCGCTCGAGAGGCTAACTTTACGCCTATTTATTGGATCAAACAGGACTGTTTCTTAAAGACACCATATCAGAACTACGAGGCTACGGTCTTCCAGACCGCAGACATTTGGTGCCGTCATGAGGCTGAATGCTGGGATCACCAAACATGGGATTGGCCCAATCCGCTAACCCAATTTTGTGAAGAACACAAGCCCAGCGATGTTAACGGGCTCGAGAATTATAGGGTCTTTTACTTTGACTGGGCATTCCATAAGGCTATACTCTGCCATGTCAAAGACATGGCACAACCGTTCGCTCTACGGGTATTCGACGAAGGCTGCTTTTGGCGATGTCAGGTTGAACAAGATTATACCCTCATCCCCGAGAGCTGGAAGTGCGTGACCCCCGGGACT
3  cdk2_v0005  ATGGCTAACAATTGTGATCCAGCAATGGAAGAGGTCATGCTACGATGCTTCGGCCTGGATAGATGCGGGTTACTCACAGGAGAGGTCGTGGCTCTTAAAAAAATAAGATTACACGAATACATCCACCAACTATGGATGAGCAGTTATCAGAGTCATAATGCGCACAAAACACGTTACTCAACTTGTCACTCCCAAGAACAGGTGTGTTGGCAATGTGATGTGTTCATATGTGCCTGGTGTGACCAAACATTCTTAGTCTATACCGTGAATGCTTATGATTATTGCTGCTGGTGGAGGAAAAGGTGCCTGAACGAGGGCACAACTTTCAAGATGCCCGTAGCAGTCCCAAACTGGCCCTACCAGCTAACACACCTTTGTGAGGAAGCAATCTCCATGGACTTTGGAATGGGGAATTGGATGTGTGAGCATCATGAAGAGTGGTTACACTATAGTCACATGAACATGTGCTTCTGTTTTTTTACACAGTGGATACAAGAATATGAAGACTACCAACCCGACATGTGTATAGATGTTAATCAACAATGTACGGTCATGGGCGATTACAAAGAAATACCTATTGAATGTAAAGTGCAGGCCTACATAGCTAGGTGCTTGATCTATCCCATTGTTAAAGCTGGGACACACACTTTCCATGGGGTAGGCTTTCCACCCGGTTGGGGTCAGGGTGATAAGTTCCATTGGAACCATAAGATGTCCAAGTTCCCTGGGGGGGAGATTATGAAAACGGTGTATGTCGTATGTCAAATCTCGGGGCCTCCTATGCAAAACGAATACCGGTACCTCAAGCCTTCAAATACGCTCCAGAACATGTCTTGGTATGATGGCAATCATACTTCATTGTCAGTCGCGAGCGGATGGGAAGACTTCTTCACT                  
4  cdk2_v0006  GGTGCGTCGTTCGCCACGTTACAAAAAGATAAACCGCCGGTCATGGACACTCCTAAGCATTGTGATGCAAAAAAGACATGGCTAACCGGGGAAGTGGTGGCCTTGAAAAAAATCAGGCTACGACCTCCGTGTACATCTCTTGGACAGAGATACGTCGCTATTGACATAAAACACGCAGTTTACAGCCATAAGCAGCATAGGTCAGTCATGGACATCTTTCACACGCTGTATTTTGGCAAAAAGTGGGCCATTCGCGTTAAAGAGGCAGACTCCGTGAGGGAACAGACAGCCTGGGATTTTTGGTGGAACTGGAAGCACATCAATCAGACGTGCGGTTTCGAGGACGAAGTTATACATCCAAACCAAATGTGCATGCGCATATTGAACAACACGAAACGGGACTTACTTTTCCAATGGCCTGTCGTGTCATGTGTGAAACGAGTCCATATCCGTATCCAAACCGCATTTCGTTTCTATGCGTGTATAGTAGGGTTTCCATATGAGCCCAAGATGGATATACGAACCCAAATATGTCGTGGAACGATCGAGGAGTGGTTCCGATTCGATGTCTATAGGGAACGTATTTGGAGAAATGAAATGTATTCACAGTCTGAGAAGAGTCACAATTGTTGGAACAATAAAAATACCCAAATGTGCGCTCCCAACATGTTAAAGGGTTCTCACAATGCCTGCAAACAATCTAGGCATCATTGGAACGCAATGGATATAAGGATCCACGACGAACTACGGATCGTAGGTAGTCCATATTATCAAACGATAAGCTACCGAGTGTACAACCTAAACACTCCAGTCAAAAAAAGTCGTAACAGCGGGACCGCTCGAGGCCATTGGCATGTTGGAAATCATCTAAAGTTACGGCACGATATATACGATTGTTGCGCTCCCGGGACT
5  cdk2_v0003  CTGTGGTTATGGATGGAAATTGGTTGGCGGCATAGGTGGCAATATAAAAGTGTGGACAATCAGGCTCCTATGTTGACGGGTGAAGTCGTGGCACTAAAAAAAATAAGGCTAGGCCATCATGAACAGCCGTCTAAGCAGCTAGAGCCCGAGATCGATATGGTTATGCTTCAAATAGACCACCGGTGTCACCTTCGGGTCGAGGATCACTACATAGGTCACAAAGATGGAATATCGCAATTCCCTCGTCAACCACTTGCGTGTTCCGTAAACAATCAAAAACTACACGACAGGGATCGGGACTGCTGTCACGATCTCCAAAAAATGAATTTGGTCGCTCTGATTAGTGAGCCGGCGCATGGCATGATCAACATGCGGTGTATGCCCATTCGAAGTCGATATGATCGAGATAACCCTACAGACTGTTCGACCATAGCAACTCCATCTGATAATGTTCAGCCAAACAAAGGTGCAGGAACCACACCTTATGGGCCAGAAATGTTAGAGGATCATTGGCCGTTGTGGAAGAGAACTACACGGTACGAGTGTCACTGGCACACAGATTGCGAACTTAAAAGAGACCCATGCGGCCCCCCCTTTTGGTATGCCCAACATGGTGGGTATGCGAGGTGGCAGGCAGGTTCCCTCACTTGGTGCCACGTGGATAACGAAGAATGCGCGAAAAACATTGACGGCGAATCTAAATGGCATCGCCTGATGATTCCACCGCAGGTACGGCTCTTTAAATTCCCAAAGCTGTCGCCTTGGCCGGTTCGGGTTTGTAATCCTCCAGTAAGTGGTCTATTTCCCTTAGAATTTCAAGAACGGACTGATGAATATATGCAAGTTTACGCCGGATTCGACTTAGCAATGGGCACCAACATGCAGAAGCGATAC                  
6  cdk2_v0001  GGTGCGTCGAAATATTGTCCTCGGGCGCGGATTCAGTGCGAGCATTACCAGGAAGCGTTCGTTTGTCAGACGATAATAACATTAACTGGGGAGGTCGTTGCACTGAAAAAAATCAGACTGATGTTCTTTGAACAAAGCGCCGAGATGCTAAAACAAAGAATGCACGGGCATCATATGGGAGATGATCGAAGGGGCTGGGAATATGTTTCGTGCTGGTGGTGCTACGCCATCCATCGGTGGATCCATCATTCTCACTTTCATGAAATTCGTCAAGAAACTGTAACAATACTGGGGGAATACATTAGAATCACGTGCGATCAATATTTGTGCAAGTTCAAGTTTGCGGAGGTTATTCGAGATGCGTTTGTGGGGATGGAATGTATCACCGCGAAGAAAAAGTCGCAGAACAAAAGAAACGGAATACAGTATATGACTACAGCCAGTGTCGCGCTAACGCAATGGCACCAAGTAGGACTTTTCACTAACGTTAACCAACTTGACATTAATCAAATGACCGATTCCGCTCGAGAGGCTAACTTTACGCCTATTTATTGGATCAAACAGGACTGTTTCTTAAAGACACCATATCAGAACTACGAGGCTACGGTCTTCCAGACCGCAGACATTTGGTGCCGTCATGAGGCTGAATGCTGGGATCACCAAACATGGGATTGGCCCAATCCGCTAACCCAATTTTGTGAAGAACACAAGCCCAGCGATGTTAACGGGCTCGAGAATTATAGGGTCTTTTACTTTGACTGGGCATTCCATAAGGCTATACTCTGCCATGTCAAAGACATGGCACAACCGTTCGCTCTACGGGTATTCGACGAAGGCTGCTTTTGGCGATGTCAGGTTGAACAAGATTATACCCTCATCCCCGAGAGCTGGAAGTGCGTGACCCCCGGGACT
7  cdk2_v0012  CTCGGGAACCAAATCGCTGCGGCTCCGTCCGTTGCTCCTCTGCGGGCGGATCTCGAAAACCGACCTCGGCCATTGACGGGTGAGGTCGTTGCTCTGAAGAAAATTAGGTTACGCTGTATGCAGTTCGGGTCGATGTATCAAAAACCCGAACAGGCAGGGTGGATGGCCCCCCGTTACCATTATTTTTATCAACAACATTGTTGGATCGAAATGATCGCCGCGGAGCGCATGGGGAAGGCCGATGAAGATGTAAATTGGTCCGTGGTTTGGTTTTATAGACACAGCGAATATTGGAATTCCGTATTTTACATCTTCCAATGTATGTGCGAACACATAGGTGGCATCCACGCCCAGGGAGGACAATTAACCATGGAAAATGCAATTACCAATGAAGAACCCTTCGCGAAGTGGGGTAATAGTATGACGGTAGCTCATACTAAACTCAAGGCATGGAACGCCATGATGGATGTAGGAGAGGACCACTGCGACTTTCAGTTTAACCAAGACGTGAGAGGGCAAATAACCCTGACGGAGGAGGCAGTCCACACCATGTTTGGAATATTCAAATGGATACTATGGTACATGTGGGGTGCTATGCAATGGAGGAAACACTCGGTTTACGCGGTTAAGGAGCAATACTGGGATTGCCGCGAGGAAATGGATTGGATGTGCACTTGGATGCATAAATGTTGGAAAATGTACTGGTCGGATTGGTGGGGCAAGGAAGAGCTCGAATTCCGATCTGACGCCAAGCCTATCACGAAAATGATGATGTGTTGGATCCTACCGTACTCCCTACACGAGGATACCAGGCAGGTTTTTTCCCCTGCAAACTGTATTTGGTTCTGTGGGAGTGTATACTTGCAAAAGATATGGAAGATGCACAAGGACGCT
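
In the output above the identifier cdk2_v0001 appears twice (rows 2 and 6), so the same design can be present in more than one row. The following sketch shows one way to count repeated identifiers and to collapse identical rows using standard pandas calls. The frame is built by hand with shortened placeholder sequences, purely for illustration; these are not the sequences in the test file, and a plain `pandas.DataFrame` stands in for the DesignFrame.

```python
import pandas as pd

# Placeholder data mimicking the layout above; sequences are shortened
# stand-ins, not the real content of the test file.
df = pd.DataFrame({
    "description": ["cdk2_v0008", "cdk2_v0004", "cdk2_v0001",
                    "cdk2_v0005", "cdk2_v0001"],
    "sequence_A": ["GGTGCG", "CATGGG", "GGTGCGTCGAAA",
                   "ATGGCT", "GGTGCGTCGAAA"],
})

# How many times does each identifier occur?
counts = df["description"].value_counts()

# Keep only the identifiers that occur more than once.
repeated = counts[counts > 1]

# Drop rows whose identifier and sequence are both repeated.
unique_reads = (df.drop_duplicates(subset=["description", "sequence_A"])
                  .reset_index(drop=True))
```

Whether repeated identifiers should be collapsed or counted depends on the analysis; counting is the natural choice when the number of reads per design is the quantity of interest.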