Sequences#
The Sequence object provides generic biological sequence manipulation functions, plus functions that are critical for the evolve module calculations.
Generic molecular types#
Sequence properties are affected by the moltype you specify. The default type for a sequence is "text".
In some circumstances you can also have a "bytes" moltype, which we construct explicitly here.
DNA and RNA sequences#
Creating a DNA sequence from a string#
Here we specify the DNA molecular type.
Creating an RNA sequence from a string#
Converting to FASTA format#
Convert an RNA sequence to FASTA format#
Writing a sequence to file#
Creating a named sequence#
Setting or changing the name of a sequence#
Complementing a DNA sequence#
Reverse complementing a DNA sequence#
Translate a sequence to protein#
The default is to trim a terminating stop if it exists. If you set trim_stop=False and there is a terminating stop, an AlphabetError is raised.
You can also specify the genetic code.
Translating a DNA sequence containing stop codons#
By default, get_translation() fails if the sequence contains any in-frame stop codons. You can allow translation in these cases by setting the optional argument include_stop=True.
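To make the interplay of the two stop-codon options concrete, here is a minimal plain-Python sketch of the behaviour described above. It is not the cogent3 implementation: the `translate` function and `STANDARD_CODE` table are hypothetical names, it raises `ValueError` where the library raises its own error types, and silently dropping a trailing incomplete codon is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons ordered over the bases T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna, include_stop=False, trim_stop=True):
    """Translate a DNA string in frame 1 (hypothetical helper, not the cogent3 API)."""
    usable = len(dna) - len(dna) % 3  # assumption: ignore a trailing incomplete codon
    aminos = [STANDARD_CODE[dna[i : i + 3]] for i in range(0, usable, 3)]
    # Drop a terminating stop codon when requested.
    if trim_stop and aminos and aminos[-1] == "*":
        aminos.pop()
    # Any remaining stop is an error unless the caller allows it.
    if "*" in aminos and not include_stop:
        raise ValueError("stop codon in translation")
    return "".join(aminos)


print(translate("ATGGCCTAA"))                     # terminal stop is trimmed
print(translate("ATGTAAGCC", include_stop=True))  # internal stop is kept as "*"
```

In this sketch, calling `translate("ATGGCCTAA", trim_stop=False)` raises, mirroring the case where a terminating stop is retained but stops are not allowed.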
Converting a DNA sequence to RNA#
Convert an RNA sequence to DNA#
Testing complementarity#
Joining two DNA sequences#
Getting all k-mers from a sequence#
Note
By default, any k-mer that contains an ambiguity code is excluded from the output.
You can include ALL k-mers by setting strict=False.
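The effect of the strict setting can be illustrated with a small plain-Python sketch. The `iter_kmers` generator below is a hypothetical stand-in written for this example, not the library's method, and it assumes that the canonical DNA characters are A, C, G and T.

```python
def iter_kmers(seq, k, strict=True, canonical="ACGT"):
    """Yield every overlapping k-mer of seq (hypothetical helper).

    With strict=True, k-mers containing a non-canonical character
    (for example the ambiguity code N) are skipped.
    """
    for start in range(len(seq) - k + 1):
        kmer = seq[start : start + k]
        if strict and any(char not in canonical for char in kmer):
            continue
        yield kmer


print(list(iter_kmers("ACNGT", k=2)))                # ambiguous k-mers excluded
print(list(iter_kmers("ACNGT", k=2, strict=False)))  # all k-mers included
```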
Slicing DNA sequences#
Obtaining the codons from a DnaSequence object#
Use the get_in_motif_size method with a motif size of 3.
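Conceptually, this splits the sequence into consecutive, non-overlapping words of length 3. A minimal plain-Python sketch of that idea follows. The `in_motif_size` function is a hypothetical name, and discarding a trailing incomplete codon is an assumption of this sketch rather than a statement about the library's behaviour.

```python
def in_motif_size(seq, motif_length=3):
    """Split seq into consecutive non-overlapping words (hypothetical helper)."""
    usable = len(seq) - len(seq) % motif_length  # assumption: drop incomplete tail
    return [seq[i : i + motif_length] for i in range(0, usable, motif_length)]


print(in_motif_size("ATGGCCTTA"))
```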
Getting 3rd positions from codons#
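As a sketch of the idea, third codon positions of an in-frame sequence sit at indices 2, 5, 8 and so on, so an extended slice with a step of 3 selects them. The example uses a plain Python string and assumes the sequence starts in frame.

```python
dna = "ATGGCCTTA"  # codons: ATG GCC TTA

# Start at index 2 (the third base of the first codon) and step by 3.
third_positions = dna[2::3]
print(third_positions)
```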
Getting 1st and 2nd positions from codons#
In this instance we can use features.
Return a randomised version of the sequence#
Remove gaps from a sequence#
Getting GC% by counting states#
We can obtain the percentage of GC nucleotides by using the counts method on sequences. This returns an object that behaves like a dictionary.
Note
Other arguments on the counts() method allow including ambiguous or gap characters in the result.
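The calculation itself is a simple ratio of state counts. The plain-Python sketch below uses `collections.Counter` as a stand-in for the dictionary-like counts object. The `gc_percent` function is a hypothetical name, and excluding gaps and ambiguity codes from the denominator is an assumption of this sketch.

```python
from collections import Counter


def gc_percent(seq):
    """Return the percentage of G and C among canonical bases (hypothetical helper)."""
    counts = Counter(seq.upper())
    # Assumption: only A, C, G and T contribute to the total.
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total


print(gc_percent("ATGCGGTA"))
```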
Counting k-mers#
A k-mer is a word of size \(k\). By convention, k-mer counts are derived from all possible (overlapping) positions. This is distinct from motif counts, which are calculated from non-overlapping words.
In this case, the result is a numpy array with the order of elements corresponding to the result of
Note
We support third-party plugins for k-mer counting. After installing one, select it by passing the package name to the use_hook argument, as in .count_kmers(k=2, use_hook="<package name>").
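The difference between overlapping k-mer counts and non-overlapping motif counts can be sketched in plain Python. The `count_words` function below is a hypothetical helper, not the library's method. It returns a plain list rather than a numpy array, and the lexicographic ordering of k-mers over A, C, G, T is an assumption made only for this example.

```python
from itertools import product


def count_words(seq, k, step=1, alphabet="ACGT"):
    """Count words of length k, sampled every `step` positions (hypothetical helper).

    step=1 counts overlapping k-mers; step=k counts non-overlapping motifs.
    """
    # Assumed ordering: lexicographic over the given alphabet.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for start in range(0, len(seq) - k + 1, step):
        position = index.get(seq[start : start + k])
        if position is not None:  # skip words with non-canonical characters
            counts[position] += 1
    return counts


overlapping = count_words("AAAACG", k=2)
non_overlapping = count_words("AAAACG", k=2, step=2)
print(sum(overlapping), sum(non_overlapping))
```

A sequence of length n yields n - k + 1 overlapping k-mers, whereas stepping by k yields at most n // k words, which is why the two totals differ.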