Sequences#

The Sequence object provides generic biological sequence manipulation functions, plus functions that are critical for the evolve module calculations.

Generic molecular types#

Sequence properties are affected by the moltype you specify. The default type for a sequence is "text".

In some circumstances you can also have a "bytes" moltype, which is constructed explicitly here.

DNA and RNA sequences#

Creating a DNA sequence from a string#

Here we specify the DNA molecular type.

Creating an RNA sequence from a string#

Converting to FASTA format#

Convert an RNA sequence to FASTA format#
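As a rough picture of what a FASTA record looks like, the following plain-Python sketch builds one by hand. It is not cogent3's implementation: the helper name `to_fasta` and the default wrap width of 60 characters are assumptions made for illustration only.

```python
# Plain-Python sketch of FASTA formatting; NOT cogent3's implementation.
# The function name and the default line width are illustrative assumptions.
def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence wrapped at `width`."""
    lines = [f">{name}"]
    lines.extend(seq[i : i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


print(to_fasta("seq1", "ACGUACGUACGU", width=8), end="")
```

The record always ends with a newline, so several records can be concatenated directly into one file.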

Writing a sequence to file#

Creating a named sequence#

Setting or changing the name of a sequence#

Complementing a DNA sequence#

Reverse complementing a DNA sequence#
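To make the difference between the two operations concrete, here is a minimal plain-Python sketch. It is an illustration of the idea, not the library's code, and it assumes unambiguous DNA characters only (anything else passes through unchanged).

```python
# Plain-Python illustration; NOT cogent3's implementation.
# Assumes only the four canonical DNA bases need complementing.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Swap each base for its Watson-Crick partner, keeping the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement the sequence, then read it in the opposite direction."""
    return complement(seq)[::-1]


print(complement("AGTACACTGGT"))          # TCATGTGACCA
print(reverse_complement("AGTACACTGGT"))  # ACCAGTGTACT
```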

Translate a sequence to protein#

The default is to trim a terminating stop if it exists. If you set trim_stop=False and there is a terminating stop, an AlphabetError is raised.

You can also specify the genetic code.

Translating a DNA sequence containing stop codons#

By default, get_translation() will fail if there are any in-frame stop codons in the sequence. You can allow translation in these cases by setting the optional argument include_stop=True.
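The interaction between a terminal stop, trim_stop and include_stop can be sketched in plain Python. This is a simplified model under stated assumptions, not the library's code: it handles only uppercase, unambiguous DNA, uses the standard genetic code, ignores any trailing incomplete codon, and raises a plain `ValueError` where the text above describes an AlphabetError.

```python
# Simplified model of the translation options; NOT cogent3's implementation.
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq: str, trim_stop: bool = True, include_stop: bool = False) -> str:
    """Translate uppercase DNA in frame 1, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i : i + 3] for i in range(0, usable, 3)]
    # Default behaviour: silently drop a terminating stop codon.
    if trim_stop and codons and CODON_TABLE[codons[-1]] == "*":
        codons = codons[:-1]
    protein = "".join(CODON_TABLE[c] for c in codons)
    # Any remaining stop is an error unless the caller opts in.
    if "*" in protein and not include_stop:
        raise ValueError("in-frame stop codon; pass include_stop=True to keep it")
    return protein


print(translate("ATGAAATGA"))                     # MK
print(translate("ATGTGAAAA", include_stop=True))  # M*K
```

In this model an internal stop always needs include_stop=True, while a terminal stop only becomes a problem once trim_stop=False.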

Converting a DNA sequence to RNA#

Convert an RNA sequence to DNA#
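Conceptually, both conversions amount to swapping T and U. The plain-Python sketch below shows only that idea; the helper names are hypothetical, and it does nothing about molecular-type bookkeeping, which a real sequence object would also need to update.

```python
# Plain-Python illustration of the T <-> U swap; NOT cogent3's implementation.
def dna_to_rna(seq: str) -> str:
    """Replace thymine with uracil, preserving case."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq: str) -> str:
    """Replace uracil with thymine, preserving case."""
    return seq.replace("U", "T").replace("u", "t")


print(dna_to_rna("ACGTACGTA"))  # ACGUACGUA
print(rna_to_dna("ACGUACGUA"))  # ACGTACGTA
```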

Testing complementarity#

Joining two DNA sequences#

Getting all k-mers from a sequence#

Note

By default, any k-mer that contains an ambiguity code is excluded from the output.

You can include ALL k-mers by setting strict=False.
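The effect of strict can be pictured with a small plain-Python generator. This is an illustrative sketch, not the library's code: the function name `iter_kmers` is used here only as a stand-in, and the set of canonical characters is assumed to be A, C, G and T.

```python
# Plain-Python illustration of strict vs non-strict k-mer iteration;
# NOT cogent3's implementation. Canonical characters are an assumption.
from typing import Iterator


def iter_kmers(seq: str, k: int, strict: bool = True, canonical: str = "ACGT") -> Iterator[str]:
    """Yield every overlapping k-mer; in strict mode skip k-mers with other characters."""
    allowed = set(canonical)
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if strict and not set(kmer) <= allowed:
            continue  # k-mer contains an ambiguity code such as N
        yield kmer


print(list(iter_kmers("ACNGT", k=2)))                # ['AC', 'GT']
print(list(iter_kmers("ACNGT", k=2, strict=False)))  # ['AC', 'CN', 'NG', 'GT']
```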

Slicing DNA sequences#

Obtaining the codons from a DnaSequence object#

Use the get_in_motif_size() method.
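Splitting into codons means cutting the sequence into consecutive, non-overlapping words of length 3. The plain-Python sketch below shows that idea only; it is not the library's code, and dropping a trailing incomplete word is an assumption of this sketch.

```python
# Plain-Python illustration of non-overlapping motifs; NOT cogent3's implementation.
# Assumption: a trailing fragment shorter than motif_length is discarded.
def in_motif_size(seq: str, motif_length: int = 3) -> list[str]:
    """Split seq into consecutive, non-overlapping words of motif_length."""
    usable = len(seq) - len(seq) % motif_length
    return [seq[i : i + motif_length] for i in range(0, usable, motif_length)]


print(in_motif_size("ATGCACTGGTAA"))  # ['ATG', 'CAC', 'TGG', 'TAA']
```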

Getting 3rd positions from codons#

Getting 1st and 2nd positions from codons#

In this instance we can use features.
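For intuition about which characters each selection keeps, the two cases can be written with plain string slicing. This sketch works on ordinary Python strings and does not use the feature mechanism mentioned above; the helper names are hypothetical.

```python
# Plain-Python illustration of codon-position selection on a str;
# NOT cogent3's implementation and not based on sequence features.
def third_positions(seq: str) -> str:
    """Keep every third character, starting at index 2."""
    return seq[2::3]


def first_and_second_positions(seq: str) -> str:
    """Keep the first two characters of each consecutive triplet."""
    return "".join(seq[i : i + 2] for i in range(0, len(seq), 3))


seq = "ATGCACTGGTAA"
print(third_positions(seq))             # GCGA
print(first_and_second_positions(seq))  # ATCATGTA
```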

Return a randomised version of the sequence#

Remove gaps from a sequence#

Getting GC% by counting states#

We can obtain the percentage of GC nucleotides by using the counts method on sequences. This returns an object that behaves like a dictionary.

Note

Other arguments on the counts() method allow including ambiguous or gap characters in the result.
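The arithmetic behind GC% can be sketched with `collections.Counter`, which also behaves like a dictionary. This is plain Python for illustration, not the library's counts() method; excluding every character other than A, C, G and T is an assumption of the sketch.

```python
# Plain-Python illustration of GC% from state counts; NOT cogent3's counts().
# Assumption: ambiguity codes and gaps are excluded from the total.
from collections import Counter


def gc_percent(seq: str, canonical: str = "ACGT") -> float:
    """Return 100 * (G + C) / (A + C + G + T), or 0.0 for no canonical bases."""
    counts = Counter(c for c in seq.upper() if c in canonical)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total


print(gc_percent("GGCCAATT"))  # 50.0
print(gc_percent("GCN-"))      # 100.0
```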

Counting k-mers#

A k-mer is a word of size \(k\) and, as is the convention, its counts are derived from all possible positions (as distinct from non-overlapping words, which is how motif counts are calculated).
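The distinction between overlapping k-mer counts and non-overlapping motif counts is easy to see on a tiny example. The sketch below is plain Python with hypothetical helper names; it returns dictionaries rather than the array form the library produces.

```python
# Plain-Python illustration of overlapping vs non-overlapping word counts;
# NOT cogent3's implementation.
from collections import Counter


def count_overlapping(seq: str, k: int) -> Counter:
    """Count words starting at every position (step of 1)."""
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


def count_non_overlapping(seq: str, k: int) -> Counter:
    """Count words starting at positions 0, k, 2k, ... (step of k)."""
    return Counter(seq[i : i + k] for i in range(0, len(seq) - k + 1, k))


print(count_overlapping("AAAA", k=2))      # Counter({'AA': 3})
print(count_non_overlapping("AAAA", k=2))  # Counter({'AA': 2})
```

A sequence of length n contributes n - k + 1 overlapping words but only n // k non-overlapping ones.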

In this case, the result is a numpy array with the order of elements corresponding to the result of

Note

We support third-party plugins for k-mer counting. After installing one, select it by passing the package name to the use_hook argument, e.g. .count_kmers(k=2, use_hook="<package name>").