cusp calculates a codon usage table for one or more nucleotide coding sequences and writes the table to file.
The codon usage table gives for each codon: i. Sequence of the codon. ii. The encoded amino acid. iii. The proportion of usage of the codon among its redundant set, i.e. the set of codons which code for this codon's amino acid. iv. The expected number of codons, given the input sequence(s), per 1000 bases. v. The observed number of codons in the input sequences.
The example shown reads a single CDS from Pseudomonas aeruginosa, which has a very high GC content and a strong coding bias, as shown by the codons for alanine, where those ending in G or C are used almost exclusively.
The columns are as follows: i. "Codon" (sequence of the codon). ii. "AA" (the encoded amino acid). iii. "Fraction" (the proportion of usage of the codon among its redundant set, i.e. the set of codons which code for this codon's amino acid). iv. "Frequency" (the expected number of codons, given the input sequence(s), per 1000 bases). This will be an extrapolation if the sequence is shorter than 1000 bases. v. "Number" (the observed number of codons in the input sequences).
If multiple sequences are input, the statistics are given for all of the sequences together, not individually.
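The way the columns relate to one another can be illustrated with a minimal Python sketch. This is not cusp's own code: the function name codon_usage, the use of the standard genetic code, and the choice to read each sequence in frame from its first base are all assumptions made for illustration. In particular, "Frequency" is scaled here to the total number of codons counted, which is an assumption about how the normalisation is done; change the denominator if a per-base scaling is wanted. Counts from all input sequences are pooled before any statistic is computed.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(sequences):
    """Return (codon, aa, fraction, frequency, number) rows for pooled sequences.

    Hypothetical helper: each sequence is assumed to be an in-frame CDS.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        # Step through complete codons only; a trailing partial codon is dropped.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TO_AA:  # skip codons with ambiguous bases
                counts[codon] += 1

    total = sum(counts.values())

    # Total count for each redundant set (all codons for the same amino acid).
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TO_AA[codon]] += n

    table = []
    for codon in sorted(CODON_TO_AA):
        aa = CODON_TO_AA[codon]
        n = counts[codon]
        fraction = n / aa_totals[aa] if aa_totals[aa] else 0.0
        # Assumption: scaled per 1000 codons counted.
        frequency = 1000.0 * n / total if total else 0.0
        table.append((codon, aa, fraction, frequency, n))
    return table


if __name__ == "__main__":
    # Two short made-up sequences, pooled into one table.
    for codon, aa, fraction, frequency, n in codon_usage(["ATGGCGGCC", "GCGTAA"]):
        if n:
            print(f"{codon}  {aa}  {fraction:.3f}  {frequency:8.3f}  {n}")
```

Because the counts are pooled first, a codon that is rare in one sequence but common in another contributes its combined count to "Number", and "Fraction" is computed against the combined total for its amino acid.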
cusp reads a codon usage file as a template for its output file. The data in this table are ignored entirely. This functionality is hard-coded and invisible to the user.