chips

Function

Description

chips calculates Frank Wright's Nc statistic for a nucleotide sequence. This is the "effective number of codons used in a gene sequence" (ref 1), and is a simple measure of synonymous codon usage bias. Nc quantifies how far the codon usage of a gene departs from equal usage of synonymous codons.

Nc is easily calculated from codon usage data alone and is independent of gene length and amino acid composition. Nc can take values from 20, in the case of extreme bias where one codon is exclusively used for each amino acid, to 61 when the use of alternative synonymous codons is equally likely. Nc thus provides an intuitively meaningful measure of the extent of codon preference in a gene. Low values therefore indicate a strong codon bias, and high values indicate a low bias (and possibly a non-coding region).
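
As an illustration of how Nc follows from codon counts, here is a minimal Python sketch of the calculation described in ref 1. It is not the chips source code. The function name, the use of the standard genetic code, and the handling of edge cases (skipping amino acids seen fewer than twice or with a non-positive homozygosity, estimating the isoleucine class from the two-fold and four-fold classes when it is missing, capping the result at 61) are assumptions made for this example.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons (assumed table).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def effective_number_of_codons(seq):
    """Return a Wright-style Nc for an in-frame coding sequence (sketch)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group the sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Homozygosity F for each amino acid, collected by degeneracy class.
    f_by_class = defaultdict(list)
    for codons in families.values():
        k = len(codons)                     # number of synonymous codons
        n = sum(counts[c] for c in codons)  # occurrences of this amino acid
        if k == 1 or n < 2:
            continue                        # Met, Trp, or too few observations
        s = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * s - 1) / (n - 1)
        if f > 0:                           # assumption: skip unusable values
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Isoleucine is the only three-fold amino acid; estimate it if absent.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("sequence too short or too sparse to estimate Nc")

    # 2 single-codon amino acids, 9 two-fold, 1 three-fold, 5 four-fold, 3 six-fold.
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)
```

A sequence that uses a single codon for every amino acid gives 20 in this sketch, and one that uses all synonymous codons equally reaches the cap of 61. Very short sequences tend to raise the error, which reflects the missing amino acid problem.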

Usage

Command line arguments


Input file format

A nucleic acid sequence USA.

Output file format

If all synonymous codons are used equally, the Nc value will be 61. If only one codon is used for each amino acid, the Nc value will be 20. Low values therefore indicate a strong codon bias, and high values indicate a low bias (and possibly a non-coding region).

Data files

chips reads the codon usage file "CODONS/Ehum.cut" from the EMBOSS data directory. It uses the file as a template only and ignores the data itself.

Notes

This calculation was originally in the EGCG package as "codfish" (codon usage for fission yeast). As Frank Wright is a vegan, we looked for a meat-free name for the EMBOSS version, "chips". The official explanation is "Codon Heterozygosity (Inverse of) in a Protein-coding Sequence".

References

  1. Wright, F. (1990) Gene 87:23-29 "The 'effective number of codons' used in a gene."

Warnings

The Nc statistic has problems with very short sequences (20 amino acids or fewer), which are yet to be fully resolved. They are caused by the need to account for amino acids that are missing from the sequence.

chips analyses protein-coding regions only. If the supplied sequence extends beyond the coding region, the start and/or end positions of the CDS must be specified with the -sbegin and -send qualifiers, which are built in for all sequence types.
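
The effect of restricting the analysis to the CDS can be pictured with a short Python sketch. It assumes that the start and end positions are 1-based and inclusive; the function name and the example sequence are hypothetical and are not part of chips.

```python
def cds_region(seq, sbegin, send):
    """Return the part of seq from position sbegin to send (1-based, inclusive)."""
    if not 1 <= sbegin <= send <= len(seq):
        raise ValueError("positions must satisfy 1 <= sbegin <= send <= len(seq)")
    return seq[sbegin - 1:send]


# Hypothetical example: three flanking bases on each side of a 9-base CDS.
region = cds_region("GGGATGAAATAGCCC", 4, 12)
```

Only the selected region should be read as codons; including the flanking bases would shift the reading frame and distort the codon counts.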

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

Author(s)

History

Target users

Comments