cai calculates the Codon Adaptation Index (CAI) for a given nucleotide sequence, given a reference codon usage table. The CAI is a simple, effective measure of synonymous codon usage bias. The index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
The CAI uses a reference set of highly expressed genes from a species to assess the relative merits of each codon. A score for a gene sequence is then calculated from the frequency of use of all codons in that sequence.
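The calculation can be illustrated with a minimal Python sketch. It assumes the commonly cited formulation of the index, in which each codon receives a relative adaptiveness weight (its reference count divided by the count of the most used synonymous codon) and the score is the geometric mean of those weights over the gene. This is not the cai source code: the function names and the two-amino-acid reference counts are hypothetical and made up for illustration.

```python
import math

# Hypothetical toy reference counts (illustration only, not real data):
# codon counts per amino acid from an imagined highly expressed gene set.
REFERENCE_COUNTS = {
    "K": {"AAA": 20, "AAG": 80},
    "E": {"GAA": 90, "GAG": 10},
}


def relative_adaptiveness(reference_counts):
    """Map each codon to w = count / highest count among its synonyms."""
    weights = {}
    for codons in reference_counts.values():
        best = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / best
    return weights


def cai_score(sequence, weights):
    """Geometric mean of the weights of the codons in `sequence`.

    Assumption of this sketch: codons without a weight, or with a zero
    weight, are skipped; other implementations may treat them differently.
    """
    seq = sequence.upper()
    logs = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        w = weights.get(seq[i:i + 3])
        if w:  # skips both missing (None) and zero weights
            logs.append(math.log(w))
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(REFERENCE_COUNTS)
preferred = cai_score("AAGGAA", weights)  # both codons are the preferred ones
mixed = cai_score("AAAGAA", weights)      # one rarely used codon lowers the score
```

A gene built only from the preferred codons of the reference set scores close to 1, and each less favoured codon pulls the score towards 0.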
cai requires a reference codon usage table prepared from a set of genes which are known to be highly expressed. The table is specified with the -cfile option and must exist in the EMBOSS data directory. The default codon usage table Eyeastcai.cut is the standard set of Saccharomyces cerevisiae highly expressed gene codon frequencies. Another table (Eschpo_cai.cut) was prepared from a set of Schizosaccharomyces pombe genes by Peter Rice for the S. pombe sequencing team at the Sanger Centre, and is available in the EMBOSS data directory. You should prepare your own codon usage table for your organism of interest.
Codons are nucleotide triplets that each encode an amino acid residue in a polypeptide chain. There are four possible nucleotides in DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). There are therefore 64 possible triplets to encode the 20 amino acids plus the translation termination signal. The encoding is thus redundant, with all but two amino acids coded for by more than one triplet. Organisms often have a particular preference for one of the possible codons for a given amino acid.
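The redundancy described above can be checked with a short Python sketch. It assumes the standard genetic code, written here as the usual compact 64-letter string with the bases in the order T, C, A, G and '*' marking termination; the variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in compact form. Codons are enumerated with the
# third base varying fastest, bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Pair each of the 64 triplets with its amino acid (or '*' for stop).
codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the codons by the residue they encode.
synonymous = defaultdict(list)
for codon, aa in codon_table.items():
    synonymous[aa].append(codon)

# Amino acids encoded by exactly one codon.
single_codon = sorted(aa for aa, codons in synonymous.items() if len(codons) == 1)

print(len(codon_table))  # 64
print(single_codon)      # ['M', 'W']
```

Methionine (M) and tryptophan (W) are the two amino acids with a single codon; every other amino acid has at least two synonymous codons, which is what makes codon preference possible.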
Codon preferences reflect a balance between mutational bias and selection for efficiency of translation. In fast-growing microorganisms there are optimal codons that reflect the composition of the genomic tRNA pool and probably help achieve faster translation rates and higher accuracy. Such selection is expected to be strong in highly expressed genes, as is the case for Escherichia coli or Saccharomyces cerevisiae. In contrast, codon usage optimization is normally absent in organisms with slower growth rates such as Homo sapiens (human), where codon preferences are determined by mutational biases characteristic of a particular genome.
Various factors are thought to influence codon usage bias in bacteria, including gene expression level (as noted above), %G+C composition (reflecting horizontal gene transfer or mutational bias), GC skew (reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, and optimal growth temperature.
Various methods have been used to analyze codon usage bias. CAI and methods such as the 'frequency of optimal codons' (Fop) are commonly used to predict gene expression levels. Others such as the 'effective number of codons' (Nc) and Shannon entropy are used to measure codon usage evenness, whereas multivariate statistical methods, including correspondence analysis and principal component analysis, may be used to analyze variations in codon usage between genes.
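As one illustration of an evenness measure, the following Python sketch computes the Shannon entropy of codon usage within a single synonymous family. It is a simplified example under stated assumptions, not a published method or part of cai: the function name and the choice to score one amino acid at a time are hypothetical, and the glycine codon set is taken from the standard genetic code.

```python
import math
from collections import Counter

# The four glycine codons of the standard genetic code.
GLYCINE_CODONS = {"GGT", "GGC", "GGA", "GGG"}


def codon_entropy(sequence, synonymous_codons):
    """Shannon entropy (in bits) of usage among the given synonymous codons.

    The sequence is read in frame from its first base; codons outside
    `synonymous_codons` are ignored.
    """
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    counts = Counter(c for c in codons if c in synonymous_codons)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("none of the requested codons occur in the sequence")
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


even = codon_entropy("GGTGGCGGAGGG", GLYCINE_CODONS)  # all four codons used equally
biased = codon_entropy("GGTGGTGGT", GLYCINE_CODONS)   # a single codon used throughout
```

With four synonymous codons the entropy ranges from 0 bits (one codon used exclusively) to 2 bits (all four used equally), so lower values indicate stronger bias.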