Given the name of a directory containing the CUTG database, cutgextract will calculate codon usage tables for individual species (e.g. EHomo_sapiens.cut) and place them in the CODONS subdirectory of the EMBOSS data directory. This is an all-or-nothing extraction; it will create many files and take several minutes. The usage tables are calculated from the sum of codons over all sequences for each organism.
cutgextract looks in the specified directory and opens all the files with the extension '.codon'. These are all expected to be CUTG data files. It then parses out the codon usage data and writes one file per species into the EMBOSS data/CODONS directory. The names of the files are derived from the species names in the CUTG files, so the file names are long but descriptive.
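The per-species summation described above can be sketched in Python. This is an illustration only, not the cutgextract source. It assumes a simplified record layout: a header line starting with '>' whose fields are separated by backslashes, followed by one line of 64 integer codon counts. The function names and the species_field parameter (the position of the species name in the header) are hypothetical and should be checked against the real CUTG files.

```python
from collections import defaultdict
from pathlib import Path

N_CODONS = 64  # one count per codon


def sum_codon_counts(directory, species_field=5):
    """Sum codon counts per species over every *.codon file in directory.

    ASSUMPTION: each record is a '>' header line with backslash-separated
    fields, followed by a line of 64 integers. species_field is a
    hypothetical index for the species name within the header.
    """
    totals = defaultdict(lambda: [0] * N_CODONS)
    for path in sorted(Path(directory).glob("*.codon")):
        header = None
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    header = line[1:].split("\\")
                    continue
                if header is None or len(header) <= species_field:
                    continue  # counts line without a usable header
                try:
                    counts = [int(value) for value in line.split()]
                except ValueError:
                    counts = []
                if len(counts) == N_CODONS:
                    row = totals[header[species_field]]
                    for i, count in enumerate(counts):
                        row[i] += count
                header = None  # each header owns exactly one counts line
    return dict(totals)


def table_file_name(species):
    """Derive a file name in the style of EHomo_sapiens.cut.

    ASSUMPTION: only spaces are replaced; the real program may also
    sanitise other characters.
    """
    return "E" + species.replace(" ", "_") + ".cut"
```

Calling sum_codon_counts on a directory of uncompressed '.codon' files returns a dictionary that maps each species name to its 64 summed counts. Malformed records are skipped rather than reported, which keeps the sketch short but would hide problems in real data.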
cutgextract writes a set of EMBOSS codon usage data files to the EMBOSS data/CODONS data directory.
The EMBOSS distribution includes a set of codon usage tables calculated from the files listed in ftp://ftp.ebi.ac.uk/pub/databases/codonusage/README, with a few additions whose exact derivation cannot easily be determined. Many people would prefer to create their own tables from the public CUTG data. The CUTG database can be downloaded from ftp://ftp.ebi.ac.uk/pub/databases/cutg.
If you run cutgextract on the CUTG database from ftp://ftp.ebi.ac.uk/pub/databases/cutg, all of the *.codon files included in the database will be processed. If the *.codon files are distributed in compressed form, you will need to uncompress them before running cutgextract on them.
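If the downloaded files need uncompressing, a short script can do it in one pass. The sketch below assumes the files are gzip-compressed and named '*.codon.gz'. Both points are assumptions, so check the actual file names and compression method in your download first. The function name gunzip_codon_files is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_codon_files(directory):
    """Write an uncompressed copy of every *.codon.gz file in directory.

    ASSUMPTION: the files are gzip-compressed and end in '.codon.gz'.
    The compressed originals are left in place.
    """
    written = []
    for src in sorted(Path(directory).glob("*.codon.gz")):
        dst = src.with_suffix("")  # drop the trailing '.gz'
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

The function returns the paths of the '.codon' files it wrote, which can then be checked before pointing cutgextract at the directory.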
cutgextract would normally be used once when the EMBOSS package is installed, and again whenever a new version of the CUTG database is released.
CUTG has a drawback: it has a single table for each organism and makes no distinction between different gene populations.