cons

Function

Description

cons calculates a consensus sequence from a multiple sequence alignment. The sequence weights and a scoring matrix are used to calculate a score for each amino acid residue or nucleotide at each position in the alignment. The highest scoring residue goes into the consensus sequence if its weighted number of positive matches (see "Algorithm") is higher than a user-specified "plurality" value; otherwise there is no consensus at that position.

Algorithm

To obtain the consensus, the sequence weights and a scoring matrix are used to calculate a score at each position in the alignment as follows. The residue (or nucleotide) i in an alignment column is compared to all other residues j in the same column. The score for i is the sum, over all residues j with j not equal to i, of score(ij)*weight(j), where score(ij) is taken from a nucleotide or protein scoring matrix (see the -datafile qualifier) and weight(j) is the weight of sequence j, as given in the alignment file.
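The per-column scoring described above can be sketched in Python. This is a minimal illustration and not the cons source code: the function names column_scores and toy_score are hypothetical, and the toy matrix (+1 for a match, -1 for a mismatch) stands in for a real EMBOSS scoring matrix. Gap characters are not treated specially here.

```python
def toy_score(a, b):
    # Hypothetical stand-in for a scoring matrix lookup:
    # +1 for identical residues, -1 otherwise.
    return 1 if a == b else -1


def column_scores(column, weights, score=toy_score):
    """Score each residue i in one alignment column.

    column  -- the residues in the column, one per sequence
    weights -- the sequence weights, in the same order
    Returns a list where entry i is the sum over j != i of
    score(column[i], column[j]) * weights[j].
    """
    n = len(column)
    return [
        sum(score(column[i], column[j]) * weights[j]
            for j in range(n) if j != i)
        for i in range(n)
    ]


# Four sequences with equal weights; the third carries a different residue.
print(column_scores("AAGA", [1.0, 1.0, 1.0, 1.0]))
```

With these toy inputs each A is supported by two other A residues and penalised by one G, while the G is penalised by all three A residues.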

The highest scoring type of residue is then found in the column. If the number of "positive matches" (see below) for this residue is greater than the "plurality value" (see below), then this residue is the consensus residue. Otherwise there is no consensus for that position and an 'n' (nucleotide sequence alignment) or an 'x' (protein sequence alignment) character is written to the consensus sequence.

The positive matches for a residue i are calculated as being the sum of the corresponding sequence weights for all the residues that increase the score of residue i (i.e. that have a positive score). The "plurality" qualifier sets the cut-off for the number of positive matches (weighted) below which there is no consensus.
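The decision for a single column might look like the following sketch. It is one reading of the description above, not the cons implementation. The names consensus_for_column and toy_score are hypothetical, the toy matrix replaces a real scoring matrix, and two points the text leaves open are settled by assumption: ties for the highest score go to the first sequence, and a count exactly equal to the plurality value gives no consensus.

```python
def toy_score(a, b):
    # Hypothetical stand-in for a scoring matrix lookup.
    return 1 if a == b else -1


def consensus_for_column(column, weights, plurality,
                         score=toy_score, unknown="x"):
    """Return the consensus character for one alignment column.

    unknown is the character written when there is no consensus
    ('x' for protein, 'n' for nucleotide alignments).
    """
    n = len(column)
    # Score of residue i: sum over j != i of score(i, j) * weight(j).
    scores = [
        sum(score(column[i], column[j]) * weights[j]
            for j in range(n) if j != i)
        for i in range(n)
    ]
    # Highest scoring residue (assumption: first one wins a tie).
    best = max(range(n), key=lambda i: scores[i])
    # Positive matches: summed weights of the other residues whose
    # matrix score against the best residue is positive.
    positive = sum(
        weights[j] for j in range(n)
        if j != best and score(column[best], column[j]) > 0
    )
    # Assumption: strictly greater than plurality is required.
    return column[best] if positive > plurality else unknown


weights = [1.0, 1.0, 1.0, 1.0]
print(consensus_for_column("AAGA", weights, plurality=1.5))
print(consensus_for_column("AAGA", weights, plurality=3.0))
```

In the first call the two other A residues give a positive-match total of 2.0, which exceeds 1.5, so A is reported. In the second call the same total falls short of 3.0 and the unknown character is written instead.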

Usage

Command line arguments


Input file format

The USA (Uniform Sequence Address) of a set of aligned sequences.

Output file format

The output consists of a sequence file holding the consensus sequence.

Data files

cons uses the standard set of scoring matrix data files in the EMBOSS data directory.

Notes

The "identity" qualifier provides an additional constraint to "plurality" when determining a consensus residue at an alignment site. "identity" sets the number of identities required at a site for it to be included in the consensus. If, for example, it is set to the number of sequences in the alignment, then only a site with the same residue in all sequences is included in the consensus.

The "setcase" qualifier sets the threshold for the positive matches above which the consensus residue is given in upper-case and below which it is given in lower-case.
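The two qualifiers can be illustrated with a short Python sketch. The helper names passes_identity and apply_setcase are hypothetical and this is not the cons source code. It assumes that identities are counted as an unweighted number of sequences and that a positive-match total exactly equal to the "setcase" threshold gives lower-case; the text above does not settle either point.

```python
def passes_identity(column, residue, identity):
    # Assumption: identities are a plain count of the sequences
    # that carry exactly this residue in the column.
    return sum(1 for r in column if r == residue) >= identity


def apply_setcase(residue, positive_matches, setcase):
    # Upper-case above the threshold, lower-case otherwise
    # (assumption: equality counts as "below").
    if positive_matches > setcase:
        return residue.upper()
    return residue.lower()


# identity equal to the number of sequences: only fully
# conserved columns qualify.
print(passes_identity("AAAA", "A", 4))
print(passes_identity("AAGA", "A", 4))

# A well supported residue is upper-case, a weakly supported one lower-case.
print(apply_setcase("A", positive_matches=2.0, setcase=1.0))
print(apply_setcase("A", positive_matches=0.5, setcase=1.0))
```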

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

History

Target users

Comments