EMBOSS: degapseq

degapseq

Function

Description

degapseq reads one or more sequences and writes them out again, stripped of any non-alphabetic characters. Its main purpose is to remove gap characters from aligned sequences, but it also removes other symbols, such as the translation STOP symbol ('*') in a protein sequence.
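The behaviour described above can be illustrated with a minimal Python sketch. This is not the EMBOSS implementation; the function name `degap` is hypothetical, and the sketch assumes that "alphabetic" means the ASCII letters A-Z and a-z.

```python
def degap(sequence: str) -> str:
    """Return the sequence with every non-alphabetic character removed.

    Assumption: only ASCII letters count as sequence characters, so gap
    symbols ('.', '-', '~'), the STOP symbol ('*'), digits and whitespace
    are all dropped.
    """
    return "".join(ch for ch in sequence if ch.isascii() and ch.isalpha())


if __name__ == "__main__":
    # A gapped nucleotide fragment and a protein ending in a STOP symbol.
    print(degap("AC-GT..A~C"))
    print(degap("MKV-LT*"))
```

Because the filter keeps letters rather than listing characters to drop, the same sketch removes gaps and the '*' symbol without treating them as separate cases.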

Usage

Command line arguments

Input file format

Any valid input sequence USA (Uniform Sequence Address) is allowed.

The input sequence can be nucleic or protein.

The input sequence can be gapped or ungapped.

Output file format

The output is a sequence with no gaps.

Data files

None.

Notes

There are many different formats for storing molecular sequences in files. Some formats are specifically for aligned sequences, where gaps are inserted into the sequences for purposes of alignment. Gaps are indicated with different characters depending on the format in question, but commonly include '.', '-' and '~'. Some formats use more than one character to distinguish different types of gap: for example, gaps at the sequence ends, internal gaps, gaps inserted by a program, and gaps inserted manually by a person editing the alignment may each be denoted by a different character.

EMBOSS uses only the dash character ('-') to indicate gaps. When an EMBOSS program reads a sequence with gaps, all gap characters are changed internally to a dash ('-'). Thus any characters that distinguish different gap types are converted to '-' on output.
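The gap normalisation described above can be sketched in Python as follows. This is an illustration only, not EMBOSS code: the names `GAP_CHARS` and `normalise_gaps` are hypothetical, and the set of gap characters is an assumption based on the three symbols mentioned in these notes; the set EMBOSS actually recognises may differ.

```python
# Assumed gap characters, taken from the three symbols named in the notes.
GAP_CHARS = ".~-"

# Translation table mapping each assumed gap character to a dash.
_TO_DASH = str.maketrans({ch: "-" for ch in GAP_CHARS})


def normalise_gaps(sequence: str) -> str:
    """Replace every assumed gap character with '-', leaving the rest as is."""
    return sequence.translate(_TO_DASH)


if __name__ == "__main__":
    # End gaps written as '.', an internal gap as '~', and one already as '-'.
    print(normalise_gaps("..ACG~~T-A"))
```

After this step the distinction between gap types is lost, which is why a program that reads an alignment and writes it back out reports every gap as '-'.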

References

None.

Warnings

degapseq removes '*' characters from protein sequences as well as gap characters.
