degapseq
degapseq reads one or more sequences and writes them out again, stripped of any non-alphabetic characters. Its main purpose is to remove gap characters from aligned sequences, but it also removes other non-alphabetic symbols, such as the translation STOP symbol ('*') in a protein sequence.
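The behaviour described above can be illustrated with a minimal Python sketch. This is not the EMBOSS implementation: the helper name `strip_non_alphabetic` is hypothetical, and the sketch assumes that residues are plain ASCII letters.

```python
import string

# Hypothetical helper illustrating the behaviour described above;
# this is not EMBOSS source code.
_LETTERS = set(string.ascii_letters)


def strip_non_alphabetic(sequence: str) -> str:
    """Return the sequence with every character that is not an ASCII letter removed."""
    return "".join(ch for ch in sequence if ch in _LETTERS)


if __name__ == "__main__":
    # Gap characters in a nucleotide sequence are dropped.
    print(strip_non_alphabetic("ACGT....ACGT"))
    # A gap and the translation STOP symbol in a protein sequence are dropped.
    print(strip_non_alphabetic("MKV-LA*"))
```

Because the filter keeps letters rather than listing the characters to delete, any gap symbol is removed without the sketch needing to know which alignment format the sequence came from.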
% degapseq dnagap.fasta nogaps.seq
Removes non-alphabetic (e.g. gap) characters from sequences
Standard (Mandatory) qualifiers:
[-sequence] seqall (Gapped) sequence(s) filename and optional format, or reference (input USA)
[-outseq] seqoutall [
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | (Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
The input sequence can be nucleic or protein.
The input sequence can be gapped or ungapped.
Input file (dnagap.fasta):
>FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGT....ACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGT

Output file (nogaps.seq):
>FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
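The transformation between the example input and output above can be sketched in Python as follows. This is an illustration only, not how degapseq is implemented. The function names `read_fasta` and `degap_fasta` are hypothetical, the parser handles only simple FASTA text, and the 60-character line width is an assumption chosen to match the example output.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from simple FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def degap_fasta(text: str, width: int = 60) -> str:
    """Remove non-alphabetic characters from each record and rewrap the lines.

    The default width of 60 is an assumption based on the example output.
    """
    out: List[str] = []
    for header, seq in read_fasta(text):
        clean = "".join(ch for ch in seq if ch.isalpha())
        out.append(">" + header)
        out.extend(clean[i:i + width] for i in range(0, len(clean), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    gapped = (
        ">FASTA F10002 FASTA FORMAT DNA SEQUENCE\n"
        "ACGT....ACGTACGTACGTACGTACGTACGTACGTACGT\n"
        "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT\n"
        "ACGTACGTACGTACGTACGT\n"
    )
    print(degap_fasta(gapped), end="")
```

Note that the sequence is rewrapped after the gaps are removed, so line breaks in the output do not correspond to line breaks in the input.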
There are many different formats for storing molecular sequences in files. Some formats are specifically for aligned sequences, where gaps are inserted into the sequences for the purposes of alignment. The characters used to indicate gaps depend on the format, but commonly include '.', '-' and '~'. Some formats use more than one character to distinguish different types of gaps: for example, gaps at the sequence ends, internal gaps, gaps inserted by a program, and gaps inserted manually by a person editing the alignment may each be denoted by a different character.
EMBOSS uses only the dash character ('-') to indicate gaps. When an EMBOSS program reads a sequence with gaps, all gap characters are changed internally to a dash ('-'). Thus any characters that distinguish different gap types are converted to '-' on output.
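The gap normalisation described above can be sketched in Python. This is a hedged illustration rather than EMBOSS code: the name `normalise_gaps` is hypothetical, and the set of gap characters is limited to the three symbols mentioned in the text, whereas real formats may use others.

```python
# Gap symbols named in the text above; this set is an assumption,
# and individual file formats may use additional characters.
GAP_CHARACTERS = ".~-"

# Translation table mapping every listed gap symbol to a dash.
_TO_DASH = str.maketrans({ch: "-" for ch in GAP_CHARACTERS})


def normalise_gaps(sequence: str) -> str:
    """Replace each recognised gap character with '-' and leave residues unchanged."""
    return sequence.translate(_TO_DASH)


if __name__ == "__main__":
    # Distinct gap symbols in the input all become '-' in the result.
    print(normalise_gaps("~~ACGT..ACGT-ACGT~~"))
```

After this step the sequence length is unchanged, but the distinction between gap types is lost, which is the consequence noted in the paragraph above.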
Program name | Description |
---|---|
aligncopy | Reads and writes alignments |
aligncopypair | Reads and writes pairs from alignments |
biosed | Replace or delete sequence sections |
codcopy | Copy and reformat a codon usage table |
cutseq | Removes a section from a sequence |
descseq | Alter the name or description of a sequence |
entret | Retrieves sequence entries from flatfile databases and files |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from sequence(s) |
extractseq | Extract regions from a sequence |
featcopy | Reads and writes a feature table |
featreport | Reads and writes a feature table |
listor | Write a list file of the logical OR of two sets of sequences |
makenucseq | Create random nucleotide sequences |
makeprotseq | Create random protein sequences |
maskambignuc | Masks all ambiguity characters in nucleotide sequences with N |
maskambigprot | Masks all ambiguity characters in protein sequences with X |
maskfeat | Write a sequence with masked features |
maskseq | Write a sequence with masked regions |
newseq | Create a sequence file from a typed-in sequence |
nohtml | Remove mark-up (e.g. HTML tags) from an ASCII text file |
noreturn | Remove carriage return from ASCII files |
nospace | Remove all whitespace from an ASCII text file |
notab | Replace tabs with spaces in an ASCII text file |
notseq | Write to file a subset of an input stream of sequences |
nthseq | Write to file a single sequence from an input stream of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a nucleotide sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads sequences and writes them to individual files |
sizeseq | Sort sequences by size |
skipredundant | Remove redundant sequences from an input set |
skipseq | Reads and writes (returns) sequences, skipping first few |
splitter | Split sequence(s) into smaller sequences |
trimest | Remove poly-A tails from nucleotide sequences |
trimseq | Remove unwanted characters from start and end of sequence(s) |
trimspace | Remove extra whitespace from an ASCII text file |
union | Concatenate multiple sequences into a single sequence |
vectorstrip | Removes vectors from the ends of nucleotide sequence(s) |
yank | Add a sequence reference (a full USA) to a list file |