seqretsplit

Function

Description

seqretsplit is a variant of seqret, the standard program for reading and writing sequences. It performs exactly the same function, except that when it reads more than one sequence it writes each sequence to an individual file. In all other respects, seqretsplit is the same as seqret. Its main use is therefore to split a file containing multiple sequences into many files, each containing one sequence. EMBOSS has many built-in options for detailed specification of the input and output sequences, for example the sequence type and file format. Optionally, feature information is read and written.
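As a rough illustration of the splitting behaviour described above, the following Python sketch divides FASTA-formatted text into one record per sequence and writes each record to its own file. It is not how seqretsplit is implemented. The function name split_fasta, the lower-cased file names, the ".fasta" extension and the "unnamed" fallback are all assumptions made for this example, and only FASTA input is handled.

```python
from pathlib import Path


def split_fasta(text, out_dir):
    """Write each FASTA record in `text` to its own file under `out_dir`.

    Hypothetical helper for illustration only; not part of EMBOSS.
    Returns the paths written, in input order.
    """
    out_dir = Path(out_dir)
    written = []
    record = []  # lines of the record currently being collected

    def flush():
        if not record:
            return
        # Take the ID to be the first word after '>' on the header line.
        words = record[0][1:].split()
        seq_id = words[0] if words else "unnamed"
        # Assumed naming: lower-cased ID plus a ".fasta" extension.
        path = out_dir / f"{seq_id.lower()}.fasta"
        path.write_text("\n".join(record) + "\n")
        written.append(path)
        record.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            flush()  # a new header ends the previous record
            record.append(line)
        elif record and line.strip():
            record.append(line)  # sequence line of the current record
    flush()  # write the final record
    return written
```

Given text holding two records with IDs HSFA110 and HSFA111, the sketch would write hsfa110.fasta and hsfa111.fasta into the chosen directory. Two records sharing an ID would overwrite one another, which the sketch does not guard against.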

Usage

seqretsplit prompts for an output file, but the name you give is not used.

At some point this ought to change so that you are not prompted for the output file at all.

Command line arguments


Input file format

seqretsplit reads any normal sequence USA (Uniform Sequence Address).

Output file format

One file for each input sequence is written out.

The names of the files it creates are derived from the ID name of the sequence, followed by an extension denoting the format of the sequence. You have no control over the names of the files it writes out.

For example, if the entries matching embl:hsfa11* are read in and the output is specified as wibble.seq, then the following files are expected to be created:

hsfa110.fasta
hsfa111.fasta
hsfa112.fasta
hsfa113.fasta
hsfa114.fasta

(No file wibble.seq is created.)
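The naming rule can be sketched in a few lines of Python. This only mirrors the example above and is not taken from the EMBOSS source: the function output_names is hypothetical, and lower-casing the ID is an assumption inferred from the file names listed.

```python
def output_names(sequence_ids, fmt="fasta", requested_output=None):
    """Return the file names one would expect for the given sequence IDs.

    Hypothetical helper. `requested_output` is accepted only to show
    that the name given for the output file plays no part in the result.
    """
    # Assumption: each name is the lower-cased ID plus the format name
    # as the extension.
    return [f"{seq_id.lower()}.{fmt}" for seq_id in sequence_ids]


ids = ["HSFA110", "HSFA111", "HSFA112", "HSFA113", "HSFA114"]
for name in output_names(ids, requested_output="wibble.seq"):
    print(name)
```

Run as a script, this prints the five file names listed above, and wibble.seq does not appear among them.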

Data files

None.

Notes

See the seqret documentation for the full range of things that you can do when reading and writing sequences.

Some non-EMBOSS programs will accept only single sequences. In such cases seqretsplit is useful for splitting a multiple sequence file into many individual files. Some EMBOSS programs will also read only a single sequence, which may, however, be one of many in a file. You can specify the sequence using the USA filename:sequenceID. Nonetheless, some people feel more comfortable handling one sequence per file, so seqretsplit will be useful to them too.
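To make the filename:sequenceID form concrete, here is a small Python sketch that separates the two parts. It handles only that one form; real USAs have other forms that are ignored here, and the function name parse_simple_usa and the file name multi.fasta are hypothetical.

```python
def parse_simple_usa(usa):
    """Split 'filename:sequenceID' into (filename, sequence_id).

    Hypothetical helper covering only the form named in the text.
    sequence_id is None when the string contains no ':'.
    """
    # Split at the last ':' so the ID is whatever follows it.
    filename, sep, seq_id = usa.rpartition(":")
    if not sep:
        return usa, None  # a bare file name, no ID given
    return filename, seq_id


print(parse_simple_usa("multi.fasta:hsfa110"))
print(parse_simple_usa("multi.fasta"))
```

The first call yields the file name and the ID as separate values; the second yields the file name with None in place of an ID.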

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

It shouldn't really prompt for the output filename.

This is a side effect of the way sequence output works in EMBOSS. Writing multiple sequences to separate files is controlled by the -ossingle qualifier, which seqretsplit switches on automatically.

Author(s)

History

Target users

Comments