tfscan

Function

Description

tfscan scans one or more DNA sequences for transcription factor binding sites from the TRANSFAC database. The taxonomic group (Fungi, Insects, Plants, Vertebrates or Other) is specified by the user. Matches are found using fast sequence word-matching, optionally allowing mismatches. Because the binding sites are so short, many of the matches will be spurious (false positives). Optionally, the minimum length of a match to be reported may be specified.
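To illustrate why short sites produce many hits and how the mismatch and minimum-length options interact, here is a minimal sketch in Python. It uses a naive sliding-window comparison, not the word-matching algorithm that tfscan itself uses, and the function name, the site identifiers and the site sequences are all hypothetical. It also assumes that the minimum length is applied to the length of the matched site.

```python
def scan_sites(sequence, sites, mismatches=0, minlength=1):
    """Return (site_id, start, end, matched_text) tuples.

    Positions are 1-based and inclusive. This is a naive sliding-window
    scan, used here only as an illustration; it is not tfscan's algorithm.
    """
    seq = sequence.upper()
    hits = []
    for site_id, pattern in sites.items():
        pat = pattern.upper()
        # Assumption: sites shorter than minlength are not reported.
        if len(pat) < minlength or len(pat) > len(seq):
            continue
        for i in range(len(seq) - len(pat) + 1):
            window = seq[i:i + len(pat)]
            # Count the positions at which the window differs from the site.
            diff = sum(1 for a, b in zip(window, pat) if a != b)
            if diff <= mismatches:
                hits.append((site_id, i + 1, i + len(pat), window))
    return hits


# Hypothetical sites; real site sequences come from the TRANSFAC data.
example_sites = {"SITE_A": "TATAAA", "SITE_B": "GC"}
example_hits = scan_sites("GGTATAAAGC", example_sites)
```

With even one mismatch allowed, a two-base site such as the hypothetical SITE_B matches almost every window in the sequence, which is why a minimum reported length is useful.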

An output file is written with information on the matches, including sequence ID and accession number, the start and end positions of the match in an input sequence and the sequence of the region where a match has been found. Binding factor information, where available, is given at the end of the matches for each matching entry.

Usage

Command line arguments


Input file format

tfscan reads any normal nucleic acid sequence USA (Uniform Sequence Address).

Output file format

The output consists of a title line then 5 columns separated by whitespace.

The first column is the identifier of the entry.

The second column is the Accession Number of the entry.

The third and fourth columns are the start and end positions of the match in your input sequence.

The fifth column is the sequence of the region where a match has been found.

Binding factor information, where available, is given at the end of the matches for each matching entry.
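The column layout described above can be read back with a few lines of Python. This is a sketch under stated assumptions, not a definitive parser: it assumes the first line is the title line, that match lines have exactly five whitespace-separated fields with integer start and end positions, and that any other line (such as binding factor information) can be skipped. The names Match and parse_matches are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    identifier: str
    accession: str
    start: int
    end: int
    sequence: str


def parse_matches(text):
    """Parse the five-column match lines from tfscan-style output text."""
    matches = []
    lines = text.splitlines()
    for line in lines[1:]:  # assumption: the first line is the title line
        fields = line.split()
        if len(fields) != 5:
            continue  # blank lines and other free text
        try:
            start, end = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # five words, but not a match line
        matches.append(Match(fields[0], fields[1], start, end, fields[4]))
    return matches
```

Lines that do not fit the five-column pattern are ignored, so binding factor information would need separate handling once its exact layout is known.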

Data files

tfscan reads the TRANSFAC SITE data held in the EMBOSS data files.

Your EMBOSS administrator will have to run the EMBOSS program tfextract in order to set these files up from the TRANSFAC distribution files.

Notes

The TRANSFAC Database is a commercial database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors. It covers the whole range from yeast to human. The site.dat data file from TRANSFAC contains information on individual (putative) regulatory protein binding sites. It has been divided into the following taxonomic groups: Fungi, Insects, Plants, Vertebrates and Other.

An old public domain version of TRANSFAC is available at: ftp://ftp.ebi.ac.uk/pub/databases/transfac/transfac32.tar.Z

References

Warnings

Your EMBOSS administrator will have to run the EMBOSS program tfextract in order to set up the data files from the TRANSFAC distribution files.

Diagnostic Error Messages

"EMBOSS An error in tfscan.c at line 82:
Either EMBOSS_DATA undefined or TFEXTRACT needs running"

This means that you should contact your EMBOSS administrator and ask them to run the tfextract program to set up the TRANSFAC data for EMBOSS.
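Before contacting the administrator, the first condition named in the message can be checked with a short script. The sketch below only tests whether EMBOSS_DATA is set and points at a directory, and optionally whether named files exist there. The function name is hypothetical, the file names to check must be supplied by the caller (none are assumed here), and EMBOSS may also look for data in places this check does not cover.

```python
import os


def check_emboss_data(env, required_files=()):
    """Return a description of the problem, or None if nothing was found.

    env is a mapping such as os.environ; required_files is an optional
    list of data file names chosen by the caller.
    """
    data_dir = env.get("EMBOSS_DATA")
    if not data_dir:
        return "EMBOSS_DATA is not set"
    if not os.path.isdir(data_dir):
        return "EMBOSS_DATA does not point to a directory: " + data_dir
    missing = [name for name in required_files
               if not os.path.isfile(os.path.join(data_dir, name))]
    if missing:
        return "missing data files: " + ", ".join(missing)
    return None
```

Calling check_emboss_data(os.environ) reports on the current environment. A result of None does not prove that tfextract has been run, only that this particular check found nothing wrong.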

Exit status

It always exits with a status of 0.

Known bugs

None.


Author(s)

History

Target users

Comments