jaspextract

 

Function

Extract data from JASPAR

Description

JASPAR is a collection of transcription factor DNA-binding preferences, modelled as matrices. These can be converted into Position Weight Matrices (PWMs or PSSMs), used for scanning genomic sequences.

JASPAR is the only database with this scope where the data can be used with no restrictions (open-source).

This program copies the 3 JASPAR matrix sets into the EMBOSS data directories, performing any necessary conversions.

The home page of JASPAR is: http://jaspar.genereg.net/

The EMBOSS program jaspscan will not work unless this program is run.

Running this program may be the job of your system manager.

Usage

Here is a sample session with jaspextract


% jaspextract 
Extract data from JASPAR
JASPAR database directory [.]: jaspar

Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-directory]         directory  The directory containing the unzipped and
                                  untarred JASPAR_CORE, JASPAR_FAM and
                                  JASPAR_PHYLOFACTS subdirectories

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers: (none)
   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers Allowed values Default
[-directory]
(Parameter 1)
The directory containing the unzipped and untarred JASPAR_CORE, JASPAR_FAM and JASPAR_PHYLOFACTS subdirectories Directory  
Additional (Optional) qualifiers Allowed values Default
(none)
Advanced (Unprompted) qualifiers Allowed values Default
(none)

Input file format

The input files are the uncompressed and extracted JASPAR_CORE.tgz, JASPAR_FAM.tgz and JASPAR_PHYLOFACTS.tgz files provided in the JASPAR MatrixDir download directory of the JASPAR homepage (http://jaspar.genereg.net).

The files will usually be extracted in the same directory thereby creating a directory containing subdirectories JASPAR_CORE, JASPAR_FAM and JASPAR_PHYLOFACTS. Given a directory specification, jasparextract will search for these subdirectories and process the files in any that exist. jasparextract will warn the user if any of the three subdirectories do not exist.

Output file format

The output file format is currently the same as the JASPAR distribution format.

Output files for usage example

Directory: JASPAR_CORE

This directory contains output files, for example MA0070.pfm MA0071.pfm MA0072.pfm MA0073.pfm MA0074.pfm MA0075.pfm MA0076.pfm MA0077.pfm MA0078.pfm MA0079.pfm and matrix_list.txt.

File: JASPAR_CORE/MA0070.pfm

 5  3 16  1  0 17 17  0  0 16 12  8
 6  9  1  1 18  1  0  0 18  1  0  2
 2  3  1  0  0  0  0  1  0  0  1  2
 5  3  0 16  0  0  1 17  0  1  5  6

File: JASPAR_CORE/MA0071.pfm

15  9  6 11 21  0  0  0  0 25
 1  1 12  2  0  0  0  0 25  0
 2  0  4  5  4 25 25  0  0  0
 7 15  3  7  0  0  0 25  0  0

File: JASPAR_CORE/MA0072.pfm

 9 17 15 35 23  2  0 28  0  0  0  0 36 15
 8  2  0  1  0 12  0  0  0  0  0 36  0  6
 8  7  3  0  0 13  0  8 36 36  0  0  0 10
11 10 18  0 13  9 36  0  0  0 36  0  0  5

File: JASPAR_CORE/MA0073.pfm

 3  1  3  0  7  9  8  4  0 11  4  1  3  4  2  4  4  4  1  4
 8 10  8 11  4  2  3  6 11  0  7 10  8  6  9  5  5  6  7  4
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  3  2
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  1  1  0  1

File: JASPAR_CORE/MA0074.pfm

 3  0  0  0  0  9  4  2  2  5  0  0  1  0  7
 0  0  0  0  9  0  2  4  0  0  0  0  0  9  1
 7 10  9  0  0  1  0  2  8  5 10  0  0  0  2
 0  0  1 10  1  0  4  2  0  0  0 10  9  1  0

File: JASPAR_CORE/MA0075.pfm

52 59  0  0 58
 2  0  0  0  0
 4  0  1  0  1
 1  0 58 59  0

File: JASPAR_CORE/MA0076.pfm

16  0  0  0  0 20 16  4  1
 1 20 20  0  0  0  0  1  6
 2  0  0 20 20  0  0 15  0
 1  0  0  0  0  0  4  0 13

File: JASPAR_CORE/MA0077.pfm

24 54 59  0 65 71  4 24  9
 7  6  4 72  4  2  0  6  9
31  7  0  2  0  1  1 38 55
14  9 13  2  7  2 71  8  3

File: JASPAR_CORE/MA0078.pfm

 7  8  3 30  0  0  0  0  0
 9  8 18  0  1  0  0  0 17
 6  4  1  0  0  0 31  2 10
 9 11  9  1 30 31  0 29  4

File: JASPAR_CORE/MA0079.pfm

1 2 0 0 0 2 0 0 1 2
1 1 0 0 5 0 1 0 1 0
4 4 8 8 2 4 5 6 6 0
2 1 0 0 1 2 2 2 0 6

File: JASPAR_CORE/matrix_list.txt

MA0073	22.2782723704014	RREB1	ZN-FINGER, C2H2	; acc "Q92766" ; medline "8816445" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "22.2800" ; type "SELEX" 
MA0075	9.06306510239135	Prrx2	HOMEO	; acc "Q06348" ; medline "7901837" ; seqdb "SWISSPROT" ; species "Mus musculus" ; sysgroup "vertebrate" ; total_ic "9.0620" ; type "SELEX" 
MA0070	14.6408952002356	Pbx	HOMEO	; acc "P40424" ; medline "7910944" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "14.6440" ; type "SELEX" 
MA0079	9.7185757452318	SP1	ZN-FINGER, C2H2	; acc "P08047" ; medline "2192357" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "9.7160" ; type "SELEX" 
MA0077	9.07881462267179	SOX9	HMG	; acc "P48436" ; medline "9973626" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "9.0810" ; type "SELEX" 
MA0078	10.5018372361999	Sox17	HMG	; acc "Q61473" ; medline "8636240" ; seqdb "SWISSPROT" ; species "Mus musculus" ; sysgroup "vertebrate" ; total_ic "10.5030" ; type "SELEX" 
MA0071	13.1897301896459	RORA	NUCLEAR RECEPTOR	; acc "P35397" ; medline "7926749" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "13.1920" ; type "SELEX" 
MA0072	17.4248426117905	RORA1	NUCLEAR RECEPTOR	; acc "P35398" ; medline "7926749" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "17.4230" ; type "SELEX" 
MA0074	20.4511671987138	RXR-VDR	NUCLEAR RECEPTOR	; acc "P11473" ; medline "8674817" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "20.4520" ; type "SELEX" 
MA0076	14.123230134165	ELK4	ETS	; acc "P28324" ; medline "8524663" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "14.1230" ; type "SELEX" 

Directory: JASPAR_FAM

This directory contains output files.

Directory: JASPAR_PHYLOFACTS

This directory contains output files.

Data files

None

Notes

The home page of REBASE is: http://jaspar.genereg.net/

Running this program may be the job of your system manager.

References

  1. DNA binding sites: representation and discovery Bioinformatics. 2000 Jan;16(1):16-23
  2. Applied bioinformatics for the identification of regulatory elements Nat Rev Genet. 2004 Apr;5(4):276-87

Warnings

The program will warn you if any of the JASPAR_CORE, JASPAR_FAM or JASPAR_PHYLOFACTS subdirectories do not exist.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0 unless an error is reported

Known bugs

None.

See also

Program name Description
aaindexextract Extract amino acid property data from AAINDEX
cutgextract Extract codon usage tables from from CUTG database
printsextract Extract data from PRINTS database for use by pscan
prosextract Processes the PROSITE motif database for use by patmatmotifs
rebaseextract Process the REBASE database for use by restriction enzyme applications
tfextract Process TRANSFAC transcription factor database for use by tfscan

Author(s)

Alan Bleasby (ajb © ebi.ac.uk)
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

History

Completed 23rd July 2007

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments

None