patmatdb
Function
Description
patmatdb searches the input protein sequences for the specified sequence motif. The output is a standard EMBOSS report file that gives, for each input sequence, the number of matches to the motif and, for each match, its length, its start and end positions, and the motif-sequence alignment. Patterns follow the format used in the PROSITE database, except that the terminating dot '.' and the hyphens '-' between the characters are optional.
Algorithm
Patterns for patmatdb are based on the format of pattern used in the PROSITE database, with the difference that the terminating dot '.' and the hyphens, '-', between the characters are optional. For example: [DE](2)HS{P}X(2)PX(2,4)C means two residues, each either Asp or Glu, followed by His, Ser, any residue other than Pro, then two of any residue followed by Pro followed by two to four of any residue followed by Cys. The search is case-independent, so AAA matches aaa. The motif must be a string of at least 2 characters.
Usage
Command line arguments
Input file format
patmatdb reads one or more protein sequences, each specified by a USA (Uniform Sequence Address).
Pattern specification
Patterns for patmatdb are based on the format of pattern used in the
PROSITE database, with the difference that the terminating dot '.' and
the hyphens, '-', between the characters are optional.
The PROSITE pattern definition from the PROSITE documentation follows.
- The standard IUPAC one-letter codes for the amino acids are used.
- The symbol `x' is used for a position where any amino acid is
accepted.
- Ambiguities are indicated by listing the acceptable amino acids
for a given position, between square brackets `[ ]'. For
example: [ALT] stands for Ala or Leu or Thr.
- Ambiguities are also indicated by listing between a pair of curly
brackets `{ }' the amino acids that are not accepted at a given
position. For example: {AM} stands for any amino acid except Ala
and Met.
- Each element in a pattern is separated from its neighbor by a
`-'. (Optional in patmatdb).
- Repetition of an element of the pattern can be indicated by
following that element with a numerical value or a numerical
range between parentheses. Examples: x(3) corresponds to x-x-x,
x(2,4) corresponds to x-x or x-x-x or x-x-x-x.
- When a pattern is restricted to either the N- or C-terminal of a
sequence, that pattern either starts with a `<' symbol or
respectively ends with a `>' symbol.
- A period ends the pattern. (Optional in patmatdb).
For example, in SWISSPROT entry 100K_RAT you can look for the pattern:
[DE](2)HS{P}X(2)PX(2,4)C
This means: two residues, each either Asp or Glu, followed by His, Ser, any
residue other than Pro, then two of any residue followed by Pro followed by two
to four of any residue followed by Cys.
The search is case-independent, so 'AAA' matches 'aaa'.
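To make the example concrete, the sketch below applies a hand-translated regular-expression form of the pattern to an invented toy sequence and lists the start, end and length of each hit using 1-based positions. The sequence, the function name find_motif and the regular expression are assumptions made for illustration; the sequence is not the 100K_RAT entry, and the way patmatdb itself enumerates or reports matches may differ.

```python
import re

# Hand-translated equivalent of [DE](2)HS{P}X(2)PX(2,4)C (assumption of this
# sketch); IGNORECASE mirrors the case-independent search.
MOTIF = re.compile(r"[DE]{2}HS[^P].{2}P.{2,4}C", re.IGNORECASE)


def find_motif(sequence):
    """Return (start, end, length, matched_text) tuples, 1-based inclusive."""
    hits = []
    for m in MOTIF.finditer(sequence):   # non-overlapping matches only
        hits.append((m.start() + 1, m.end(), m.end() - m.start(), m.group()))
    return hits


toy = "MKDEHSAGGPLLLCT"   # invented sequence, not a real database entry
for start, end, length, text in find_motif(toy):
    print(start, end, length, text)
```

In the toy sequence the motif covers DE, H, S, A (a residue other than Pro), GG, P, LLL and C, and the lower-case form of the same sequence gives a hit at the same positions.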
Output file format
By default patmatdb writes a 'dbmotif' report file.
Data files
None.
Notes
None.
References
- Bairoch, A., Bucher, P. (1994) PROSITE: recent developments. Nucleic
Acids Research, Vol 22, No. 17, 3583-3589.
- Bairoch, A. (1992) PROSITE: a dictionary of sites and patterns in
proteins. Nucleic Acids Research, Vol 20, Supplement, 2013-2018.
- Peek, J., O'Reilly, T., Loukides, M. (1997) Unix Power Tools, 2nd
Edition.
- Gusfield, D. (1997) Algorithms on Strings, Trees and Sequences.
- Sedgewick, R. (1990) Algorithms in C.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
Author(s)
History
Target users
Comments