textsearch |
textsearch searches for words (specified as a regular expression) in the description text of one or more input sequences. It writes an output file with optional contents such as the name, description and accession number of any sequence whose description line from the annotation matches the search term. Optionally, the search is case-sensitive and the results output as an HTML table. textsearch is convenient for small input files but will be slow for larger files and databases; you should use use SRS or Entrez instead.
textsearch searches only the description line, not the full sequence annotation.
Search for 'lactose':
% textsearch tsw:* 'lactose' Search the textual description of sequence(s) Output file [edd1_rat.textsearch]: |
Go to the input files for this example
Go to the output files for this example
Example 2
Search for 'lactose' or 'permease' in E.coli proteins:
% textsearch tsw:*_ecoli 'lactose | permease' Search the textual description of sequence(s) Output file [laci_ecoli.textsearch]: |
Go to the output files for this example
Example 3
Output a search for 'lacz' formatted with HTML to a file:
% textsearch tembl:* 'lacz' -html -outfile embl.lacz.html Search the textual description of sequence(s) |
Go to the input files for this example
Go to the output files for this example
Standard (Mandatory) qualifiers: [-sequence] seqall (Gapped) sequence(s) filename and optional format, or reference (input USA) [-pattern] string The search pattern is a regular expression. Use a | to indicate OR. For example: human|mouse will find text with either 'human' OR 'mouse' in the text (Any string is accepted) [-outfile] outfile [*.textsearch] Output file name Additional (Optional) qualifiers: -casesensitive boolean [N] Do a case-sensitive search -html boolean [N] Format output as an HTML table Advanced (Unprompted) qualifiers: -only boolean [N] This is a way of shortening the command line if you only want a few things to be displayed. Instead of specifying: '-nohead -noname -nousa -noacc -nodesc' to get only the name output, you can specify '-only -name' -heading boolean [@(!$(only))] Display column headings -usa boolean [@(!$(only))] Display the USA of the sequence -accession boolean [@(!$(only))] Display 'accession' column -name boolean [@(!$(only))] Display 'name' column -description boolean [@(!$(only))] Display 'description' column Associated qualifiers: "-sequence" associated qualifiers -sbegin1 integer Start of each sequence to be used -send1 integer End of each sequence to be used -sreverse1 boolean Reverse (if DNA) -sask1 boolean Ask for begin/end/reverse -snucleotide1 boolean Sequence is nucleotide -sprotein1 boolean Sequence is protein -slower1 boolean Make lower case -supper1 boolean Make upper case -sformat1 string Input sequence format -sdbname1 string Database name -sid1 string Entryname -ufo1 string UFO features -fformat1 string Features format -fopenfile1 string Features file name "-outfile" associated qualifiers -odirectory3 string Output directory General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages |
Standard (Mandatory) qualifiers | Allowed values | Default | |
---|---|---|---|
[-sequence] (Parameter 1) |
(Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-pattern] (Parameter 2) |
The search pattern is a regular expression. Use a | to indicate OR. For example: human|mouse will find text with either 'human' OR 'mouse' in the text | Any string is accepted | An empty string is accepted |
[-outfile] (Parameter 3) |
Output file name | Output file | <*>.textsearch |
Additional (Optional) qualifiers | Allowed values | Default | |
-casesensitive | Do a case-sensitive search | Boolean value Yes/No | No |
-html | Format output as an HTML table | Boolean value Yes/No | No |
Advanced (Unprompted) qualifiers | Allowed values | Default | |
-only | This is a way of shortening the command line if you only want a few things to be displayed. Instead of specifying: '-nohead -noname -nousa -noacc -nodesc' to get only the name output, you can specify '-only -name' | Boolean value Yes/No | No |
-heading | Display column headings | Boolean value Yes/No | @(!$(only)) |
-usa | Display the USA of the sequence | Boolean value Yes/No | @(!$(only)) |
-accession | Display 'accession' column | Boolean value Yes/No | @(!$(only)) |
-name | Display 'name' column | Boolean value Yes/No | @(!$(only)) |
-description | Display 'description' column | Boolean value Yes/No | @(!$(only)) |
# Search for: lactose tsw-id:LACI_ECOLI LACI_ECOLI P03023 Lactose operon repressor. tsw-id:LACY_ECOLI LACY_ECOLI P02920 Lactose permease (Lactose-proton symport). |
# Search for: lactose | permease tsw-id:LACI_ECOLI LACI_ECOLI P03023 Lactose operon repressor. tsw-id:LACY_ECOLI LACY_ECOLI P02920 Lactose permease (Lactose-proton symport). |
|
The first column in the name or ID of each sequence. The remaining text is the description line of the sequence.
When the -html qualifier is specified, then the output will be wrapped in HTML tags, ready for inclusion in a Web page. Note that tags such as <HTML>, <BODY>, </BODY> and </HTML> are not output by this program as the table of databases is expected to form only part of the contents of a web page - the rest of the web page must be supplier by the user.
The lines of out information are guaranteed not to have trailing white-space at the end. So if '-nodesc' is used, there will not be any whitespace after the ID name.
This is a rather slow way to search for text in databases. If you are searching for text in public databases, you should consider using either Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) or SRS (http://srs.rfcgr.mrc.ac.uk/ or http://www.sanger.ac.uk/srs6/ etc.) instead.
Program name | Description |
---|---|
abiview | Display the trace in an ABI sequencer file |
cirdna | Draws circular maps of DNA constructs |
infoalign | Display basic information about a multiple sequence alignment |
infobase | Return information on a given nucleotide base |
inforesidue | Return information on a given amino acid residue |
infoseq | Display basic information about sequences |
lindna | Draws linear maps of DNA constructs |
pepnet | Draw a helical net for a protein sequence |
pepwheel | Draw a helical wheel diagram for a protein sequence |
prettyplot | Draw a sequence alignment with pretty formatting |
prettyseq | Write a nucleotide sequence and its translation to file |
remap | Display restriction enzyme binding sites in a nucleotide sequence |
seealso | Finds programs with similar function to a specified program |
showalign | Display a multiple sequence alignment in pretty format |
showdb | Displays information on configured databases |
showfeat | Display features of a sequence in pretty format |
showseq | Displays sequences with features in pretty format |
sixpack | Display a DNA sequence with 6-frame translation and ORFs |
tfm | Displays full documentation for an application |
whichdb | Search all sequence databases for an entry and retrieve it |
wossname | Finds programs by keywords in their short description |