Areas requiring software development


The table below lists libraries, applications and areas of research that have been suggested as useful additions to EMBOSS or EMBASSY.

This is a list of things that you might like to volunteer to work on. If you decide to work on one or more of these areas, please let the mailing list know: other people may wish to collaborate with you or suggest easier ways of doing things.

If you have an idea for a library or an application, but do not have the time to work on it yourself, let the mailing list know and it will be added to this web page.

| Name | Priority | Status | Description | Comments |
|---|---|---|---|---|
| AJAX code refactoring | High | Active | Function and parameter renaming and major documentation revision. | In preparation for future EMBOSS developments. |
| BLAST wrapper | High | Inactive | Wrapper to the BLAST suite of programs. | Probably individual applications for BLASTP, BLASTN etc. |
| fasta | High | Inactive | Integration of Bill Pearson's FASTA, TFASTA, etc. as an EMBASSY wrapper. | Bill Pearson mentioned he would like to do this, but it needs some way for sequences to be fetched again (e.g. saving file number and offset for 'any' sequence access method). The code is part of the way there. |
| rnafolding | High | Inactive | Integration of external RNA folding applications. | The Zuker package may still be about the best. |
| hitmatch | Medium | Inactive | Replacement for EGCG's equickmatch, using blastn output. | Needs a blast or fasta output parser. Should read in the query and database sequences and perform a full NW or SW alignment, word-based if possible, as they should be near-perfect matches. The aim is to report only those matches above a given threshold, with the full alignments and, if possible, with only the *differences* marked instead of the similarities. No documentation for equickmatch is available. |
| alignutils | Medium | Inactive | Sequence alignment utilities, to replace EGCG sortconsensus. | alignutils documentation is available. Could implement various alignment site-scoring algorithms. |
| dodayhoffstat | Medium | Inactive | Replacement for EGCG's dodayhoffstat. | dodayhoffstat documentation is available. Relatively easy to do. |
| mapplot | Medium | Inactive | For displaying restriction plots. | mapplot was specifically requested. |
| dottie | Medium | Inactive | A general interactive dot plot application. | Could use what's available, e.g. Erik Sonnhammer's dotter. A new implementation would require interactive graphics. |
| nucstats | Medium | Inactive | To report nucleic acid "vital statistics", e.g. ACGT composition etc. | See pepstats for ideas. |
| plasmid drawing | Medium | Inactive | To draw plasmids with restriction sites. | As a replacement for MapPlot from GCG. Perhaps modify cirdna? See TACG (http://tacg.sourceforge.net/) for ideas. |
| fastacheck | Low - maybe remove | Inactive | Replacement for EGCG's fastacheck. | fastacheck documentation is available. Simple to do, but now that FASTA has better statistics it is probably not needed. Functionality to read FASTA statistics and select hits might be useful regardless. |
| gapframe | Low | Inactive | Adjust gap positions to be only at codon boundaries in a DNA sequence with known CDS position(s). | Easy to do, but demand might be too low to justify it. |
| homologies | Medium | Inactive | Table of the pairwise distances of aligned sequences. | The EMBASSY allversusall application does this and could be moved to EMBOSS. |
| neural-nets | Low | Inactive | Neural net routines and applications. | Lots of free packages; Jose Valverde was working on this and recommended using XNN in 2002. There might be better alternatives now. Neureka is available from ftp://ftp.ii.uib.no/pub/neureka/. Not a high priority. |
| GAs | Low | Inactive | Genetic Algorithm routines and applications. | |
| Feature display | Medium | Inactive | Graphical display of selected features from a feature table. | Possible with plplot but probably better with a new graphics library. |
| Feature utilities | Medium | Inactive | Operation on a feature table file to extract selected features to another file. | Should be turned into a quite extensive set of library functions. |
| Cluster | Low | Inactive | This program is still in the 'test' set of programs. | Sanger stopped using it, so it is probably not needed. The easiest route to clustering functionality at application level might be to use e.g. SANBI's code (but what about the license?). AJAX clustering routines would be useful. |
| ALIEN | Medium | Inactive | Multiple alignment program. | Many multiple alignment programs are available and could be wrapped. |
| Gene ID programs | Medium | Inactive | | Would be useful. Is there non-commercial code for this? |
| genetrans | Medium | Inactive | Replacement for EGCG's genetrans. | genetrans documentation is available. Functionality is possibly redundant with existing EMBOSS applications, though; this needs checking. |
| QA tests | Complete | Complete | QA application testing using a set of standard outputs and simple parsing of the results. | Scripts to test the output of EMBOSS programs against expected results. |
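
For anyone considering the nucstats entry in the table above, the following is a minimal sketch in Python of the sort of "vital statistics" it might report. It is not EMBOSS or AJAX code: the function name `nucleotide_stats`, the choice to count U as T, and the output fields are all assumptions made for illustration, and a real application would presumably follow the conventions of pepstats.

```python
from collections import Counter


def nucleotide_stats(sequence: str) -> dict:
    """Return simple composition statistics for a nucleic acid sequence.

    Hypothetical helper for illustration only; not part of EMBOSS or AJAX.
    """
    # Assumption: case is ignored and RNA (U) is counted as T.
    seq = sequence.upper().replace("U", "T")
    counts = Counter(seq)
    acgt = {base: counts.get(base, 0) for base in "ACGT"}
    determined = sum(acgt.values())
    # Everything that is not A, C, G or T (ambiguity codes, gaps, ...).
    other = len(seq) - determined
    gc = acgt["G"] + acgt["C"]
    return {
        "length": len(seq),
        "counts": acgt,
        "other": other,
        # GC fraction over unambiguous bases only; 0.0 for an empty input.
        "gc_fraction": gc / determined if determined else 0.0,
    }


if __name__ == "__main__":
    print(nucleotide_stats("ACGTNNacgu"))
```

Computing the GC fraction over unambiguous bases only is one possible design choice; dividing by the full sequence length would be equally defensible and should be settled on the mailing list.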