Author: Eric Talevich



Compare protein sequence alignments. Identify diagnostic residues that
distinguish the given "foreground" (FG) and "background" (BG) clades.

If you use this software in a publication, please cite the paper that describes
it:

    Talevich, E. & Kannan, N. (2013) Structural and evolutionary adaptation of
    rhoptry kinases and pseudokinases, a family of coccidian virulence factors.
    *BMC Evolutionary Biology* 13:117

    Available at:

Freely distributed under the permissive BSD 2-clause license (see LICENSE).


A proper installation looks like::

    python install

If you have setuptools installed, the dependencies on Biopython_, BioFrills_,
SciPy_ and ReportLab_ will be fetched and installed automatically.

.. _Biopython:
.. _biofrills:
.. _SciPy:
.. _ReportLab:
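
To confirm that the dependencies are importable before running anything, a
short check along these lines may help. This is only a sketch: the import names
in the table are assumptions (Biopython is imported as ``Bio``; the others are
assumed to match their lowercase project names), and the helper
``missing_dependencies`` is hypothetical, not part of CladeCompare.

```python
import importlib.util

# Project name -> import name. The import names are assumptions, not taken
# from the CladeCompare documentation.
DEPENDENCIES = {
    "Biopython": "Bio",
    "BioFrills": "biofrills",
    "SciPy": "scipy",
    "ReportLab": "reportlab",
}


def missing_dependencies(deps=DEPENDENCIES):
    """Return the project names whose top-level module cannot be found."""
    return [name for name, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```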

Optionally, CladeCompare can align your sequences for you if you have MUSCLE_,
HMMer3_ or MAPGAPS_ installed.

.. _HMMer3:
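
Whether an optional aligner is available can be checked by looking for its
executable on the system path. The sketch below assumes the executables are
named ``muscle``, ``hmmalign`` and ``mapgaps``; these names are assumptions and
may differ on your system, and ``find_tools`` is a hypothetical helper.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Executable names are assumed, not taken from the CladeCompare docs.
    for name, path in find_tools(["muscle", "hmmalign", "mapgaps"]).items():
        print(f"{name}: {path or 'not found'}")
```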

If you have the dependencies installed, you can use this package in-place,
without installing it. Just download the source code (git clone, or download
the ZIP file and unpack it) and include the top-level directory in your system
path, or add symbolic links to, and
to an existing directory that's in your path (e.g. ``~/bin``).


Finally, if you are on a Unix-like system (e.g. Linux, Mac or Cygwin), you can
verify your installation by running the test suite. Change to the ``test/``
directory and run ``make``::

    cd test
    make

If CladeCompare is installed correctly, the program will run in several modes
and generate output files. View the ``.html`` files in your web browser to see
what happened.


Web interface
-------------

Launch the script ```` and fill in the form in your web browser.
The form accepts sequences in FASTA or CMA format, and you can upload an HMM
profile to align unaligned FASTA sequence sets. (See below for details about
each field.)

If you launched the application from the command line, press Ctrl-C (on
Unix-like systems) to stop the web server application.

Note that only one instance of the server will run on your system at a time; if
you launch ```` twice in a row, another browser tab or window will
open but the server will not restart.

Command line
------------

The command-line interface ```` provides the same functionality
as the web interface, plus a few more options.  To read the built-in help and
option descriptions::

    --help

Two alignments are compared by specifying the foreground and background sets,
in that order, as arguments::

    # Compare two alignments
    fg_aln.seq bg_aln.seq

The program prints the following information for each column in the alignment(s):

- The consensus amino acid types of the foreground and background
- The p-value indicating the significance of the contrast in amino acid
  frequencies
- A small ASCII bar chart indicating the contrast, based on the p-value
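
As a rough illustration of the first item, the consensus residue of a column
can be taken as its most common non-gap character. The sketch below is not
CladeCompare's implementation; the function ``column_consensus`` and the toy
sequences are made up for the example, and ties are broken arbitrarily.

```python
from collections import Counter


def column_consensus(aligned_seqs):
    """Return the most common non-gap residue in each alignment column."""
    consensus = []
    for column in zip(*aligned_seqs):
        # Ignore gap characters when counting residues.
        counts = Counter(res for res in column if res not in "-.")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


# Toy alignments (hypothetical data), already aligned to the same columns.
fg = ["MKVLA", "MKVLG", "MRVLA"]
bg = ["MKILS", "MKILS", "MKI-T"]

if __name__ == "__main__":
    pairs = zip(column_consensus(fg), column_consensus(bg))
    for pos, (f, b) in enumerate(pairs, start=1):
        # Mark columns where the two consensus residues differ.
        print(pos, f, b, "*" if f != b else "")
```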

P-values are adjusted for the number of columns in the alignment with the
Benjamini-Hochberg "step-up" multiple-testing correction (false discovery rate,
FDR).
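
For readers unfamiliar with the correction, a minimal pure-Python sketch of the
Benjamini-Hochberg adjustment follows. It illustrates the general procedure
only and is not taken from CladeCompare's source; the function name
``benjamini_hochberg`` is chosen for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step from the largest p-value down to the smallest, scaling each by
    # m / rank and keeping the adjusted values monotonically non-decreasing.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # One raw p-value per alignment column (hypothetical values).
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"{p:.3f} -> {q:.3f}")
```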

Redirect the output to a file with the extension ".out"::

    # Compare two alignments
    fg_aln.seq bg_aln.seq > fg-v-bg.out

Or specify the output file name with the ``-o`` option (same effect)::

    fg_aln.seq bg_aln.seq -o fg-v-bg.out

If you're not using MAPGAPS_, it would make sense to either:

- Create a sequence alignment of all sequences, foreground and background,
  together; then divide the alignment into two FASTA files (e.g. by sequence
- Align both the foreground and background sets w