HomePage: http://github.com/etal/fammer

Author: Eric Talevich

Download: https://pypi.python.org/packages/source/f/fammer/fammer-0.2.tar.gz

======
Fammer
======

Utilities for curating a hierarchical set of sequence profiles representing a
protein superfamily.

If you use this software in a publication, please cite our paper that describes
it:

    Talevich, E. & Kannan, N. (2013) Structural and evolutionary adaptation of
    rhoptry kinases and pseudokinases, a family of coccidian virulence factors.
    *BMC Evolutionary Biology* 13:117
    doi:10.1186/1471-2148-13-117

    Available at: http://www.biomedcentral.com/1471-2148/13/117

Freely distributed under the permissive BSD 2-clause license (see LICENSE).

Installation
------------

The installation consists of a Python library, ``fammerlib``, and two scripts,
``fammer.py`` and ``tmalign.py``.

Download the .zip file and unpack it, or clone this Git repository, to get the
source code.

To use all the features of Fammer, you'll need the following third-party
programs installed:

- Python_ 2.7
- MAFFT_
- HMMer_ 3.0
- MAPGAPS_ (optional)
- TMalign_ (optional) -- for structural alignments
- FastTree_ (optional) -- for clustering

.. _Python: http://www.python.org/download/
.. _MAFFT: http://mafft.cbrc.jp/alignment/software/
.. _HMMer: http://hmmer.janelia.org/
.. _MAPGAPS: http://mapgaps.igs.umaryland.edu/
.. _TMalign: http://cssb.biology.gatech.edu/skolnick/webservice/TM-align/index.shtml
.. _FastTree: http://www.microbesonline.org/fasttree/

.. For hackers, also PRANK: http://code.google.com/p/prank-msa/

If you're on a Debian-based Linux system, check your package manager for these
first to save yourself some time::

    sudo apt-get install mafft hmmer tm-align fasttree python-pip
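
Before going further, it can help to confirm which of the external programs
are already visible on your ``PATH``. The following is a minimal sketch, not
part of Fammer. The executable names (``mafft``, ``hmmbuild``, ``TMalign``,
``FastTree``) are assumptions and may differ by platform or package:

.. code-block:: python

    import os


    def find_on_path(program, path=None):
        """Return the full path of `program` if it is executable on PATH, else None."""
        if path is None:
            path = os.environ.get("PATH", "")
        for directory in path.split(os.pathsep):
            candidate = os.path.join(directory, program)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
        return None


    # Assumed executable names; adjust them to match your installation.
    REQUIRED = ["mafft", "hmmbuild"]
    OPTIONAL = ["TMalign", "FastTree"]


    def report_missing(programs, path=None):
        """Return the subset of `programs` that cannot be found on PATH."""
        return [prog for prog in programs if find_on_path(prog, path) is None]


    if __name__ == "__main__":
        for label, names in [("required", REQUIRED), ("optional", OPTIONAL)]:
            missing = report_missing(names)
            print("%s programs missing: %s" % (label, ", ".join(missing) or "none"))

The PATH search is written by hand rather than with ``shutil.which`` so that
the sketch also runs on Python 2.7.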

Then, install the Python library dependencies and Fammer itself as follows.

Recommended:
````````````

Install the Python packaging system pip or setuptools. Then run the setup
script, and all Python dependencies will be pulled in::

    python setup.py build
    python setup.py install

(You might need root privileges for the last step.)

Manual:
```````

Install the Python libraries Biopython_, biofrills_, biocma_ and networkx_.
Then run the setup script as above.

.. _Biopython: http://biopython.org/wiki/Download
.. _biofrills: https://github.com/etal/biofrills
.. _biocma: https://github.com/etal/biocma
.. _networkx: http://networkx.lanl.gov/



Basic usage
-----------

Global options:

  ``-h``, ``--help``
      Show a help message and basic usage.
  ``--quiet``
      Don't print status messages, only warnings and errors.

Sub-commands:

    `build`_
        Build profiles from a given directory tree.
    `update-fasta`_
        Replace original FASTA sequence sets with the ungapped sequences from
        the corresponding alignment (.aln) files, sorted by decreasing length.
    `scan`_
        Scan and classify input sequences using a set of profiles.
    `add`_
        Scan a target database with the given HMM profile set.  Add hits that
        meet acceptance thresholds to the profile FASTA files.
    `refine`_
        Leave-one-out validation of HMM profiles.
    `cluster`_
        Split a sequence set into clusters (based on phylogeny).
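
To make the ``update-fasta`` description concrete, here is a minimal sketch of
ungapping aligned sequences and sorting them by decreasing length. It is an
illustration only, not Fammer's implementation. The function names, and the
assumption that gaps are written as ``-`` or ``.``, are ours:

.. code-block:: python

    def ungap(aligned_seq, gap_chars="-."):
        """Remove gap characters from one aligned sequence string."""
        return "".join(char for char in aligned_seq if char not in gap_chars)


    def ungapped_by_length(aligned_records):
        """Ungap (name, aligned_sequence) pairs; return them longest first."""
        ungapped = [(name, ungap(seq)) for name, seq in aligned_records]
        # sorted() is stable, so equal-length sequences keep their input order.
        return sorted(ungapped, key=lambda record: len(record[1]), reverse=True)


    if __name__ == "__main__":
        alignment = [
            ("seqA", "MK--LV"),
            ("seqB", "MKTALV"),
            ("seqC", "M---LV"),
        ]
        for name, seq in ungapped_by_length(alignment):
            print(">%s\n%s" % (name, seq))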


Commands
--------

build
`````

Construct a profile database from a directory tree of family profile alignments.

Assume we have a directory tree of sequence files set up under
``Superfamily/``. Run ``fammer.py build Superfamily`` to align all sequence
files with MAFFT and then, working recursively up the tree, align the consensus
sequences of each subfamily together. The result is laid out like this::

    Superfamily/
        Group1/
            subfam1_Group1.fasta
            subfam1_Group1.aln
            subfam2_Group1.fasta
            subfam2_Group1.aln
            subfam3_Group1.fasta
            subfam3_Group1.aln
            ...
            Group1-Unclassified.fasta
            Group1-Unclassified.aln
        Group1.aln
        ...
        Superfamily-Unclassified.fasta
        Superfamily-Unclassified.aln
    Superfamily.aln

The alignments are in un-wrapped Clustal format.
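
To check a tree like the one above after a build, the sketch below walks the
directory bottom-up and lists every ``.fasta`` file that has no matching
``.aln`` file beside it. It is a hypothetical helper, not part of Fammer, and
it assumes only the ``.fasta``/``.aln`` naming shown in the listing:

.. code-block:: python

    import os


    def fasta_without_alignment(root):
        """Return paths of .fasta files under `root` lacking a sibling .aln file."""
        missing = []
        # topdown=False visits subfamily directories before their parents.
        for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
            names = set(filenames)
            for filename in sorted(filenames):
                base, ext = os.path.splitext(filename)
                if ext == ".fasta" and base + ".aln" not in names:
                    missing.append(os.path.join(dirpath, filename))
        return missing


    if __name__ == "__main__":
        import sys

        root = sys.argv[1] if len(sys.argv) > 1 else "Superfamily"
        for path in fasta_without_alignment(root):
            print("No alignment found for %s" % path)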

You can manually adjust the alignments and rebuild, if desired, perhaps
iteratively. Only the "parent" family alignments will be rebuilt as needed, e.g.
i