Authors: Jake Biesinger; Daniel Newkirk; Alvin Chon; Yong Zhang; Tao (Foo) Liu


        README for AREM 1.0.1, based on MACS 1.4.0rc2
Time-stamp: <2011-03-01 18:21:42 Jake Biesinger>

* Introduction

High-throughput sequencing coupled to chromatin immunoprecipitation
(ChIP-Seq) is widely used in characterizing genome-wide binding
patterns of transcription factors, cofactors, chromatin modifiers, and
other DNA binding proteins. A key step in ChIP-Seq data analysis is to
map short reads from high-throughput sequencing to a reference genome
and identify peak regions enriched with short reads. Although several
methods have been proposed for ChIP-Seq analysis, most existing methods
only consider reads that can be uniquely placed in the reference
genome, and therefore have low power for detecting peaks located within
repeat sequences. Here we introduce a probabilistic approach for
ChIP-Seq data analysis which utilizes all reads, providing a truly
genome-wide view of binding patterns. Reads are modeled using a mixture
model corresponding to K enriched regions and a null genomic
background. We use maximum likelihood to estimate the locations of the
enriched regions, and implement an expectation-maximization (E-M)
algorithm, called AREM, to update the alignment probabilities of each
read to different genomic locations.
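
The idea of updating alignment probabilities can be illustrated with a
small Python sketch. This is not AREM's actual implementation: the
function name em_realign, the bin labels, and the pseudocount standing
in for the null genomic background are all assumptions made for
illustration. Each read lists its candidate locations; the loop
alternates between estimating per-location coverage and reweighting
each read's candidates by that coverage.

```python
from collections import defaultdict


def em_realign(reads, n_iter=20, pseudocount=1.0):
    """Toy E-M loop (hypothetical helper, not AREM's real code).

    reads: list of lists of candidate locations (any hashable bin id).
    Returns one dict per read mapping location -> alignment probability.
    """
    # Drop duplicate candidates, then start from uniform probabilities.
    candidates = [list(dict.fromkeys(cands)) for cands in reads]
    probs = [{pos: 1.0 / len(c) for pos in c} for c in candidates]

    for _ in range(n_iter):
        # M-step: expected read coverage per location under current probabilities.
        coverage = defaultdict(float)
        for p in probs:
            for pos, weight in p.items():
                coverage[pos] += weight

        # E-step: score each candidate by coverage plus a background
        # pseudocount (an assumed stand-in for the null model), then renormalize.
        new_probs = []
        for p in probs:
            scores = {pos: coverage[pos] + pseudocount for pos in p}
            total = sum(scores.values())
            new_probs.append({pos: s / total for pos, s in scores.items()})
        probs = new_probs

    return probs


# Five reads map uniquely to bin "A"; one read is ambiguous between "A" and "B".
example = em_realign([["A"]] * 5 + [["A", "B"]])
print(example[-1])
```

In this toy input the ambiguous read ends up assigned mostly to "A",
because the uniquely mapped reads already support enrichment there.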

For additional information, see our paper in RECOMB 2011 or visit our website:

AREM is based on the popular MACS peak caller, as described below:

With the improvement of sequencing techniques, chromatin
immunoprecipitation followed by high-throughput sequencing (ChIP-Seq)
is becoming popular for studying genome-wide protein-DNA interactions.
To address the lack of powerful ChIP-Seq analysis methods, we present a
novel algorithm, named Model-based Analysis of ChIP-Seq (MACS), for
identifying transcription factor binding sites. MACS captures the
influence of genome complexity to evaluate the significance of
enriched ChIP regions, and MACS improves the spatial resolution of
binding sites by combining the information of both sequencing tag
position and orientation. MACS can be easily used for ChIP-Seq data
alone, or with a control sample to increase specificity.

The original MACS package is available at:

* Install

Please check the file 'INSTALL' in the distribution.

* Usage

Usage: arem <-t tfile> [-n name] [-g genomesize] [options]

Example: arem -t ChIP.bam -c Control.bam -f BAM -g h -n test -w --call-subpeaks
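
When AREM is run from a script, the command line can be assembled in
Python. The sketch below uses only the options that appear in this
README; the helper name build_arem_command is hypothetical, and actually
running the command assumes the arem executable is on your PATH.

```python
import shlex


def build_arem_command(treatment, control=None, fmt="AUTO", gsize=None,
                       name="NA", extra=()):
    """Assemble an arem argument list (hypothetical helper for illustration)."""
    cmd = ["arem", "-t", treatment]
    if control:
        cmd += ["-c", control]
    cmd += ["-f", fmt]
    if gsize:
        cmd += ["-g", gsize]
    cmd += ["-n", name]
    cmd += list(extra)
    return cmd


cmd = build_arem_command("ChIP.bam", control="Control.bam", fmt="BAM",
                         gsize="h", name="test",
                         extra=["-w", "--call-subpeaks"])
print(shlex.join(cmd))
# To execute (assumes arem is installed): subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids
shell quoting problems with file names that contain spaces.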

arem -- Aligning Reads by Expectation-Maximization, based on Model-based Analysis for ChIP-Sequencing (MACS)

  --version             show program's version number and exit
  -h, --help            show this help message and exit.
  -t TFILE, --treatment=TFILE
                        ChIP-seq treatment files. REQUIRED. When ELANDMULTIPET
                        is selected, you must provide two files separated by
                        comma, e.g.
  -c CFILE, --control=CFILE
                        Control files. When ELANDMULTIPET is selected, you
                        must provide two files separated by comma, e.g.
  -n NAME, --name=NAME  Experiment name, which will be used to generate output
                        file names. DEFAULT: "NA"
  -f FORMAT, --format=FORMAT
                        Format of tag file, "AUTO", "BED" or "ELAND" or
                        "ELANDMULTI" or "ELANDMULTIPET" or "ELANDEXPORT" or
                        "SAM" or "BAM" or "BOWTIE". The default AUTO option
                        will let MACS decide which format the file is. Please
                        check the definition in 00README file if you choose
                        ELAND/ELANDMULTI/ELANDMULTIPET/ELANDEXPORT/SAM/BAM/
                        BOWTIE. DEFAULT: "AUTO"
  --petdist=PETDIST     Best distance between Pair-End Tags. Only available
                        when format is 'ELANDMULTIPET'. DEFAULT: 200
  -g GSIZE, --gsize=GSIZE