# Pypelogs

Author: Andy Jenkins

A generator-based tool and framework for piping log data (and other sources) from inputs to outputs,
with any number of filters in between.

Pypelogs is modeled after the Unix shell: it's fed from an input file or STDIN, sends input through any number
of filters, then outputs to STDOUT or a specified sink. Internally, pypelogs treats each input line
as an "event" (a Python dict object), similar to Logstash or other event processors that work on JSON objects.
Pypelogs can read and write data as either plain text or JSON.  Many other output options are possible
(e.g. writing events as documents into a MongoDB instance).
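For instance, an input line becomes a plain dict that serializes naturally to JSON (an illustrative sketch of the event model, not pypelogs' actual internals):

```python
import json

# A pypelogs "event" is just a Python dict, one per input line.
line = "GET /index.html 200"
event = {"text": line}

# Events serialize naturally to JSON for output.
serialized = json.dumps(event)
print(serialized)  # {"text": "GET /index.html 200"}
```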

Pypelogs makes extensive use of Python generators.  This benefits the design in a few ways:
* New directives are very easy to create.  Many are less than 20 lines of code.
* Event processing is very efficient.  Events are 'pulled' through the system, so if the
output is blocked for any reason, the tool simply stops reading in more data.
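The pull model can be illustrated with plain generators (a simplified sketch, not pypelogs' actual code): each stage is a generator that consumes the previous one, so nothing is read until the final consumer asks for an event.

```python
def text(lines):
    """Input stage: wrap each raw line in an event dict."""
    for line in lines:
        yield {"text": line.rstrip("\n")}

def keep(events, field):
    """Filter stage: keep only one field of each event."""
    for e in events:
        yield {field: e[field]}

# Stages are chained lazily; events are pulled through one at a time,
# so a blocked consumer simply stops the whole pipeline from reading.
source = iter(["a.jpg\n", "b.jpg\n"])
pipeline = keep(text(source), "text")
print(next(pipeline))  # {'text': 'a.jpg'}
```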

## Installation

Pypelogs can be installed via Pip. (coming soon...)

## Usage

Pypelogs installs itself as `pl` (coming soon...).  The syntax is as follows:

    $ pl input [filter1 filter2 ... filterN] [output]

* The first argument is an input.  It tells pypelogs where and how to parse events from a source.
* After that, any number of filters may be supplied.  Filters transform the input events.
* Optionally, an output is specified.  If the output is omitted, pypelogs writes JSON to STDOUT.
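Conceptually, the argument list is folded into a chain of generators, each wrapping the previous stage. Here is a toy sketch of that idea (the `upper` directive and the `DIRECTIVES` registry are made up for illustration and are not part of pypelogs):

```python
import json

def text(lines):
    # Input stage: one event per raw line.
    for line in lines:
        yield {"text": line.rstrip("\n")}

def upper(events):
    # A toy filter: upper-case the text field.
    for e in events:
        e["text"] = e["text"].upper()
        yield e

DIRECTIVES = {"text": text, "upper": upper}

def build_pipeline(args, source):
    """Fold a list of directive names into nested generators."""
    stage = source
    for name in args:
        stage = DIRECTIVES[name](stage)
    return stage

for event in build_pipeline(["text", "upper"], iter(["hello\n"])):
    print(json.dumps(event))  # {"text": "HELLO"}
```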

## Example 1 - Basic text processing

Pypelogs can do stream processing similar to the Unix command line.
As a basic example, let's find all of the directories under my `Documents` folder that contain a `.jpg` file.
This could be accomplished with a Perl one-liner; the point here is to demonstrate the
ability to compose pypelogs directives:

    $ find Documents -name '*.jpg' | pl text split:text,/,dir=[0:-1] groupby:dir bucket:0 keep:dir

The first part is just the Cygwin `find`, which lists all `*.jpg` files under the named path.  Here is
what pypelogs does:

1. `text` - Parse STDIN as text, creating a new event (Python dict) per line.  Each event has a single field
   called `text`, which is the text of the line.
1. `split:text,/,dir=[0:-1]` - Split the `text` field of each incoming event at the `/` character.  Join the
   first through next-to-last elements back together and assign the result to `dir`.  Each event now has a
   `dir` field indicating the parent directory of the file.
1. `groupby:dir` - Group incoming events by the value of the `dir` field (so that events with the same value
   of `dir` are condensed).  This will slurp in all incoming events and yield a single list as output containing
   the discovered dirs and the number of events for each (the `count` field).
1. `bucket:0` - The `groupby` filter turned our event stream into a bucket (list of dict objects).  Convert
   it back to a stream of individual events.
1. `keep:dir` - Keep only the `dir` field of each event.  In this case, the original `text` field and the `count`
   field are stripped.
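The five steps above can be simulated with ordinary Python (an approximation of the behavior described, not the actual directive code):

```python
from collections import Counter

lines = [
    "Documents/PC/a.jpg",
    "Documents/PC/b.jpg",
    "Documents/Eve/c.jpg",
]

# text: one event per input line
events = ({"text": line} for line in lines)

# split:text,/,dir=[0:-1] -- join all but the last path element into `dir`
events = ({**e, "dir": "/".join(e["text"].split("/")[0:-1])} for e in events)

# groupby:dir -- condense events sharing a `dir`, counting them; this step
# must consume the whole incoming stream before it can yield its bucket
counts = Counter(e["dir"] for e in events)
bucket = [{"dir": d, "count": n} for d, n in counts.items()]

# bucket:0 -- turn the list back into a stream of individual events,
# then keep:dir -- strip every field except `dir`
for e in ({"dir": e["dir"]} for e in bucket):
    print(e)
```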

The output looks something like the following:

    {"dir": "Documents/PC/OldPC/UVA Club/newsletter 02"}
    {"dir": "Documents/PC/HoldMail.jsp_files"}
    {"dir": "Documents/Personal/Kenneth Robert/Contributions_files"}

Oops, let's sort by directory name.  The command is the same, but with an extra filter on the end to sort:

    $ find Documents -name '*.jpg' | pl text split:text,/,dir=[0:-1] groupby:dir bucket:0 keep:dir sort:dir

Note the added filter to sort by the `dir` field.  Now output looks something like the following:

    {"dir": "Documents/Eve/Penguin Movie"}
    {"dir": "Documents/Gear11"}
    {"dir": "Documents/Mom Finances"}
    {"dir": "Documents/Mom Finances/Check scans"}
    {"dir": "Documents/PC"}
    {"dir": "Documents/PC/Baby Stuff"}

Maybe we want to feed this list into `xargs` for some more processing.  Let's add an output filter to convert
the events to simple text strings:

    $ find Documents -name '*.jpg' | pl text split:text,/,dir=[0:-1] groupby:dir bucket:0 keep:dir sort: