picalo

HomePage: http://www.picalo.org/

Author: Conan C. Albrecht

Download: https://pypi.python.org/packages/source/p/picalo/picalo-4.94.tar.gz

        Picalo
                          
                  Data Analysis Library
                  
                  http://www.picalo.org/

Picalo is a Python library to help anyone who works with data files, 
especially those who work with data in relational/spreadsheet format.  
It was created primarily for investigators and auditors who search through 
data sets for anomalies, trends, and other information, but it is 
generally useful for any type of data or text files.

Picalo is different from NumPy/Numarray in that it is meant for
heterogeneous data rather than homogeneous data.  In NumPy, you
have an array (table) of the same type--all ints, for example.
In Picalo, you have a table made up of different column types,
very similar to a database.
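
To make the distinction concrete, here is a minimal sketch that uses
only the Python standard library, not Picalo or NumPy.  The array
module stands in for a homogeneous container and a plain list of rows
stands in for a heterogeneous table; the column names and values are
made up for illustration.

```python
from array import array

# Homogeneous: every element must have the same type (signed ints here).
ages = array('i', [35, 34, 8, 10])

# A typed array refuses a value of another type.
try:
    ages.append('Homer')
    rejected = False
except TypeError:
    rejected = True

# Heterogeneous: each column has its own type, as in a database table.
columns = ('Name', 'Age', 'Enrolled')   # hypothetical column names
rows = [
    ('Homer', 35, False),
    ('Lisa', 8, True),
]

# Read the type of each column from the first row.
column_types = [type(cell).__name__ for cell in rows[0]]
```

Here column_types holds one entry per column (str, int, bool), while
the typed array accepts only integers.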

One of Picalo's primary purposes is making relational
databases easier to work with.  Once you have a Picalo table, 
you can add, move, or delete columns; work with records (horizontal
slices of the data); select and group records in various ways;
and run analyses on tables.  Picalo includes adapters for popular
databases, and it provides a Query object that makes queries seem
just like regular Tables (except they are live from the database).

If you work with relational databases, delimited (CSV/TSV) files, 
EBCDIC files, MS Excel files, log files, text files, or other 
heterogeneous datasets, Picalo might make your life easier.

Picalo is programmed to be as Pythonic as possible.  Its core objects
(tables, columns, and records) act like lists.  A column is a list of cells.
A record is a list of cells.  A table is a list of records.  Tables can be 
sorted via the sort function, just like the Sorting HowTo shows.  The return
values of almost all functions are new tables, so functions can be chained
together like pipes in Unix.
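
The list-like design and the chaining idea can be sketched in plain
Python.  This is an analogy only and does not use Picalo's API: the
Record type and the select and sorted_by helpers are hypothetical
names invented for this illustration.

```python
from collections import namedtuple

# A record is a sequence of cells that also allows access by field name.
Record = namedtuple('Record', ['Name', 'Age'])

# A table is a list of records.
table = [
    Record('Homer', 35),
    Record('Marge', 34),
    Record('Lisa', 8),
    Record('Bart', 10),
]

def select(records, predicate):
    """Return a new list holding the records that satisfy predicate."""
    return [r for r in records if predicate(r)]

def sorted_by(records, key):
    """Return a new sorted list; the input list is left untouched."""
    return sorted(records, key=key)

# Each helper returns a new list, so calls chain like pipes in Unix.
adults_by_name = sorted_by(select(table, lambda r: r.Age > 20),
                           key=lambda r: r.Name)
names = [r.Name for r in adults_by_name]
```

Because neither helper modifies its input, the original table keeps
its four records in their original order after the chained call.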

Picalo includes an optional Project object that stores tables in
Zope Object DB files.  When Projects are used, Picalo automatically
swaps records in and out of memory as needed to ensure efficient use of 
resources.  Projects allow Picalo to work with essentially an unlimited
amount of data.

The project was started in 2003 by Conan C. Albrecht, a professor
in Information Systems at Brigham Young University.  Conan remains
the primary developer of Picalo.

Here's an example of Picalo code loading a CSV and working with it:

    # import the picalo libraries and turn off visual progress bars
    import picalo, StringIO
    picalo.use_progress_indicators(False)

    # load the csv, could have been from a filename
    csv = '''Name,Age
    Homer,35
    Marge,34
    Lisa,8
    Bart,10
    '''
    table = picalo.load_csv(StringIO.StringIO(csv))

    table.set_type('Age', int)  # set the type of the Age column (csv defaults types to str)
    table.view()                # prints a formatted table
    print table[0].Age          # prints 35
    print table[0]['Age']       # also prints 35
    print table[0][1]           # again prints 35
    print table[-1].Name        # prints Bart
    table2 = table[0:2]         # get a slice of records
    for name in table.column('Name'):
        print name              # prints the names, one by one

    # insert a column, which defaults cells to None
    table.insert_column(1, 'DoubleAge', int)
    # change cells using an expression
    table.replace_column_values('DoubleAge', 'record.Age * 2')

    # sort by Name, then Age
    picalo.Simple.sort(table, True, 'Name', 'Age')
    # sort in more Pythonic way (only by Name this time)
    table.sort(key=lambda r: r.Name)

    # print the std. dev. of the age column
    print picalo.stdev(table.column('Age'))

    # select records by regex, those containing 'a'
    table2 = picalo.Simple.select_by_regex(table, Name='^.*a.*$')

    # filter the existing table, then clear the filter
    table.filter('record.Age > 20')
    print len(table)            # prints 2
    table.clear_filter()
    print len(table)            # prints 4

    # reorder the columns 
    table.reorder_columns(['Age', 'Name', 'DoubleAge'])

    # add a live, calculated column
    table.append_calculated('ReverseNam