HomePage: http://www.jarn.com/

Author: Jarn AS

Download: https://pypi.python.org/packages/source/e/experimental.catalogqueryplan/experimental.catalogqueryplan-3.2.8.zip


While the catalog tool in Zope is immensely useful, we have seen some slowdowns
in large Plone sites with a combination of additional indexes and lots of
content.

The catalog implementation uses BTree set operations such as union, multiunion
and intersection. These operations are fairly fast, especially when everything
is in memory. However, the catalog implementation is rather naive, which leads
to many set operations on rather large sets.

Query plan

Search engines and databases use query optimizers to select query plans that
minimize the result set as early as possible, because working with large
amounts of data is time-consuming.

What we want to do is to search against the indexes giving the smallest result
set first. However, for that to be useful, we need to pass that result along
into the indexes to allow the indexes to limit the result set as soon as
possible internally. When calculating a path search, there is no need to look
in all 150000 results if the portal type index has already limited the possible
result to 10000. If we have already limited the result to 10000 results, all
set operations are going to be significantly faster.
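
The idea of passing the narrowed result along can be sketched with plain
Python sets. This is only an illustration under stated assumptions, not the
package's code: the real catalog works on BTree sets, and the names
``apply_index`` and ``search`` as well as the dict-based toy indexes are
hypothetical.

```python
# Toy model: each index maps a query value to the set of matching document ids.
# The real catalog uses BTree sets; plain sets are used here for illustration.

def apply_index(index, value, restrict=None):
    """Return the ids matching `value`, limited to `restrict` when given."""
    hits = index.get(value, set())
    if restrict is None:
        return set(hits)
    # Iterate over the smaller set so the cost follows the narrowed result,
    # not the full size of the index.
    if len(restrict) <= len(hits):
        small, big = restrict, hits
    else:
        small, big = hits, restrict
    return {docid for docid in small if docid in big}


def search(indexes, query, order):
    """Apply the indexes in `order`, passing the running result along."""
    result = None
    for name in order:
        result = apply_index(indexes[name], query[name], restrict=result)
        if not result:
            break  # nothing left to narrow down
    return result if result is not None else set()
```

With the most selective index first in ``order``, every later step only has to
look at the ids that survived the earlier ones.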

We identify different searches by the list of indexes that are searched. If
there is no query plan for a set of indexes, the query is run as normal
while storing the number of results for each index. When all indexes have been
checked, the list is sorted by number of results and stored as a query
plan. The next time a search on the same indexes comes in, the query plan is
looked up.
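
The bookkeeping described above might look roughly like the following sketch.
The class ``PlanCache`` and its method names are hypothetical and chosen for
illustration only; they are not the package's API.

```python
class PlanCache:
    """Toy plan store keyed on the names of the indexes being searched."""

    def __init__(self):
        self._plans = {}

    def key_for(self, query):
        # Only the index names identify the search, not the query values.
        return tuple(sorted(query))

    def lookup(self, query):
        """Return the stored index order, or None if no plan exists yet."""
        return self._plans.get(self.key_for(query))

    def store(self, query, counts):
        """Store a plan from per-index result counts, smallest first."""
        plan = sorted(counts, key=counts.get)
        self._plans[self.key_for(query)] = plan
        return plan
```

On the first run ``lookup`` returns ``None``, the query runs in its default
order while the counts are collected, and ``store`` records the order to use
for later searches on the same indexes.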

To get different query plans for similar queries, you can provide additional
bogus index names. They will be ignored by the catalog, but will become part of
the key. For indexes that have only a small number of distinct values, the
query value becomes part of the key as well. Indexes of this type often
have an uneven distribution of indexed keys to values. For example, there might
be very few `pending` documents in a site, but many `published` ones.
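
As a rough sketch of how such a key could be built: the function ``make_key``,
the constant ``VALUE_INDEXES`` and the index names used here are assumptions
made for illustration, not the package's actual names or configuration.

```python
# Hypothetical set of indexes with only a few distinct values.
VALUE_INDEXES = frozenset(["review_state"])


def make_key(query, value_indexes=VALUE_INDEXES):
    """Build a plan key from index names, adding values where they matter."""
    parts = []
    for name in sorted(query):
        if name in value_indexes:
            # Few distinct values: the value itself changes the result size,
            # so it becomes part of the key.
            parts.append((name, query[name]))
        else:
            parts.append(name)
    return tuple(parts)
```

In this sketch a query for `pending` documents and one for `published`
documents get separate keys, and an extra bogus name in the query (for example
a made-up ``my_plan_hint``) also produces a separate key.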


To test, import the monkey patch in other tests, like CMFPlone::

 import experimental.catalogqueryplan

and run the test.


Development of this project takes place at:


3.2.8 - 2013-01-07

- Port ExtendedPathIndex fix from ExtendedPathIndex 3.0.1.
  [mike.rhodes, hannosch]

3.2.7 - 2011-08-23

- Backport c122666 from ZCatalog, fixing batching restriction in early part
  of second half of the batch.
  [davisagli, hannosch]

3.2.6 - 2011-08-21

- Backport from e.btree: Update to Cython 0.15.

- Backport from e.btree: Correct small/big assignment if only the first
  argument is a tree set.

- Backport from e.btree: Avoid intersection optimizations if both arguments
  are non-tree sets.

3.2.5 - 2011-05-27

- Backport c50071 from Products.CMFPlone to fix batch handling.

- Backport c121708 from ZCatalog, fixing the addition of two LazyCat's if any
  of them had already been flattened.

3.2.4 - 2011-04-27

- Fix possible TypeError in `sortResults` method if only b_start but not b_size
  has been provided.

3.2.3 - 2011-04-10

- Specify supported Python versions.

3.2.2 - 2011-04-09

- Backport c121349 from ZCatalog, optimizing the date range index to add a
  floor and ceiling date. In this version the values are hardcoded.

- Backport c121191 from ZCatalog, which fixes an edge-case in the date range
  index optimization.

3.2.1 - 2011-03-16

- Specify minimum requirement for the 3.2.x series of Plone >= 4.0.3.

3.2.0 - 2011-03-08

* Patched `PloneBatch.__getitem__` to work with new limited batch results.

* Backported sort/batch improvements from ZCatalog 2.13.4.

* Update to Cyth