wildcard.pdfpal
HomePage: http://pypi.python.org/pypi/wildcard.pdfpal
Author: Nathan Van Gheem
Download: https://pypi.python.org/packages/source/w/wildcard.pdfpal/wildcard.pdfpal-0.7b6.zip
Introduction
============

This package provides integrations for PDF-heavy Plone web sites:

* Generates thumbnails from PDFs
* Adds a folder view for PDFs that uses the generated thumbnails
* Adds OCR so PDF content can be indexed
* Everything is configurable, so you can choose not to use thumbnail generation or OCR
* Can create searchable PDFs with hOCR
* Use the ``@@async-monitor`` URL to monitor asynchronous jobs that have yet to run

OCR
---

OCR requires Ghostscript and Tesseract to be installed. Use your package manager to install them::

    sudo apt-get install ghostscript tesseract-ocr

This installs Tesseract 2, not Tesseract 3.

Searchable PDFs
---------------

Requires an svn checkout of Tesseract version 3.01 or 3.00 with the hocr configuration in place. See this thread to find out how to configure hocr: http://ubuntuforums.org/showthread.php?t=1647350

In addition, you'll need exactimage, pdftk and libtiff-tools installed::

    sudo apt-get install exactimage pdftk libtiff-tools

To use an older Tesseract version instead of the latest one, you will have to add this to your instance's declaration in buildout::

    environment-vars += AUTHORIZE_OLD_TESSERACT_VERSION true

Plone 3
-------

* Requires hashlib

Extra
-----

You can convert all PDFs at once by calling the ``@@queue-up-all`` URL.

Changelog
=========

0.7b6 ~ 2012-04-20
------------------

- fix uninstall
  [vangheem]

0.7b5 ~ 2012-04-19
------------------

- do not run conversion if documentviewer is installed
  [vangheem]

- add better uninstall support
  [vangheem]

0.7b4 ~ 2012-04-09
------------------

- fix image url for album view.
  [vangheem]

0.7b3 ~ 2012-04-05
------------------

- fix content type spec for thumbnail response
  [vangheem]

- display image thumb urls in album view
  [vangheem]

0.7b2 ~ 2011-04-12
------------------

- more checks on reading files
  [vangheem]

- provide button to manually index document
  [vangheem]

- add ability to split pdf up into multiple PDFs
  [vangheem]

0.7b1 ~ 2011-01-06
------------------

- fixes for quality and size issues
  [vangheem]

0.6b2 ~ 2011-01-04
------------------

- fix async monitor view to work with plone.app.async = 1.0, which changed
  the order of some arguments in the job.
  [vangheem]

0.6b1 ~ 2011-01-04
------------------

- added ability to make PDFs searchable, and made it work seamlessly if
  wc.pageturner is installed so FlexPaper is created from the searchable
  PDF version.

0.5b5 ~ 2010-12-07
------------------

- did not conditionally import plone.app.async

0.5b4 ~ 2010-12-06
------------------

* better info on async monitor
* only reindex searchabletext when doing OCR so the modification date on the
  object does not get set.
* make sure to catch exceptions so it doesn't leave around files after a bad
  conversion
* add colorbox for pdf folder view

0.5b3 ~ 2010-12-02
------------------

* add ability to queue up all pdf files

0.5b2 - 2010-12-02
------------------

* fix async monitor view

0.5b1 - 2010-12-02
------------------

* Initial release
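OCR command sketch
==================

The OCR section only lists Ghostscript and Tesseract as requirements. A plausible way the two fit together is to render each PDF page to an image with Ghostscript and then run Tesseract on each image. The helpers below are a minimal sketch under that assumption. The function names, the ``tiffgray`` device and the 300 DPI default are hypothetical choices, not wildcard.pdfpal's actual code or settings.

```python
import subprocess
from typing import List

# Hypothetical helpers sketching a Ghostscript -> Tesseract OCR pass.
# They are NOT part of wildcard.pdfpal; names and defaults are assumptions.


def gs_pages_to_tiff_cmd(pdf_path: str, out_pattern: str, dpi: int = 300) -> List[str]:
    """Build a Ghostscript command that renders each PDF page to its own TIFF.

    `out_pattern` should contain a printf-style page counter, e.g.
    "page-%03d.tif"; Ghostscript replaces it with the page number.
    """
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiffgray",          # 8-bit grayscale TIFF output
        f"-r{dpi}",                   # rendering resolution in DPI
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def tesseract_txt_cmd(image_path: str, out_base: str) -> List[str]:
    """Build a Tesseract command; plain-text output goes to <out_base>.txt."""
    return ["tesseract", image_path, out_base]


def run(cmd: List[str]) -> None:
    """Run an external tool, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

For example, ``run(gs_pages_to_tiff_cmd("doc.pdf", "page-%03d.tif"))`` followed by ``run(tesseract_txt_cmd("page-001.tif", "page-001"))`` for each rendered page would produce text files that could then be fed to the catalog. The commands are built as lists rather than shell strings so that file names containing spaces are passed through safely.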
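Searchable PDF command sketch
=============================

The Searchable PDFs section requires Tesseract 3.0x with the hocr configuration, exactimage and pdftk. One plausible pipeline, assumed here rather than taken from the package, is: produce hOCR markup per page with Tesseract, overlay it on the page image with exactimage's ``hocr2pdf``, then join the single-page PDFs with ``pdftk``. All helper names below are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Hypothetical helpers sketching the searchable-PDF path: Tesseract 3.0x
# hOCR output, exactimage's hocr2pdf, then pdftk to join the pages.
# They are NOT part of wildcard.pdfpal; names are assumptions.


def tesseract_hocr_cmd(image_path: str, out_base: str) -> List[str]:
    """Tesseract 3.0x command that uses the "hocr" config to emit hOCR markup."""
    return ["tesseract", image_path, out_base, "hocr"]


def hocr2pdf_cmd(image_path: str, pdf_path: str) -> List[str]:
    """hocr2pdf command; the tool reads the hOCR document from standard input."""
    return ["hocr2pdf", "-i", image_path, "-o", pdf_path]


def pdftk_cat_cmd(page_pdfs: Sequence[str], out_pdf: str) -> List[str]:
    """pdftk command concatenating single-page PDFs into one document."""
    return ["pdftk", *page_pdfs, "cat", "output", out_pdf]


def page_to_searchable_pdf(image_path: str, hocr_path: str, pdf_path: str) -> None:
    """Overlay the OCR text layer from `hocr_path` onto the page image."""
    with open(hocr_path, "rb") as hocr:
        # Feed the hOCR file to hocr2pdf on stdin; raise if the tool fails.
        subprocess.run(hocr2pdf_cmd(image_path, pdf_path), stdin=hocr, check=True)
```

Keeping each step as a separate command builder makes it easy to check which external tool is missing when a conversion fails, which matters here because three separately installed packages are involved.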