FileRUN - Web based document flow management system

File Indexing

From FileRun Documentation

File Indexing Setup Guide

Enabling file searching

You can enable file searching under "Control Panel" >> "System configuration".

Configuring file converters

For binary files such as "DOC", "PPT" and "XLS", FileRun requires third-party applications to extract the text. Here is a list of the applications we recommend for each binary file type:

Extension   Application Name   Application’s Website
DOC         catdoc             http://www.wagner.pp.ru/~vitus/software/catdoc/
RTF         catdoc             same as above
PPT         catppt             same as above
XLS         xls2csv            same as above
DOCX        a2tcmd             http://www.jimisoft.com/en/all2txt.html (Windows only)
XLSX        a2tcmd             same as above
PPTX        a2tcmd             same as above
PDF         Xpdf (pdftotext)   http://www.foolabs.com/xpdf/download.html


Each file extension is associated with a converter program. These mappings are defined in "Control Panel" >> "System configuration".
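
Before defining the mappings, it can help to confirm that the converters are installed and reachable from the command line. The following is a minimal sketch, assuming a Linux shell and that the converters are expected on the PATH. The `converter_status` helper is hypothetical and is not part of FileRun.

```shell
#!/bin/sh
# converter_status NAME: report whether a program is on the PATH.
# Hypothetical helper for illustration; not part of FileRun.
converter_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# Check the Linux converters listed in the table above.
for tool in catdoc catppt xls2csv pdftotext; do
    converter_status "$tool"
done
```

Each converter can also be tried by hand, for example `catdoc sample.doc` or `pdftotext sample.pdf -`, both of which write the extracted text to standard output. Here `sample.doc` and `sample.pdf` are placeholder file names.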

Setting up the index queue manager

Because extracting the text from a binary file is CPU-intensive, the files are queued and processed one at a time. This requires the script "/path-to-filerun/cron/process_search_index_queue.php" to be executed frequently. We recommend running the script every 1 or 2 minutes, so that you do not have to wait too long before an uploaded file can be found by the search engine.

On a Linux server, this can easily be done by setting up a cron job like this:

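  1. Create a new text file at "/path-to-filerun/cron/process_search_index_queue.sh" and write the following inside:
    #!/bin/sh
    cd /path-to-filerun/cron && php process_search_index_queue.php
  2. Open a command line console (SSH) and make the script executable:
    chmod +x /path-to-filerun/cron/process_search_index_queue.sh
  3. Open the crontab editor by running:
    crontab -e
  4. Add the following line, which runs the script every minute:
    * * * * * /path-to-filerun/cron/process_search_index_queue.sh
  5. Save the changes and close the editor. In vi, press "Esc", type ":wq" and press "Enter".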
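
The cron job above starts the script every minute. The source does not say how the script behaves if a previous run is still busy, so the following sketch assumes that overlapping runs are undesirable and skips a run while an earlier one holds a lock. The `run_locked` helper and the lock path are hypothetical and are not part of FileRun.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if LOCKDIR does not exist yet (hypothetical helper).
run_locked() {
    lockdir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists.
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lockdir"
        return "$status"
    fi
    echo "previous run still active, skipping" >&2
    return 0
}

# Hypothetical use inside process_search_index_queue.sh:
#   cd /path-to-filerun/cron && \
#     run_locked /tmp/filerun_index.lock php process_search_index_queue.php
#
# To run every 2 minutes instead of every minute, the crontab line
# could start with "*/2 * * * *".
```

If the script is killed before it removes the lock directory, the directory has to be deleted by hand before the next run can start.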

If your hosting service runs the cPanel administrative tool, it usually provides a web-based tool that makes setting up cron jobs easier.

On Windows, this can be achieved by creating a scheduled task that calls a .BAT file containing something like this:

CD \path-to-filerun\cron
C:\PHP\PHP.EXE process_search_index_queue.php