File Indexing
From FileRun Documentation
File Indexing Setup Guide
Enabling file searching
You can enable file searching from "Control Panel" >> "System configuration".
Configuring file converters
For binary files such as "DOC", "PPT", and "XLS", FileRun relies on third-party applications to extract the text. Here is a list of the applications we recommend for each binary file type:
| Extension | Application Name | Application’s Website |
|---|---|---|
| DOC | catdoc | http://www.wagner.pp.ru/~vitus/software/catdoc/ |
| RTF | catdoc | same as above |
| PPT | catppt | same as above |
| XLS | xls2csv | same as above |
| DOCX | a2tcmd | http://www.jimisoft.com/en/all2txt.html (Windows only) |
| XLSX | a2tcmd | same as above |
| PPTX | a2tcmd | same as above |
| PDF | Xpdf (pdftotext) | http://www.foolabs.com/xpdf/download.html |
Each file is associated with a converter program by its extension.
These mappings are defined in "Control Panel" >> "System configuration".
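To illustrate the idea of an extension-to-converter mapping, here is a minimal shell sketch based on the table above. It is not FileRun's configuration format; the function name `converter_for` is hypothetical and only shows how a lookup by extension could work.

```shell
# Illustrative only: maps a file extension to the converter command
# listed in the table. "converter_for" is a hypothetical helper name,
# not part of FileRun.
converter_for() {
    # Take the text after the last dot and lower-case it,
    # so "REPORT.DOC" and "report.doc" are treated the same.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        doc|rtf)        echo "catdoc" ;;
        ppt)            echo "catppt" ;;
        xls)            echo "xls2csv" ;;
        docx|xlsx|pptx) echo "a2tcmd" ;;
        pdf)            echo "pdftotext" ;;
        *)              return 1 ;;   # no converter known for this type
    esac
}
```

For example, `converter_for report.DOC` prints `catdoc`. Before entering a mapping in the control panel, it can help to run the converter by hand on a sample file (for instance `catdoc sample.doc`, or `pdftotext sample.pdf -` to print the extracted text to the console) and confirm that readable text comes out.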
Set Index Queue Manager
Because extracting the text from a binary file is CPU-intensive, files are queued and processed one at a time. This requires the script "/path-to-filerun/cron/process_search_index_queue.php" to be executed frequently. We recommend running the script every 1 or 2 minutes, so that you do not have to wait too long before an uploaded file can be found by the search engine.
On a Linux server this can easily be done by setting up a cron job like this:
- Create a new text file at "/path-to-filerun/cron/process_search_index_queue.sh" and write the following inside:
#!/bin/sh
cd /path-to-filerun/cron && php process_search_index_queue.php
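- Open a command line console (SSH)
- Make the script executable by running:
chmod +x /path-to-filerun/cron/process_search_index_queue.sh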
- Open the crontab editor by running:
crontab -e
- Write:
* * * * * /path-to-filerun/cron/process_search_index_queue.sh
- If the editor is vi, type ":wq" and press "Enter" to save the changes and close the editor.
If your hosting service runs the cPanel administrative tool, it usually provides a web-based interface that makes setting up cron jobs easier.
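With the cron job firing every minute, a slow conversion could still be running when the next one starts. The following is a minimal sketch of one way to skip a run while the previous one is active, using a lock directory. The helper name `with_lock` and the lock path are assumptions for illustration and are not part of FileRun.

```shell
# Hypothetical helper: runs a command only if no other run holds the lock.
# Usage: with_lock LOCK_DIR COMMAND [ARGS...]
with_lock() {
    lock_dir=$1
    shift
    # mkdir either creates the directory or fails, atomically,
    # so it can serve as a simple lock.
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 0
    fi
    "$@"
    status=$?
    # Release the lock so the next scheduled run can proceed.
    rmdir "$lock_dir"
    return "$status"
}

# Possible use inside process_search_index_queue.sh
# (paths are the placeholders used in this guide):
# cd /path-to-filerun/cron && with_lock /tmp/filerun_index_queue.lock php process_search_index_queue.php
```

If the process is killed before it releases the lock, the lock directory has to be removed by hand. To run the job every two minutes instead of every minute, the schedule part of the crontab line can be written as `*/2 * * * *`.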
On Windows this can be achieved by creating a scheduled task that calls a .BAT file containing something like this:
CD \path-to-filerun\cron
C:\PHP\PHP.EXE process_search_index_queue.php
