This feature is available only in the Enterprise FileRun version.

File indexing and full-text searching

Introduction

Searching files by keywords in their contents requires additional configuration and third-party software.

The feature is enabled from Control Panel » System configuration » Files » Searching.

Follow the steps below.

Step 1: Install Apache Tika

Note: Java must be installed on your server in order to run Apache Tika.

You can read more about Apache Tika here: https://tika.apache.org

Running Tika in command line mode:

  1. Download the tika-app-[*].jar file from here: https://tika.apache.org/download.html
  2. Set the path to the tika-app-[*].jar file inside FileRun's control panel

That's it!

Click the “Check path” button to make sure it works. If Java is installed on the server and the file path is correct, the Apache Tika version is displayed as the result of the test.
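
If the check fails, the same test can be reproduced from the server's command line. The following is a minimal sketch, not FileRun's own check: the function name and the jar path are placeholders, and it relies on the tika-app "--version" option, which prints the Apache Tika version number.

```shell
# Placeholder path: point this at the tika-app jar you downloaded.
TIKA_JAR="${TIKA_JAR:-/path/to/tika-app.jar}"

# Hypothetical helper: confirms that Java is available and the jar can be run.
check_tika_app() {
    # Java must be on the PATH of the user running the command.
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi
    # The jar path must point to an existing file.
    if [ ! -f "$1" ]; then
        echo "jar not found: $1" >&2
        return 1
    fi
    # Prints the Apache Tika version number and exits.
    java -jar "$1" --version
}

# Usage:
# check_tika_app "$TIKA_JAR"
```

Run it as the same system user that runs PHP, since that user's PATH and file permissions are the ones that matter to FileRun.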

Running Tika in server mode:

Running Tika in server mode usually speeds up the indexing process.

  1. Download the tika-server-[*].jar file from here: https://tika.apache.org/download.html
  2. Start up the server: java -jar tika-server-[*].jar
  3. Set the hostname and port number (default 9998) of the Tika server inside FileRun's control panel

Click the “Test server” button to make sure it works. If everything is in order, the Apache Tika version is displayed as the result of the test.
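
To confirm that the Tika server is reachable before testing from FileRun, you can query it directly. This sketch assumes the server listens on 127.0.0.1 and the default port 9998; the function name is a placeholder. It uses the Tika server's "/version" resource, which returns the version as plain text.

```shell
# Assumed host and port; adjust to match how the Tika server was started.
TIKA_HOST="${TIKA_HOST:-127.0.0.1}"
TIKA_PORT="${TIKA_PORT:-9998}"

# Hypothetical helper: prints the version string reported by the Tika server.
tika_server_version() {
    # --fail makes curl return a non-zero status on HTTP errors,
    # --max-time avoids hanging if the port is filtered.
    curl --silent --fail --max-time 5 "http://$1:$2/version"
}

# Usage:
# tika_server_version "$TIKA_HOST" "$TIKA_PORT" || echo "Tika server not reachable"
```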

You can also run Apache Tika in server mode using Docker: https://github.com/LogicalSpark/docker-tikaserver

Step 2: Install Elasticsearch

You can download and read more about Elasticsearch here: https://www.elastic.co

Once you have an instance of Elasticsearch running, configure it inside FileRun:

You only need to set the URL of the host. If the server is password protected, include the credentials inside the URL:

http://username:password@your-elastic-server.com

Click the “Test server” button to make sure FileRun can connect to it. If everything is in order, the Elasticsearch cluster name and the list of nodes are displayed.

The “Test server” step is not optional, because FileRun uses it to create the index if it doesn't already exist.
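
If the test fails, connectivity can be checked from the FileRun server's command line. This is a sketch under assumptions: the URL below is a placeholder (Elasticsearch listens on port 9200 by default) and the function name is hypothetical. It does not create the index, so the “Test server” step in FileRun is still required.

```shell
# Placeholder URL; credentials can be embedded as in the FileRun setting.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Hypothetical helper: prints cluster information and the node list.
es_check() {
    # The root endpoint returns JSON that includes "cluster_name".
    curl --silent --fail --max-time 5 "$1/" || return 1
    # _cat/nodes prints one line per node; ?v adds a header row.
    curl --silent --fail --max-time 5 "$1/_cat/nodes?v"
}

# Usage:
# es_check "$ES_URL"
# es_check "http://username:password@your-elastic-server.com:9200"
```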

Step 3: Testing indexing

Run the following from the FileRun server's command line:

cd /path/to/filerun/cron
php process_search_index_queue.php

The script shows its progress as it processes the search indexing queue. For each queued file, it extracts the contents using Apache Tika and sends them to Elasticsearch for indexing.
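
One way to confirm that documents actually reached Elasticsearch is to ask for the document count of the index. This sketch assumes the index is named "files", as in the read-only fix shown further down this page; the URL and the function name are placeholders.

```shell
# Assumed values: local Elasticsearch and an index named "files".
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
ES_INDEX="${ES_INDEX:-files}"

# Hypothetical helper: prints the JSON response of the _count API,
# which contains a "count" field with the number of indexed documents.
es_doc_count() {
    curl --silent --fail --max-time 5 "$1/$2/_count"
}

# Usage: run before and after the indexing script and compare the counts.
# es_doc_count "$ES_URL" "$ES_INDEX"
```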

If you are getting PHP errors, you might need to specify the path of your PHP configuration file:

php -c /path/to/php.ini process_search_index_queue.php

To find the path of the “php.ini” used by FileRun, create a file named “info.php” inside the FileRun folder, type <?php phpinfo(); inside, and open “http://your-site.com/filerun/info.php” in your browser.
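
The command-line PHP binary often loads a different “php.ini” than the web server does, which is a common reason for errors that appear only when the script is run from the console. The sketch below prints the file the command-line binary loads, so it can be compared with the one reported in the browser. The function name is a placeholder.

```shell
# Hypothetical helper: prints the php.ini loaded by the command-line PHP binary.
cli_php_ini() {
    # php_ini_loaded_file() returns the loaded php.ini path,
    # or false (printed as an empty line) if no php.ini is loaded.
    php -r 'echo php_ini_loaded_file(), PHP_EOL;'
}

# Usage:
# cli_php_ini
# php --ini    # longer report that also lists additional .ini files
```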

If you are getting this error from the Elasticsearch server: [FORBIDDEN/12/index read-only / allow delete (api)], run the following to switch off the read-only flag on the index:

curl -X PUT 'http://127.0.0.1:9200/files/_settings' --data '{"index": {"blocks": {"read_only_allow_delete": "false"}}}' --header "Content-Type: application/json"
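
Elasticsearch sets this flag itself when a node's disk usage passes the flood-stage watermark, so the error tends to return unless disk space is freed. The sketch below shows how the current index settings and per-node disk usage could be inspected. The URL, the index name "files" (taken from the command above), and the function names are assumptions.

```shell
# Assumed values, matching the command above.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
ES_INDEX="${ES_INDEX:-files}"

# Hypothetical helper: prints the index settings; look for
# "read_only_allow_delete" under "blocks" to see whether the flag is set.
es_index_settings() {
    curl --silent --fail --max-time 5 "$1/$2/_settings?pretty"
}

# Hypothetical helper: prints disk usage per node (disk.used, disk.avail, disk.percent).
es_disk_usage() {
    curl --silent --fail --max-time 5 "$1/_cat/allocation?v"
}

# Usage:
# es_index_settings "$ES_URL" "$ES_INDEX"
# es_disk_usage "$ES_URL"
```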

Step 4: Set the automated indexing task

Because extracting text from a binary file requires a lot of CPU processing, files are queued and processed one at a time. This requires the script “cron/process_search_index_queue.php” to be executed frequently. We recommend running the script every 5 minutes or so, so that you do not have to wait too long before an uploaded file can be found by the search engine.

On a Linux server this can easily be done by setting up a cron job like this:

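  1. Create a new text file at “cron/process_search_index_queue.sh” and write the following inside:
    #!/bin/sh
    cd /path-to-filerun/cron
    php -c /path/to/php.ini process_search_index_queue.php
  2. Open a command line console (SSH)
  3. Make the script executable by running:
    chmod +x /path-to-filerun/cron/process_search_index_queue.sh
  4. Open the crontab editor by running:
    crontab -e
  5. Write the following line, which runs the script every 5 minutes:
    */5 * * * * /path-to-filerun/cron/process_search_index_queue.sh
  6. Save the changes and close the editor. In vi, type “:wq” and press “Enter”.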
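
If an indexing run can take longer than the interval between cron executions, two runs may overlap. Whether the FileRun script guards against this itself is not stated here, so the following is only a precautionary sketch of an alternative body for the “.sh” file. It assumes the flock utility (part of util-linux) is installed; the function name, lock file, and log file paths are placeholders.

```shell
# Hypothetical helper: runs the indexing script from its own folder,
# skipping the run if a previous one still holds the lock.
run_index_queue() {
    cron_dir="$1"
    php_ini="$2"
    lock_file="${3:-/tmp/filerun_search_index.lock}"
    (
        # The PHP script is referenced by a relative name, so change into its folder first.
        cd "$cron_dir" || exit 1
        # -n: do not wait for the lock; exit with a non-zero status if it is taken.
        flock -n "$lock_file" php -c "$php_ini" process_search_index_queue.php
    )
}

# Usage inside the .sh file, appending output to a log for troubleshooting:
# run_index_queue /path-to-filerun/cron /path/to/php.ini >> /tmp/filerun_search_index.log 2>&1
```

Keeping a log is useful because cron runs with a minimal environment, and errors that never appear in an interactive session will show up there.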

If your hosting service provides the cPanel administrative tool, it usually includes a web-based interface that makes setting up cron jobs easier.

On Windows this can be achieved by creating a scheduled task in the Windows Task Scheduler which calls a .BAT file containing something like this:

CD /D C:\path\to\filerun\cron
C:\PHP\PHP.EXE process_search_index_queue.php