Using Docker with Full-Text Search

This guide assumes you already have FileRun installed via Docker.

Your server needs at least 2GB of RAM for Elasticsearch.

Edit your existing docker-compose.yml to include the two additional services (tika and elasticsearch) and link them by service name:

services:
  [...]
  web:
    image: [FileRun]
    links:
      - db
      - tika
      - elasticsearch
  tika:
    image: logicalspark/docker-tikaserver
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-wolfi:9.2.1
    container_name: elasticsearch
    environment:
      - "discovery.type=single-node"
      - "xpack.security.http.ssl.enabled=false" #disable SSL
      - "xpack.security.enabled=false" #disable authentication
      - "xpack.security.enrollment.enabled=false" #disable enrollment
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 6g
    volumes:
      - /filerun/esearch:/usr/share/elasticsearch/data

For more information on running Elasticsearch via Docker, please see the official documentation.

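Please note the volumes entry above, which stores the Elasticsearch index data: the host folder /filerun/esearch is mounted at /usr/share/elasticsearch/data inside the container. Change the ownership of the host folder to 1000:1000 (chown) so that Elasticsearch can write to it.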
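
The ownership change can be done on the Docker host before the containers are started. The following is a minimal sketch, assuming the host path /filerun/esearch from the volumes entry; the helper name prepare_es_data_dir is hypothetical and not part of FileRun or Elasticsearch.

```shell
# Create the Elasticsearch data directory and hand it to the given owner.
# $1 = host directory, $2 = owner as UID:GID (defaults to 1000:1000)
# prepare_es_data_dir is a hypothetical helper name used for illustration.
prepare_es_data_dir() {
    mkdir -p "$1" && chown -R "${2:-1000:1000}" "$1"
}

# Usage on the Docker host (changing to 1000:1000 typically requires root):
# prepare_es_data_dir /filerun/esearch
```

Creating the folder first avoids Docker creating it automatically with root ownership when the container starts.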

You can use the following command with the docker-compose file above:

Enable full-text file indexing

To configure the file indexing feature please follow this guide.

The Elasticsearch Host URL that needs to be configured is http://elasticsearch:9200.

Set the Apache Tika server hostname to tika and the port number to 9998.
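
Before saving these settings, it can help to confirm that both services answer from inside the FileRun container. The following is a rough sketch, assuming curl is available in the container (this guide does not state that it is); the helper name check_service is hypothetical.

```shell
# Return 0 if the URL answers with a successful HTTP status, non-zero otherwise.
# Assumption: curl is installed where this runs.
# check_service is a hypothetical helper name used for illustration.
check_service() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Usage from inside the FileRun container:
# check_service http://elasticsearch:9200 && echo "Elasticsearch reachable"
# check_service http://tika:9998/ && echo "Tika reachable"
```

If a check fails, the service names in docker-compose.yml and the container status are the first things to look at.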

Set up the cron indexing process

From the server's command line, open a console inside the FileRun Docker container:

docker exec -it filerun bash

filerun is the container name. If the container has no name, you can use its ID instead. To find the container ID, use the docker ps command.

Create the indexing script file, which will run periodically:

vim /var/www/html/cron/process_search_index_queue.sh

and paste (press i and then CTRL+V) the following inside:

php /var/www/html/cron/process_search_index_queue.php

Press Esc then :wq and Enter to save the changes and close the editor.

Adjust the script file permissions by making it executable:

chmod 755 /var/www/html/cron/process_search_index_queue.sh
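
As an alternative to editing in vim, the script can be created and made executable without an editor. This is only a sketch of the same two steps; the helper name write_index_script is hypothetical.

```shell
# Write the one-line wrapper script to the given path and make it executable.
# $1 = script path, $2 = PHP file the wrapper should run
# write_index_script is a hypothetical helper name used for illustration.
write_index_script() {
    printf '%s\n' "php $2" > "$1" && chmod 755 "$1"
}

# Usage inside the FileRun container:
# write_index_script /var/www/html/cron/process_search_index_queue.sh \
#     /var/www/html/cron/process_search_index_queue.php
```

The result is the same file as the one produced by the manual steps: a single php line, with mode 755.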

Open the crontab:

vim /etc/crontab

and paste (press i and then CTRL+V) the following at its end (leaving the empty line at the bottom of the file):

* * * * * root /var/www/html/cron/process_search_index_queue.sh

Press Esc then :wq and Enter to save the changes and close the editor.
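
The crontab entry can also be appended from the command line. The sketch below adds the line only if an identical one is not already present, so it can be run repeatedly without creating duplicates; the helper name add_cron_entry is hypothetical, and it assumes the crontab file already ends with a newline.

```shell
# Append a line to a crontab file unless an identical line is already present.
# $1 = crontab file, $2 = full crontab entry
# add_cron_entry is a hypothetical helper name used for illustration.
add_cron_entry() {
    # -x matches whole lines, -F treats the entry (including "*") literally
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage inside the FileRun container:
# add_cron_entry /etc/crontab '* * * * * root /var/www/html/cron/process_search_index_queue.sh'
```

The five asterisks in the entry mean every minute of every hour, day, month, and weekday, and root is the user the script runs as.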

FileRun should now automatically index in Elasticsearch the contents of all file types supported by Apache Tika. Note that the above cron job runs once every minute, so it may take a minute or two after uploading before a file can be found by its content.

Important note: If your FileRun Docker container ever gets stopped for some reason, you will need to redo the steps in the cron indexing section above.