808: NFS Cache Validation - heterodb/pg-strom GitHub Wiki

Summary

NFS Cache (FS-Cache) is a feature that stores data read from an NFS server on the local disk of the client, so that repeated reads can be served locally. This document describes how to configure NFS Cache and reports the results of reading Apache Arrow files stored on another server with NFS Cache enabled.

Configuration

Prerequisites

  • An NFSv4 server must be running.
  • The kernel must support fscache and cachefiles.

1. Check kernel module support.

lsmod | grep -E 'fscache|cachefiles' 
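The check above can also be written as a small helper that reports each module separately. This is a minimal sketch, not part of the original procedure: the function name `check_cache_modules` is hypothetical, and it assumes the modules show up in a `/proc/modules`-style listing. On kernels where the support is built in rather than modular, nothing will be listed even though caching may work.

```shell
# Report whether fscache and cachefiles appear in a /proc/modules-style listing.
# The optional first argument (default: /proc/modules) exists only to make the
# helper easy to try against a sample file.
check_cache_modules() {
    modules_file="${1:-/proc/modules}"
    missing=0
    for mod in fscache cachefiles; do
        # Each line of /proc/modules starts with the module name and a space.
        if grep -q "^${mod} " "$modules_file"; then
            echo "${mod}: loaded"
        else
            echo "${mod}: not loaded"
            missing=1
        fi
    done
    # Non-zero exit status when at least one module is missing.
    return "$missing"
}
```

Calling `check_cache_modules` with no argument inspects the running system. If a module is reported as not loaded, `sudo modprobe cachefiles` is the usual way to try loading it.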

2. Install required packages.

sudo dnf install -y cachefilesd 

3. Create a directory to store cache files.

sudo mkdir -p /var/cache/fscache

4. Disable SELinux enforcement.

The default SELinux configuration prevents the cache from being saved. The proper fix is to adjust the SELinux configuration, which is omitted in this explanation; here we simply switch SELinux to permissive mode.

sudo setenforce 0 

5. Change the cachefilesd configuration.

sudo vi /etc/cachefilesd.conf

Configuration

## Specify the directory created in step 3.
dir /var/cache/fscache
## Specify the cache tag.
tag mycache
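After editing, it can help to confirm which values the file actually contains. The following is a minimal sketch under the assumption that the file uses the simple `keyword value` layout shown above; the function name `show_cache_settings` is hypothetical.

```shell
# Print the "dir" and "tag" settings from a cachefilesd configuration file.
# The optional first argument (default: /etc/cachefilesd.conf) allows the
# helper to be tried on a copy of the file.
show_cache_settings() {
    conf="${1:-/etc/cachefilesd.conf}"
    # Comment lines start with "#", so their first field never equals
    # "dir" or "tag" and they are skipped automatically.
    awk '$1 == "dir" || $1 == "tag" { print $1 "=" $2 }' "$conf"
}
```

With the configuration above, `show_cache_settings` should list the cache directory and the tag, one per line.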

6. Start and enable cachefilesd.

sudo systemctl enable cachefilesd 
sudo systemctl start cachefilesd 

7. Mount the NFS server.

sudo mount -t nfs -o fsc <NFS Server Host>:<NFS Server Path> <Mount path> 
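To confirm that the mount really picked up the `fsc` option, the mount table can be inspected. This is a sketch under the assumption that the kernel lists `fsc` among the options in `/proc/mounts` for cache-enabled NFS mounts; the function name `is_fsc_mounted` is hypothetical.

```shell
# Succeed (exit 0) if the given mount point is an NFS mount with the fsc option.
# The optional second argument (default: /proc/mounts) allows the helper to be
# tried against a sample mount table.
is_fsc_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mount_point" '
        # Fields: device, mount point, filesystem type, options, ...
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "fsc" || opts[i] ~ /^fsc=/) found = 1
        }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, `is_fsc_mounted /mnt/0 && echo "cache enabled"` prints the message only when `/mnt/0` is an NFS mount with `fsc` set.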

Experiment

Method

The lineorder table of the Star Schema Benchmark was exported to an Apache Arrow file and placed on an NFS server. The NFS export was mounted with FS-Cache enabled and the file was referenced through Arrow Fdw. We then measured the time required to run the entire Star Schema Benchmark.

Data generation

./dbgen -s 400 -Ta

Arrow Fdw setup

IMPORT FOREIGN SCHEMA arroworder FROM SERVER arrow_fdw 
INTO public 
OPTIONS (file '/mnt/0/lineorder.arrow'); 

Execution Results

We measured the total time (in milliseconds) taken to execute queries #1 through #13, and repeated the measurement three times. The results are shown in the graph below.
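The original page does not show how the timings were collected. One possible way to take such a measurement is sketched below. The helper name `elapsed_ms` is hypothetical, and it assumes GNU `date`, whose `%N` format yields nanoseconds.

```shell
# Print the elapsed wall-clock time of a command in milliseconds.
# The command's own standard output is discarded.
elapsed_ms() {
    start_ns=$(date +%s%N)
    "$@" > /dev/null
    end_ns=$(date +%s%N)
    echo $(( (end_ns - start_ns) / 1000000 ))
}

# Hypothetical usage: the database name "ssbm" and the query file
# q01.sql are assumptions, not taken from the original page.
# for run in 1 2 3; do
#     echo "run ${run}: $(elapsed_ms psql -q -d ssbm -f q01.sql) ms"
# done
```

Running the same command several times in a row, as in the commented loop, makes it possible to compare the first run against later runs.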

[Graph: execution time in milliseconds for each of the three measurement runs]

The first run takes longer because the target file is not yet in the cache and must be retrieved over the network. From the second run onward, the cache clearly speeds up execution.
