User Guide - jmazanec15/opensearch-knn-perf-tool GitHub Wiki

Installation

Requirements:

  • Python 3.7
  • Docker

Clone the repo to your machine and install the required packages:

git clone https://github.com/jmazanec15/opensearch-knn-perf-tool.git
cd opensearch-knn-perf-tool
pip install -r requirements.txt

Usage

Before running tests, make sure to create the output/ directory if you haven't already:

mkdir output

To run a test with the default settings, run:

docker compose --file docker/opensearch/docker-compose.yml --env-file docker/opensearch/docker-compose.env --project-directory . up

NOTE: Running tests inside Docker with docker compose up is recommended but not required. The test command can also be run directly, as long as OpenSearch is running and reachable on port 9200.
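
When running outside Docker, it can help to confirm that something is listening on port 9200 before starting a test. The following is a minimal sketch using only the Python standard library; the function name opensearch_port_is_open and the localhost default are assumptions for illustration and are not part of the tool. It only checks that the TCP port accepts connections, not that the listener is actually OpenSearch.

```python
import socket


def opensearch_port_is_open(host: str = "localhost", port: int = 9200,
                            timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if opensearch_port_is_open():
        print("Port 9200 is accepting connections; the test command can be run.")
    else:
        print("Nothing is listening on port 9200; start OpenSearch first.")
```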

After the tests have finished running, you can run

docker compose --file docker/opensearch/docker-compose.yml --env-file docker/opensearch/docker-compose.env --project-directory . down

to stop and remove the containers, and run

docker system prune --volumes -f

to remove dangling images and unused volumes, freeing up more disk space.

The remaining examples are incomplete and need to be updated.

While the test command is best run through Docker, the diff and plot commands can and should be run locally, directly with Python.

To run a diff with default settings, run:

python opensearch-knn-perf-tool.py diff result1.json result2.json

To run a plot with default settings, run:

python opensearch-knn-perf-tool.py plot result1.json result2.json

Syntax

The general syntax of the tool is:

python opensearch-knn-perf-tool.py [--log LOGLEVEL] <command> <configuration-options> ...

--log       log level of tool, options are: info, debug, warning, error, critical

The tool has three commands:

test

python opensearch-knn-perf-tool.py test <path-to-config.yml> <path-to-output.json>

diff (not implemented yet)

python opensearch-knn-perf-tool.py diff <test-result1.json> <test-result2.json> [--file diff-output-path.json] [--metadata] [--keys key1,key2,...]

--file        file path of output, outputs diff to file instead of console
--metadata    add test result metadata into the diff
--keys        specify keys from test result to diff
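
Since diff is not implemented yet, the following is only a sketch of what a key-based diff could look like. It assumes a flat test result whose values are numeric; the real test-result layout may be nested or contain other types. The function name diff_results and the sample keys took_ms and num_docs are hypothetical.

```python
import json
from typing import Dict, Iterable, Optional


def diff_results(result1: Dict[str, float], result2: Dict[str, float],
                 keys: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Return result2[key] - result1[key] for each selected key present in both results."""
    # With no explicit keys, compare every key the two results share.
    selected = list(keys) if keys is not None else sorted(set(result1) & set(result2))
    return {k: result2[k] - result1[k] for k in selected
            if k in result1 and k in result2}


# Hypothetical flat results, used only to exercise the function.
before = {"took_ms": 120.0, "num_docs": 1000}
after = {"took_ms": 95.5, "num_docs": 1000}
print(json.dumps(diff_results(before, after, keys=["took_ms"])))
```

Passing keys mirrors the intent of the --keys option: only the listed entries are compared, and keys missing from either result are skipped rather than raising an error.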

plot (not implemented yet)

python opensearch-knn-perf-tool.py plot <test-result1.json> <test-result2.json> ...
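
The syntax above maps naturally onto argparse subcommands. The sketch below shows one way such a parser could be laid out; it is not taken from the tool's source, and the destination names (config_path, output_path, output_file, results) as well as the info default for --log are assumptions. It only parses arguments and does not implement any command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented syntax (illustrative layout only)."""
    parser = argparse.ArgumentParser(prog="opensearch-knn-perf-tool.py")
    parser.add_argument("--log", default="info",  # default is an assumption
                        choices=["info", "debug", "warning", "error", "critical"],
                        help="log level of tool")
    subparsers = parser.add_subparsers(dest="command", required=True)

    test = subparsers.add_parser("test")
    test.add_argument("config_path")
    test.add_argument("output_path")

    diff = subparsers.add_parser("diff")
    diff.add_argument("result1")
    diff.add_argument("result2")
    diff.add_argument("--file", dest="output_file")
    diff.add_argument("--metadata", action="store_true")
    # --keys takes a comma-separated list, e.g. key1,key2
    diff.add_argument("--keys", type=lambda s: s.split(","))

    plot = subparsers.add_parser("plot")
    plot.add_argument("results", nargs="+")
    return parser


args = build_parser().parse_args(
    ["--log", "debug", "diff", "a.json", "b.json", "--keys", "key1,key2"])
print(args.command, args.log, args.keys)
```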

Configuration

The tool is configured using YAML files. A single top-level configuration file, tool.yml, holds general tool settings, and a service.yml file holds the settings specific to each k-NN service.

For example, the default OpenSearch tool.yml is:

test_name: opensearch_test
test_id: opensearch_index
knn_service: opensearch
service_config: config/opensearch/service.yml
dataset: dataset/data.hdf5
dataset_format: hdf5
test_parameters:
  num_runs: 10

and the default OpenSearch service.yml is:

index_spec: config/opensearch/index-spec.json
max_num_segments: 10
index_thread_qty: 1
bulk_size: 500
k: 10

The format of the configuration files is defined with Cerberus validation schemas, which can be found here.
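
To illustrate the kind of check a schema performs, here is a small stand-in written in plain Python rather than Cerberus. The expected keys and types are inferred from the default tool.yml shown above and are assumptions; the tool's actual Cerberus schema may require different or additional fields. The config is given as an already-parsed dictionary so that the sketch needs no YAML library.

```python
from typing import Any, Dict, List

# Assumed expectations, inferred from the default tool.yml above.
# This is NOT the tool's Cerberus schema.
TOOL_CONFIG_TYPES = {
    "test_name": str,
    "test_id": str,
    "knn_service": str,
    "service_config": str,
    "dataset": str,
    "dataset_format": str,
    "test_parameters": dict,
}


def check_tool_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    errors = []
    for key, expected in TOOL_CONFIG_TYPES.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors


# The default OpenSearch tool.yml, as a parsed dictionary.
default_tool_config = {
    "test_name": "opensearch_test",
    "test_id": "opensearch_index",
    "knn_service": "opensearch",
    "service_config": "config/opensearch/service.yml",
    "dataset": "dataset/data.hdf5",
    "dataset_format": "hdf5",
    "test_parameters": {"num_runs": 10},
}
print(check_tool_config(default_tool_config))
```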

Tests

We have provided some sample tests that cover common use cases such as OpenSearch indexing or NMSLIB querying. These can be specified with test_id in the tool.yml file. The currently available tests are:

Test Name Description
opensearch_index
  • Creates an OpenSearch index with a disabled refresh interval
  • Bulk ingests vectors from dataset
  • Refreshes index
opensearch_query
  • Queries each vector against an OpenSearch index
nmslib_index
  • Creates an NMSLIB index
  • Adds vectors to index
  • Creates index
nmslib_query
  • Queries each vector against an NMSLIB index
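
As a rough illustration of the opensearch_index steps, the sketch below builds the request payloads without sending them. It assumes the index is created with refresh_interval set to -1 (which disables automatic refresh in OpenSearch) and that vectors are sent to the _bulk API as newline-delimited JSON in batches of bulk_size. The index name test_index and field name test_vector are hypothetical, and the tool itself reads its index definition from the index_spec file rather than building it this way.

```python
import json
from typing import Iterator, Sequence

INDEX_NAME = "test_index"    # hypothetical index name
FIELD_NAME = "test_vector"   # hypothetical knn_vector field name

# Settings for index creation: k-NN enabled, automatic refresh disabled.
index_settings = {"settings": {"index": {"knn": True, "refresh_interval": "-1"}}}


def bulk_bodies(vectors: Sequence[Sequence[float]],
                bulk_size: int = 500) -> Iterator[str]:
    """Yield NDJSON bodies for the _bulk API, each holding up to bulk_size vectors."""
    for start in range(0, len(vectors), bulk_size):
        lines = []
        for vec in vectors[start:start + bulk_size]:
            # Each document is an action line followed by a source line.
            lines.append(json.dumps({"index": {"_index": INDEX_NAME}}))
            lines.append(json.dumps({FIELD_NAME: list(vec)}))
        # The _bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"


sample_vectors = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
for body in bulk_bodies(sample_vectors, bulk_size=2):
    print(body, end="")
```

After the last batch, a single refresh request on the index would make the ingested vectors searchable, which corresponds to the final step in the list.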

In order to add new tests, see here.

To learn more about the test setup, see here.

Docker

Each Docker image comes with an accompanying docker-compose.yml at docker/<service>/docker-compose.yml.

To customize the docker compose settings, we provide a .env file at docker/<service>/docker-compose.env. For example, the default OpenSearch docker-compose.env is:

IMAGE_NAME=okpt/opensearch
IMAGE_PATH=docker/opensearch/Dockerfile
CONTAINER_ENV_PATH=docker/opensearch/container.env
OPENSEARCH_STARTUP_CONFIG_PATH=config/opensearch/opensearch.yml

When using the provided docker-compose.yml, we mount some directories from the host machine onto the Docker container (rather than the entire repo to save space), so there are some conventions you should be aware of:

  • datasets should go in a dataset/ directory
  • configs should go in a config/ directory
  • output should go in an output/ directory (which can be empty but should be created before running docker compose up)
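
These conventions can be checked before starting the containers. The following sketch creates output/ if it does not exist and reports which of the other expected directories are missing; the function name prepare_directories is an assumption, and the directory names come from the list above.

```python
from pathlib import Path
from typing import List

EXPECTED_DIRS = ("dataset", "config", "output")


def prepare_directories(repo_root: Path) -> List[str]:
    """Create output/ if needed and return the expected directories that are still missing."""
    # output/ may be empty, but it must exist before docker compose up.
    (repo_root / "output").mkdir(parents=True, exist_ok=True)
    return [name for name in EXPECTED_DIRS if not (repo_root / name).is_dir()]


if __name__ == "__main__":
    missing = prepare_directories(Path("."))
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("dataset/, config/ and output/ are all present.")
```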

To customize the performance tool, there is a .env file at docker/<service>/container.env. For example, the default OpenSearch container.env is:

OKPT_COMMAND=test
OKPT_CONFIG_PATH=config/opensearch/tool.yml
OKPT_OUTPUT_PATH=output/opensearch-index.json
OKPT_LOG_LEVEL=info

Note: When using Docker, the default output path is output/opensearch-index.json, so be sure to change that value in your container.env file to prevent previous test results from getting overwritten.
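
One way to avoid overwriting earlier results is to give each run its own output file name. The sketch below rewrites the OKPT_OUTPUT_PATH line of a container.env text by appending a timestamp to the file name; the function name with_unique_output_path and the timestamp format are assumptions, and the sketch leaves reading and writing the actual file to the caller.

```python
from datetime import datetime
from typing import Optional


def with_unique_output_path(env_text: str, now: Optional[datetime] = None) -> str:
    """Return env_text with a timestamp appended to the OKPT_OUTPUT_PATH file name."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    lines = []
    for line in env_text.splitlines():
        key, sep, value = line.partition("=")
        if key == "OKPT_OUTPUT_PATH" and sep:
            # Insert the timestamp before the file extension, if there is one.
            stem, dot, ext = value.rpartition(".")
            value = f"{stem}-{stamp}.{ext}" if dot else f"{value}-{stamp}"
            line = f"{key}={value}"
        lines.append(line)
    return "\n".join(lines) + "\n"


default_env = (
    "OKPT_COMMAND=test\n"
    "OKPT_CONFIG_PATH=config/opensearch/tool.yml\n"
    "OKPT_OUTPUT_PATH=output/opensearch-index.json\n"
    "OKPT_LOG_LEVEL=info\n"
)
print(with_unique_output_path(default_env), end="")
```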

[Figure: overview of the entire Docker setup]

To learn more about the Docker setup, see here.
