User Guide - jmazanec15/opensearch-knn-perf-tool GitHub Wiki
Requirements:
- Python 3.7
- Docker
Clone the repo to your machine and install the required packages:
git clone https://github.com/jmazanec15/opensearch-knn-perf-tool.git
cd opensearch-knn-perf-tool
pip install -r requirements.txt
Before running tests, make sure to create the output/ directory if you haven't already:
mkdir output
To run a test with the default settings, run:
docker compose --file docker/opensearch/docker-compose.yml --env-file docker/opensearch/docker-compose.env --project-directory . up
NOTE: While it is recommended to run test inside Docker with docker compose up, it is not required. The test command can run OpenSearch tests as long as OpenSearch is running on port 9200 when the tool is run.
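If you run the test command outside Docker, it can help to confirm that something is listening on port 9200 first. The following is a minimal sketch, not part of the tool: the function name is hypothetical, and the host "localhost" is an assumption (the guide only specifies the port).

```python
import socket


def is_port_open(host: str = "localhost", port: int = 9200, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default host is an assumption; the guide only states port 9200.
    This checks that *something* accepts connections, not that it is OpenSearch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Example usage:
#   if not is_port_open():
#       raise SystemExit("Nothing is listening on port 9200; start OpenSearch first.")
```

A successful connection only shows that the port is open, so a failed test run can still be caused by a different service occupying the port.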
After the tests have finished running, you can run
docker compose --file docker/opensearch/docker-compose.yml --env-file docker/opensearch/docker-compose.env --project-directory . down
to remove any stopped containers and run
docker system prune --volumes -f
to remove any leftover images and volumes to free up even more disk space.
The remaining examples are incomplete and need to be updated.
While the test command is best run through Docker, the diff and plot commands can and should be run locally, directly with python.
To run a diff with default settings, run:
python opensearch-knn-perf-tool.py diff result1.json result2.json
To run a plot with default settings, run:
python opensearch-knn-perf-tool.py plot result1.json result2.json
The general syntax of the tool is:
python opensearch-knn-perf-tool.py [--log LOGLEVEL] <command> <configuration-options> ...
--log log level of tool, options are: info, debug, warning, error, critical
The tool has three commands:
python opensearch-knn-perf-tool.py test <path-to-config.yml> <path-to-output.json>
python opensearch-knn-perf-tool.py diff <test-result1.json> <test-result2.json> [--file diff-output-path.json] [--metadata] [--keys key1,key2,...]
--file file path of output, outputs diff to file instead of console
--metadata add test result metadata into the diff
--keys specify keys from test result to diff
python opensearch-knn-perf-tool.py plot <test-result1.json> <test-result2.json> ...
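When scripting several comparisons, the diff syntax above can be assembled programmatically. This is a sketch under stated assumptions: the helper name and its parameters are hypothetical, and it only uses the flags listed in this guide (--log, --file, --metadata, --keys). It builds the argument list and leaves running it to the caller.

```python
import sys
from typing import List, Optional, Sequence

# Script name as used throughout this guide; run from the repo root.
SCRIPT = "opensearch-knn-perf-tool.py"


def build_diff_command(
    result1: str,
    result2: str,
    out_file: Optional[str] = None,
    metadata: bool = False,
    keys: Optional[Sequence[str]] = None,
    log_level: Optional[str] = None,
) -> List[str]:
    """Build the argv list for the diff command (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT]
    if log_level:
        # --log is a tool-level option, so it goes before the command name.
        cmd += ["--log", log_level]
    cmd += ["diff", result1, result2]
    if out_file:
        cmd += ["--file", out_file]
    if metadata:
        cmd.append("--metadata")
    if keys:
        # The guide shows keys as a single comma-separated value.
        cmd += ["--keys", ",".join(keys)]
    return cmd


# Example usage (key names here are placeholders):
#   import subprocess
#   subprocess.run(build_diff_command("result1.json", "result2.json",
#                                     keys=["key1", "key2"]), check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with file paths that contain spaces.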
The tool is configured using YAML files. There is one top-level tool configuration file, tool.yml, that has general tool settings, along with k-NN service-specific settings in service.yml files.
For example, the default OpenSearch tool.yml is:
test_name: opensearch_test
test_id: opensearch_index
knn_service: opensearch
service_config: config/opensearch/service.yml
dataset: dataset/data.hdf5
dataset_format: hdf5
test_parameters:
  num_runs: 10
and the default OpenSearch service.yml is:
index_spec: config/opensearch/index-spec.json
max_num_segments: 10
index_thread_qty: 1
bulk_size: 500
k: 10
The format of the configuration files is defined with Cerberus validation schemas, which can be found here.
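To illustrate the kind of checking a schema performs, here is a standard-library-only stand-in for the default tool.yml. It is not the tool's Cerberus schema: the field list and types are inferred from the example above, the function name is hypothetical, and the config is written as an already-parsed dictionary rather than loaded from YAML.

```python
from typing import Any, Dict, List

# Field names and types inferred from the default tool.yml in this guide.
# Assumption: the tool's real Cerberus schema may require more or fewer fields.
TOOL_FIELDS = {
    "test_name": str,
    "test_id": str,
    "knn_service": str,
    "service_config": str,
    "dataset": str,
    "dataset_format": str,
    "test_parameters": dict,
}


def check_tool_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every field passed."""
    problems = []
    for name, expected in TOOL_FIELDS.items():
        if name not in config:
            problems.append(f"missing field: {name}")
        elif not isinstance(config[name], expected):
            actual = type(config[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# The default OpenSearch tool.yml, as it would look after YAML parsing.
default_config = {
    "test_name": "opensearch_test",
    "test_id": "opensearch_index",
    "knn_service": "opensearch",
    "service_config": "config/opensearch/service.yml",
    "dataset": "dataset/data.hdf5",
    "dataset_format": "hdf5",
    "test_parameters": {"num_runs": 10},
}

print(check_tool_config(default_config))
```

Collecting every problem instead of stopping at the first one makes it easier to fix a config file in a single pass.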
We have provided some sample tests that cover common use cases, such as OpenSearch indexing or NMSLIB querying. These can be selected with test_id in the tool.yml file. The currently available tests are:
Test Name | Description |
---|---|
opensearch_index | |
opensearch_query | |
nmslib_index | |
nmslib_query | |
In order to add new tests, see here.
To learn more about the test setup, see here.
Each Docker image comes with an accompanying docker-compose.yml at docker/<service>/docker-compose.yml.
To customize the docker compose settings, we provide a .env file at docker/<service>/docker-compose.env. For example, the default OpenSearch docker-compose.env is:
IMAGE_NAME=okpt/opensearch
IMAGE_PATH=docker/opensearch/Dockerfile
CONTAINER_ENV_PATH=docker/opensearch/container.env
OPENSEARCH_STARTUP_CONFIG_PATH=config/opensearch/opensearch.yml
When using the provided docker-compose.yml, we mount some directories from the host machine onto the Docker container (rather than the entire repo, to save space), so there are some conventions you should be aware of:
- datasets should go in a dataset/ directory
- configs should go in a config/ directory
- output should go in an output/ directory (which can be empty but should be created before running docker compose up)
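The directory conventions above can be set up in one step before running docker compose up. This is a minimal sketch, assuming the three directories sit directly under the repo root; the helper name is hypothetical and not part of the tool.

```python
from pathlib import Path
from typing import List, Union

# Directory names taken from the conventions listed in this guide.
CONVENTION_DIRS = ("dataset", "config", "output")


def ensure_layout(repo_root: Union[str, Path]) -> List[str]:
    """Create any missing convention directories; return the names created."""
    root = Path(repo_root)
    created = []
    for name in CONVENTION_DIRS:
        path = root / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created


# Example usage, from the repo root:
#   print(ensure_layout("."))
```

Existing directories are left untouched, so the helper is safe to run before every test.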
To customize the performance tool, there is a .env file at docker/<service>/container.env. For example, the default OpenSearch container.env is:
OKPT_COMMAND=test
OKPT_CONFIG_PATH=config/opensearch/tool.yml
OKPT_OUTPUT_PATH=output/opensearch-index.json
OKPT_LOG_LEVEL=info
Note: When using Docker, the default output path is output/opensearch-index.json, so be sure to change that value in your container.env file to prevent previous test results from being overwritten.
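One way to avoid overwriting earlier results is to give each run its own output file. The sketch below rewrites the OKPT_OUTPUT_PATH line of a container.env text with a timestamped path. Both function names and the timestamp naming scheme are assumptions made for illustration; the tool does not prescribe any naming convention.

```python
from datetime import datetime


def timestamped_path(stamp: datetime, prefix: str = "output/opensearch-index") -> str:
    """Build an output path such as output/opensearch-index-20240102-030405.json."""
    return f"{prefix}-{stamp:%Y%m%d-%H%M%S}.json"


def set_output_path(env_text: str, new_path: str) -> str:
    """Return env_text with OKPT_OUTPUT_PATH set to new_path.

    If the variable is absent, it is appended as a new line.
    """
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith("OKPT_OUTPUT_PATH="):
            lines[i] = f"OKPT_OUTPUT_PATH={new_path}"
            replaced = True
    if not replaced:
        lines.append(f"OKPT_OUTPUT_PATH={new_path}")
    return "\n".join(lines) + "\n"


# Example usage:
#   from pathlib import Path
#   env_file = Path("docker/opensearch/container.env")
#   env_file.write_text(set_output_path(env_file.read_text(),
#                                       timestamped_path(datetime.now())))
```

Keeping the path inside output/ matters because that is the directory mounted into the container.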
The entire Docker setup looks something like:
To learn more about the Docker setup, see here.