GPU Faiss with cuVS usage - tarang-jain/faiss GitHub Wiki
Usage
The use_cuvs=True flag in GPU index configurations enables the cuVS implementation for an index. Note that this flag is automatically set to True for supported index types when Faiss is built from source with FAISS_ENABLE_CUVS=ON or installed via the faiss-gpu-cuvs conda package. If use_cuvs=False, the classic Faiss GPU implementations are used.
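As a small illustration (reusing GpuIndexIVFFlatConfig, which also appears later on this page), the flag can be cleared on a per-index basis to fall back to the classic implementation:
import faiss

config = faiss.GpuIndexIVFFlatConfig()
config.use_cuvs = False   # use the classic Faiss GPU implementation for this index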
Building a cuVS GPU index
RMM
cuVS internally uses the RAPIDS Memory Manager (RMM) to allow customizing device and host memory allocation behavior. With cuVS enabled, Faiss's GPU resource manager is also configured to use RMM. The following example shows how to construct a best-fit pool allocator with an initial size of 1 GiB. The pool uses rmm::mr::cuda_memory_resource as its "upstream" resource for device allocations.
In C++:
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

// Set a pool memory resource for the current device with a 1 GiB initial pool size.
// All allocations on this device then use the same pool.
rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource> pool_mr(
    rmm::mr::get_current_device_resource(), 1024 * 1024 * 1024ull);
rmm::mr::set_current_device_resource(&pool_mr);
In Python:
import rmm

# Create a 1 GiB pool on top of the default CUDA memory resource.
pool = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource(),
                                 initial_pool_size=2 ** 30)
# Set the RMM resource for the current device
rmm.mr.set_current_device_resource(pool)
# Or set the RMM resource for a particular device (here, device 0)
rmm.mr.set_per_device_resource(0, pool)
In a cuVS-enabled build, the StandardGpuResources object uses the current RMM device resource set by the user for device allocations.
Note: RMM's Python interface is not a direct dependency of Faiss and must be installed externally:
conda install -c rapidsai -c conda-forge rmm
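Tying the two pieces together, here is a minimal Python sketch; setting the RMM resource before creating StandardGpuResources is a conservative assumption rather than a documented requirement:
import faiss
import rmm

# Install the pool as the current device resource first.
pool = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource(), initial_pool_size=2 ** 30)
rmm.mr.set_current_device_resource(pool)

# Device allocations made through the GPU resources now go through the RMM pool.
res = faiss.StandardGpuResources()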
Setting such an allocator for the device typically yields better search performance than using the default CUDA memory resource. The next step is to set the use_cuvs flag to True in the index configuration.
In C++:
faiss::gpu::StandardGpuResources res;
faiss::gpu::GpuIndexIVFFlatConfig config;
config.use_cuvs = true;
faiss::gpu::GpuIndexIVFFlat index_gpu(&res, d, nlist, faiss::METRIC_L2, config);
In Python:
res = faiss.StandardGpuResources()
co = faiss.GpuIndexIVFFlatConfig()
co.use_cuvs = True
index_gpu = faiss.GpuIndexIVFFlat(res, ncols, nlist, faiss.METRIC_L2, co)
Once the index has been initialized with this config, all supported operations on the index, such as add and search, can be performed with the regular GPU Faiss function calls; they are internally applied to the underlying cuVS index.
index_gpu.add(xb)
D, I = index_gpu.search(xq, k)
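As a usage note, search-time parameters are handled the same way as for the classic GPU implementation; for example, nprobe can be set directly on the IVF index before searching (16 is an illustrative value):
index_gpu.nprobe = 16
D, I = index_gpu.search(xq, k)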
Cloning a CPU index
The same use_cuvs flag exists for cloning a CPU index to a cuVS index.
In C++:
faiss::gpu::StandardGpuResources res;
faiss::gpu::GpuClonerOptions co;
co.use_cuvs = true;
faiss::Index* index_gpu = faiss::gpu::index_cpu_to_gpu(
    &res,
    0,
    &index_cpu,
    &co);
In Python:
co = faiss.GpuClonerOptions()
co.use_cuvs = True
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu, co)
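For completeness, a short sketch of what index_cpu could be in the Python example above; the factory string and the database vectors xb are illustrative, not prescribed by this page:
d = xb.shape[1]
index_cpu = faiss.index_factory(d, "IVF1024,Flat")
index_cpu.train(xb)   # train the IVF quantizer on the CPU
index_cpu.add(xb)
# ... then clone to the GPU with use_cuvs=True as shown above.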
CAGRA + HNSW
In addition to faster search times on the GPU for a CAGRA index, a GpuIndexCagra object can be cloned to initialize the base layer of a CPU HNSW index. This is done via the IndexHNSWCagra class, so building an HNSW index can be accelerated by constructing the graph on the GPU with CAGRA's faster build times. IndexHNSWCagra has a parameter called base_level_only. If base_level_only=True, only the base layer of the HNSW index is initialized from the CAGRA graph. The resulting HNSW index is immutable and does not support adding new vectors after the original graph has been built. If base_level_only=False, the CAGRA index is still used to initialize the base layer of the HNSW index, but new vectors can be added to the index using the HNSW add API.
faiss::gpu::StandardGpuResources res;
faiss::gpu::GpuIndexCagra gpu_cagra_index(&res, d);
// For the CAGRA index, the `train` stage is used for building a graph with all the vectors.
gpu_cagra_index.train(n, xb);

// M = 32 is an illustrative HNSW connectivity value.
faiss::IndexHNSWCagra* cpu_hnsw_index = new faiss::IndexHNSWCagra(d, 32);
// Additional vectors can be added to the index if `base_level_only` is set to `false` for the
// HNSW index.
cpu_hnsw_index->base_level_only = false;
// Initialize the HNSW index from the CAGRA graph.
gpu_cagra_index.copyTo(cpu_hnsw_index);
// By now, the HNSW index has all the vectors from the original CAGRA index, and it also has the
// ability to add new vectors.
cpu_hnsw_index->add(n, newVecs);
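The same flow can be expressed from Python. The sketch below assumes the cuVS-enabled Python bindings expose GpuIndexCagra, GpuIndexCagraConfig, IndexHNSWCagra, and copyTo with the same shape as the C++ API; d, the connectivity value 32, and the data arrays xb / new_vecs are illustrative:
res = faiss.StandardGpuResources()
config = faiss.GpuIndexCagraConfig()
gpu_cagra_index = faiss.GpuIndexCagra(res, d, faiss.METRIC_L2, config)
gpu_cagra_index.train(xb)                     # build the CAGRA graph over all vectors in xb

cpu_hnsw_index = faiss.IndexHNSWCagra(d, 32)  # M = 32, illustrative HNSW connectivity
cpu_hnsw_index.base_level_only = False        # keep the index mutable after the copy
gpu_cagra_index.copyTo(cpu_hnsw_index)        # initialize the HNSW base layer from CAGRA
cpu_hnsw_index.add(new_vecs)                  # new vectors can still be added via HNSW add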