Python C++ code snippets - tarang-jain/faiss GitHub Wiki
It is not always obvious how the C++ and Python layers interact. Therefore, we give some handy code in Python notebooks that can be copy/pasted to perform some useful operations.
They rely mostly on vector_to_array and a few other Python/C++ tricks described here.
The faiss.contrib.inspect_tools module has a few useful functions to inspect Faiss objects.
In particular, inspect_tools.print_object_fields lists all the fields of an object and their values.
How can I get the PCA matrix in numpy from a PCA object?
Use the function faiss.contrib.inspect_tools.get_LinearTransform_matrix, or see this code: get_matrix_from_PCA.ipynb.
This applies to any LinearTransform object.
How can I get / change the centroids from a ProductQuantizer or ResidualQuantizer object?
For PQ: see access_PQ_centroids.ipynb.
For RQ: see demo_replace_RQ_codebooks.ipynb
How can I get the content of inverted lists?
Use the function faiss.contrib.inspect_tools.get_invlist, or see this code: get_invlists.ipynb
How can I lookup the inverted list corresponding to a stored vector?
This does not require C++ magic. See #3555
How can I get the link structure of a HNSW index?
See this code snippet: demo_hnsw_struct.ipynb (alternative rendering).
How can I get the knn graph for an IndexNNDescent?
See demo_access_nndescent.ipynb
How can I merge normal ArrayInvertedLists?
See demo_merge_array_invertedlists.ipynb
Faiss/pytorch interop: how can I use a PQ codec without leaving the GPU?
How to explore the contents of an opaque index?
We have an index file but don't know what's in it.
When accessing the Index fields of a wrapper index, they show up as a plain Index object.
The downcast_index function converts this plain index to the "leaf" class the index belongs to.
This snippet is a demo on how to use downcast_index to extract all info from it:
demo_explore_indedex.ipynb
How can I get all the ids from an IDMap or an IDMap2?
IDMap2 inherits IDMap, so the same code works for both: the ids are stored in the id_map field, a std::vector that vector_to_array converts to numpy.
How can I convert an IDMap2 to an IDMap?
This code works for both directions: convert_idmap2_idmap.ipynb
How to train a CPU index with a GPU just for k-means?
How to use the GPU at add time?
See assign_on_gpu.ipynb.
How can I force the k-means initialization? Plus: how to do this for IVF training.
See initial_centroids_demo.ipynb
How to transfer a trained OPQ matrix and/or IVF centroids to another index?
See https://github.com/facebookresearch/faiss/issues/2455
How can I replace the inverted list content?
See demo_replace_invlists.ipynb
How can I get access to non-8 bit quantization code entries in PQ / IVFPQ / AQ ?
You need a BitstringReader, see #2285.
Simulating an IndexPQ on GPU with a 1-centroid IVFPQ
IndexPQ is not supported on GPU, but it is relatively easy to simulate it with an IVFPQ.
Accessing the vectors of a graph-based index (NSG or HNSW)
The data is stored in a storage index, which is an IndexFlatCodes.
demo_access_NSG_data.ipynb
To get the reconstructed vectors, use index2.reconstruct(vector_id) or index2.reconstruct_n().
Wrapping small C++ objects for use from Python
Sometimes it is useful to implement a small callback needed by Faiss in C++. However, it may be too specific or depend on external code, so it does not make sense to include it in Faiss (and Faiss is hard to compile ;-) ).
In that case, you can make a SWIG wrapper for a snippet of C++.
Here is an example for an IDSelector object that has an is_member callback: bow_id_selector.swig
To compile the code with Faiss installed via conda and SWIG 4.x on Linux:
# generate wrapper code
swig -c++ -python -I$CONDA_PREFIX/include bow_id_selector.swig
# compile generated wrapper code:
g++ -shared -O3 -g -fPIC bow_id_selector_wrap.cxx -o _bow_id_selector.so \
-I $( python -c "import sysconfig; print(sysconfig.get_paths()['include'])" ) \
-I $CONDA_PREFIX/include $CONDA_PREFIX/lib/libfaiss_avx2.so
This produces bow_id_selector.py and _bow_id_selector.so, which can be loaded in Python with:
import numpy as np
import faiss
import bow_id_selector
# very small sparse CSR matrix
n = 3
indptr = np.array([0, 2, 3, 6], dtype='int32')
indices = np.array([7, 8, 3, 1, 2, 3], dtype='int32')
# don't forget swig_ptr to convert from a numpy array to a C++ pointer
selector = bow_id_selector.IDSelectorBOW(n, faiss.swig_ptr(indptr), faiss.swig_ptr(indices))
selector.set_query_words(1, 2)
selector.is_member(0) # returns False
selector.is_member(1) # returns False
selector.is_member(2) # returns True
selector.is_member(3) # crashes!
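# And of course you can combine it with existing Faiss objects, e.g. as a search-time filter
params = faiss.SearchParameters(sel=selector)
# D, I = index.search(xq, k, params=params)   # index, xq, k: your own index and queries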