FAISS Documentation
Last edited by csusb_fall2024_cse6550_team4
1. Installation
To add FAISS to your Python project, install it using pip. Execute the following command in your terminal to install FAISS for CPU environments:
pip install faiss-cpu
2. Configuration
FAISS offers flexibility in configurations, allowing for various index types based on your data requirements. You can configure indexes and directories to store data efficiently.
Index Types
FAISS provides multiple index types that support different data sizes and retrieval needs:
- IndexFlatL2: Employs exact nearest neighbor search with Euclidean distance, ideal for smaller datasets.
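- IndexIVFFlat: Suitable for large datasets that need approximate nearest neighbor search. It partitions the vectors into clusters and searches only a few of them per query, trading some accuracy for speed. The index must be trained before vectors are added.
- IndexHNSWFlat: Uses a Hierarchical Navigable Small World (HNSW) graph for fast approximate search with high recall, and needs no training step.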
Directory Setup for FAISS Indexes
To manage FAISS indexes, create a designated directory for storing them:
import os
# Define the directory path for storing FAISS index
directory_path = "faiss_indexes"
# Create the directory if it does not exist
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
# Specify the full path for the index file
index_path = os.path.join(directory_path, "faiss_index.index")
print("Directory setup complete.")
3. Implementation
Before utilizing FAISS, you must create an index and populate it with data. Below is an example of setting up a FAISS index with random data using cosine similarity.
import numpy as np
import faiss
# Define the dimensionality of the vectors
dimension = 128 # Example vector dimension
# Create an inner-product index; on L2-normalized vectors, inner product equals cosine similarity
index = faiss.IndexFlatIP(dimension)
# Generate random vectors, normalize them, and add to index
num_vectors = 1000
vectors = np.random.random((num_vectors, dimension)).astype('float32')
faiss.normalize_L2(vectors)
# Add vectors to the FAISS index
index.add(vectors)
print("FAISS index created and data added.")
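The reason an inner-product index can serve as a cosine-similarity index is that the inner product of two unit-length vectors equals their cosine similarity. The NumPy-only sketch below checks this on two random vectors. It normalizes by hand rather than calling `faiss.normalize_L2`, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(128).astype("float32")
b = rng.random(128).astype("float32")

# Cosine similarity computed directly from its definition.
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Scale both vectors to unit length, then take the plain inner product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner_product = float(np.dot(a_unit, b_unit))

print(f"cosine={cosine:.6f} inner_product={inner_product:.6f}")
```

The two values should agree up to floating-point rounding, which is why the vectors are normalized before being added to the index.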
Loading the FAISS Index
After creating an index, you can save it to a file and load it as needed. The example below demonstrates saving and loading the FAISS index.
import numpy as np
import faiss
import os
# Define dimensions and create an inner-product index (cosine similarity on normalized vectors)
dimension = 128
index = faiss.IndexFlatIP(dimension)
# Generate and normalize data, then add to index
num_vectors = 1000
vectors = np.random.random((num_vectors, dimension)).astype('float32')
faiss.normalize_L2(vectors)
index.add(vectors)
print("FAISS index created and populated.")
# Set up directory path for saving
directory_path = "faiss_indexes_cosine"
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
# Save the FAISS index
index_path = os.path.join(directory_path, "my_faiss_cosine_index.index")
faiss.write_index(index, index_path)
print(f"FAISS index saved at {index_path}.")
# Load the FAISS index from file
loaded_index = faiss.read_index(index_path)
print(f"FAISS index loaded from {index_path}.")
4. Usage
Similarity Search
FAISS performs similarity search by comparing a query vector against the vectors stored in an index. The example below L2-normalizes both the data and the query vector so that the inner-product scores returned by IndexFlatIP are cosine similarities.
import numpy as np
import faiss
# Define vector dimensionality
dimension = 128
# Create an inner-product index (cosine similarity once vectors are normalized)
index = faiss.IndexFlatIP(dimension)
# Generate random data, normalize it for cosine similarity, and add to index
num_vectors = 1000
vectors = np.random.random((num_vectors, dimension)).astype('float32')
faiss.normalize_L2(vectors)
index.add(vectors)
print("FAISS index created and data added for similarity search.")
# Define and normalize query vector
query_vector = np.random.random((1, dimension)).astype('float32')
faiss.normalize_L2(query_vector)
# Search for the 5 most similar vectors
k = 5
distances, indices = index.search(query_vector, k)
# Display search results
print("Top 5 results (indices):", indices)
print("Top 5 distances (similarities):", distances)
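To make the returned values easier to interpret, the NumPy-only sketch below computes by brute force what an exact inner-product search is expected to return: the `k` highest scores and their row positions, best match first. It does not call FAISS, and its variable names are illustrative. Results from IndexFlatIP should agree with it up to floating-point rounding and the ordering of ties.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random data and query, scaled to unit length row by row.
vectors = rng.random((1000, 128)).astype("float32")
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
query = rng.random((1, 128)).astype("float32")
query /= np.linalg.norm(query, axis=1, keepdims=True)

k = 5
# One inner product per stored vector; higher means more similar.
scores = (query @ vectors.T)[0]
# Row positions of the k highest scores, best match first.
top_ids = np.argsort(-scores)[:k]
top_scores = scores[top_ids]

print("Top 5 results (indices):", top_ids)
print("Top 5 similarities:", top_scores)
```

Because the vectors are normalized, each score is a cosine similarity, so a larger value means a closer match even though the search example above names the array `distances`.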
5. Troubleshooting
Directory Not Found
If you encounter an error such as “No such file or directory,” ensure that the directory for the index path exists before attempting to save the FAISS index:
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
Dimension Mismatch
Make sure that the vector dimensionality matches the index configuration. For example, if you have created an index with dimension = 128, ensure all vectors being added to the index have 128 dimensions.
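One way to catch a mismatch early is to validate the array shape before calling `index.add`. The sketch below is a minimal example. `check_dimension` is a hypothetical helper and not part of FAISS, and the expected dimension is passed in as a plain integer.

```python
import numpy as np


def check_dimension(vectors, expected_dim):
    """Hypothetical helper: raise a readable error before FAISS sees the data."""
    arr = np.ascontiguousarray(vectors, dtype="float32")
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {arr.ndim}-D")
    if arr.shape[1] != expected_dim:
        raise ValueError(
            f"vectors have {arr.shape[1]} dimensions, index expects {expected_dim}"
        )
    return arr


# A matching array passes through, converted to float32.
good = check_dimension(np.zeros((10, 128)), 128)

# A mismatched array is rejected with a clear message.
try:
    check_dimension(np.zeros((10, 64)), 128)
except ValueError as err:
    message = str(err)
    print("Rejected:", message)
```

With a real index, the expected dimension can be read from its `d` attribute, for example `check_dimension(vectors, index.d)`.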
Import Errors
If you face import errors, check that all required libraries are installed. You can install FAISS and numpy using the following command:
pip install faiss-cpu numpy
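To see which modules are actually missing before reinstalling anything, a standard-library check along the following lines may help. `missing_packages` is a hypothetical helper. Note that the pip package is named `faiss-cpu`, while the module imported in code is `faiss`.

```python
import importlib.util


def missing_packages(module_names):
    """Hypothetical helper: return the module names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = missing_packages(["faiss", "numpy"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("faiss and numpy are importable.")
```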
Debugging Tips
- Check Vector Count: Use index.ntotal to verify the total number of vectors in the index. This ensures that data has been added correctly:
print("Total vectors in index:", index.ntotal)
- Print Paths and Configurations: Add print statements to verify paths, configurations, and other important variables. This helps confirm that directories, file paths, and configurations are set up correctly:
print("Directory path:", directory_path)
print("Index path:", index_path)
print("FAISS index configuration:", type(index).__name__, "| dimension:", index.d, "| trained:", index.is_trained)
- Confirm Dimensions: To avoid dimension mismatch errors, check that the dimensions of vectors align with the index's configuration. For example, if the index is set up for 128 dimensions, ensure all vectors have 128 dimensions.
- Reinstall Libraries: If import errors occur, it can be helpful to reinstall libraries to ensure compatibility. Run:
pip install --upgrade faiss-cpu numpy