Vector Databases - spinningideas/resources GitHub Wiki

A vector database represents data as vectors: fixed-length arrays of numbers (often called embeddings) in which each position is one "dimension" of the data. Together these vectors form a map of the data in which related items sit close to one another. What makes vectors so powerful is that they are multi-dimensional: each data point can carry many dimensions, each capturing some feature of it, which results in a rich representation of the dataset (e.g. a single data point, such as a sentence or an image, may be described by hundreds of numeric values).
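To make the idea of dimensions concrete, here is a minimal sketch, assuming a hypothetical six-word vocabulary in which each word is one dimension. The `embed` function simply counts words; real systems use a learned embedding model, so treat it purely as a stand-in. Items that share words end up closer together by Euclidean distance.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary: each word is one dimension of the vector space.
VOCAB = ["vector", "database", "search", "cat", "dog", "pet"]


def embed(text: str) -> list[float]:
    """Turn text into word counts over VOCAB (a stand-in for a learned embedding model)."""
    counts = Counter(text.lower().split())
    # Words outside VOCAB are ignored; Counter returns 0 for absent words.
    return [float(counts[word]) for word in VOCAB]


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(embed("vector database"))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

With this toy model, the distance from `embed("vector database")` to `embed("vector search")` is about 1.414, while its distance to `embed("cat dog")` is 2.0, so the two search-related phrases form the nearer pair.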

Vector databases are purpose-built to handle the unique structure of the vector embeddings that machine learning models produce. They use these embeddings to perform searches and retrieve matching data with a given retrieval algorithm. A vector database builds indexes on the vectors themselves to speed up search and retrieval, and finds results by comparing a query vector with the stored vectors and returning those that are most similar, i.e. nearest according to a metric such as Euclidean distance or cosine similarity.
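The similarity search described above can be sketched as an exact (brute-force) index. This is a minimal sketch, assuming cosine similarity as the metric; `FlatIndex`, `add`, and `search` are hypothetical names, not any vendor's API. It normalizes vectors on insert so that cosine similarity reduces to a dot product, then scores every stored vector against the query and returns the top k.

```python
import numpy as np


class FlatIndex:
    """Hypothetical exact vector index: stores vectors, returns the k most similar by cosine similarity."""

    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    @staticmethod
    def _normalize(v: np.ndarray) -> np.ndarray:
        norm = np.linalg.norm(v)
        if norm == 0:
            raise ValueError("a zero vector has no direction to compare")
        return v / norm

    def add(self, item_id: str, vector) -> None:
        v = np.asarray(vector, dtype=np.float32).reshape(1, self.dim)
        # Store unit-length vectors so a dot product equals cosine similarity.
        self.vectors = np.vstack([self.vectors, self._normalize(v)])
        self.ids.append(item_id)

    def search(self, query, k: int = 3) -> list[tuple[str, float]]:
        q = self._normalize(np.asarray(query, dtype=np.float32))
        scores = self.vectors @ q  # cosine similarity against every stored vector
        top = np.argsort(-scores)[:k]  # indices of the k highest scores
        return [(self.ids[i], float(scores[i])) for i in top]


index = FlatIndex(dim=3)
index.add("a", [1, 0, 0])
index.add("b", [0.9, 0.1, 0])
index.add("c", [0, 0, 1])
print([item_id for item_id, _ in index.search([1, 0, 0], k=2)])  # ['a', 'b']
```

Scanning every stored vector gives exact results but costs time proportional to the number of vectors times their dimension for each query, which is why vector databases generally rely on approximate indexes once collections grow large.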

Pinecone is closed source and available only as a managed SaaS offering. Milvus and Pinecone overlap more with each other than either does with Faiss, but Pinecone focuses on the embeddings workflow, such as versioning embeddings and using them alongside other features, while Milvus is entirely focused on nearest-neighbor operations. Faiss solves the approximate nearest neighbor (ANN) problem, not the storage problem. It’s not a database, it’s an index.
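Faiss ships many ANN index types, and the code below is not Faiss code. It is a minimal sketch of one classic ANN technique, random-hyperplane locality-sensitive hashing (LSH), meant to show what "approximate" means: the query is compared only against vectors that landed in the same hash bucket, so search is fast but can miss true neighbors. Like any bare index, it stores only ids and vectors, with no metadata, durability, or update handling of its own, which is the index-versus-database distinction drawn above. `HyperplaneLSH` and its parameters are hypothetical.

```python
from collections import defaultdict

import numpy as np


class HyperplaneLSH:
    """Hypothetical ANN index using random-hyperplane LSH with a single hash table."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of a random hyperplane through the origin.
        self.planes = rng.standard_normal((n_planes, dim))
        self.buckets = defaultdict(list)  # hash key -> list of (id, vector)

    def _key(self, v: np.ndarray) -> tuple:
        # One bit per hyperplane: which side of that plane the vector falls on.
        return tuple(int(bit) for bit in (self.planes @ v > 0))

    def add(self, item_id, vector) -> None:
        v = np.asarray(vector, dtype=float)
        self.buckets[self._key(v)].append((item_id, v))

    def search(self, query, k: int = 3) -> list[tuple[object, float]]:
        q = np.asarray(query, dtype=float)
        # Only vectors sharing the query's bucket are compared, so true nearest
        # neighbors hashed into other buckets can be missed (hence "approximate").
        candidates = self.buckets.get(self._key(q), [])
        scored = [(item_id, float(np.linalg.norm(v - q))) for item_id, v in candidates]
        return sorted(scored, key=lambda pair: pair[1])[:k]
```

More hyperplanes produce smaller buckets (fewer comparisons, lower recall); using several independent hash tables and merging their candidates is the usual way to win recall back.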

Current State (2022) of Vendors/Solutions

Vendors/Solutions

pinecone.io

Intro

API

Examples

Milvus

Milvus is an open-source vector database built to power embedding similarity search and AI applications. It makes search over unstructured data more accessible and provides a consistent user experience regardless of the deployment environment.

Architecture

Weaviate

Weaviate is an open-source vector search engine that stores both objects and their vectors. This allows vector search to be combined with structured filtering, backed by the fault tolerance and scalability of a cloud-native database. Weaviate is accessible through GraphQL, REST, and various language clients.
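Combining vector search with structured filtering can be illustrated with a pre-filtering sketch: apply the property filter first, then rank only the surviving objects by cosine similarity. This is a minimal, hypothetical illustration, assuming in-memory objects with a `properties` dict; `StoredObject`, `filtered_search`, and the `where` predicate are invented names, not Weaviate's GraphQL or REST API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StoredObject:
    """Hypothetical stored object: structured properties plus its vector."""
    id: str
    properties: dict
    vector: np.ndarray


def filtered_search(objects, query, where, k: int = 3) -> list[tuple[str, float]]:
    """Return the k objects most similar to `query` among those whose properties satisfy `where`."""
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    # Step 1: structured filter on each object's properties (pre-filtering).
    matches = [o for o in objects if where(o.properties)]
    # Step 2: rank the surviving objects by cosine similarity to the query.
    scored = [(o.id, float(o.vector @ q / np.linalg.norm(o.vector))) for o in matches]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


objects = [
    StoredObject("doc1", {"lang": "en", "year": 2021}, np.array([1.0, 0.0])),
    StoredObject("doc2", {"lang": "de", "year": 2022}, np.array([0.9, 0.1])),
    StoredObject("doc3", {"lang": "en", "year": 2022}, np.array([0.0, 1.0])),
]
# Without a filter doc1 is the closest match; restricting to 2022 yields doc2.
print(filtered_search(objects, [1.0, 0.0], where=lambda p: p["year"] == 2022, k=1))
```

Filtering before ranking guarantees that every returned result satisfies the filter and that up to k of them come back; the alternative, post-filtering the top-k vector hits, is simpler to bolt onto an ANN index but can return fewer than k results when the filter is selective.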

Intro

Architecture

Related Libraries

FAISS