Hybrid Search - milvus-io/milvus GitHub Wiki
In addition to vectors, Milvus supports scalar data types such as booleans, integers, and floating-point numbers. A collection in Milvus can hold multiple fields to accommodate different data features or properties. Milvus pairs scalar filtering with vector similarity search.
A hybrid search is a vector similarity search, during which you can filter the scalar data by specifying a boolean expression.
For example:
In Python
```python
import random
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# Connect to server
connections.connect("default", host='localhost', port='19530')

# Create a collection
collection_name = "test_collection_search"
schema = CollectionSchema([
    FieldSchema("film_id", DataType.INT64, is_primary=True),
    FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection(collection_name, schema, using='default', shards_num=2)

# Insert some random data
data = [
    [i for i in range(10)],
    [[random.random() for _ in range(2)] for _ in range(10)],
]
collection.insert(data)
collection.num_entities

# Load collection to memory
collection.load()

# Conduct a similarity search with an expression filtering ID column
search_param = {
    "data": [[1.0, 1.0]],
    "anns_field": "films",
    "param": {"metric_type": "L2"},
    "limit": 2,
    "expr": "film_id in [2,4,6,8]"
}
res = collection.search(**search_param)

# Check results
hits = res[0]
print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
```

In Node.js
```javascript
// DataType is used below, so it is imported as well. Importing it from the package
// root is an assumption; some SDK releases export it from a different path.
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

const milvusClient = new MilvusClient("localhost:19530");

// Prepare a test collection (awaited so it exists before data is inserted)
const COLLECTION_NAME = "test_collection_search";
await milvusClient.collectionManager.createCollection({
  collection_name: COLLECTION_NAME,
  fields: [
    {
      name: "films",
      description: "vector field",
      data_type: DataType.FloatVector,
      type_params: {
        dim: "2",
      },
    },
    {
      name: "film_id",
      data_type: DataType.Int64,
      autoID: false,
      is_primary_key: true,
      description: "",
    },
  ],
});

// Insert some random data
let id = 1;
const entities = Array.from({ length: 10 }, () => ({
  films: Array.from({ length: 2 }, () => Math.random() * 10),
  film_id: id++,
}));
await milvusClient.collectionManager.insert({
  collection_name: COLLECTION_NAME,
  fields_data: entities,
});

// Load collection to memory & conduct a search with boolean expression
await milvusClient.collectionManager.loadCollection({
  collection_name: COLLECTION_NAME,
});
await milvusClient.dataManager.search({
  collection_name: COLLECTION_NAME,
  // partition_names: [],
  expr: "film_id in [1,4,6,8]",
  vectors: [entities[0].films],
  search_params: {
    anns_field: "films",
    topk: "4",
    metric_type: "L2",
    params: JSON.stringify({ nprobe: 10 }),
  },
  vector_type: 100, // float vector -> 100
});
// The search result will look like:
// {
//   status: { error_code: 'Success', reason: '' },
//   results: [
//     { score: 0, id: '1' },
//     { score: 9.266796112060547, id: '4' },
//     { score: 28.263811111450195, id: '8' },
//     { score: 41.055686950683594, id: '6' }
//   ]
// }
```

A predicate expression outputs a boolean value: when evaluated, it returns either TRUE or FALSE. Milvus conducts scalar filtering by searching with predicates.
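To make the idea of predicate filtering concrete, the pure-Python sketch below mimics what a filtered search does conceptually: it keeps only the entities whose scalar field satisfies a predicate, then ranks the survivors by squared Euclidean distance to the query vector. It is an illustration only and says nothing about how Milvus executes a search internally; `filtered_search`, `squared_l2`, and the sample data are hypothetical names introduced for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

Entity = Dict[str, Any]


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(
    entities: List[Entity],
    query: List[float],
    predicate: Callable[[Entity], bool],
    limit: int,
) -> List[Tuple[int, float]]:
    """Return (film_id, distance) for the closest entities that pass the predicate."""
    # Step 1: scalar filtering, keep only entities for which the predicate is true.
    candidates = [e for e in entities if predicate(e)]
    # Step 2: vector ranking, order the remaining entities by distance to the query.
    ranked = sorted(candidates, key=lambda e: squared_l2(e["films"], query))
    return [(e["film_id"], squared_l2(e["films"], query)) for e in ranked[:limit]]


# Hypothetical data: ten entities whose 2-dimensional vector is [i, i].
entities = [{"film_id": i, "films": [float(i), float(i)]} for i in range(10)]

# The lambda plays the role of the expression "film_id in [2,4,6,8]".
hits = filtered_search(
    entities, [1.0, 1.0], lambda e: e["film_id"] in [2, 4, 6, 8], limit=2
)
print(hits)  # [(2, 2.0), (4, 18.0)]
```

Entities 1 and 3 are closer to the query than entity 4, but they are excluded because the predicate evaluates to FALSE for them.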
The following EBNF grammar rules describe boolean expressions:

```
Expr = LogicalExpr | NIL;
LogicalExpr = LogicalExpr BinaryLogicalOp LogicalExpr
            | UnaryLogicalOp LogicalExpr
            | "(" LogicalExpr ")"
            | SingleExpr;
BinaryLogicalOp = "&&" | "and" | "||" | "or";
UnaryLogicalOp = "not";
SingleExpr = TermExpr | CompareExpr;
TermExpr = IDENTIFIER "in" ConstantArray;
Constant = INTEGER | FLOAT;
ConstantExpr = Constant
             | ConstantExpr BinaryArithOp ConstantExpr
             | UnaryArithOp ConstantExpr;
ConstantArray = "[" ConstantExpr { "," ConstantExpr } "]";
UnaryArithOp = "+" | "-";
BinaryArithOp = "+" | "-" | "*" | "/" | "%" | "**";
CompareExpr = IDENTIFIER CmpOp IDENTIFIER
            | IDENTIFIER CmpOp ConstantExpr
            | ConstantExpr CmpOp IDENTIFIER
            | ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr;
CmpOpRestricted = "<" | "<=";
CmpOp = ">" | ">=" | "<" | "<=" | "==" | "!=";
```
The following table lists the description of each symbol mentioned in the above Boolean expression rules:
| Notation | Description |
|---|---|
| = | Definition. |
| , | Concatenation. |
| ; | Termination. |
| \| | Alternation. |
| {...} | Repetition. |
| (...) | Grouping. |
| NIL | Empty. The expression can be an empty string. |
| INTEGER | Integers such as 1, 2, 3. |
| FLOAT | Float numbers such as 1.0, 2.0. |
| Constant | Integers or float numbers. |
| IDENTIFIER | Identifier. In Milvus, the IDENTIFIER represents the field name. |
| LogicalOp | A LogicalOp is a logical operator that combines more than one relational operation in one comparison. The value returned by a LogicalOp is either TRUE (1) or FALSE (0). There are two types of LogicalOps: BinaryLogicalOps and UnaryLogicalOps. |
| UnaryLogicalOp | UnaryLogicalOp refers to the unary logical operator "not". |
| BinaryLogicalOp | A BinaryLogicalOp is a binary logical operator that acts on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
| ArithmeticOp | An ArithmeticOp, namely an arithmetic operator, performs mathematical operations such as addition and subtraction on operands. |
| UnaryArithOp | A UnaryArithOp is an arithmetic operator that performs an operation on a single operand. The negative UnaryArithOp changes a positive expression into a negative one, or the other way round. |
| BinaryArithOp | A BinaryArithOp, namely a binary operator, performs operations on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
| CmpOp | CmpOp is a relational operator that acts on two operands. |
| CmpOpRestricted | CmpOpRestricted is restricted to "Less than" and "Less than or equal". |
| ConstantExpr | ConstantExpr can be a Constant, a BinaryArithOp on two ConstantExprs, or a UnaryArithOp on a single ConstantExpr. It is defined recursively. |
| ConstantArray | ConstantArray is wrapped by square brackets, and ConstantExpr can be repeated in the square brackets. ConstantArray must include at least one ConstantExpr. |
| TermExpr | TermExpr is used to check whether the value of an IDENTIFIER appears in a ConstantArray. TermExpr is represented by "in". |
| CompareExpr | A CompareExpr, namely a comparison expression, can be a relational operation on two IDENTIFIERs, a relational operation on one IDENTIFIER and one ConstantExpr, or a ternary operation on two ConstantExprs and one IDENTIFIER. |
| SingleExpr | SingleExpr, namely single expression, can be either a TermExpr or a CompareExpr. |
| LogicalExpr | A LogicalExpr can be a BinaryLogicalOp on two LogicalExprs, or a UnaryLogicalOp on a single LogicalExpr, or a LogicalExpr grouped within parentheses, or a SingleExpr. The LogicalExpr is defined recursively. |
| Expr | Expr, an abbreviation meaning expression, can be LogicalExpr or NIL. |
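As a small illustration of the TermExpr and ternary CompareExpr forms described above, the following Python sketch builds expression strings that follow the grammar. The helper names `term_expr` and `range_expr` are hypothetical and are not part of pymilvus; the resulting strings are what you would pass as the `expr` argument of a search.

```python
from typing import Sequence, Union

Number = Union[int, float]


def term_expr(field: str, values: Sequence[Number]) -> str:
    """Build a TermExpr: IDENTIFIER "in" ConstantArray."""
    if not values:
        # The grammar requires at least one ConstantExpr in a ConstantArray.
        raise ValueError("ConstantArray must contain at least one constant")
    return f"{field} in [{','.join(str(v) for v in values)}]"


def range_expr(low: Number, field: str, high: Number, inclusive_high: bool = True) -> str:
    """Build the ternary CompareExpr form, which only allows "<" and "<="."""
    upper_op = "<=" if inclusive_high else "<"
    return f"{low} < {field} {upper_op} {high}"


print(term_expr("film_id", [2, 4, 6, 8]))  # film_id in [2,4,6,8]
print(range_expr(1, "film_id", 8))         # 1 < film_id <= 8
```

Generating the string from a list avoids hand-editing the array when the set of IDs comes from application code.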
Binary logical operators combine two expressions into a single boolean result.
| Symbol | Operation | Example | Description |
|---|---|---|---|
| and && | and | expr1 && expr2 | True if both expr1 and expr2 are true. |
| or \|\| | or | expr1 \|\| expr2 | True if either expr1 or expr2 is true. |
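The logical operators can be used to assemble larger filters from simple ones. The sketch below is one possible way to do this in Python; `all_of`, `any_of`, and `negate` are hypothetical helper names, not pymilvus functions. Each sub-expression is wrapped in parentheses, which the grammar allows, so the result does not depend on operator precedence.

```python
def all_of(*exprs: str) -> str:
    """Join expressions with "&&"; true only if every sub-expression is true."""
    return " && ".join(f"({e})" for e in exprs)


def any_of(*exprs: str) -> str:
    """Join expressions with "||"; true if at least one sub-expression is true."""
    return " || ".join(f"({e})" for e in exprs)


def negate(expr: str) -> str:
    """Apply the unary logical operator "not" to an expression."""
    return f"not ({expr})"


combined = all_of("film_id > 2", any_of("film_id in [4,6]", "film_id == 8"))
print(combined)
# (film_id > 2) && ((film_id in [4,6]) || (film_id == 8))
print(negate("film_id in [1,3]"))
# not (film_id in [1,3])
```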
Binary arithmetic operators contain two operands and can perform basic arithmetic operations and return the corresponding result.
| Symbol | Operation | Example | Description |
|---|---|---|---|
| + | Addition | a + b | Add the two operands. |
| - | Subtraction | a - b | Subtract the second operand from the first operand. |
| * | Multiplication | a * b | Multiply the two operands. |
| / | Division | a / b | Divide the first operand by the second operand. |
| ** | Power | a ** b | Raise the first operand to the power of the second operand. |
| % | Modulo | a % b | Divide the first operand by the second operand and yield the remainder portion. |
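To see how a constant arithmetic expression reduces to a single value, the following Python sketch evaluates such expressions with the standard `ast` and `operator` modules. It relies on Python's own parsing and operator semantics, which are only an approximation here: Milvus may differ in corner cases such as division of two integers or a unary minus combined with `**`. The function name `eval_constant_expr` is hypothetical.

```python
import ast
import operator

# Map Python AST operator nodes to the binary operators listed in the table.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_constant_expr(text: str):
    """Evaluate a constant arithmetic expression using Python semantics."""

    def walk(node):
        # A Constant is an INTEGER or a FLOAT.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # ConstantExpr BinaryArithOp ConstantExpr (recursive case).
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        # UnaryArithOp ConstantExpr (recursive case).
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {text!r}")

    return walk(ast.parse(text, mode="eval").body)


print(eval_constant_expr("2 ** 3"))     # 8
print(eval_constant_expr("7 % 3"))      # 1
print(eval_constant_expr("1 + 2 * 3"))  # 7
```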
Relational operators use symbols to check for equality, inequality, or relative order between two expressions.
| Symbol | Operation | Example | Description |
|---|---|---|---|
| < | Less than | a < b | True if a is less than b. |
| > | Greater than | a > b | True if a is greater than b. |
| == | Equal | a == b | True if a is equal to b. |
| != | Not equal | a != b | True if a is not equal to b. |
| <= | Less than or equal | a <= b | True if a is less than or equal to b. |
| >= | Greater than or equal | a >= b | True if a is greater than or equal to b. |
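As a rough illustration of the comparison semantics in the table above, this Python sketch maps each symbol to the matching function in Python's standard `operator` module and applies it to a list of made-up IDs. The helper `filter_ids` is hypothetical, and the evaluation happens in Python rather than in Milvus.

```python
import operator
from typing import Iterable, List

# One entry per relational operator in the table.
CMP_OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
}


def filter_ids(ids: Iterable[int], op: str, value: int) -> List[int]:
    """Keep the IDs for which `id op value` is true."""
    compare = CMP_OPS[op]
    return [i for i in ids if compare(i, value)]


film_ids = list(range(1, 11))
print(filter_ids(film_ids, ">=", 8))  # [8, 9, 10]
print(filter_ids(film_ids, "<", 3))   # [1, 2]
print(filter_ids(film_ids, "==", 5))  # [5]
```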
The following table lists the precedence and associativity of operators. Operators are listed top to bottom, in descending precedence.
| Precedence | Operator | Description | Associativity |
|---|---|---|---|
| 1 | + - | UnaryArithOp | Left-to-right |
| 2 | not | UnaryLogicalOp | Right-to-left |
| 3 | ** | BinaryArithOp | Left-to-right |
| 4 | * / % | BinaryArithOp | Left-to-right |
| 5 | + - | BinaryArithOp | Left-to-right |
| 6 | < <= > >= | CmpOp | Left-to-right |
| 7 | == != | CmpOp | Left-to-right |
| 8 | && and | BinaryLogicalOp | Left-to-right |
| 9 | \|\| or | BinaryLogicalOp | Left-to-right |
- Expressions are normally evaluated from left to right, one operation at a time. The order in which the operations are evaluated is determined by the precedence of the operators used.
- If an expression contains two or more operators with the same precedence, the leftmost operator is evaluated first, except for operators listed as right-to-left in the table above.
- When a lower-precedence operation should be processed first, enclose it in parentheses.
- Parentheses can be nested within expressions. Innermost parenthetical expressions are evaluated first.
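The effect of these rules can be tried out in plain Python, which happens to share the specific precedence relationships used below (`and` before `or`, multiplication before addition, left-to-right evaluation for operators of equal precedence). This is an analogy only: it runs in Python, not in the Milvus expression parser, and Python does not agree with the table in every row.

```python
# "and" binds tighter than "or", so this groups as a or (b and c).
a, b, c = True, False, False
default_grouping = a or b and c
forced_grouping = (a or b) and c

# Multiplication is applied before addition unless parentheses say otherwise.
default_arith = 2 + 3 * 4
forced_arith = (2 + 3) * 4

# Operators of equal precedence are applied from left to right.
left_to_right = 10 - 4 - 3
right_first = 10 - (4 - 3)

print(default_grouping, forced_grouping)  # True False
print(default_arith, forced_arith)        # 14 20
print(left_to_right, right_first)         # 3 9
```

The same reasoning applies to filter strings: `film_id > 2 || film_id < 1 && film_id != 0` groups the `&&` part first, so parentheses are needed if the `||` part should be evaluated first.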