# llama.cpp reranker (mostlygeek/llama-swap GitHub Wiki)
Configuration for supporting the `/v1/rerank` endpoint with `llama-server` and the BGE reranker v2 model.

- Download the model from gpustack/bge-reranker-v2-m3-GGUF
## Config

```yaml
models:
  "reranker":
    env:
      - "CUDA_VISIBLE_DEVICES=GPU-eb1"
    cmd: |
      /path/to/llama-server/llama-server-latest
      --port ${PORT}
      -ngl 99
      -m /path/to/models/bge-reranker-v2-m3-Q4_K_M.gguf
      --ctx-size 8192
      --reranking
      --no-mmap
```
> [!TIP]
> `path.to.sh` is used for the `/path/to/models/...` paths in this example.
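With the configuration saved to a file, llama-swap can be started pointing at it. The filename and listen address below are assumptions for illustration; check the llama-swap README for the exact flags supported by your version:

```shell
# Start llama-swap with the config above.
# "config.yaml" and ":8080" are placeholders; adjust for your setup.
llama-swap --config config.yaml --listen :8080
```

llama-swap then loads the `reranker` model on demand when a request names it.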
## Testing

```shell
$ curl -s http://10.0.1.50:8080/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "reranker",
    "query": "What is the best way to learn Python?",
    "documents": [
      "Python is a popular programming language used for web development and data analysis.",
      "The best way to learn Python is through online courses and practice.",
      "Python is also used for artificial intelligence and machine learning applications.",
      "To learn Python, start with the basics and build small projects to gain experience."
    ],
    "max_reranked": 2
  }' | jq .
```
### Output

```json
{
  "model": "reranker",
  "object": "list",
  "usage": {
    "prompt_tokens": 110,
    "total_tokens": 110
  },
  "results": [
    {
      "index": 0,
      "relevance_score": -2.9403347969055176
    },
    {
      "index": 1,
      "relevance_score": 7.181779861450195
    },
    {
      "index": 2,
      "relevance_score": -4.595512866973877
    },
    {
      "index": 3,
      "relevance_score": 3.0560922622680664
    }
  ]
}
```
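Each result's `index` refers back to a position in the input `documents` array, and a higher `relevance_score` indicates a better match. A minimal Python sketch (the helper name `rank_documents` is illustrative, and the scores are copied from the response above) showing how to join scores back to documents and take the top results:

```python
def rank_documents(documents, results):
    """Pair each document with its relevance_score and sort best-first."""
    scored = [(r["relevance_score"], documents[r["index"]]) for r in results]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

documents = [
    "Python is a popular programming language used for web development and data analysis.",
    "The best way to learn Python is through online courses and practice.",
    "Python is also used for artificial intelligence and machine learning applications.",
    "To learn Python, start with the basics and build small projects to gain experience.",
]

# "results" copied verbatim from the example response above.
results = [
    {"index": 0, "relevance_score": -2.9403347969055176},
    {"index": 1, "relevance_score": 7.181779861450195},
    {"index": 2, "relevance_score": -4.595512866973877},
    {"index": 3, "relevance_score": 3.0560922622680664},
]

# Print the two highest-scoring documents.
for score, doc in rank_documents(documents, results)[:2]:
    print(f"{score:+.3f}  {doc}")
```

Note that the scores are not normalized probabilities (they can be negative); only their relative order matters for ranking.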