RAG Collection - codingismycraft/ragit GitHub Wiki
A RAG collection is a fundamental component of the RAGit system. It is uniquely
identified by a collection name, or simply name. This document outlines the
steps involved in creating and managing a custom RAG collection.
A RAG collection is a set of documents stored under the shared
directory (ragit-data). Assuming we have a collection called mydata, its
related data will exist under the following directory:
~/ragit-data/mydata/documents
To create a new RAG collection, you need to prepare the documents directory where your collection's documents will be stored.
Create a directory to store your collection's documents:
mkdir -p ~/ragit-data/<collection-name>/documents
Replace <collection-name> with the desired name for your collection.
After you create the above directory, copy all relevant documents into the
newly created documents directory:
cp path/to/your/documents/* ~/ragit-data/<collection-name>/documents/
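The two steps above can be combined into a small helper. This is a minimal sketch, not part of RAGit: the function name create_collection and the RAGIT_DATA_ROOT variable are hypothetical, introduced only so the target directory can be overridden for a dry run. By default it uses ~/ragit-data as described above.

```shell
# Hypothetical helper (not shipped with RAGit): create the documents
# directory for a collection and copy the source files into it.
# RAGIT_DATA_ROOT is an assumed, script-local override; when unset,
# the documented ~/ragit-data location is used.
create_collection() {
    name="$1"
    src="$2"
    root="${RAGIT_DATA_ROOT:-$HOME/ragit-data}"
    docs="$root/$name/documents"

    if [ -z "$name" ] || [ ! -d "$src" ]; then
        echo "usage: create_collection <collection-name> <source-dir>" >&2
        return 1
    fi

    # Same as: mkdir -p ~/ragit-data/<collection-name>/documents
    mkdir -p "$docs" || return 1

    # Same as: cp path/to/your/documents/* ..., restricted to regular files.
    for f in "$src"/*; do
        if [ -f "$f" ]; then
            cp "$f" "$docs/" || return 1
        fi
    done

    # Print the resulting documents directory.
    echo "$docs"
}
```

For example, `create_collection mydata path/to/your/documents` would populate ~/ragit-data/mydata/documents and print that path.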
The ragit command is available from anywhere inside the VM and can be used to
interact with the backend of the RAGit service. It provides the following
functionality:
List all available RAG collections using the following command:
ragit -l
Example output:
dummy
mycode
stories
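If the listing prints one collection name per line, as in the example output above, it can drive a loop over all collections. The sketch below makes that assumption. The function name for_each_collection is hypothetical, and the only ragit options used are the ones described in this document.

```shell
# Hypothetical helper: read collection names from stdin (one per line)
# and run the given command once per name, appending the name as the
# last argument.
for_each_collection() {
    while IFS= read -r name; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        # Redirect stdin so the command cannot consume the remaining names.
        "$@" "$name" </dev/null
    done
}
```

Under the same assumption, `ragit -l | for_each_collection ragit -n` would run `ragit -n <collection-name>` for every listed collection.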
Display statistics for a specific RAG collection using the following command,
replacing <collection-name> with your collection's name:
ragit -n <collection-name>
Example output:
name.....................: stories
full path................: /home/vagrant/ragit-data/stories/documents
total documents..........: 4
total documents in db....: 4
total chunks.............: 21
with embeddings..........: 21
without embeddings.......: 0
inserted to vectordb.....: 21
to insert to vector db...: 0
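For scripting, a single value can be pulled out of this report. The following is a sketch that assumes the output keeps the "label....: value" layout shown in the example. The function name stat_value is hypothetical, and it matches on the start of each line, so the label passed in should be long enough to be unambiguous.

```shell
# Hypothetical helper: read a statistics report on stdin and print the
# value of the first line that starts with the given label.
stat_value() {
    label="$1"
    # Split each line on the colon (plus any spaces after it); the value
    # is the second field.
    awk -F': *' -v label="$label" 'index($0, label) == 1 { print $2; exit }'
}
```

Assuming that layout, `ragit -n stories | stat_value 'to insert to vector db'` would print the number of chunks still waiting to be inserted into the vector database.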
Process the available documents for a specific RAG collection using the
following command, replacing <collection-name> with your collection's name:
ragit -n <collection-name> -p
Example output:
Will insert all available chunks to the database.
Inserted 0 chunks.
Will insert all available embeddings to the database.
Inserted 0 embeddings.
updating the vector db.
Totally inserted records: 0
Inserted 0 chunks to the vector db.
By following these steps, you can create and manage a custom RAG collection within the RAGit framework. This process involves setting up the documents directory, copying relevant documents, and using RAGit's command-line tools to process and manage your collection. This ensures that your data is properly indexed and ready for use in RAG-based applications.