Service: S3 Sync Vectorstore - EyevinnOSC/community GitHub Wiki
Getting Started
Vector stores in OpenAI augment an Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. OpenAI automatically parses and chunks your documents, creates and stores the embeddings, and uses both vector and keyword search to retrieve relevant content to answer user queries.
With the open web service S3 Sync Vectorstore you can upload the contents of an S3-compatible bucket to a vector store in OpenAI. This tutorial helps you get started with this service.
Prerequisites
- If you have not already done so, sign up for an Eyevinn Open Source Cloud (OSC) account.
- An OpenAI account and API key.
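If you script the later steps, it helps to fail early when a credential is missing. A minimal sketch, assuming your OSC personal access token is exported as OSC_ACCESS_TOKEN (the variable used in the command line example further down) and your OpenAI key as OPENAI_API_KEY (a common convention, not something this service requires); the helper name require_env is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: return 1 if any of the named environment variables is unset or empty.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable's value (POSIX-compatible via eval).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}

# Example usage (variable names are assumptions, see above):
# require_env OSC_ACCESS_TOKEN OPENAI_API_KEY || exit 1
```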
Step 1: Store credentials as service secrets
Navigate to the S3 Sync Vectorstore service in the Eyevinn Open Source Cloud web console and select the tab titled "Service Secrets". Click the "New Secret" button.
Store the OpenAI API key as a secret.
Then create secrets for the access key id and the secret access key that give access to the S3 bucket containing the documents you want to upload to the vector store.
Step 2: Create a sync job
To create and start a job that synchronizes the contents of your S3 bucket with a vector store, select the tab "My S3 Sync Vectorstore jobs" and click the "Create Job" button.
- Name: Give the job a unique name, for example guide1.
- CmdLineArgs: Enter the S3 URL of the bucket and the id of the vector store, separated by a space.
- OpenaiApiKey: Enter the reference to the secret containing the OpenAI API key, for example {{secrets.openaidevkey}}.
- Purpose: The purpose of your vector store. Defaults to assistants if left out; consult the OpenAI documentation for valid options.
- AwsRegion: If the S3 bucket is on AWS, specify its region.
- S3Endpoint: For S3 buckets in OSC or from non-AWS vendors, enter the S3 endpoint URL here.
- AwsAccessKeyId: A reference to the secret containing the access key id.
- AwsSecretAccessKey: A reference to the secret containing the secret access key.
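The CmdLineArgs value and the secret references follow the pattern used in the command line example further down: an s3:// URL and a vector store id separated by a space, and secrets written as {{secrets.NAME}}. A minimal sketch with hypothetical helper names that builds both and rejects obviously malformed input; the vs_ prefix check is an assumption based on the vector store ids shown in this guide:

```shell
# Hypothetical helper: print the reference string for an OSC service secret.
secret_ref() {
  printf '{{secrets.%s}}\n' "$1"
}

# Hypothetical helper: print the CmdLineArgs value "<s3-url> <vector-store-id>".
build_cmdline_args() {
  bucket_url="$1"
  vector_store_id="$2"
  case "$bucket_url" in
    s3://*) ;;
    *) echo "bucket URL must start with s3://" >&2; return 1 ;;
  esac
  case "$vector_store_id" in
    vs_*) ;;
    *) echo "vector store id is expected to start with vs_" >&2; return 1 ;;
  esac
  printf '%s %s\n' "$bucket_url" "$vector_store_id"
}
```

For example, build_cmdline_args s3://origin-osaas-client-jsdocs/ vs_67c7f18585d481918824cd12b135870c prints the value used in the command line example, and secret_ref openaidevkey prints {{secrets.openaidevkey}}.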
Click "Create" to create and start the job.
Step 3: Verify result
When the job is marked as completed, go to the OpenAI platform and check that all files have been added to the vector store.
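If you prefer to check from a terminal, the OpenAI REST API returns a file_counts object (completed, failed, in_progress, total and more) for each vector store. A minimal sketch, assuming curl and jq are installed and your key is exported as OPENAI_API_KEY; the helper names are hypothetical:

```shell
# Base URL of the OpenAI REST API; override if you go through a proxy.
OPENAI_API_BASE="${OPENAI_API_BASE:-https://api.openai.com/v1}"

# Hypothetical helper: print "completed failed in_progress total" for a vector store.
vector_store_file_counts() {
  curl -sS "$OPENAI_API_BASE/vector_stores/$1" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "OpenAI-Beta: assistants=v2" |
    jq -r '.file_counts | "\(.completed) \(.failed) \(.in_progress) \(.total)"'
}

# Hypothetical helper: succeed only if the store has files and all of them completed.
sync_looks_complete() {
  # Intentionally unquoted: split the four counts into positional parameters $1..$4.
  set -- $(vector_store_file_counts "$1")
  [ "$#" -eq 4 ] && [ "$4" -gt 0 ] && [ "$1" -eq "$4" ]
}
```

Usage: sync_looks_complete vs_... && echo "all files processed". A store where some files failed or are still in progress makes the check return non-zero.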
You can now attach this vector store to your assistant through the file search tool.
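The same can be done through the Assistants API (v2) by modifying an assistant so that it has a file_search tool and lists the store under tool_resources.file_search.vector_store_ids. A minimal sketch with hypothetical helper names, assuming curl is installed, OPENAI_API_KEY is exported and you already have an assistant id:

```shell
# Hypothetical helper: print the JSON body that enables file search on an assistant
# and points it at a single vector store.
file_search_body() {
  printf '{"tools":[{"type":"file_search"}],"tool_resources":{"file_search":{"vector_store_ids":["%s"]}}}\n' "$1"
}

# Hypothetical helper: attach a vector store to an existing assistant (POST modifies it).
# The tools array is sent as a whole, so add any other tools the assistant already uses.
attach_vector_store() {
  assistant_id="$1"
  vector_store_id="$2"
  curl -sS "${OPENAI_API_BASE:-https://api.openai.com/v1}/assistants/$assistant_id" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -H "OpenAI-Beta: assistants=v2" \
    -d "$(file_search_body "$vector_store_id")"
}
```

Usage: attach_vector_store asst_... vs_... with your own assistant and vector store ids.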
Using the Command Line tool
To automate this process, you can use the OSC command line tool to create a job.
% export OSC_ACCESS_TOKEN=<your-osc-pat>
% npx @osaas/cli@latest create eyevinn-s3-sync-vectorstore guide2 \
-o cmdLineArgs="s3://origin-osaas-client-jsdocs/ vs_67c7f18585d481918824cd12b135870c" \
-o OpenaiApiKey="{{secrets.openaidevkey}}" \
-o AwsRegion="eu-north-1" \
-o AwsAccessKeyId="{{secrets.accesskeyid}}" \
-o AwsSecretAccessKey="{{secrets.secretaccesskey}}"
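To reuse the command above for other buckets or vector stores, a small shell function can fill in the variable parts. A sketch with hypothetical names (create_sync_job, osc_cli); the secret names are the ones from the example above and must match the secrets you created in Step 1:

```shell
# Hypothetical wrapper that runs the OSC CLI through npx.
# Redefine it (for example as a function that prints its arguments) for a dry run.
osc_cli() {
  npx -y @osaas/cli@latest "$@"
}

# Hypothetical helper: create an S3 Sync Vectorstore job for an AWS-hosted bucket.
create_sync_job() {
  job_name="$1"        # unique job name, e.g. guide3
  bucket_url="$2"      # s3:// URL of the bucket
  vector_store_id="$3" # OpenAI vector store id
  aws_region="$4"      # AWS region of the bucket
  osc_cli create eyevinn-s3-sync-vectorstore "$job_name" \
    -o cmdLineArgs="$bucket_url $vector_store_id" \
    -o OpenaiApiKey="{{secrets.openaidevkey}}" \
    -o AwsRegion="$aws_region" \
    -o AwsAccessKeyId="{{secrets.accesskeyid}}" \
    -o AwsSecretAccessKey="{{secrets.secretaccesskey}}"
}
```

Usage: with OSC_ACCESS_TOKEN exported, create_sync_job guide3 s3://my-bucket/ vs_... eu-north-1. Quoting "$bucket_url $vector_store_id" keeps both values in a single cmdLineArgs argument, as in the original command.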
GitHub action
You can also run this from a GitHub Actions workflow, for example to update the vector store whenever new documentation has been generated.
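# Job fragment for the jobs: section of a workflow; build_docs is assumed to be the job that generates the documentation.
rag:
  needs: build_docs
  runs-on: ubuntu-latest
  steps:
    - uses: actions/setup-node@v3
      with:
        node-version: '18.x'
        registry-url: https://registry.npmjs.org/
    - name: Create OSC job to sync vector store
      # {{secrets.NAME}} (without a leading $) is not a GitHub expression; it is passed
      # as-is to OSC and resolved from the service secrets stored in OSC.
      run: |
        # Remove the job from a previous run, if any, so the name ghsync can be reused
        npx -y @osaas/cli@latest remove eyevinn-s3-sync-vectorstore ghsync || true
        npx -y @osaas/cli@latest create eyevinn-s3-sync-vectorstore ghsync \
          -o cmdLineArgs="s3://origin-osaas-client-jsdocs/ vs_67c84ced60c48191affdb63969dd5494" \
          -o OpenaiApiKey="{{secrets.openaikey}}" \
          -o AwsRegion="eu-north-1" \
          -o AwsAccessKeyId="{{secrets.accesskeyid}}" \
          -o AwsSecretAccessKey="{{secrets.secretaccesskey}}"
      env:
        # OSC personal access token stored as a GitHub Actions repository secret
        OSC_ACCESS_TOKEN: ${{ secrets.OSC_ACCESS_TOKEN }}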