Technical Architecture - davidmarsoni/llog GitHub Wiki
:wrench: General Information
This page is dedicated to the technical architecture of Llog. It provides an overview of the technology stack and the project architecture.
:computer: Technology Stack
:snake: Web framework: Flask
Flask is a lightweight WSGI web application framework in Python. It is designed with simplicity and flexibility in mind, making it a great choice for building our project.
It allows us to create a simple and efficient web application with minimal overhead and deploy it easily on Google Cloud Run.
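A minimal sketch of the kind of Flask app this implies (the route name is illustrative, not the project's actual endpoint):

```python
import os

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Simple liveness endpoint; Cloud Run can probe this to check the container.
    return jsonify(status="ok")

# On Cloud Run, the container must listen on the port given by $PORT, e.g.:
#   app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

Keeping the app stateless like this is what lets Cloud Run scale it freely.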
:llama: Query framework: LlamaIndex
LlamaIndex is an LLM contextualization framework that allows you to make use of different data sources to provide context to a chosen LLM.
This framework, available in Python and TypeScript, allows you the following:
- Query LLMs with data taken from online or local resources through “Context Augmentation”.
- Make use of agents, which extend an LLM's reasoning by adding a planning and tool-use layer on top of regular prompts. Reasoning models such as DeepSeek R1 or OpenAI's o1 expose a similar chain-of-thought natively, but agents let you build this step-by-step behaviour around any model.
- Add an additional layer on agents by creating workflows, which use a multitude of steps and agents to complete complex tasks.
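LlamaIndex handles retrieval and indexing for you, but the core idea of context augmentation can be sketched in plain Python: find the documents most relevant to the query and prepend them to the prompt. The keyword-overlap scoring here is deliberately naive, just to show the shape of the technique, not how LlamaIndex scores documents:

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase and split text into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = tokenize(query)
    scored = sorted(documents,
                    key=lambda doc: len(q_words & tokenize(doc)),
                    reverse=True)
    return scored[:top_k]

def augment_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved documents to the question as context."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Flask is a lightweight Python web framework.",
    "Google Cloud Run scales stateless containers.",
    "Tavily is a web search engine for agents.",
]
```

In the real project, LlamaIndex replaces the toy `retrieve` with vector-based similarity search over the indexed files.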
:mag: Tavily
Tavily is a search API designed for AI agents. In our project, it provides the web search tool used by our agents.
:robot: LLM model: OpenAI
OpenAI is one of the most popular LLM providers, offering a wide range of models, including the GPT-3.5 and GPT-4 families.
We decided to use OpenAI models in our project to take advantage of the latest advancements in LLM technology. The OpenAI API is straightforward to use and offers many features that are useful for our project.
For more information about the pricing of these models, you can refer to the OpenAI pricing page.
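Since OpenAI bills per token, it is worth estimating cost before choosing a model. A rough estimator, with placeholder per-million-token rates (check the pricing page for the real numbers):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate one request's cost in dollars from token counts.

    Prices are per one million tokens; the rates used below are
    placeholders, not OpenAI's actual prices.
    """
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# Example with placeholder rates of $0.50 input / $1.50 output per million tokens.
cost = estimate_cost(2_000, 500, 0.50, 1.50)
```

This kind of back-of-the-envelope check is useful when deciding how much retrieved context to stuff into each prompt.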
:cloud: Cloud storage: Google Cloud Storage
Google Cloud Storage is a scalable and secure object storage service. It is used to store the data files for our project.
In our project, we use Google Cloud Storage to store the cached data files of our Notion database and the data files for our LLM models.
More information about the Google Cloud Storage pricing can be found on the Google Cloud Storage pricing page.
:rocket: Cloud deployment: Google Cloud Run
Google Cloud Run is a fully managed compute platform that automatically scales your stateless containers. It is used to deploy our project on the cloud.
Google Cloud Run also offers a free tier generous enough to host our project. For more information about the free tier, refer to the Google Cloud Run pricing page.
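Cloud Run runs any container that listens on `$PORT`, so the deployment can be as small as a Dockerfile like the following (the `gunicorn` server and the `app:app` entry point are assumptions about the project layout):

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run injects $PORT; the container must listen on it.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
```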
:building_construction: Project Architecture
The architecture of our project is based on the following components:
:mag: RAG (Retrieval-Augmented Generation)
This part is composed of the following two components:
- The text data source import, which is done by parsing the document and then generating the index file, metadata file, and data file.
- The Notion page or database import, which is done by parsing the Notion page via the Notion API and then generating the index file, metadata file, and data file.
After creating the three files, the data is stored in the Google Cloud Storage bucket.
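The resulting layout per imported document can be sketched as follows. The code writes to a local directory for illustration; in the real project, the same three files go to the Google Cloud Storage bucket, and the exact file names are assumptions:

```python
import json
from pathlib import Path

def store_import(doc_id: str, index: dict, metadata: dict, data: str,
                 root: Path) -> list[Path]:
    """Write the three files produced by an import: index, metadata, data.

    Written locally here for illustration; the real project uploads the
    same layout to a Google Cloud Storage bucket.
    """
    folder = root / doc_id
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, content in [("index.json", index), ("metadata.json", metadata)]:
        path = folder / name
        path.write_text(json.dumps(content))
        paths.append(path)
    data_path = folder / "data.txt"
    data_path.write_text(data)
    paths.append(data_path)
    return paths
```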
To generate rich metadata for these files, we prompt an OpenAI model to produce it automatically.
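The shape of such a metadata-generation prompt might look like this (the requested fields and wording are assumptions, not the project's actual prompt):

```python
def metadata_prompt(filename: str, excerpt: str) -> str:
    """Build a prompt asking an LLM to generate search metadata for a file.

    The requested fields are illustrative; the real prompt may differ.
    """
    return (
        "You are generating search metadata for a document.\n"
        f"File name: {filename}\n"
        f"Excerpt:\n{excerpt}\n\n"
        "Return JSON with keys: title, summary, topics, language."
    )
```

Asking for structured JSON output makes the model's answer easy to store directly as the metadata file.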
:rocket: Agentic app
The agents part is composed of three agents:
- The query agent, which has four tools:
- analyze query: lets the agent determine whether the query can be answered directly from the conversation history or whether it needs to query the data sources.
- metadata search: allows the agent to filter the indexes by metadata to get the most relevant indexes.
- context search: allows the agent to filter the content of the indexes to get the most relevant content.
- web search: allows the agent to search the web with Tavily to get the most relevant content.
- The review agent, which has three tools:
- context checker: allows the agent to check if the generated answer matches the context of the question.
- instruction checker: allows the agent to check that the generated answer follows the instructions in the question. For example, if the user asks for code, the agent checks that the answer contains code; if the user asks for an answer in a specific language, it checks that the answer is written in that language.
- word counter: allows the agent to check that the length of the answer matches the user's request. For example, if the user asks for a text of 100 words, the agent checks that the generated answer is about 100 words long.
- The write agent, which has no tools but follows a custom prompt to generate the final answer for the user.
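Taken together, the three agents form a draft-review-revise loop. A minimal sketch, with the agents as plain callables and a toy word-counter check standing in for the review tools (the orchestration shape, function names, and the 10% tolerance are all assumptions, not the project's actual code):

```python
def word_counter(answer: str, requested: int, tolerance: float = 0.1) -> bool:
    """Review tool: does the answer's word count match the user's request?

    The 10% relative tolerance is an assumption for illustration.
    """
    return abs(len(answer.split()) - requested) <= requested * tolerance

def run_pipeline(question, query_agent, review_agent, write_agent,
                 max_revisions=2):
    """Chain the agents: gather context, draft an answer, review, revise."""
    context = query_agent(question)
    answer = write_agent(question, context)
    for _ in range(max_revisions):
        feedback = review_agent(question, answer)
        if feedback is None:  # all review checks passed
            return answer
        # Feed the reviewer's feedback back to the write agent for revision.
        answer = write_agent(question, context + "\nFix: " + feedback)
    return answer
```

Bounding the number of revisions keeps the loop from ping-ponging forever between the review and write agents.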