LLM Layer - uw-ssec/llmaven GitHub Wiki
🤖 LLM Layer
Overview
The LLM Layer and Plugins architecture in LLMaven connects the platform's agent system to a flexible, high-performance inference engine. This layer empowers agents to use language models in task-specific roles while providing infrastructure to support scalable, real-time, and customizable AI assistance.
LLMaven is designed as a platform, not a framework. That means it supports interchangeable components—including models—so teams can select the tools that work best for their research workflows.
vLLM
🚀 Inference Engine: Why vLLM?
LLMaven uses vLLM as its default inference engine due to its superior performance in multi-user environments.
Key Features:
- PagedAttention: Efficient memory management for large-scale models.
- Continuous Batching: Handles multiple requests simultaneously.
- Tensor Parallelism: Speeds up model inference across GPUs.
Alternatives like Ollama are optimized for local use but fall short in high-throughput, multi-agent contexts. (Comparison by Robert McDermott)
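As a sketch of how an agent might talk to a vLLM server through its OpenAI-compatible endpoint, using only the Python standard library (the base URL, port, and model name here are illustrative assumptions, not LLMaven defaults):

```python
import json
from urllib import request

# Assumed local vLLM server exposing the OpenAI-compatible API;
# adjust the base URL and model name for your own deployment.
VLLM_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(model: str, prompt: str) -> str:
    """Send the payload to the server and return the first completion's text."""
    payload = build_chat_request(model, prompt)
    req = request.Request(
        f"{VLLM_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because vLLM speaks the OpenAI wire format, any OpenAI-compatible client library can replace the raw `urllib` call above without changing the agent code.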
🔌 Plug-and-Play (PnP) LLMs
LLMaven’s PnP architecture allows users to:
- Swap models per agent based on task type
- Integrate with custom APIs and external services
- Use agents as orchestrators of complex workflows
Each agent operates independently and communicates via the Model Context Protocol (MCP). This architecture enables rapid experimentation while maintaining guardrails.
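The per-agent model swap described above can be sketched as a small registry that binds each agent role to a model and an endpoint; the role names, model identifiers, and endpoints below are illustrative assumptions, not LLMaven's actual configuration:

```python
from dataclasses import dataclass

@dataclass
class ModelBinding:
    """One agent's model assignment: which model, served where."""
    model: str
    endpoint: str

# Illustrative bindings; a real deployment would load these from config.
registry: dict = {
    "supervisor": ModelBinding("deepseek-r1", "http://vllm:8000/v1"),
    "docs": ModelBinding("olmo", "http://vllm:8000/v1"),
}

def bind(agent: str, binding: ModelBinding) -> None:
    """Swap the model an agent uses without touching the agent itself."""
    registry[agent] = binding

def resolve(agent: str) -> ModelBinding:
    """Look up the model a given agent should call."""
    return registry[agent]
```

Keeping the binding outside the agent is what makes the swap "plug-and-play": the agent only ever calls `resolve`, so changing models is a one-line configuration change.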
🧠 Agent-Specific Model Recommendations
Below is a breakdown of the core agents and recommended open-source models:
1. Supervisor Agent
Role: Planning, reasoning, delegation across agents
Recommended Model: DeepSeek-R1
- Excels at reasoning, math, and code tasks
- Competitively benchmarked against closed models like GPT-4
2. Docs Agent
Role: Document retrieval, summarization, graph population
Recommended Model: OLMo
- Trained on transparent academic data
- Fully open-source, reproducible, and vetted for scientific usage
3. Coding Agent
Role: Code understanding, repo navigation, issue/PR summarization
Recommended Model: Llama 4
- Open-source, with strong code-generation capabilities
- Compatible with GitHub Copilot-like workflows
4. Data Agent
Role: Handles structured/unstructured datasets, storage access
Recommended Model: Molmo
- Multimodal model for interpreting text + data
- Suitable for interacting with embeddings, files, and visualizations
5. Pipeline Agent
Role: Executes and monitors custom pipelines, visual workflows
Recommended Model: DeepSeek-R1
- Excellent planner with strong multi-step reasoning performance
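The recommendations above can be captured as a single default mapping with per-deployment overrides; this is a sketch, and the model identifiers are stand-ins rather than pinned checkpoint names:

```python
from __future__ import annotations

# Default agent-to-model recommendations, mirroring the list above.
# Identifiers are illustrative stand-ins, not pinned checkpoint names.
DEFAULT_MODELS = {
    "supervisor": "deepseek-r1",  # planning, reasoning, delegation
    "docs": "olmo",               # retrieval, summarization, graph population
    "coding": "llama-4",          # code understanding, issue/PR summaries
    "data": "molmo",              # multimodal data interpretation
    "pipeline": "deepseek-r1",    # multi-step pipeline planning
}

def model_for(agent: str, overrides: dict[str, str] | None = None) -> str:
    """Return the model for an agent, honoring any per-deployment override."""
    if overrides and agent in overrides:
        return overrides[agent]
    return DEFAULT_MODELS[agent]
```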
🧱 Infrastructure Integration
LLMaven's LLM and plugin layer interacts with other platform components:
| Component | Purpose |
|---|---|
| Neo4j | Vector + knowledge graph search |
| MinIO | S3-compatible storage for datasets/models |
| Logfire | Observability into model usage + output validation |
| Open Web UI | Multi-user frontend with PostgreSQL integration |
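One way to wire these components together is through environment-driven settings; the variable names and default values below are assumptions for illustration, not LLMaven's actual configuration keys:

```python
import os
from dataclasses import dataclass, field

@dataclass
class PlatformSettings:
    """Connection settings for the services the LLM layer talks to.

    Environment variable names here are illustrative assumptions."""
    neo4j_uri: str = field(
        default_factory=lambda: os.environ.get("NEO4J_URI", "bolt://localhost:7687"))
    minio_endpoint: str = field(
        default_factory=lambda: os.environ.get("MINIO_ENDPOINT", "localhost:9000"))
    vllm_base_url: str = field(
        default_factory=lambda: os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1"))
```

Reading every connection string from the environment keeps the LLM layer deployable unchanged across local, containerized, and cluster setups.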
🧭 Design Principles Reinforced
- Modularity: Swap any model or plugin per agent.
- Reproducibility: All model configurations are tracked.
- Transparency: Rely on vetted, open models with known training data.
- Performance: Serve multiple agents simultaneously using optimized backends.
This LLM layer is what gives LLMaven its power and adaptability, transforming it from just another agentic framework into a truly research-oriented platform.