
🤖 LLM Layer

Overview

The LLM Layer and Plugins architecture in LLMaven connects the platform's agent system to a flexible, high-performance inference engine. This layer empowers agents to use language models in task-specific roles while providing infrastructure to support scalable, real-time, and customizable AI assistance.

LLMaven is designed as a platform, not a framework. That means it supports interchangeable components—including models—so teams can select the tools that work best for their research workflows.

🚀 Inference Engine: vLLM

Why vLLM?

LLMaven uses vLLM as its default inference engine due to its superior performance in multi-user environments.

Key Features:

  • PagedAttention: Manages the attention KV cache in fixed-size pages, reducing memory fragmentation so more concurrent requests fit on each GPU.
  • Continuous Batching: Merges incoming requests into in-flight batches on the fly, keeping throughput high when many users or agents query the server at once.
  • Tensor Parallelism: Shards model weights across multiple GPUs to serve larger models and reduce per-token latency.

Alternatives such as Ollama are optimized for local, single-user use and fall short in high-throughput, multi-agent contexts (see the comparison by Robert McDermott).
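
As a rough sketch of how an agent would talk to a vLLM deployment, the snippet below calls the OpenAI-compatible endpoint that vLLM exposes (for example, a server started with `vllm serve <model>`). The base URL, model name, and prompt are placeholders, not actual LLMaven configuration.

```python
# Minimal sketch: querying a vLLM server through its OpenAI-compatible API.
# The base_url, model name, and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of the vLLM server
    api_key="EMPTY",                      # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",      # whichever model the server was launched with
    messages=[{"role": "user", "content": "Outline a plan for reproducing this analysis."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI API, an agent can switch between vLLM-served models by changing only the base URL and model name.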

🔌 Plug-and-Play (PnP) LLMs

LLMaven’s PnP architecture allows users to:

  • Swap models per agent based on task type
  • Integrate with custom APIs and external services
  • Use agents as orchestrators of complex workflows

Each agent operates independently and communicates via the Model Context Protocol (MCP). This architecture enables rapid experimentation while maintaining guardrails.
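
To illustrate the MCP side of this, the sketch below shows an agent-side client built with the reference MCP Python SDK (`mcp` package). The server command and tool name are hypothetical placeholders, not part of LLMaven.

```python
# Minimal sketch of an agent-side MCP client using the reference Python SDK.
# The server command ("docs_server.py") and tool name ("search_docs") are
# hypothetical placeholders.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(command="python", args=["docs_server.py"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()   # discover what the server offers
            print([tool.name for tool in tools.tools])
            result = await session.call_tool("search_docs", arguments={"query": "ocean salinity"})
            print(result)

asyncio.run(main())
```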

🧠 Agent-Specific Model Recommendations

Below is a breakdown of the core agents and recommended open-source models:

1. Supervisor Agent

Role: Planning, reasoning, delegation across agents
Recommended Model: DeepSeek-R1

  • Excels at reasoning, math, and code tasks
  • Competitively benchmarked against closed models like GPT-4

2. Docs Agent

Role: Document retrieval, summarization, graph population
Recommended Model: OLMo

  • Trained on openly released, fully documented data
  • Fully open-source and reproducible, making it well suited to scientific use

3. Coding Agent

Role: Code understanding, repo navigation, issue/PR summarization
Recommended Model: Llama 4

  • Openly available (open weights) with strong code-generation capabilities
  • Compatible with GitHub Copilot-like workflows

4. Data Agent

Role: Handles structured/unstructured datasets, storage access
Recommended Model: Molmo

  • A family of open multimodal models that interpret text alongside images and other data
  • Suitable for interacting with embeddings, files, and visualizations

5. Pipeline Agent

Role: Executes and monitors custom pipelines, visual workflows
Recommended Model: DeepSeek-R1

  • Excellent planner with strong multi-step reasoning performance
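
To make these recommendations concrete, a per-agent model registry might look like the sketch below. The structure, endpoint URLs, and Hugging Face model identifiers are illustrative, not LLMaven's actual configuration schema.

```python
# Hypothetical per-agent model registry (illustrative only, not LLMaven's schema).
# Each agent maps to an open model and the vLLM endpoint that serves it,
# so swapping a model is a one-line change.
AGENT_MODELS = {
    "supervisor": {"model": "deepseek-ai/DeepSeek-R1", "endpoint": "http://vllm-reasoning:8000/v1"},
    "docs":       {"model": "allenai/OLMo-2-1124-13B-Instruct", "endpoint": "http://vllm-docs:8000/v1"},
    "coding":     {"model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", "endpoint": "http://vllm-code:8000/v1"},
    "data":       {"model": "allenai/Molmo-7B-D-0924", "endpoint": "http://vllm-multimodal:8000/v1"},
    "pipeline":   {"model": "deepseek-ai/DeepSeek-R1", "endpoint": "http://vllm-reasoning:8000/v1"},
}

def resolve_model(agent_name: str) -> dict:
    """Look up the model/endpoint pair configured for a given agent."""
    return AGENT_MODELS[agent_name]
```

An agent would then build its OpenAI-compatible client from `resolve_model(...)`, keeping model choice independent of agent logic.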

🧱 Infrastructure Integration

LLMaven's LLM and plugin layer interacts with other platform components:

  • Neo4j: Vector + knowledge graph search
  • MinIO: S3-compatible storage for datasets/models
  • Logfire: Observability into model usage + output validation
  • Open WebUI: Multi-user frontend with PostgreSQL integration
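
As a rough illustration of how a plugin could reach these backends, the sketch below queries a Neo4j vector index and pulls an object from MinIO. Hostnames, credentials, and index/bucket/object names are placeholders, and the vector search assumes a Neo4j 5 vector index already exists.

```python
# Illustrative plugin sketch: hostnames, credentials, and index/bucket names
# are placeholders, not LLMaven defaults.
from neo4j import GraphDatabase
from minio import Minio

# Vector + knowledge-graph search in Neo4j (assumes a vector index named
# "doc_embeddings" was created ahead of time).
driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password"))
with driver.session() as session:
    records = session.run(
        "CALL db.index.vector.queryNodes('doc_embeddings', 5, $embedding) "
        "YIELD node, score RETURN node.title AS title, score",
        embedding=[0.0] * 768,  # query embedding computed elsewhere
    )
    for record in records:
        print(record["title"], record["score"])
driver.close()

# Dataset/model artifacts in MinIO via its S3-compatible API.
store = Minio("minio:9000", access_key="minioadmin", secret_key="minioadmin", secure=False)
store.fget_object("datasets", "example/train.parquet", "/tmp/train.parquet")
```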

🧭 Design Principles Reinforced

  • Modularity: Swap any model or plugin per agent.
  • Reproducibility: All model configurations are tracked.
  • Transparency: Rely on vetted, open models with known training data.
  • Performance: Serve multiple agents simultaneously using optimized backends.

📚 Further Reading

This LLM layer is what gives LLMaven its power and adaptability—transforming it from just another agentic framework into a truly research-oriented platform.