LLM Layer - uw-ssec/llmaven GitHub Wiki
🤖 LLM Layer
Overview
The LLM Layer and Plugins architecture in LLMaven connects the platform's agent system to a flexible, high-performance inference engine. This layer empowers agents to use language models in task-specific roles while providing infrastructure to support scalable, real-time, and customizable AI assistance.
LLMaven is designed as a platform, not a framework. That means it supports interchangeable components—including models—so teams can select the tools that work best for their research workflows.
vLLM
🚀 Inference Engine: Why vLLM?
LLMaven uses vLLM as its default inference engine due to its superior performance in multi-user environments.
Key Features:
- PagedAttention: Efficient memory management for large-scale models.
- Continuous Batching: Handles multiple requests simultaneously.
- Tensor Parallelism: Speeds up model inference across GPUs.
Alternatives like Ollama are optimized for local use but fall short in high-throughput, multi-agent contexts. (Comparison by Robert McDermott)
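As a sketch of how an agent might talk to a vLLM server through its OpenAI-compatible endpoint, using only the Python standard library (the base URL, port, and model name here are illustrative assumptions, not LLMaven defaults):

```python
import json
from urllib import request

# Assumed local vLLM server exposing the OpenAI-compatible API;
# adjust the base URL and model name for your own deployment.
VLLM_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(model: str, prompt: str) -> str:
    """Send the payload to the server and return the first completion's text."""
    payload = build_chat_request(model, prompt)
    req = request.Request(
        f"{VLLM_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because vLLM speaks the OpenAI wire format, any OpenAI-compatible client library can replace the raw `urllib` call above without changing the agent code.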
🔌 Plug-and-Play (PnP) LLMs
LLMaven’s PnP architecture allows users to:
- Swap models per agent based on task type
- Integrate with custom APIs and external services
- Use agents as orchestrators of complex workflows
Each agent operates independently and communicates via the Model Context Protocol (MCP). This architecture enables rapid experimentation while maintaining guardrails.
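The per-agent model swap described above can be sketched as a small registry that binds each agent role to a model and an endpoint; the role names, model identifiers, and endpoints below are illustrative assumptions, not LLMaven's actual configuration:

```python
from dataclasses import dataclass

@dataclass
class ModelBinding:
    """One agent's model assignment: which model, served where."""
    model: str
    endpoint: str

# Illustrative bindings; a real deployment would load these from config.
registry: dict = {
    "supervisor": ModelBinding("deepseek-r1", "http://vllm:8000/v1"),
    "docs": ModelBinding("olmo", "http://vllm:8000/v1"),
}

def bind(agent: str, binding: ModelBinding) -> None:
    """Swap the model an agent uses without touching the agent itself."""
    registry[agent] = binding

def resolve(agent: str) -> ModelBinding:
    """Look up the model a given agent should call."""
    return registry[agent]
```

Keeping the binding outside the agent is what makes the swap "plug-and-play": the agent only ever calls `resolve`, so changing models is a one-line configuration change.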
🧠 Agent-Specific Model Recommendations
Below is a breakdown of the core agents and recommended open-source models:
1. Supervisor Agent
Role: Planning, reasoning, delegation across agents
Recommended Model: DeepSeek-R1
- Excels at reasoning, math, and code tasks
- Competitively benchmarked against closed models like GPT-4
2. Docs Agent
Role: Document retrieval, summarization, graph population
Recommended Model: OLMo
- Trained on transparent academic data
- Fully open-source, reproducible, and vetted for scientific usage
3. Coding Agent
Role: Code understanding, repo navigation, issue/PR summarization
Recommended Model: Llama 4
- Open-source, with strong code-generation capabilities
- Compatible with GitHub Copilot-like workflows
4. Data Agent
Role: Handles structured/unstructured datasets, storage access
Recommended Model: Molmo
- Multimodal model for interpreting text + data
- Suitable for interacting with embeddings, files, and visualizations
5. Pipeline Agent
Role: Executes and monitors custom pipelines, visual workflows
Recommended Model: DeepSeek-R1
- Excellent planner with strong multi-step reasoning performance
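The recommendations above can be captured as a single default mapping with per-deployment overrides; this is a sketch, and the model identifiers are stand-ins rather than pinned checkpoint names:

```python
from __future__ import annotations

# Default agent-to-model recommendations, mirroring the list above.
# Identifiers are illustrative stand-ins, not pinned checkpoint names.
DEFAULT_MODELS = {
    "supervisor": "deepseek-r1",  # planning, reasoning, delegation
    "docs": "olmo",               # retrieval, summarization, graph population
    "coding": "llama-4",          # code understanding, issue/PR summaries
    "data": "molmo",              # multimodal data interpretation
    "pipeline": "deepseek-r1",    # multi-step pipeline planning
}

def model_for(agent: str, overrides: dict[str, str] | None = None) -> str:
    """Return the model for an agent, honoring any per-deployment override."""
    if overrides and agent in overrides:
        return overrides[agent]
    return DEFAULT_MODELS[agent]
```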
🧱 Infrastructure Integration
LLMaven's LLM and plugin layer interacts with other platform components:
| Component | Purpose |
|---|---|
| Neo4j | Vector + knowledge graph search |
| MinIO | S3-compatible storage for datasets/models |
| Logfire | Observability into model usage + output validation |
| Open Web UI | Multi-user frontend with PostgreSQL integration |
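One way to wire these components together is through environment-driven settings; the variable names and default values below are assumptions for illustration, not LLMaven's actual configuration keys:

```python
import os
from dataclasses import dataclass, field

@dataclass
class PlatformSettings:
    """Connection settings for the services the LLM layer talks to.

    Environment variable names here are illustrative assumptions."""
    neo4j_uri: str = field(
        default_factory=lambda: os.environ.get("NEO4J_URI", "bolt://localhost:7687"))
    minio_endpoint: str = field(
        default_factory=lambda: os.environ.get("MINIO_ENDPOINT", "localhost:9000"))
    vllm_base_url: str = field(
        default_factory=lambda: os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1"))
```

Reading every connection string from the environment keeps the LLM layer deployable unchanged across local, containerized, and cluster setups.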
🧭 Design Principles Reinforced
- Modularity: Swap any model or plugin per agent.
- Reproducibility: All model configurations are tracked.
- Transparency: Rely on vetted, open models with known training data.
- Performance: Serve multiple agents simultaneously using optimized backends.
This LLM layer is what gives LLMaven its power and adaptability, transforming it from just another agentic framework into a truly research-oriented platform.