Timmy AI Assistant
Timmy is TMI's planned conversational AI assistant for threat model analysis. Timmy operates within the scope of a single threat model and reasons over its data -- assets, threats, diagrams, documents, repositories, and notes -- to help you understand, analyze, and improve your threat models.
Status: Timmy is under active development on the `dev/1.4.0` branch. The backend is functional: chat API endpoints, LLM integration via LangChainGo, vector embedding pipeline, content providers, and streaming responses are all implemented. The frontend chat UI is not yet implemented. See Implementation Status below for details.
Development demo video (YouTube)
Purpose
Timmy is inspired by Google's NotebookLM: a "grounded" chat that reasons over specific sources rather than answering from general knowledge alone. You control which sub-entities are included in the conversation via each sub-entity's `timmy_enabled` flag, letting you focus the discussion on relevant material.
Problems Timmy Solves
Threat models are dense and hard to reason about holistically
A mature threat model contains dozens of assets, threats, data flows, and supporting documents. Humans struggle to hold all of that in mind simultaneously. Timmy can synthesize across the full model and surface connections, gaps, or inconsistencies that a person might miss.
Security review is bottlenecked on expert availability
Not every team has a senior security reviewer on hand. Timmy acts as an always-available collaborator -- it cannot replace a human reviewer, but it can help teams self-serve on initial analysis, ask better questions, and arrive at a review better prepared.
Threat modeling artifacts are underutilized after creation
Teams build threat models and then rarely revisit them conversationally. Timmy makes the model queryable: "What are the highest-risk data flows?", "Which assets lack mitigations?", "Summarize the threats related to authentication."
Onboarding to an existing threat model is slow
A new team member or reviewer joining a threat model must read through everything. Timmy can provide guided summaries and answer targeted questions, dramatically reducing ramp-up time.
How Users Will Interact with Timmy
You navigate to a threat model's chat page, see your sources (sub-entities) in a sidebar, toggle which ones to include, and have a conversation. You can ask Timmy to:
- Analyze threats -- identify highest-risk areas, evaluate threat severity, and assess coverage
- Identify gaps -- find assets without threats, threats without mitigations, and incomplete data flows
- Explain data flows -- summarize how data moves through the system based on DFD diagrams
- Suggest mitigations -- recommend security controls based on identified threats
- Summarize content -- provide overviews of the threat model or specific sub-entities
- Answer questions -- respond to targeted queries about any aspect of the threat model
Previous chat sessions will be preserved and can be resumed.
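Until the UI lands, the backend endpoints can be exercised directly. Below is a minimal Go sketch of sending one message and printing the streamed SSE tokens; the route, payload shape, and headers are assumptions for illustration only (see REST-API-Reference for the actual contract):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical route and payload shape -- see REST-API-Reference for
	// the actual session and message endpoints.
	threatModelID, sessionID := "<threat-model-id>", "<session-id>"
	endpoint := fmt.Sprintf(
		"https://tmi.example.com/threat_models/%s/timmy/sessions/%s/messages",
		threatModelID, sessionID)
	body := []byte(`{"content": "Which assets lack mitigations?"}`)

	req, err := http.NewRequest("POST", endpoint, bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer <token>")
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "text/event-stream")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// SSE frames arrive as "data: <chunk>" lines; print tokens as they stream.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "data: ") {
			fmt.Print(strings.TrimPrefix(line, "data: "))
		}
	}
}
```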
Implementation Status
Completed (`dev/1.4.0`)
- `timmy_enabled` field on all threat model sub-entity types: diagrams, assets, threats, documents, notes, and repositories. Defaults to `true`. Also present on team notes and project notes
- Database models for chat sessions, messages, embeddings, and usage tracking (`TimmySession`, `TimmyMessage`, `TimmyEmbedding`, `TimmyUsage`)
- Database schema definitions for the four Timmy tables with proper indexes and foreign key constraints
- Server configuration (`TimmyConfig`) with settings for LLM provider/model, dual embedding providers (text + code), retrieval parameters, rate limits, memory budgets, and chunking. See Configuration-Reference#timmy-ai-assistant for all variables. Timmy is disabled by default
- Chat API endpoints -- REST endpoints for creating sessions (with SSE progress), sending messages (with SSE token streaming), listing sessions, and listing message history
- LLM integration via LangChainGo -- provider-agnostic chat completion and embedding with OpenTelemetry instrumentation
- Dual-index vector embedding pipeline -- text index for assets, threats, diagrams, documents, and notes; code index for repositories (optional, requires a separate code embedding model). In-memory brute-force cosine similarity search with LRU eviction and memory budgeting
- Content provider abstraction (`ContentProvider` interface and `ContentProviderRegistry`) for extracting plain text from source entities for embedding (see the interface sketch after this list), including:
  - Direct text extraction from database-resident entities (assets, threats, notes, repositories)
  - JSON semantic extraction from DFD diagrams
  - HTTP/HTML content extraction with SSRF protection
  - PDF content extraction
  - Content pipeline with pluggable sources and extractors (Google Drive, general HTTP)
- Two-tier context building -- Tier 1 (entity overview) and Tier 2 (vector search results) assembled into LLM prompts
- Rate limiting -- per-user message rate limiting (sliding window, sketched below) and system-wide LLM concurrency control
- SSRF validator for safely fetching external document URLs during content extraction (sketched below)
- Import/export support in the frontend for the `timmy_enabled` field
- Dual-index RAG (#241) -- text and code vector indexes with separate embedding models, external embedding ingestion API, optional query decomposition and cross-encoder reranking
- Embedding automation API -- `embedding-automation` built-in group, `/automation/embeddings/` endpoints for external tools to push pre-computed embeddings
- Optional query decomposition -- LLM-driven splitting of user queries into index-specific sub-queries (off by default)
- Optional cross-encoder reranking -- API-based reranker rescores merged results from both indexes for higher-precision context
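The wiki names the `ContentProvider` interface and `ContentProviderRegistry` without showing their shape. A minimal Go sketch of how such an abstraction is typically structured; the method names and signatures here are assumptions, not the actual TMI code:

```go
package timmy

import "context"

// ContentProvider turns one kind of source entity into plain text suitable
// for chunking and embedding. (Sketch -- method names are assumptions.)
type ContentProvider interface {
	// EntityType identifies which sub-entity type this provider handles,
	// e.g. "asset", "threat", "diagram", "document", "repository".
	EntityType() string
	// Extract returns the plain-text representation of the entity.
	Extract(ctx context.Context, entityID string) (string, error)
}

// ContentProviderRegistry maps entity types to providers so the embedding
// pipeline can dispatch without knowing concrete provider types.
type ContentProviderRegistry struct {
	providers map[string]ContentProvider
}

func NewContentProviderRegistry() *ContentProviderRegistry {
	return &ContentProviderRegistry{providers: make(map[string]ContentProvider)}
}

func (r *ContentProviderRegistry) Register(p ContentProvider) {
	r.providers[p.EntityType()] = p
}

func (r *ContentProviderRegistry) For(entityType string) (ContentProvider, bool) {
	p, ok := r.providers[entityType]
	return p, ok
}
```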
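The per-user limiter is a sliding window: a message is allowed only if fewer than the configured limit were sent in the trailing window. A self-contained sketch of that technique (field names and exact policy are assumptions):

```go
package timmy

import (
	"sync"
	"time"
)

// slidingWindowLimiter allows at most `limit` messages per user in any
// trailing `window`. Illustrative only -- TMI's limiter may differ.
type slidingWindowLimiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	sent   map[string][]time.Time // userID -> timestamps of recent messages
}

func newLimiter(limit int, window time.Duration) *slidingWindowLimiter {
	return &slidingWindowLimiter{
		limit:  limit,
		window: window,
		sent:   make(map[string][]time.Time),
	}
}

// Allow reports whether userID may send a message now, recording it if so.
func (l *slidingWindowLimiter) Allow(userID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	// Drop timestamps that have fallen out of the window (in-place filter).
	cutoff := time.Now().Add(-l.window)
	recent := l.sent[userID][:0]
	for _, t := range l.sent[userID] {
		if t.After(cutoff) {
			recent = append(recent, t)
		}
	}
	if len(recent) >= l.limit {
		l.sent[userID] = recent
		return false
	}
	l.sent[userID] = append(recent, time.Now())
	return true
}
```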
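The SSRF validator guards the HTTP content extractors above. A sketch of the core check such a validator performs: require http(s), resolve the host, and refuse private, loopback, and link-local destinations (the real validator may do more, such as pinning resolved IPs and re-validating redirects):

```go
package timmy

import (
	"fmt"
	"net"
	"net/url"
)

// validateDocumentURL rejects URLs that could reach internal services.
// Illustrative sketch -- not the actual TMI implementation.
func validateDocumentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q not allowed", u.Scheme)
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return fmt.Errorf("resolving %q: %w", u.Hostname(), err)
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
			ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
			return fmt.Errorf("address %s is not publicly routable", ip)
		}
	}
	return nil
}
```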
In Progress (`dev/1.4.0`)
- Additional content providers (#249) -- Confluence, OneDrive/SharePoint, and Google Workspace delegated access
Not Yet Implemented
- Frontend chat UI -- Angular components for the chat page, source sidebar, and session management
Architecture Decisions
Key decisions from the backend design discussion:
- LLM integration: Provider-agnostic via LangChainGo, allowing operators to choose their LLM provider.
- Vector store: In-memory brute-force cosine similarity search with database-serialized embeddings (one row per embedding). No separate vector database required. Dual indexes (text + code) with composite keys per threat model. See the search sketch after this list.
- Conversation storage: Normal relational tables in the existing threat model database.
- Memory management: Explicit budget with LRU eviction and session admission control under memory pressure. Single shared memory pool across both index types.
- Scope: Two vector indexes per threat model (text + code), loaded on demand, evicted independently after inactivity.
- Entity-to-index mapping: Strict -- repositories go to the code index, all other entity types go to the text index.
- Query pipeline: Optional two-stage enhancement -- LLM-driven query decomposition splits user questions into index-specific sub-queries (off by default), and cross-encoder reranking rescores merged results for higher precision (requires separate reranker model). Both degrade gracefully when not configured.
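To make the vector-store decision concrete, here is a minimal sketch of brute-force cosine search over an in-memory index; the type and function names are illustrative, not TMI's:

```go
package timmy

import (
	"math"
	"sort"
)

// embedding pairs a stored vector with the chunk it was computed from.
type embedding struct {
	ChunkID string
	Vector  []float32
}

// scored is one search result.
type scored struct {
	ChunkID string
	Score   float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// search scores every embedding in the index against the query vector and
// returns the top-k. O(n) per query, which is acceptable because each
// index holds only a single threat model's chunks.
func search(index []embedding, query []float32, k int) []scored {
	results := make([]scored, 0, len(index))
	for _, e := range index {
		results = append(results, scored{e.ChunkID, cosine(e.Vector, query)})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].Score > results[j].Score })
	if k < len(results) {
		results = results[:k]
	}
	return results
}
```

With one index per threat model, the scan stays small enough that brute force is cheaper to operate than a dedicated vector database.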
Query Pipeline Architecture
```mermaid
flowchart TD
    A[User Message] --> B{Query Decomposer\nconfigured?}
    B -->|yes| C[LLM Decompose:\ntext_query + code_query]
    B -->|no| D[Use original query\nfor both indexes]
    C --> E[Embed text_query\nwith text model]
    C --> F[Embed code_query\nwith code model]
    D --> E
    D --> F
    E --> G[Search Text Index\ntop-K results]
    F --> H{Code Index\nconfigured?}
    H -->|yes| I[Search Code Index\ntop-K results]
    H -->|no| J[Skip]
    G --> K[Merge Candidates]
    I --> K
    J --> K
    K --> L{Reranker\nconfigured?}
    L -->|yes| M[Cross-Encoder Rerank\nwith original query]
    L -->|no| N[Use merged results\nas-is]
    M --> O[Apply Rerank top-K\ncutoff]
    O --> P[Format Tier 2 Context]
    N --> P
    P --> Q[Assemble Full Prompt:\nBase + Tier 1 + Tier 2]
    Q --> R[LLM Synthesis\nstreaming response]
```
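Read as code, the flowchart reduces to roughly the following Go-shaped sketch, including the Base + Tier 1 + Tier 2 prompt assembly. Every type and method here is a placeholder for the corresponding pipeline stage, not a real TMI API:

```go
package timmy

import (
	"context"
	"strings"
)

// chunk is a placeholder for one retrieved context passage.
type chunk struct {
	Text  string
	Score float64
}

// pipeline records which optional stages are configured.
type pipeline struct {
	decomposer bool // query decomposition enabled?
	codeIndex  bool // code embedding model configured?
	reranker   bool // cross-encoder reranker configured?
	topK       int  // per-index retrieval depth
	rerankTopK int  // cutoff applied after reranking
}

func (p *pipeline) answer(ctx context.Context, q string) string {
	// Optional stage: LLM decomposition into index-specific sub-queries.
	textQ, codeQ := q, q
	if p.decomposer {
		textQ, codeQ = p.decompose(ctx, q)
	}

	// Embed each sub-query with its index's model and search; the code
	// index is skipped entirely when no code embedding model is set.
	candidates := p.searchText(ctx, textQ, p.topK)
	if p.codeIndex {
		candidates = append(candidates, p.searchCode(ctx, codeQ, p.topK)...)
	}

	// Optional stage: cross-encoder rerank against the *original* query,
	// then cut to the rerank top-K. Without a reranker, the merged
	// candidates pass through as-is.
	if p.reranker {
		candidates = p.rerank(ctx, q, candidates)
		if len(candidates) > p.rerankTopK {
			candidates = candidates[:p.rerankTopK]
		}
	}

	// Assemble Base + Tier 1 (entity overview) + Tier 2 (retrieved
	// chunks) and hand the prompt to the LLM for streamed synthesis.
	prompt := strings.Join([]string{
		p.basePrompt(), p.tier1Overview(ctx), formatTier2(candidates),
	}, "\n\n")
	return p.synthesize(ctx, prompt)
}

// Stubs standing in for the real stages (illustrative only).
func (p *pipeline) decompose(ctx context.Context, q string) (string, string) { return q, q }
func (p *pipeline) searchText(ctx context.Context, q string, k int) []chunk  { return nil }
func (p *pipeline) searchCode(ctx context.Context, q string, k int) []chunk  { return nil }
func (p *pipeline) rerank(ctx context.Context, q string, c []chunk) []chunk  { return c }
func (p *pipeline) basePrompt() string                                       { return "You are Timmy." }
func (p *pipeline) tier1Overview(ctx context.Context) string                 { return "" }
func (p *pipeline) synthesize(ctx context.Context, prompt string) string     { return prompt }

func formatTier2(c []chunk) string { return "" }
```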
Related Pages
- Architecture-and-Design -- System architecture and design decisions
- REST-API-Reference -- API endpoint reference
Related Issues
- Server backend: ericfitz/tmi#214
- Client UX: ericfitz/tmi-ux#293