30_LLD_Adaptive_Input_Module - asavschaeffer/globule GitHub Wiki
# Adaptive Input Module - Low Level Design
**Version:** 1.0
**Date:** 2025-01-17
**Status:** Draft for Review
## 1. Purpose and Scope
The Adaptive Input Module (AIM) serves as the conversational gateway into the Globule system, transforming raw user input into enriched, schema-aware data ready for processing. It operates as the first touchpoint for users, providing intelligent input validation, schema detection, and contextual enrichment while maintaining a friction-free capture experience.
### Boundaries
- **In Scope:**
- Text input processing and validation
- Schema detection and application
- User interaction and confirmation flows
- Context gathering for ambiguous inputs
- Configuration-based behavior adaptation
- **Out of Scope:**
- Actual content processing (handled by Orchestration Engine)
- Schema definition (handled by Schema Engine)
- Storage operations (handled by Storage Manager)
- Non-text inputs in MVP (future enhancement)
## 2. Functional Overview
The module provides three core behaviors:
1. **Conversational Validation**: Engages users in brief dialogues when input intent is unclear, using a 3-second auto-confirmation mechanism with manual override options.
2. **Intelligent Schema Detection**: Automatically identifies input types through pattern matching and confidence scoring, applying appropriate schemas for structured processing.
3. **Adaptive Behavior**: Adjusts interaction style based on user preferences, context, and historical patterns, supporting multiple verbosity levels from silent to debug mode.
### Key Guarantees
- Response time under 100ms for user feedback
- Schema detection accuracy above 90% for common patterns
- Zero data loss during input processing
- Graceful degradation when schemas are unavailable
## 3. External Interfaces
### 3.1 Input Interface
```python
class AdaptiveInputModule:
    async def process_input(
        self,
        text: str,
        context: Optional[Dict[str, Any]] = None,
        override_schema: Optional[str] = None
    ) -> EnrichedInput:
        """
        Process raw text input and return enriched data.

        Args:
            text: Raw user input
            context: Optional context (e.g., clipboard content, previous input)
            override_schema: Force specific schema application

        Returns:
            EnrichedInput with detected schema and gathered context
        """
```
### 3.2 Data Contracts
```python
@dataclass
class EnrichedInput:
    original_text: str
    enriched_text: str
    detected_schema_id: Optional[str]
    confidence_score: float
    additional_context: Dict[str, Any]
    user_corrections: List[str]
    timestamp: datetime

@dataclass
class SchemaMatch:
    schema_id: str
    confidence: float
    matched_patterns: List[str]
    reason: str  # 'pattern_match', 'ml_classification', 'user_override'
```
### 3.3 Integration Points
- **Schema Engine**: Query available schemas, validate against definitions
- **Configuration System**: Retrieve user preferences and context settings
- **Orchestration Engine**: Pass enriched input for processing
- **Event Bus** (future): Emit schema detection events for analytics
## 4. Internal Design
### 4.1 Input Receiver
**Purpose**: Entry point for all user input, handling rate limiting and initial validation.
**Internal Structure**:
```python
class InputReceiver:
    def __init__(self):
        self.rate_limiter = TokenBucket(10, 1)  # 10 tokens/sec
        self.input_queue = asyncio.Queue(maxsize=100)
        self.size_limit = 10_000  # characters

    async def receive(self, raw_input: str) -> str:
        # Size validation
        if len(raw_input) > self.size_limit:
            raise InputTooLargeError(f"Max {self.size_limit} chars")

        # Rate limiting
        if not self.rate_limiter.consume():
            raise RateLimitExceededError()

        # Basic sanitization
        sanitized = self._sanitize(raw_input)
        await self.input_queue.put(sanitized)
        return sanitized
```
**Communication**: Pushes validated input to Schema Detector via internal queue.
**Edge Handling**:
- Truncates oversized inputs with warning
- Queues inputs during rate limit with backpressure
- Strips control characters and normalizes whitespace
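The `TokenBucket` referenced above is not defined in this document; the following is a minimal sketch, assuming a `(rate, capacity)` constructor consistent with the `TokenBucket(10, 1)` call in `InputReceiver`:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens accrue per second,
    up to a maximum of `capacity` tokens."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def consume(self, tokens: float = 1.0) -> bool:
        """Take `tokens` from the bucket; return False if exhausted."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

Burst traffic drains the bucket immediately, after which callers see `False` until enough time has passed for the bucket to refill, which is what produces the backpressure behavior described above.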
### 4.2 Schema Detector
**Purpose**: Identifies applicable schemas through multi-stage detection pipeline.
**Internal Logic**:
```python
class SchemaDetector:
    def __init__(self):
        self.pattern_matcher = PatternMatcher()
        self.ml_classifier = None  # Lazy loaded
        self.cache = LRUCache(maxsize=1000)

    async def detect(self, text: str, hint: Optional[str] = None) -> SchemaMatch:
        # Check cache first
        cache_key = hash(text + str(hint))
        if cached := self.cache.get(cache_key):
            return cached

        # Stage 1: Explicit hint (highest priority)
        if hint:
            return SchemaMatch(hint, 1.0, [], 'user_override')

        # Stage 2: Pattern matching (<5ms)
        if pattern_match := self.pattern_matcher.match(text):
            if pattern_match.confidence > 0.9:
                result = SchemaMatch(
                    pattern_match.schema_id,
                    pattern_match.confidence,
                    pattern_match.patterns,
                    'pattern_match'
                )
                self.cache.put(cache_key, result)
                return result

        # Stage 3: ML classification (async, <100ms)
        if self.ml_classifier and len(text) > 50:
            prediction = await self._ml_classify(text)
            if prediction.confidence > 0.7:
                result = SchemaMatch(
                    prediction.schema_id,
                    prediction.confidence,
                    [],
                    'ml_classification'
                )
                self.cache.put(cache_key, result)
                return result

        # Default fallback
        return SchemaMatch('free_text', 0.5, [], 'default')
```
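The `LRUCache` used above is likewise assumed rather than specified; since the standard library only provides the `functools.lru_cache` decorator, a small explicit class with the `get`/`put` interface the detector relies on might look like:

```python
from collections import OrderedDict
from typing import Any, Optional

class LRUCache:
    """Minimal least-recently-used cache with get/put semantics."""

    def __init__(self, maxsize: int = 1000):
        self.maxsize = maxsize
        self._data: "OrderedDict[Any, Any]" = OrderedDict()

    def get(self, key: Any) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Any, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used
```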
**Pattern Matching Strategy**:
```python
class PatternMatcher:
    patterns = {
        'link_curation': [
            (r'^https?://', 0.95),
            (r'^www\.', 0.85),
            (r'\.(com|org|net|io)', 0.7)
        ],
        'task_entry': [
            (r'^(todo|task):', 0.95),
            (r'^- \[ \]', 0.9),
            (r'(remind me|due|deadline)', 0.8)
        ],
        'prompt': [
            (r'^(you are|act as|please)', 0.85),
            (r'(explain|analyze|summarize|review)', 0.8)
        ]
    }
```
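The `match` method itself is not shown above; one plausible sketch, returning the highest-confidence schema whose pattern fires (the `PatternResult` shape and reduced pattern table here are illustrative assumptions):

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PatternResult:
    schema_id: str
    confidence: float
    patterns: List[str] = field(default_factory=list)

# Trimmed-down version of the pattern table above, for illustration
PATTERNS = {
    'link_curation': [(r'^https?://', 0.95), (r'^www\.', 0.85)],
    'task_entry': [(r'^(todo|task):', 0.95), (r'(remind me|due|deadline)', 0.8)],
}

def match(text: str) -> Optional[PatternResult]:
    """Return the best-scoring schema match, or None if nothing fires."""
    best: Optional[PatternResult] = None
    for schema_id, rules in PATTERNS.items():
        for pattern, confidence in rules:
            if re.search(pattern, text, re.IGNORECASE):
                if best is None or confidence > best.confidence:
                    best = PatternResult(schema_id, confidence, [pattern])
    return best
```

Because the anchored `^(todo|task):` rule outranks the unanchored URL rules, an input like `todo: https://example.com` resolves to `task_entry`, matching the precedence behavior asserted in the unit tests later in this document.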
### 4.3 Interaction Controller
**Purpose**: Manages user dialogue flow and confirmation mechanisms.
**State Machine**:
```mermaid
stateDiagram-v2
    [*] --> AwaitingInput
    AwaitingInput --> SchemaDetected: High Confidence
    AwaitingInput --> ConfirmationNeeded: Medium Confidence
    AwaitingInput --> Clarification: Low Confidence
    SchemaDetected --> Processing: Auto-Apply
    ConfirmationNeeded --> AutoConfirm: 3s timeout
    ConfirmationNeeded --> UserConfirmed: Enter key
    ConfirmationNeeded --> UserCorrected: 'n' key
    Clarification --> SchemaSelected: User choice
    UserCorrected --> SchemaSelected: Manual selection
    AutoConfirm --> Processing
    UserConfirmed --> Processing
    SchemaSelected --> Processing
    Processing --> [*]
```
**Implementation**:
```python
class InteractionController:
    def __init__(self, config: InteractionConfig):
        self.auto_confirm_delay = config.auto_confirm_delay  # 3s default
        self.verbosity = config.verbosity_level
        self.state = InteractionState.AWAITING_INPUT

    async def handle_detection(self, match: SchemaMatch) -> ConfirmationResult:
        if match.confidence > 0.9:
            # High confidence - apply immediately
            return ConfirmationResult(accepted=True, schema_id=match.schema_id)
        elif match.confidence > 0.6:
            # Medium confidence - confirm with user
            prompt = self._format_confirmation(match)
            return await self._confirm_with_timeout(prompt, match)
        else:
            # Low confidence - request clarification
            options = await self._get_schema_options()
            return await self._clarify_with_user(options)

    async def _confirm_with_timeout(self, prompt: str, match: SchemaMatch) -> ConfirmationResult:
        print(prompt)
        # Poll for a keypress until the auto-confirm window elapses
        start_time = time.time()
        while time.time() - start_time < self.auto_confirm_delay:
            if user_input := self._check_input():
                if user_input == '\n':
                    return ConfirmationResult(True, match.schema_id)
                elif user_input == 'n':
                    return await self._handle_correction(match)
            # Show countdown progress
            remaining = self.auto_confirm_delay - (time.time() - start_time)
            self._update_countdown(remaining)
            await asyncio.sleep(0.1)

        # Auto-confirm after timeout
        return ConfirmationResult(True, match.schema_id)
```
### 4.4 Context Enricher
**Purpose**: Gathers additional context based on detected schema requirements.
**Context Gathering**:
```python
class ContextEnricher:
    async def enrich(self, text: str, schema: Schema) -> Dict[str, Any]:
        context = {
            'timestamp': datetime.now(),
            'input_length': len(text),
            'source': 'cli'
        }

        # Schema-specific enrichment
        for action in schema.actions:
            if action.type == 'fetch_title' and 'url' in text:
                context['page_title'] = await self._fetch_url_title(text)
            elif action.type == 'prompt_context':
                response = await self._prompt_user(action.prompt)
                context[action.field] = response
            elif action.type == 'extract_metadata':
                context['metadata'] = self._extract_metadata(text)

        return context
```
### 4.5 Preference Learner
**Purpose**: Tracks user corrections and adapts future behavior.
**Learning Mechanism**:
```python
class PreferenceLearner:
    def __init__(self):
        self.correction_history = deque(maxlen=1000)
        self.schema_scores = defaultdict(lambda: {'correct': 0, 'total': 0})

    def record_correction(self, original: SchemaMatch, corrected: str):
        self.correction_history.append({
            'original': original,
            'corrected': corrected,
            'timestamp': time.time()
        })

        # Update scoring: a detection counts as correct only if the
        # user kept the originally detected schema
        scores = self.schema_scores[original.schema_id]
        scores['total'] += 1
        if corrected == original.schema_id:
            scores['correct'] += 1

    def get_confidence_adjustment(self, schema_id: str) -> float:
        scores = self.schema_scores[schema_id]
        if scores['total'] < 10:
            return 0.0  # Not enough data

        accuracy = scores['correct'] / scores['total']
        # Boost confidence for high-accuracy schemas
        if accuracy > 0.9:
            return 0.1
        # Reduce confidence for frequently corrected schemas
        elif accuracy < 0.5:
            return -0.2
        return 0.0
```
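The adjustment arithmetic can be checked in isolation. This standalone helper mirrors `get_confidence_adjustment` above and walks through three representative cases:

```python
def confidence_adjustment(correct: int, total: int) -> float:
    """Mirror of PreferenceLearner.get_confidence_adjustment: boost
    schemas the user rarely corrects, penalize frequently corrected ones."""
    if total < 10:
        return 0.0  # not enough data
    accuracy = correct / total
    if accuracy > 0.9:
        return 0.1
    if accuracy < 0.5:
        return -0.2
    return 0.0

# 12 detections, 11 kept by the user -> accuracy ~0.92 -> +0.1 boost
assert confidence_adjustment(11, 12) == 0.1
# 10 detections, only 4 kept -> accuracy 0.4 -> -0.2 penalty
assert confidence_adjustment(4, 10) == -0.2
# Only 5 samples -> below min_samples_for_adjustment -> no change
assert confidence_adjustment(4, 5) == 0.0
```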
## 5. Control Flow and Data Flow
### Primary Flow: Input Processing Pipeline
```mermaid
flowchart TD
    A[User Input] --> B[Input Receiver]
    B --> C{Size/Rate OK?}
    C -->|No| D[Error Response]
    C -->|Yes| E[Schema Detector]
    E --> F{Detection Stage}
    F -->|Pattern Match| G["Quick Response <5ms"]
    F -->|ML Classify| H["Async Classification <100ms"]
    G --> I[Interaction Controller]
    H --> I
    I --> J{Confidence Level}
    J -->|"High >0.9"| K[Auto-Apply]
    J -->|"Medium 0.6-0.9"| L[Confirm Dialog]
    J -->|"Low <0.6"| M[Clarification]
    L --> N{User Response}
    N -->|Confirm| K
    N -->|Correct| O[Schema Selection]
    N -->|Timeout| K
    K --> P[Context Enricher]
    O --> P
    M --> O
    P --> Q[Preference Learner]
    Q --> R[Return EnrichedInput]
```
### Asynchronous Processing Flow
```python
async def process_input_pipeline(self, raw_input: str) -> EnrichedInput:
    # Stage 1: Receive and validate (sync, <1ms)
    validated = await self.receiver.receive(raw_input)

    # Stage 2: Detect schema (async, <5ms typical)
    detection_task = asyncio.create_task(
        self.detector.detect(validated)
    )

    # Stage 3: Prepare UI while detecting
    ui_task = asyncio.create_task(
        self.controller.prepare_interface()
    )

    # Wait for detection
    match = await detection_task
    await ui_task

    # Stage 4: User interaction (async, variable)
    confirmation = await self.controller.handle_detection(match)

    # Stage 5: Enrich context (async, <50ms); skipped if rejected
    context = {}
    if confirmation.accepted:
        schema = await self.schema_engine.get_schema(confirmation.schema_id)
        context = await self.enricher.enrich(validated, schema)

    # Stage 6: Learn from interaction
    self.learner.record_interaction(match, confirmation)

    return EnrichedInput(
        original_text=raw_input,
        enriched_text=validated,
        detected_schema_id=confirmation.schema_id,
        confidence_score=match.confidence,
        additional_context=context,
        user_corrections=confirmation.corrections,
        timestamp=datetime.now()
    )
```
## 6. Configuration and Tuning
### Configuration Schema
```yaml
adaptive_input:
  # Interaction settings
  interaction:
    auto_confirm_delay: 3.0          # seconds
    verbosity_level: "concise"       # silent|concise|verbose|debug
    show_countdown: true
    use_colors: true

  # Detection settings
  detection:
    pattern_confidence_threshold: 0.9
    ml_confidence_threshold: 0.7
    enable_ml_classification: false  # Disabled by default for performance
    cache_size: 1000

  # Rate limiting
  rate_limiting:
    tokens_per_second: 10
    burst_size: 20

  # Input validation
  validation:
    max_input_size: 10000            # characters
    allowed_characters: "printable"  # printable|extended|all
    strip_control_chars: true

  # Learning preferences
  learning:
    enable_preference_learning: true
    history_size: 1000
    min_samples_for_adjustment: 10

  # Schema-specific overrides
  schema_overrides:
    link_curation:
      auto_confirm_delay: 1.0        # Faster for URLs
      always_fetch_title: true
    task_entry:
      verbosity_level: "verbose"     # More detail for tasks
```
### Runtime Tuning
```python
class ConfigManager:
    def apply_context_override(self, base_config: Config, context: str) -> Config:
        """Apply context-specific configuration overrides"""
        if context == "batch_processing":
            base_config.interaction.auto_confirm_delay = 0  # No delays
            base_config.interaction.verbosity_level = "silent"
        elif context == "learning_mode":
            base_config.interaction.verbosity_level = "verbose"
            base_config.detection.ml_confidence_threshold = 0.5  # More suggestions
        return base_config
```
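The per-schema overrides from the configuration above can be resolved the same way. A minimal sketch of the merge (the dict shapes and `effective_config` name are illustrative, not part of the spec):

```python
from copy import deepcopy

# Base interaction settings, mirroring the YAML above
BASE = {
    'auto_confirm_delay': 3.0,
    'verbosity_level': 'concise',
}

# schema_overrides section, as parsed from the config file
SCHEMA_OVERRIDES = {
    'link_curation': {'auto_confirm_delay': 1.0, 'always_fetch_title': True},
    'task_entry': {'verbosity_level': 'verbose'},
}

def effective_config(schema_id: str) -> dict:
    """Base settings with any per-schema override merged on top."""
    config = deepcopy(BASE)
    config.update(SCHEMA_OVERRIDES.get(schema_id, {}))
    return config
```

A schema without an override entry simply inherits the base values, so adding a new schema never requires touching the override table.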
## 7. Failure Modes and Recovery
### 7.1 Detection Failures
- **Pattern Matcher Timeout**: Fall back to default schema with warning
- **ML Classifier Unavailable**: Continue with pattern matching only
- **Schema Not Found**: Use free_text schema as safe default
### 7.2 User Interaction Failures
- **Terminal Unresponsive**: Auto-confirm after timeout
- **Invalid User Input**: Re-prompt with clearer instructions
- **Repeated Corrections**: Suggest disabling problematic schema
### 7.3 Integration Failures
- **Schema Engine Unreachable**: Use cached schemas or defaults
- **Configuration Service Down**: Use hard-coded defaults
- **Context Enrichment Timeout**: Proceed without optional context
### Recovery Strategies
```python
class FailureHandler:
    async def handle_schema_detection_failure(self, error: Exception, text: str) -> SchemaMatch:
        logger.warning(f"Schema detection failed: {error}")

        # Try fallback strategies
        if cached_result := self.cache.get_fuzzy(text):
            return cached_result

        # Use statistical fallback
        word_count = len(text.split())
        if word_count < 10 and 'http' in text:
            return SchemaMatch('link_curation', 0.6, [], 'statistical_guess')

        # Ultimate fallback
        return SchemaMatch('free_text', 0.5, [], 'fallback')
```
## 8. Performance Considerations
### 8.1 Latency Budget
| Operation | Target | Strategy |
|-----------|--------|----------|
| Pattern matching | <5ms | Compiled regex, early termination |
| ML classification | <100ms | Model quantization, caching |
| User feedback | <10ms | Pre-rendered prompts |
| Context enrichment | <50ms | Parallel fetches, timeouts |
| Total processing | <200ms | Pipeline parallelization |
### 8.2 Resource Usage
- **Memory**: ~50MB base + 10MB cache
- **CPU**: Single core sufficient for 100 req/s
- **Network**: Optional, only for enrichment
### 8.3 Optimizations
```python
class PerformanceOptimizer:
    def __init__(self):
        # Pre-compile all patterns
        self.compiled_patterns = {
            schema: [(re.compile(p, re.IGNORECASE), conf)
                     for p, conf in patterns]
            for schema, patterns in PATTERN_DEFINITIONS.items()
        }

        # Pre-render common prompts
        self.prompt_cache = {
            'link_detected': "URL detected. Save as link? [Enter/n]",
            'task_detected': "Task detected. Create task? [Enter/n]",
            # ... more cached prompts
        }

        # Warm up ML model
        if self.ml_enabled:
            asyncio.create_task(self._warmup_ml_model())
```
## 9. Security and Privacy
### 9.1 Input Validation
- **Size Limits**: Prevent memory exhaustion via 10KB limit
- **Character Filtering**: Strip non-printable characters
- **Pattern Injection**: Escape regex special characters
- **Rate Limiting**: Prevent DoS via token bucket
### 9.2 Privacy Considerations
- **Local Processing**: All detection happens on-device
- **No Telemetry**: User corrections stored locally only
- **Secure Schemas**: Validate schema sources before loading
- **Context Isolation**: Each input processed independently
### 9.3 Attack Surface Mitigation
```python
class SecurityValidator:
    DANGEROUS_PATTERNS = [
        r'<script.*?>.*?</script>',     # XSS attempts
        r'(rm|del|format)\s+-rf?\s+/',  # Command injection
        r'\.\./',                       # Path traversal
    ]

    def validate_input(self, text: str) -> str:
        # Check dangerous patterns
        for pattern in self.DANGEROUS_PATTERNS:
            if re.search(pattern, text, re.IGNORECASE):
                raise SecurityError("Potentially dangerous input detected")

        # Sanitize
        sanitized = bleach.clean(text, tags=[], strip=True)
        return sanitized[:self.max_size]
```
## 10. Testing Strategy
### 10.1 Unit Tests
```python
class TestSchemaDetector:
    def setup_method(self):
        self.detector = SchemaDetector()

    @pytest.mark.asyncio
    async def test_url_detection(self):
        result = await self.detector.detect("https://example.com")
        assert result.schema_id == "link_curation"
        assert result.confidence > 0.9

    @pytest.mark.asyncio
    async def test_ambiguous_input(self):
        result = await self.detector.detect("review this")
        assert result.confidence < 0.6
        assert result.schema_id == "free_text"

    @pytest.mark.asyncio
    async def test_pattern_priority(self):
        # Explicit patterns should override ML
        result = await self.detector.detect("todo: https://example.com")
        assert result.schema_id == "task_entry"  # "todo:" takes precedence
```
### 10.2 Integration Tests
```python
@pytest.mark.asyncio
async def test_full_pipeline():
    module = AdaptiveInputModule()

    # Test high-confidence flow
    result = await module.process_input("https://globule.app")
    assert result.detected_schema_id == "link_curation"
    assert "page_title" in result.additional_context

    # Test user correction flow
    with mock_user_input(['n', '2']):  # Correct, then select option 2
        result = await module.process_input("review this mockup")
        assert result.user_corrections == ['rejected_prompt']
```
### 10.3 Performance Tests
```python
@pytest.mark.benchmark
def test_detection_performance(benchmark):
    detector = SchemaDetector()

    # Benchmark pattern matching (detect is async, so drive it per call)
    benchmark(lambda: asyncio.run(detector.detect("https://example.com")))
    assert benchmark.stats['mean'] < 0.005  # <5ms average

    # Benchmark with cache
    for _ in range(100):
        asyncio.run(detector.detect("https://example.com"))
    assert detector.cache.hit_rate > 0.95
```
### 10.4 Usability Tests
- **Wizard of Oz**: Manual schema selection to validate UX
- **A/B Testing**: Compare auto-confirm timings (2s vs 3s vs 5s)
- **Think-Aloud**: Observe users categorizing various inputs
- **Error Recovery**: Test with network failures, timeouts
## 11. Alternatives Considered
### 11.1 Pure ML Classification
**Approach**: Use only ML models for all schema detection.
**Rejected Because**:
- Latency too high (200-500ms) for interactive use
- Requires training data for each schema
- Less interpretable than pattern matching
### 11.2 Hierarchical Schema Trees
**Approach**: User navigates schema hierarchy (Work → Task → Bug Report).
**Rejected Because**:
- Adds friction to capture process
- Requires users to understand schema organization
- Contradicts "capture first, organize never" principle
### 11.3 Post-Processing Classification
**Approach**: Capture everything as free text, classify later in background.
**Rejected Because**:
- Misses opportunity for contextual enrichment
- Can't guide users during ambiguous input
- Reduces immediate value of structured data
### 11.4 Client-Side Heavy Processing
**Approach**: Run full NLP models in browser/CLI.
**Rejected Because**:
- Startup time too slow
- Resource usage too high for lightweight CLI
- Model updates require client updates
The chosen design balances immediate response, accuracy, and user control while maintaining the system's core principle of frictionless capture with intelligent assistance.