Idea logging - robbiemu/aclarai GitHub Wiki
📝 Logging Strategy
This document outlines aclarai's conventions for logging across its services. Its primary goal is to ensure consistency, aid debugging, and provide essential operational insight within the Docker Compose environment.
🎯 Purpose
To define standards for what, how, and where aclarai services log information, facilitating rapid issue diagnosis and system understanding.
I. Core Principles
- Consistency: All services should log in a predictable format, regardless of the originating component.
- Traceability: Log entries must include identifiers to track operations across service boundaries (e.g., a specific `aclarai_id` for a block being processed, or a `job_id` for a scheduled task).
- Contextual Information: Logs should provide sufficient detail to understand what happened, why it happened, and where it happened within the codebase.
- Actionable: Errors and warnings should provide enough information for a developer to begin troubleshooting.
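One way to satisfy the Traceability principle is to bind an identifier once and let every subsequent log call carry it. The following is a minimal sketch using the standard library's `logging.LoggerAdapter`; the helper name `job_logger` and the `job_id` naming scheme are assumptions made for illustration, not part of aclarai.

```python
import logging
import uuid


def job_logger(base: logging.Logger, job_name: str) -> logging.LoggerAdapter:
    """Hypothetical helper: bind a fresh job_id to every record sent through the adapter."""
    # Assumed ID scheme: job name plus a short random suffix.
    job_id = f"{job_name}-{uuid.uuid4().hex[:8]}"
    # LoggerAdapter passes this dict as `extra=` on each call, so the
    # resulting LogRecord gains a `job_id` attribute a formatter can use.
    return logging.LoggerAdapter(base, {"job_id": job_id})
```

A formatter can then reference `%(job_id)s`, and all lines from one job execution share the same value. Note that in Python versions before 3.13 the adapter's bound dict replaces any `extra=` passed on an individual call.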
II. Log Levels and Usage Examples
aclarai components will consistently use standard Python logging levels:
- `DEBUG`: Detailed diagnostic information, primarily for development and deep troubleshooting.
  - Usage Example: Logging the full request and response payloads from an LLM API call in `aclarai-core` during the Decomposition phase (`sprint_3-Implement_core_ClaimifAI_pipeline.md`).
- `INFO`: Confirmation that operations are proceeding as expected.
  - Usage Example: Recording when a file has been successfully imported into the vault by the Import Panel (`sprint_2-Create_Tier_1_Markdown.md`).
  - Usage Example: Indicating the start and completion of a scheduled job, such as `sync_vault_to_graph` in `aclarai-scheduler` (`sprint_3-Bootstrap_scheduler_and_vault_sync_job.md`).
- `WARNING`: An indication that something unexpected happened, or that a potential problem exists, but it did not prevent the operation from completing.
  - Usage Example: Notifying that the fallback plugin was used for an unrecognized file format during import (`sprint_2-Implement_default_plugin.md`).
  - Usage Example: Detecting a version mismatch (`vault_ver < graph_ver`) for a Markdown block, causing the update to be skipped to prevent conflicts (`sprint_4-Block_syncing_loop.md`).
- `ERROR`: A serious problem that prevented a specific operation from completing. The system, or other parts of it, might continue to function.
  - Usage Example: Logging a persistent failure to connect to the Neo4j database after retries during `(:Claim)` node creation (`sprint_3-Create_nodes_in_neo4j.md`).
  - Usage Example: An LLM agent failing to produce an evaluation score after all retries, resulting in a `null` score for the claim (`sprint_7-Implement_entailment_evaluation.md`).
- `CRITICAL`: A severe error indicating that the application or a major component is unable to continue functioning and requires immediate attention.
  - Usage Example: The `aclarai-scheduler` failing to initialize due to a fundamental configuration error, preventing any jobs from running (`sprint_3-Bootstrap_scheduler_and_vault_sync_job.md`).
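To make the level choices concrete, here is a small sketch modeled on the version-mismatch example above. The function `sync_block` and its signature are hypothetical and are not taken from the aclarai codebase; only the `vault_ver < graph_ver` condition comes from this document.

```python
import logging

logger = logging.getLogger("aclarai.example.levels")


def sync_block(aclarai_id: str, vault_ver: int, graph_ver: int) -> bool:
    """Hypothetical helper: return True if the block was updated, False if skipped."""
    # DEBUG: fine-grained detail that is only useful when troubleshooting.
    logger.debug("Comparing versions for %s: vault=%d graph=%d",
                 aclarai_id, vault_ver, graph_ver)
    if vault_ver < graph_ver:
        # WARNING: unexpected, but the operation still completes (by skipping).
        logger.warning("Version mismatch for %s (vault_ver < graph_ver); skipping update",
                       aclarai_id)
        return False
    # INFO: confirmation that the operation proceeded as expected.
    logger.info("Block %s synced to graph", aclarai_id)
    return True
```

Passing arguments separately instead of pre-formatting the string lets `logging` skip the formatting work when the level is disabled.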
III. Standard Log Fields
Python's standard `logging` module formatters can include the timestamp and level through the built-in `%(asctime)s` and `%(levelname)s` attributes. Beyond these, aclarai logs will consistently include the following fields and contextual identifiers:
- `service`: The name of the Docker service/component originating the log (e.g., `aclarai-core`, `vault-watcher`, `aclarai-scheduler`, `aclarai-ui`).
- `filename.function_name`: The specific Python file and function where the log entry originated (e.g., `[claim_processor.decompose_claim]`). This is crucial for pinpointing code locations.
- `message`: A concise human-readable description of the event.
- Contextual IDs (when applicable):
  - `aclarai_id`: The unique identifier for a Markdown block, claim, concept, or other aclarai entity being processed (e.g., `blk_xyz`, `clm_abc`, `concept_def`).
  - `job_id`: A unique identifier for a specific execution of a scheduled job.
  - `file_path`: The vault-relative path to the Markdown file currently being processed or affected.
Conceptual Log Format:
[TIMESTAMP] [LEVEL] [SERVICE] [FILENAME.FUNCTION_NAME] MESSAGE [DETAILS: {aclarai_id: "...", file_path: "..."}]
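The conceptual format above could be approximated with a custom `logging.Formatter`. This is a sketch under assumptions, not aclarai's actual implementation: the class name `AclaraiFormatter`, the helper `build_logger`, the timestamp format, and the choice to pass contextual IDs through `extra=` are all illustrative.

```python
import logging

# Contextual IDs named in this document; emitted only when present on the record.
CONTEXT_KEYS = ("aclarai_id", "job_id", "file_path")


class AclaraiFormatter(logging.Formatter):
    """Hypothetical formatter approximating the conceptual layout above."""

    def __init__(self, service: str) -> None:
        super().__init__(datefmt="%Y-%m-%dT%H:%M:%S")  # assumed timestamp format
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        # record.module is the file name without ".py"; funcName is the calling function.
        location = f"{record.module}.{record.funcName}"
        line = (
            f"[{self.formatTime(record, self.datefmt)}] [{record.levelname}] "
            f"[{self.service}] [{location}] {record.getMessage()}"
        )
        # Values passed via `extra=` become attributes on the record.
        details = {k: getattr(record, k) for k in CONTEXT_KEYS if hasattr(record, k)}
        if details:
            pairs = ", ".join(f'{k}: "{v}"' for k, v in details.items())
            line += f" [DETAILS: {{{pairs}}}]"
        # Exception tracebacks are omitted here for brevity.
        return line


def build_logger(service: str, stream=None) -> logging.Logger:
    """Hypothetical helper: attach the formatter to a stream handler."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(AclaraiFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `log.info("Decomposed claim", extra={"aclarai_id": "clm_abc"})` would then append a `[DETAILS: {...}]` suffix, while calls without `extra` produce the line without it.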
IV. Log Output Destination
All aclarai services will direct their logs to `stdout` (standard output) and `stderr` (standard error) within their respective Docker containers.
- This approach is idiomatic for containerized applications and ensures that Docker's default logging drivers can readily collect and expose all log streams.
- It provides a unified log view via `docker compose logs` for local development and debugging.
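The document does not specify which records go to which stream. One common convention, assumed here purely for illustration, sends `DEBUG` and `INFO` to `stdout` and `WARNING` and above to `stderr`. The names `MaxLevelFilter` and `configure_streams` are hypothetical.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Hypothetical filter: pass only records strictly below a given level."""

    def __init__(self, exclusive_max: int) -> None:
        super().__init__()
        self.exclusive_max = exclusive_max

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.exclusive_max


def configure_streams(logger: logging.Logger, out=None, err=None) -> None:
    """Route DEBUG/INFO to stdout and WARNING and above to stderr (assumed split)."""
    out_handler = logging.StreamHandler(out if out is not None else sys.stdout)
    out_handler.addFilter(MaxLevelFilter(logging.WARNING))

    err_handler = logging.StreamHandler(err if err is not None else sys.stderr)
    err_handler.setLevel(logging.WARNING)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    logger.setLevel(logging.DEBUG)
```

Because both streams are captured by Docker, `docker compose logs` still shows a single interleaved view regardless of how records are split.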
V. Non-Goals for Current Scope
- Centralized Logging Infrastructure: The current scope does not include the deployment of dedicated log aggregation, analysis, or visualization tools (e.g., ELK Stack, Grafana Loki).
- Advanced Metrics/Telemetry: Beyond detailed event logs, the system will not collect or expose explicit operational metrics (e.g., CPU/memory usage, LLM token counts per job, API latencies) in a structured, Prometheus-compatible format.
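- Complex Log Rotation/Archiving: Log rotation is left to Docker's logging driver configuration (e.g., the `json-file` driver's `max-size` and `max-file` options). Note that the default `json-file` driver does not rotate logs unless these options are set.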
This strategy aims to provide robust, developer-friendly logging that supports efficient development and basic operational oversight, without introducing undue complexity for the current project phase.