report generation - osok/hawkeye GitHub Wiki

Report Generation System - HawkEye AI Threat Analysis

Overview

The HawkEye Report Generation System provides comprehensive multi-format reporting capabilities for security reconnaissance and threat analysis results. The system supports multiple output formats, dynamic template rendering, data aggregation, and visualization generation for professional security reports.

Architecture Components

1. Multi-Format Report Generation

classDiagram
    class BaseReporter {
        <<abstract>>
        +settings: Settings
        +logger: Logger
        +_generation_stats: Dict
        +generate_report(data, output_path)* str
        +get_format()* ReportFormat
        +validate_data(data) void
        +get_generation_statistics() Dict
    }
    
    class JSONReporter {
        +generate_report(data, output_path) str
        +get_format() ReportFormat
        +_serialize_data(data) Dict
        +_format_json_output(data) str
    }
    
    class CSVReporter {
        +generate_report(data, output_path) str
        +get_format() ReportFormat
        +_flatten_data(data) List[Dict]
        +_write_csv_sections(data) str
    }
    
    class XMLReporter {
        +generate_report(data, output_path) str
        +get_format() ReportFormat
        +_convert_to_xml(data) str
        +_format_xml_elements(data) str
    }
    
    class HTMLReporter {
        +template_engine: TemplateEngine
        +generate_report(data, output_path) str
        +generate_executive_summary(data) str
        +get_format() ReportFormat
        +_register_default_templates() void
    }
    
    class IntrospectionReporter {
        +generate_report(data, output_path) str
        +generate_mcp_summary(data) str
        +get_format() ReportFormat
        +_format_server_details(servers) str
    }
    
    BaseReporter <|-- JSONReporter
    BaseReporter <|-- CSVReporter
    BaseReporter <|-- XMLReporter
    BaseReporter <|-- HTMLReporter
    BaseReporter <|-- IntrospectionReporter
    
    class ReportFormat {
        <<enumeration>>
        JSON
        CSV
        XML
        HTML
        PDF
        MARKDOWN
    }
    
    class ReportType {
        <<enumeration>>
        EXECUTIVE_SUMMARY
        TECHNICAL_DETAILED
        VULNERABILITY_REPORT
        COMPLIANCE_REPORT
        SCAN_RESULTS
        DETECTION_RESULTS
        RISK_ASSESSMENT
        COMBINED_REPORT
        THREAT_ANALYSIS
    }
Loading

2. Template Engine Architecture

classDiagram
    class TemplateEngine {
        +_templates: Dict[str, BaseTemplate]
        +register_template(template) void
        +get_template(name) BaseTemplate
        +list_templates() List[str]
        +render_template(name, data) str
        +clear_templates() void
    }
    
    class AdaptiveTemplateEngine {
        +content_generators: Dict[str, Callable]
        +context_injectors: Dict[str, Callable]
        +severity_formatters: Dict[str, Dict]
        +register_content_generator(name, generator) void
        +register_context_injector(type, injector) void
        +register_severity_formatter(severity, formats) void
        +render_adaptive_template(name, data) str
    }
    
    class BaseTemplate {
        <<abstract>>
        +template_name: str
        +get_template_content()* str
        +get_css_content() str
        +get_js_content() str
        +render(data, **kwargs) str
        +_prepare_template_variables(data) Dict
    }
    
    class ExecutiveSummaryTemplate {
        +get_template_content() str
        +_format_key_findings(data) str
        +_generate_risk_charts(data) str
    }
    
    class TechnicalReportTemplate {
        +get_template_content() str
        +_format_detailed_results(data) str
        +_generate_technical_charts(data) str
    }
    
    class VulnerabilityReportTemplate {
        +get_template_content() str
        +_format_vulnerability_details(data) str
        +_generate_risk_matrix(data) str
    }
    
    class ComplianceReportTemplate {
        +get_template_content() str
        +_format_compliance_analysis(data) str
        +_generate_compliance_charts(data) str
    }
    
    class ThreatAnalysisTemplate {
        +get_template_content() str
        +_format_threat_scenarios(data) str
        +_generate_attack_chains(data) str
    }
    
    TemplateEngine <|-- AdaptiveTemplateEngine
    TemplateEngine --> BaseTemplate : manages
    BaseTemplate <|-- ExecutiveSummaryTemplate
    BaseTemplate <|-- TechnicalReportTemplate
    BaseTemplate <|-- VulnerabilityReportTemplate
    BaseTemplate <|-- ComplianceReportTemplate
    BaseTemplate <|-- ThreatAnalysisTemplate
Loading

3. Data Aggregation and Analysis

classDiagram
    class DataAggregator {
        +settings: Settings
        +logger: Logger
        +aggregate_report_data(data) ReportData
        +generate_scan_summary(results) ScanSummary
        +generate_detection_summary(results) DetectionSummary
        +generate_risk_summary(results) RiskSummary
        +generate_introspection_summary(data) IntrospectionSummary
        +generate_executive_summary(data) str
        +generate_recommendations(data) List[str]
        +generate_trend_analysis(historical_data) Dict[str, TimeSeriesData]
        +generate_distribution_analysis(data) Dict[str, DistributionData]
    }
    
    class TimeSeriesData {
        +name: str
        +timestamps: List[datetime]
        +values: List[float]
        +labels: List[str]
        +add_point(timestamp, value, label) void
        +get_trend() str
        +calculate_statistics() Dict[str, float]
        +to_dict() Dict[str, Any]
    }
    
    class DistributionData {
        +name: str
        +categories: Dict[str, int]
        +total_count: int
        +add_category(category, count) void
        +get_percentage(category) float
        +get_top_categories(limit) List[Tuple[str, int]]
        +to_dict() Dict[str, Any]
    }
    
    class ExecutiveSummaryGenerator {
        +settings: Settings
        +generate_summary(data) str
        +generate_findings(data) List[ExecutiveFinding]
        +generate_metrics(data) ExecutiveMetrics
        +_analyze_critical_issues(data) List[str]
        +_calculate_risk_score(data) float
        +_generate_recommendations(data) List[str]
    }
    
    DataAggregator --> TimeSeriesData : creates
    DataAggregator --> DistributionData : creates
    DataAggregator --> ExecutiveSummaryGenerator : uses
Loading

Core Features

1. Multi-Format Support

Supported Formats

  • JSON: Structured data export with hierarchical organization
  • CSV: Tabular data export with flattened structure for spreadsheet analysis
  • XML: Structured markup for system integration
  • HTML: Rich interactive reports with embedded charts and styling
  • PDF: Print-ready formatted reports (via HTML conversion)
  • Markdown: Documentation-friendly format for version control

Format-Specific Features

JSON Reporter:

  • Hierarchical data preservation
  • Nested object serialization
  • Custom encoder for complex data types
  • Compressed and pretty-print options

CSV Reporter:

  • Data flattening for tabular representation
  • Multiple sheets/sections within single file
  • Custom delimiters and encoding support
  • Statistical summaries as separate sections

XML Reporter:

  • Schema validation
  • Namespace support
  • XSLT transformation capabilities
  • DTD/XSD compliance

HTML Reporter:

  • Interactive dashboard-style reports
  • Embedded JavaScript for dynamic content
  • CSS grid layouts for responsive design
  • Print-optimized styles

2. Template Engine System

Core Template Features

# Template Variable Substitution
template_vars = {
    'render_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'total_targets': len(data.scan_results) if data.has_scan_data else 0,
    'critical_findings': data.risk_summary.critical_vulnerabilities if data.has_assessment_data else 0,
    'high_risk_targets': data.risk_summary.high_risk_systems if data.has_assessment_data else 0,
    'css_content': template.get_css_content(),
    'js_content': template.get_js_content(),
}

Adaptive Template Engine Features

  • Content Generators: Dynamic content creation based on data context
  • Context Injectors: Environment-specific information injection
  • Severity Formatters: Risk-based styling and formatting
  • Template Capability Matching: Automatic template selection based on data types

Template Types

Executive Summary Templates:

  • High-level overview dashboards
  • Key performance indicators
  • Risk distribution charts
  • Priority action items

Technical Report Templates:

  • Detailed scan results
  • Network topology diagrams
  • Service fingerprinting data
  • Protocol analysis details

Vulnerability Report Templates:

  • CVSS scoring matrices
  • Exploit chain analysis
  • Remediation recommendations
  • Compliance mappings

3. Visualization and Chart Generation

Chart Types Supported

Statistical Charts:

  • Bar charts for distribution analysis
  • Line graphs for trend visualization
  • Pie charts for proportion representation
  • Heat maps for risk matrices

Security-Specific Visualizations:

  • Risk assessment matrices
  • Attack surface diagrams
  • Threat landscape maps
  • Compliance dashboard indicators

Interactive Elements:

  • Collapsible sections for detailed data
  • Sortable tables with filtering
  • Drill-down capabilities for hierarchical data
  • Export functions for individual charts

Visualization Implementation

// Embedded JavaScript for Interactive Charts
function generateRiskMatrix(riskData) {
    const canvas = document.getElementById('risk-matrix');
    const ctx = canvas.getContext('2d');
    
    // Risk matrix visualization logic
    riskData.forEach(risk => {
        const x = risk.likelihood * canvas.width / 5;
        const y = (5 - risk.impact) * canvas.height / 5;
        const color = getRiskColor(risk.severity);
        
        drawRiskBubble(ctx, x, y, risk.size, color, risk.label);
    });
}

function getRiskColor(severity) {
    const colors = {
        'CRITICAL': '#dc3545',
        'HIGH': '#fd7e14',
        'MEDIUM': '#ffc107',
        'LOW': '#28a745',
        'INFO': '#17a2b8'
    };
    return colors[severity] || '#6c757d';
}

4. Data Transformation Pipeline

Pipeline Stages

flowchart LR
    A[Raw Data Input] --> B[Data Validation]
    B --> C[Format Conversion]
    C --> D[Aggregation Engine]
    D --> E[Template Selection]
    E --> F[Content Generation]
    F --> G[Styling Application]
    G --> H[Output Formatting]
    H --> I[File Generation]
    
    subgraph "Data Processing"
        B
        C
        D
    end
    
    subgraph "Report Generation"
        E
        F
        G
    end
    
    subgraph "Output Production"
        H
        I
    end
Loading

Transformation Components

Data Validation:

  • Schema compliance checking
  • Required field validation
  • Data type verification
  • Consistency validation

Format Conversion:

  • Cross-format data mapping
  • Nested structure flattening
  • Hierarchical data reconstruction
  • Custom serialization handling

Aggregation Processing:

  • Statistical calculations
  • Trend analysis
  • Distribution computations
  • Executive summary generation

Performance Characteristics

Generation Performance

HTML Reports:

  • Average generation time: 2-5 seconds for 1000 targets
  • Memory usage: 50-100MB during generation
  • Template caching reduces subsequent generations by 60%

JSON Reports:

  • Average generation time: 0.5-1 second for 1000 targets
  • Memory usage: 20-40MB during generation
  • Streaming support for large datasets

CSV Reports:

  • Average generation time: 1-2 seconds for 1000 targets
  • Memory usage: 30-60MB during generation
  • Incremental writing for memory efficiency

Optimization Strategies

Template Caching:

  • Pre-compiled template storage
  • CSS/JS content caching
  • Variable substitution optimization

Data Processing:

  • Lazy evaluation for large datasets
  • Parallel processing for independent sections
  • Memory pool management for object reuse

Output Optimization:

  • Compressed output options
  • Incremental file writing
  • Background generation support

Integration Points

CLI Integration

# Multi-format report generation
hawkeye analyze-threats -i results.json -o report.html --format html
hawkeye analyze-threats -i results.json -o report.json --format json  
hawkeye analyze-threats -i results.json -o report.csv --format csv

# Template selection
hawkeye analyze-threats -i results.json -o executive.html --template executive_summary
hawkeye analyze-threats -i results.json -o technical.html --template technical_report

Pipeline Integration

# Report generation in analysis pipeline
converter = PipelineToReportConverter()
report_data = converter.convert_pipeline_results(
    pipeline_results,
    report_title="MCP Security Assessment",
    report_type=ReportType.THREAT_ANALYSIS
)

# Multi-format generation
reporters = [
    HTMLReporter(),
    JSONReporter(), 
    CSVReporter()
]

for reporter in reporters:
    output_path = f"report.{reporter.get_format().value}"
    reporter.generate_report(report_data, output_path)

Configuration Management

# Report generation settings
REPORTING_CONFIG = {
    'default_format': ReportFormat.HTML,
    'template_directory': 'templates/',
    'enable_caching': True,
    'max_memory_usage': '200MB',
    'parallel_processing': True,
    'output_compression': True
}

Error Handling and Recovery

Exception Hierarchy

class ReportingError(Exception):
    """Base exception for reporting errors"""
    pass

class FormatError(ReportingError):
    """Format-specific generation errors"""
    pass

class TemplateError(ReportingError):
    """Template processing errors"""
    pass

Recovery Mechanisms

Template Fallback:

  • Automatic fallback to basic templates on complex template failures
  • Partial rendering with error annotations
  • Safe mode generation with minimal formatting

Data Recovery:

  • Graceful handling of missing data sections
  • Default value substitution for missing fields
  • Partial report generation with warnings

Output Recovery:

  • Retry mechanisms for file writing failures
  • Alternative output location selection
  • Memory cleanup on generation failures

Security Considerations

Data Sanitization

  • XSS prevention in HTML reports
  • SQL injection prevention in data queries
  • Path traversal protection in file operations

Access Control

  • Report access permission validation
  • Sensitive data redaction options
  • Audit trail for report generation

Data Privacy

  • PII removal from reports
  • Configurable data retention policies
  • Secure temporary file handling

Future Enhancements

Planned Features

  • PDF Generation: Direct PDF output without HTML conversion
  • Interactive Dashboards: Web-based dashboard with real-time updates
  • Report Scheduling: Automated report generation and distribution
  • Custom Template Builder: GUI-based template creation tool

Performance Improvements

  • Streaming Generation: Memory-efficient processing for large datasets
  • Distributed Processing: Multi-node report generation for enterprise scale
  • Caching Layer: Advanced caching with dependency tracking
  • Compression: Advanced compression algorithms for output optimization

This comprehensive report generation system provides flexible, performant, and feature-rich reporting capabilities essential for professional security assessment and threat analysis workflows.

⚠️ **GitHub.com Fallback** ⚠️