Storage Backend - FeitianTech/postquantum-webauthn-platform GitHub Wiki

Storage Backend

Table of Contents

  1. Introduction
  2. Dual-Layer Architecture
  3. Core Storage Components
  4. Data Access Patterns
  5. Object Naming Conventions
  6. Local Storage Fallback
  7. Error Handling and Retry Mechanisms
  8. Performance Considerations
  9. Security and Encryption
  10. Migration Strategies
  11. Testing and Validation
  12. Troubleshooting Guide

Introduction

The PostQuantum WebAuthn Platform uses a dual-layer storage backend to persist cryptographic credentials, session metadata, device logs, and MDS snapshots. Abstract storage interfaces are kept separate from the concrete implementations, so a deployment can switch between local filesystem storage and Google Cloud Storage while the higher layers keep calling the same functions.

The storage system serves multiple critical functions:

  • Secure credential storage for WebAuthn authentication
  • Session metadata management for user state persistence
  • Device registration logging for audit trails
  • Metadata synchronization for authenticator verification
  • Backup and disaster recovery capabilities

Dual-Layer Architecture

The storage backend follows a layered design pattern with clear separation between abstraction and implementation:

graph TB
subgraph "Application Layer"
A[WebAuthn Routes]
B[Session Management]
C[Device Logging]
D[Metadata Services]
end
subgraph "Storage Abstraction Layer"
E[storage.py<br/>Credential Storage]
F[session_metadata_store.py<br/>Session Metadata]
G[device_logs.py<br/>Device Logging]
H[metadata.py<br/>Metadata Management]
end
subgraph "Implementation Layer"
I[cloud_storage.py<br/>Google Cloud Storage]
J[Local Filesystem<br/>Fallback Storage]
end
subgraph "External Dependencies"
K[Google Cloud Storage API]
L[GitHub API]
M[Filesystem]
end
A --> E
B --> F
C --> G
D --> H
E --> I
E --> J
F --> I
F --> J
G --> L
H --> I
H --> J
I --> K
H --> L
J --> M

Diagram sources

  • server/server/storage.py
  • server/server/cloud_storage.py
  • server/server/session_metadata_store.py

Layer Responsibilities

Application Layer: Contains route handlers and business logic that utilize storage services for data persistence and retrieval.

Storage Abstraction Layer: Provides unified interfaces for different types of data storage, hiding implementation details from higher layers.

Implementation Layer: Contains concrete implementations for cloud and local storage backends.

External Dependencies: Interfaces with external systems like Google Cloud Storage and GitHub for data persistence.

Section sources

  • server/server/storage.py
  • server/server/session_metadata_store.py

Core Storage Components

Credential Storage System

The credential storage system manages WebAuthn credentials with robust session isolation and cross-platform compatibility:

classDiagram
class StorageInterface {
+savekey(name, key, session_id)
+readkey(name, session_id)
+delkey(name, session_id)
+iter_credentials(session_id)
+list_credentials(session_id)
}
class CloudStorage {
+upload_bytes(blob_name, data, content_type)
+download_bytes(blob_name)
+delete_blob(blob_name, missing_ok)
+list_blob_names(prefix)
+blob_exists(blob_name)
}
class LocalStorage {
+_local_filename(name, session_id, create)
+_local_directory(session_id, create)
+_legacy_local_filename(name)
}
class BlobNaming {
+_credential_blob(name, session_id)
+_legacy_credential_blob(name)
+_credential_prefix(session_id)
+_user_root_prefix(session_id)
}
StorageInterface --> CloudStorage : "uses when GCS enabled"
StorageInterface --> LocalStorage : "uses when GCS disabled"
StorageInterface --> BlobNaming : "uses for naming"

Diagram sources

  • server/server/storage.py
  • server/server/cloud_storage.py

Session Metadata Management

Session metadata storage provides persistent state management with automatic cleanup and activity tracking:

sequenceDiagram
participant App as Application
participant SSM as Session Metadata Store
participant CS as Cloud Storage
participant FS as Local Filesystem
App->>SSM : ensure_session(session_id)
SSM->>SSM : _using_gcs()
alt GCS Enabled
SSM->>CS : touch_last_access(session_id)
CS-->>SSM : success
else Local Storage
SSM->>FS : _local_session_directory(session_id, create=True)
FS-->>SSM : directory_path
end
App->>SSM : write_file(session_id, filename, data)
SSM->>SSM : _session_blob(session_id, filename)
alt GCS Enabled
SSM->>CS : upload_bytes(blob_name, data, content_type)
CS-->>SSM : success
else Local Storage
SSM->>FS : write to local file
FS-->>SSM : success
end
App->>SSM : list_files(session_id)
alt GCS Enabled
SSM->>CS : list_blob_names(prefix)
CS-->>SSM : blob_names[]
else Local Storage
SSM->>FS : os.listdir(directory)
FS-->>SSM : filenames[]
end

Diagram sources

  • server/server/session_metadata_store.py
  • server/server/storage.py

Section sources

  • server/server/storage.py
  • server/server/session_metadata_store.py

Data Access Patterns

Credential Operations

The storage system supports comprehensive credential lifecycle management:

| Operation | Description | Implementation Pattern |
| --- | --- | --- |
| Save | Persist credential data with session isolation | Pickle serialization + blob upload |
| Load | Retrieve credentials with fallback support | Multi-blob search with legacy compatibility |
| Delete | Remove credentials with cleanup | Both current and legacy blob removal |
| Iterate | Bulk credential enumeration | Streaming blob listing with filtering |
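
The Save and Load rows can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's code: `InMemoryBlobs` is a hypothetical stand-in for the cloud client, and the extra `store` parameter is added here only to keep the example self-contained.

```python
import pickle
from typing import Any, Dict, Optional


class InMemoryBlobs:
    """Hypothetical stand-in for a blob store (not the real cloud client)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def upload_bytes(self, blob_name: str, data: bytes) -> None:
        self._blobs[blob_name] = data

    def download_bytes(self, blob_name: str) -> Optional[bytes]:
        return self._blobs.get(blob_name)


def savekey(store: InMemoryBlobs, name: str, key: Any, session_id: str) -> None:
    # Save: pickle the credential and upload it under the session-scoped name.
    blob_name = f"{session_id}/user-data/credentials/{name}_credential_data.pkl"
    store.upload_bytes(blob_name, pickle.dumps(key))


def readkey(store: InMemoryBlobs, name: str, session_id: str) -> Any:
    # Load: try the current blob name first, then the legacy root-level name.
    candidates = (
        f"{session_id}/user-data/credentials/{name}_credential_data.pkl",
        f"{name}_credential_data.pkl",
    )
    for blob_name in candidates:
        data = store.download_bytes(blob_name)
        if data is not None:
            return pickle.loads(data)
    return None
```

Because `pickle.loads` can execute arbitrary code, this pattern is only appropriate for blobs that the server itself wrote.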

Session Metadata Operations

Session metadata follows a hierarchical structure with automatic lifecycle management:

| Operation | Purpose | Cleanup Policy |
| --- | --- | --- |
| Touch Access | Update activity timestamp | Automatic pruning after 14 days |
| List Files | Enumerate session contents | Prefix-based filtering |
| Write File | Store arbitrary data | Content-type aware uploads |
| Delete Session | Complete session removal | Recursive blob deletion |
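
The 14-day pruning policy could be expressed along these lines. The function name, the dictionary of last-access times, and the choice to keep a session that is exactly 14 days old are assumptions made for this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

INACTIVITY_LIMIT = timedelta(days=14)  # retention window from the table above


def sessions_to_prune(
    last_access: Dict[str, datetime], now: Optional[datetime] = None
) -> List[str]:
    """Return the IDs of sessions idle for longer than the retention window."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        session_id
        for session_id, seen in last_access.items()
        if now - seen > INACTIVITY_LIMIT
    )
```

A cleanup job would then call the session deletion routine for each returned ID.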

Device Logging Pattern

Device logs implement asynchronous upload with structured data preservation:

flowchart TD
A[Registration Event] --> B[Build Log Payload]
B --> C[Convert to JSON-Safe Format]
C --> D[Generate Unique Path]
D --> E[Upload to GitHub]
E --> F[Log Upload Status]
G[Error Handling] --> H[Retry Logic]
H --> I[Failure Logging]
B --> G
E --> G

Diagram sources

  • server/server/device_logs.py
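
The "Convert to JSON-Safe Format" step in the flowchart might look something like the sketch below. The helper name and the specific encodings (base64 for bytes, ISO 8601 for datetimes, `repr` for anything unknown) are assumptions, not the behavior of `device_logs.py`.

```python
import base64
import json
from datetime import datetime
from typing import Any


def to_json_safe(value: Any) -> Any:
    """Recursively convert a log payload into JSON-serializable values."""
    if isinstance(value, (bytes, bytearray)):
        # Binary fields such as raw attestation data become base64 text.
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    # Fall back to a readable representation for unknown objects.
    return repr(value)
```

The result can be passed directly to `json.dumps` before upload.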

Section sources

  • server/server/storage.py
  • server/server/session_metadata_store.py
  • server/server/device_logs.py

Object Naming Conventions

The storage system employs consistent naming patterns for different data types:

Credential Blob Naming

Credential data follows a structured naming convention that ensures uniqueness and organization:

  • Current Format: {session_id}/user-data/credentials/{username}_credential_data.pkl
  • Legacy Format: {username}_credential_data.pkl (root level)
  • Prefix Structure: {session_id}/{folder_prefix}/{subdir}/{filename}
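
As a rough illustration of how these names support prefix-based listing, the helpers below build the credential prefix and recover usernames from blob names. The function names are hypothetical; only the path layout comes from the formats listed above.

```python
from typing import Iterable, List, Optional

CREDENTIAL_SUFFIX = "_credential_data.pkl"


def credential_prefix(session_id: str) -> str:
    """Prefix under which a session's credential blobs live."""
    return f"{session_id}/user-data/credentials/"


def username_from_blob(blob_name: str) -> Optional[str]:
    """Extract the username from a current or legacy credential blob name."""
    filename = blob_name.rsplit("/", 1)[-1]
    if not filename.endswith(CREDENTIAL_SUFFIX):
        return None
    return filename[: -len(CREDENTIAL_SUFFIX)]


def usernames_in_session(blob_names: Iterable[str], session_id: str) -> List[str]:
    """Filter a blob listing down to the usernames stored for one session."""
    prefix = credential_prefix(session_id)
    found = []
    for blob_name in blob_names:
        if blob_name.startswith(prefix):
            username = username_from_blob(blob_name)
            if username is not None:
                found.append(username)
    return sorted(found)
```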

Session Metadata Naming

Session metadata uses a hierarchical structure for organized storage:

  • Root Prefix: {session_id}/user-data/metadata/
  • Access Marker: .last-access (hidden file for activity tracking)
  • File Extensions: .json for metadata, .meta.json for metadata info

Device Log Naming

Device logs implement time-based organization with AAGUID grouping:

  • Path Pattern: logs/{aaguid}/{timestamp}_{shortid}.json
  • Timestamp Format: YYYYMMDDTHHMMSSZ
  • Short ID: Random 8-character identifier for uniqueness
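
A path following this pattern could be generated as shown below. The function name is hypothetical, and using 8 hexadecimal characters for the short ID is an assumption; the source only states that the identifier is random and 8 characters long.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def device_log_path(aaguid: str, now: Optional[datetime] = None) -> str:
    """Build logs/{aaguid}/{timestamp}_{shortid}.json for one log entry."""
    now = now or datetime.now(timezone.utc)
    # YYYYMMDDTHHMMSSZ, always expressed in UTC.
    timestamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    short_id = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
    return f"logs/{aaguid}/{timestamp}_{short_id}.json"
```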

Bucket Management

The system supports configurable bucket management with environment-based configuration:

| Environment Variable | Purpose | Default Behavior |
| --- | --- | --- |
| FIDO_SERVER_GCS_BUCKET | Target storage bucket | Required for cloud storage |
| FIDO_SERVER_GCS_ENABLED | Enable cloud storage | Disabled by default |
| FIDO_SERVER_GCS_CREDENTIALS_FILE | Service account file | Uses default credentials |
| FIDO_SERVER_GCS_PROJECT | Target project | Inherits from credentials |
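
One way to read these variables into a typed settings object is sketched here. The `GcsSettings` class, the `load_gcs_settings` function, and the set of accepted truthy spellings are assumptions for illustration; only the variable names come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed truthy spellings


@dataclass(frozen=True)
class GcsSettings:
    enabled: bool
    bucket: Optional[str]
    credentials_file: Optional[str]
    project: Optional[str]


def load_gcs_settings(env: Mapping[str, str] = os.environ) -> GcsSettings:
    """Read the FIDO_SERVER_GCS_* variables; unset values become None."""
    enabled = env.get("FIDO_SERVER_GCS_ENABLED", "").strip().lower() in _TRUE_VALUES
    return GcsSettings(
        enabled=enabled,  # disabled unless explicitly switched on
        bucket=env.get("FIDO_SERVER_GCS_BUCKET") or None,
        credentials_file=env.get("FIDO_SERVER_GCS_CREDENTIALS_FILE") or None,
        project=env.get("FIDO_SERVER_GCS_PROJECT") or None,
    )
```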

Section sources

  • server/server/storage.py
  • server/server/session_metadata_store.py
  • server/server/device_logs.py

Local Storage Fallback

The storage system implements intelligent fallback mechanisms when cloud storage is unavailable:

Detection Logic

flowchart TD
A[Storage Operation] --> B{GCS Enabled?}
B --> |Yes| C{Bucket Available?}
B --> |No| D[Use Local Storage]
C --> |Yes| E[Use Cloud Storage]
C --> |No| F[Log Warning]
F --> D
D --> G[Local Filesystem]
E --> H[Google Cloud Storage]

Diagram sources

  • server/server/storage.py
  • server/server/cloud_storage.py
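
The decision flow in the diagram can be approximated by a small selector. `select_backend` and its `bucket_available` callback are hypothetical names, and treating an exception during the bucket check as "unavailable" is an assumption of this sketch.

```python
import logging
from typing import Callable

logger = logging.getLogger("storage")


def select_backend(gcs_enabled: bool, bucket_available: Callable[[], bool]) -> str:
    """Return "cloud" or "local" following the detection flow."""
    if not gcs_enabled:
        return "local"
    try:
        if bucket_available():
            return "cloud"
    except Exception:
        # A failing availability check is treated like a missing bucket.
        logger.warning("GCS bucket check failed; using local storage", exc_info=True)
        return "local"
    logger.warning("GCS is enabled but the bucket is unavailable; using local storage")
    return "local"
```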

Fallback Strategies

Credential Storage Fallback:

  • Primary: Cloud storage with session isolation
  • Secondary: Local filesystem with session directory creation
  • Legacy Support: Backward compatibility with root-level files

Session Metadata Fallback:

  • Primary: Cloud storage for scalability
  • Secondary: Local filesystem for offline capability
  • Cleanup: Automatic pruning of inactive sessions

Device Logging Fallback:

  • Primary: GitHub integration for audit trails
  • Secondary: Local logging for development
  • Asynchronous: Background upload with graceful degradation

Migration Between Backends

The system supports seamless migration between storage backends through:

  1. Data Export: Serialize all data to portable format
  2. Validation: Verify data integrity across backends
  3. Atomic Switch: Replace backend configuration without data loss
  4. Rollback Capability: Restore previous configuration if needed

Section sources

  • server/server/storage.py
  • server/server/session_metadata_store.py

Error Handling and Retry Mechanisms

Retry Strategy Implementation

The cloud storage layer implements exponential backoff with configurable retry parameters:

sequenceDiagram
participant Client as Storage Client
participant Retry as Retry Handler
participant GCS as Google Cloud Storage
Client->>Retry : Operation Request
Retry->>GCS : Attempt Operation
GCS-->>Retry : Transient Error
Retry->>Retry : Calculate Delay
Retry->>Retry : Sleep (Backoff)
Retry->>GCS : Retry Operation
GCS-->>Retry : Success/Failure
alt Max Attempts Reached
Retry-->>Client : Final Failure
else Success
Retry-->>Client : Operation Result
end

Diagram sources

  • server/server/cloud_storage.py

Exception Classification

The system categorizes exceptions into retryable and non-retryable types:

| Exception Category | Examples | Retry Behavior |
| --- | --- | --- |
| Transient | Network timeouts, rate limits, temporary unavailability | Exponential backoff with max attempts |
| Permanent | Authentication failures, resource not found | Immediate failure with error reporting |
| Unknown | Unexpected errors, system failures | Conservative retry with logging |
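
A minimal sketch of exponential backoff with a retryable/non-retryable split follows. `TransientStorageError`, `with_retries`, and the default delays are assumptions chosen for illustration; they are not the parameters used by `cloud_storage.py`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientStorageError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def with_retries(
    operation: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (TransientStorageError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the final failure
            # Delay doubles each attempt, capped, with up to 10% jitter added.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

Exceptions outside `retry_on`, such as a permission error, propagate on the first attempt, which matches the "Permanent" row.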

Network Outage Handling

During network outages, the system maintains operational continuity through:

  1. Graceful Degradation: Continue using local storage when cloud is unavailable
  2. Queue Management: Buffer operations for later replay
  3. Health Monitoring: Track connectivity status and alert on persistent failures
  4. Circuit Breaker: Prevent cascading failures during extended outages
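
To make the circuit breaker idea concrete, here is one possible shape for it. The class, its thresholds, and the half-open behavior are assumptions for this sketch and are not taken from the platform's code.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing backend until a cool-down period has passed."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request may be sent to the backend."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: allow a trial request only after the cool-down (half-open).
        return self._clock() - self._opened_at >= self._reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A caller would check `allow()` before each cloud operation and use local storage while the breaker is open.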

Consistency Guarantees

The system provides eventual consistency through:

  • Optimistic Concurrency: Allow concurrent operations and resolve conflicts when they occur
  • Idempotent Operations: Ensure repeated operations produce the same result
  • Transaction Simulation: Use atomic operations where the backend supports them
  • Conflict Resolution: Detect conflicting updates and merge them

Section sources

  • server/server/cloud_storage.py
  • server/server/storage.py

Performance Considerations

Latency Optimization

The storage system implements several strategies to minimize latency:

Connection Pooling: Reuse HTTP connections for multiple operations to reduce handshake overhead.

Caching Strategy: Implement multi-level caching for frequently accessed metadata and credentials.

Batch Operations: Group multiple small operations into larger batches to reduce API call overhead.

Async Processing: Use asynchronous uploads for non-critical operations like device logs.

Cost-Effective Storage Classes

The system supports different storage classes based on access patterns:

| Storage Class | Use Case | Cost Factor | Performance |
| --- | --- | --- | --- |
| Standard | Active credentials and session data | Medium | High throughput |
| Nearline | Historical session metadata | Low | Moderate latency |
| Coldline | Long-term device logs | Very Low | Higher latency |
| Archive | Compliance logs | Lowest | Batch access only |

Batch Operation Support

For bulk operations, the system provides:

  • Bulk Upload: Efficiently upload multiple files in single operation
  • Parallel Processing: Concurrent processing of independent operations
  • Progress Tracking: Monitor long-running batch operations
  • Error Recovery: Resume interrupted batch operations
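
Parallel processing with per-item error tracking might be sketched as follows. `upload_many` is a hypothetical helper, and the `upload` callable stands in for whatever single-blob upload function the backend exposes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def upload_many(
    upload: Callable[[str, bytes], None],
    items: Dict[str, bytes],
    max_workers: int = 4,
) -> Tuple[List[str], Dict[str, Exception]]:
    """Upload independent blobs concurrently; report successes and failures."""
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload, name, data): name for name, data in items.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                succeeded.append(name)
            except Exception as exc:
                # Keep going: one bad item should not abort the batch.
                failed[name] = exc
    return sorted(succeeded), failed
```

The returned `failed` mapping gives a caller what it needs to retry only the items that did not complete.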

Scalability Patterns

The architecture supports horizontal scaling through:

  • Partitioning: Distribute data across multiple storage units
  • Replication: Maintain multiple copies for availability
  • Load Balancing: Distribute requests across storage instances
  • Auto-scaling: Adjust resources based on demand

Section sources

  • server/server/cloud_storage.py
  • server/server/session_metadata_store.py

Security and Encryption

Data Protection Strategies

The storage system implements comprehensive security measures:

Encryption in Transit: All data transmitted to/from cloud storage uses TLS 1.2+ encryption.

Encryption at Rest: Supports customer-managed encryption keys for sensitive data.

Access Control: Applies the principle of least privilege through role-based access controls.

IAM Policies

The system integrates with Google Cloud IAM for fine-grained access control:

graph LR
A[Service Account] --> B[Storage Bucket]
B --> C[Blob Operations]
C --> D[Read Access]
C --> E[Write Access]
C --> F[Delete Access]
G[Environment Variables] --> H[Credential Management]
H --> I[Automatic Refresh]
I --> J[Secure Storage]

Diagram sources

  • server/server/cloud_storage.py

Authentication Methods

Multiple authentication mechanisms are supported:

| Method | Use Case | Security Level |
| --- | --- | --- |
| Service Account Keys | Production deployments | High |
| Workload Identity | Kubernetes environments | High |
| Default Credentials | Development/testing | Medium |
| OAuth 2.0 Tokens | Interactive applications | Medium |

Audit and Monitoring

The system provides comprehensive audit capabilities:

  • Operation Logging: Track all storage operations with timestamps
  • Access Monitoring: Monitor who accessed what data and when
  • Anomaly Detection: Identify unusual access patterns
  • Compliance Reporting: Generate reports for regulatory requirements

Section sources

  • server/server/cloud_storage.py
  • server/server/github_client.py

Migration Strategies

Data Migration Planning

Successful migration between storage backends requires careful planning:

Pre-Migration Assessment:

  1. Inventory all stored data and dependencies
  2. Assess current storage utilization and growth patterns
  3. Identify potential compatibility issues
  4. Plan rollback strategy and timeline

Migration Execution:

  1. Phase 1: Export data from source backend
  2. Phase 2: Validate data integrity in target backend
  3. Phase 3: Switch configuration to target backend
  4. Phase 4: Verify functionality and monitor performance

Post-Migration Validation:

  1. Test all critical operations
  2. Verify data completeness and accuracy
  3. Monitor for performance regressions
  4. Document lessons learned

Automated Migration Tools

The system provides utilities for automated migration:

flowchart TD
A[Migration Command] --> B[Source Validation]
B --> C[Target Preparation]
C --> D[Data Transfer]
D --> E[Integrity Verification]
E --> F[Configuration Update]
F --> G[Cleanup Old Data]
H[Error Handling] --> I[Rollback Option]
I --> J[Partial Recovery]

Diagram sources

  • server/server/startup.py

Zero-Downtime Migration

For production environments, zero-downtime migration is achieved through:

  1. Shadow Mode: Run both backends simultaneously during migration
  2. Read-Through Write-Back: Read from old backend, write to new backend
  3. Gradual Cutover: Phase out old backend gradually
  4. Monitoring: Continuous monitoring during migration process
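
One possible interpretation of running both backends during a cutover is a wrapper that writes to the new backend and falls back to the old one on reads, copying data forward as it goes. `DictStore` and `MigratingStore` are hypothetical classes written for this sketch; the platform's migration tooling may work differently.

```python
from typing import Dict, Optional


class DictStore:
    """Hypothetical minimal backend used only for this illustration."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def read(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def write(self, key: str, value: bytes) -> None:
        self.data[key] = value


class MigratingStore:
    """Serve reads from either backend while all writes go to the new one."""

    def __init__(self, old: DictStore, new: DictStore) -> None:
        self._old = old
        self._new = new

    def write(self, key: str, value: bytes) -> None:
        self._new.write(key, value)

    def read(self, key: str) -> Optional[bytes]:
        value = self._new.read(key)
        if value is None:
            value = self._old.read(key)
            if value is not None:
                # Copy forward so the old backend can be retired gradually.
                self._new.write(key, value)
        return value
```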

Section sources

  • server/server/startup.py

Testing and Validation

Comprehensive Test Suite

The storage system includes extensive testing coverage:

  • Unit Tests: Individual component testing with mocked dependencies
  • Integration Tests: End-to-end testing with real storage backends
  • Performance Tests: Load testing and latency measurement
  • Compatibility Tests: Cross-platform and cross-version testing
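
A unit test with a mocked storage dependency could look like the sketch below. `save_credential` is a hypothetical function under test, not a function from `tests/test_storage.py`; the `upload_bytes(blob_name, data, content_type)` call shape follows the class diagram earlier on this page.

```python
import unittest
from unittest import mock


def save_credential(store, name: str, session_id: str, payload: bytes) -> str:
    """Hypothetical function under test: upload to a session-scoped blob."""
    blob_name = f"{session_id}/user-data/credentials/{name}_credential_data.pkl"
    store.upload_bytes(blob_name, payload, "application/octet-stream")
    return blob_name


class SaveCredentialTest(unittest.TestCase):
    def test_uploads_to_session_scoped_blob(self) -> None:
        store = mock.Mock()  # replaces the real cloud client
        blob_name = save_credential(store, "alice", "s1", b"data")
        expected = "s1/user-data/credentials/alice_credential_data.pkl"
        self.assertEqual(blob_name, expected)
        store.upload_bytes.assert_called_once_with(
            expected, b"data", "application/octet-stream"
        )
```

The test can be run with `python -m unittest` and needs no network access or cloud credentials.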

Test Infrastructure

The testing framework supports multiple scenarios:

graph TB
A[Test Runner] --> B[Unit Tests]
A --> C[Integration Tests]
A --> D[Performance Tests]
B --> E[Mocked Storage]
C --> F[Real Storage]
D --> G[Load Generator]
H[Environment Setup] --> I[Cloud Emulator]
H --> J[Local Filesystem]
H --> K[Network Simulator]

Diagram sources

  • tests/test_storage.py
  • tests/test_cloud_storage.py

Validation Strategies

The system implements multiple validation approaches:

  • Data Integrity Checks: Verify data consistency across storage operations
  • Cross-Backend Validation: Compare results between different storage backends
  • Performance Benchmarking: Measure and compare performance characteristics
  • Security Auditing: Validate access controls and encryption implementation

Section sources

  • tests/test_storage.py
  • tests/test_cloud_storage.py

Troubleshooting Guide

Common Issues and Solutions

Storage Unavailable:

  • Verify GCS bucket configuration and permissions
  • Check network connectivity to Google Cloud endpoints
  • Review service account credentials and expiration

Performance Degradation:

  • Monitor API rate limits and adjust retry parameters
  • Consider storage class optimization for access patterns
  • Evaluate connection pooling and keep-alive settings

Data Corruption:

  • Validate data integrity using checksums
  • Check for concurrent modification conflicts
  • Review backup and recovery procedures
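
Checksum validation can be as simple as the sketch below. SHA-256 and the helper names are assumptions; the source does not say which checksum algorithm the platform uses.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_checksum(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of downloaded data with the one recorded at upload."""
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())
```

Recording the digest at upload time and checking it after download makes silent corruption detectable.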

Diagnostic Tools

The system provides built-in diagnostic capabilities:

  • Health Checks: Verify storage backend connectivity and functionality
  • Metrics Collection: Monitor operation counts, latencies, and error rates
  • Logging: Comprehensive logging with configurable verbosity levels
  • Alerting: Configure alerts for critical storage events
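
A health check can be approximated by a write, read, and delete round trip against the active backend. The function below is a hypothetical sketch; the `healthcheck/` key prefix and the report fields are assumptions, and the three callables stand in for the backend's own operations.

```python
import time
import uuid
from typing import Callable, Dict, Optional


def storage_health_check(
    write: Callable[[str, bytes], None],
    read: Callable[[str], Optional[bytes]],
    delete: Callable[[str], object],
) -> Dict[str, object]:
    """Round-trip a small probe object and report status and latency."""
    key = f"healthcheck/{uuid.uuid4().hex}"
    payload = b"ping"
    started = time.perf_counter()
    try:
        write(key, payload)
        ok = read(key) == payload
        delete(key)  # leave no probe data behind
        error = None if ok else "round-trip mismatch"
    except Exception as exc:
        ok, error = False, f"{type(exc).__name__}: {exc}"
    latency_ms = (time.perf_counter() - started) * 1000
    return {"ok": ok, "latency_ms": latency_ms, "error": error}
```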

Recovery Procedures

For storage-related incidents, follow these recovery steps:

  1. Assessment: Identify the scope and impact of the issue
  2. Isolation: Determine affected components and data
  3. Recovery: Execute appropriate recovery procedures
  4. Validation: Verify system functionality post-recovery
  5. Documentation: Record incident details and lessons learned

Section sources

  • server/server/startup.py
  • server/server/cloud_storage.py