Local Storage Implementation - FeitianTech/postquantum-webauthn-platform GitHub Wiki

Local Storage Implementation

Table of Contents

  1. Introduction
  2. Architecture Overview
  3. Directory Structure
  4. File Naming Conventions
  5. Serialization Formats
  6. Core Storage Components
  7. Atomic File Operations
  8. Error Handling
  9. Integration Patterns
  10. Production Considerations
  11. Troubleshooting Guide
  12. Best Practices

Introduction

The local storage implementation in the Post-Quantum WebAuthn Platform provides a robust file-based persistence layer for storing WebAuthn credentials, session metadata, and configuration data. This system serves as the primary storage mechanism for the demo server, offering both local filesystem storage and cloud storage capabilities through a unified interface.

The storage system is designed with several key principles:

  • Pluggable Architecture: Supports both local filesystem and Google Cloud Storage backends
  • Atomic Operations: Ensures data consistency through atomic file operations
  • Legacy Compatibility: Maintains backward compatibility with older storage formats
  • Error Resilience: Provides comprehensive error handling and fallback mechanisms
  • Performance Optimization: Implements efficient caching and cleanup strategies

Architecture Overview

The storage system follows a layered architecture that separates concerns between different types of data and storage backends:

graph TB
subgraph "Application Layer"
A[WebAuthn Routes]
B[Session Management]
C[Credential Artifacts]
end
subgraph "Storage Abstraction Layer"
D[Storage Factory]
E[Cloud Storage]
F[Local Storage]
end
subgraph "Data Types"
G[Credential Data]
H[Session Metadata]
I[Credential Artifacts]
end
subgraph "File System"
J[Local Directory Structure]
K[JSON Files]
L[Pickle Files]
end
A --> D
B --> D
C --> D
D --> E
D --> F
E --> J
F --> J
G --> K
H --> L
I --> K

Directory Structure

The local storage implementation organizes data into a hierarchical directory structure that provides logical separation and efficient access patterns:

Base Directory Structure

server/
├── static/
│   ├── session-metadata/          # Session metadata storage
│   │   ├── session-id-1/
│   │   │   ├── metadata-file-1.json
│   │   │   ├── metadata-file-2.json
│   │   │   └── .last-access      # Access timestamp marker
│   │   ├── session-id-2/
│   │   │   ├── ...
│   │   └── ...
│   ├── credential-artifacts/      # Advanced credential artifacts
│   │   ├── sha256-hash-1.json
│   │   ├── sha256-hash-2.json
│   │   └── ...
│   └── ...
└── session-credentials/           # Legacy credential storage
    ├── session-id-1/
    │   ├── username1_credential_data.pkl
    │   ├── username2_credential_data.pkl
    │   └── ...
    └── session-id-2/
        ├── ...
        └── ...

Directory Configuration

The system uses configurable directory paths defined in the configuration module:

| Directory | Purpose | Environment Variable |
| --- | --- | --- |
| SESSION_METADATA_DIR | Session metadata storage | FIDO_SERVER_SESSION_METADATA_DIR |
| _LOCAL_CREDENTIAL_BASE | Credential data storage | FIDO_SERVER_GCS_USER_CREDENTIAL_SUBDIR |
| _ARTIFACT_DIR | Credential artifacts | FIDO_SERVER_GCS_USER_ARTIFACT_SUBDIR |
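
Resolving these paths at startup can be sketched as follows. The helper name `resolve_dir` and the fallback default are illustrative, not the platform's actual configuration code; only the environment variable names come from the table above.

```python
import os

def resolve_dir(env_var: str, default: str) -> str:
    """Read a storage directory from the environment, falling back to a
    default, and ensure it exists before the server starts handling requests."""
    path = os.environ.get(env_var, default)
    os.makedirs(path, exist_ok=True)
    return path

# Hypothetical usage; the real default may differ.
SESSION_METADATA_DIR = resolve_dir(
    "FIDO_SERVER_SESSION_METADATA_DIR",
    os.path.join("server", "static", "session-metadata"),
)
```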

File Naming Conventions

The storage system employs consistent naming conventions to ensure predictable file locations and easy identification:

Credential Data Files

Credential data is stored using a standardized naming pattern that combines the username and session ID with a fixed suffix:

{username}_{session_id}_credential_data.pkl

For legacy compatibility, the system also supports the older format:

{username}_credential_data.pkl
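
Both patterns can be captured in a single hypothetical helper (the real module routes this through internals such as `_local_filename()`; the function below only illustrates the naming rule):

```python
from typing import Optional

_SUFFIX = "_credential_data.pkl"

def credential_filename(username: str, session_id: Optional[str] = None) -> str:
    """Return the file name for a user's pickled credential data.

    With a session_id the current per-session pattern is produced;
    without one, the legacy flat pattern is returned for compatibility.
    """
    if session_id:
        return f"{username}_{session_id}{_SUFFIX}"
    return f"{username}{_SUFFIX}"  # legacy format
```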

Session Metadata Files

Session metadata files use a UUID-based naming convention with type-specific extensions:

{uuid}.{extension}

Where {extension} could be .json, .info.json, or other metadata-related extensions.

Credential Artifacts

Advanced credential artifacts use SHA-256 hash-based naming for uniqueness and security:

{sha256-hash}.json

The hash is computed from the storage ID to ensure collision resistance and secure file location.
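
Deriving the artifact file name can be sketched like this (the function name is hypothetical, and the real implementation may hash different input bytes):

```python
import hashlib

def artifact_filename(storage_id: str) -> str:
    """Derive a collision-resistant artifact file name from the storage ID."""
    digest = hashlib.sha256(storage_id.encode("utf-8")).hexdigest()
    return f"{digest}.json"
```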

Serialization Formats

The storage system supports multiple serialization formats depending on the type of data being stored:

Pickle Format for Credentials

Credential data is serialized using Python's pickle format for maximum compatibility with WebAuthn objects:

# Writing credential data
payload = pickle.dumps(key)
with open(path, "wb") as f:
    f.write(payload)

# Reading credential data  
try:
    creds = pickle.loads(payload)
except Exception:
    return []

JSON Format for Metadata

Session metadata and configuration data use JSON format for human readability and cross-platform compatibility:

# JSON serialization with byte conversion
def convert_bytes_for_json(obj: Any) -> Any:
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return base64.b64encode(bytes(obj)).decode('utf-8')
    if isinstance(obj, dict):
        return {k: convert_bytes_for_json(v) for k, v in obj.items()}
    return obj
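
A round-trip through this converter, reproduced here so the example is self-contained, shows how byte fields become JSON-safe strings:

```python
import base64
import json
from typing import Any

def convert_bytes_for_json(obj: Any) -> Any:
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return base64.b64encode(bytes(obj)).decode('utf-8')
    if isinstance(obj, dict):
        return {k: convert_bytes_for_json(v) for k, v in obj.items()}
    return obj

# Raw credential bytes are not JSON-serializable as-is ...
metadata = {"credential_id": b"\x01\x02", "user": "alice"}

# ... but after conversion the dict can be dumped directly.
encoded = convert_bytes_for_json(metadata)
serialized = json.dumps(encoded)
```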

Binary Format for Artifacts

Credential artifacts are stored as JSON-encoded binary data with Base64 encoding for safe transmission:

# Artifact storage with binary conversion
def add_public_key_material(target: Dict[str, Any], public_key: Any) -> None:
    if isinstance(public_key, dict):
        cose_map = dict(public_key)
        target['publicKeyCose'] = convert_bytes_for_json(cose_map)
        raw_key = cose_map.get(-1)
        if isinstance(raw_key, (bytes, bytearray, memoryview)):
            target['publicKeyBytes'] = convert_bytes_for_json(raw_key)

Core Storage Components

The storage system consists of several specialized components, each handling specific types of data:

Credential Storage (storage.py)

The main credential storage module provides functions for managing WebAuthn credential data:

classDiagram
class CredentialStorage {
+savekey(name, key, session_id) void
+readkey(name, session_id) List[Any]
+delkey(name, session_id) void
+list_credentials(session_id) Dict[str, List[Any]]
+iter_credentials(session_id) Iterator[Tuple[str, List[Any]]]
+extract_credential_data(cred) Any
}
class StorageHelpers {
+_local_filename(name, session_id, create) str
+_credential_blob(name, session_id) str
+_candidate_gcs_blob_names(name, session_id) Iterable[str]
+_resolve_session_id(session_id) str
}
CredentialStorage --> StorageHelpers : uses

Key functions include:

  • savekey(): Persists credential data with atomic file operations
  • readkey(): Retrieves credential data with fallback to legacy storage
  • delkey(): Removes credential data with cleanup support
  • list_credentials(): Lists all credentials for a session
  • iter_credentials(): Iterates through all credentials efficiently

Session Metadata Store (session_metadata_store.py)

Manages session-specific metadata with automatic cleanup and activity tracking:

sequenceDiagram
participant App as Application
participant Store as SessionStore
participant FS as FileSystem
participant Cleanup as CleanupService
App->>Store : ensure_session(session_id)
Store->>FS : create_directory()
Store->>Cleanup : schedule_cleanup()
App->>Store : touch_last_access(session_id)
Store->>FS : update_marker_file()
Cleanup->>Store : periodic_cleanup()
Store->>FS : scan_inactive_sessions()
Store->>FS : remove_old_directories()
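
The periodic-cleanup step in the sequence above can be sketched as a scan over `.last-access` markers (the function name and threshold handling are illustrative, not the store's actual code):

```python
import os
import shutil
import time

_LAST_ACCESS_BLOB = ".last-access"  # marker name taken from the directory tree above

def cleanup_inactive_sessions(base_dir: str, max_idle_seconds: float) -> int:
    """Remove session directories whose last-access marker is older than
    the idle threshold. Returns how many directories were removed."""
    removed = 0
    cutoff = time.time() - max_idle_seconds
    for entry in os.scandir(base_dir):
        if not entry.is_dir():
            continue
        marker = os.path.join(entry.path, _LAST_ACCESS_BLOB)
        try:
            if os.path.getmtime(marker) < cutoff:
                shutil.rmtree(entry.path, ignore_errors=True)
                removed += 1
        except OSError:
            continue  # no marker yet; leave the session alone
    return removed
```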

Credential Artifacts (credential_artifacts.py)

Handles advanced credential artifacts with cloud storage integration:

flowchart TD
A[Store Artifact Request] --> B{Using GCS?}
B --> |Yes| C[Upload to GCS]
B --> |No| D[Save to Local FS]
C --> E[Generate Blob Name]
D --> F[Generate File Path]
E --> G[Upload Bytes]
F --> H[Write JSON File]
G --> I[Artifact Stored]
H --> I

Atomic File Operations

The storage system implements atomic file operations to prevent data corruption and ensure consistency during write operations:

Temporary File Pattern

The system uses a temporary file pattern similar to the Flask secret key generation:

# Secure atomic file creation
fd, temp_path = tempfile.mkstemp(prefix="session-secret.", dir=os.path.dirname(default_path))
with os.fdopen(fd, "wb") as target:
    target.write(secret)
    target.flush()
    os.fsync(target.fileno())  # Ensure data is written to disk
os.replace(temp_path, default_path)  # Atomic rename operation

File Locking and Consistency

For session metadata, the system implements implicit locking through directory operations:

# Directory-based synchronization
def _local_touch_last_access(directory: str) -> None:
    marker_path = os.path.join(directory, _LAST_ACCESS_BLOB)
    try:
        os.makedirs(os.path.dirname(marker_path), exist_ok=True)
        with open(marker_path, "a", encoding="utf-8"):
            os.utime(marker_path, None)  # Update timestamp atomically
    except OSError:
        pass

Transaction Safety

The system ensures transaction safety through several mechanisms:

  1. Atomic Writes: Uses os.replace() for atomic file replacement
  2. Temporary Files: Creates temporary files before finalizing writes
  3. Directory Operations: Leverages directory operations for synchronization
  4. Error Recovery: Implements rollback mechanisms for failed operations
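
Combined, these mechanisms look roughly like the following hypothetical JSON writer: stage in a temporary file, flush and fsync, rename atomically, and unlink the staged file if anything fails so no partial data is left behind.

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON atomically with rollback of the staged temp file on failure."""
    directory = os.path.dirname(path) or "."
    fd, temp_path = tempfile.mkstemp(prefix=".tmp-", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())   # force data to disk before the rename
        os.replace(temp_path, path)     # atomic on POSIX and Windows
    except BaseException:
        try:
            os.unlink(temp_path)        # roll back: remove the staged file
        except OSError:
            pass
        raise
```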

Error Handling

The storage system implements comprehensive error handling to ensure robust operation in various failure scenarios:

File I/O Error Handling

def _read_file(path: str) -> Optional[Dict[str, Any]]:
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        return None

Storage Backend Failures

The system gracefully handles failures in both local and cloud storage backends:

def readkey(name: str, *, session_id: Optional[str] = None) -> List[Any]:
    resolved_session = _resolve_session_id(session_id)
    try:
        try:
            with open(_local_filename(name, resolved_session), "rb") as f:
                payload = f.read()
        except FileNotFoundError:
            with open(_legacy_local_filename(name), "rb") as f:
                payload = f.read()
    except Exception:
        return []
    
    try:
        creds = pickle.loads(payload)
    except Exception:
        return []
    return creds if isinstance(creds, list) else []

Network and Cloud Storage Errors

Cloud storage operations include retry mechanisms and graceful degradation:

def _with_retry(operation: Callable[[], _T], *, max_attempts: int = 3,
                base_delay: float = 0.5) -> _T:
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except _RETRYABLE_EXCEPTIONS as exc:
            last_error = exc
            if attempt >= max_attempts:
                break
            delay = base_delay * (2 ** (attempt - 1))  # exponential backoff
            time.sleep(delay)
    if last_error is not None:
        raise last_error

Permission and Disk Space Handling

The system includes specific handling for common failure modes:

| Error Type | Handling Strategy | Recovery Action |
| --- | --- | --- |
| Permission Denied | Log warning, continue | Use alternative location |
| Disk Full | Graceful degradation | Free space, retry |
| File Locked | Retry with exponential backoff | Wait and retry |
| Network Timeout | Retry with backoff | Fall back to cached data |

Integration Patterns

The storage system integrates seamlessly with the Flask application and WebAuthn flows:

Flask Application Integration

# Application startup and configuration
from .config import app, basepath
from .storage import savekey, readkey, delkey
from .session_metadata_store import ensure_session, touch_last_access

@app.before_request
def before_request():
    # Ensure session exists and update activity
    session_id = get_session_id()
    ensure_session(session_id)
    touch_last_access(session_id)

@app.route('/register', methods=['POST'])
def register_credential():
    # Store new credential
    savekey(username, credential_data, session_id=session_id)
    return jsonify(success=True)

@app.route('/authenticate', methods=['POST'])
def authenticate():
    # Retrieve stored credentials
    stored_credentials = readkey(username, session_id=session_id)
    # Perform authentication...

WebAuthn Registration Flow

sequenceDiagram
participant Client as WebAuthn Client
participant Server as Flask Server
participant Storage as Local Storage
participant Cleanup as Cleanup Service
Client->>Server : Register Request
Server->>Server : Generate Challenge
Server->>Client : Send Challenge
Client->>Server : Registration Response
Server->>Storage : savekey(username, credential_data)
Storage->>Storage : Write atomic file
Server->>Cleanup : Schedule cleanup
Server->>Client : Registration Success

Authentication Flow Integration

The authentication flow retrieves stored credentials and validates against them:

def authenticate_credential(username: str, session_id: str, challenge: bytes, response: dict) -> bool:
    # Retrieve stored credentials
    stored_credentials = readkey(username, session_id=session_id)
    
    if not stored_credentials:
        return False
    
    # Validate against stored credentials
    for cred in stored_credentials:
        if validate_credential(cred, challenge, response):
            return True
    
    return False

Production Considerations

While the local storage implementation is suitable for development and testing, several limitations make it less appropriate for production deployments:

Scalability Limitations

| Aspect | Current Implementation | Production Requirement |
| --- | --- | --- |
| Concurrent Access | Single-process file locking | Distributed coordination |
| Data Size | Unlimited local storage | Managed storage quotas |
| Backup Strategy | Manual backup required | Automated replication |
| Monitoring | Basic logging | Comprehensive metrics |

Security Considerations

The local storage system has several security characteristics:

Advantages:

  • Sensitive data never leaves the local machine
  • File system permissions control access
  • Atomic operations prevent partial writes
  • Isolation between sessions

Limitations:

  • No built-in encryption at rest
  • File system vulnerabilities
  • Limited audit logging
  • No centralized access control

Alternative Storage Options

For production deployments, consider these alternatives:

  1. Google Cloud Storage: Enterprise-grade reliability and scalability
  2. PostgreSQL/MySQL: Structured data with ACID transactions
  3. Redis/Memcached: High-performance caching with persistence
  4. Distributed File Systems: Shared storage across multiple nodes

Migration Strategies

When migrating from local to cloud storage:

# Migration detection and execution
def migrate_to_cloud_storage():
    if not gcs_enabled():
        return
    
    # Export local data
    local_data = list_credentials()
    
    # Upload to cloud storage
    for username, credentials in local_data.items():
        blob_name = build_blob_name(f"{username}_credentials", prefix="migrated/")
        upload_bytes(blob_name, pickle.dumps(credentials))
    
    # Verify migration
    cloud_data = list_credentials()
    if len(local_data) == len(cloud_data):
        # Clean up local data
        cleanup_local_storage()

Troubleshooting Guide

Common issues and their solutions when working with the local storage system:

Permission Errors

Symptoms:

  • "Permission denied" errors during file operations
  • Unable to create directories or write files

Causes:

  • Insufficient file system permissions
  • Readonly file system mount
  • SELinux/AppArmor restrictions

Solutions:

# Check and fix permissions
chmod 755 /path/to/storage/directory
chown www-data:www-data /path/to/storage/directory

# Verify directory structure
ls -la /path/to/storage/directory
mkdir -p /path/to/storage/directory/subdir

Disk Space Exhaustion

Symptoms:

  • "No space left on device" errors
  • Slow file operations
  • Application hangs during writes

Monitoring:

# Check disk usage
df -h /path/to/storage/directory

# Monitor growth
du -sh /path/to/storage/directory/* | sort -hr

# Set up alerts
watch -n 60 'df -h /path/to/storage/directory'

Solutions:

  1. Implement automatic cleanup policies
  2. Configure log rotation
  3. Set up monitoring and alerting
  4. Add storage quota enforcement
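
Storage quota enforcement (item 4) could look like the following sketch, which deletes the oldest files first until the directory fits the quota; the function name and deletion policy are illustrative only:

```python
import os

def enforce_quota(directory: str, max_bytes: int) -> int:
    """Delete oldest files until the directory's total size fits the quota.
    Returns the number of bytes freed."""
    files = []
    total = 0
    for entry in os.scandir(directory):
        if entry.is_file():
            st = entry.stat()
            files.append((st.st_mtime, st.st_size, entry.path))
            total += st.st_size
    freed = 0
    for _, size, path in sorted(files):  # oldest modification time first
        if total - freed <= max_bytes:
            break
        try:
            os.unlink(path)
            freed += size
        except OSError:
            continue  # skip files that vanished or are protected
    return freed
```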

File Locking Conflicts

Symptoms:

  • "Resource temporarily unavailable" errors
  • Deadlock situations
  • Intermittent failures

Investigation:

# Check for locked files
lsof | grep /path/to/storage

# Monitor file system events
inotifywait -r /path/to/storage/directory

# Check for zombie processes
ps aux | grep python

Prevention:

  1. Implement proper resource cleanup
  2. Use timeout mechanisms
  3. Add retry logic with exponential backoff
  4. Monitor for long-running operations
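
Timeout plus retry with exponential backoff (items 2 and 3) can be sketched around a non-blocking `fcntl.flock()` call; this is a POSIX-only illustration, not the platform's actual locking code:

```python
import fcntl
import time

def acquire_lock_with_backoff(handle, max_attempts: int = 5,
                              base_delay: float = 0.05) -> bool:
    """Try a non-blocking exclusive lock, backing off exponentially
    instead of blocking forever. Returns True once the lock is held."""
    for attempt in range(max_attempts):
        try:
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            time.sleep(base_delay * (2 ** attempt))  # wait, then retry
    return False  # caller decides how to degrade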

Data Corruption Issues

Symptoms:

  • JSON parsing errors
  • Pickle loading failures
  • Inconsistent data

Diagnosis:

# Verify file integrity
import json
try:
    with open(filepath, 'r') as f:
        data = json.load(f)
except json.JSONDecodeError as e:
    print(f"Corrupted JSON: {e}")

# Check file sizes
import os
print(f"File size: {os.path.getsize(filepath)} bytes")

Recovery:

  1. Implement checksum verification
  2. Maintain backup copies
  3. Add data validation
  4. Use transaction logs
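
Checksum verification (item 1) can be implemented with a sidecar digest file; the helper names below are hypothetical:

```python
import hashlib
import os

def write_with_checksum(path: str, payload: bytes) -> None:
    """Write a file plus a sidecar .sha256 digest for later integrity checks."""
    with open(path, "wb") as f:
        f.write(payload)
    digest = hashlib.sha256(payload).hexdigest()
    with open(path + ".sha256", "w", encoding="utf-8") as f:
        f.write(digest)

def verify_checksum(path: str) -> bool:
    """Return True only when the file still matches its sidecar checksum."""
    try:
        with open(path, "rb") as f:
            payload = f.read()
        with open(path + ".sha256", "r", encoding="utf-8") as f:
            expected = f.read().strip()
    except OSError:
        return False  # missing file or missing sidecar counts as unverified
    return hashlib.sha256(payload).hexdigest() == expected
```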

Performance Issues

Symptoms:

  • Slow credential lookups
  • High CPU usage during cleanup
  • Memory leaks

Optimization:

# Optimize file operations
import concurrent.futures
from functools import lru_cache

@lru_cache(maxsize=128)
def get_cached_credentials(username: str):
    return readkey(username)

# Parallel processing for bulk operations
def process_bulk_operations(operations):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(op) for op in operations]
        return [future.result() for future in futures]

Best Practices

To maximize the effectiveness and reliability of the local storage implementation:

Configuration Management

  1. Environment Variables: Use environment variables for all configuration
  2. Default Values: Provide sensible defaults for development
  3. Validation: Validate configuration at startup
  4. Documentation: Document all configuration options

Data Organization

  1. Logical Separation: Keep different data types in separate directories
  2. Naming Conventions: Follow consistent naming patterns
  3. Size Limits: Implement reasonable limits on file sizes
  4. Cleanup Policies: Establish automated cleanup schedules

Error Handling

  1. Graceful Degradation: Always provide fallback mechanisms
  2. Logging: Implement comprehensive logging
  3. Monitoring: Set up monitoring and alerting
  4. Testing: Test error conditions thoroughly

Performance Optimization

  1. Caching: Implement appropriate caching strategies
  2. Batch Operations: Group related operations
  3. Async Processing: Use asynchronous operations where possible
  4. Resource Limits: Set appropriate resource limits

Security Considerations

  1. File Permissions: Set restrictive file permissions
  2. Encryption: Consider encrypting sensitive data
  3. Access Control: Implement proper access controls
  4. Audit Logging: Maintain audit trails

Testing Strategies

  1. Unit Tests: Test individual components
  2. Integration Tests: Test end-to-end flows
  3. Error Scenarios: Test failure conditions
  4. Performance Tests: Test under load
