Local Storage Implementation - FeitianTech/postquantum-webauthn-platform GitHub Wiki
Table of Contents
- Introduction
- Architecture Overview
- Directory Structure
- File Naming Conventions
- Serialization Formats
- Core Storage Components
- Atomic File Operations
- Error Handling
- Integration Patterns
- Production Considerations
- Troubleshooting Guide
- Best Practices
Introduction
The local storage implementation in the Post-Quantum WebAuthn Platform provides a robust file-based persistence layer for storing WebAuthn credentials, session metadata, and configuration data. This system serves as the primary storage mechanism for the demo server, offering both local filesystem storage and cloud storage capabilities through a unified interface.
The storage system is designed with several key principles:
- Pluggable Architecture: Supports both local filesystem and Google Cloud Storage backends
- Atomic Operations: Ensures data consistency through atomic file operations
- Legacy Compatibility: Maintains backward compatibility with older storage formats
- Error Resilience: Provides comprehensive error handling and fallback mechanisms
- Performance Optimization: Implements efficient caching and cleanup strategies
Architecture Overview
The storage system follows a layered architecture that separates concerns between different types of data and storage backends:
graph TB
subgraph "Application Layer"
A[WebAuthn Routes]
B[Session Management]
C[Credential Artifacts]
end
subgraph "Storage Abstraction Layer"
D[Storage Factory]
E[Cloud Storage]
F[Local Storage]
end
subgraph "Data Types"
G[Credential Data]
H[Session Metadata]
I[Credential Artifacts]
end
subgraph "File System"
J[Local Directory Structure]
K[JSON Files]
L[Pickle Files]
end
A --> D
B --> D
C --> D
D --> E
D --> F
E --> J
F --> J
G --> K
H --> L
I --> K
Directory Structure
The local storage implementation organizes data into a hierarchical directory structure that provides logical separation and efficient access patterns:
Base Directory Structure
server/
├── static/
│ ├── session-metadata/ # Session metadata storage
│ │ ├── session-id-1/
│ │ │ ├── metadata-file-1.json
│ │ │ ├── metadata-file-2.json
│ │ │ └── .last-access # Access timestamp marker
│ │ ├── session-id-2/
│ │ │ ├── ...
│ │ └── ...
│ ├── credential-artifacts/ # Advanced credential artifacts
│ │ ├── sha256-hash-1.json
│ │ ├── sha256-hash-2.json
│ │ └── ...
│ └── ...
└── session-credentials/ # Legacy credential storage
├── session-id-1/
│ ├── username1_credential_data.pkl
│ ├── username2_credential_data.pkl
│ └── ...
└── session-id-2/
├── ...
└── ...
Directory Configuration
The system uses configurable directory paths defined in the configuration module:
| Directory | Purpose | Environment Variable |
|---|---|---|
| SESSION_METADATA_DIR | Session metadata storage | FIDO_SERVER_SESSION_METADATA_DIR |
| _LOCAL_CREDENTIAL_BASE | Credential data storage | FIDO_SERVER_GCS_USER_CREDENTIAL_SUBDIR |
| _ARTIFACT_DIR | Credential artifacts | FIDO_SERVER_GCS_USER_ARTIFACT_SUBDIR |
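As a sketch, resolving one of these directories might look like the following (the default path is an assumption based on the layout above, not the project's actual default):

```python
import os

def resolve_storage_dir(env_var: str, default: str) -> str:
    """Read a storage directory from the environment, falling back to a default."""
    return os.environ.get(env_var) or default

# Hypothetical default; the project's real default lives in its config module.
SESSION_METADATA_DIR = resolve_storage_dir(
    "FIDO_SERVER_SESSION_METADATA_DIR",
    os.path.join("server", "static", "session-metadata"),
)
```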
File Naming Conventions
The storage system employs consistent naming conventions to ensure predictable file locations and easy identification:
Credential Data Files
Credential data is stored using a standardized naming pattern that combines the username and session ID with a fixed suffix:
{username}_{session_id}_credential_data.pkl
For legacy compatibility, the system also supports the older format:
{username}_credential_data.pkl
Session Metadata Files
Session metadata files use a structured naming convention with timestamps and extensions:
{uuid}.{extension}
Where {extension} could be .json, .info.json, or other metadata-related extensions.
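For illustration, such a name could be generated like this (the helper name is hypothetical):

```python
import uuid

def metadata_filename(extension: str = "json") -> str:
    """Generate a session-metadata file name of the form {uuid}.{extension}."""
    return f"{uuid.uuid4()}.{extension}"
```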
Credential Artifacts
Advanced credential artifacts use SHA-256 hash-based naming for uniqueness and security:
{sha256-hash}.json
The hash is computed from the storage ID, yielding collision-resistant, non-guessable file names.
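A minimal sketch of that hashing scheme (the function name is illustrative, and the exact input to the hash is an assumption):

```python
import hashlib

def artifact_filename(storage_id: str) -> str:
    """Derive a {sha256-hash}.json artifact file name from a storage ID."""
    digest = hashlib.sha256(storage_id.encode("utf-8")).hexdigest()
    return f"{digest}.json"
```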
Serialization Formats
The storage system supports multiple serialization formats depending on the type of data being stored:
Pickle Format for Credentials
Credential data is serialized using Python's pickle format for maximum compatibility with WebAuthn objects:
# Writing credential data
payload = pickle.dumps(key)
with open(path, "wb") as f:
    f.write(payload)

# Reading credential data
try:
    creds = pickle.loads(payload)
except Exception:
    return []
JSON Format for Metadata
Session metadata and configuration data use JSON format for human readability and cross-platform compatibility:
# JSON serialization with byte conversion
def convert_bytes_for_json(obj: Any) -> Any:
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return base64.b64encode(bytes(obj)).decode('utf-8')
    if isinstance(obj, dict):
        return {k: convert_bytes_for_json(v) for k, v in obj.items()}
    return obj
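A self-contained round trip of this conversion (the helper is re-declared here so the snippet runs on its own; the decode direction is shown with base64.b64decode and is not part of the excerpt above):

```python
import base64
import json
from typing import Any

def convert_bytes_for_json(obj: Any) -> Any:
    # Same shape as the helper above: bytes become Base64 text, dicts recurse.
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return base64.b64encode(bytes(obj)).decode('utf-8')
    if isinstance(obj, dict):
        return {k: convert_bytes_for_json(v) for k, v in obj.items()}
    return obj

record = {"credentialId": b"\x01\x02\x03", "signCount": 7}
encoded = convert_bytes_for_json(record)
serialized = json.dumps(encoded)  # safe now: no raw bytes remain
restored = base64.b64decode(encoded["credentialId"])
```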
Binary Format for Artifacts
Credential artifacts are stored as JSON documents in which binary fields are Base64-encoded for safe serialization:
# Artifact storage with binary conversion
def add_public_key_material(target: Dict[str, Any], public_key: Any) -> None:
    if isinstance(public_key, dict):
        cose_map = dict(public_key)
        target['publicKeyCose'] = convert_bytes_for_json(cose_map)
        raw_key = cose_map.get(-1)
        if isinstance(raw_key, (bytes, bytearray, memoryview)):
            target['publicKeyBytes'] = convert_bytes_for_json(raw_key)
Core Storage Components
The storage system consists of several specialized components, each handling specific types of data:
Credential Storage (storage.py)
The main credential storage module provides functions for managing WebAuthn credential data:
classDiagram
class CredentialStorage {
+savekey(name, key, session_id) void
+readkey(name, session_id) List[Any]
+delkey(name, session_id) void
+list_credentials(session_id) Dict[str, List[Any]]
+iter_credentials(session_id) Iterator[Tuple[str, List[Any]]]
+extract_credential_data(cred) Any
}
class StorageHelpers {
+_local_filename(name, session_id, create) str
+_credential_blob(name, session_id) str
+_candidate_gcs_blob_names(name, session_id) Iterable[str]
+_resolve_session_id(session_id) str
}
CredentialStorage --> StorageHelpers : uses
Key functions include:
- savekey(): Persists credential data with atomic file operations
- readkey(): Retrieves credential data with fallback to legacy storage
- delkey(): Removes credential data with cleanup support
- list_credentials(): Lists all credentials for a session
- iter_credentials(): Iterates through all credentials efficiently
Session Metadata Store (session_metadata_store.py)
Manages session-specific metadata with automatic cleanup and activity tracking:
sequenceDiagram
participant App as Application
participant Store as SessionStore
participant FS as FileSystem
participant Cleanup as CleanupService
App->>Store : ensure_session(session_id)
Store->>FS : create_directory()
Store->>Cleanup : schedule_cleanup()
App->>Store : touch_last_access(session_id)
Store->>FS : update_marker_file()
Cleanup->>Store : periodic_cleanup()
Store->>FS : scan_inactive_sessions()
Store->>FS : remove_old_directories()
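The cleanup side of this sequence can be sketched as follows (the .last-access marker matches the directory layout above; the idle limit and function names are assumptions, not the module's API):

```python
import os
import shutil
import time
from typing import List, Optional

LAST_ACCESS_MARKER = ".last-access"   # marker file from the layout above
MAX_IDLE_SECONDS = 24 * 3600          # illustrative idle limit

def cleanup_inactive_sessions(base_dir: str, now: Optional[float] = None) -> List[str]:
    """Remove session directories whose last-access marker is older than the limit."""
    now = time.time() if now is None else now
    removed: List[str] = []
    for entry in os.scandir(base_dir):
        if not entry.is_dir():
            continue
        marker = os.path.join(entry.path, LAST_ACCESS_MARKER)
        try:
            last_access = os.path.getmtime(marker)
        except OSError:
            last_access = entry.stat().st_mtime  # no marker: fall back to dir mtime
        if now - last_access > MAX_IDLE_SECONDS:
            shutil.rmtree(entry.path, ignore_errors=True)
            removed.append(entry.name)
    return removed
```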
Credential Artifacts (credential_artifacts.py)
Handles advanced credential artifacts with cloud storage integration:
flowchart TD
A[Store Artifact Request] --> B{Using GCS?}
B --> |Yes| C[Upload to GCS]
B --> |No| D[Save to Local FS]
C --> E[Generate Blob Name]
D --> F[Generate File Path]
E --> G[Upload Bytes]
F --> H[Write JSON File]
G --> I[Artifact Stored]
H --> I
Atomic File Operations
The storage system implements atomic file operations to prevent data corruption and ensure consistency during write operations:
Temporary File Pattern
The system uses a temporary-file pattern, similar to the one used for generating the Flask secret key:
# Secure atomic file creation
fd, temp_path = tempfile.mkstemp(prefix="session-secret.", dir=os.path.dirname(default_path))
with os.fdopen(fd, "wb") as target:
    target.write(secret)
    target.flush()
    os.fsync(target.fileno())  # Ensure data is written to disk
os.replace(temp_path, default_path)  # Atomic rename operation
File Locking and Consistency
For session metadata, the system implements implicit locking through directory operations:
# Directory-based synchronization
def _local_touch_last_access(directory: str) -> None:
    marker_path = os.path.join(directory, _LAST_ACCESS_BLOB)
    try:
        os.makedirs(os.path.dirname(marker_path), exist_ok=True)
        with open(marker_path, "a", encoding="utf-8"):
            os.utime(marker_path, None)  # Update timestamp atomically
    except OSError:
        pass
Transaction Safety
The system ensures transaction safety through several mechanisms:
- Atomic Writes: Uses os.replace() for atomic file replacement
- Temporary Files: Creates temporary files before finalizing writes
- Directory Operations: Leverages directory operations for synchronization
- Error Recovery: Implements rollback mechanisms for failed operations
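Combining these mechanisms, a reusable atomic JSON writer might look like this (a sketch; the project's actual helpers differ per module):

```python
import json
import os
import tempfile
from typing import Any, Dict

def atomic_write_json(path: str, payload: Dict[str, Any]) -> None:
    """Write JSON via a temp file in the target directory, then atomically replace."""
    directory = os.path.dirname(path) or "."
    fd, temp_path = tempfile.mkstemp(prefix=".tmp-", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # data on disk before the rename
        os.replace(temp_path, path)   # atomic on POSIX and Windows
    except BaseException:
        # Roll back: never leave a half-written temp file behind.
        try:
            os.unlink(temp_path)
        except OSError:
            pass
        raise
```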
Error Handling
The storage system implements comprehensive error handling to ensure robust operation in various failure scenarios:
File I/O Error Handling
def _read_file(path: str) -> Optional[Dict[str, Any]]:
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        return None
Storage Backend Failures
The system gracefully handles failures in both local and cloud storage backends:
def readkey(name: str, *, session_id: Optional[str] = None) -> List[Any]:
    resolved_session = _resolve_session_id(session_id)
    try:
        try:
            with open(_local_filename(name, resolved_session), "rb") as f:
                payload = f.read()
        except FileNotFoundError:
            # Fall back to the legacy per-username file
            with open(_legacy_local_filename(name), "rb") as f:
                payload = f.read()
    except Exception:
        return []
    try:
        creds = pickle.loads(payload)
    except Exception:
        return []
    return creds if isinstance(creds, list) else []
Network and Cloud Storage Errors
Cloud storage operations include retry mechanisms and graceful degradation:
def _with_retry(operation: Callable[[], _T], *, max_attempts: int = 3,
                base_delay: float = 0.5) -> _T:  # default delay is illustrative
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except _RETRYABLE_EXCEPTIONS as exc:
            last_error = exc
            if attempt >= max_attempts:
                break
            delay = base_delay * (2 ** (attempt - 1))  # exponential backoff
            time.sleep(delay)
    if last_error is not None:
        raise last_error
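The same pattern in a self-contained form, with a deliberately flaky operation to show the retry behavior (all names here are illustrative, not the module's):

```python
import time
from typing import Callable, TypeVar

_T = TypeVar("_T")

def with_retry(operation: Callable[[], _T], *, max_attempts: int = 3,
               base_delay: float = 0.01) -> _T:
    """Retry a transiently failing operation with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** (attempt - 1)))

calls = {"n": 0}

def flaky() -> str:
    # Fails twice, then succeeds - models a transient storage error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient failure")
    return "ok"

result = with_retry(flaky)  # succeeds on the third attempt
```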
Permission and Disk Space Handling
The system includes specific handling for common failure modes:
| Error Type | Handling Strategy | Recovery Action |
|---|---|---|
| Permission Denied | Log warning, continue | Use alternative location |
| Disk Full | Graceful degradation | Free space, retry |
| File Locked | Retry with exponential backoff | Wait and retry |
| Network Timeout | Retry with backoff | Fall back to cached data |
Integration Patterns
The storage system integrates seamlessly with the Flask application and WebAuthn flows:
Flask Application Integration
# Application startup and configuration
from .config import app, basepath
from .storage import savekey, readkey, delkey
from .session_metadata_store import ensure_session, touch_last_access

@app.before_request
def before_request():
    # Ensure session exists and update activity
    session_id = get_session_id()
    ensure_session(session_id)
    touch_last_access(session_id)

@app.route('/register', methods=['POST'])
def register_credential():
    # Store new credential
    savekey(username, credential_data, session_id=session_id)
    return jsonify(success=True)

@app.route('/authenticate', methods=['POST'])
def authenticate():
    # Retrieve stored credentials
    stored_credentials = readkey(username, session_id=session_id)
    # Perform authentication...
WebAuthn Registration Flow
sequenceDiagram
participant Client as WebAuthn Client
participant Server as Flask Server
participant Storage as Local Storage
participant Cleanup as Cleanup Service
Client->>Server : Register Request
Server->>Server : Generate Challenge
Server->>Client : Send Challenge
Client->>Server : Registration Response
Server->>Storage : savekey(username, credential_data)
Storage->>Storage : Write atomic file
Server->>Cleanup : Schedule cleanup
Server->>Client : Registration Success
Authentication Flow Integration
The authentication flow retrieves stored credentials and validates against them:
def authenticate_credential(username: str, session_id: str, challenge: bytes, response: dict) -> bool:
    # Retrieve stored credentials
    stored_credentials = readkey(username, session_id=session_id)
    if not stored_credentials:
        return False
    # Validate against stored credentials
    for cred in stored_credentials:
        if validate_credential(cred, challenge, response):
            return True
    return False
Production Considerations
While the local storage implementation is suitable for development and testing, several limitations make it less appropriate for production deployments:
Scalability Limitations
| Aspect | Current Implementation | Production Requirement |
|---|---|---|
| Concurrent Access | Single-process file locking | Distributed coordination |
| Data Size | Unlimited local storage | Managed storage quotas |
| Backup Strategy | Manual backup required | Automated replication |
| Monitoring | Basic logging | Comprehensive metrics |
Security Considerations
The local storage system has several security characteristics:
Advantages:
- Local encryption of sensitive data
- File system permissions control access
- Atomic operations prevent partial writes
- Isolation between sessions
Limitations:
- No built-in encryption at rest
- File system vulnerabilities
- Limited audit logging
- No centralized access control
Alternative Storage Options
For production deployments, consider these alternatives:
- Google Cloud Storage: Enterprise-grade reliability and scalability
- PostgreSQL/MySQL: Structured data with ACID transactions
- Redis/Memcached: High-performance caching with persistence
- Distributed File Systems: Shared storage across multiple nodes
Migration Strategies
When migrating from local to cloud storage:
# Migration detection and execution
def migrate_to_cloud_storage():
    if not gcs_enabled():
        return
    # Export local data
    local_data = list_credentials()
    # Upload to cloud storage
    for username, credentials in local_data.items():
        blob_name = build_blob_name(f"{username}_credentials", prefix="migrated/")
        upload_bytes(blob_name, pickle.dumps(credentials))
    # Verify migration
    cloud_data = list_credentials()
    if len(local_data) == len(cloud_data):
        # Clean up local data
        cleanup_local_storage()
Troubleshooting Guide
Common issues and their solutions when working with the local storage system:
Permission Errors
Symptoms:
- "Permission denied" errors during file operations
- Unable to create directories or write files
Causes:
- Insufficient file system permissions
- Read-only file system mount
- SELinux/AppArmor restrictions
Solutions:
# Check and fix permissions
chmod 755 /path/to/storage/directory
chown www-data:www-data /path/to/storage/directory
# Verify directory structure
ls -la /path/to/storage/directory
mkdir -p /path/to/storage/directory/subdir
Disk Space Exhaustion
Symptoms:
- "No space left on device" errors
- Slow file operations
- Application hangs during writes
Monitoring:
# Check disk usage
df -h /path/to/storage/directory
# Monitor growth
du -sh /path/to/storage/directory/* | sort -hr
# Set up alerts
watch -n 60 'df -h /path/to/storage/directory'
Solutions:
- Implement automatic cleanup policies
- Configure log rotation
- Set up monitoring and alerting
- Add storage quota enforcement
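A small free-space guard in the spirit of these solutions (the threshold and function name are illustrative):

```python
import shutil

def has_free_space(path: str, min_free_bytes: int = 100 * 1024 * 1024) -> bool:
    """Return True when the filesystem holding `path` has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes
```

Callers can check this before large writes and degrade gracefully (skip the write, trigger cleanup, or alert) instead of failing mid-operation.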
File Locking Conflicts
Symptoms:
- "Resource temporarily unavailable" errors
- Deadlock situations
- Intermittent failures
Investigation:
# Check for locked files
lsof | grep /path/to/storage
# Monitor file system events
inotifywait -r /path/to/storage/directory
# Check for zombie processes
ps aux | grep python
Prevention:
- Implement proper resource cleanup
- Use timeout mechanisms
- Add retry logic with exponential backoff
- Monitor for long-running operations
Data Corruption Issues
Symptoms:
- JSON parsing errors
- Pickle loading failures
- Inconsistent data
Diagnosis:
# Verify file integrity
import json

try:
    with open(filepath, 'r') as f:
        data = json.load(f)
except json.JSONDecodeError as e:
    print(f"Corrupted JSON: {e}")

# Check file sizes
import os
print(f"File size: {os.path.getsize(filepath)} bytes")
Recovery:
- Implement checksum verification
- Maintain backup copies
- Add data validation
- Use transaction logs
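Checksum verification, the first recovery item, can be sketched with a SHA-256 sidecar file (the file naming and helpers here are assumptions, not the project's API):

```python
import hashlib
import json
from typing import Any, Dict, Optional

def write_with_checksum(path: str, payload: Dict[str, Any]) -> None:
    """Write a JSON payload plus a .sha256 sidecar file holding its digest."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    with open(path, "wb") as handle:
        handle.write(data)
    with open(path + ".sha256", "w", encoding="utf-8") as handle:
        handle.write(hashlib.sha256(data).hexdigest())

def read_verified(path: str) -> Optional[Dict[str, Any]]:
    """Return the payload only when its checksum sidecar matches."""
    try:
        with open(path, "rb") as handle:
            data = handle.read()
        with open(path + ".sha256", "r", encoding="utf-8") as handle:
            expected = handle.read().strip()
    except OSError:
        return None
    if hashlib.sha256(data).hexdigest() != expected:
        return None  # corrupted or tampered
    return json.loads(data)
```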
Performance Issues
Symptoms:
- Slow credential lookups
- High CPU usage during cleanup
- Memory leaks
Optimization:
# Optimize file operations
import concurrent.futures
from functools import lru_cache

@lru_cache(maxsize=128)
def get_cached_credentials(username: str):
    return readkey(username)

# Parallel processing for bulk operations
def process_bulk_operations(operations):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(op) for op in operations]
        return [future.result() for future in futures]
Best Practices
To maximize the effectiveness and reliability of the local storage implementation:
Configuration Management
- Environment Variables: Use environment variables for all configuration
- Default Values: Provide sensible defaults for development
- Validation: Validate configuration at startup
- Documentation: Document all configuration options
Data Organization
- Logical Separation: Keep different data types in separate directories
- Naming Conventions: Follow consistent naming patterns
- Size Limits: Implement reasonable limits on file sizes
- Cleanup Policies: Establish automated cleanup schedules
Error Handling
- Graceful Degradation: Always provide fallback mechanisms
- Logging: Implement comprehensive logging
- Monitoring: Set up monitoring and alerting
- Testing: Test error conditions thoroughly
Performance Optimization
- Caching: Implement appropriate caching strategies
- Batch Operations: Group related operations
- Async Processing: Use asynchronous operations where possible
- Resource Limits: Set appropriate resource limits
Security Considerations
- File Permissions: Set restrictive file permissions
- Encryption: Consider encrypting sensitive data
- Access Control: Implement proper access controls
- Audit Logging: Maintain audit trails
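As a sketch of the file-permissions item (POSIX semantics; on Windows os.chmod only toggles the read-only bit):

```python
import os
import stat

def restrict_permissions(path: str) -> None:
    """Owner-only access: 0600 for files, 0700 for directories."""
    mode = stat.S_IRWXU if os.path.isdir(path) else stat.S_IRUSR | stat.S_IWUSR
    os.chmod(path, mode)
```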
Testing Strategies
- Unit Tests: Test individual components
- Integration Tests: Test end-to-end flows
- Error Scenarios: Test failure conditions
- Performance Tests: Test under load