Baselining ‐ ALS‐based baselining - yadijustforfun/SP-transition GitHub Wiki
## ALS-Based Baselining: Overview
ALS (Alternating Least Squares) is a matrix factorization technique often used for anomaly detection and baselining in time series or high-dimensional data. It decomposes data into low-rank components to separate "normal" patterns (baseline) from anomalies or noise.
## How ALS Baselining Works
1. **Matrix Representation:**
   - Organize the time series data into a matrix (e.g., rows = time points, columns = sensors/features).
   - Example: a temperature sensor network with 100 sensors over 1,000 timesteps → a 1000×100 matrix.
2. **Low-Rank Factorization:**
   - ALS approximates the matrix as a product of two lower-rank matrices:
     $$X \approx U \cdot V^T$$
   - $U$: latent features for time points.
   - $V$: latent features for sensors.
3. **Baselining:**
   - The product $U \cdot V^T$ represents the "expected" baseline.
   - Residuals $X - U \cdot V^T$ highlight anomalies (large deviations).
4. **Alternating Optimization:**
   - Fix $U$ and solve for $V$; then fix $V$ and solve for $U$. Iterate until convergence.
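The alternating step above can be sketched in plain NumPy. This is a minimal illustration, not a production implementation: the function name `als_baseline` and the ridge regularizer `lam` are our own choices, and the optional `mask` argument anticipates the missing-data use case by fitting only observed entries.

```python
import numpy as np

def als_baseline(X, rank=2, lam=1e-3, n_iters=25, mask=None):
    """Fit X ≈ U @ V.T by ridge-regularized alternating least squares.
    mask[i, j] = True where X[i, j] is observed (all observed if None)."""
    n, m = X.shape
    if mask is None:
        mask = np.ones((n, m), dtype=bool)
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        # Fix V, solve a small ridge system for each row of U ...
        for i in range(n):
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ X[i, mask[i]])
        # ... then fix U and solve for each row of V
        for j in range(m):
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ X[mask[:, j], j])
    return U @ V.T

# Toy check: exactly rank-1 data should be reconstructed almost perfectly
t = np.linspace(0, 4 * np.pi, 200)
X = np.outer(np.sin(t) + 2.0, np.linspace(0.5, 1.5, 5))
baseline = als_baseline(X, rank=1)
print(np.abs(X - baseline).max())  # small residual
```

Each inner solve is an ordinary ridge regression, which is why the method scales to wide sensor matrices: every row of $U$ and $V$ is updated independently.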
## When to Use ALS Baselining
- **Multi-Sensor/Feature Data:** ideal for correlated, high-dimensional data (e.g., IoT sensor networks, industrial equipment monitoring).
- **Missing Data Handling:** ALS can fit only the observed entries, so it tolerates missing values (common in real-world sensor data).
- **Anomaly Detection:** separates the baseline (normal operation) from outliers (e.g., machine failures).
- **Smooth Trends & Seasonality:** captures underlying patterns without rigid assumptions (unlike STL or Prophet).
- **Collaborative Filtering:** adapted from recommendation systems (e.g., user-item matrices).
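To make the missing-data point concrete, here is a short sketch of low-rank completion via iterative SVD ("hard-impute"), a close cousin of masked ALS: fill the gaps, refit a rank-$r$ approximation, re-fill, and repeat. The function name `lowrank_impute` and all parameters are illustrative choices, not part of any library API.

```python
import numpy as np

def lowrank_impute(X, rank=2, n_iters=50):
    """Hard-impute: fill NaNs, refit a rank-r SVD, re-fill, repeat."""
    observed = ~np.isnan(X)
    filled = np.where(observed, X, np.nanmean(X))
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(observed, X, low_rank)  # keep observed entries fixed
    return low_rank

# Rank-1 sensor matrix with 15% of entries knocked out at random
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
clean = np.outer(np.sin(t) + 2.0, np.linspace(0.5, 1.5, 10))
X = clean.copy()
X[rng.random(X.shape) < 0.15] = np.nan
recovered = lowrank_impute(X, rank=1)
print(np.abs(recovered - clean).max())  # small reconstruction error
```

Because the sensors are correlated (the matrix is near low-rank), the observed entries carry enough information to reconstruct the missing ones — the property that makes ALS-style baselining robust to sensor dropouts.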
## When NOT to Use ALS Baselining
- **Univariate Time Series:** overkill for a single signal; use STL or classical decomposition instead.
- **Non-Linear Patterns:** ALS assumes linear relationships; use autoencoders or kernel methods for non-linearity.
- **Real-Time Applications:** ALS is batch-oriented; for streaming data, use incremental PCA or online robust PCA.
- **Small Datasets:** stable factorization requires sufficient data (e.g., at least hundreds of timesteps).
- **Interpretability Needs:** the latent factors $U$ and $V$ are hard to interpret compared to explicit trend/seasonality components.
## Example: ALS Baselining for Anomaly Detection
```python
import numpy as np
from sklearn.decomposition import NMF  # ALS-like (non-negative matrix factorization)
import matplotlib.pyplot as plt

# Synthetic data: 10 sensors sharing a common trend, plus an injected anomaly
n_time, n_sensors = 1000, 10
X = np.random.randn(n_time, n_sensors) * 0.1
X += np.sin(np.linspace(0, 20, n_time))[:, None]  # common trend
X[500:510, 2] = 10  # inject anomaly (sensor 2 spikes at t=500-510)

# ALS-like baselining (NMF for simplicity). NMF requires non-negative input,
# so shift the data by its minimum before fitting and add it back afterwards.
offset = X.min()
model = NMF(n_components=2, max_iter=500)  # low-rank factors
U = model.fit_transform(X - offset)
V = model.components_
X_baseline = U @ V + offset  # reconstructed baseline

# Anomaly score per timestep = squared residual summed across sensors
residuals = np.sum((X - X_baseline) ** 2, axis=1)
plt.plot(residuals)
plt.axhline(y=np.percentile(residuals, 95), color='r', linestyle='--')
plt.title("Anomaly Scores (Residuals from ALS Baselining)")
plt.show()
```
**Output:**
- Peaks in the residual series correspond to anomalies (e.g., the injected spike at t=500-510).
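To act on those peaks programmatically rather than by eye, one can threshold the residual series and recover the anomalous timesteps. A minimal sketch, using synthetic residuals as a stand-in for the ones computed above (the 95th-percentile threshold mirrors the plot):

```python
import numpy as np

# Synthetic residual series with a burst at t=500-509
rng = np.random.default_rng(0)
residuals = rng.chisquare(df=2, size=1000)
residuals[500:510] += 50  # simulated anomaly burst

threshold = np.percentile(residuals, 95)
anomalous_t = np.flatnonzero(residuals > threshold)
print(anomalous_t)  # includes the 500-509 window
```

A fixed percentile flags a fixed fraction of timesteps by construction, so in practice the threshold is better tuned on anomaly-free training data.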
## Comparison to Other Methods

| Method | Best For | Limitations |
|---|---|---|
| ALS | Multi-sensor baselining, missing data | Linear assumptions, batch-only |
| STL | Univariate trend/seasonality | No multivariate support |
| Robust PCA | Anomaly detection in noisy data | Computationally heavy |
| Autoencoders | Non-linear patterns | Needs large data, GPU resources |
## Key Takeaways
- **Use ALS baselining when:**
  - You have correlated multivariate data.
  - You need to handle missing values or smooth trends.
- **Avoid ALS for:**
  - Simple univariate series or non-linear patterns.
  - Real-time applications or small datasets.
- **Alternatives:**
  - For univariate data: STL, Prophet.
  - For non-linear data: autoencoders, Gaussian processes.