Baselining: ALS-based baselining

ALS-Based Baselining: Overview

ALS (Alternating Least Squares) is a matrix factorization technique often used for anomaly detection and baselining in time series or high-dimensional data. It decomposes data into low-rank components to separate "normal" patterns (baseline) from anomalies or noise.
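
Formally, ALS fits the low-rank factors by minimizing a regularized least-squares objective; the ridge term with weight \( \lambda \) is the common formulation assumed here, not something this page fixes:

\[ \min_{U, V} \; \lVert X - U V^T \rVert_F^2 + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right) \]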


How ALS Baselining Works

  1. Matrix Representation:

    • Organize time series data into a matrix (e.g., rows = time points, columns = sensors/features).
    • Example: A temperature sensor network with 100 sensors over 1,000 timesteps → 1000×100 matrix.
  2. Low-Rank Factorization:

    • ALS approximates the matrix as a product of two lower-rank matrices:
      \[ X \approx U \cdot V^T \]
      • \( U \): Latent features for time points.
      • \( V \): Latent features for sensors.
  3. Baselining:

    • The product \( U \cdot V^T \) represents the "expected" baseline.
    • Residuals \( X - U \cdot V^T \) highlight anomalies (large deviations).
  4. Alternating Optimization:

    • Fix \( U \) and solve for \( V \); then fix \( V \) and solve for \( U \). Iterate until the reconstruction stabilizes (a minimal sketch follows this list).
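
As a concrete illustration of the alternating step, here is a minimal NumPy sketch for a fully observed matrix. The rank, iteration count, and ridge weight `reg` are illustrative assumptions, not values from this page:

```python
import numpy as np

def als_factorize(X, rank=2, n_iters=50, reg=0.1, seed=0):
    """Alternate closed-form ridge solves for U (time factors) and V (sensor factors)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix V, minimize ||X - U V^T||_F^2 + reg*||U||_F^2 over U (closed form)
        U = X @ V @ np.linalg.inv(V.T @ V + ridge)
        # Fix U, solve the symmetric problem for V
        V = X.T @ U @ np.linalg.inv(U.T @ U + ridge)
    return U, V  # baseline = U @ V.T, residuals = X - U @ V.T
```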

When to Use ALS Baselining

  1. Multi-Sensor/Feature Data:

    • Ideal for correlated high-dimensional data (e.g., IoT sensor networks, industrial equipment monitoring).
  2. Missing Data Handling:

    • ALS works well with missing values (common in real-world sensor data), since each update can be restricted to the observed entries; see the masked-update sketch after this list.
  3. Anomaly Detection:

    • Separates baseline (normal operation) from outliers (e.g., machine failures).
  4. Smooth Trends & Seasonality:

    • Captures shared trends and seasonal structure without imposing a fixed decomposition model (unlike STL or Prophet).
  5. Collaborative Filtering:

    • Adapted from recommendation systems (e.g., user-item matrices).
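
To show how the updates restrict to observed entries, here is a sketch of masked ALS; the mask handling is a standard formulation, and the function and parameter names are hypothetical:

```python
import numpy as np

def masked_als(X, mask, rank=2, n_iters=50, reg=0.1, seed=0):
    """ALS over observed entries only; mask[i, j] is True where X[i, j] was measured."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n):  # each row of U: ridge solve using only observed columns
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, mask[i]])
        for j in range(m):  # each row of V: ridge solve using only observed rows
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[mask[:, j], j])
    return U, V  # U @ V.T also imputes the missing entries
```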

When NOT to Use ALS Baselining

  1. Univariate Time Series:

    • Overkill for single signals; use STL or classical decomposition instead.
  2. Non-Linear Patterns:

    • ALS assumes linear relationships; use autoencoders or kernel methods for non-linearity.
  3. Real-Time Applications:

    • ALS is batch-oriented; for streaming data, use incremental PCA or online robust PCA.
  4. Small Datasets:

    • Requires sufficient data for stable factorization (e.g., at least hundreds of timesteps).
  5. Interpretability Needs:

    • Latent factors \( U \), \( V \) are hard to interpret compared to explicit trend/seasonality.

Example: ALS Baselining for Anomaly Detection

```python
import numpy as np
from sklearn.decomposition import NMF  # stand-in for ALS (non-negative matrix factorization)
import matplotlib.pyplot as plt

# Synthetic data: 10 sensors with a shared trend + an injected anomaly
n_time, n_sensors = 1000, 10
X = np.random.randn(n_time, n_sensors) * 0.1
X += np.sin(np.linspace(0, 20, n_time))[:, None]  # Common trend
X[500:510, 2] = 10  # Inject anomaly (sensor 2 spikes at t=500-509)
X -= X.min()        # Shift to non-negative values, since NMF requires X >= 0

# ALS-like baselining (using NMF for simplicity)
model = NMF(n_components=2, init='nndsvda', max_iter=500, random_state=0)
U = model.fit_transform(X)  # Latent features for time points
V = model.components_       # Latent features for sensors
X_baseline = U @ V          # Reconstructed baseline

# Anomaly score = squared residual per time point
residuals = np.sum((X - X_baseline)**2, axis=1)
plt.plot(residuals)
plt.axhline(y=np.percentile(residuals, 95), color='r', linestyle='--')
plt.title("Anomaly Scores (Residuals from ALS Baselining)")
plt.show()
```

Output:

  • Peaks in the residuals correspond to anomalies (e.g., the injected spike at t=500-509).
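
To turn the residual curve into discrete flags, threshold at the same 95th percentile used for the dashed line; a minimal continuation of the example above:

```python
threshold = np.percentile(residuals, 95)             # same cutoff as the dashed line
anomalous_t = np.flatnonzero(residuals > threshold)  # time indices above the cutoff
print(anomalous_t)  # the injected window t=500-509 should appear; a 95th-percentile
                    # cutoff flags ~5% of all points, so some benign indices appear too
```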

Comparison to Other Methods

| Method       | Best For                               | Limitations                     |
|--------------|----------------------------------------|---------------------------------|
| ALS          | Multi-sensor baselining, missing data  | Linear assumptions, batch-only  |
| STL          | Univariate trend/seasonality           | No multivariate support         |
| Robust PCA   | Anomaly detection in noisy data        | Computationally heavy           |
| Autoencoders | Non-linear patterns                    | Needs large data, GPU resources |

Key Takeaways

  1. Use ALS Baselining when:
    • You have correlated multivariate data.
    • You need to handle missing values or capture smooth shared trends.
  2. Avoid ALS for:
    • Simple univariate series or non-linear patterns.
    • Real-time applications or small datasets.

Alternatives:

  • For univariate data: STL, Prophet.
  • For non-linear data: Autoencoders, Gaussian Processes.
