Baselining: ALS-based baselining

ALS-Based Baselining: Overview

ALS (Alternating Least Squares) is a matrix factorization technique often used for anomaly detection and baselining in time series or high-dimensional data. It decomposes data into low-rank components to separate "normal" patterns (baseline) from anomalies or noise.
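
Formally, ALS fits the low-rank factors by minimizing a regularized least-squares objective; the ridge term with weight \( \lambda \) is the common formulation assumed here, not something this page fixes:

\[ \min_{U, V} \; \lVert X - U V^T \rVert_F^2 + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right) \]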


How ALS Baselining Works

  1. Matrix Representation:

    • Organize time series data into a matrix (e.g., rows = time points, columns = sensors/features).
    • Example: A temperature sensor network with 100 sensors over 1,000 timesteps → 1000×100 matrix.
  2. Low-Rank Factorization:

    • ALS approximates the matrix as a product of two lower-rank matrices:
      \[ X \approx U \cdot V^T \]
      • \( U \): Latent features for time points.
      • \( V \): Latent features for sensors.
  3. Baselining:

    • The product \( U \cdot V^T \) represents the "expected" baseline.
    • Residuals \( X - U \cdot V^T \) highlight anomalies (large deviations).
  4. Alternating Optimization:

    • Fix \( U \) and solve for \( V \); then fix \( V \) and solve for \( U \). Iterate until the reconstruction stabilizes (a minimal sketch follows this list).
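
As a concrete illustration of the alternating step, here is a minimal NumPy sketch for a fully observed matrix. The rank, iteration count, and ridge weight `reg` are illustrative assumptions, not values from this page:

```python
import numpy as np

def als_factorize(X, rank=2, n_iters=50, reg=0.1, seed=0):
    """Alternate closed-form ridge solves for U (time factors) and V (sensor factors)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix V, minimize ||X - U V^T||_F^2 + reg*||U||_F^2 over U (closed form)
        U = X @ V @ np.linalg.inv(V.T @ V + ridge)
        # Fix U, solve the symmetric problem for V
        V = X.T @ U @ np.linalg.inv(U.T @ U + ridge)
    return U, V  # baseline = U @ V.T, residuals = X - U @ V.T
```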

When to Use ALS Baselining

  1. Multi-Sensor/Feature Data:

    • Ideal for correlated high-dimensional data (e.g., IoT sensor networks, industrial equipment monitoring).
  2. Missing Data Handling:

    • ALS works well with missing values (common in real-world sensor data), since each update can be restricted to the observed entries; see the masked-update sketch after this list.
  3. Anomaly Detection:

    • Separates baseline (normal operation) from outliers (e.g., machine failures).
  4. Smooth Trends & Seasonality:

    • Captures shared trends and seasonal structure without imposing a fixed decomposition model (unlike STL or Prophet).
  5. Collaborative Filtering:

    • Adapted from recommendation systems (e.g., user-item matrices).
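
To show how the updates restrict to observed entries, here is a sketch of masked ALS; the mask handling is a standard formulation, and the function and parameter names are hypothetical:

```python
import numpy as np

def masked_als(X, mask, rank=2, n_iters=50, reg=0.1, seed=0):
    """ALS over observed entries only; mask[i, j] is True where X[i, j] was measured."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n):  # each row of U: ridge solve using only observed columns
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, mask[i]])
        for j in range(m):  # each row of V: ridge solve using only observed rows
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[mask[:, j], j])
    return U, V  # U @ V.T also imputes the missing entries
```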

When NOT to Use ALS Baselining

  1. Univariate Time Series:

    • Overkill for single signals; use STL or classical decomposition instead.
  2. Non-Linear Patterns:

    • ALS assumes linear relationships; use autoencoders or kernel methods for non-linearity.
  3. Real-Time Applications:

    • ALS is batch-oriented; for streaming data, use incremental PCA or online robust PCA.
  4. Small Datasets:

    • Requires sufficient data for stable factorization (e.g., at least hundreds of timesteps).
  5. Interpretability Needs:

    • Latent factors \( U \), \( V \) are hard to interpret compared to explicit trend/seasonality.

Example: ALS Baselining for Anomaly Detection

```python
import numpy as np
from sklearn.decomposition import NMF  # stand-in for ALS (non-negative matrix factorization)
import matplotlib.pyplot as plt

# Synthetic data: 10 sensors with a shared trend + an injected anomaly
n_time, n_sensors = 1000, 10
X = np.random.randn(n_time, n_sensors) * 0.1
X += np.sin(np.linspace(0, 20, n_time))[:, None]  # Common trend
X[500:510, 2] = 10  # Inject anomaly (sensor 2 spikes at t=500-509)
X -= X.min()        # Shift to non-negative values, since NMF requires X >= 0

# ALS-like baselining (using NMF for simplicity)
model = NMF(n_components=2, init='nndsvda', max_iter=500, random_state=0)
U = model.fit_transform(X)  # Latent features for time points
V = model.components_       # Latent features for sensors
X_baseline = U @ V          # Reconstructed baseline

# Anomaly score = squared residual per time point
residuals = np.sum((X - X_baseline)**2, axis=1)
plt.plot(residuals)
plt.axhline(y=np.percentile(residuals, 95), color='r', linestyle='--')
plt.title("Anomaly Scores (Residuals from ALS Baselining)")
plt.show()
```

Output:

  • Peaks in the residuals correspond to anomalies (e.g., the injected spike at t=500-509).
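
To turn the residual curve into discrete flags, threshold at the same 95th percentile used for the dashed line; a minimal continuation of the example above:

```python
threshold = np.percentile(residuals, 95)             # same cutoff as the dashed line
anomalous_t = np.flatnonzero(residuals > threshold)  # time indices above the cutoff
print(anomalous_t)  # the injected window t=500-509 should appear; a 95th-percentile
                    # cutoff flags ~5% of all points, so some benign indices appear too
```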

Comparison to Other Methods

| Method       | Best For                               | Limitations                     |
|--------------|----------------------------------------|---------------------------------|
| ALS          | Multi-sensor baselining, missing data  | Linear assumptions, batch-only  |
| STL          | Univariate trend/seasonality           | No multivariate support         |
| Robust PCA   | Anomaly detection in noisy data        | Computationally heavy           |
| Autoencoders | Non-linear patterns                    | Needs large data, GPU resources |

Key Takeaways

  1. Use ALS Baselining when:
    • You have correlated multivariate data.
    • You need to handle missing values or capture smooth shared trends.
  2. Avoid ALS for:
    • Simple univariate series or non-linear patterns.
    • Real-time applications or small datasets.

Alternatives:

  • For univariate data: STL, Prophet.
  • For non-linear data: Autoencoders, Gaussian Processes.
