Scattering Networks
Scattering networks are a clever tool in math and computer science for analyzing things like images, sounds, or other data signals. They're kind of like a simplified, math-based version of a neural network (those AI systems that learn patterns), but instead of learning from tons of examples, they use fixed rules inspired by how waves and patterns work in the real world. The goal is to pull out key "features" from the data that stay pretty much the same even if you tweak the original thing a bit, like shifting it, rotating it slightly, or stretching it.

Let me break it down simply, like explaining it to a friend over coffee.

1. The Basic Idea: Breaking Down Signals

Think of any data as a "signal." For example, a photo is a 2D signal of colors and shapes, and a song is a 1D signal of sound waves over time. A scattering network "scatters" this signal by passing it through layers of filters. These filters are based on something called wavelets: tiny wave-like patterns that detect details at different sizes (scales) and directions (orientations). It's like using a bunch of magnifying glasses of various strengths and angles to spot edges, textures, or rhythms.
- How It Works, Step by Step

Layer 1: First Pass with Wavelets. You apply these wavelet filters to the original signal. Each filter picks up specific frequencies or patterns (e.g., fine details like dots or broad ones like smooth gradients). Then you take the absolute value (modulus) of the results to ignore the "direction" of the wave and focus on its strength. This makes things stable: if the image shifts a little, the features don't flip out.

Layer 2 (and More): Scattering Deeper. Now take those results from Layer 1 and run them through more wavelets. Again, apply filters, take absolute values, and maybe average things out to smooth the output. This "scattering" process builds up layers, capturing more complex patterns, like how textures combine at different levels. It's like zooming out progressively: first you see the tiny ripples, then how those ripples form bigger waves.

Final Output: Stable Features. At the end, you get a set of numbers (coefficients) that describe the signal in a way that's "invariant" (doesn't change much) to small distortions. For instance, if you rotate a cat photo by 5 degrees or add some noise, the scattering features stay similar, helping computers recognize it's still a cat.
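To make those three steps concrete, here is a minimal NumPy sketch of a two-layer (order-2) scattering transform for a 1D signal. It's a toy: the Gabor-style filters, their normalization, and the crude averaging filter are simplified stand-ins for the carefully designed Morlet bank a real implementation would use.

```python
import numpy as np

def gabor_bank(n_scales=3, length=64):
    """Toy 1-D 'wavelet' bank: Gabor-like filters at dyadic scales (illustrative only)."""
    t = np.arange(length) - length // 2
    bank = []
    for j in range(n_scales):
        sigma = 2.0 ** j            # width doubles at each scale
        freq = 0.25 / 2.0 ** j      # centre frequency halves at each scale
        g = np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * t)
        bank.append(g / np.linalg.norm(g))
    return bank

def scatter(x, bank):
    """Order-2 scattering sketch: wavelet -> modulus -> wavelet -> modulus -> average."""
    phi = np.ones(8) / 8.0                            # crude low-pass (averaging) filter
    s0 = [np.convolve(x, phi, mode='same').mean()]    # order 0: averaged signal
    s1, s2, u1 = [], [], []
    for psi1 in bank:
        u = np.abs(np.convolve(x, psi1, mode='same'))       # first wavelet + modulus
        u1.append(u)
        s1.append(np.convolve(u, phi, mode='same').mean())  # order-1 coefficient
    for u in u1:
        for psi2 in bank:
            v = np.abs(np.convolve(u, psi2, mode='same'))   # second wavelet + modulus
            s2.append(np.convolve(v, phi, mode='same').mean())
    return np.array(s0 + s1 + s2)

x = np.sin(2 * np.pi * 0.05 * np.arange(256)) + 0.1 * np.random.randn(256)
features = scatter(x, gabor_bank())
print(features.shape)   # (1 + 3 + 9,) = (13,) coefficients
```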
- Why Are They Useful?

They're great for tasks where data can vary a lot, like identifying textures in medical scans, classifying music genres from audio clips, or even spotting fake images. Unlike traditional neural networks, scattering networks don't need huge datasets to train; they're pre-built with math, so they're efficient and explainable. Downsides? They're not as flexible as full AI models for super-complex stuff, but they're a solid starting point or booster for those.
In short, scattering networks turn messy, changeable data into reliable, math-backed summaries, like a cheat sheet for patterns that ignores the fluff. The rest of this page goes into code examples, more detail on the filters, and how they compare to CNNs (convolutional neural networks).
A natural follow-up question: could you build the first layer by picking one filter from db1, another from coif1, and so on, then switch to "higher order" wavelets (db2, coif2, ...) in later layers? No, that's not quite how scattering networks typically work, at least in their standard form. Let me explain this in plain English, step by step, building on the overview above. I'll focus on the analogy to CNN layers (since scattering networks are like a fixed, non-learnable version of CNNs) and clarify why that setup, with mixed wavelet families and "higher order" wavelets in later layers, isn't the usual approach. I'll also touch on variations and extensions where things get closer to that idea.
- Quick Recap: Scattering Networks as "CNN-Like" Structures

Scattering networks process signals (like images or audio) through layers of convolutions, just like the early layers of a CNN. But instead of learned filters (as in CNNs), they use predefined wavelet filters. These are mathematical "detectors" for patterns at different scales (sizes) and orientations (directions). Each layer: convolve with wavelets → take the absolute value (modulus, to make it stable) → sometimes average or pool. The key goal is invariance: the output features don't change much if the input is slightly shifted, rotated, or deformed. Standard implementations (pioneered by Stéphane Mallat) use a single "mother wavelet" as the base, then generate a whole "filter bank" by scaling it (dilating for different frequencies) and rotating it (for directions, especially in 2D like images).
- What Happens in the First Layer? (No Mixing of Wavelet Families)

In a typical scattering network, the first layer doesn't pick one filter from db1, another from coif1, and so on. Instead, you start with one wavelet family (e.g., Morlet or Gabor for images, or Daubechies for 1D signals). From that single mother wavelet, you create multiple filters by:

Scaling: versions at different sizes (e.g., fine details vs. broad patterns), typically at dyadic scales (powers of 2).
Rotating: versions at different angles (e.g., 0°, 45°, 90°, etc., often 6-8 orientations for 2D).
Example: if you use a Morlet wavelet, a bank of "5 filters" would actually be 5 variations of that same Morlet wavelet, say a few scales crossed with a few orientations (not filters drawn from totally different families like db, coif, sym, bior, or gaus). Why not mix families? Mixing db1 (Daubechies, orthogonal and compact), coif1 (Coiflets, similar but with more vanishing moments), sym2 (Symlets, a more symmetric variant of Daubechies), bior1.1 (biorthogonal, for perfect reconstruction), and gaus1 (Gaussian, smooth but not orthogonal) would create inconsistent properties. Standard scattering aims for a "tight frame" (a math term for efficient, non-redundant coverage), which is much easier to achieve with one family. A small code sketch of such a single-family bank appears below.
Gaussian (gaus) is sometimes used separately for the final averaging step (low-pass filter), not as a main convolutional filter.
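Here's a rough NumPy sketch of that "one mother wavelet, many scales and rotations" construction in 2D, plus the Gaussian low-pass used for the final averaging. The Morlet formula is simplified (a real Morlet subtracts a correction term so each filter has zero mean), so treat this as an illustration of the structure, not a drop-in filter bank.

```python
import numpy as np

def morlet_2d(size=32, scale=4.0, angle=0.0, xi=3.0 / 4 * np.pi):
    """One Morlet-like filter: a plane wave at orientation `angle`, windowed by a
    Gaussian of width `scale`. Simplified: the zero-mean correction term is omitted."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    u = x * np.cos(angle) + y * np.sin(angle)        # rotate the oscillation direction
    envelope = np.exp(-(x**2 + y**2) / (2 * scale**2))
    return envelope * np.exp(1j * xi * u / scale)

def filter_bank(n_scales=2, n_angles=3, size=32):
    """All band-pass filters are scalings/rotations of the SAME mother wavelet,
    plus one Gaussian low-pass for the final averaging step."""
    psis = [morlet_2d(size, scale=2.0 ** (j + 1), angle=np.pi * k / n_angles)
            for j in range(n_scales) for k in range(n_angles)]
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    phi = np.exp(-(x**2 + y**2) / (2 * (2.0 ** n_scales) ** 2))   # Gaussian low-pass
    return psis, phi / phi.sum()

psis, phi = filter_bank()
print(len(psis), "band-pass filters + 1 Gaussian low-pass")   # 6 band-pass filters
```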
- What About the Second Layer? (No "Higher Order" Switch Like db1 → db2)

The second layer takes the outputs from the first layer (after the modulus) and applies the same wavelet filter bank again. It's not switching to "higher order" versions (e.g., db1 → db2, where "order" means longer support or more vanishing moments in wavelet terms). Instead, for each "channel" from layer 1 (each scale/orientation combo), you convolve with the full bank of scaled/rotated wavelets from the same mother wavelet. This captures "higher-order" patterns in a different sense: interactions between features (e.g., how edges combine into textures), not a bump in the wavelet order. Example: if layer 1 produced 5 outputs, layer 2 would apply the original filter bank to each of those 5, creating 5 × (number of scales/orientations) new coefficients, then take the modulus again. A sketch of this path structure follows after the next two points.
No family switching (e.g., coif1 → coif2) or mixing—the consistency ensures mathematical guarantees like stability and invariance.
"Higher order" in scattering refers to the depth (e.g., order-2 scattering means two layers of wavelets), not wavelet complexity. Deeper layers just keep scattering previous outputs.
- Subsequent Layers: Same Deal, No Escalating Orders or Families

Layers 3+ follow the same pattern: apply the same wavelet bank to the modulus outputs from the prior layer. This builds a tree-like structure of coefficients (called "scattering paths"), where each path represents a sequence of scales/orientations. Most practical networks stop at 2-3 layers: going deeper gets computationally heavy and recovers little additional energy.
No automatic "higher order wavelets from the same family"—the mother wavelet stays fixed. If you want more detail capture, you adjust scales/orientations, not switch to db3 or sym4.
- Is There Any Variation Where This Does Happen? (Closer to the Mixed-Family Idea)

In experimental or hybrid versions (e.g., "learnable" or "parametric" scattering), people tweak things: you might optimize the mother wavelet's parameters (e.g., bandwidth or center frequency) during training, making it semi-learnable like CNN filters.
Some research mixes wavelet types or uses different families for specific tasks (e.g., Daubechies for 1D time series, Morlet for images), but not layered in the way asked about above; it's still one family per network. In fault diagnosis or signal-processing applications, custom filter banks (including bior or sym) appear, but the layers reuse the bank rather than escalating orders.
If you built a custom "scattering-inspired" CNN that mixes families and escalates orders across layers, it could work as a hybrid model, but it wouldn't be a pure scattering network anymore, and you'd lose some of the math guarantees.
- Why Stick to One Family? Pros and Cons

Pros: simplicity, provable invariance, no training needed, efficient for small datasets. Cons: less flexible than CNNs for wild variations. A mixed-family approach might add flexibility, but it complicates things (e.g., it's harder to ensure stability). Tools like MATLAB's Wavelet Toolbox and Python's Kymatio library implement standard scattering with one mother wavelet (Morlet by default); a short usage sketch follows.
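For reference, here is roughly what that looks like with Kymatio's NumPy frontend. Hedged: the exact import path, argument names, and output shape can vary between Kymatio versions, so check the docs of the version you have installed.

```python
import numpy as np
from kymatio.numpy import Scattering2D   # Kymatio's NumPy frontend

# 2-layer (order-2) scattering with J = 2 dyadic scales on a 32x32 image.
# All filters are scalings/rotations of one Morlet mother wavelet (the default).
scattering = Scattering2D(J=2, shape=(32, 32))

image = np.random.rand(32, 32).astype(np.float32)   # stand-in for a real image
coeffs = scattering(image)

print(coeffs.shape)   # roughly (n_paths, 8, 8): one spatially averaged map per path
```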
Scattering Network
Scattering networks are like wavelets + deep learning — but fully hand-designed. Let's build up the idea step by step, super clearly. 🧠🌊🤖
✅ Step 1: What's the goal? Scattering networks extract stable features from signals or images — using wavelets, not learned filters.
Think of them as:
Like CNNs (convolutional neural nets)
But with fixed filters (wavelets), not trained weights
Great for texture, audio, image patterns
✅ Step 2: Main ingredients A scattering transform builds features in layers, just like a CNN.
Each layer does:
Wavelet transform → captures local variations
Modulus → takes the absolute value to keep energy
Averaging → smooths out noise and small shifts
Tiny worked example on the signal [1, 2, 3, 4, 5], with a Haar-style detail filter [1, –1] and an averaging filter [½, ½], both slid across the signal one pair at a time:

Wavelet step (detail filter [1, –1]):
[1, 2] → (1×1 + 2×(–1)) = –1
[2, 3] → (2×1 + 3×(–1)) = –1
[3, 4] → (3×1 + 4×(–1)) = –1
[4, 5] → (4×1 + 5×(–1)) = –1

Modulus step: |–1| = 1 for every pair, giving [1, 1, 1, 1].

Averaging step (filter [½, ½]):
[1, 1] → (1×½ + 1×½) = 1
[1, 1] → (1×½ + 1×½) = 1
[1, 1] → (1×½ + 1×½) = 1

So a steadily rising signal scatters to a constant feature, regardless of position.
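The same arithmetic in a few lines of NumPy (the detail filter [1, –1] and averaging filter [½, ½] are the ones used in the numbers above):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)

detail = np.array([1.0, -1.0])    # Haar-style "change detector"
average = np.array([0.5, 0.5])    # simple smoothing filter

# np.correlate slides the filter without flipping it, matching the pairwise sums above.
w = np.correlate(x, detail, mode='valid')     # [-1. -1. -1. -1.]
u = np.abs(w)                                 # modulus: [1. 1. 1. 1.]
s = np.correlate(u, average, mode='valid')    # averaging: [1. 1. 1.]

print(w, u, s)
```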
🔁 Why is a scattering network ✅ translation-invariant? That means:
If you shift the input a little (like slide a signal left or right), the output barely changes.
Why?
The wavelet modulus removes phase info (location)
The averaging step smooths over small shifts
🧠 So it doesn't care exactly where a feature happens — just that it happens.
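A quick numeric illustration of that, using one toy "scattering-style" coefficient (a detail filter, then modulus, then an average over the whole signal; illustrative filters, not a full scattering transform):

```python
import numpy as np

def toy_feature(x):
    """One scattering-style coefficient: modulus of a detail filter, then a global average."""
    u = np.abs(np.correlate(x, np.array([1.0, -1.0]), mode='same'))
    return u.mean()

x = np.zeros(64)
x[20:30] = 1.0                    # a small bump
x_shifted = np.roll(x, 3)         # the same bump, slid 3 samples along

print(toy_feature(x), toy_feature(x_shifted))   # identical: the averaged modulus doesn't move
```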
✅ Stable to deformation? Deformation = a small warp or stretch in the signal
Example:
Original: [1, 2, 3, 4, 5]
Warped: [1, 2.2, 3, 4.2, 5]
Still mostly the same — but slightly “bent.”
🧠 The scattering transform is Lipschitz continuous with respect to these small deformations, meaning the output changes gently when the input is gently warped.
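A tiny numeric check of that stability, using the warped signal above and the same toy detail + modulus + average recipe from the worked example (an illustration, not a proof of the Lipschitz bound):

```python
import numpy as np

def toy_features(x):
    """|detail filter| followed by pairwise averaging, as in the worked example."""
    u = np.abs(np.correlate(x, np.array([1.0, -1.0]), mode='valid'))
    return np.correlate(u, np.array([0.5, 0.5]), mode='valid')

original = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
warped   = np.array([1.0, 2.2, 3.0, 4.2, 5.0])

print(toy_features(original))   # [1. 1. 1.]
print(toy_features(warped))     # [1. 1. 1.]  (the small bends average out)
```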
❓ Is there any trainable fully connected layer like in CNNs? Nope! ❌
Everything in a scattering network is fixed: wavelets, filters, pooling — all hand-designed
No learning, no weights to train
That’s why:
It's interpretable ✔️
Doesn’t need training data ✔️
Works well with small datasets ✔️
But:
You can still use scattering features as input to a classifier like a trained neural net or SVM!
✅ Scattering networks are used for: 🎯 Feature extraction They turn raw signals or images into rich, stable, structured features.
🧠 Why use scattering features? They’re robust to noise, shifts, and small warps
They work without training
They're interpretable and mathematically grounded
Great when you have limited data or need explainability
📦 What happens after feature extraction? You can feed scattering features into:
🎯 SVMs
🧠 Fully connected neural networks
🤖 Decision trees or other classifiers
So it’s like:
Signal or image → Scattering network → Feature vector → Classifier
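A minimal end-to-end sketch of that pipeline, with a hand-rolled stand-in for the scattering step and scikit-learn's SVC as the classifier. The feature function, toy dataset, and labels are all assumptions for illustration; in practice you would swap in a real scattering transform such as Kymatio's.

```python
import numpy as np
from sklearn.svm import SVC

def scattering_like_features(x):
    """Stand-in extractor: |detail filter| at a few dyadic scales, then a global average."""
    feats = []
    for j in range(3):
        kernel = np.repeat([1.0, -1.0], 2 ** j)           # crude dilated 'wavelet'
        u = np.abs(np.correlate(x, kernel, mode='valid'))
        feats.append(u.mean())
    return np.array(feats)

rng = np.random.default_rng(0)
# Toy dataset: class 0 = smooth sine waves, class 1 = noisy sine waves.
signals = [np.sin(np.linspace(0, 4 * np.pi, 128)) + (c * 0.5) * rng.standard_normal(128)
           for c in (0, 1) for _ in range(50)]
labels = [c for c in (0, 1) for _ in range(50)]

X = np.stack([scattering_like_features(s) for s in signals])   # feature vectors
clf = SVC().fit(X, labels)                                     # classifier on top
print("training accuracy:", clf.score(X, labels))
```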
🔬 Used in:

- Image classification
- Texture recognition
- Audio signal processing
- EEG or biomedical signals
- Any task needing stable, meaningful features
💡 So yes: ✅ All filters are translations, dilations (scales), and, in 2D, rotations of one mother wavelet
No learning. Just wavelet math. That’s why the system is structured, stable, and interpretable.
Most scattering networks use 2 layers.
It’s a good trade-off: deep enough for rich info, still fast and stable.