leon_bottou_05_07_2018 - hassony2/inria-research-wiki GitHub Wiki

Léon Bottou

Problems in machine learning

What is the problem?

Instead of learning how to detect a bird in an image, we learn to solve a statistical task: recognizing many birds in many images.

Action recognition systems learn to be statistically correct but can miss the concept: a 'giving a phone call' classifier may in fact just detect phones, which works on average but does not capture the action itself.

Humans rely on understanding the underlying structure of the data, not on statistical correlations.

'Some of my best friends are linguists' (Frederick Jelinek's talk) --> it is our task to find a way to make use of the knowledge of linguists, and we are failing at this right now.

Why do we fail?

Because we measure average performance, we fail to capture the structure. Structure matters little for understanding common sentences; it matters for understanding rare ones and for producing new ones, and averaged metrics do not measure this.
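A small arithmetic sketch of why an averaged metric can hide the failure on rare sentences. The shares and accuracies below are hypothetical numbers chosen only for illustration; they do not come from the talk.

```python
# Hypothetical numbers, chosen only to illustrate the point.
groups = {
    "common sentences": {"share": 0.99, "accuracy": 0.98},
    "rare sentences": {"share": 0.01, "accuracy": 0.20},
}


def average_accuracy(groups):
    """Accuracy averaged over the whole test distribution."""
    return sum(g["share"] * g["accuracy"] for g in groups.values())


avg = average_accuracy(groups)
print(f"average accuracy: {avg:.4f}")
for name, g in groups.items():
    print(f"{name}: {g['accuracy']:.2f}")
```

With these made-up values the average stays around 97% even though the model is wrong on most rare sentences, which is where structure would matter.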

Statistics and semantics do not exactly overlap, and models learn the biases of the data distribution (to the point of losing the meaning of a translated sentence).

How can we fix it?

Understand what causes what, and distinguish causation from correlation. What is the relation between causation and supervised learning?

Causation and statistics

What connects correlation and causation? When are A and B correlated? When A causes B, when B causes A, or when C causes both A and B. In each case, manipulating B has a different impact on A.
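The three cases can be sketched with a toy simulation. This is only an illustration under assumed linear relations with Gaussian noise; the function names (`sample`, `observed_correlation`, `mean_a_when_b_forced`) and the choice of forcing B to 2.0 are hypothetical and not from the talk. In all three structures A and B are correlated when observed, but forcing B shifts A only when B causes A.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sample(structure, n=20000, do_b=None, seed=0):
    """Draw n (A, B) pairs. If do_b is set, B is forced to that value."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        if structure == "A causes B":
            a = rng.gauss(0, 1)
            b = a + rng.gauss(0, 1)
        elif structure == "B causes A":
            b = rng.gauss(0, 1) if do_b is None else do_b
            a = b + rng.gauss(0, 1)
        elif structure == "C causes A and B":
            c = rng.gauss(0, 1)
            a = c + rng.gauss(0, 1)
            b = c + rng.gauss(0, 1)
        else:
            raise ValueError(structure)
        if do_b is not None:
            b = do_b  # the intervention overrides whatever produced B
        pairs.append((a, b))
    return pairs


def observed_correlation(structure):
    """Correlation between A and B without any intervention."""
    a, b = zip(*sample(structure))
    return pearson(a, b)


def mean_a_when_b_forced(structure, value=2.0):
    """Average of A when B is set by hand to `value`."""
    pairs = sample(structure, do_b=value)
    return sum(a for a, _ in pairs) / len(pairs)


for s in ("A causes B", "B causes A", "C causes A and B"):
    print(s, round(observed_correlation(s), 2), round(mean_a_when_b_forced(s), 2))
```

In this toy setup the mean of A follows the forced value of B only in the 'B causes A' case; in the other two it stays near zero, even though the observed correlation is positive everywhere.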

The rigorous scientific process is to formulate a hypothesis, design an experiment, and test the hypothesis by trying to invalidate it. The cure for confounding correlations is randomization! The only way to know whether A causes B is to control A ourselves, specifically by randomizing A.
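A minimal sketch of why randomizing A removes the confounding. The setup is assumed for illustration: a hidden variable C influences both who receives A and the outcome B, and the true effect of A on B is fixed at 1.0 (`TRUE_EFFECT`, `mean_difference`, and the coefficient 2.0 are hypothetical choices, not from the talk).

```python
import random

TRUE_EFFECT = 1.0  # assumed ground-truth effect of A on B in this toy model


def mean_difference(randomize, n=50000, seed=0):
    """Difference in mean outcome B between units with A and units without A."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)  # hidden confounder
        if randomize:
            a = rng.random() < 0.5  # coin flip, independent of C
        else:
            a = c + rng.gauss(0, 1) > 0  # C pushes units toward receiving A
        b = TRUE_EFFECT * a + 2.0 * c + rng.gauss(0, 1)
        (treated if a else control).append(b)
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = mean_difference(randomize=False)
randomized = mean_difference(randomize=True)
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

Under these assumptions the observational comparison overstates the effect because C is unevenly distributed between the two groups, while the randomized comparison lands close to the assumed true effect.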

Causal inference

We assume we know what causes what. See the paper on causal inference by Bottou.

Causal intuition

Simpson confounding (correlation != causation): x and y can be positively correlated overall and yet negatively correlated once z is taken into account, if z has specific correlations with x and y. To discover the 'real' causal relationship between x and y, you need to manipulate x while keeping everything else equal. You can only observe the real effect of x on y by manipulation.
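A toy illustration of this sign flip, assuming a discrete z with three values and linear relations with Gaussian noise (all coefficients and function names are hypothetical). Within each value of z, y decreases with x, but z raises both x and y, so the pooled correlation comes out positive.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_per_group=3000, seed=0):
    """Return {z: (xs, ys)} where z shifts both x and y upward."""
    rng = random.Random(seed)
    data = {}
    for z in (0, 1, 2):
        xs, ys = [], []
        for _ in range(n_per_group):
            x = 2 * z + rng.gauss(0, 0.5)
            y = -x + 5 * z + rng.gauss(0, 0.5)  # negative effect of x at fixed z
            xs.append(x)
            ys.append(y)
        data[z] = (xs, ys)
    return data


def pooled_correlation(data):
    """Correlation of x and y when z is ignored."""
    xs = [x for gx, _ in data.values() for x in gx]
    ys = [y for _, gy in data.values() for y in gy]
    return pearson(xs, ys)


def within_group_correlations(data):
    """Correlation of x and y separately for each value of z."""
    return {z: pearson(xs, ys) for z, (xs, ys) in data.items()}


data = simulate()
print("pooled:", round(pooled_correlation(data), 2))
for z, r in within_group_correlations(data).items():
    print(f"z={z}:", round(r, 2))
```

Here the simulation plays the role of the manipulation: because the generating rule is known, the negative effect of x on y at fixed z is visible, which observation of the pooled data alone would not reveal.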

Observations can give an intuition of causation, but no proof!

How to understand image data? Ask questions: 'What happens if we remove the cars? The bridge (what happens to the cars?!)?'

Affordances