1908.05436.md - hassony2/inria-research-wiki GitHub Wiki

Learning Trajectory Dependencies for Human Motion Prediction, ICCV'19 {paper} {code} {notes}

Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li

Method

given a temporal sequence X_{1:N}
- replicate the last pose T times to construct a sequence X_{1:N+T} of length N + T
- compute DCT coefficients of this sequence
- aim at predicting the DCT coefficients of the real sequence as a residual vector added to the coefficients of the padded sequence
- effectively predicting offsets to the zero-velocity baseline in frequency space
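
The preprocessing steps above can be sketched for a single joint coordinate as follows. This is a minimal illustration, not the authors' code: the helper names (`dct_matrix`, `dct`, `pad_with_last_pose`), the orthonormal DCT-II convention and the toy trajectory are all assumptions made for the example.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed convention)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows

def dct(signal):
    """DCT coefficients of a 1D trajectory."""
    basis = dct_matrix(len(signal))
    return [sum(b * x for b, x in zip(row, signal)) for row in basis]

def pad_with_last_pose(observed, horizon):
    """Repeat the last observed value `horizon` times (zero-velocity padding)."""
    return list(observed) + [observed[-1]] * horizon

# Toy data: one joint coordinate, N = 4 observed frames, T = 3 future frames.
observed = [0.0, 0.1, 0.2, 0.3]
future = [0.4, 0.5, 0.6]

padded = pad_with_last_pose(observed, len(future))  # length N + T
full = observed + future                            # ground-truth sequence

input_coeffs = dct(padded)   # what the network would receive
target_coeffs = dct(full)    # what it should output

# Residual the network has to predict in frequency space.
residual = [t - i for t, i in zip(target_coeffs, input_coeffs)]
```

If the residual is all zeros, the prediction reduces to the zero-velocity baseline (the last pose held for T frames), which is why the residual formulation amounts to predicting offsets from that baseline.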
modeling dependencies between joints using graph convolutional networks
- learn the connectivity during training
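
A graph convolution with learnt connectivity can be sketched as below. This is a hedged toy example rather than the paper's architecture: the function names, the toy sizes, the random initialisation and the `tanh` activation are assumptions. The point it illustrates is that the adjacency matrix is a dense parameter that would be trained like the weights, instead of being fixed by the skeleton's kinematic tree.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def graph_conv(adjacency, features, weights):
    """One layer: tanh(A @ H @ W). A mixes nodes, W mixes feature channels."""
    mixed = matmul(matmul(adjacency, features), weights)
    return [[math.tanh(v) for v in row] for row in mixed]

def random_matrix(rows, cols, rng):
    """Small random values standing in for trainable parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
num_nodes, in_dim, out_dim = 3, 4, 2  # toy sizes, not the paper's

# Dense adjacency: every node may attend to every other node.
# In training, both `adjacency` and `weights` would receive gradients.
adjacency = random_matrix(num_nodes, num_nodes, rng)
weights = random_matrix(in_dim, out_dim, rng)

# One row per node; here the features stand in for DCT coefficients.
features = random_matrix(num_nodes, in_dim, rng)

output = graph_conv(adjacency, features, weights)  # num_nodes x out_dim
```

A hard-coded variant would keep `adjacency` fixed to the skeleton's connectivity and train only `weights`.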

Experiments

Analysis of the number of DCT coefficients
- 35 coefficients result in lossless encoding (because 35 frames are used in total)
- given the smoothness of motion, 10 coefficients are enough to encode reasonably realistic motion (later coefficients encode higher-frequency trajectory modifications)
- observe 10 frames to predict the future 25 frames on H3.6M
- compare to a 2-layer fully convolutional network which predicts offsets in DCT coefficients
- weird curve in the ablation on the number of DCT coefficients: as the number of coefficients increases, we could expect monotonically increasing accuracy, but the curve is jittery in angle space (looks like noise to me)
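
The lossless and truncated encodings mentioned above can be checked on synthetic data. The sketch below concerns reconstruction only, not prediction accuracy; the smooth toy trajectory, the helper names and the orthonormal DCT-II convention are assumptions for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed convention)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows

def truncate_and_reconstruct(signal, keep):
    """Keep the first `keep` DCT coefficients, zero the rest, invert."""
    n = len(signal)
    basis = dct_matrix(n)
    coeffs = [sum(b * x for b, x in zip(row, signal)) for row in basis]
    coeffs = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    # The basis is orthonormal, so the inverse transform is its transpose.
    return [sum(basis[k][t] * coeffs[k] for k in range(n)) for t in range(n)]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Smooth synthetic trajectory of 35 frames (10 observed + 25 predicted).
frames = 35
trajectory = [math.sin(2 * math.pi * t / frames)
              + 0.3 * math.cos(4 * math.pi * t / frames)
              for t in range(frames)]

error_full = rms_error(trajectory, truncate_and_reconstruct(trajectory, 35))
error_10 = rms_error(trajectory, truncate_and_reconstruct(trajectory, 10))
error_3 = rms_error(trajectory, truncate_and_reconstruct(trajectory, 3))
```

With all 35 coefficients the reconstruction error is zero up to floating-point precision. Because the basis is orthonormal, the RMS reconstruction error can only shrink or stay equal as more coefficients are kept, which is what makes the non-monotonic prediction curve surprising.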
Ablation analysis
- Preprocessing (DCT, residual connection, padding)
  - DCT conversion yields the smallest improvement
  - padding especially, and also the residual formulation, are crucial (Table 6)
- Architecture
  - compare GCN with learnt connectivity, GCN with hard-coded connectivity and fully connected architecture
  - Learnt connectivity is slightly better than the fully connected architecture, and significantly better than hard-coded connectivity (Table 7)