1908.05436.md - hassony2/inria-research-wiki GitHub Wiki
{paper} {code} {notes}
Learning Trajectory Dependencies for Human Motion Prediction, ICCV'19
Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li
Method
- given a temporal sequence X_{1:N} of N observed poses
- replicate the last pose T times to construct a padded sequence X_{1:N+T} of length N + T
- compute DCT coefficients of this sequence
- aim at predicting the true coefficients as a residual added to the coefficients of the padded sequence
- effectively predicting offsets to the zero-velocity baseline in frequency space
- modeling dependencies between joints using graph convolutional networks
- learn the connectivity during training
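The padding, DCT and residual steps above can be sketched for a single joint coordinate as follows. This is a minimal illustration in plain Python, not the authors' code: the toy trajectory, the helper names (`dct_basis`, `dct`, `idct`) and the use of an orthonormal DCT-II are assumptions, and the "oracle" residual stands in for what a trained network would have to predict.

```python
import math

def dct_basis(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                      for t in range(n)])
    return basis

def dct(signal):
    """Project a trajectory onto the DCT basis."""
    return [sum(b * x for b, x in zip(row, signal))
            for row in dct_basis(len(signal))]

def idct(coeffs):
    """Inverse of dct (the basis is orthonormal, so this is its transpose)."""
    basis = dct_basis(len(coeffs))
    return [sum(basis[k][t] * c for k, c in enumerate(coeffs))
            for t in range(len(coeffs))]

# Toy data (hypothetical): one coordinate observed for N = 4 frames, T = 3 to predict.
observed = [0.0, 0.1, 0.2, 0.3]
horizon = 3
ground_truth = observed + [0.4, 0.5, 0.6]

# Replicate the last observed value T times, then move to frequency space.
padded = observed + [observed[-1]] * horizon
padded_coeffs = dct(padded)

# Training target: the residual between true and padded coefficients.
target_residual = [g - p for g, p in zip(dct(ground_truth), padded_coeffs)]

# Predicting a zero residual gives back the zero-velocity baseline.
zero_velocity = idct(padded_coeffs)

# Adding the exact residual recovers the full ground-truth trajectory.
oracle = idct([p + r for p, r in zip(padded_coeffs, target_residual)])
```

With a zero residual the output equals the padded sequence, which is why the residual formulation amounts to learning offsets from the zero-velocity baseline.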
Experiments
- DCT coefficient number analysis
- using 35 coefficients gives a lossless encoding (because 35 frames are used in total)
- because motion is smooth, 10 coefficients are enough to encode reasonably realistic motion (later coefficients encode higher-frequency trajectory modifications)
- observe 10 frames to predict the future 25 frames on H3.6M
- compare to a 2-layer fully convolutional network which predicts offsets in DCT coefficients
- weird curve in the ablation of the number of DCT coefficients (as the number of DCT coefficients increases, we could expect monotonically increasing accuracy, but the curve is jittery in angle space; looks like noise to me)
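The claim that a few low-frequency coefficients suffice for smooth motion can be checked on a toy signal. The sketch below is an assumption-laden illustration, not an experiment from the paper: the 35-frame trajectory is synthetic, the helper names are hypothetical, and it measures reconstruction error only, which says nothing about the prediction-error curve discussed above.

```python
import math

def dct_basis(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                      for t in range(n)])
    return basis

def dct(signal):
    return [sum(b * x for b, x in zip(row, signal))
            for row in dct_basis(len(signal))]

def idct(coeffs):
    basis = dct_basis(len(coeffs))
    return [sum(basis[k][t] * c for k, c in enumerate(coeffs))
            for t in range(len(coeffs))]

def truncation_rmse(signal, keep):
    """RMS reconstruction error when only the first `keep` coefficients are kept."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    reconstructed = idct(truncated)
    squared = sum((a - b) ** 2 for a, b in zip(signal, reconstructed))
    return math.sqrt(squared / len(signal))

# Synthetic smooth trajectory over 35 frames (10 observed + 25 future in the notes).
num_frames = 35
trajectory = [math.sin(2 * math.pi * t / num_frames) + 0.3 * t / num_frames
              for t in range(num_frames)]

errors = {keep: truncation_rmse(trajectory, keep) for keep in (5, 10, 35)}
for keep, err in errors.items():
    print(f"{keep:2d} coefficients -> RMS error {err:.2e}")
```

Because the basis is orthonormal, the reconstruction error can only shrink as coefficients are added, and it vanishes (up to floating-point error) when all 35 are kept.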
- Ablation analysis
- Preprocessing (DCT, residual connection, padding)
- DCT conversion yields the smallest improvement
- padding especially, and also the residual formulation, are crucial (Table 6)
- Architecture
- compare GCN with learnt connectivity, GCN with hard-coded connectivity and fully connected architecture
- Learned connectivity is slightly better than fully connected, and significantly better than hard-coded connectivity (Table 7)
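To make the difference between hard-coded and learned connectivity concrete, here is a rough sketch of a single graph-convolution layer of the form tanh(A · H · W), where each row of H holds the DCT coefficients of one trajectory. Everything specific is an assumption for illustration: the toy sizes, the chain-shaped skeleton, the tanh activation and the helper names. Training is not shown; "learned" here only means that every entry of A would be a free parameter.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def graph_conv(adjacency, features, weights):
    """One layer: mix node features through A, project with W, apply tanh."""
    mixed = matmul(adjacency, features)
    projected = matmul(mixed, weights)
    return [[math.tanh(v) for v in row] for row in projected]

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

rng = random.Random(0)
num_nodes, in_dim, out_dim = 4, 10, 10  # toy sizes: 4 trajectories, 10 DCT coefficients
features = random_matrix(num_nodes, in_dim, rng, scale=1.0)
weights = random_matrix(in_dim, out_dim, rng)

# Hard-coded connectivity: a chain 0-1-2-3 with self-loops, fixed in advance.
chain_adjacency = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(num_nodes)]
                   for i in range(num_nodes)]

# Learned connectivity: a dense matrix whose entries would all be trained.
learned_adjacency = random_matrix(num_nodes, num_nodes, rng)

out_fixed = graph_conv(chain_adjacency, features, weights)
out_learned = graph_conv(learned_adjacency, features, weights)

# With the chain, node 0 never sees node 3 within a single layer.
perturbed = [row[:] for row in features]
perturbed[3] = [v + 1.0 for v in perturbed[3]]
out_fixed_perturbed = graph_conv(chain_adjacency, perturbed, weights)
```

In the hard-coded case, distant joints can only interact after several layers, whereas a dense learned A lets any pair of trajectories influence each other in one layer.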