1908.05436.md - hassony2/inria-research-wiki GitHub Wiki

Learning Trajectory Dependencies for Human Motion Prediction, ICCV'19 {paper} {code} {notes}

Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li

Method

  • given a temporal sequence X_{1:N}
    • replicate the last pose T times to extend it to a sequence X_{1:N+T} of length N + T
    • compute the DCT coefficients of this padded sequence
    • predict the DCT coefficients of the ground-truth sequence as a residual added to these input coefficients
    • i.e., effectively predict offsets from the zero-velocity baseline in frequency space
  • model the dependencies between joints with a graph convolutional network (GCN)
    • the connectivity (adjacency matrix) is learnt during training instead of being fixed
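A minimal sketch of the padding / DCT / residual steps for a single joint-coordinate trajectory, in plain Python. The orthonormal DCT-II basis, the helper names (`dct_basis`, `pad_last`, `predict`) and the `residual_model` callable standing in for the network are assumptions for illustration, not the authors' code. With a residual model that outputs zeros, the decoded prediction is exactly the padded sequence, i.e. the zero-velocity baseline that the network learns to correct.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis as an n x n list of rows (row l = l-th cosine over n frames)."""
    basis = []
    for l in range(n):
        scale = math.sqrt(1.0 / n) if l == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * t + 1) * l / (2 * n)) for t in range(n)])
    return basis


def dct(traj, basis):
    # c_l = sum_t C[l][t] * x_t
    return [sum(c * x for c, x in zip(row, traj)) for row in basis]


def idct(coeffs, basis):
    # The basis is orthonormal, so the inverse is its transpose: x_t = sum_l C[l][t] * c_l
    n = len(basis[0])
    return [sum(coeffs[l] * basis[l][t] for l in range(len(coeffs))) for t in range(n)]


def pad_last(observed, horizon):
    """Replicate the last observed value `horizon` times: X_{1:N} -> X_{1:N+T}."""
    return list(observed) + [observed[-1]] * horizon


def predict(observed, horizon, residual_model):
    """Encode the padded sequence, add the predicted residual coefficients, decode to time."""
    basis = dct_basis(len(observed) + horizon)
    c_in = dct(pad_last(observed, horizon), basis)
    c_out = [c + r for c, r in zip(c_in, residual_model(c_in))]
    return idct(c_out, basis)


# Hypothetical residual model that outputs zeros: reproduces the zero-velocity baseline.
zero_residual = lambda coeffs: [0.0] * len(coeffs)
baseline = predict([0.0, 1.0, 2.0], 2, zero_residual)
```

Because the basis is orthonormal, decoding is just the transpose, so each offset predicted for one coefficient becomes a correction spread over the whole padded trajectory.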
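A rough sketch of one graph-convolution layer with learnable connectivity, written as H' = tanh(A H W), where both the K x K adjacency A and the weight matrix W are parameters. Treating nodes as joint coordinates with their DCT coefficients as features, the tanh activation, the random initialisation and the `GraphConv` / `chain_adjacency` names are assumptions here, and training is not shown. Passing a fixed `adjacency` gives a hard-coded-connectivity variant like the baseline in the ablation; the chain "skeleton" is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain-Python product of an (m x k) and a (k x n) list-of-lists matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def chain_adjacency(num_nodes):
    """Hypothetical hard-coded connectivity: chain 'skeleton' with self-loops, row-normalised."""
    adj = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    return [[v / sum(row) for v in row] for row in adj]


class GraphConv:
    """One layer H' = tanh(A H W); rows of H are nodes, columns are per-node features."""

    def __init__(self, num_nodes, in_feats, out_feats, adjacency=None, seed=0):
        rng = random.Random(seed)
        if adjacency is None:
            # Learnt connectivity: A is a free K x K parameter, trained like W (not shown).
            self.A = [[rng.uniform(-0.1, 0.1) for _ in range(num_nodes)] for _ in range(num_nodes)]
            self.trainable_adjacency = True
        else:
            # Hard-coded connectivity: A is copied from the skeleton and kept fixed.
            self.A = [list(row) for row in adjacency]
            self.trainable_adjacency = False
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(out_feats)] for _ in range(in_feats)]

    def __call__(self, H):
        # H is K x F_in; the output is K x F_out.
        return [[math.tanh(v) for v in row] for row in matmul(matmul(self.A, H), self.W)]


K, F_IN, F_OUT = 4, 3, 5
H = [[0.5 * (i + 1), -0.2, 0.1 * i] for i in range(K)]
learnt = GraphConv(K, F_IN, F_OUT)
fixed = GraphConv(K, F_IN, F_OUT, adjacency=chain_adjacency(K))
out_learnt, out_fixed = learnt(H), fixed(H)
```

The only difference between the two variants is whether A is a parameter or a constant, which is what the connectivity ablation isolates.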

Experiments

  • Analysis of the number of DCT coefficients

    • 35 coefficients give a lossless encoding (because 35 frames are used in total)
    • given the smoothness of human motion, 10 coefficients are enough to encode reasonably realistic motion (later coefficients encode higher-frequency trajectory variations)
    • observe 10 frames to predict the next 25 frames on H3.6M
    • compare to a 2-layer fully convolutional network which predicts offsets in DCT coefficients
    • weird curve in the ablation on the number of DCT coefficients (as the number of coefficients increases, one could expect monotonically increasing accuracy, but the curve is jittery in angle space, which looks like noise to me)
  • Ablation analysis

    • Preprocessing (DCT, residual connection, padding)
      • the DCT conversion yields the smallest improvement
      • padding and the residual formulation are crucial, padding especially (Table 6)
    • Architecture
      • compare a GCN with learnt connectivity, a GCN with hard-coded connectivity and a fully connected architecture
      • learnt connectivity is slightly better than the fully connected architecture, and significantly better than hard-coded connectivity (Table 7)
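A small, hypothetical check of the coefficient-count analysis: keep only the first L DCT coefficients of a 35-frame padded trajectory (10 observed + 25 replicated frames) and measure the L2 reconstruction error. The toy sine trajectory and the helper names are made up for illustration. With an orthonormal basis the error equals the norm of the dropped coefficients, so it can only shrink as L grows and vanishes at L = 35. Prediction accuracy in the ablation is a different quantity (a separate network is presumably trained for each L), so this alone does not explain the jittery curve.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    return [[(math.sqrt(1.0 / n) if l == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * t + 1) * l / (2 * n)) for t in range(n)]
            for l in range(n)]


def truncated_reconstruction(traj, num_coeffs):
    """Keep only the first `num_coeffs` DCT coefficients and map back to the time domain."""
    basis = dct_basis(len(traj))
    coeffs = [sum(c * x for c, x in zip(row, traj)) for row in basis[:num_coeffs]]
    return [sum(coeffs[l] * basis[l][t] for l in range(num_coeffs)) for t in range(len(traj))]


def l2_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical smooth 1-D joint trajectory: 10 observed frames, padded to 35 frames.
observed = [math.sin(0.2 * t) for t in range(10)]
padded = observed + [observed[-1]] * 25
errors = {L: l2_error(padded, truncated_reconstruction(padded, L)) for L in range(1, 36)}

for L in (5, 10, 20, 35):
    print(f"{L:2d} coefficients -> L2 reconstruction error {errors[L]:.4f}")
```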