Awesome Human Action Recognition Repository - caglarmert/MOT-Research GitHub Wiki
awesome-human-action-recognition
A list of the most popular methods for human action recognition.
arxiv Papers
[arXiv:1808.07507] Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization. [PDF]
Unaiza Ahsan,Rishi Madhok
[arXiv:1711.04161] End-to-end Video-level Representation Learning for Action Recognition. [PDF][code]
Jiagang Zhu, Wei Zou, Zheng Zhu
Journal Papers
[2017, IEEE TPAMI] Long-Term Temporal Convolutions for Action Recognition [PDF]
Gul Varol, Ivan Laptev, and Cordelia Schmid, Fellow, IEEE
Review works
Human Action Recognition and Prediction: A Survey [PDF]
Yu Kong, Member, IEEE, and Yun Fu, Senior Member, IEEE
Conference Papers
2019 ICCV
Graph Convolutional Networks for Temporal Action Localization. Authors: Chuang Gan et al.
Action recognition with spatial-temporal discriminative filter banks. Authors: Yuanjun Xiong et al.
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures. Authors: Google Brain
Note: neural architecture search for video understanding; brute-force compute works wonders.
DynamoNet: Dynamic Action and Motion Network. Authors: Ali Diba, Luc Van Gool
Reasoning About Human-Object Interactions Through Dual Attention Networks. Authors: Bolei Zhou
Learning Temporal Action Proposals with Fewer Labels. Authors: Fei-Fei Li's group at Stanford, Juan Carlos Niebles
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. Authors: Dima Damen et al.
SlowFast Networks for Video Recognition (paper: https://arxiv.org/abs/1812.03982). Kaiming He, FAIR
Video Classification with Channel-Separated Convolutional Networks (paper: https://arxiv.org/abs/1904.02811). Du Tran, FAIR
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition. Oral (paper: https://arxiv.org/abs/1904.04289). Du Tran, FAIR
DistInit: Learning Video Representations without a Single Labeled Video (paper: https://arxiv.org/abs/1901.09244). Du Tran, FAIR. Note: a very simple idea.
TSM: Temporal Shift Module for Efficient Video Understanding. Authors: Ji Lin, Chuang Gan, Song Han. Paper: https://arxiv.org/abs/1811.08383 GitHub: https://github.com/mit-han-lab/temporal-shift-module Note: this feels a bit like a fixed convolution kernel with a mask?
BMN: Boundary-Matching Network for Temporal Action Proposal Generation (paper: https://arxiv.org/abs/1907.09702). Walkthrough by the author, Tianwei Lin: "[ICCV 2019][Temporal Action Proposal] Boundary-Matching Network Explained" (original post: https://zhuanlan.zhihu.com/p/75444151)
Weakly Supervised Energy-Based Learning for Action Segmentation. Oral. Link: https://github.com/JunLi-Galios/CDFL
Pose-aware Multi-level Feature Network for Human Object Interaction Detection. Link: https://github.com/bobwan1995/PMFNet
What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention. Project: https://iplab.dmi.unict.it/rulstm/ Paper: https://arxiv.org/pdf/1905.09035.pdf GitHub: https://github.com/fpv-iplab/rulstm
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings. Paper: https://arxiv.org/abs/1908.03477 Project: https://mwray.github.io/FGAR/
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. Authors: Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic. Paper: https://arxiv.org/abs/1906.03327 Code: https://github.com/antoine77340/howto100m
Temporal Attentive Alignment for Large-Scale Video Domain Adaptation. Authors: Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Woo, Ruxin Chen, Jian Zheng. Paper: https://arxiv.org/abs/1907.12743 GitHub: https://github.com/cmhungsteve/TA3N
STM: SpatioTemporal and Motion Encoding for Action Recognition, from ZJU and SenseTime Group Limited. Paper: https://arxiv.org/abs/1908.02486
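Note on the TSM entry above: the remark that the temporal shift looks like a fixed, masked convolution kernel can be made concrete with a small sketch. This is a simplified illustration, not the authors' code. It assumes features are stored as `frames[t][c]` (one scalar per time step and channel, spatial dimensions dropped), and the name `fold_div` is a hypothetical parameter giving the fraction of channels shifted in each direction. One group of channels reads from the next time step, one group reads from the previous time step, and the rest stay in place, with zeros at the clip boundaries.

```python
def temporal_shift(frames, fold_div=8):
    """Shift part of the channels along time; frames[t][c] is a scalar feature.

    Simplified sketch: real feature maps also have spatial dimensions,
    which are left out here. `fold_div` is a hypothetical parameter name.
    """
    num_steps = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div  # channels per shifted group

    # Start from zeros so positions with no source frame stay zero-padded.
    shifted = [[0.0] * num_channels for _ in range(num_steps)]
    for t in range(num_steps):
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # first group: read from the next time step
            elif c < 2 * fold:
                src = t - 1  # second group: read from the previous time step
            else:
                src = t      # remaining channels: unchanged
            if 0 <= src < num_steps:
                shifted[t][c] = frames[src][c]
    return shifted


# Tiny example: 3 time steps, 8 channels, value 10*t + c at (t, c).
frames = [[10 * t + c for c in range(8)] for t in range(3)]
shifted = temporal_shift(frames, fold_div=4)
```

Seen this way, each channel group applies a one-hot temporal kernel of length 3 (look ahead, look back, or identity), which is why the operation adds no parameters or multiplications.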
2018 ECCV
[2018,ECCV] Temporal Relational Reasoning in Videos [PDF] [code]
Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba
[2018,ECCV] Modality Distillation with Multiple Stream Networks for Action Recognition [PDF]
[2018,ECCV] Graph Distillation for Action Detection with Privileged Modalities [PDF]
Stanford University, Google Inc.
Note: the two distillation papers above are similar and open up a new research direction.
[2018,ECCV] Spatio-Temporal Channel Correlation Networks for Action Classification [PDF]
Note (open question): why can't a 3D network learn the relation between spatial and temporal features?
[2018,ECCV] Learning Human-Object Interactions by Graph Parsing Neural Networks [PDF] [code]
Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu
[2018,ECCV] Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification [PDF]
Yang Du, Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li, and Weiming Hu
[2018,ECCV] Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization [PDF]
Humam Alwassel, Fabian Caba Heilbron, and Bernard Ghanem
[2018,ECCV] Action Anticipation with RBF Kernelized Feature Mapping RNN [PDF]
Yuge Shi, Basura Fernando, Richard Hartley
[2018,ECCV] Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning [PDF]
Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan
[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset [PDF]
Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri
[2018,ECCV] End-to-End Joint Semantic Segmentation of Actors and Actions in Video [PDF]
2018 CVPR
[2018,CVPR] Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition [PDF]
Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang
[2018,CVPR] Appearance-and-Relation Networks for Video Classification [PDF] [code]
L. Wang, W. Li, W. Li, and L. Van Gool
2018 NIPS
[2018,NIPS] Trajectory Convolution for Action Recognition [PDF] [code]
Yue Zhao, Yuanjun Xiong
2018 Others
2017 ICCV
2017 CVPR
AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]
Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma
[2017,CVPR] On the Integration of Optical Flow and Action Recognition [PDF]
Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black
2017 Others
2016 CVPR
[2016,CVPR] Convolutional Two-Stream Network Fusion for Video Action Recognition [PDF]
Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
[2016,CVPR] A Key Volume Mining Deep Framework for Action Recognition [PDF]
Wangjiang Zhu, Jie Hu, Gang Sun, Xudong Cao, Yu Qiao
2016 ECCV
[2016,ECCV] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [PDF]
Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
2016 ICCV
2016 Others
2015 CVPR
[2015,CVPR] Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [PDF]
Limin Wang, Yu Qiao, Xiaoou Tang
2015 ECCV
2015 ICCV
[2015,ICCV] Learning Spatiotemporal Features with 3D Convolutional Networks [PDF]
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri
2015 Others
2014 CVPR
[2014,CVPR] Large-Scale Video Classification with Convolutional Neural Networks [PDF]
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei
2014 ECCV
2014 ICCV
2014 Others
[2014,NIPS] Two-Stream Convolutional Networks for Action Recognition in Videos [PDF]
Karen Simonyan, Andrew Zisserman
Directions
Traditional Machine Learning Methods
This list focuses mainly on deep learning methods, grouped as follows.
Deep Learning Methods
2D convolutional networks
AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]
Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma
3D convolutional networks
[2013, IEEE TPAMI] 3D Convolutional Neural Networks for Human Action Recognition
Shuiwang Ji, Wei Xu, Ming Yang, Kai Yu
[2017, IEEE TPAMI] Long-Term Temporal Convolutions for Action Recognition [PDF]
Gul Varol, Ivan Laptev, and Cordelia Schmid, Fellow, IEEE
LSTM networks
multistream networks
[2014,NIPS] Two-Stream Convolutional Networks for Action Recognition in Videos [PDF]
[2016,ECCV] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [PDF]
[2018,ECCV] Temporal Relational Reasoning in Videos [PDF] [code]
[2016,CVPR] A Key Volume Mining Deep Framework for Action Recognition [PDF]
new feature
[2018,CVPR] Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition [PDF]
Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang
[2015,CVPR] Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [PDF]
[2017,CVPR] On the Integration of Optical Flow and Action Recognition [PDF]
Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black
explaining deep representations
[arXiv:1712.08416] What have we learned from deep representations for action recognition?
Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes, Andrew Zisserman
semantic
[arXiv:1802] Structured Label Inference for Visual Understanding
Nelson Nauata, Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao, and Greg Mori
datasets
[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset [PDF]
Datasets
- Year: publication year
- Videos: number of video clips
- Views: number of camera viewpoints
- Actions: number of action classes
- Subjects: number of people appearing in the videos
- Modality: RGB or RGB-D
- Env: Controlled (C) or Uncontrolled (U)
dataset papers 2017 [PDF]
2018 video benchmarks: a review [PDF]
video datasets online [HTML]
computer vision datasets online [HTML]
Dataset | Year | Videos | Views | Actions | Subjects | Modality | Env (C/U) | Related Paper |
---|---|---|---|---|---|---|---|---|
KTH | 2004 | 599 | 1 | 6 | 25 | RGB | C | Recognizing human actions: A local SVM approach, IEEE ICPR 2004 [PDF] |
HMDB51 | 2011 | 7000 | - | 51 | - | RGB | U | HMDB: A large video database for human motion recognition, ICCV 2011 [PDF] |
UCF101 | 2012 | 13320 | - | 101 | - | RGB | U | UCF101: A dataset of 101 human action classes from videos in the wild, CRCV-TR-12-01, 2012 [PDF] |
Current Accuracy on Main Datasets
- HMDB51: 82.1% (2017)