AI-Assisted Review of Pose Detection Models in Unity - RutgersGRID/DanceAutism GitHub Wiki

Pose Detection Models for Unity (Sentis) and Python: Overview and Capabilities


πŸ“Œ Introduction

This reference outlines key pose detection models available for use with Unity Sentis and Python, highlighting their capabilities in real-time tracking, segmentation, and 3D space awareness. It includes a detailed section on BodyPixSentis, along with comparisons to other modern models like MediaPipe BlazePose, YOLO-NAS Pose, and Poseidon (ViTPose Extension).


🧠 Unity Sentis-Compatible Models

πŸ”Ή BodyPixSentis

BodyPixSentis is a Unity-compatible adaptation of Google’s BodyPix model, integrated through the Sentis framework. It provides real-time person segmentation and 2D pose estimation.

  • Model Origin: Google TensorFlow.js
  • Architecture: MobileNet or ResNet
  • Pose Type: 2D keypoints (17 landmarks)
  • Segmentation: Person + part segmentation
  • Depth Estimation: ❌ Not supported
  • Unity Integration: βœ… Via ONNX, supports real-time webcam input and visualization

πŸ”Ή BlazePose for Unity (Sentis Version)

  • Model Origin: Google MediaPipe
  • Variants: Lite, Full, Heavy
  • Pose Type: 33 landmarks with optional z-axis (depth)
  • Strengths: Lightweight and optimized for mobile; supports 3D pose out-of-the-box.
  • Unity Integration: βœ… Official models on Hugging Face ready for Sentis

πŸ”Ή BlazePose GHUM Holistic (Advanced)

  • Extension of BlazePose with GHUM statistical body model
  • Pose Type: Full-body 3D landmarks + hand keypoints (21 per hand)
  • Depth Estimation: βœ… Yes (z-axis for all landmarks)
  • Availability: Not in Unity Sentis by default; requires conversion to ONNX

πŸ”Ή YOLO-NAS Pose

  • Strength: High-speed detection (425 FPS on GPU)
  • Pose Type: 2D keypoints
  • Use Case: Performance-critical applications
  • Unity Integration: Requires custom ONNX conversion
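
YOLO-family pose exports typically expect a square, letterboxed input, so keypoints must be mapped back to the original frame after inference. A sketch of that inverse mapping, assuming scale-to-fit with centered padding (verify against your specific ONNX export's preprocessing):

```python
def unletterbox_keypoints(kpts, input_size, orig_w, orig_h):
    """Map keypoints from a square letterboxed model input back to the
    original image. Assumes the frame was scaled to fit input_size and
    centered with equal padding (common YOLO-style preprocessing)."""
    scale = min(input_size / orig_w, input_size / orig_h)
    pad_x = (input_size - orig_w * scale) / 2
    pad_y = (input_size - orig_h * scale) / 2
    return [((x - pad_x) / scale, (y - pad_y) / scale) for x, y in kpts]

# 1280x720 frame letterboxed into a 640x640 input:
# scale = 0.5, vertical padding = 140 px top and bottom
pts = unletterbox_keypoints([(320, 320)], 640, 1280, 720)
print(pts)  # [(640.0, 360.0)] -> frame center
```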

πŸ”Ή Poseidon (ViTPose Extension)

  • Strength: Temporal tracking and multi-frame attention
  • Pose Type: 2D, high accuracy in sequences
  • Depth Estimation: ❌ No native z-axis support
  • Use Case: Research or high-accuracy post-processing
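
Poseidon's advantage comes from learned multi-frame attention, which is far beyond a snippet, but the motivation (stabilizing per-frame jitter) can be illustrated with a much simpler hand-rolled baseline, an exponential moving average over keypoints:

```python
class KeypointSmoother:
    """Exponential moving-average smoother for per-frame 2D keypoints.

    A simple stand-in illustrating why temporal context helps: it damps
    frame-to-frame jitter at the cost of some lag. Not Poseidon's actual
    mechanism, which learns attention across frames.
    """
    def __init__(self, alpha=0.5):
        self.alpha = alpha   # weight of the newest observation
        self.state = None

    def update(self, kpts):
        kpts = [tuple(map(float, p)) for p in kpts]
        if self.state is None:
            self.state = kpts
        else:
            a = self.alpha
            self.state = [
                (a * x + (1 - a) * sx, a * y + (1 - a) * sy)
                for (x, y), (sx, sy) in zip(kpts, self.state)
            ]
        return self.state

s = KeypointSmoother(alpha=0.5)
s.update([(0.0, 0.0)])
print(s.update([(2.0, 2.0)]))  # [(1.0, 1.0)]
```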

🧭 Depth Estimation Overview

βœ… Supports Depth

  • BlazePose (Full + GHUM): Native z-axis for each keypoint
  • BlazePose GHUM Holistic: Enhanced 3D with full body and hands

❌ Lacks Native Depth

  • BodyPixSentis
  • YOLO-NAS Pose
  • Poseidon (ViTPose)

πŸ”§ Workarounds for Depth in Unity

  • Use stereo camera setups
  • Integrate with ARKit/ARCore depth sensors
  • Fuse with 3D sensor data (e.g., RealSense, Kinect)
  • Apply camera calibration with known skeleton ratios
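
For the stereo option above, depth recovery on a calibrated, rectified pair reduces to one formula, Z = f·B/d. A minimal sketch (the example numbers are illustrative, not from any specific camera):

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a calibrated, rectified stereo pair: Z = f * B / d.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two cameras in meters
    disparity_px: horizontal shift of the same keypoint between views
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# e.g. f = 700 px, baseline = 0.12 m, disparity = 21 px
print(stereo_depth(700.0, 0.12, 21.0))  # 4.0 meters
```

Running a 2D model (e.g. BodyPixSentis) on both views and triangulating matched keypoints this way yields approximate per-joint depth without a 3D model.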

πŸ“Š Comparative Table

| Model/Framework | Pose Type | Depth Support | Unity Sentis Ready | Speed (FPS) | Landmarks | Use Case |
|---|---|---|---|---|---|---|
| BodyPixSentis | 2D (17 pts) | ❌ | ✅ Yes | ~45 | 17 | 2D segmentation + tracking |
| BlazePose Lite/Full | 3D (33 pts) | ✅ Optional | ✅ Yes | ~60 | 33 | Fitness, AR, apps |
| BlazePose GHUM Holistic | Full 3D body | ✅ Yes | ⚠️ Manual conversion | ~15 | 33 + hands | Holistic 3D pose |
| YOLO-NAS Pose | 2D (varies) | ❌ | ⚠️ Conversion needed | ~425 | 17+ | High-speed detection |
| Poseidon (ViTPose Ext.) | 2D (temporal) | ❌ | ⚠️ Research | ~30 | 17+ | Video sequence tracking |

🏁 Summary

For real-time 3D pose estimation in Unity, BlazePose Full and BlazePose GHUM are your top options. BodyPixSentis remains a solid 2D solution for segmentation and tracking. If depth estimation is a requirement, prioritize BlazePose-based models or extend Unity’s capabilities through sensor fusion and depth approximation techniques.