# AI-Assisted Review of Pose Detection Models in Unity

*Pose Detection Models for Unity (Sentis) and Python: Overview and Capabilities*
## Introduction
This reference outlines key pose detection models available for use with Unity Sentis and Python, highlighting their capabilities in real-time tracking, segmentation, and 3D space awareness. It includes a detailed section on BodyPixSentis, along with comparisons to other modern models like MediaPipe BlazePose, YOLO-NAS Pose, and Poseidon (ViTPose Extension).
## Unity Sentis-Compatible Models

### BodyPixSentis
BodyPixSentis is a Unity-compatible adaptation of Google's BodyPix model, integrated through the Sentis framework. It provides real-time person segmentation and 2D pose estimation.
- Model Origin: Google TensorFlow.js
- Architecture: MobileNet or ResNet
- Pose Type: 2D keypoints (17 landmarks)
- Segmentation: Person + part segmentation
- Depth Estimation: ❌ Not supported
- Unity Integration: ✅ Via ONNX; supports real-time webcam input and visualization (see the sketch below)
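Below is a minimal sketch of how a Sentis-imported ONNX pose model can be driven from a webcam. It assumes the Sentis 1.x API (`ModelLoader`, `WorkerFactory`, `TextureConverter`); the 256x256 input size and the output layout are placeholders, not the actual interface of any particular BodyPix export, so match them to the model you import.

```csharp
using Unity.Sentis;
using UnityEngine;

// Minimal webcam-driven inference loop, assuming the Sentis 1.x API.
// The model asset, 256x256 input size, and output layout are placeholders;
// match them to the actual ONNX file you import (BodyPix exports, for
// example, emit heatmap and offset tensors that still need decoding).
public class PoseInference : MonoBehaviour
{
    [SerializeField] ModelAsset modelAsset;  // ONNX model imported into Unity

    IWorker worker;
    WebCamTexture webcam;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
        webcam = new WebCamTexture();
        webcam.Play();
    }

    void Update()
    {
        if (!webcam.didUpdateThisFrame) return;

        // Convert the current camera frame into a normalized input tensor.
        using TensorFloat input = TextureConverter.ToTensor(webcam, 256, 256, 3);
        worker.Execute(input);

        // Pull the first output back to the CPU for decoding.
        var output = worker.PeekOutput() as TensorFloat;
        output.MakeReadable();
        float[] scores = output.ToReadOnlyArray();
        // ... decode keypoints / segmentation masks from `scores` here ...
    }

    void OnDestroy()
    {
        worker?.Dispose();
        if (webcam != null) webcam.Stop();
    }
}
```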
### BlazePose for Unity (Sentis Version)
- Model Origin: Google MediaPipe
- Variants: Lite, Full, Heavy
- Pose Type: 33 landmarks with optional z-axis (depth)
- Strengths: Lightweight and optimized for mobile; supports 3D pose out-of-the-box.
- Unity Integration: ✅ Official models on Hugging Face, ready for Sentis (see the landmark-decoding sketch below)
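The z coordinate comes back as part of the landmark tensor. The sketch below shows one way to turn a flattened output into Unity positions; the 33 x 4 (x, y, z, visibility) layout is an assumption for illustration, since real BlazePose exports often pack five values per landmark plus auxiliary points, so verify the tensor shape of the model you deploy.

```csharp
using Unity.Sentis;
using UnityEngine;

// Turns a flattened BlazePose landmark tensor into Unity-space points.
// Layout assumed here: 33 landmarks x 4 floats (x, y, z, visibility),
// all normalized, with z expressed relative to the hip midpoint. Real
// exports often pack 5 values per landmark and extra auxiliary points,
// so verify the output shape of the model you deploy.
public static class BlazePoseDecoder
{
    public const int LandmarkCount = 33;
    const int ValuesPerLandmark = 4;

    public static Vector3[] Decode(TensorFloat output, float depthScale = 1f)
    {
        output.MakeReadable();
        float[] data = output.ToReadOnlyArray();

        var points = new Vector3[LandmarkCount];
        for (int i = 0; i < LandmarkCount; i++)
        {
            float x = data[i * ValuesPerLandmark + 0]; // normalized image x
            float y = data[i * ValuesPerLandmark + 1]; // normalized image y
            float z = data[i * ValuesPerLandmark + 2]; // relative depth
            // Flip y (image coordinates are top-down, Unity is bottom-up)
            // and scale the relative z into scene units.
            points[i] = new Vector3(x, 1f - y, z * depthScale);
        }
        return points;
    }
}
```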
### BlazePose GHUM Holistic (Advanced)
- Extension of BlazePose with GHUM statistical body model
- Pose Type: Full-body 3D landmarks + hand keypoints (21 per hand)
- Depth Estimation: ✅ Yes (z-axis for all landmarks)
- Availability: Not in Unity Sentis by default; requires conversion to ONNX
### YOLO-NAS Pose
- Strength: High-speed detection (reported up to ~425 FPS on GPU)
- Pose Type: 2D keypoints
- Use Case: Performance-critical applications
- Unity Integration: Requires custom ONNX conversion
### Poseidon (ViTPose Extension)
- Strength: Temporal tracking and multi-frame attention
- Pose Type: 2D, high accuracy in sequences
- Depth Estimation: ❌ No native z-axis support
- Use Case: Research or high-accuracy post-processing
## Depth Estimation Overview

### ✅ Supports Depth
- BlazePose (Full + GHUM): Native z-axis for each keypoint
- BlazePose GHUM Holistic: Enhanced 3D with full body and hands
### ❌ Lacks Native Depth
- BodyPixSentis
- YOLO-NAS Pose
- Poseidon (ViTPose)
## Workarounds for Depth in Unity
- Use stereo camera setups
- Integrate with ARKit/ARCore depth sensors
- Fuse with 3D sensor data (e.g., RealSense, Kinect)
- Apply camera calibration with known skeleton ratios (sketched below)
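The last workaround can be approximated with the pinhole-camera relation Z = f * L_real / L_proj: if the true length of a body segment is known (or assumed), its projected length in pixels yields an estimate of the person's distance from the camera. The helper below is an illustrative sketch, not part of any model's API; the shoulder-width prior is an assumed average.

```csharp
using UnityEngine;

// Approximates distance to a person from a single camera using a known
// (here: assumed) bone length and the pinhole relation
//   Z = focalLengthPx * realLengthMeters / projectedLengthPx.
// Illustrative helper only; the shoulder-width prior is an assumption,
// not calibrated data, and per-person error can be significant.
public static class DepthFromSkeleton
{
    // Assumed average adult shoulder width in meters.
    const float ShoulderWidthMeters = 0.39f;

    public static float EstimateDepthMeters(Vector2 leftShoulderPx,
                                            Vector2 rightShoulderPx,
                                            float focalLengthPx)
    {
        float projectedPx = Vector2.Distance(leftShoulderPx, rightShoulderPx);
        if (projectedPx < 1e-3f)
            return float.PositiveInfinity; // degenerate (shoulders overlap)
        return focalLengthPx * ShoulderWidthMeters / projectedPx;
    }
}
```

For a webcam with a 60° horizontal field of view at 1280 px, the focal length in pixels is roughly 1280 / (2 * tan 30°) ≈ 1109, which can be passed as `focalLengthPx`. Note the estimate degrades as the torso rotates away from the image plane, since the projected shoulder width foreshortens.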
## Comparative Table

| Model/Framework | Pose Type | Depth Support | Unity Sentis Ready | Speed (FPS, approx.) | Landmarks | Use Case |
|---|---|---|---|---|---|---|
| BodyPixSentis | 2D (17 pts) | ❌ | ✅ Yes | ~45 | 17 | 2D segmentation + tracking |
| BlazePose Lite/Full | 3D (33 pts) | ✅ Optional | ✅ Yes | ~60 | 33 | Fitness, AR apps |
| BlazePose GHUM Holistic | Full 3D body | ✅ Yes | ⚠️ Manual conversion | ~15 | 33 + hands | Holistic 3D pose |
| YOLO-NAS Pose | 2D (varies) | ❌ | ⚠️ Conversion needed | ~425 | 17+ | High-speed detection |
| Poseidon (ViTPose Ext.) | 2D (temporal) | ❌ | ⚠️ Research only | ~30 | 17+ | Video sequence tracking |
## Summary

For real-time 3D pose estimation in Unity, BlazePose Full and BlazePose GHUM are your top options. BodyPixSentis remains a solid 2D solution for segmentation and tracking. If depth estimation is a requirement, prioritize BlazePose-based models or extend Unity's capabilities through sensor fusion and depth approximation techniques.