# AI-Assisted Review of Pose Detection Models in Unity

*Pose Detection Models for Unity (Sentis) and Python: Overview and Capabilities*
## Introduction
This reference outlines key pose detection models available for use with Unity Sentis and Python, highlighting their capabilities in real-time tracking, segmentation, and 3D space awareness. It includes a detailed section on BodyPixSentis, along with comparisons to other modern models like MediaPipe BlazePose, YOLO-NAS Pose, and Poseidon (ViTPose Extension).
## Unity Sentis-Compatible Models

### BodyPixSentis
BodyPixSentis is a Unity-compatible adaptation of Google's BodyPix model, integrated through the Sentis framework. It provides real-time person segmentation and 2D pose estimation.
- Model Origin: Google TensorFlow.js
- Architecture: MobileNet or ResNet
- Pose Type: 2D keypoints (17 landmarks)
- Segmentation: Person + part segmentation
- Depth Estimation: ❌ Not supported
- Unity Integration: ✅ Via ONNX; supports real-time webcam input and visualization (see the sketch below)
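Below is a minimal sketch of how a Sentis-imported ONNX pose model can be driven from a webcam. It assumes the Sentis 1.x API (`ModelLoader`, `WorkerFactory`, `TextureConverter`); the 256x256 input size and the output layout are placeholders, not the actual interface of any particular BodyPix export, so match them to the model you import.

```csharp
using Unity.Sentis;
using UnityEngine;

// Minimal webcam-driven inference loop, assuming the Sentis 1.x API.
// The model asset, 256x256 input size, and output layout are placeholders;
// match them to the actual ONNX file you import (BodyPix exports, for
// example, emit heatmap and offset tensors that still need decoding).
public class PoseInference : MonoBehaviour
{
    [SerializeField] ModelAsset modelAsset;  // ONNX model imported into Unity

    IWorker worker;
    WebCamTexture webcam;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
        webcam = new WebCamTexture();
        webcam.Play();
    }

    void Update()
    {
        if (!webcam.didUpdateThisFrame) return;

        // Convert the current camera frame into a normalized input tensor.
        using TensorFloat input = TextureConverter.ToTensor(webcam, 256, 256, 3);
        worker.Execute(input);

        // Pull the first output back to the CPU for decoding.
        var output = worker.PeekOutput() as TensorFloat;
        output.MakeReadable();
        float[] scores = output.ToReadOnlyArray();
        // ... decode keypoints / segmentation masks from `scores` here ...
    }

    void OnDestroy()
    {
        worker?.Dispose();
        if (webcam != null) webcam.Stop();
    }
}
```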
### BlazePose for Unity (Sentis Version)
- Model Origin: Google MediaPipe
- Variants: Lite, Full, Heavy
- Pose Type: 33 landmarks with optional z-axis (depth)
- Strengths: Lightweight and optimized for mobile; supports 3D pose out-of-the-box.
- Unity Integration: ✅ Official models on Hugging Face, ready for Sentis (see the landmark-decoding sketch below)
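The z coordinate comes back as part of the landmark tensor. The sketch below shows one way to turn a flattened output into Unity positions; the 33 x 4 (x, y, z, visibility) layout is an assumption for illustration, since real BlazePose exports often pack five values per landmark plus auxiliary points, so verify the tensor shape of the model you deploy.

```csharp
using Unity.Sentis;
using UnityEngine;

// Turns a flattened BlazePose landmark tensor into Unity-space points.
// Layout assumed here: 33 landmarks x 4 floats (x, y, z, visibility),
// all normalized, with z expressed relative to the hip midpoint. Real
// exports often pack 5 values per landmark and extra auxiliary points,
// so verify the output shape of the model you deploy.
public static class BlazePoseDecoder
{
    public const int LandmarkCount = 33;
    const int ValuesPerLandmark = 4;

    public static Vector3[] Decode(TensorFloat output, float depthScale = 1f)
    {
        output.MakeReadable();
        float[] data = output.ToReadOnlyArray();

        var points = new Vector3[LandmarkCount];
        for (int i = 0; i < LandmarkCount; i++)
        {
            float x = data[i * ValuesPerLandmark + 0]; // normalized image x
            float y = data[i * ValuesPerLandmark + 1]; // normalized image y
            float z = data[i * ValuesPerLandmark + 2]; // relative depth
            // Flip y (image coordinates are top-down, Unity is bottom-up)
            // and scale the relative z into scene units.
            points[i] = new Vector3(x, 1f - y, z * depthScale);
        }
        return points;
    }
}
```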
### BlazePose GHUM Holistic (Advanced)
- Extension of BlazePose with GHUM statistical body model
- Pose Type: Full-body 3D landmarks + hand keypoints (21 per hand)
- Depth Estimation: ✅ Yes (z-axis for all landmarks)
- Availability: Not in Unity Sentis by default; requires conversion to ONNX
### YOLO-NAS Pose
- Strength: High-speed detection (reported up to ~425 FPS on GPU)
- Pose Type: 2D keypoints
- Use Case: Performance-critical applications
- Unity Integration: Requires custom ONNX conversion
### Poseidon (ViTPose Extension)
- Strength: Temporal tracking and multi-frame attention
- Pose Type: 2D, high accuracy in sequences
- Depth Estimation: ❌ No native z-axis support
- Use Case: Research or high-accuracy post-processing
## Depth Estimation Overview

### ✅ Supports Depth
- BlazePose (Full + GHUM): Native z-axis for each keypoint
- BlazePose GHUM Holistic: Enhanced 3D with full body and hands
### ❌ Lacks Native Depth
- BodyPixSentis
- YOLO-NAS Pose
- Poseidon (ViTPose)
## Workarounds for Depth in Unity
- Use stereo camera setups
- Integrate with ARKit/ARCore depth sensors
- Fuse with 3D sensor data (e.g., RealSense, Kinect)
- Apply camera calibration with known skeleton ratios (sketched below)
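The last workaround can be approximated with the pinhole-camera relation Z = f * L_real / L_proj: if the true length of a body segment is known (or assumed), its projected length in pixels yields an estimate of the person's distance from the camera. The helper below is an illustrative sketch, not part of any model's API; the shoulder-width prior is an assumed average.

```csharp
using UnityEngine;

// Approximates distance to a person from a single camera using a known
// (here: assumed) bone length and the pinhole relation
//   Z = focalLengthPx * realLengthMeters / projectedLengthPx.
// Illustrative helper only; the shoulder-width prior is an assumption,
// not calibrated data, and per-person error can be significant.
public static class DepthFromSkeleton
{
    // Assumed average adult shoulder width in meters.
    const float ShoulderWidthMeters = 0.39f;

    public static float EstimateDepthMeters(Vector2 leftShoulderPx,
                                            Vector2 rightShoulderPx,
                                            float focalLengthPx)
    {
        float projectedPx = Vector2.Distance(leftShoulderPx, rightShoulderPx);
        if (projectedPx < 1e-3f)
            return float.PositiveInfinity; // degenerate (shoulders overlap)
        return focalLengthPx * ShoulderWidthMeters / projectedPx;
    }
}
```

For a webcam with a 60° horizontal field of view at 1280 px, the focal length in pixels is roughly 1280 / (2 * tan 30°) ≈ 1109, which can be passed as `focalLengthPx`. Note the estimate degrades as the torso rotates away from the image plane, since the projected shoulder width foreshortens.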
## Comparative Table

| Model/Framework | Pose Type | Depth Support | Unity Sentis Ready | Speed (FPS, approx.) | Landmarks | Use Case |
|---|---|---|---|---|---|---|
| BodyPixSentis | 2D (17 pts) | ❌ | ✅ Yes | ~45 | 17 | 2D segmentation + tracking |
| BlazePose Lite/Full | 3D (33 pts) | ✅ Optional | ✅ Yes | ~60 | 33 | Fitness, AR apps |
| BlazePose GHUM Holistic | Full 3D body | ✅ Yes | ⚠️ Manual conversion | ~15 | 33 + hands | Holistic 3D pose |
| YOLO-NAS Pose | 2D (varies) | ❌ | ⚠️ Conversion needed | ~425 | 17+ | High-speed detection |
| Poseidon (ViTPose Ext.) | 2D (temporal) | ❌ | ⚠️ Research only | ~30 | 17+ | Video sequence tracking |
## Summary

For real-time 3D pose estimation in Unity, BlazePose Full and BlazePose GHUM are your top options. BodyPixSentis remains a solid 2D solution for segmentation and tracking. If depth estimation is a requirement, prioritize BlazePose-based models or extend Unity's capabilities through sensor fusion and depth approximation techniques.