# ONNX segmentation model transfer documentation
This document outlines requirements and best practices for converting your semantic segmentation models into ONNX format for deployment on the Perception Box. The primary goal is to enable consistent inference using ONNX Runtime, independent of Python preprocessing pipelines.
## Core Requirement: Input and Output Format
For compatibility with the Perception Box inference engine, ONNX models must adhere to the following input and output interface specifications:
- RGB Input: A tensor of shape `(H, W, 3)` representing the RGB image, where `H` and `W` are camera-specific dimensions. The expected data type is `uint8`, with values in the range `[0, 255]`.
- Depth Input: A tensor of shape `(H, W)` representing the raw depth image in millimeters, of type `float32`. The format and scale are assumed to match the raw depth output from the specific depth camera used.
- Output: A tensor of shape `(H, W, C)` containing per-pixel semantic class logits, where `C` denotes the number of semantic categories.
ONNX models that conform to this I/O signature can be deployed directly, regardless of the internal architecture or preprocessing logic used prior to export.
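As a quick conformance check, a model with this signature can be run directly in ONNX Runtime. Below is a minimal sketch, assuming the input/output names used in the export example later on this page and a 480x640 camera; the file name and shapes are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Placeholder file name; replace with your exported model.
session = ort.InferenceSession("model.onnx")

# Inputs matching the interface above: uint8 RGB and float32 depth in millimeters.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.zeros((480, 640), dtype=np.float32)

(segmentation,) = session.run(["segmentation"], {"rgb": rgb, "depth": depth})
print(segmentation.shape)  # expected: (480, 640, C)
```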
## `torch.nn.Module` Wrapper

Suggested Style: While any method that meets the I/O spec is allowed, an easily reproducible method we've used is to create a wrapper as a `torch.nn.Module`-derived class. This approach has proven to work for:
- Hugging Face transformer-based segmentation models
- ESANet and its fine-tuned variants
This lets you move all preprocessing inside the model and export it as a single self-contained unit.
## Wrapper Implementation Pattern
1. Define a `torch.nn.Module` class:

   ```python
   class YourONNXWrapper(nn.Module):
       def __init__(self):
           super().__init__()
           ...
   ```

2. Inside `forward()`, convert the inputs:

   - Assume input shapes:
     - `rgb`: `[H, W, 3]`
     - `depth`: `[H, W]`
   - Normalize inputs using pure PyTorch ops:

   ```python
   rgb = rgb.float() / 255.0
   rgb = (rgb - mean) / std
   depth = (depth - depth_mean) / depth_std
   rgb = rgb.permute(2, 0, 1).unsqueeze(0)    # [1, 3, H, W]
   depth = depth.unsqueeze(0).unsqueeze(0)    # [1, 1, H, W]
   ```

3. Run the model and return the softmax output:

   ```python
   logits = self.model(rgb, depth)
   probs = torch.softmax(logits, dim=1)
   return probs.squeeze(0).permute(1, 2, 0).contiguous()  # [H, W, C]
   ```
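Putting these steps together, a minimal sketch of such a wrapper might look like the following. The wrapped `model`, the ImageNet-style RGB statistics, and the depth statistics are placeholders to replace with values appropriate for your network and dataset:

```python
import torch
import torch.nn as nn


class YourONNXWrapper(nn.Module):
    """Illustrative wrapper that folds preprocessing into the exported graph."""

    def __init__(self, model):
        super().__init__()
        self.model = model  # underlying RGB-D segmentation network (placeholder)
        # Register normalization constants as buffers so they are stored in the ONNX file.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]))   # ImageNet-style, example values
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]))    # ImageNet-style, example values
        self.register_buffer("depth_mean", torch.tensor(2000.0))            # dataset-specific, example value
        self.register_buffer("depth_std", torch.tensor(1000.0))             # dataset-specific, example value

    def forward(self, rgb, depth):
        # rgb: [H, W, 3] uint8, depth: [H, W] float32 (millimeters)
        rgb = rgb.float() / 255.0
        rgb = (rgb - self.mean) / self.std
        depth = (depth - self.depth_mean) / self.depth_std
        rgb = rgb.permute(2, 0, 1).unsqueeze(0)      # [1, 3, H, W]
        depth = depth.unsqueeze(0).unsqueeze(0)      # [1, 1, H, W]
        logits = self.model(rgb, depth)              # [1, C, H, W]
        probs = torch.softmax(logits, dim=1)
        return probs.squeeze(0).permute(1, 2, 0).contiguous()  # [H, W, C]
```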
## ONNX Export Rules

Allowed in ONNX:

- Pure PyTorch tensor operations: `+`, `*`, `view`, `permute`, `interpolate`, `softmax`, etc.
- Any `torch.nn.functional` or `torch.nn.Module` ops.
- `self.register_buffer(...)` for storing constants like mean/std.
Not Allowed:

- NumPy or OpenCV (`.numpy()`, `cv2`, etc.)
- Python control flow (`if`, `for`, `try`) involving tensor values.
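For instance, a Python branch on a tensor value is evaluated only once at trace time, so only the path taken for the dummy input is recorded in the graph; the same logic written with tensor operations stays inside the graph. An illustrative sketch with hypothetical helpers, not taken from the repository:

```python
import torch

# Not export-friendly: the Python `if` is evaluated once during tracing,
# so only the branch taken for the dummy input is baked into the ONNX graph.
def normalize_bad(depth):
    if depth.max() > 0:
        depth = depth / depth.max()
    return depth

# Export-friendly equivalent: expressed entirely with tensor operations,
# so the decision remains part of the exported graph.
def normalize_good(depth):
    max_val = depth.max()
    scale = torch.where(max_val > 0, max_val, torch.ones_like(max_val))
    return depth / scale
```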
## Exporting

Use:

```python
torch.onnx.export(
    model,
    (rgb_tensor, depth_tensor),
    "model.onnx",
    input_names=["rgb", "depth"],
    output_names=["segmentation"],
    dynamic_axes={
        "rgb": {0: "height", 1: "width"},
        "depth": {0: "height", 1: "width"},
        "segmentation": {0: "height", 1: "width", 2: "classes"},
    },
    opset_version=12,
)
```
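If the export succeeds, the resulting file can be sanity-checked with the `onnx` Python package before moving it to the Perception Box. A brief sketch, assuming `onnx` is installed and the names from the export call above:

```python
import onnx

onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)

# Confirm the exported interface matches the expected names.
print([i.name for i in onnx_model.graph.input])   # expected: ['rgb', 'depth']
print([o.name for o in onnx_model.graph.output])  # expected: ['segmentation']
```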
However, not all models support `dynamic_axes` during export. For example, ESANet uses internal control flow and hard-coded tensor operations (like `F.interpolate` with `int(tensor.shape[i] * scale)`) that depend on static sizes. These operations result in ONNX symbolic tracing failures or runtime shape mismatches when exported with dynamic input shapes.
If dynamic shape export fails:

- Fix the input to a known shape (e.g., `480x640`)
- Remove the `dynamic_axes` field
- Export with fixed dummy inputs (see the sketch below)
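Under those constraints, a fixed-shape export might look like the following sketch; the wrapper instance, resolution, and output file name are illustrative:

```python
import torch

# Hypothetical wrapper instance from the pattern above; adjust to your model.
wrapper = YourONNXWrapper(model).eval()

# Fixed dummy inputs at the known camera resolution (480x640).
dummy_rgb = torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8)
dummy_depth = torch.rand(480, 640, dtype=torch.float32) * 5000.0  # millimeters

torch.onnx.export(
    wrapper,
    (dummy_rgb, dummy_depth),
    "model_fixed.onnx",
    input_names=["rgb", "depth"],
    output_names=["segmentation"],
    opset_version=12,  # no dynamic_axes: all shapes are baked in at 480x640
)
```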
## Examples
See working examples at:
- `onnx_model_transfer/segformer` – for Hugging Face transformer models.
- `onnx_model_transfer/esanet` – for RGB-D ESANet with raw depth handling.