Franka Total Documentation

ACT

🏋️Training guide

ACT for Franka Recordings:

Create a conda env:

conda create -n franka_teleop python=3.8.10 -y 
conda activate franka_teleop 

pip install torchvision torch pyquaternion pyyaml rospkg pexpect mujoco==2.3.7 dm_control==1.0.14 opencv-python matplotlib einops packaging h5py ipython

Before training:

conda activate franka_teleop
cd detr/ && pip install -e .

To train ACT:

python3 imitate_episodes.py \
--dataset_dir ../franka_recordings/ \
--ckpt_dir outputs/ \
--policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
--num_epochs 2000  --lr 1e-5 \
--seed 0

🙎‍♂️Inference guide

  1. Download the model or make sure it is already on the target device (add it to GELLO/franka_ws/franka_models_testing/).
  2. Carefully configure the FrankaInference node to resolve any mismatches with your setup (see the inference-loop sketch below this list).
  3. Launch the franka_fr3_arm_controllers and run the inference node:
    ros2 launch franka_fr3_arm_controllers franka_fr3_arm_controllers.launch.py robot_ip:=172.16.0.2 arm_id:=fr3 namespace:=prrobo  load_gripper:=true
    ros2 run franka_models_testing franka_inference_node.py
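
For orientation, an ACT-style policy is typically queried once per action chunk and the chunk is then executed step by step. The snippet below is only an illustrative sketch of that loop; policy, get_observation, and send_action are hypothetical placeholders, and the real franka_inference_node.py handles ROS 2 topics, normalization, and the gripper differently.

```python
import torch

CHUNK_SIZE = 100  # matches --chunk_size used at training time

def run_policy(policy, get_observation, send_action, num_steps=1000, device="cuda"):
    """Illustrative chunked-execution loop for an ACT policy.

    `policy`, `get_observation`, and `send_action` are hypothetical
    placeholders for the loaded model, the camera/joint-state reader,
    and the robot command publisher used by the real node.
    """
    policy.eval()
    action_chunk, chunk_idx = None, CHUNK_SIZE  # force a query on the first step
    with torch.inference_mode():
        for _ in range(num_steps):
            if chunk_idx >= CHUNK_SIZE:
                obs = get_observation()            # images + proprioception
                qpos = obs["qpos"].to(device)      # (1, state_dim)
                images = obs["images"].to(device)  # (1, num_cams, 3, H, W)
                action_chunk = policy(qpos, images)  # (1, CHUNK_SIZE, action_dim)
                chunk_idx = 0
            send_action(action_chunk[0, chunk_idx].cpu().numpy())
            chunk_idx += 1
```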

🧑‍🔧Changes Made from Original Codebase:

  1. Modified imitate_episodes.py to:
    1. Integrate Weights & Biases (wandb) for logging all training/validation metrics.
    2. Implement structured checkpoint saving:
      • Best (policy_best.ckpt)
      • Last (policy_last_<run_name>.ckpt)
      • Periodic (policy_epoch_<epoch>.ckpt)
    3. Add support for pretrained weights via --pretrained_ckpt flag.
    4. Auto-saves plots (loss curves) with proper naming and pushes them to wandb (see the checkpointing sketch after this list).
  2. Modified utils_franka.py for dataset support:
    1. Backward-compatible: auto-fills missing gripper data with zeros.
    2. Improved video loading logic to handle .mp4 and .avi formats.
    3. Added robust error handling and detailed logging for dataset issues.
  3. Added change_videos.py:
    1. Converts AV1-encoded videos to H.264 using ffmpeg.
    2. Batch processing support, auto-skips already converted files, and shows tqdm progress (see the conversion sketch after this list).
  4. Updated eval_test.py to:
    1. Generate videos showing GT vs predicted trajectories for each episode.
    2. Output: {episode_name}_trajectories_over_epochs.mp4
  5. New CLI/config options:
    1. --pretrained_ckpt for loading existing model weights.
    2. Auto-detects camera names, number of episodes, and lengths from dataset.
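
For reference, the checkpoint naming and wandb logging described in item 1 follow roughly the pattern sketched below. This is a simplified illustration, not the exact code in imitate_episodes.py; run_name and save_every are placeholders.

```python
import os
import torch
import wandb

def save_checkpoints(policy, epoch, val_loss, best_val_loss, ckpt_dir,
                     run_name="act_run", save_every=100):
    """Sketch of the structured checkpoint scheme: best / last / periodic.

    Assumes wandb.init() was already called in the training script.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    # Last checkpoint is overwritten every epoch.
    torch.save(policy.state_dict(),
               os.path.join(ckpt_dir, f"policy_last_{run_name}.ckpt"))
    # Periodic checkpoint every `save_every` epochs.
    if epoch % save_every == 0:
        torch.save(policy.state_dict(),
                   os.path.join(ckpt_dir, f"policy_epoch_{epoch}.ckpt"))
    # Best checkpoint tracked by validation loss.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(policy.state_dict(),
                   os.path.join(ckpt_dir, "policy_best.ckpt"))
    wandb.log({"epoch": epoch, "val_loss": val_loss})
    return best_val_loss
```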
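
The AV1-to-H.264 conversion in change_videos.py boils down to an ffmpeg re-encode per file. A minimal sketch of the idea, assuming an _h264.mp4 output suffix and an ffprobe codec check (both illustrative choices, not necessarily what the script does):

```python
import subprocess
from pathlib import Path
from tqdm import tqdm

def convert_av1_videos(video_dir):
    """Re-encode AV1 videos to H.264, skipping files already converted."""
    for src in tqdm(sorted(Path(video_dir).glob("*.mp4"))):
        dst = src.with_name(src.stem + "_h264.mp4")
        if dst.exists():
            continue  # auto-skip already converted files
        codec = subprocess.run(
            ["ffprobe", "-v", "error", "-select_streams", "v:0",
             "-show_entries", "stream=codec_name",
             "-of", "default=noprint_wrappers=1:nokey=1", str(src)],
            capture_output=True, text=True).stdout.strip()
        if codec != "av1":
            continue  # only touch AV1-encoded videos
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264", "-c:a", "copy", str(dst)],
            check=True)
```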



Diffusion 3d

🏋️Training guide

You will find a YAML file myenv_full.yaml and a setup_env.sh script in the root dir (https://github.com/jeevesh-perceptyne/Diffusion3d):

conda env create -f myenv_full.yaml #Change the env_name && prefix in yaml if you want
chmod +x setup_env.sh
./setup_env.sh  # Install local editable packages

**NOTE:** This covers only the franka_hardware setup. Refer to INSTALL.md in the same repo for the other required installations (visualizer, sim env).

Now to train:

# On AWS (S3 bucket)
python3 train_modified.py --config-name=dp3.yaml

# On a local workstation/PC
python3 train.py --config-name=dp3.yaml

🙎‍♂️Inference guide

Install everything necessary for training, as described above.

Install required dependencies:

pip install draccus pyrealsense2 franky-control

If your model is stored in an S3 bucket, pass --s3_bucket so it is downloaded from the specified path. If you have it locally, pass None.
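
For context, downloading a checkpoint from S3 is a single object fetch. A minimal boto3 sketch is below; the bucket/key names are placeholders and the repo's actual --s3_bucket handling may differ.

```python
import os
import boto3

def fetch_checkpoint(bucket, key, local_path="checkpoints/latest.ckpt"):
    """Download a model checkpoint from S3 (illustrative; bucket/key are placeholders)."""
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path

# Example (hypothetical bucket/key):
# fetch_checkpoint("my-dp3-bucket", "runs/dp3/latest.ckpt")
```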

  • Run inference_node.py directly on your local machine (not fully developed)

  • To run inference using a remote machine (real-time):

    • Make sure you provide the same port on both sides.

    • Run inference_server.py on the remote machine; it runs model inference on the received observations and returns the predicted actions.

      cd 3D-Diffusion-Policy
      python3 inference_server.py  --config_path diffusion_policy_3d/config/dp3.yaml
    • Run inference_client.py on the local machine that is physically connected to the cameras and robot; it sends the required observations to the remote machine.

      cd 3D-Diffusion-Policy
      python3 inference_client.py --cameras_config_path ../cameras.json --server_ip 192.168.3.20
  • To run inference offline:

    • Collect data in the same way as during training:

      ros2 launch franka_fr3_arm_controllers franka_fr3_arm_controllers.launch.py robot_ip:=172.16.0.2 arm_id:=fr3 namespace:=prrobo  load_gripper:=true
      ros2 launch franka_gello_state_publisher main.launch.py
      ros2 run frankapy_record franka_record_data --ros-args -p resume:=true --ros-args -p reset:=true
    • Convert to PCDs if needed using convert_dataset_to_pcd.py:

      python3 convert_dataset_to_pcd.py --workspace_bounds -0.1 0.8 -0.35 0.3 -0.1 0.8 --dataset_path test_recordings/
    • Run inference_test.py to load the model and forward-pass all the collected data, providing --dataset_path (a single episode):

      cd 3D-Diffusion-Policy
      python3 inference_test.py --config_path diffusion_policy_3d/config/dp3.yaml --latest
      • writes output actions to a .npy file
    • After the model's actions are saved (the step above does this), transfer the .npy file to the PC that is physically connected to the robot and run inference_test_client.py, which smooths the actions and sends them to the robot (see the sketch after this list).

      • Change the actions_path variable to the saved .npy file and run:
      cd 3D-Diffusion-Policy
      python3 inference_test_client.py
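
As a rough picture of this offline replay step (not the exact inference_test_client.py code): the client loads the saved .npy actions, smooths them, and streams them to the robot. The sketch below uses a simple moving-average filter and a hypothetical send_action callback in place of the real franky/ROS 2 command call.

```python
import numpy as np

def smooth_actions(actions, window=5):
    """Moving-average smoothing along the time axis (actions: (T, action_dim))."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(actions[:, d], kernel, mode="same") for d in range(actions.shape[1])],
        axis=1)

def replay(actions_path, send_action, window=5):
    """Load saved actions, smooth them, and stream them via `send_action`.

    `send_action` is a hypothetical callback standing in for the robot
    command call used by the real client script.
    """
    actions = np.load(actions_path)  # (T, action_dim) array written by inference_test.py
    for a in smooth_actions(actions, window):
        send_action(a)
```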

🧑‍🔧Changes Made from original code:

  1. Wrote a custom dataset in 3D-Diffusion-Policy/diffusion_policy_3d/dataset/franka_dataset.py
    1. It takes the dataset directory containing point clouds, plus which point-cloud variant to use, and builds the dataloader in the expected format (see the sketch after this list).
    2. Custom dataset config in 3D-Diffusion-Policy/diffusion_policy_3d/config/franka_custom.yaml
  2. Changed the necessary settings in dp3.yaml
  3. Changed train.py in 3D-Diffusion-Policy/ to:
    1. Save checkpoints directly to the S3 bucket
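
A stripped-down illustration of what such a dataset class looks like is below; the directory layout, key names, and placeholder state/action values are assumptions for the sketch, not the actual franka_dataset.py implementation.

```python
import glob
import os

import numpy as np
import open3d as o3d
import torch
from torch.utils.data import Dataset

class FrankaPCDDataset(Dataset):
    """Minimal sketch: one sample = a point-cloud frame plus robot state/action."""

    def __init__(self, dataset_dir, pcd_subdir="merged_1024", num_points=1024):
        # Assumes one subdirectory per episode, each containing e.g. merged_1024/.
        self.pcd_paths = sorted(glob.glob(os.path.join(dataset_dir, "*", pcd_subdir, "*.pcd")))
        self.num_points = num_points

    def __len__(self):
        return len(self.pcd_paths)

    def __getitem__(self, idx):
        pcd = o3d.io.read_point_cloud(self.pcd_paths[idx])
        points = np.asarray(pcd.points, dtype=np.float32)[: self.num_points]
        # In the real dataset, agent_pos / action come from the recorded episode;
        # zeros are placeholders here.
        return {
            "point_cloud": torch.from_numpy(points),
            "agent_pos": torch.zeros(8),
            "action": torch.zeros(8),
        }
```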

🤹‍♂️Dataset Conversion to PCD:

Summary:

This script’s main class, DatasetConverter, takes as input the dataset path, camera calibration (cameras.json), and workspace bounds. It converts RGB-D frames to 3D point clouds, applies extrinsic transforms (static or dynamic), crops points within the robot workspace, and downsamples using farthest point sampling. Parallel processing options (--max_workers, --batch_size, --sequential) speed up frame, camera, and episode-level operations. CLI flags like --episodes and --workspace_bounds let you customize which data to process and spatial limits.
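
The geometric core of this pipeline is a standard pinhole back-projection followed by an extrinsic transform and a workspace crop. The numpy sketch below shows that math, assuming per-camera intrinsics (fx, fy, cx, cy), a 4x4 camera-to-base extrinsic, and the default workspace bounds; the real DatasetConverter adds per-camera depth scales, farthest point sampling, and parallelism on top of this.

```python
import numpy as np

def depth_to_workspace_points(depth, rgb, fx, fy, cx, cy, extrinsic,
                              depth_scale=1000.0,
                              bounds=(0.3, 1.0, -0.5, 0.5, -0.1, 0.8)):
    """Back-project a depth image (H, W) and rgb (H, W, 3) to 3D, transform to the
    robot base frame, and crop to the workspace."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale             # raw units -> metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_base = (extrinsic @ pts_cam.T).T[:, :3]            # camera frame -> base frame
    colors = rgb.reshape(-1, 3)
    xmin, xmax, ymin, ymax, zmin, zmax = bounds
    mask = ((pts_base[:, 0] > xmin) & (pts_base[:, 0] < xmax) &
            (pts_base[:, 1] > ymin) & (pts_base[:, 1] < ymax) &
            (pts_base[:, 2] > zmin) & (pts_base[:, 2] < zmax) &
            (z.reshape(-1) > 0))
    return pts_base[mask], colors[mask]
```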

Detailed:

  1. Created convert_dataset_to_pcd.py for converting RGB-D dataset to point cloud (.pcd) files:
    1. Supports left, right, and wrist camera views.
    2. Generates point clouds with:
      • Wrist-only (4000 points)
      • Merged (4000 points)
      • Merged (1024 points)
    3. Converts depth images + intrinsics to 3D points (XYZ + RGB), applies extrinsics.
  2. Implemented DatasetConverter class:
    1. Takes dataset dir, cameras.json, target points, and workspace bounds.
    2. Parallel frame & camera processing using ThreadPoolExecutor & ProcessPoolExecutor.
    3. Crops points to robot workspace (default: x[0.3,1.0], y[-0.5,0.5], z[-0.1,0.8]).
  3. Depth → Point Cloud conversion:
    1. Wrist: higher-precision depth (scaled by 1/10000), dynamic transform using the robot EE pose.
    2. Left/right: static extrinsics, depth scaled by 1/1000.
  4. Integrated Farthest Point Sampling:
    1. Uses pytorch3d.sample_farthest_points (falls back to random sampling); see the sketch after this list.
    2. Two FPS levels: 4000 pts (hi-res), 1024 pts (compressed).
  5. Optimized parallelism:
    1. Episodes processed in parallel (multiprocessing).
    2. Per-frame and per-camera threading.
    3. FPS and save ops run concurrently.
  6. Output folder structure per episode:
    • wrist_pcd/
    • merged_4000/
    • merged_1024/
  7. Robust error handling:
    1. Skips bad frames; logs all failures.
    2. Detailed JSON logs for point counts, timings, errors.
  8. CLI Interface:
    • --episodes, --max_workers, --batch_size, --sequential, --workspace_bounds
  9. Dependencies:
    • open3d, opencv, pytorch3d, numpy, scipy, tqdm, concurrent.futures
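
For reference, the farthest-point-sampling step in item 4 reduces to the sketch below (simplified; the converter batches this differently), using pytorch3d's sample_farthest_points with a random fallback when pytorch3d is unavailable.

```python
import numpy as np
import torch

def downsample(points, k=4000):
    """Farthest point sampling via pytorch3d, falling back to random sampling.

    `points` is an (N, 3) numpy array; returns a (k, 3) array.
    """
    try:
        from pytorch3d.ops import sample_farthest_points
        pts = torch.from_numpy(points[None]).float()   # (1, N, 3)
        sampled, _ = sample_farthest_points(pts, K=k)
        return sampled[0].numpy()
    except ImportError:
        idx = np.random.choice(len(points), size=k, replace=len(points) < k)
        return points[idx]
```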