Franka Total Documentation

ACT

🏋️Training guide

ACT for Franka Recordings:

Create a conda env:

conda create -n franka_teleop python=3.8.10 -y 
conda activate franka_teleop 

pip install torchvision torch pyquaternion pyyaml rospkg pexpect mujoco==2.3.7 dm_control==1.0.14 opencv-python matplotlib einops packaging h5py ipython

Before training:

conda activate franka_teleop
cd detr/ && pip install -e .

To train ACT:

python3 imitate_episodes.py \
--dataset_dir ../franka_recordings/ \
--ckpt_dir outputs/ \
--policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
--num_epochs 2000  --lr 1e-5 \
--seed 0

🙎‍♂️Inference guide

  1. Download the model or make sure it is already on the target device (add it to GELLO/franka_ws/franka_models_testing/).
  2. Carefully configure the FrankaInference node to resolve any mismatches with your setup (see the inference-loop sketch below this list).
  3. Launch the franka_fr3_arm_controllers and run the inference node:
    ros2 launch franka_fr3_arm_controllers franka_fr3_arm_controllers.launch.py robot_ip:=172.16.0.2 arm_id:=fr3 namespace:=prrobo  load_gripper:=true
    ros2 run franka_models_testing franka_inference_node.py
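
For orientation, an ACT-style policy is typically queried once per action chunk and the chunk is then executed step by step. The snippet below is only an illustrative sketch of that loop; policy, get_observation, and send_action are hypothetical placeholders, and the real franka_inference_node.py handles ROS 2 topics, normalization, and the gripper differently.

```python
import torch

CHUNK_SIZE = 100  # matches --chunk_size used at training time

def run_policy(policy, get_observation, send_action, num_steps=1000, device="cuda"):
    """Illustrative chunked-execution loop for an ACT policy.

    `policy`, `get_observation`, and `send_action` are hypothetical
    placeholders for the loaded model, the camera/joint-state reader,
    and the robot command publisher used by the real node.
    """
    policy.eval()
    action_chunk, chunk_idx = None, CHUNK_SIZE  # force a query on the first step
    with torch.inference_mode():
        for _ in range(num_steps):
            if chunk_idx >= CHUNK_SIZE:
                obs = get_observation()            # images + proprioception
                qpos = obs["qpos"].to(device)      # (1, state_dim)
                images = obs["images"].to(device)  # (1, num_cams, 3, H, W)
                action_chunk = policy(qpos, images)  # (1, CHUNK_SIZE, action_dim)
                chunk_idx = 0
            send_action(action_chunk[0, chunk_idx].cpu().numpy())
            chunk_idx += 1
```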

🧑‍🔧Changes Made from Original Codebase:

  1. Modified imitate_episodes.py to:
    1. Integrate Weights & Biases (wandb) for logging all training/validation metrics.
    2. Implement structured checkpoint saving:
      • Best (policy_best.ckpt)
      • Last (policy_last_<run_name>.ckpt)
      • Periodic (policy_epoch_<epoch>.ckpt)
    3. Add support for pretrained weights via --pretrained_ckpt flag.
    4. Auto-saves plots (loss curves) with proper naming and pushes them to wandb (see the checkpointing sketch after this list).
  2. Modified utils_franka.py for dataset support:
    1. Backward-compatible: auto-fills missing gripper data with zeros.
    2. Improved video loading logic to handle .mp4 and .avi formats.
    3. Added robust error handling and detailed logging for dataset issues.
  3. Added change_videos.py:
    1. Converts AV1-encoded videos to H.264 using ffmpeg.
    2. Batch processing support, auto-skips already converted files, and shows tqdm progress (see the conversion sketch after this list).
  4. Updated eval_test.py to:
    1. Generate videos showing GT vs predicted trajectories for each episode.
    2. Output: {episode_name}_trajectories_over_epochs.mp4
  5. New CLI/config options:
    1. --pretrained_ckpt for loading existing model weights.
    2. Auto-detects camera names, number of episodes, and lengths from dataset.
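
For reference, the checkpoint naming and wandb logging described in item 1 follow roughly the pattern sketched below. This is a simplified illustration, not the exact code in imitate_episodes.py; run_name and save_every are placeholders.

```python
import os
import torch
import wandb

def save_checkpoints(policy, epoch, val_loss, best_val_loss, ckpt_dir,
                     run_name="act_run", save_every=100):
    """Sketch of the structured checkpoint scheme: best / last / periodic.

    Assumes wandb.init() was already called in the training script.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    # Last checkpoint is overwritten every epoch.
    torch.save(policy.state_dict(),
               os.path.join(ckpt_dir, f"policy_last_{run_name}.ckpt"))
    # Periodic checkpoint every `save_every` epochs.
    if epoch % save_every == 0:
        torch.save(policy.state_dict(),
                   os.path.join(ckpt_dir, f"policy_epoch_{epoch}.ckpt"))
    # Best checkpoint tracked by validation loss.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(policy.state_dict(),
                   os.path.join(ckpt_dir, "policy_best.ckpt"))
    wandb.log({"epoch": epoch, "val_loss": val_loss})
    return best_val_loss
```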
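
The AV1-to-H.264 conversion in change_videos.py boils down to an ffmpeg re-encode per file. A minimal sketch of the idea, assuming an _h264.mp4 output suffix and an ffprobe codec check (both illustrative choices, not necessarily what the script does):

```python
import subprocess
from pathlib import Path
from tqdm import tqdm

def convert_av1_videos(video_dir):
    """Re-encode AV1 videos to H.264, skipping files already converted."""
    for src in tqdm(sorted(Path(video_dir).glob("*.mp4"))):
        dst = src.with_name(src.stem + "_h264.mp4")
        if dst.exists():
            continue  # auto-skip already converted files
        codec = subprocess.run(
            ["ffprobe", "-v", "error", "-select_streams", "v:0",
             "-show_entries", "stream=codec_name",
             "-of", "default=noprint_wrappers=1:nokey=1", str(src)],
            capture_output=True, text=True).stdout.strip()
        if codec != "av1":
            continue  # only touch AV1-encoded videos
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264", "-c:a", "copy", str(dst)],
            check=True)
```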



Diffusion 3d

🏋️Training guide

You will find a YAML file myenv_full.yaml and a setup_env.sh script in the root dir (https://github.com/jeevesh-perceptyne/Diffusion3d):

conda env create -f myenv_full.yaml #Change the env_name && prefix in yaml if you want
chmod +x setup_env.sh
./setup_env.sh  # Install local editable packages

**NOTE:** This covers only the franka_hardware setup. Refer to INSTALL.md in the same repo for the other required installations (visualizer, sim env).

Now to train:

# On AWS (S3 bucket)
python3 train_modified.py --config-name=dp3.yaml

# On a local workstation/PC
python3 train.py --config-name=dp3.yaml

🙎‍♂️Inference guide

Install everything necessary for training, as described above.

Install required dependencies:

pip install draccus pyrealsense2 franky-control

If your model is stored in an S3 bucket, pass --s3_bucket so it is downloaded from the specified path. If you have it locally, pass None.
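
For context, downloading a checkpoint from S3 is a single object fetch. A minimal boto3 sketch is below; the bucket/key names are placeholders and the repo's actual --s3_bucket handling may differ.

```python
import os
import boto3

def fetch_checkpoint(bucket, key, local_path="checkpoints/latest.ckpt"):
    """Download a model checkpoint from S3 (illustrative; bucket/key are placeholders)."""
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path

# Example (hypothetical bucket/key):
# fetch_checkpoint("my-dp3-bucket", "runs/dp3/latest.ckpt")
```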

  • Run inference_node.py directly on your local machine (not fully developed)

  • To run inference using a remote machine (real-time):

    • Make sure you provide the same port on both sides.

    • Run inference_server.py on the remote machine; it runs model inference on the received observations and returns the predicted actions.

      cd 3D-Diffusion-Policy
      python3 inference_server.py  --config_path diffusion_policy_3d/config/dp3.yaml
    • Run inference_client.py on the local machine that is physically connected to the cameras and robot; it sends the required observations to the remote machine.

      cd 3D-Diffusion-Policy
      python3 inference_client.py --cameras_config_path ../cameras.json --server_ip 192.168.3.20
  • To run inference offline:

    • Collect data in the same way as during training:

      ros2 launch franka_fr3_arm_controllers franka_fr3_arm_controllers.launch.py robot_ip:=172.16.0.2 arm_id:=fr3 namespace:=prrobo  load_gripper:=true
      ros2 launch franka_gello_state_publisher main.launch.py
      ros2 run frankapy_record franka_record_data --ros-args -p resume:=true --ros-args -p reset:=true
    • Convert to PCDs if needed using convert_dataset_to_pcd.py:

      python3 convert_dataset_to_pcd.py --workspace_bounds -0.1 0.8 -0.35 0.3 -0.1 0.8 --dataset_path test_recordings/
    • Run inference_test.py to load the model and forward-pass all the collected data, providing --dataset_path (a single episode):

      cd 3D-Diffusion-Policy
      python3 inference_test.py --config_path diffusion_policy_3d/config/dp3.yaml --latest
      • writes output actions to a .npy file
    • After the model's actions are saved (the step above does this), transfer the .npy file to the PC that is physically connected to the robot and run inference_test_client.py, which smooths the actions and sends them to the robot (see the sketch after this list).

      • Change the actions_path variable to the saved .npy file and run:
      cd 3D-Diffusion-Policy
      python3 inference_test_client.py
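
As a rough picture of this offline replay step (not the exact inference_test_client.py code): the client loads the saved .npy actions, smooths them, and streams them to the robot. The sketch below uses a simple moving-average filter and a hypothetical send_action callback in place of the real franky/ROS 2 command call.

```python
import numpy as np

def smooth_actions(actions, window=5):
    """Moving-average smoothing along the time axis (actions: (T, action_dim))."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(actions[:, d], kernel, mode="same") for d in range(actions.shape[1])],
        axis=1)

def replay(actions_path, send_action, window=5):
    """Load saved actions, smooth them, and stream them via `send_action`.

    `send_action` is a hypothetical callback standing in for the robot
    command call used by the real client script.
    """
    actions = np.load(actions_path)  # (T, action_dim) array written by inference_test.py
    for a in smooth_actions(actions, window):
        send_action(a)
```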

🧑‍🔧Changes Made from original code:

  1. Wrote a custom dataset in 3D-Diffusion-Policy/diffusion_policy_3d/dataset/franka_dataset.py
    1. It takes the dataset directory containing point clouds, plus which point-cloud variant to use, and builds the dataloader in the expected format (see the sketch after this list).
    2. Custom dataset config in 3D-Diffusion-Policy/diffusion_policy_3d/config/franka_custom.yaml
  2. Changed the necessary settings in dp3.yaml
  3. Changed train.py in 3D-Diffusion-Policy/ to:
    1. Save checkpoints directly to the S3 bucket
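
A stripped-down illustration of what such a dataset class looks like is below; the directory layout, key names, and placeholder state/action values are assumptions for the sketch, not the actual franka_dataset.py implementation.

```python
import glob
import os

import numpy as np
import open3d as o3d
import torch
from torch.utils.data import Dataset

class FrankaPCDDataset(Dataset):
    """Minimal sketch: one sample = a point-cloud frame plus robot state/action."""

    def __init__(self, dataset_dir, pcd_subdir="merged_1024", num_points=1024):
        # Assumes one subdirectory per episode, each containing e.g. merged_1024/.
        self.pcd_paths = sorted(glob.glob(os.path.join(dataset_dir, "*", pcd_subdir, "*.pcd")))
        self.num_points = num_points

    def __len__(self):
        return len(self.pcd_paths)

    def __getitem__(self, idx):
        pcd = o3d.io.read_point_cloud(self.pcd_paths[idx])
        points = np.asarray(pcd.points, dtype=np.float32)[: self.num_points]
        # In the real dataset, agent_pos / action come from the recorded episode;
        # zeros are placeholders here.
        return {
            "point_cloud": torch.from_numpy(points),
            "agent_pos": torch.zeros(8),
            "action": torch.zeros(8),
        }
```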

🤹‍♂️Dataset Conversion to PCD:

Summary:

This script’s main class, DatasetConverter, takes as input the dataset path, camera calibration (cameras.json), and workspace bounds. It converts RGB-D frames to 3D point clouds, applies extrinsic transforms (static or dynamic), crops points within the robot workspace, and downsamples using farthest point sampling. Parallel processing options (--max_workers, --batch_size, --sequential) speed up frame, camera, and episode-level operations. CLI flags like --episodes and --workspace_bounds let you customize which data to process and spatial limits.
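
The geometric core of this pipeline is a standard pinhole back-projection followed by an extrinsic transform and a workspace crop. The numpy sketch below shows that math, assuming per-camera intrinsics (fx, fy, cx, cy), a 4x4 camera-to-base extrinsic, and the default workspace bounds; the real DatasetConverter adds per-camera depth scales, farthest point sampling, and parallelism on top of this.

```python
import numpy as np

def depth_to_workspace_points(depth, rgb, fx, fy, cx, cy, extrinsic,
                              depth_scale=1000.0,
                              bounds=(0.3, 1.0, -0.5, 0.5, -0.1, 0.8)):
    """Back-project a depth image (H, W) and rgb (H, W, 3) to 3D, transform to the
    robot base frame, and crop to the workspace."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale             # raw units -> metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_base = (extrinsic @ pts_cam.T).T[:, :3]            # camera frame -> base frame
    colors = rgb.reshape(-1, 3)
    xmin, xmax, ymin, ymax, zmin, zmax = bounds
    mask = ((pts_base[:, 0] > xmin) & (pts_base[:, 0] < xmax) &
            (pts_base[:, 1] > ymin) & (pts_base[:, 1] < ymax) &
            (pts_base[:, 2] > zmin) & (pts_base[:, 2] < zmax) &
            (z.reshape(-1) > 0))
    return pts_base[mask], colors[mask]
```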

Detailed:

  1. Created convert_dataset_to_pcd.py for converting RGB-D dataset to point cloud (.pcd) files:
    1. Supports left, right, and wrist camera views.
    2. Generates point clouds with:
      • Wrist-only (4000 points)
      • Merged (4000 points)
      • Merged (1024 points)
    3. Converts depth images + intrinsics to 3D points (XYZ + RGB), applies extrinsics.
  2. Implemented DatasetConverter class:
    1. Takes dataset dir, cameras.json, target points, and workspace bounds.
    2. Parallel frame & camera processing using ThreadPoolExecutor & ProcessPoolExecutor.
    3. Crops points to robot workspace (default: x[0.3,1.0], y[-0.5,0.5], z[-0.1,0.8]).
  3. Depth → Point Cloud conversion:
    1. Wrist: higher-precision depth (scaled by 1/10000), dynamic transform using the robot EE pose.
    2. Left/right: static extrinsics, depth scaled by 1/1000.
  4. Integrated Farthest Point Sampling:
    1. Uses pytorch3d.sample_farthest_points (falls back to random sampling); see the sketch after this list.
    2. Two FPS levels: 4000 pts (hi-res), 1024 pts (compressed).
  5. Optimized parallelism:
    1. Episodes processed in parallel (multiprocessing).
    2. Per-frame and per-camera threading.
    3. FPS and save ops run concurrently.
  6. Output folder structure per episode:
    • wrist_pcd/
    • merged_4000/
    • merged_1024/
  7. Robust error handling:
    1. Skips bad frames; logs all failures.
    2. Detailed JSON logs for point counts, timings, errors.
  8. CLI Interface:
    • --episodes, --max_workers, --batch_size, --sequential, --workspace_bounds
  9. Dependencies:
    • open3d, opencv, pytorch3d, numpy, scipy, tqdm, concurrent.futures
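
For reference, the farthest-point-sampling step in item 4 reduces to the sketch below (simplified; the converter batches this differently), using pytorch3d's sample_farthest_points with a random fallback when pytorch3d is unavailable.

```python
import numpy as np
import torch

def downsample(points, k=4000):
    """Farthest point sampling via pytorch3d, falling back to random sampling.

    `points` is an (N, 3) numpy array; returns a (k, 3) array.
    """
    try:
        from pytorch3d.ops import sample_farthest_points
        pts = torch.from_numpy(points[None]).float()   # (1, N, 3)
        sampled, _ = sample_farthest_points(pts, K=k)
        return sampled[0].numpy()
    except ImportError:
        idx = np.random.choice(len(points), size=k, replace=len(points) < k)
        return points[idx]
```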