# Franka Total Documentation
## ACT for Franka Recordings
Create a conda env:

```bash
conda create -n franka_teleop python=3.8.10 -y
conda activate franka_teleop
pip install torchvision torch pyquaternion pyyaml rospkg pexpect mujoco==2.3.7 dm_control==1.0.14 opencv-python matplotlib einops packaging h5py ipython
```
Before training:

```bash
conda activate franka_teleop
cd detr/ && pip install -e .
```
To train ACT:

```bash
python3 imitate_episodes.py \
    --dataset_dir ../franka_recordings/ \
    --ckpt_dir outputs/ \
    --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
    --num_epochs 2000 --lr 1e-5 \
    --seed 0
```
To run the trained ACT policy on the robot:

- Download the model / make sure it is on the target device (add it to `GELLO/franka_ws/franka_models_testing/`).
- Carefully configure the `FrankaInference` node for any mismatches.
- Launch the `franka_fr3_controllers` and also run `franka_inference`:

```bash
ros2 launch franka_fr3_arm_controllers franka_fr3_arm_controllers.launch.py robot_ip:=172.16.0.2 arm_id:=fr3 namespace:=prrobo load_gripper:=true
ros2 run franka_models_testing franka_inference_node.py
```
- Modified `imitate_episodes.py` to (see the sketch after this item):
  - Integrate Weights & Biases (wandb) for logging all training/validation metrics.
  - Implement structured checkpoint saving:
    - Best (`policy_best.ckpt`)
    - Last (`policy_last_<run_name>.ckpt`)
    - Periodic (`policy_epoch_<epoch>.ckpt`)
  - Add support for pretrained weights via the `--pretrained_ckpt` flag.
  - Auto-save plots (loss curves) with proper naming & push them to wandb.
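A minimal sketch of that checkpoint / wandb scheme. The names `run_name`, `save_every`, `validate`, and the `policy(batch)` loss call are placeholders for illustration, not the actual code in `imitate_episodes.py`:

```python
import os
import torch
import wandb

def train_loop(policy, optimizer, train_loader, val_loader,
               ckpt_dir, run_name, num_epochs, save_every=100, validate=None):
    """Illustrative loop: log metrics to wandb and save best / last / periodic checkpoints."""
    best_val_loss = float("inf")
    for epoch in range(num_epochs):
        policy.train()
        for batch in train_loader:
            loss = policy(batch)                      # placeholder forward/loss call
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            wandb.log({"train/loss": loss.item(), "epoch": epoch})

        val_loss = validate(policy, val_loader)       # placeholder validation function
        wandb.log({"val/loss": val_loss, "epoch": epoch})

        if val_loss < best_val_loss:                  # best checkpoint
            best_val_loss = val_loss
            torch.save(policy.state_dict(), os.path.join(ckpt_dir, "policy_best.ckpt"))
        torch.save(policy.state_dict(),               # last checkpoint, overwritten every epoch
                   os.path.join(ckpt_dir, f"policy_last_{run_name}.ckpt"))
        if epoch % save_every == 0:                   # periodic checkpoint
            torch.save(policy.state_dict(),
                       os.path.join(ckpt_dir, f"policy_epoch_{epoch}.ckpt"))
```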
- Modified `utils_franka.py` for dataset support (see the sketch after this item):
  - Backward-compatible: auto-fills missing gripper data with zeros.
  - Improved video loading logic to handle both `.mp4` and `.avi` formats.
  - Added robust error handling and detailed logging for dataset issues.
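Roughly, those two behaviours look like this. The function names and file layout are assumptions for illustration, not the actual code in `utils_franka.py`:

```python
import os
import cv2
import numpy as np

def load_episode_video(episode_dir, cam_name):
    """Try .mp4 first, then .avi; return frames as a (T, H, W, 3) uint8 array."""
    for ext in (".mp4", ".avi"):
        path = os.path.join(episode_dir, f"{cam_name}{ext}")
        if os.path.exists(path):
            cap = cv2.VideoCapture(path)
            frames = []
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                frames.append(frame)
            cap.release()
            if frames:
                return np.stack(frames)
            raise RuntimeError(f"Video {path} opened but contained no frames")
    raise FileNotFoundError(f"No .mp4/.avi found for camera '{cam_name}' in {episode_dir}")

def pad_missing_gripper(qpos, gripper=None):
    """Backward compatibility: if gripper data is absent, append a zero column."""
    if gripper is None:
        gripper = np.zeros((qpos.shape[0], 1), dtype=qpos.dtype)
    return np.concatenate([qpos, gripper], axis=1)
```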
- Added `change_videos.py` (see the sketch after this item):
  - Converts AV1-encoded videos to H.264 using `ffmpeg`.
  - Batch processing support, auto-skips already-converted files, tqdm for progress.
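A minimal sketch of what `change_videos.py` does, assuming an `ffmpeg` binary on PATH and a `_h264` suffix for converted files (the actual script's naming and encoder flags may differ):

```python
import subprocess
from pathlib import Path
from tqdm import tqdm

def convert_av1_to_h264(src_dir, suffix="_h264.mp4"):
    """Re-encode every .mp4 under src_dir to H.264, skipping already-converted files."""
    videos = [p for p in Path(src_dir).rglob("*.mp4") if not p.stem.endswith("_h264")]
    for src in tqdm(videos, desc="Re-encoding"):
        dst = src.with_name(src.stem + suffix)
        if dst.exists():                               # auto-skip converted files
            continue
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264",
             "-preset", "fast", "-crf", "18", str(dst)],
            check=True, capture_output=True,
        )

if __name__ == "__main__":
    convert_av1_to_h264("../franka_recordings")
```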
- Updated `eval_test.py` to (see the sketch after this item):
  - Generate videos showing GT vs predicted trajectories for each episode.
  - Output: `{episode_name}_trajectories_over_epochs.mp4`
  - New CLI/config options:
    - `--pretrained_ckpt` for loading existing model weights.
    - Auto-detects camera names, number of episodes, and episode lengths from the dataset.
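A rough sketch of how such a GT-vs-prediction video can be rendered. The data layout, the `imageio` writer, and plotting only joint 0 are assumptions; the plotting in `eval_test.py` differs in detail:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import imageio

def trajectory_video(episode_name, gt, preds_per_epoch, out_dir="."):
    """gt: (T, D) ground-truth actions; preds_per_epoch: {epoch: (T, D) predictions}.
    One video frame per checkpoint epoch; only joint 0 is plotted for brevity."""
    frames = []
    for epoch, pred in sorted(preds_per_epoch.items()):
        fig, ax = plt.subplots(figsize=(6, 4))
        ax.plot(gt[:, 0], label="GT")
        ax.plot(pred[:, 0], label=f"pred @ epoch {epoch}")
        ax.set_xlabel("timestep")
        ax.set_ylabel("joint 0")
        ax.legend()
        fig.canvas.draw()
        frames.append(np.asarray(fig.canvas.buffer_rgba())[..., :3])
        plt.close(fig)
    # writing .mp4 requires the imageio-ffmpeg backend
    imageio.mimsave(f"{out_dir}/{episode_name}_trajectories_over_epochs.mp4", frames, fps=2)
```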
## 3D Diffusion Policy (DP3)

You will find `myenv_full.yaml` and `setup_env.sh` in the root dir of https://github.com/jeevesh-perceptyne/Diffusion3d:

```bash
conda env create -f myenv_full.yaml   # change the env name & prefix in the yaml if you want
chmod +x setup_env.sh
./setup_env.sh                        # install local editable packages
```
**NOTE:** This covers only the franka_hardware setup. Refer to `INSTALL.md` in the same repo for the other required installations (visualizer, sim env).
Now to train:

```bash
# On AWS (S3 bucket)
python3 train_modified.py --config-name=dp3.yaml

# On a workstation/PC
python3 train.py --config-name=dp3.yaml
```
For inference:

Install everything necessary as in the training setup above, plus the required dependencies:

```bash
pip install draccus pyrealsense2 franky-control
```

If your model is on an S3 bucket, pass `--s3_bucket` to download it from the specified path (a sketch follows below). If you have it locally, pass `None`.
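A hedged sketch of that model-loading logic using `boto3`; the argument names and the `/tmp` destination are illustrative, not the actual inference code:

```python
import argparse
import boto3
import torch

def load_checkpoint(ckpt_path, s3_bucket=None):
    """If s3_bucket is given, download ckpt_path (the S3 key) first;
    otherwise treat ckpt_path as a local file."""
    if s3_bucket is not None:
        local_path = "/tmp/" + ckpt_path.split("/")[-1]
        boto3.client("s3").download_file(s3_bucket, ckpt_path, local_path)
        ckpt_path = local_path
    return torch.load(ckpt_path, map_location="cpu")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt_path", required=True)
    parser.add_argument("--s3_bucket", default=None)
    args = parser.parse_args()
    state = load_checkpoint(args.ckpt_path, args.s3_bucket)
```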
- Run `inference_node.py` directly on your local machine (not fully developed).
- To run inference using a remote machine (real-time), do the following (a sketch of the obs/action exchange follows this list):
  - Make sure you pass the correct (same) port on both sides.
  - Run `inference_server.py` on the remote machine, where the model inference runs on the received obs and the action is returned:

    ```bash
    cd 3D-Diffusion-Policy
    python3 inference_server.py --config_path diffusion_policy_3d/config/dp3.yaml
    ```

  - Run `inference_client.py` on the local machine that is physically connected to the cameras and robot. It sends the required obs to the remote machine:

    ```bash
    cd 3D-Diffusion-Policy
    python3 inference_client.py --cameras_config_path ../cameras.json --server_ip 192.168.3.20
    ```
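The exact transport between `inference_client.py` and `inference_server.py` is not documented here; below is a minimal sketch assuming a plain TCP socket with pickled, length-prefixed payloads. The port number, message format, and `policy.predict_action` call are assumptions:

```python
import pickle
import socket
import struct

def send_msg(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(struct.pack(">I", len(data)) + data)   # 4-byte length prefix

def recv_msg(sock):
    (length,) = struct.unpack(">I", sock.recv(4))
    buf = b""
    while len(buf) < length:
        buf += sock.recv(length - len(buf))
    return pickle.loads(buf)

# server side (remote GPU machine)
def serve(policy, port=5555):
    srv = socket.socket()
    srv.bind(("0.0.0.0", port))
    srv.listen(1)
    conn, _ = srv.accept()
    while True:
        obs = recv_msg(conn)                    # point cloud + robot state from the client
        action = policy.predict_action(obs)     # placeholder for the DP3 policy call
        send_msg(conn, action)

# client side (PC wired to the cameras and robot)
def query(obs, server_ip="192.168.3.20", port=5555):
    with socket.create_connection((server_ip, port)) as sock:
        send_msg(sock, obs)
        return recv_msg(sock)
```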
- To run inference offline:
  - Collect data in the same way as during training:

    ```bash
    ros2 launch franka_fr3_arm_controllers franka_fr3_arm_controllers.launch.py robot_ip:=172.16.0.2 arm_id:=fr3 namespace:=prrobo load_gripper:=true
    ros2 launch franka_gello_state_publisher main.launch.py
    ros2 run frankapy_record franka_record_data --ros-args -p resume:=true --ros-args -p reset:=true
    ```

  - Convert to PCDs if needed using `convert_dataset_to_pcd.py`:

    ```bash
    python3 convert_dataset_to_pcd.py --workspace_bounds -0.1 0.8 -0.35 0.3 -0.1 0.8 --dataset_path test_recordings/
    ```

  - Run `inference_test.py` to load the model and forward-pass all the collected data, providing `--dataset_path` (single episode). It writes the output actions to a `.npy` file:

    ```bash
    cd 3D-Diffusion-Policy
    python3 inference_test.py --config_path diffusion_policy_3d/config/dp3.yaml --latest
    ```

  - After the model's actions are saved (the step above does this), transfer them to the PC physically connected to the robot and run `inference_test_client.py`, which smooths the actions and sends them to the robot. Change the `actions_path` variable to the saved `.npy` file and run (a smoothing sketch follows this list):

    ```bash
    cd 3D-Diffusion-Policy
    python3 inference_test_client.py
    ```
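A minimal sketch of the load-smooth-send step, assuming a simple moving-average filter; the smoothing used by `inference_test_client.py` and the way actions are actually streamed to the robot may differ:

```python
import numpy as np

actions_path = "outputs/episode_0_actions.npy"   # set to the .npy saved by inference_test.py

def smooth_actions(actions, window=5):
    """Moving-average smoothing over the time axis, applied per action dimension."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(actions[:, d], kernel, mode="same") for d in range(actions.shape[1])],
        axis=1,
    )

actions = np.load(actions_path)        # (T, action_dim) actions predicted by the model
smoothed = smooth_actions(actions)
for a in smoothed:
    pass  # stream `a` to the robot here (e.g. via franky or the ROS 2 controller interface)
```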
- Written a custom dataset in `3D-Diffusion-Policy/diffusion_policy_3d/dataset/franka_dataset.py` (see the sketch after this list):
  - It takes the dataset directory with point clouds and which point clouds to consider, and creates a dataloader as expected.
- Custom dataset config in `3D-Diffusion-Policy/diffusion_policy_3d/config/franka_custom.yaml`.
- Changed the necessary things in `dp3.yaml`.
- Changed `train.py` in `3D-Diffusion-Policy/` to:
  - Save the models effectively to an S3 bucket.
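An illustrative sketch of such a point-cloud dataset. The `.pcd` file naming, the `robot_state.npy` layout, and the returned keys are assumptions, not the actual `franka_dataset.py`:

```python
import os
import numpy as np
import open3d as o3d
import torch
from torch.utils.data import Dataset

class FrankaPCDDataset(Dataset):
    """Loads pre-computed point clouds (e.g. merged_1024/) plus robot state and actions."""
    def __init__(self, dataset_dir, pcd_type="merged_1024", horizon=16):
        self.pcd_type, self.horizon, self.samples = pcd_type, horizon, []
        for ep in sorted(os.listdir(dataset_dir)):
            ep_dir = os.path.join(dataset_dir, ep)
            n_frames = len(os.listdir(os.path.join(ep_dir, pcd_type)))
            self.samples += [(ep_dir, t) for t in range(n_frames - horizon)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        ep_dir, t = self.samples[idx]
        pcd = o3d.io.read_point_cloud(os.path.join(ep_dir, self.pcd_type, f"{t:06d}.pcd"))
        points = np.concatenate([np.asarray(pcd.points), np.asarray(pcd.colors)], axis=1)
        states = np.load(os.path.join(ep_dir, "robot_state.npy"))   # (T, state_dim), assumed
        return {
            "point_cloud": torch.from_numpy(points).float(),         # (N, 6) XYZRGB
            "agent_pos": torch.from_numpy(states[t]).float(),
            "action": torch.from_numpy(states[t:t + self.horizon]).float(),
        }
```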
This script's main class, `DatasetConverter`, takes as input the dataset path, camera calibration (`cameras.json`), and workspace bounds. It converts RGB-D frames to 3D point clouds, applies extrinsic transforms (static or dynamic), crops points to the robot workspace, and downsamples using farthest point sampling. Parallel processing options (`--max_workers`, `--batch_size`, `--sequential`) speed up frame-, camera-, and episode-level operations. CLI flags like `--episodes` and `--workspace_bounds` let you customize which data to process and the spatial limits.
- Created `convert_dataset_to_pcd.py` for converting the RGB-D dataset to point cloud (`.pcd`) files:
  - Supports left, right, and wrist camera views.
  - Generates point clouds with:
    - Wrist-only (4000 points)
    - Merged (4000 points)
    - Merged (1024 points)
  - Converts depth images + intrinsics to 3D points (XYZ + RGB) and applies extrinsics (a sketch of this conversion follows this list).
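The core depth back-projection and workspace crop look roughly like this; the function names and the XYZRGB column layout are illustrative, not the script's actual code:

```python
import numpy as np

def depth_to_points(depth, rgb, K, T_cam_to_base, depth_scale=1000.0):
    """Back-project a depth image into XYZRGB points in the robot base frame.
    K: 3x3 intrinsics; T_cam_to_base: 4x4 extrinsic; depth / depth_scale = metres."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    valid = pts_cam[:, 2] > 0                              # drop zero-depth pixels
    pts_base = (pts_cam[valid] @ T_cam_to_base.T)[:, :3]   # transform to base frame
    colors = rgb.reshape(-1, 3)[valid] / 255.0
    return np.concatenate([pts_base, colors], axis=1)      # (N, 6) XYZRGB

def crop_workspace(points, bounds):
    """bounds = (xmin, xmax, ymin, ymax, zmin, zmax); keep points inside the box."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = ((x > bounds[0]) & (x < bounds[1]) &
            (y > bounds[2]) & (y < bounds[3]) &
            (z > bounds[4]) & (z < bounds[5]))
    return points[mask]
```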
- Implemented the `DatasetConverter` class:
  - Takes the dataset dir, `cameras.json`, target point counts, and workspace bounds.
  - Parallel frame & camera processing using `ThreadPoolExecutor` & `ProcessPoolExecutor`.
  - Crops points to the robot workspace (default: x [0.3, 1.0], y [-0.5, 0.5], z [-0.1, 0.8]).
- Depth → point cloud conversion:
  - Wrist: high-precision depth (scaled by /10000), dynamic transform using the robot EE pose.
  - Left/Right: static extrinsics, depth scaled by /1000.
- Integrated farthest point sampling (see the sketch after this item):
  - Uses `pytorch3d.ops.sample_farthest_points` (falls back to random sampling).
  - Two FPS levels: 4000 pts (hi-res), 1024 pts (compressed).
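A sketch of the FPS-with-fallback downsampling; the wrapper name and the two-stage calls at the end are illustrative:

```python
import numpy as np
import torch

def farthest_point_sample(points, n_points):
    """Downsample (N, 6) XYZRGB points to n_points with pytorch3d FPS,
    falling back to random sampling if pytorch3d is unavailable."""
    try:
        from pytorch3d.ops import sample_farthest_points
        xyz = torch.from_numpy(points[None, :, :3]).float()       # (1, N, 3)
        _, idx = sample_farthest_points(xyz, K=n_points)
        return points[idx[0].numpy()]
    except ImportError:
        idx = np.random.choice(len(points), n_points, replace=len(points) < n_points)
        return points[idx]

# pcd_4000 = farthest_point_sample(merged_points, 4000)   # hi-res
# pcd_1024 = farthest_point_sample(pcd_4000, 1024)        # compressed
```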
- Optimized parallelism:
  - Episodes processed in parallel (multiprocessing).
  - Per-frame and per-camera threading.
  - FPS and save ops run concurrently.
- Output folder structure per episode: `wrist_pcd/`, `merged_4000/`, `merged_1024/`
- Robust error handling:
  - Skips bad frames; logs all failures.
  - Detailed JSON logs for point counts, timings, and errors.
- CLI interface: `--episodes`, `--max_workers`, `--batch_size`, `--sequential`, `--workspace_bounds`
- Dependencies: `open3d`, `opencv`, `pytorch3d`, `numpy`, `scipy`, `tqdm`, `concurrent.futures`