white_paper - RicoJia/notes GitHub Wiki
There is no right or wrong here. It's OK if we don't have the same level of achievement as others; just do what you enjoy doing, man. Some people work in restaurants, some just want to chill. I like building robots. Nav and computer vision are the goals of the current phase. Building robots is also very simple: you run your programs on the robot and see what comes out. So relax, none of this is really high-stakes.
Beyond work, we are living a life now, and a life should have balance. Have a bi-focus for the week: each day, work on two things alternately. Focus for 8 hours, no more. This period has two difficulties: "aloneness" (we need to find a team environment) and "parallelism" (we need to balance time between finding work and projects).
Along the way, you might be distracted by various things: people's new job postings, new job opportunities that attract you... But don't forget WHO YOU ARE and what you are interested in.
- Mumble Rover: a ROS 2 rover with 3D SLAM, 3D localization, and 3D navigation. This should run in a simulator first, then on a real robot (with at least a new microSD card, or even an SSD), RPi 5 (Oct. 2023).
- ROS 2 Notes
- ROS 2 is a high priority:
- Create an image for navigation.
- Gazebo or Isaac Sim?
- Create an image + byobu
- Physical Robot:
- Test program TODO (HIGH)
- Between the RPi and the rover IMU
- Between the Nvidia Nano and the IMU
- Burning question: is the IMU on the Waveshare board good enough? IMU calibration (a minimal bias-check sketch follows this list).
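A quick sanity check for that burning question, assuming a stationary log saved as `imu_still.csv` with columns `[ax, ay, az, gx, gy, gz]` (the file name and layout are my assumptions); a proper calibration would use Allan variance, this only answers "is it obviously bad?":

```python
# Hypothetical sketch: estimate IMU bias/noise from a stationary log.
import numpy as np

def static_imu_stats(samples: np.ndarray):
    bias = samples.mean(axis=0)      # stationary mean ~= sensor bias
    noise_std = samples.std(axis=0)  # spread ~= white-noise level
    return bias, noise_std

samples = np.loadtxt("imu_still.csv", delimiter=",")  # ~30 s of data
bias, noise = static_imu_stats(samples)
bias[2] -= 9.81  # z-accel reads gravity while sitting level; remove it
print("bias:", bias, "\nnoise std:", noise)
```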
- The SLAM class: 2D SLAM, 3D SLAM
- Coursera Study: All the way to transformer (Nov 26 - Jan 2, D)
- The transformer project (training has taken 397 h; needs 580 h, i.e. 24 days, to reach 0.054). 6 more hours, then release the weights.
- Deep Learning Hands-On: train MobileNet V2, DeepLab V3+, ViT, YOLO, SSD.
- Train on Pascal VOC for single-label classification (D)
- Train on the COCO dataset for multi-label classification (D)
- Assemble and train the MobileNet V2 framework (On-hold, Can unblock on Jan 3 - Jan 6, 8h input)
- Try a ViT model (2-work-week project; can be on hold until the SLAM class finishes IMU, integration, and 2D SLAM)
- Try the existing ViT model and weights on object detection. (5h)
- Try an existing backbone while putting together a transformer (25h)
- Train another backbone (e.g., VGG) (optional, 5h + 20h training hours)
- Udacity Machine Learning
On workdays, 2 h for personal growth (8 am - 10 am); on weekends, 5-8 h of work. 15 h a week is a safe bet; 20 h is a good goal.
Friday:
- Grocery shopping (1.5 h)
Saturday:
- Houston Robotics events (1:00 pm - 5:00 pm)
- Cooking (1.5 h)
Sunday will be an off day for me.
- Find a Chinese immigrant community. Is there a group chat?
- Start a setup.sh for setting up a new laptop (D)
- Find a soccer team
- Document progress, and questions along the way (D)
- Need to find a co-working space, get to know some like-minded people (D)
- Lidar, new base
- Google what each of the items below does:
- TensorFlow Serving / TorchServe: Serving models in production.
- TensorBoard: For monitoring training processes and visualizing metrics.
- Seaborn: For creating detailed plots and charts.
- DVC (Data Version Control): Managing datasets and model versions.
- Experience with transfer learning, fine-tuning models.
- Olah's LSTM article (0.5 day)
- Student t-distribution (0.5 day)
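For reference while writing this up, the density with ν degrees of freedom (a standard result):

```latex
f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}
            {\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}
       \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
% As \nu \to \infty this converges to the standard normal pdf;
% small \nu gives the heavy tails that make it robust to outliers.
```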
- Write about t-SNE
- Look at Autograd and how it works: https://pytorch.org/docs/stable/notes/autograd.html (1 day)
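A tiny warm-up to poke at while reading the autograd notes (the example itself is mine, not from the docs):

```python
# Build a two-node graph, backprop, and inspect the recorded grad_fns.
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # autograd records the Pow and Sum ops
y.backward()         # reverse-mode pass over the recorded graph
print(x.grad)        # tensor([4., 6.]) == dy/dx = 2x

z = x * 2
print(z.grad_fn)     # <MulBackward0 ...>: the graph edge back to x
```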
- Set up the deblurring network. (1 week)
- Rerun the 3D model with the deblurring network. (Week 2)
- Add to resume (3 days):
- Add PyTorch and deep learning to the resume
- Add a section to the personal website to showcase your projects
- Try SAM (1 week)
- Try RGBD SLAM / ORB-SLAM, the official one. (4 days)
- Try FiftyOne for SAM 2
- More: face recognition
- Gradient clipping
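A minimal sketch of where clipping sits in a PyTorch training step (`model`, `loss_fn`, `batch` are placeholders, not from these notes):

```python
import torch

def train_step(model, loss_fn, optimizer, batch, max_norm=1.0):
    optimizer.zero_grad()
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Rescale all gradients so their global L2 norm is at most max_norm;
    # this guards RNN/transformer training against exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```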
- TODO: Try backward prop for layer norm. See Karpathy's tutorial.
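A sketch to work through, checked against autograd; the compact formula follows the same pattern Karpathy derives for batch norm (variable names are mine):

```python
# Manual layer-norm backward (gamma=1, beta=0), verified against autograd.
import torch

N, D, eps = 4, 8, 1e-5
x = torch.randn(N, D, dtype=torch.float64, requires_grad=True)
g = torch.randn(N, D, dtype=torch.float64)   # upstream grad (dout)

mu = x.mean(-1, keepdim=True)
var = x.var(-1, unbiased=False, keepdim=True)
inv_std = (var + eps).rsqrt()
xhat = (x - mu) * inv_std                    # forward pass

xhat.backward(g)                             # autograd reference

# dL/dx = inv_std/D * (D*g - sum(g) - xhat * sum(g*xhat))
dx = inv_std / D * (D * g
                    - g.sum(-1, keepdim=True)
                    - xhat * (g * xhat).sum(-1, keepdim=True))
print(torch.allclose(dx, x.grad))            # True
```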
Week 3: try 3D SLAM with handwritten math; try 3D SLAM with a stereo camera and build a 3D model of the place.
Week 4: test the motors on the robot so we can drive it with odom; build a costmap of the place from the 3D model, then use SBPL with TEB as the local planner. (This should give you a quick result on the robot side of things.)
Then: develop an online local 3D model on the robot? (Other methods, like ORB-SLAM2.) Playing around with Isaac Sim would be nice. Then explore ROS 2 navigation: ROS Navigation, OpenVSLAM.
For the second iteration of deep learning, we should try implementing LeNet-5, AlexNet, VGG-16, etc.
- Edge detection (a minimal OpenCV sketch for these items follows this list)
- Smoothing
- Gradient Calculation
- Optical Flow
- OpenCV DNN
- Object Tracking
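The OpenCV warm-up referenced above, covering smoothing, edges, and gradients (`image.png` is a placeholder path):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder input
blur = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)     # smoothing
edges = cv2.Canny(blur, 50, 150)                     # hysteresis thresholds
gx = cv2.Sobel(blur, cv2.CV_32F, 1, 0, ksize=3)      # gradient in x
gy = cv2.Sobel(blur, cv2.CV_32F, 0, 1, ksize=3)      # gradient in y
cv2.imwrite("edges.png", edges)
```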
- In-depth scenario questions: ask how you would act, like "flight simulator" training
- Object Detection (唐宇迪)
- Mask RCNN: https://edu.51cto.com/course/20420.html
- Facebook Impl: https://github.com/facebookresearch/maskrcnn-benchmark
- YOLOv1
- YOLOv2
- YOLOv3
- YOLOv4
- YOLOv5
- EfficientNet
- EfficientDet
- DETR
- Deformable DETR
- FCOS
- YOLOv6
- YOLOv7
- YOLOv8
- Semantic Segmentation
- UNet
- U2Net?
- DeepLab V1, V2, V3
- Mask2former V1, V2
- SSD?
- Object tracking, instance segmentation, semantic change detection
- Solid understanding of machine learning training/deployment pipelines and their implementation.
- Deploying models with TensorRT and ONNX (Serve, Scale AI)
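A hedged sketch of the ONNX half of that pipeline: export a torchvision model, then hand it to TensorRT's `trtexec` (the model choice and tensor names are mine):

```python
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
dummy = torch.randn(1, 3, 224, 224)   # example input fixes the shapes
torch.onnx.export(
    model, dummy, "mobilenet_v2.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    opset_version=17,
)
# Then e.g.:  trtexec --onnx=mobilenet_v2.onnx --saveEngine=model.plan
```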
- Multi-sensor feature extraction and fusion, object detection and tracking, 3D estimation, and embodied AI with Transformer-based models. (Serve)
- Traversability prediction.
- Writing and maintaining automated continuous integration tests (D)
- Dice, focal, and Tversky losses?
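A minimal soft Dice loss for binary segmentation as a starting point; Tversky generalizes it by weighting false positives and false negatives separately, while focal loss instead re-weights cross-entropy toward hard pixels:

```python
import torch

def dice_loss(logits, target, smooth=1.0):
    """logits, target: (N, H, W); target in {0, 1}."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2))
    denom = prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2 * inter + smooth) / (denom + smooth)  # soft Dice per image
    return 1 - dice.mean()
```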
- Mixed-precision training: https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/
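The skeleton the linked post builds up to, on a toy model (needs a CUDA device; the model and data here are stand-ins):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10).cuda()              # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):                            # toy loop, random batches
    x = torch.randn(32, 128, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)            # runs fp16 where safe
    scaler.scale(loss).backward()              # scale to avoid underflow
    scaler.step(optimizer)                     # unscales; skips on inf/nan
    scaler.update()
```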
- Visual Transformer
- NLP theory: Tokenization and Embeddings:
- Know how images are represented as pixel grids and how they can be transformed into sequences of patches.
- Patch Embeddings: understand how images are divided into patches and how each patch is embedded into a vector to serve as input tokens (see the sketch after this list).
- Reading path:
- "Attention is All You Need"
- "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", 2020
- NLP applications
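The patch-embedding trick, as commonly implemented (e.g., in timm): a single strided conv is equivalent to "cut into 16x16 patches, flatten, and project":

```python
import torch
import torch.nn as nn

img_size, patch, dim = 224, 16, 768
proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

x = torch.randn(1, 3, img_size, img_size)
tokens = proj(x)                            # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 tokens
print(tokens.shape)
```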
Papers to Read (NLP concepts like tokenization, ...)
- Shallow layer Conv
- GRU
- LSTM (hard, vanishing gradient theory)
- RNN/LSTM: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
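A minimal `nn.LSTM` call to keep shapes straight while reading (sizes are arbitrary); the gating is what keeps the cell-state gradient path alive, which is the vanishing-gradient fix the theory item above refers to:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2,
               batch_first=True)
x = torch.randn(32, 10, 64)     # 32 sequences, 10 steps, 64 features
out, (h_n, c_n) = lstm(x)
print(out.shape)   # (32, 10, 128): hidden state at every time step
print(h_n.shape)   # (2, 32, 128): last step's hidden state, per layer
```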
- Above and beyond: GANs and VAEs for tasks like image synthesis.
- You're familiar with simulators such as Omniverse, OpenAI Gym, MuJoCo, Unity, or other video game environments.
- Design, train and deploy learning-based perception models for on-robot perception systems. Perception models should be able to do multi-modal learning capturing different semantics such as segmentation, object detection, scene understanding and tracking.
- Build a foundational model for vision, language, and action that can exhibit good reasoning and maneuvering capability. Understand transformer-based ML architectures really well.
- Deep Lab Implementation
- Create docker image (0.5h)
- Use rwthik?
- See if you are on the Orin Nano (not sure how to differentiate?). 1. Rename the Dockerfile
- code
- Coco Data loader
- Deep Lab v3 custom implementation
- Finish the DeepLab vids. Organize them.
- Might need to do resnet first
- Watch radeontop and nvidia-smi
- Create your own folder, share trained weights on google.
- RGBD SLAM: being able to see RGBD SLAM integration with handwritten optimization.