# white_paper
There is no right or wrong about what to do. It's OK if we don't have the same level of achievement as others; just do what you enjoy doing, man. Some people work in restaurants, some just want to chill. I like building robots. Navigation and computer vision are the goals of the current phase. Building robots is also very simple: you run your programs on the robot and see what comes out. So relax; nothing will really be impacted.
When we talk about work: we are living a life now, and a life should have balance. Have a bi-focus for the week; each day, work on two things alternately. Focus for 8 hours, no more. In this period we have two difficulties: "aloneness" (we need to find a team environment) and "parallelism" (we need to balance time between finding work and projects).
Along the way, you might be distracted by various things: people's new job postings, new job opportunities that attract you... But don't forget WHO YOU ARE - what you are interested in.
- Find a Chinese immigrant community. Is there a group chat?
- Start a setup.sh for setting up a new laptop (D)
- Find a soccer team
- Document progress, and questions along the way (D)
- Need to find a co-working space, get to know some like-minded people.
- Voice commands on the computer?
Learn CNN on Coursera (week 1):

- Day 1 (Sept 15): Set up Nvidia Jetson Nano (D)
- Day 2: CNN basics (D)
  - Quiz
  - Jupyter Notebook from the ground up
  - Notes from before (convolutional neural network)
- Day 3: CNN classic models (Week 2 stuff)
  - Review ResNet, Inception network
  - Jupyter Notebook for ResNet
  - Mini-batches
- Day 4:
  - Review MobileNet
  - Quiz
  - Jupyter Notebook 2
- Day 5-7:
  - Set up Docker
  - Try training a ResNet from scratch on Nvidia Orin
  - Try training GoogLeNet?
- Day 10 (Sept 25): CNN object detection algos
  - Half of videos, notes
  - YOLO v1 paper reading
- Day 11 (Sept 26): CNN object detection
  - Half of videos, notes
  - Quiz
  - ResNet-from-scratch Jupyter Notebook: put together the model (see the residual-block sketch after this plan)
- Day 12 (Sept 27):
  - ResNet-from-scratch Jupyter Notebook, training part 2; data loading, try on Orin
  - Half of videos, notes: UNet (D)
  - Quiz (D)
- Day 13, 14: transpose conv, what it means (D); quiz (D); example of transpose with padding (D); logistic unit (D); receptive field (D); FCN (D)
- Day 15: Train a model for pixel-wise image segmentation, so I can filter out people in images (week 1)
  - Set up SageMaker (IP)
  - Try CARLA self-driving car dataset (D)
  - UNet homework (D)
- Day 16 (Oct 1): ResNet training analysis: data loading, verification on training loss, re-training
  - YOLO homework
  - Smaller topics: op determinism, UNet enhancement
- Day 17 (Oct 2, YOLO): YOLO homework
- Day 18 (Oct 3): Training result verification
  - Check the dataset. Make sure it has class_dir -> image structure.
  - Load data, see the images (0.5h)
  - Put the images in batches, then feed them through the model
- Day 19 (Oct 4):
  - Separate Dataset and DataLoader article (D)
  - Face recognition vids (0.15 * 4 = 1h)
  - Train a model for pixel-wise image segmentation, so I can filter out people in images (week 1): understand the model, determine what data to use
- Day 20 (Oct 6): Face recognition vids; UNet model building, data selection
- Day 21 - Day 22: UNet model training. Add TensorBoard; run_lab, have an option for Jupyter turned off; create a .py version of the notebook (D); pattern vids (D)
- Day 23:
  - Notes about neural transfer learning (D)
  - Quiz (D)
- Day 24 (Oct 10, Thursday):
  - UNet modifications
  - Programming Assignment 1
  - Notes for 4 structured learning vids
- Day 25 (Oct 10, Thursday):
  - Programming Assignment 1
  - UNet notes
  - Read Conv Learning Paper
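For the ResNet-from-scratch notebooks above, a minimal residual-block sketch (assuming PyTorch, which the later notes use; names and sizes are illustrative, not the notebook's actual code):

```python
# Minimal sketch of a basic ResNet residual block (illustrative, not the notebook's code).
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection so the skip connection matches shape when stride/channels change
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # skip connection

# quick shape check
x = torch.randn(2, 64, 56, 56)
print(BasicBlock(64, 128, stride=2)(x).shape)  # torch.Size([2, 128, 28, 28])
```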
- UNet Finish up (1 week, D)
- RNN Back prop (3 days, D)
- 3 weeks for RNN Coursera
- Train DeepLab (2 weeks)
- Buy and set up new computer, Lidar, new base
- Google what each item below does (see the TensorBoard sketch after this list):
  - TensorFlow Serving / TorchServe: Serving models in production.
  - TensorBoard: For monitoring training processes and visualizing metrics.
  - Seaborn: For creating detailed plots and charts.
  - Plotly: For interactive visualizations.
  - DVC (Data Version Control): Managing datasets and model versions.
  - Experience with transfer learning, fine-tuning models.
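Quick hedged sketch of the TensorBoard item: logging scalars from a training loop with torch.utils.tensorboard (the log directory and loss values are made up):

```python
# Log a few scalars and view them in TensorBoard.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")      # "runs/demo" is an arbitrary path
for step, loss in enumerate([1.0, 0.7, 0.5, 0.4]):
    writer.add_scalar("train/loss", loss, step)  # tag, value, global step
writer.close()
# Then inspect with: tensorboard --logdir runs
```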
- Impl batch norm, dropout. Classification setup (see the sketch below)
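A minimal sketch of batch norm + dropout in a small classification setup (PyTorch; layer sizes are arbitrary assumptions):

```python
# Small classifier showing where BatchNorm and Dropout typically sit.
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalize activations per batch
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero half the activations during training
    nn.Linear(256, 10),    # 10 output classes
)
# Remember classifier.train() vs classifier.eval(): both BatchNorm and Dropout
# behave differently at inference time.
```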
- Olah's LSTM article (0.5 day)
- Student t distribution (0.5 day)
- Write about t-SNE (see the formulas below)
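Key formulas behind the two items above (standard definitions, written down here as a reminder):

```latex
% Student-t density with \nu degrees of freedom
f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
            {\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
       \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}

% t-SNE low-dimensional similarities use a Student-t with \nu = 1 (Cauchy kernel),
% which has heavier tails than a Gaussian:
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```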
- Look at Autograd, how it works: https://pytorch.org/docs/stable/notes/autograd.html (1 day). See the tiny example below.
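Tiny autograd sanity check to go with the linked notes (illustrative only):

```python
# Build a small graph and check the gradient autograd computes.
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # walks the autograd graph
print(x.grad)        # tensor([4., 6.]) -> dy/dx = 2x
```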
- Set up deblurring network (1 week)
  - Rerun 3D model with the deblurring network (week 2)
- Add to resume (3 days):
  - Add PyTorch, deep learning to resume
  - Add a section on personal website to showcase your projects
- Try SAM (1 week); see the point-prompt sketch below
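A minimal sketch of prompting SAM with one point, following the facebookresearch/segment-anything README; the checkpoint filename, image path, and click coordinates are placeholders:

```python
# Point-prompted segmentation with SAM (segment-anything package).
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # placeholder path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("room.jpg"), cv2.COLOR_BGR2RGB)       # placeholder image
predictor.set_image(image)                        # computes the image embedding once
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),          # one foreground click (x, y)
    point_labels=np.array([1]),
    multimask_output=True,                        # returns candidate masks
)
print(masks.shape, scores)
```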
- Try RGBD SLAM / ORB-SLAM, the official one (4 days)
- Try FiftyOne for SAM 2
- More: face recognition
- Gradient clipping (see the sketch below)
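Sketch of gradient clipping before the optimizer step (the model, data, and max_norm here are placeholder choices):

```python
# Clip the global gradient norm between backward() and step().
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, target = torch.randn(4, 10), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), target)

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # rescale if ||g|| > 1
optimizer.step()
```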
Playing around with Isaac Sim would be nice. Try 3D SLAM with handwritten math (week 3). Try 3D SLAM with a stereo camera and build a 3D model of the place (currently it's shitty).
- Try DeepLab V3+ network, make a wrapper for it. ✅
- Try SAM (1 week, 🔭)
- Edge detection (see the OpenCV sketch after this list)
- Smoothing
- Gradient Calculation
- Optical Flow
- OpenCV DNN
- Object Tracking
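A quick OpenCV sketch covering the smoothing / gradient / edge-detection items above (the image path is a placeholder):

```python
# Smoothing, Sobel gradients, and Canny edges with OpenCV.
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)       # placeholder image
blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)        # smoothing
gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0, ksize=3)         # horizontal gradient
gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1, ksize=3)         # vertical gradient
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)  # edge map
cv2.imwrite("edges.png", edges)
```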
- In-depth scenarios that ask how you would act, like "flight simulator" training
- Object Detection (唐宇迪)
- Mask RCNN: https://edu.51cto.com/course/20420.html
- Facebook Impl: https://github.com/facebookresearch/maskrcnn-benchmark
- YOLOV1
- YOLOV2
- YOLOV3
- YOLOV4
- YOLOV5
- EfficientNet
- EfficientDet
- DETR
- Deformable DETR
- FCOS
- YOLOV6
- YOLOV7
- YOLOV8
- Semantic Segmentation
- UNet
- U2Net?
- DeepLab V1, V2, V3
- Mask2former V1, V2
- SSD?
- Object Tracking, Instance Segmentation, Semantic Change Detection
- Solid understanding of machine learning training/deployment pipelines and their implementation.
- Deploying models with TensorRT and ONNX (Serve, Scale AI); see the export sketch below
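Hedged sketch of the ONNX half of that item: export a torchvision model to ONNX; the TensorRT conversion (e.g. with trtexec) would consume the resulting file and isn't shown:

```python
# Export a model to ONNX as a first step toward TensorRT deployment.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)               # example input shape
torch.onnx.export(model, dummy, "resnet18.onnx",
                  input_names=["input"], output_names=["logits"],
                  dynamic_axes={"input": {0: "batch"}})  # allow variable batch size
```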
- Multi-sensor feature extraction and fusion, object detection and tracking, 3D estimation, and embodied AI with Transformer-based models. (Serve)
- Traversability prediction.
- Writing and maintaining automated continuous integration tests (D)
- Dice, focal, and Tversky losses? (see the Dice-loss sketch below)
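A minimal soft Dice loss sketch for binary segmentation (my own notation, not a specific library's implementation); focal and Tversky losses generalize the same overlap idea:

```python
# Soft Dice loss for binary masks.
import torch

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """logits, target: (N, 1, H, W); target is 0/1."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (union + eps)   # per-sample soft Dice coefficient
    return 1 - dice.mean()
```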
- Mixed precision training: https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/ (see the AMP sketch below)
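Mixed-precision training sketch with torch.cuda.amp, following the idea in the linked post (the model, optimizer, and data are placeholders):

```python
# One AMP training step: autocast for the forward pass, GradScaler for the backward pass.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()              # scales the loss to avoid fp16 underflow

x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast():                   # mixed-precision forward pass
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```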
- Vision Transformer (ViT)
- NLP theory: Tokenization and Embeddings:
- Know how images are represented as pixel grids and how they can be transformed into sequences of patches.
- Patch Embeddings: Understand how images are divided into patches and how each patch is embedded into a vector to serve as input tokens (see the patch-embedding sketch after this list).
- Reading path:
- "Attention is All You Need"
- "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", 2020
- NLP applications
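Patch-embedding sketch for the ViT items above: a Conv2d with kernel = stride = patch size splits the image into non-overlapping patches and projects each one to an embedding vector (sizes are ViT-Base defaults, used illustratively):

```python
# Turn a 224x224 image into a sequence of 196 patch tokens of dimension 768.
import torch
import torch.nn as nn

patch_size, embed_dim = 16, 768
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

img = torch.randn(1, 3, 224, 224)                 # one RGB image
tokens = patch_embed(img)                         # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)        # (1, 196, 768): 196 patch tokens
print(tokens.shape)
```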
Papers to read (NLP concepts like tokenization, ...)
- Shallow layer Conv
- GRU
- LSTM (hard, vanishing gradient theory); see the gate equations after this list
- LSTM: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
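Standard LSTM gate equations (as in Olah's article); the additive, gated path through the cell state is the usual explanation of why LSTMs mitigate vanishing gradients:

```latex
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \qquad
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \qquad
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)

\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad
h_t = o_t \odot \tanh(c_t)
```

Because c_t is updated additively (gated by f_t) rather than repeatedly squashed through a nonlinearity, gradients can flow across many time steps.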
Above and beyond: GANs, VAEs for tasks like image synthesis.
- You're familiar with simulators such as Omniverse, OpenAI Gym, MuJoCo, Unity, or other video game environments.
- Design, train and deploy learning-based perception models for on-robot perception systems. Perception models should be able to do multi-modal learning capturing different semantics such as segmentation, object detection, scene understanding and tracking.
- Build foundation models for vision, language, and action that can exhibit good reasoning and maneuvering capability. Understand transformer-based ML architectures really well.
- Deep Lab Implementation
- Create docker image (0.5h)
- Use rwthik?
- See if you are on the Orin Nano. Not sure how to differentiate? Option 1: rename the Dockerfile.
- code
- Coco Data loader
- DeepLab v3 custom implementation (see the torchvision baseline sketch at the end of this list)
- Finish the deep lab vids. Organize
- Might need to do resnet first
- Watch radeontop and nvidia-smi
- Create your own folder, share trained weights on google.
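A hedged baseline for the DeepLab items above: torchvision's DeepLabV3 plus a COCO-style loader. Paths are placeholders, and the custom implementation in this list would eventually replace the torchvision model:

```python
# Baseline DeepLabV3 from torchvision plus a COCO-style dataset.
import torch
import torchvision
from torchvision.datasets import CocoDetection

model = torchvision.models.segmentation.deeplabv3_resnet50(weights=None, num_classes=21)
model.eval()

dataset = CocoDetection(root="coco/val2017",                                   # placeholder paths
                        annFile="coco/annotations/instances_val2017.json",
                        transform=torchvision.transforms.ToTensor())

img, _ = dataset[0]
with torch.no_grad():
    out = model(img.unsqueeze(0))["out"]          # (1, num_classes, H, W) logits
print(out.shape)
```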