AI ML - chanandrew96/MyLearning GitHub Wiki

Download Models and Datasets

50 free Machine Learning Datasets: Image Datasets

Orange Data Mining

Kaggle

HuggingFace

OCR related

[Part 1] Evaluating Offline Handwritten Text Recognition: Which Machine Learning Model is the Winner?

TrOCR

LayoutLM / LayoutLMv2

LayoutLMv3

Hugging Face - LayoutLMV3

CRAFT-pytorch

For text detection

LASER

For tokenizer

CDLA

CDLA: A Chinese document layout analysis (CDLA) dataset
Chinese Labeled text data

PaddleOCR

[OCR Basics] Part 1: An Introduction to Deep-Learning-Based OCR and PaddleOCR

HarvestText

HarvestText : A Toolkit for Text Mining and Preprocessing
GitHub - HarvestText
Automatically Recognizing Entities and Aliases with HarvestText for Entity Linking Analysis

NLP

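[59-page PDF] A Complete Guide to Basic Natural Language Processing (NLP) Concepts (Free Download)
Transformers and Hugging Face: Hands-On NLP Application Development (series)
Day 6: A First Look at the Hugging Face Dataset Library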

nlpaug

GitHub - nlpaug

Image Generation

Unconditional Image Generation - HuggingFace

Microsoft DALL‑E

Chatbot / Question Answering Model

Question Answering Model based on SQuAD

NLP on Flask web

Step by Step: How to Develop an NLP Machine Learning Model and Deploy It on a Flask Web Platform (translated)

Neural Network

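Implementing a Neural Network in Python (with Complete Code)
Neural Networks in 15 Minutes: A Plain-Language Introduction
The Most Detailed Explanation of Recurrent Neural Networks (RNN/LSTM/GRU)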

CNN

Day 09: Applying Classic CNN Models

Data Annotation / Labeling

Doccano

Label Studio

LabelImg

LabelImg Image Annotation Tool Tutorial: Building Datasets for Deep Learning

Text Detection

clovaai/deep-text-recognition-benchmark

clovaai/CRAFT-pytorch

Donut

Hugging Face - Donut
GitHub - Donut

Visual NLP + Donut Model

Question Answering in Visual NLP: A Picture is Worth a Thousand Answers

Document Question Answering Model

Data Set

CUDA

SQuAD

DocVQA

arxiv - DocVQA

LongNet

arxiv - LongNet
arxiv PDF - LongNet
Microsoft’s LongNet Scales Transformer to One Billion Tokens
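[Luotuo Paper Reading] Microsoft Releases LongNet with 1B-Token Length; LongEval for Evaluating Long-Conversation Models; ToolQA for Evaluating Tool-Using Models; a Roundup of 12 Papers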
GitHub - LongNet

DocQuery

Hugging Face Space - DocQuery: Document Query Engine
GitHub - DocQuery

CRF Model

PaLM (Deep Learning)

arxiv - PaLM
GitHub - PaLM
PaLM API & MakerSuite: an approachable way to start prototyping and building generative AI applications

LangChain (Question Answering Models)

GitHub - LangChain
4 Ways to Do Question Answering in LangChain
The easiest way to work with large language models?
tutorials-LangChain QA
LangChain: A Chinese-Language Getting Started Tutorial
LangChain - Quickstart
LangChain - Chains

Pipeline

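What Exactly Is a Pipeline?
Machine Learning Tips: What Is a Pipeline?
Introduction to Using Pipeline in Sklearn (Simplifying Python Machine Learning Code with Pipelines)
Transformers from Zero to Mastery Tutorial: Pipeline
Study Notes on the HuggingFace Transformers Library (Hands-On with pipeline + a Walkthrough of the Official README)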

Transformer

Simple Transformer

QA

Using Transformers Pretrained Models: Extractive Question Answering
Question Answering Task Examples with the Transformers Library

Question Answering

Qanary

Qanary Handbook: How to Build a QA Pipeline

txtai

Training a QA Model

FiD

NLP in Practice: FiD, a Knowledge Graph Question Answering Model

BERT

Extractive Summarization of Chinese Articles Using bert-extractive-summarizer
Text Classification with BERT Without Model Training

KeyBERT

GitHub - KeyBERT

spaCy

How Search Engines Retrieve Results: An Introduction to Information Extraction with Python and spaCy

Serverless & Text Extraction

Serverless in Practice: How to Combine NLP for Text Summarization and Keyword Extraction?

Text Classification

DataScience_ArtificialIntelligence_Utils

OpenMMLab

OpenMMLab
GitHub - OpenMMLab

Pickle - Python

Python Object Serialization

GraphQL

GraphQL
What Is GraphQL, and What Are Its Pros and Cons? - Red Hat

Vector DB

What is a Vector Database & How Does it Work? Use Cases + Examples
The Top 5 Vector Databases
A First Look at Vector DBs and a Weaviate DB Tutorial

Gemini

Google Gemini: What We Know So Far
Google's New AI Model "Gemini" Is Coming! Reportedly Five Times the Computing Power of GPT-4; Can It Beat OpenAI?

scikit-learn

What is scikit-learn and use cases of scikit-learn?
Who is using scikit-learn?
SciKit - Examples
Installing scikit-learn

TensorFlow

The Sequential model
The Functional API
TensorFlow Core
TensorFlow 2 quickstart for beginners
Basic classification: Classify images of clothing
Introduction to Tensors
[Hands-On Series #2] Become a Self-Taught Botanist with TensorFlow
Installing Keras and TensorFlow (CPU), Installing PyTorch (CPU and GPU), and Using Virtual Environments in Jupyter
The Difference Between The Augmentor Library and TensorFlow’s flow_from_directory

Keras

Introduction to Keras for engineers
Keras Tutorial For Beginners | Keras For Deep Learning | Deep Learning Tutorial | Simplilearn

Gensim (NLP)

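Getting Started with Gensim in 15 Minutes
Study Notes on the Official Gensim Documentation

Python Audio Preprocessing

Data Preprocessing of Audio Files with Python
Audio Preprocessing (A Summary of Data Augmentation Methods)
Introduction to the CocosDenshion Audio Engine in Cocos2d-x and Preprocessing of Audio Files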

Fine-tuning

ChatGPT Application Development (2): Fine-Tuning an Enterprise-Specific Model

BART

Multi-Document Summarization with BART

AutoML

Google's Three Major Machine Learning Services: Introduction to and Comparison of AutoML, Cloud ML Engine, and ML API

word2vec

CSDN - word2vec
From Word2Vec to BERT: The Latest Survey Paper on Contextual Embedding.pdf

DALI

NVIDIA DALI from Getting Started to Giving Up, Part 3: Data Loading

Text-To-Image

The New World of Generative AI | Entering the Text-to-Image Field
A Generation Platform That Produces AI Images in One Minute (StableStudio)

AutoGPT

Self-Aware AI: AutoGPT | Dewu Tech

LLMs

The New World of Generative AI | An Overview of Large Language Models (LLMs)

kwprocessor

GitHub - kwprocessor

Feature Engineering

What Exactly Is Feature Engineering?

Kaggle

Getting Started with Kaggle: This One Article Is All You Need
[Data Analysis & Machine Learning] Lecture 1.3: Introduction to Kaggle

PyTorch

Learn PyTorch for deep learning in a day. Literally.
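PyTorch: Basic Introduction and Tutorial
[Deep Learning Theory] A Thorough Guide to Tensors, autograd, Backpropagation, and Computation Graphs in PyTorch
What Is PyTorch: A Complete Guide
Introduction to PyTorch
Deep Learning Beginner's Village: Getting Started with PyTorch
Deep Learning by Hand - Chapter 1 - PyTorch Beginner Tutorial - Basic Concepts and Linear Regression Revisited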

Deep Learning (DL)

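Deep Learning: Keras vs TensorFlow vs PyTorch, Three Deep Learning Frameworks
Deep Learning for Natural Language Processing with Pytorch
Deep Learning by Hand - Chapter 0 - Linear Regression
Deep Learning by Hand - Chapter 0 - Introduction to Differentiation Concepts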

ImageNet

Download ImageNet
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

NumPy

NumPy: Get the number of dimensions, shape, and size of ndarray

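SIFT Features

SIFT Feature Points
An Analysis of SIFT Feature Extraction
SIFT Features Explained in Detail
SIFT - Scale-Invariant Feature Transform
The SIFT Operator: A Full Walkthrough of Feature Point Extraction, Description, and Matching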

PCA-SIFT

PCA-SIFT

SIFT vs SURF

What are SIFT and SURF?

LIME

LIME: A Local Interpretability Method for Complex Models
An Overview of the Model-Agnostic Interpretability Methods LIME and SHAP

Model Explainability

A Look Into Global, Cohort and Local Model Explainability
Local Model-Agnostic Methods

Machine Learning (ML) Model

How to Build ML Models in a Local Machine and Google Colab at the Same Time?

Softmax

The Softmax Function Explained in Detail

Learning Materials

🔥Data Science Full Course 2022 | Data Science | Data Science For Beginners | Simplilearn
How Deep Neural Networks Work - Full Course for Beginners
w3school - Machine Learning
w3school CN - AI: The Most Common Machine Learning Algorithms
巨樹磐石
Algorithms for Beginners | Understanding Common Algorithms Through Time Complexity