Cleanlab: Automatically find and fix errors in your ML datasets.
Label Studio: Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
Roboflow: Billed as the world's largest collection of open-source computer vision datasets and APIs.
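As a rough illustration of the idea behind Cleanlab (confident learning), the sketch below flags examples whose given label disagrees with the class the model is confident about. It is a simplified, hypothetical reimplementation, not Cleanlab's API: `per_class_thresholds` and `find_suspect_labels` are illustrative names, each class threshold is assumed to be the mean predicted probability of that class over the examples carrying that label, and the predicted probabilities are assumed to be out-of-sample (for example, from cross-validation), as Cleanlab itself expects.

```python
from typing import List, Sequence


def per_class_thresholds(labels: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> List[float]:
    """Threshold t_j = mean predicted probability of class j over examples labeled j."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    # Classes with no labeled examples get threshold 1.0 (assumption) so they rarely fire.
    return [sums[j] / counts[j] if counts[j] else 1.0 for j in range(n_classes)]


def find_suspect_labels(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices whose confidently predicted class differs from the given label."""
    n_classes = len(probs[0])
    t = per_class_thresholds(labels, probs, n_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose probability clears their own per-class threshold.
        confident = [j for j in range(n_classes) if p[j] >= t[j]]
        if not confident:
            continue  # model is not confident about any class; skip
        best = max(confident, key=lambda j: p[j])
        if best != y:
            suspects.append(i)
    return suspects


# Toy data (made up): example 2 is labeled 0 but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]
print(find_suspect_labels(labels, probs))  # [2]
```

Using per-class thresholds rather than a single global cutoff keeps classes that the model is generally less confident about from being flagged wholesale.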
Libraries
PyCaret: PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool designed to shorten the experiment cycle.
Gradio: Gradio is an open-source Python library for building machine learning and data science demos and web applications. It makes it quick to demo and deploy models, with automatically generated shareable links.
Composer: Composer is a library for training neural networks better, faster, and cheaper. Note on num_workers (the number of data-loading worker processes): a common rule of thumb is the number of CPU cores on the machine divided by the number of GPUs.
DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
FairScale: FairScale is a PyTorch extension library for high-performance and large-scale training.
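The num_workers rule of thumb from the Composer entry is simple integer arithmetic. A minimal sketch, assuming one training process per GPU; `suggested_num_workers` is a hypothetical helper, and both the CPU-only fallback and the floor of one worker are assumptions not covered by the note.

```python
import os
from typing import Optional


def suggested_num_workers(num_gpus: int, cpu_cores: Optional[int] = None) -> int:
    """CPU cores divided by GPUs, i.e. each per-GPU process's share of the cores."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if num_gpus <= 0:
        return cpu_cores  # CPU-only run (assumption, not covered by the note)
    return max(1, cpu_cores // num_gpus)  # keep at least one worker (assumption)


print(suggested_num_workers(num_gpus=8, cpu_cores=32))  # 4
```

In a PyTorch setup, the result would be passed as the `num_workers` argument of each process's `torch.utils.data.DataLoader`.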
Frameworks
Ludwig: Ludwig is a declarative machine learning framework that makes it easy to define machine learning pipelines using a simple and flexible data-driven configuration system.
Candle: Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use.
Distributed training
Horovod: Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
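Data-parallel frameworks such as Horovod combine gradients from all workers with an allreduce, and Horovod's design is built around the ring-allreduce algorithm. The single-process simulation below (the function `ring_allreduce` is hypothetical) shows the two phases, reduce-scatter followed by allgather, for a sum. It only illustrates the communication pattern; it is not Horovod's API, and real implementations run over MPI, NCCL, or Gloo.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring-allreduce over len(grads) workers; return each worker's result."""
    n = len(grads)
    length = len(grads[0])
    # Split every vector into n contiguous chunks (sizes differ by at most 1).
    bounds = [length * k // n for k in range(n + 1)]
    # chunks[w][c] is worker w's local copy of chunk c.
    chunks = [[list(g[bounds[c]:bounds[c + 1]]) for c in range(n)] for g in grads]

    # Phase 1: reduce-scatter. After n-1 steps, worker w holds the full sum of chunk (w+1) % n.
    for step in range(n - 1):
        # Snapshot all messages first, because every worker sends at the same time.
        messages = [(w, (w - step) % n, list(chunks[w][(w - step) % n])) for w in range(n)]
        for sender, c, payload in messages:
            receiver = (sender + 1) % n
            chunks[receiver][c] = [a + b for a, b in zip(chunks[receiver][c], payload)]

    # Phase 2: allgather. Pass the completed chunks around the ring so everyone has all of them.
    for step in range(n - 1):
        messages = [(w, (w + 1 - step) % n, list(chunks[w][(w + 1 - step) % n])) for w in range(n)]
        for sender, c, payload in messages:
            chunks[(sender + 1) % n][c] = payload

    # Reassemble each worker's chunks into one flat vector.
    return [[x for chunk in worker for x in chunk] for worker in chunks]


grads = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
for result in ring_allreduce(grads):
    print(result)  # [111, 222, 333, 444, 555, 666] on every worker
```

Each worker sends 2(N - 1) chunks of size L/N, roughly 2L values in total no matter how many workers N there are, which is why the ring pattern scales well in bandwidth. Dividing the summed result by N gives the averaged gradient.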
Speech recognition
OpenAI's Whisper: The Whisper models are trained for speech recognition and translation: they can transcribe speech audio into text in the language being spoken (ASR) and translate it into English (speech translation).
Image generation
Stable Diffusion: An image generation model that produces an image from a text prompt. Stable Diffusion can be freely downloaded.
DALL·E: DALL·E 2 is an AI system that can create realistic images and art from a description in natural language.
LLMs
Edge inference
WebLLM: WebLLM is a modular, customizable JavaScript package that brings language model chats directly onto web browsers with hardware acceleration. Everything runs inside the browser without server support, accelerated by WebGPU.
Run locally
Ollama: Ollama makes it easy to host large language models locally: https://ollama.ai
gpt4all: A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required: https://gpt4all.io
OpenWebUI: A user-friendly web UI for LLMs (formerly Ollama WebUI).
Code generation
CodeGen: CodeGen is an open-source model for program synthesis, trained on TPU-v4 and competitive with OpenAI Codex.
Code Llama: Code Llama is a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.
Stable Code 3B: Stable Code 3B is a 3-billion-parameter large language model (LLM) that performs on par with models such as Code Llama 7B, which are 2.5x larger. It runs offline, even without a GPU, on common laptops such as a MacBook Air.
OpenDevin: A platform for autonomous software engineers powered by AI and LLMs. OpenDevin agents collaborate with human developers to write code, fix bugs, and ship features.
Vector databases
Milvus: Milvus is an open-source vector database built to power embedding similarity search and AI applications.
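At its core, embedding similarity search returns the stored vectors closest to a query vector. The brute-force sketch below uses cosine similarity. It is not the Milvus client API (`cosine` and `top_k` are hypothetical helpers, and the toy vectors are made up); a vector database like Milvus replaces the exhaustive scan with approximate nearest-neighbor indexes such as HNSW or IVF so that search scales to large collections.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Exact search: score every stored vector and keep the k most similar."""
    scored = ((key, cosine(query, vec)) for key, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy 2-D "embeddings" for illustration only.
corpus = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "car": [0.0, 1.0]}
print(top_k([1.0, 0.0], corpus, k=2))  # "cat" first, then "kitten"
```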
Model explainability
LIME: Local Interpretable Model-agnostic Explanations, a technique that explains an individual prediction by approximating the model locally, around that prediction, with an interpretable model.
SHAP: SHapley Additive exPlanations. The key idea of SHAP is to compute a Shapley value for each feature of the sample being explained; each feature's Shapley value represents that feature's contribution to the prediction.
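The LIME idea of explaining one prediction with a locally fitted interpretable model can be sketched as follows: perturb the input, weight each perturbed sample by its proximity to the original, and fit a weighted linear regression whose coefficients serve as the explanation. This is a simplified, hypothetical version: `lime_explain` and `solve` are illustrative names, and the Gaussian perturbations, exponential kernel, and kernel width are assumptions. The `lime` package itself handles details this sketch omits; for tabular data it samples using training-data statistics and by default fits a ridge regression.

```python
import math
import random
from typing import Callable, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(
    model: Callable[[List[float]], float],
    x0: Sequence[float],
    n_samples: int = 2000,
    sigma: float = 0.5,
    kernel_width: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Return one local linear weight per feature for the model's prediction at x0."""
    rng = random.Random(seed)
    k = len(x0)
    ata = [[0.0] * (k + 1) for _ in range(k + 1)]  # weighted X^T X
    aty = [0.0] * (k + 1)  # weighted X^T y
    for _ in range(n_samples):
        # 1. Perturb the instance with Gaussian noise (assumption).
        d = [rng.gauss(0.0, sigma) for _ in range(k)]
        z = [xi + di for xi, di in zip(x0, d)]
        # 2. Weight the sample by proximity to x0 (exponential kernel).
        w = math.exp(-sum(di * di for di in d) / kernel_width ** 2)
        # 3. Accumulate the weighted least-squares normal equations.
        row = [1.0] + d  # intercept term plus offsets from x0
        y = model(z)
        for i in range(k + 1):
            aty[i] += w * row[i] * y
            for j in range(k + 1):
                ata[i][j] += w * row[i] * row[j]
    coef = solve(ata, aty)
    return coef[1:]  # drop the intercept; the rest is the explanation


# Black box f(z) = 3*z0 + z1**2, whose true local gradient at (1, 2) is (3, 4).
weights = lime_explain(lambda z: 3 * z[0] + z[1] ** 2, [1.0, 2.0])
print(weights)  # close to [3, 4]
```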
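To make the Shapley value definition concrete, the sketch below computes exact Shapley values by enumerating every coalition S of the other features and averaging each feature's marginal contribution with the standard weights |S|!(n - |S| - 1)!/n!. It is hypothetical illustration code, not the `shap` package API: features outside a coalition are set to a single baseline vector (an assumption; SHAP typically averages over background data), and the enumeration costs O(2^n) model calls, which is why the `shap` library relies on approximations such as KernelSHAP and specialized algorithms such as TreeSHAP.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[List[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# f(z) = 2*z0 + z1*z2 explained at x = (1, 2, 3) against an all-zero baseline.
phi = shapley_values(lambda z: 2 * z[0] + z[1] * z[2], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, 3.0, 3.0]
```

The linear term goes entirely to z0, the interaction z1*z2 is split evenly between z1 and z2, and the values sum to f(x) - f(baseline) = 8, the additivity property that gives SHAP its name.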
Other tools
GPUd: An AI-native GPU management utility that automates monitoring, diagnostics, and issue identification for GPUs. It was developed at Lepton AI by engineers with experience at Meta, Alibaba, and Uber, and the project claims it reduces GPU cluster unavailability by 4x: https://github.com/leptonai/gpud
LandingAI: Domain-specific computer vision applications built on Large Vision Models (LVMs): https://landing.ai
OpenUI: Describe a UI in plain language and see it rendered live; you can then request changes and convert the HTML to React, Svelte, or Web Components: https://github.com/wandb/openui