📊 Popular Generative AI Models - CarrieKroutil/generative-ai-sandbox GitHub Wiki

The following are key generative AI models in use today, grouped by the type of content they generate: language, image, and code.

| Name | Description | Area |
| --- | --- | --- |
| Generative Pre-trained Transformer (GPT) | A family of large language models developed by OpenAI and trained on a massive dataset of text and code. It can generate text, translate languages, write various kinds of creative content, and answer questions informatively. GPT-4 Omni (more commonly referred to as GPT-4o) is a multimodal model. At the time of writing, it is the latest version and a significant upgrade from GPT-4, offering speed, cost, and capability improvements. | Language/multimodal |
| Llama 3 | Meta recently released the third version of its Llama large language model, made openly available under a custom license. The models come in various sizes with varying capabilities. | Language |
| Claude 3 | Anthropic has introduced the Claude 3 model family, which includes Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. These models offer a range of capabilities, with Opus being the most intelligent. It is capable of complex tasks and exhibits near-human levels of comprehension and fluency. Like OpenAI’s ChatGPT, Claude can generate text, write code, summarize, and reason, among other things, for a given prompt. | Language |
| Cohere Command | Cohere offers two models (Command R and Command R+) as part of its Command family. While these LLMs are optimized for various use cases, Cohere’s newest large language model, Command R+, is optimized for conversational interaction and long-context tasks. It is designed to perform well on complex retrieval-augmented generation (RAG) workflows and multistep tool use. | Language |
| Mistral | Mistral's large language models are designed for text generation and other language tasks. They come in different sizes, split between a collection of open-source models (Mistral 7B, Mixtral 8x7B, and Mixtral 8x22B) and optimized commercial models (Mistral Small, Medium, and Large), each tailored for different reasoning complexities and workloads. | Language |
| Gemini | Gemini is Google’s multimodal model that can understand text, images, video, and audio. It is available in different sizes (Ultra, Pro, and Nano), each with different capabilities. | Language/multimodal |
| DALL·E | A visual AI model developed by OpenAI that can create realistic images from text prompts. | Image |
| Stable Diffusion | An open-source image generation model that takes a prompt as input. It is primarily used to generate detailed images conditioned on text descriptions and can also be applied to other tasks such as inpainting, outpainting, and image-to-image translation. | Image |
| Midjourney | An image generation model driven by natural language prompts, from a startup called Midjourney, Inc. It is similar to OpenAI’s DALL·E and Stable Diffusion. | Image |
| CodeWhisperer | CodeWhisperer is an AWS code-generation model that can generate code in several programming languages, including Python, Java, JavaScript, and TypeScript. | Code |
| CodeLlama | CodeLlama is a large language model built on Llama 2 and specifically trained on code. It is available in various sizes and supports multiple popular programming languages. | Code |
| Codex | A large language model from OpenAI trained specifically on code and used to help with code generation. It supports over a dozen programming languages, including some of the more commonly used ones, such as C#, Java, Python, JavaScript, SQL, Go, PHP, and Shell. | Code |
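
The grouping by area described above can be sketched as a small lookup structure. This is only an illustrative sketch: the `MODELS` list and the `group_by_area` helper are hypothetical names made up for this example, and filing "Language/multimodal" entries under "Language" is an assumption, not something the table prescribes.

```python
from collections import defaultdict

# Model names and areas copied from the table above. The list itself is a
# hypothetical structure for illustration, not part of any library.
MODELS = [
    ("Generative Pre-trained Transformer (GPT)", "Language/multimodal"),
    ("Llama 3", "Language"),
    ("Claude 3", "Language"),
    ("Cohere Command", "Language"),
    ("Mistral", "Language"),
    ("Gemini", "Language/multimodal"),
    ("DALL·E", "Image"),
    ("Stable Diffusion", "Image"),
    ("Midjourney", "Image"),
    ("CodeWhisperer", "Code"),
    ("CodeLlama", "Code"),
    ("Codex", "Code"),
]


def group_by_area(models):
    """Return a dict mapping each primary area to its list of model names."""
    grouped = defaultdict(list)
    for name, area in models:
        # Assumption: "Language/multimodal" is filed under "Language".
        primary_area = area.split("/")[0]
        grouped[primary_area].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for area, names in group_by_area(MODELS).items():
        print(f"{area}: {', '.join(names)}")
```

A lookup such as `group_by_area(MODELS)["Code"]` then returns only the code-generation models, which can be handy when narrowing the list to candidates for a given task.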