📊 Popular Generative AI Models - CarrieKroutil/generative-ai-sandbox GitHub Wiki
The table below lists key generative AI models in use today, grouped by generation area: language, image, and code.
Name | Description | Area |
---|---|---|
Generative Pre-trained Transformer (GPT) | A family of large language models developed by OpenAI and trained on a massive dataset of text and code. It can generate text, translate languages, write various kinds of creative content, and answer questions informatively. GPT-4o (the "o" stands for "omni") is a multimodal model. At the time of writing, it is the latest version and a significant upgrade over GPT-4, offering speed, cost, and capability improvements. | Language/multimodal |
Llama 3 | Meta recently released the third generation of its large language model, with openly available weights under Meta’s own community license. The models come in various sizes with varying capabilities. | Language |
Claude 3 | Anthropic has introduced the Claude 3 model family, which includes Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. These models offer a range of capabilities, with Opus being the most intelligent: it handles complex tasks and exhibits near-human levels of comprehension and fluency. Like OpenAI’s ChatGPT, Claude can generate text, write code, summarize, and reason, among other things, for a given prompt. | Language |
Cohere Command | Cohere offers two models (Command R and Command R+) as part of its Command family. Both LLMs are optimized for a range of use cases. Command R+, Cohere’s newest large language model, is optimized for conversational interaction and long-context tasks. It is designed to perform well in complex retrieval-augmented generation (RAG) workflows and multistep tool use. | Language |
Mistral | Mistral AI’s large language models are designed for text generation and other language tasks. They come in different sizes, spanning a collection of open source models (Mistral 7B, Mixtral 8x7B, and Mixtral 8x22B) and optimized commercial models (Mistral Small, Medium, and Large), each tailored for different reasoning complexities and workloads. | Language |
Gemini | Gemini is Google’s multimodal model that can understand text, images, video, and audio. It comes in different sizes (Ultra, Pro, and Nano), each with different capabilities. | Language/multimodal |
DALL·E | Visual AI model developed by OpenAI that can create realistic images from text prompts. | Image |
Stable Diffusion | Open source image generation model. It is primarily used to generate detailed images conditioned on text descriptions and can also be applied to other tasks such as inpainting, outpainting, and image-to-image translation. | Image |
Midjourney | An image generation model from the startup Midjourney, Inc. that creates images from natural language prompts, similar to OpenAI’s DALL·E and Stable Diffusion. | Image |
CodeWhisperer | CodeWhisperer is an AWS code-generation model that can generate code in several programming languages, including Python, Java, JavaScript, and TypeScript. | Code |
CodeLlama | CodeLlama is a large language model built on Llama 2 and specifically trained on code. It is available in various sizes and supports multiple popular programming languages. | Code |
Codex | A large language model from OpenAI, trained specifically on code and used to help with code generation. It supports over a dozen programming languages, including commonly used ones such as C#, Java, Python, JavaScript, SQL, Go, PHP, and Shell. | Code |