LLMs (Large Language Models)

🧠 AI Models in Our AI-Human Collaboration System

Table of Contents

  • General Philosophy
  • Large Language Models (LLMs)
  • Embedding Models
  • Text-to-Image Models
  • Speech Recognition Models
  • Model Access

🧭 General Philosophy

The project leverages a diverse set of AI models to optimize performance across various tasks. We primarily use models from DeepInfra, supplemented by other specialized models and services. This multi-model approach allows us to:

  1. Balance Performance and Efficiency: By selecting models based on task complexity and type, we optimize for quality, speed, and cost.
  2. Leverage Specialized Capabilities: Different models excel in different areas, allowing us to choose the best tool for each job.
  3. Ensure Robustness: Using multiple models helps mitigate individual model biases and limitations.
  4. Enable Multi-modal Interactions: Incorporating various model types (text, image, speech) enhances our system's versatility.

We chose DeepInfra as our primary platform due to its diverse model offerings, competitive pricing, and a robust API that integrates well with our automation pipeline. Paid ChatGPT and Claude.ai plans supplement this, and each has its own strengths.
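
To make the routing idea concrete, here is a minimal sketch of how tasks might map to model tiers. The tier names, the mapping, and the model IDs below are illustrative assumptions, not the pipeline's actual code:

```python
# Hypothetical sketch: route a task to a DeepInfra model based on rough complexity.
# Tier names and the mapping are illustrative only.

DEEPINFRA_MODELS = {
    "complex":  "meta-llama/Meta-Llama-3-70B-Instruct",   # deep reasoning, planning
    "standard": "meta-llama/Meta-Llama-3-8B-Instruct",    # quick drafts, simple tasks
    "diverse":  "mistralai/Mixtral-8x7B-Instruct-v0.1",   # alternative perspectives
}

def pick_model(task_complexity: str) -> str:
    """Return the model ID for an (assumed) complexity tier, defaulting to 'standard'."""
    return DEEPINFRA_MODELS.get(task_complexity, DEEPINFRA_MODELS["standard"])

print(pick_model("complex"))  # meta-llama/Meta-Llama-3-70B-Instruct
```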

🚀 Large Language Models (LLMs)

DeepInfra has a huge selection of models at a very reasonable price of $2.70 per 1M tokens (Sep 2024), which is absurdly cheap. In the roughly two months I had used it as of August 2024, including a couple of incredibly long sessions, I never broke $1.


DeepInfra LLMs

Meta-Llama-3-70B-Instruct

  • Size: 70 billion parameters
  • Use Cases: Complex reasoning, task planning, generating detailed responses
  • Why We Use It: Our go-to model for high-level decision making and complex task decomposition in meta-sessions (a minimal call sketch follows below).
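
A minimal call sketch, assuming DeepInfra's OpenAI-compatible chat endpoint and the official `openai` Python client (v1+); the environment variable name and prompts are placeholders:

```python
import os
from openai import OpenAI

# Assumes DeepInfra's OpenAI-compatible endpoint; DEEPINFRA_API_KEY is a placeholder env var.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],
    base_url="https://api.deepinfra.com/v1/openai",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a planning assistant for a home-lab automation pipeline."},
        {"role": "user", "content": "Break the task 'rebuild the Proxmox backup job' into ordered subtasks."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```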

Meta-Llama-3-8B-Instruct

  • Size: 8 billion parameters
  • Use Cases: Faster responses for simpler tasks, initial drafts, brainstorming sessions
  • Why We Use It: Provides a good balance of speed and quality for less complex tasks and quick iterations.

Mixtral-8x7B-Instruct-v0.1

  • Architecture: Sparse Mixture-of-Experts (8 experts of roughly 7 billion parameters each, with 2 experts active per token)
  • Use Cases: Versatile tasks, specialized subtasks, alternative perspective generation
  • Why We Use It: Its mixture-of-experts architecture offers strong quality at a lower inference cost and adapts well to a wide range of tasks.

Other LLMs

  • Claude.ai: Used for tasks requiring strong ethical reasoning or handling of sensitive information. Also a low-key monster at coding.
  • ChatGPT (GPT-4): Leveraged for its up-to-date knowledge and specific capabilities not found in other models.

🔤 Embedding Models

BAAI/bge-large-en-v1.5

  • Type: Text embedding model
  • Language: English
  • Use Cases: Semantic search, document clustering, similarity analysis
  • Why We Use It: Provides high-quality semantic embeddings for our text-based data, enhancing our information retrieval and analysis capabilities (see the usage sketch below).
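
A minimal local usage sketch with the `sentence-transformers` library (the same model can also be served remotely rather than loaded locally); the documents and query are placeholders:

```python
from sentence_transformers import SentenceTransformer, util

# Load the embedding model locally and rank documents against a query by cosine similarity.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

docs = [
    "Proxmox backup job failed overnight.",
    "The nightly VM backup did not complete.",
    "Grafana dashboard for GPU temperatures.",
]
query = "Why did the backup fail?"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(query_emb, doc_emb)[0].tolist()
for doc, score in sorted(zip(docs, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```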

sentence-transformers/clip-ViT-B-32

  • Type: Multi-modal embedding model
  • Capability: Can embed both text and images
  • Use Cases: Cross-modal similarity, multi-modal information retrieval
  • Why We Use It: Enables us to create comparable embeddings across text and image modalities, useful for future multi-modal task expansions (see the sketch below).
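
A minimal local sketch, again with `sentence-transformers`; the image path and captions are placeholders:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps images and text into the same embedding space, so they can be compared directly.
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

image_emb = model.encode(Image.open("rack_photo.jpg"))  # placeholder image file
text_emb = model.encode([
    "a photo of a server rack",
    "a diagram of a network topology",
    "a cat sleeping on a keyboard",
])

# Higher cosine similarity means the caption is a closer match for the image.
print(util.cos_sim(image_emb, text_emb))
```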

🎨 Text-to-Image Models

stabilityai/stable-diffusion-2-1

  • Type: Text-to-image generation model
  • Use Cases: Generating images from text descriptions, concept visualization
  • Why We Use It: Enhances our system's ability to create visual content, useful for diagram generation, concept illustration, and enriching textual outputs with relevant imagery (see the sketch below).
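
A minimal local GPU sketch using the `diffusers` library (DeepInfra also hosts this model behind its API); the prompt and output file name are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion 2.1 in half precision on a local GPU and generate one image.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "isometric illustration of a small home-lab server rack, clean vector style"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("homelab_rack.png")
```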

🎤 Speech Recognition Models

openai/whisper-large

  • Type: Automatic speech recognition model
  • Use Cases: Transcribing audio to text, voice input processing
  • Why We Use It: Enables our system to work with audio inputs, expanding our capability to handle multi-modal data and potentially support voice-based interactions (see the sketch below).
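
A minimal local sketch using the `openai-whisper` package; the audio file path is a placeholder, and the hosted version of the model can be used instead of loading the weights locally:

```python
import whisper  # pip install openai-whisper

# Load the large Whisper model and transcribe an audio file to text.
model = whisper.load_model("large")
result = model.transcribe("meeting.mp3")  # placeholder audio file
print(result["text"])
```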

💻 Model Access

  • DeepInfra Models: Accessed primarily through our custom automation pipeline, which interfaces with the DeepInfra API (see the request sketch after this list).
  • Interactive Sessions: We use OpenWebUI for chat-based interactions with DeepInfra models, allowing for more dynamic and exploratory work.
  • Other Models and Services: Accessed through their respective official interfaces or APIs as needed.
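
For reference, a bare-bones request sketch against what we assume is DeepInfra's OpenAI-compatible chat-completions route, using `requests`; check the URL and payload shape against DeepInfra's current API docs before relying on it:

```python
import os
import requests

# Assumes an OpenAI-compatible chat completions route on DeepInfra;
# verify the URL and payload against DeepInfra's current documentation.
url = "https://api.deepinfra.com/v1/openai/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}"}  # placeholder env var
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Summarize last night's backup log in one sentence."}],
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```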