LLMs (Large Language Models)

🧠 AI Models in Our AI-Human Collaboration System

Table of Contents

  • General Philosophy
  • Large Language Models (LLMs)
  • Embedding Models
  • Text-to-Image Models
  • Speech Recognition Models
  • Model Access

🧭 General Philosophy

The project leverages a diverse set of AI models to optimize performance across various tasks. We primarily use models from DeepInfra, supplemented by other specialized models and services. This multi-model approach allows us to:

  1. Balance Performance and Efficiency: By selecting models based on task complexity and type, we optimize for quality, speed, and cost.
  2. Leverage Specialized Capabilities: Different models excel in different areas, allowing us to choose the best tool for each job.
  3. Ensure Robustness: Using multiple models helps mitigate individual model biases and limitations.
  4. Enable Multi-modal Interactions: Incorporating various model types (text, image, speech) enhances our system's versatility.

We chose DeepInfra as our primary platform due to its diverse model offerings, competitive pricing, and a robust API that integrates well with our automation pipeline. Paid ChatGPT and Claude.ai plans supplement this, and each has its own strengths.
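
To make the routing idea concrete, here is a minimal sketch of how tasks might map to model tiers. The tier names, the mapping, and the model IDs below are illustrative assumptions, not the pipeline's actual code:

```python
# Hypothetical sketch: route a task to a DeepInfra model based on rough complexity.
# Tier names and the mapping are illustrative only.

DEEPINFRA_MODELS = {
    "complex":  "meta-llama/Meta-Llama-3-70B-Instruct",   # deep reasoning, planning
    "standard": "meta-llama/Meta-Llama-3-8B-Instruct",    # quick drafts, simple tasks
    "diverse":  "mistralai/Mixtral-8x7B-Instruct-v0.1",   # alternative perspectives
}

def pick_model(task_complexity: str) -> str:
    """Return the model ID for an (assumed) complexity tier, defaulting to 'standard'."""
    return DEEPINFRA_MODELS.get(task_complexity, DEEPINFRA_MODELS["standard"])

print(pick_model("complex"))  # meta-llama/Meta-Llama-3-70B-Instruct
```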

🚀 Large Language Models (LLMs)

DeepInfra has a huge selection of models at a very reasonable price of $2.70 per 1M tokens (Sep 2024), which is absurdly cheap. In the roughly two months I had used it as of August 2024, including a couple of incredibly long sessions, I never broke $1.


DeepInfra LLMs

Meta-Llama-3-70B-Instruct

  • Size: 70 billion parameters
  • Use Cases: Complex reasoning, task planning, generating detailed responses
  • Why We Use It: Our go-to model for high-level decision making and complex task decomposition in meta-sessions (a minimal call sketch follows below).
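
A minimal call sketch, assuming DeepInfra's OpenAI-compatible chat endpoint and the official `openai` Python client (v1+); the environment variable name and prompts are placeholders:

```python
import os
from openai import OpenAI

# Assumes DeepInfra's OpenAI-compatible endpoint; DEEPINFRA_API_KEY is a placeholder env var.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],
    base_url="https://api.deepinfra.com/v1/openai",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a planning assistant for a home-lab automation pipeline."},
        {"role": "user", "content": "Break the task 'rebuild the Proxmox backup job' into ordered subtasks."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```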

Meta-Llama-3-8B-Instruct

  • Size: 8 billion parameters
  • Use Cases: Faster responses for simpler tasks, initial drafts, brainstorming sessions
  • Why We Use It: Provides a good balance of speed and quality for less complex tasks and quick iterations.

Mixtral-8x7B-Instruct-v0.1

  • Architecture: Sparse Mixture-of-Experts (8 experts of roughly 7 billion parameters each, with 2 experts active per token)
  • Use Cases: Versatile tasks, specialized subtasks, alternative perspective generation
  • Why We Use It: Its mixture-of-experts architecture offers strong quality at a lower inference cost and adapts well to a wide range of tasks.

Other LLMs

  • Claude.ai: Used for tasks requiring strong ethical reasoning or handling of sensitive information. Also a low-key monster at coding.
  • ChatGPT (GPT-4): Leveraged for its up-to-date knowledge and specific capabilities not found in other models.

🔤 Embedding Models

BAAI/bge-large-en-v1.5

  • Type: Text embedding model
  • Language: English
  • Use Cases: Semantic search, document clustering, similarity analysis
  • Why We Use It: Provides high-quality semantic embeddings for our text-based data, enhancing our information retrieval and analysis capabilities (see the usage sketch below).
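
A minimal local usage sketch with the `sentence-transformers` library (the same model can also be served remotely rather than loaded locally); the documents and query are placeholders:

```python
from sentence_transformers import SentenceTransformer, util

# Load the embedding model locally and rank documents against a query by cosine similarity.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

docs = [
    "Proxmox backup job failed overnight.",
    "The nightly VM backup did not complete.",
    "Grafana dashboard for GPU temperatures.",
]
query = "Why did the backup fail?"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(query_emb, doc_emb)[0].tolist()
for doc, score in sorted(zip(docs, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```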

sentence-transformers/clip-ViT-B-32

  • Type: Multi-modal embedding model
  • Capability: Can embed both text and images
  • Use Cases: Cross-modal similarity, multi-modal information retrieval
  • Why We Use It: Enables us to create comparable embeddings across text and image modalities, useful for future multi-modal task expansions (see the sketch below).
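
A minimal local sketch, again with `sentence-transformers`; the image path and captions are placeholders:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps images and text into the same embedding space, so they can be compared directly.
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

image_emb = model.encode(Image.open("rack_photo.jpg"))  # placeholder image file
text_emb = model.encode([
    "a photo of a server rack",
    "a diagram of a network topology",
    "a cat sleeping on a keyboard",
])

# Higher cosine similarity means the caption is a closer match for the image.
print(util.cos_sim(image_emb, text_emb))
```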

🎨 Text-to-Image Models

stabilityai/stable-diffusion-2-1

  • Type: Text-to-image generation model
  • Use Cases: Generating images from text descriptions, concept visualization
  • Why We Use It: Enhances our system's ability to create visual content, useful for diagram generation, concept illustration, and enriching textual outputs with relevant imagery (see the sketch below).
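
A minimal local GPU sketch using the `diffusers` library (DeepInfra also hosts this model behind its API); the prompt and output file name are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion 2.1 in half precision on a local GPU and generate one image.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "isometric illustration of a small home-lab server rack, clean vector style"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("homelab_rack.png")
```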

🎤 Speech Recognition Models

openai/whisper-large

  • Type: Automatic speech recognition model
  • Use Cases: Transcribing audio to text, voice input processing
  • Why We Use It: Enables our system to work with audio inputs, expanding our capability to handle multi-modal data and potentially support voice-based interactions (see the sketch below).
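
A minimal local sketch using the `openai-whisper` package; the audio file path is a placeholder, and the hosted version of the model can be used instead of loading the weights locally:

```python
import whisper  # pip install openai-whisper

# Load the large Whisper model and transcribe an audio file to text.
model = whisper.load_model("large")
result = model.transcribe("meeting.mp3")  # placeholder audio file
print(result["text"])
```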

💻 Model Access

  • DeepInfra Models: Accessed primarily through our custom automation pipeline, which interfaces with the DeepInfra API (see the request sketch after this list).
  • Interactive Sessions: We use OpenWebUI for chat-based interactions with DeepInfra models, allowing for more dynamic and exploratory work.
  • Other Models and Services: Accessed through their respective official interfaces or APIs as needed.
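
For reference, a bare-bones request sketch against what we assume is DeepInfra's OpenAI-compatible chat-completions route, using `requests`; check the URL and payload shape against DeepInfra's current API docs before relying on it:

```python
import os
import requests

# Assumes an OpenAI-compatible chat completions route on DeepInfra;
# verify the URL and payload against DeepInfra's current documentation.
url = "https://api.deepinfra.com/v1/openai/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}"}  # placeholder env var
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Summarize last night's backup log in one sentence."}],
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```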