Page Index - PrototypeJam/lake_merritt GitHub Wiki
111 page(s) in this GitHub Wiki:
- Home
- PROMPT:
- YOUR TASK:
- Output (Gemini)
- 1. Summary of prototypejam-agento_app.txt
- 2. Detailed Plan for "Modular Python Streamlit App for Evals Initial Prototype"
- 3. Somewhat Detailed Plan for SHORT TERM ToDo Features (Page 2 Enhancements)
- 4. Somewhat Detailed Plan for ROADMAP of LONGER TERM ASPIRATIONS
- PROMPT
- OUTPUT (Gemini)
- Project Overview: A General-Purpose, Modular, and Extensible AI Evaluation Platform
- User Flow and UI/UX Perspective (Initial Prototype)
- Roadmap (What & Why - Not How)
- APPROACHES TO IMPLEMENT THESE HIGHER LEVEL PROJECT REQUIREMENTS FOLLOW:
- ChatGPT-o3
- 1 Directory / File Layout
- 2 Module Responsibilities
- 3 Extensibility & Future Features Hooks
- 4 Immediate Next Steps
- Gemini 2.5 Pro 06-05 (newest model)
- Architectural Approach
- Directory Structure
- High-Level Content of Each File/Module
- app.py
- pages/1_System_Configuration.py
- pages/2_Evaluation_Setup.py
- pages/3_View_Results.py
- pages/4_Download_Center.py
- Core Logic (src/) Files
- src/data_models.py
- src/llm_clients/
- src/scoring/
- src/generation.py
- src/evaluation.py
- src/data_io.py
- src/utils.py
- Claude Opus 4 (newest Model)
- Directory Structure
- File Contents Overview
- app.py
- config/settings.py
- config/schemas.py
- core/models/base.py
- core/scoring/base.py
- core/evaluation/runner.py
- ui/pages/1_system_config.py
- Key Design Decisions and Future-Proofing
- 1. Modular Architecture with Clear Interfaces
- 2. Abstract Base Classes for Providers and Scorers
- 3. Pydantic Models for Data Exchange
- 4. Centralized Retry Logic
- 5. Session State Abstraction
- 6. Component-Based UI
- 7. Prepared for Async Operations
- 8. Logging and Observability Ready
- 9. Feature Flags via Settings
- 10. Test Structure Mirrors Code Structure
- ASSESSMENT/CRITIQUE OF POTENTIAL APPROACHES, and PROPOSED WAY FORWARD FOLLOWS
- PROMPT
- OUTPUT: Gemini 2.5 Pro 06-05
- 1. Assessment and Critique of Each Approach
- Approach 1: ChatGPT-o3
- Approach 2: Gemini
- Approach 3: Claude Opus
- 2. The Ideal Approach: A Hybrid, with Gemini as the Foundation
- WHY This Hybrid Approach is Best
- OUTPUT - Claude Opus 4
- 1. Individual Approach Assessment
- ChatGPT-o3 Approach
- Gemini 2.5 Pro Approach
- Claude Opus 4 Approach
- 2. Recommended Approach: Best Combined Approach
- Ideal Combined Structure:
- Key Decisions from Each Approach:
- Why This Combination Works Best:
- Implementation Priority:
- OUTPUT - OpenAI o3
- 1. Suitability analysis against each time-horizon
- a) Initial prototype (evaluate existing outputs – “Mode A”)
- b) Near-term additions (generate outputs, multiple scorers – “Mode B”)
- c) 12–18 month roadmap (cross-run analytics, live telemetry, prompt versioning, OTLP)
- 2. Recommend-or-blend decision
- Is there one clear winner?
- Why not the others?
- How to future-proof the chosen base without “re-build” risk
- Action plan
- KEY DECISIONS/PREFERENCES and ASSESSMENT OF CRITIQUES AND RECOMMENDATIONS FOLLOW
- PROMPT
- Gemini 2.5 Pro 05-06
- 1. Key Decisions & Preferences You Need to Make/Articulate
- 2. My Analysis and Recommended Implementation Path
- Claude Opus 4
- 1. Key Decisions You Need to Make
- Technical Preferences
- Development Philosophy
- Team and Timeline
- 2. My Analysis and Recommendation
- The Winner: Modified ChatGPT-o3 Approach
- Core Strengths of ChatGPT-o3:
- Why Not the Others:
- My Recommended Hybrid Approach:
- Implementation Sequence:
- Key Principles:
- Why This Approach Wins:
- OpenAI o3
- 1 Key decisions & preferences you should lock in now
- 2 My synthesis & recommendation
- Claude, Gemini, GPT API refinement info
- Complete OpenAI Codex Setup Guide for Python Projects with Tests