How to create config with GenAI - grishasen/proof_of_value GitHub Wiki

GenAI Application Config Generator

Overview

The GenAI config page provides an interactive interface to:

  1. Upload an Interaction History (IH) sample (Parquet/JSON/ZIP/GZIP)
  2. Inspect its schema and data
  3. Auto-generate a tailored configuration file via an LLM
  4. Download the generated config for use in the Value Dashboard pipeline

Outcomes

  • Interactive Exploration: Quickly understand your IH dataset’s schema & quality.
  • Automated Configs: Eliminate manual editing of complex config templates; the LLM crafts a tailored configuration.
  • Iterative Refinement: Tweak the prompt or rerun with new data samples to refine pipeline settings.
  • Generated configs slot directly into the Value Dashboard pipeline ready for reporting.

Workflow

  1. Template Load & API Key Setup

    • Reads a TOML template (config_template.toml) from disk.
    • Prompts in the sidebar for an OpenAI API key, falling back to the OPENAI_API_KEY environment variable if none is entered.
    • Lets the user select a supported chat model (e.g. gpt-4o-mini).
  2. Dataset Upload & Pre-Processing

    • User uploads a data file (ZIP/Parquet/JSON/GZIP).
    • Columns are renamed to “Title Case” for consistency.
    • “Extension” columns that the template defines but the data lacks are added and filled with defaults.
  3. Feature Engineering & Cleanup

    • Parses timestamp columns (OutcomeTime, DecisionTime) into Polars Datetime.
    • Derives new fields:
      • Day, Month, Year, Quarter from the parsed outcome timestamp (OutcomeDateTime)
      • ResponseTime = time delta between decision and outcome
    • Drops irrelevant or redundant columns (IDs, labels, metadata).
  4. Schema & Data Summary

    • Builds and displays a schema table with unique counts.
    • Shows overall DataFrame summary statistics.
    • Offers an expandable sample-data view.
  5. LLM-Driven Config Generation

    • Constructs a detailed prompt that:
      • Ingests the dataset schema, template config, and file name/type.
      • Instructs the LLM to map template reports, metrics, filters, and grouping keys to actual columns.
      • Specifies rules for grouping on categorical/string columns (unique values between 2 and 99), plus time dimensions.
    • On “Generate config” click:
      • Calls the OpenAI chat completions API with the selected model to produce a new config file.
      • Writes to a new file under temp_configs/ with a UUID name.
      • Reloads the app configuration (set_config) to clear caches and apply the new file.
      • Presents a Download button for the user’s generated config.toml.
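
The grouping rule and the file-writing step from step 5 can be sketched as follows. This is a minimal illustration, not the page's actual code: the function names `grouping_candidates` and `save_generated_config`, the schema representation (column name mapped to a dtype label and a unique count), the dtype labels, and the `.toml` file extension are all assumptions made for the example.

```python
import uuid
from pathlib import Path


def grouping_candidates(schema: dict) -> list:
    """Return columns that qualify as grouping keys.

    `schema` maps column name -> (dtype label, unique count). This shape is
    hypothetical. A column qualifies when it is categorical/string and has
    between 2 and 99 unique values, as stated in the prompt rules.
    """
    return [
        name
        for name, (dtype, n_unique) in schema.items()
        if dtype in {"String", "Categorical"} and 2 <= n_unique <= 99
    ]


def save_generated_config(text: str, out_dir: str = "temp_configs") -> Path:
    """Write the generated config to a new file with a UUID-based name."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # The ".toml" suffix is an assumption; the source only says "UUID name".
    path = target_dir / f"{uuid.uuid4()}.toml"
    path.write_text(text, encoding="utf-8")
    return path


example_schema = {
    "Channel": ("String", 4),
    "CustomerID": ("String", 120000),  # too many unique values
    "Outcome": ("Categorical", 3),
    "Propensity": ("Float64", 5000),   # not categorical/string
    "Constant": ("String", 1),         # only one value, useless for grouping
}
print(grouping_candidates(example_schema))
```

Under these assumptions, only `Channel` and `Outcome` pass the filter; high-cardinality identifiers and constant columns are excluded.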

Requirements

  • Valid OpenAI API credentials (environment or input).
  • A template config at value_dashboard/config/config_template.toml.
  • The uploaded dataset must be a valid CDH IH export.
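
The credential requirement (key from user input, otherwise from the environment) can be sketched as below. The helper name `resolve_api_key` and its error message are hypothetical; only the `OPENAI_API_KEY` variable name comes from this page.

```python
import os
from typing import Optional


def resolve_api_key(user_input: Optional[str], env_var: str = "OPENAI_API_KEY") -> str:
    """Prefer the key typed by the user; fall back to the environment variable."""
    key = (user_input or "").strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(
            f"No API key provided: enter one in the sidebar or set {env_var}."
        )
    return key
```

Failing early with a clear message avoids sending a request that the API would reject anyway.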

Summary

This one-page tool streamlines the end-to-end process of:

  1. Loading & inspecting IH datasets
  2. Deriving date/time features & metrics
  3. Auto-generating matching pipeline configs via GenAI
  4. Downloading and applying configurations for value reporting
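
The date/time derivation in item 2 can be illustrated for a single record with the standard library. The page itself works on Polars DataFrames, so this is only a sketch of the logic: the function name `derive_time_features`, the ISO 8601 input format, the string formats chosen for Day, Month, and Quarter, and expressing ResponseTime in seconds are all assumptions.

```python
from datetime import datetime


def derive_time_features(decision_time: str, outcome_time: str) -> dict:
    """Derive calendar fields and response time from two ISO 8601 timestamps."""
    decision_dt = datetime.fromisoformat(decision_time)
    outcome_dt = datetime.fromisoformat(outcome_time)
    return {
        "Day": outcome_dt.date().isoformat(),
        "Month": outcome_dt.strftime("%Y-%m"),
        "Year": outcome_dt.year,
        # Months 1-3 map to Q1, 4-6 to Q2, and so on.
        "Quarter": f"{outcome_dt.year}_Q{(outcome_dt.month - 1) // 3 + 1}",
        # Time delta between decision and outcome, in seconds (assumed unit).
        "ResponseTime": (outcome_dt - decision_dt).total_seconds(),
    }


features = derive_time_features("2024-05-14T10:00:00", "2024-05-14T10:02:30")
print(features)
```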

It reduces the manual effort of maintaining report definitions and helps keep the dashboard pipeline aligned with the current data schema.