Customer Discovery Summary

1. Target Audience (ICP)

Aperture targets two tightly related personas inside enterprises operating in data-constrained or data-restricted industries: Healthcare (Clinical), Finance, Consulting, Insurance, Law, Government, and Computer Vision.

  • Primary ICP: ML/AI Engineers who own model training, fine-tuning, and evaluation but are blocked by sparse, regulated, or proprietary datasets. They need targeted synthetic data for fine-tuning, edge-case coverage, and stress testing, not bulk pretraining data.
  • Secondary ICP: Data Science / Analytics Teams at consulting firms and enterprises who need to prototype and demo models before production data access is granted, and who must justify data reliability to non-technical stakeholders.

Rationale for narrowing: Horizontal players (Gretel/NVIDIA, Mostly AI) already serve the broad synthetic-data market. Our interviews consistently showed that the pain is sharpest in verticals where (a) real data is legally or practically inaccessible, (b) domain structure is non-trivial, and (c) downstream stakeholders demand an explainable validation story. That combination is exactly where horizontal tools underperform and where a vertical-leaning wedge can win.


2. Building Expertise on the User

We built expertise along four tracks:

  • Breadth of interviews. Eight 1:1 conversations across five team members, spanning frontier ML (robot foundation models, interpretability), production ML (Apple embedded CV, Tesla Autopilot simulation), regulated verticals (biotech, nanomedicine), enterprise consulting (Kubrick Group), and early-stage startup ML.
  • Competitive landscape. Reviewed NVIDIA NeMo DataDesigner, Gretel's positioning post-NVIDIA acquisition, and adjacent Google Cloud partnerships to understand what horizontal tooling already covers and what it does not (vertical grounding, internal-data integration, trust reporting).
  • Domain literature. Read rare-disease synthetic-data research (PMC11958975) to pressure-test the claim that SDG meaningfully helps in regulated, sparse-data settings.
  • Framing-shift discipline. After each interview, the interviewer logged a single "framing shift": the one belief we held going in that had to change. Two shifts proved load-bearing and are reflected in the PRD updates below: "Synthetic data is not a supply problem; it's a decision and evaluation problem" (Ali Ahmed) and "Synthetic data is not a replacement for real data; it is a tool for augmentation, coverage, and edge cases" (Daniel Tirado).

3. Customer Interactions

We conducted eight real-time interviews over Zoom or in person; several were recorded via Otter.ai for transcript analysis. Roles ranged from individual contributors to a former Head of AI and a current AI/ML manager at Apple. Breakdown:

  • Enterprise / production ML: 3 (Babak Rasolzadeh, Apple; Taide Ding, ex-Tesla, now VALUENEX; Samuel Pekofsky, Kubrick Group)
  • Frontier ML research: 2 (Ali Ahmed, Rhoda AI; Daniel Tirado, King's College London)
  • Applied domain researchers: 2 (Alp Tartici, Stanford Genetics; Kelvin Tieku, Brown / Desai Lab)
  • Startup / growth-stage ML: 1 (Viktor Lado Naess)

4. Prototypes & Experiments

At this phase we were validating the problem and positioning rather than testing a functional product. What we put in front of interviewees:

  • Concept pitch: a verbal walkthrough of the three-stage pipeline (prompt-to-action agent → internal data connector → validated synthetic dataset via DataDesigner), framed against the specific interviewee's domain; an illustrative sketch of this flow follows the list.
  • Use-case probes: we described four candidate wedges (fine-tuning, eval/stress-testing, validation, augmentation) and watched which ones the interviewee leaned into unprompted.
  • Wedge tests: for interviewees in regulated or data-scarce domains, we floated the "prototype-to-first-real-data" framing and the "augmentation, not replacement" framing to see which landed.
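
To make the pitched pipeline concrete, below is a minimal, purely illustrative sketch of the three-stage flow, written as plain Python. Every function name, field, and value here is a hypothetical placeholder; none of it reflects an actual Aperture or NeMo DataDesigner API.

```python
# Hypothetical sketch of the pitched three-stage pipeline. All names and values are
# illustrative placeholders, not a real Aperture or NeMo DataDesigner API.

def prompt_to_spec(prompt: str) -> dict:
    # Stage 1 - prompt-to-action agent: turn a natural-language request into a dataset spec.
    return {"domain": "clinical", "use_case": "fine-tuning", "rows": 500,
            "edge_cases": ["rare diagnoses", "conflicting lab results"]}

def connect_internal_data(spec: dict) -> dict:
    # Stage 2 - internal data connector: pull schema and a small grounding sample
    # from the customer's own access-restricted data.
    return {"schema": ["age", "diagnosis_code", "lab_panel"], "grounding_sample_size": 25}

def generate_validated_dataset(spec: dict, grounding: dict) -> dict:
    # Stage 3 - generation plus validation: produce synthetic rows and attach a
    # validation report that can be shared with non-technical stakeholders.
    synthetic_rows = [{"row_id": i, "synthetic": True} for i in range(spec["rows"])]
    report = {"columns_covered": grounding["schema"], "edge_cases_targeted": spec["edge_cases"]}
    return {"rows": synthetic_rows, "validation_report": report}

spec = prompt_to_spec("Generate rare-diagnosis encounters to stress-test our triage model")
grounding = connect_internal_data(spec)
dataset = generate_validated_dataset(spec, grounding)
print(len(dataset["rows"]), dataset["validation_report"])
```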

How we measured response:

  • Qualitative: unprompted pain language, push-back clarity, whether the interviewee volunteered a use case we hadn't named, and whether they could articulate who else in their org would care.
  • Lightweight quantitative: post-interview Pain (1–5) and Interest (1–5) scores derived from how concretely the interviewee described the problem and whether they asked to see a demo or be kept in the loop.

We did not yet collect adoption or usage data; that's the next phase once the MVP is in a clickable state.


5. Data Summary

| User | Role | Prototype | Key Action Taken | Pain Level (1–5) | Interest (1–5) | Key Quote |
| --- | --- | --- | --- | --- | --- | --- |
| Viktor Lado Naess | Former Head of AI, startup | Concept pitch | Validated edge-case and validation wedges; offered to intro us to peers | 5 | 4 | Synthetic pipelines are "ad hoc and non-scalable"; the real value is in edge cases, not bulk. |
| Babak Rasolzadeh | AI/ML Manager, Apple (CV, embedded ML) | Concept pitch | Confirmed privacy-driven necessity; pushed hard on validation standardization | 5 | 4 | Synthetic data is a necessity under privacy constraints, but "poor-quality synthetic data actively harms performance." |
| Ali Ahmed | ML Engineer, Rhoda AI (robot foundation models) | Concept pitch | Rejected the "evaluate data upfront" premise; redirected us toward experiment acceleration | 3 | 2 | "Sometimes you just have to do the experiment and see what happens." |
| Daniel Tirado | MSc, King's College London (interpretability) | Concept pitch | Reframed the product from replacement to augmentation; flagged model-collapse risk | 4 | 3 | "Every time you train a model on synthetic data, you end up degrading the quality… that's what model collapse is." |
| Taide Ding | ex-Tesla Autopilot, AI Engineer at VALUENEX | Concept pitch | Identified the prototype-to-first-real-data wedge; challenged the vendor-backup idea | 4 | 4 | "It's useful as a prototype. Once you get past the prototype, nobody wants to do synthetic data." |
| Kelvin Tieku | Undergraduate Researcher, Desai Lab (nanomedicine) | Concept pitch | Confirmed wet-lab prioritization use case; demanded explainable validation | 4 | 4 | "Trust is key; without it, synthetic data generation is just another LLM." |
| Samuel Pekofsky | Data Scientist, Kubrick Group (consulting) | Concept pitch | Validated enterprise prototyping need; emphasized stakeholder-facing reports | 5 | 5 | "If the synthetic data we use is not reliable, then we lose a client in the span of 5 minutes." |
| Alp Tartici | PhD Candidate, Stanford Genetics | Concept pitch | Shared a concrete synthetic-data failure mode; requested diagnostic tooling | 4 | 3 | Theoretically grounded synthetic data "didn't transfer to real data," and he couldn't tell whether the data or the architecture was at fault. |

Aggregate signal: Mean Pain = 4.25 / 5, Mean Interest = 3.6 / 5. Interest tracks pain closely, with the exception of Ali Ahmed (high domain sophistication, lower interest); his skepticism is the single strongest piece of disconfirming evidence we collected and directly shaped the PRD updates below.
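
The aggregate figures are unweighted means over the eight interviews; a minimal check of the arithmetic, with scores transcribed from the table above:

```python
# Pain / Interest scores transcribed from the data summary table, in row order.
pain = [5, 5, 3, 4, 4, 4, 5, 4]
interest = [4, 4, 2, 3, 4, 4, 5, 3]

mean_pain = sum(pain) / len(pain)              # 34 / 8 = 4.25
mean_interest = sum(interest) / len(interest)  # 29 / 8 = 3.625, reported as 3.6

print(f"Mean Pain = {mean_pain:.2f} / 5, Mean Interest = {mean_interest:.2f} / 5")
```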


6. PRD Updates

The interviews forced five concrete changes to the PRD:

1. Elevate validation from a pipeline stage to a first-class product pillar. Every interviewee who had tried synthetic data (Babak, Alp, Kelvin, Samuel, Taide) independently pointed at validation as the bottleneck, not generation. The original PRD treated validation as one bullet inside the pipeline. It now needs its own surface area: benchmark-against-real-data reports, multi-metric dashboards, and a shareable "explainability report" for non-technical stakeholders (directly from Samuel's "lose a client in 5 minutes" concern and Kelvin's trust requirement).

2. Reframe the wedge from "replacement" to "augmentation + prototype-phase unlock." Daniel's model-collapse warning and Taide's "nobody wants synthetic data past the prototype" make it clear that pitching Aperture as a real-data substitute is a losing narrative. The PRD's Value Prop section should reposition Aperture as: (a) the tool that unblocks the prototype-to-first-real-data phase, and (b) an augmentation layer that fills gaps, covers edge cases, and enables stress testing.

3. Prioritize fine-tuning / post-training and eval use cases over pretraining. Ali was explicit that data matters most in post-training, not pretraining, and that frontier labs already use synthetic data but only judge it by downstream task success. The MVP use case list should be reordered to lead with (i) fine-tuning dataset generation and (ii) evaluation and stress-testing dataset generation. Data augmentation comes third; pretraining-scale generation drops off the MVP entirely.

4. Add diagnostic / experiment-acceleration tooling to the roadmap. Alp's failure case (unable to tell whether the synthetic labels or the model architecture were the problem) and Ali's call for faster experimentation cycles point to a feature category the PRD did not contemplate: diagnostics that isolate data-quality issues from model-architecture issues, plus experiment-tracking that helps teams converge on what works. We are adding this to the Nice-to-haves section with a note that it may move up if it proves to be the true wedge.

5. Sharpen the industry niche and explicitly rule some out. Taide pressed hard on the need to name our industry. The PRD currently lists seven verticals; the interviews most strongly supported Healthcare (Clinical), Finance, and Consulting as the first targets, with Computer Vision and Insurance as fast followers. Law and Government are being deprioritized for MVP given longer sales cycles and lack of direct interview signal. The Opportunity section should be tightened accordingly.

Still open after this round: whether the Internal Data Connector or the Validation Pillar is the stronger MVP wedge. We left that question unresolved deliberately; it is the next thing we plan to test with a clickable prototype in front of a subset of these same interviewees.
