A digital services project by Flexion

Final Project
llm-integration
GitHub #60

User Story

As Maya, in order to pick which LLM drives Carlos’s conversational form-filling experience and see how different models perform on adaptive interviewing, I want filling variants to be selectable from Settings → Variants, backed by persona-driven evaluation.

Preconditions

  • #58 (Story 10 variant picker) merged to main
  • #9 (Story 9 conversational sections) merged to main

Acceptance Criteria

  • New evaluation kind filling-interview scores a variant against scripted personas (completeness of collected fields + conditional routing accuracy), mirroring the assignment-10 TextGrad evaluator pattern
  • Variants registered: filling/haiku, filling/sonnet, filling/opus
  • Filling tab in Settings → Variants renders all three variants
  • Session records variantId at start (new column on form_sessions table)
  • <VariantBadge task="filling" ...> rendered on conversation UI
  • Persona fixtures committed (5 train + 3 test minimum)
  • New catalog suite catalog/experiments/filling-model-comparison/ with _suite.md + one markdown per variant, each with metrics + findings
  • catalog/experiments/_roadmap.md updated with shipped status and one-line finding
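To make the filling-interview evaluation kind concrete, here is a minimal sketch of a persona fixture and its scorer. The type names, field names, and the longest-prefix routing metric are all assumptions for illustration; the real fixture shape and scorer should follow the assignment-10 TextGrad evaluator pattern referenced above.

```typescript
// Hypothetical shapes -- illustrative only; the real fixtures and evaluator
// should mirror the assignment-10 pattern.
interface PersonaFixture {
  id: string;
  answers: Record<string, string>; // field id -> scripted answer
  expectedRoute: string[];         // section ids the interview should visit, in order
}

interface SessionResult {
  collected: Record<string, string>; // fields the variant actually captured
  visitedRoute: string[];            // sections the variant actually visited
}

interface FillingScore {
  completeness: number;    // fraction of scripted fields captured
  routingAccuracy: number; // fraction of the expected route matched as a prefix
}

function scoreFillingInterview(
  persona: PersonaFixture,
  result: SessionResult,
): FillingScore {
  const expectedFields = Object.keys(persona.answers);
  const captured = expectedFields.filter((f) => f in result.collected);
  const completeness =
    expectedFields.length === 0 ? 1 : captured.length / expectedFields.length;

  // Routing accuracy as a longest-prefix match against the expected section
  // order (one plausible metric; the real one may differ).
  let matched = 0;
  while (
    matched < persona.expectedRoute.length &&
    matched < result.visitedRoute.length &&
    persona.expectedRoute[matched] === result.visitedRoute[matched]
  ) {
    matched++;
  }
  const routingAccuracy =
    persona.expectedRoute.length === 0 ? 1 : matched / persona.expectedRoute.length;

  return { completeness, routingAccuracy };
}
```

A scorer like this runs once per persona per variant; averaging the two numbers across the committed fixtures gives the per-variant metrics recorded in the catalog suite.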

Success Metrics

  • Observable tradeoff across variants (e.g., Opus higher completeness, Haiku faster/cheaper)
  • Persona-driven eval runs reproducibly from bun run cli evaluate run <variant>

Notes

  • Port evaluator shape from llm-class-2026-winter-cohort/notes/assignment-10/ — persona-driven simulator + completeness scorer
  • Filling registry stub exists at src/services/forms/filling/registry.ts (empty); extend it
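Since the registry stub is empty, the shape below is an assumption: a simple id-to-config map that registers the three variants from the acceptance criteria. The `model` identifiers are placeholders, not confirmed provider strings.

```typescript
// Sketch for src/services/forms/filling/registry.ts -- the map-based shape and
// all field names are assumptions; the stub does not yet define an interface.
export interface FillingVariant {
  id: string;    // e.g. "filling/haiku"
  model: string; // provider model identifier (placeholder values below)
  label: string; // display name for Settings -> Variants
}

const variants = new Map<string, FillingVariant>();

export function registerVariant(v: FillingVariant): void {
  if (variants.has(v.id)) throw new Error(`duplicate variant: ${v.id}`);
  variants.set(v.id, v);
}

export function listVariants(): FillingVariant[] {
  return [...variants.values()];
}

// The three variants named in the acceptance criteria.
registerVariant({ id: "filling/haiku", model: "claude-haiku", label: "Haiku" });
registerVariant({ id: "filling/sonnet", model: "claude-sonnet", label: "Sonnet" });
registerVariant({ id: "filling/opus", model: "claude-opus", label: "Opus" });
```

Whatever the final shape, keeping registration in one module lets the Filling tab and the evaluate CLI enumerate the same three variants from a single source of truth.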

Definition of Done

  • Acceptance criteria met
  • Tests pass
  • Type checking passes
  • CI pipeline green
  • Deployed and demoable