User Story
As Maya, in order to get better extraction quality without changing the underlying model, I want an extraction variant whose system prompt has been automatically optimized against the evaluation suite.
Preconditions
- #58 (Story 10 variant picker) merged to main
Acceptance Criteria
- New service `src/services/prompt-optimization/`: a thin wrapper around TextGrad (or equivalent) that drives our existing `EvaluationKind` harness as its training signal
- Optimization script produces a concrete optimized prompt for extraction, committed as a file under `src/services/extraction/prompts/optimized-v1.txt` (or similar)
- New variant `extraction/sonnet-optimized-v1` that loads the optimized prompt at construction
- Extraction tab in Settings → Variants lists the optimized variant
- Evaluation run comparing baseline `sonnet` vs `sonnet-optimized-v1` on all fixtures
- New catalog page `catalog/experiments/pdf-field-extraction/sonnet-optimized-v1.md` including: optimization setup (epochs, batch size, forward/backward models), before/after prompt snippets, and metric deltas
- `catalog/experiments/_roadmap.md` updated with shipped status and a one-line finding
- Harness documented well enough that follow-on stories can run it against the shaping and filling suites
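The wrapper's core is an optimization loop that scores candidate prompts with the eval harness and keeps the best one. A minimal, offline sketch, assuming a hypothetical `run_eval_harness` scoring function in place of the real `EvaluationKind` harness, and a fixed candidate pool in place of TextGrad's backward-model textual gradients:

```python
# Hypothetical stand-in for the EvaluationKind harness: scores a prompt
# against the fixtures and returns a metric dict.
def run_eval_harness(prompt: str) -> dict:
    # Toy scoring: reward prompts that mention the required instructions.
    hits = sum(kw in prompt for kw in ("extract", "JSON", "fields"))
    score = hits / 3
    return {"recall": score, "precision": score}

def optimize_prompt(seed: str, candidates: list[str], epochs: int = 2) -> str:
    """TextGrad-style hill climb: keep whichever prompt scores best on the
    eval harness. In the real wrapper, candidates would come from the
    backward model's critique of failures, not a fixed pool."""
    best, best_recall = seed, run_eval_harness(seed)["recall"]
    for _ in range(epochs):
        for cand in candidates:
            recall = run_eval_harness(cand)["recall"]
            if recall > best_recall:
                best, best_recall = cand, recall
    return best
```

The sketch optimizes for recall only; the real harness should expose all three metrics so the script can report the before/after deltas the catalog page needs.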
Success Metrics
- Optimized variant beats baseline on at least one metric (recall, precision, or sensitivity)
- Optimization is reproducible via a single CLI command
Notes
- Class topic: prompt optimization, Assignment 10
- Use Opus as the backward model if it fits the budget; Sonnet otherwise
- Document hyperparameters for reproducibility
- Follow-on stories (not in scope here): `shaping/sonnet-optimized-v1` and `filling/sonnet-optimized-v1` (same harness, different eval kind)
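One way to satisfy the "document hyperparameters" note is to commit a small run config next to the optimized prompt. A hypothetical sketch; every key, value, and path here is illustrative, not decided:

```json
{
  "forward_model": "sonnet",
  "backward_model": "opus",
  "epochs": 3,
  "batch_size": 4,
  "eval_kind": "extraction",
  "output": "src/services/extraction/prompts/optimized-v1.txt"
}
```

Checking this in alongside the prompt file also gives the follow-on shaping and filling stories a template to copy.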
Definition of Done
- Acceptance criteria met
- Tests pass
- Type checking passes
- CI pipeline green
- Deployed and demoable
A digital services project by Flexion