A digital services project by Flexion

Closed · Final Project · GitHub #87

User Story

As a form creator (Maya), in order to build a compliant SNAP application grounded in federal and state policy without memorizing regulations, I want an agent-guided pipeline that reads my policy corpus, proposes evaluation criteria, generates form structure and fields with regulatory citations, and auto-evaluates its output before I review it.

Preconditions

  • Existing shaping command infrastructure (24 commands, executor, staged-changes UI)
  • RAG service with corpus loader, embeddings, and in-memory retrieval
  • Project git repo with references/ directory support
  • Bedrock access for Sonnet and Haiku models
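The project-scoped corpus loading named in the preconditions and acceptance criteria could look like the sketch below. The `referenceGlobs` function and `CorpusOptions` shape are illustrative assumptions, not the actual loader API:

```typescript
// Hypothetical sketch: which reference globs the corpus loader scans.
// With no projectDir, only the built-in catalog fixtures are loaded;
// with projectDir set, project-scoped references/*.md are added.
import * as path from "path";

interface CorpusOptions {
  projectDir?: string; // opt-in: also scan <projectDir>/references/*.md
}

function referenceGlobs(opts: CorpusOptions): string[] {
  const globs = ["catalog/references/*.md"]; // e.g. snap-wisconsin.md fixture
  if (opts.projectDir) {
    globs.push(path.join(opts.projectDir, "references", "*.md"));
  }
  return globs;
}
```

The key design point is that project references extend, rather than replace, the catalog fixtures, so the pre-loaded SNAP Wisconsin corpus keeps working when a project adds its own policy documents.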

Acceptance Criteria

  • SNAP Wisconsin policy corpus (~10-15 chunks from 7 CFR 273) available as fixture in catalog/references/snap-wisconsin.md
  • SNAP Wisconsin test fixture in fixtures/snap-wisconsin/ with manifest, ground-truth, and source PDF
  • Corpus loader supports project-scoped references/*.md via projectDir option
  • Stage 1: Agent analyzes corpus and proposes evaluation criteria as English sentences with regulatory citations
  • Stage 1: Human can review, edit, add, remove, and approve criteria via UI
  • Stage 2: Agent proposes page/group skeleton as shaping commands grounded in approved criteria and corpus
  • Stage 2: Human reviews commands in staged-changes UI, can accept/reject/redirect via chat
  • Stage 3: Agent generates fields per section with topic-scoped RAG retrieval (~5-15 commands per group)
  • Stage 4: LLM-as-judge evaluates each section against approved criteria (pass/fail/partial)
  • Stage 4: Failed criteria trigger automatic retry with specific feedback (max 2 retries)
  • Eval scorecard displayed alongside staged commands before human review
  • Pipeline artifacts persisted in git: criteria.json, eval-results.json
  • Per-stage model configuration (Sonnet for stages 1-3, Haiku for eval)
  • Stage indicator in project UI showing pipeline progress
  • Existing reactive shaping chat continues to work at any pipeline stage
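The persisted artifacts (criteria.json, eval-results.json) and the scorecard shown before human review might be shaped roughly as follows. All field names here are illustrative assumptions, not the final schema:

```typescript
// Sketch of the pipeline artifacts persisted in git. Field names are
// assumptions; the real schema lives in the design spec.
type Verdict = "pass" | "fail" | "partial";

interface Criterion {
  id: string;
  text: string;      // English sentence proposed by the Stage 1 agent
  citation: string;  // regulatory citation, e.g. a 7 CFR 273 section
  approved: boolean; // set during human review in Stage 1
}

interface EvalResult {
  criterionId: string;
  verdict: Verdict;
  feedback?: string; // specific feedback fed into the Stage 4 retry on fail
}

// Scorecard summary displayed alongside staged commands before review.
function scorecard(results: EvalResult[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { pass: 0, fail: 0, partial: 0 };
  for (const r of results) counts[r.verdict] += 1;
  return counts;
}
```

Keeping both artifacts as plain JSON in the project repo means criteria edits and eval outcomes are diffable in normal git review, alongside the staged shaping commands.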

Success Metrics

  • Pipeline produces a complete SNAP form (6+ pages, 10+ groups, 50+ fields) from corpus alone
  • Auto-eval catches at least one missing regulatory requirement per run and self-corrects via retry
  • Policy expert (persona A) can skip stages and shape manually without friction
  • Form output usable as test fixture by existing evaluation harness
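The structural thresholds in the first metric (6+ pages, 10+ groups, 50+ fields) reduce to a simple gate; this sketch uses hypothetical names, since the real check would run against the generated shaping commands:

```typescript
// Minimal sketch of the completeness gate implied by the success metrics.
interface FormStats {
  pages: number;
  groups: number;
  fields: number;
}

function meetsCompleteness(s: FormStats): boolean {
  return s.pages >= 6 && s.groups >= 10 && s.fields >= 50;
}
```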

Notes

  • Design spec: notes/2026-04-19-rag-authoring-pipeline-design.md
  • New service at src/services/form-authoring/ — does not modify existing shaper
  • When source PDF is provided (Stage 0), existing extraction pipeline bootstraps initial state
  • Corpus upload UI is deferred — corpus pre-loaded as fixtures for now
  • Repeating groups (household members) modeled as static Member 1/2/3 — known limitation
  • Fallback scope: cut auto-eval inner loop if time-constrained; criteria still generated and reviewed
  • Temperature 0 across all stages for determinism
  • Estimated ~25 Bedrock calls for a full SNAP form run (~3-5 min wall clock)
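The Stage 4 inner loop described above (judge each section, retry with specific feedback, max 2 retries, which is also the piece cut under the fallback scope) can be sketched as below. `generate` and `judge` are stand-ins for the real Bedrock-backed calls (Sonnet generator, Haiku judge, temperature 0):

```typescript
// Sketch of the auto-eval inner loop: generate a section, judge it against
// approved criteria, and retry with the judge's feedback up to maxRetries
// times. The callback signatures are illustrative assumptions.
interface Judgement {
  verdict: "pass" | "fail" | "partial";
  feedback?: string;
}

function runSection(
  generate: (feedback?: string) => string,
  judge: (output: string) => Judgement,
  maxRetries = 2,
): { output: string; attempts: number; verdict: Judgement["verdict"] } {
  let feedback: string | undefined;
  for (let attempt = 1; ; attempt++) {
    const output = generate(feedback);
    const j = judge(output);
    if (j.verdict === "pass" || attempt > maxRetries) {
      return { output, attempts: attempt, verdict: j.verdict };
    }
    feedback = j.feedback; // failed criteria feed the next attempt
  }
}
```

With 2 retries, a section costs at most 3 generation calls plus its judge calls, which is consistent with the ~25-call estimate for a full run when most sections pass on the first attempt.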

Definition of Done

  • Acceptance criteria met
  • Threat model updated if security-relevant
  • Tests pass
  • Type checking passes
  • CI pipeline green
  • Deployed and demoable