A digital services project by Flexion

LLM Integrations

This page enumerates every place Forms Lab calls a large language model. Use it as the map to “find every LLM call site” in the codebase.

Links use the src: scheme, resolving to GitHub permalinks pinned to the exact commit the running app was built from. Clicking a link opens the source at the currently deployed commit.

Extraction

PDF field extraction (form-documents)

Tool-use extraction (form-documents)

Shaping

Form shaping (forms/shaping)

Filling

Conversational form filling (forms/filling-agent)

Evaluation

LLM-as-judge (evaluation)

Retrieval-augmented generation (RAG)

The RAG service (src/services/rag/) provides in-memory vector retrieval over policy corpus documents. It is used both by extraction (to ground field generation in regulatory text) and by the authoring pipeline (to drive criteria analysis and field generation from the corpus alone).

Components:

  • Corpus loader (corpus.ts) — Reads catalog/references/*.md files with YAML frontmatter + ## Section — <citation> headings. Also supports project-scoped references/ via projectDir option.
  • Retriever (retrieval.ts) — In-memory cosine similarity over L2-normalized embeddings. O(n·d) per query; fast for ≤50 chunks.
  • Embedder — AWS Bedrock Titan Embed V2 (production) or deterministic hash fallback (offline/testing).
  • Policy corpus — 13 sections from 7 CFR 273 (SNAP Wisconsin) at catalog/references/snap-wisconsin.md.
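A minimal sketch of the retriever component: because embeddings are L2-normalized, cosine similarity reduces to a plain dot product, which is where the O(n·d) per-query cost comes from. The types and function names below are illustrative, not the real retrieval.ts API.

```typescript
// One corpus chunk with its precomputed, L2-normalized embedding.
type Chunk = { id: string; text: string; embedding: number[] };

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Rank every chunk against a normalized query embedding and keep the top k.
// A full sort is fine at this scale (the corpus is under 50 chunks).
function retrieveTopK(chunks: Chunk[], query: number[], k = 2): Chunk[] {
  return [...chunks]
    .sort((a, b) => dot(b.embedding, query) - dot(a.embedding, query))
    .slice(0, k);
}
```

In production the query embedding would come from Titan Embed V2; the hash fallback only needs to be deterministic, since the same function embeds both corpus and query.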

Query flow (extraction):

  1. Extraction variant registers a retriever via rag-corpus.ts — lazy-initializes embeddings on first use
  2. During extraction (extraction.ts:192-198), the retriever is queried with the fixture slug (or first 500 bytes of PDF as fallback)
  3. Top-k chunks (default: 2) are retrieved by cosine similarity over Titan embeddings
  4. Retrieved chunks are formatted via buildPolicyContextSection() and prepended to the extraction prompt
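The query flow above can be sketched end to end. The retriever interface, the buildPolicyContextSection formatting, and the exact signatures are assumptions based on the steps listed, not the real extraction.ts code.

```typescript
type RetrievedChunk = { citation: string; text: string };

// Format retrieved policy text for prepending to the extraction prompt
// (an assumed shape for what buildPolicyContextSection produces).
function buildPolicyContextSection(chunks: RetrievedChunk[]): string {
  return [
    "## Policy context",
    ...chunks.map((c) => `### ${c.citation}\n${c.text}`),
  ].join("\n\n");
}

async function groundedPrompt(
  retriever: { retrieve(query: string, k: number): Promise<RetrievedChunk[]> },
  fixtureSlug: string | undefined,
  pdfBytes: Uint8Array,
  basePrompt: string,
): Promise<string> {
  // Query with the fixture slug, falling back to the first 500 bytes of PDF.
  const query = fixtureSlug ?? new TextDecoder().decode(pdfBytes.slice(0, 500));
  const chunks = await retriever.retrieve(query, 2); // top-k, default 2
  return `${buildPolicyContextSection(chunks)}\n\n${basePrompt}`;
}
```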

Corpus flow (authoring):

  1. The authoring pipeline loads the FULL corpus via loadPolicyCorpus({ slug: 'snap-wisconsin' }) — no vector retrieval needed since the corpus is small (~13 chunks)
  2. All chunks are passed directly into prompt builders (prompts.ts)
  3. Each prompt includes the complete regulatory text so the LLM can cite specific sections
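Because the whole corpus fits in the prompt at this scale, the authoring flow skips vector retrieval and inlines every chunk. The chunk shape and the prompt builder below are assumptions for illustration, not the real prompts.ts; the heading format mirrors the corpus loader's "## Section — <citation>" convention.

```typescript
// One section of the policy corpus as loaded from the markdown file.
type CorpusChunk = { section: string; citation: string; text: string };

// Inline the complete regulatory text so the model can cite specific sections.
function buildCriteriaPrompt(chunks: CorpusChunk[]): string {
  const policy = chunks
    .map((c) => `## ${c.section} — ${c.citation}\n${c.text}`)
    .join("\n\n");
  return `Derive evaluation criteria, citing specific sections.\n\n${policy}`;
}
```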

Integration points:

  • src/services/extraction/ — RAG-grounded extraction variants use vector retrieval for top-k lookup (rag-corpus.ts)
  • src/services/form-documents/extraction.ts — Call site where retriever.retrieve() is awaited and chunks injected into prompts
  • src/services/form-authoring/ — Full pipeline passes entire corpus to each stage (no retrieval query needed at current scale)

Variant settings: Three independent tasks in the variant system:

  • Authoring: Criteria Analysis (Sonnet/Haiku/Opus)
  • Authoring: Structure Generation (Sonnet/Haiku/Opus)
  • Authoring: Field Generation (Sonnet/Haiku/Opus)

Related experiments: Authoring pipeline, RAG extraction

Form authoring pipeline

The authoring pipeline (src/services/form-authoring/) generates complete form specifications from a policy corpus without a source PDF. Runs as a server-side background task with client polling for progress.

Stages:

  1. Criteria analysis — LLM reads corpus, produces evaluation criteria with regulatory citations (generateObject with structured schema)
  2. Structure generation — LLM proposes pages via addPage tool calls (generateText with toolChoice: required)
  3. Group creation — Deterministic: one group per page using real page IDs from step 2
  4. Field generation — Per group, LLM proposes addField tool calls grounded in criteria and corpus
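The four stages above can be wired in sequence, with the three LLM-backed stages injected as functions. The stage names and shapes here are assumptions drawn from the list, not the real createAuthoringPipeline factory.

```typescript
type Page = { id: string; title: string };
type Group = { id: string; pageId: string };

type Stages = {
  analyzeCriteria(corpus: string[]): Promise<string[]>; // stage 1: generateObject
  generateStructure(corpus: string[]): Promise<Page[]>; // stage 2: addPage tool calls
  generateFields(g: Group, criteria: string[], corpus: string[]): Promise<string[]>; // stage 4
};

async function runAuthoring(corpus: string[], s: Stages) {
  const criteria = await s.analyzeCriteria(corpus);
  const pages = await s.generateStructure(corpus);
  // Stage 3 is deterministic: one group per page, using the real page IDs.
  const groups: Group[] = pages.map((p) => ({ id: `group-${p.id}`, pageId: p.id }));
  const fields: string[] = [];
  for (const g of groups) {
    fields.push(...(await s.generateFields(g, criteria, corpus)));
  }
  return { criteria, pages, groups, fields };
}
```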

Key files:

  • Pipeline (pipeline.ts) — createAuthoringPipeline(config) factory, per-stage model configuration
  • Prompts (prompts.ts) — Stage-specific prompt builders
  • Registry (registry.ts) — Three variant registries (criteria, structure, generation)
  • Route orchestrator (authoring.tsx) — Server-side runBuild background task with progress tracking
  • Evaluator (evaluator.ts) — LLM-as-judge for criteria scoring (Haiku)

CLI evaluation: bun run cli evaluate authoring <variant-id> runs the pipeline against the SNAP ground-truth fixture and scores field recall, precision, and type accuracy.

Related experiment: Authoring pipeline

Adding a new LLM integration

When you introduce a new LLM call site:

  1. Place the invocation inside a service. If it doesn’t fit an existing service, propose a new one (see software architecture).
  2. Expose the orchestrating function through the service’s index.ts.
  3. Add an entry on this page with a src: link to the invocation site and any relevant experiment cross-link.
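A hedged sketch of steps 1 and 2: the new call site lives behind a single service function with the model client injected, so only the orchestrating function needs re-exporting from the service's index.ts. The summarizeDocument service and ModelClient interface are hypothetical examples, not existing code.

```typescript
// An assumed minimal model-client interface; the real app's client differs.
type ModelClient = { complete(prompt: string): Promise<string> };

// The single, easily located invocation this page would then link to.
async function summarizeDocument(model: ModelClient, text: string): Promise<string> {
  return model.complete(`Summarize the following document:\n\n${text}`);
}
```

Keeping the invocation behind one exported function is what makes the src: link on this page stable: there is exactly one call site to pin.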