A digital services project by Flexion

LLM Integrations

This page enumerates every place Forms Lab calls a large language model. Use it as the map to “find every LLM call site” in the codebase.

Links use the src: scheme, resolving to GitHub permalinks pinned to the exact commit the running app was built from. Clicking a link opens the source at the currently deployed commit.

Extraction

PDF field extraction (form-documents)

Tool-use extraction (form-documents)

Shaping

Form shaping (forms/shaping)

Filling

Conversational form filling (forms/filling-agent)

Evaluation

LLM-as-judge (evaluation)

Retrieval-augmented generation (RAG)

The RAG service (src/services/rag/) provides in-memory vector retrieval over policy corpus documents. It is used both by extraction (to ground field generation in regulatory text) and by the authoring pipeline (to drive criteria analysis and field generation from the corpus alone).

Components:

  • Corpus loader (corpus.ts) — Reads catalog/references/*.md files with YAML frontmatter + ## Section — <citation> headings. Also supports project-scoped references/ via projectDir option.
  • Retriever (retrieval.ts) — In-memory cosine similarity over L2-normalized embeddings. O(n·d) per query; fast for ≤50 chunks.
  • Embedder — AWS Bedrock Titan Embed V2 (production) or deterministic hash fallback (offline/testing).
  • Policy corpus — 13 sections from 7 CFR 273 (SNAP Wisconsin) at catalog/references/snap-wisconsin.md.
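A minimal sketch of the retriever component: because embeddings are L2-normalized, cosine similarity reduces to a plain dot product, which is where the O(n·d) per-query cost comes from. The types and function names below are illustrative, not the real retrieval.ts API.

```typescript
// One corpus chunk with its precomputed, L2-normalized embedding.
type Chunk = { id: string; text: string; embedding: number[] };

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Rank every chunk against a normalized query embedding and keep the top k.
// A full sort is fine at this scale (the corpus is under 50 chunks).
function retrieveTopK(chunks: Chunk[], query: number[], k = 2): Chunk[] {
  return [...chunks]
    .sort((a, b) => dot(b.embedding, query) - dot(a.embedding, query))
    .slice(0, k);
}
```

In production the query embedding would come from Titan Embed V2; the hash fallback only needs to be deterministic, since the same function embeds both corpus and query.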

Query flow (extraction):

  1. Extraction variant registers a retriever via rag-corpus.ts — lazy-initializes embeddings on first use
  2. During extraction (extraction.ts:192-198), the retriever is queried with the fixture slug (or first 500 bytes of PDF as fallback)
  3. Top-k chunks (default: 2) are retrieved by cosine similarity over Titan embeddings
  4. Retrieved chunks are formatted via buildPolicyContextSection() and prepended to the extraction prompt
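The query flow above can be sketched end to end. The retriever interface, the buildPolicyContextSection formatting, and the exact signatures are assumptions based on the steps listed, not the real extraction.ts code.

```typescript
type RetrievedChunk = { citation: string; text: string };

// Format retrieved policy text for prepending to the extraction prompt
// (an assumed shape for what buildPolicyContextSection produces).
function buildPolicyContextSection(chunks: RetrievedChunk[]): string {
  return [
    "## Policy context",
    ...chunks.map((c) => `### ${c.citation}\n${c.text}`),
  ].join("\n\n");
}

async function groundedPrompt(
  retriever: { retrieve(query: string, k: number): Promise<RetrievedChunk[]> },
  fixtureSlug: string | undefined,
  pdfBytes: Uint8Array,
  basePrompt: string,
): Promise<string> {
  // Query with the fixture slug, falling back to the first 500 bytes of PDF.
  const query = fixtureSlug ?? new TextDecoder().decode(pdfBytes.slice(0, 500));
  const chunks = await retriever.retrieve(query, 2); // top-k, default 2
  return `${buildPolicyContextSection(chunks)}\n\n${basePrompt}`;
}
```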

Corpus flow (authoring):

  1. The authoring pipeline loads the FULL corpus via loadPolicyCorpus({ slug: 'snap-wisconsin' }) — no vector retrieval needed since the corpus is small (~13 chunks)
  2. All chunks are passed directly into prompt builders (prompts.ts)
  3. Each prompt includes the complete regulatory text so the LLM can cite specific sections
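Because the whole corpus fits in the prompt at this scale, the authoring flow skips vector retrieval and inlines every chunk. The chunk shape and the prompt builder below are assumptions for illustration, not the real prompts.ts; the heading format mirrors the corpus loader's "## Section — <citation>" convention.

```typescript
// One section of the policy corpus as loaded from the markdown file.
type CorpusChunk = { section: string; citation: string; text: string };

// Inline the complete regulatory text so the model can cite specific sections.
function buildCriteriaPrompt(chunks: CorpusChunk[]): string {
  const policy = chunks
    .map((c) => `## ${c.section} — ${c.citation}\n${c.text}`)
    .join("\n\n");
  return `Derive evaluation criteria, citing specific sections.\n\n${policy}`;
}
```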

Integration points:

  • src/services/extraction/ — RAG-grounded extraction variants use vector retrieval for top-k lookup (rag-corpus.ts)
  • src/services/form-documents/extraction.ts — Call site where retriever.retrieve() is awaited and chunks injected into prompts
  • src/services/form-authoring/ — Full pipeline passes entire corpus to each stage (no retrieval query needed at current scale)

Variant settings: Three independent tasks in the variant system:

  • Authoring: Criteria Analysis (Sonnet/Haiku/Opus)
  • Authoring: Structure Generation (Sonnet/Haiku/Opus)
  • Authoring: Field Generation (Sonnet/Haiku/Opus)

Related experiments: Authoring pipeline, RAG extraction

Form authoring pipeline

The authoring pipeline (src/services/form-authoring/) generates complete form specifications from a policy corpus without a source PDF. Runs as a server-side background task with client polling for progress.

Stages:

  1. Criteria analysis — LLM reads corpus, produces evaluation criteria with regulatory citations (generateObject with structured schema)
  2. Structure generation — LLM proposes pages via addPage tool calls (generateText with toolChoice: required)
  3. Group creation — Deterministic: one group per page using real page IDs from step 2
  4. Field generation — Per group, LLM proposes addField tool calls grounded in criteria and corpus
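The four stages above can be wired in sequence, with the three LLM-backed stages injected as functions. The stage names and shapes here are assumptions drawn from the list, not the real createAuthoringPipeline factory.

```typescript
type Page = { id: string; title: string };
type Group = { id: string; pageId: string };

type Stages = {
  analyzeCriteria(corpus: string[]): Promise<string[]>; // stage 1: generateObject
  generateStructure(corpus: string[]): Promise<Page[]>; // stage 2: addPage tool calls
  generateFields(g: Group, criteria: string[], corpus: string[]): Promise<string[]>; // stage 4
};

async function runAuthoring(corpus: string[], s: Stages) {
  const criteria = await s.analyzeCriteria(corpus);
  const pages = await s.generateStructure(corpus);
  // Stage 3 is deterministic: one group per page, using the real page IDs.
  const groups: Group[] = pages.map((p) => ({ id: `group-${p.id}`, pageId: p.id }));
  const fields: string[] = [];
  for (const g of groups) {
    fields.push(...(await s.generateFields(g, criteria, corpus)));
  }
  return { criteria, pages, groups, fields };
}
```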

Key files:

  • Pipeline (pipeline.ts) — createAuthoringPipeline(config) factory, per-stage model configuration
  • Prompts (prompts.ts) — Stage-specific prompt builders
  • Registry (registry.ts) — Three variant registries (criteria, structure, generation)
  • Route orchestrator (authoring.tsx) — Server-side runBuild background task with progress tracking
  • Evaluator (evaluator.ts) — LLM-as-judge for criteria scoring (Haiku)

CLI evaluation: bun run cli evaluate authoring <variant-id> runs the pipeline against the SNAP ground-truth fixture and scores field recall, precision, and type accuracy.

Related experiment: Authoring pipeline

Adding a new LLM integration

When you introduce a new LLM call site:

  1. Place the invocation inside a service. If it doesn’t fit an existing service, propose a new one (see software architecture).
  2. Expose the orchestrating function through the service’s index.ts.
  3. Add an entry on this page with a src: link to the invocation site and any relevant experiment cross-link.
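A hedged sketch of steps 1 and 2: the new call site lives behind a single service function with the model client injected, so only the orchestrating function needs re-exporting from the service's index.ts. The summarizeDocument service and ModelClient interface are hypothetical examples, not existing code.

```typescript
// An assumed minimal model-client interface; the real app's client differs.
type ModelClient = { complete(prompt: string): Promise<string> };

// The single, easily located invocation this page would then link to.
async function summarizeDocument(model: ModelClient, text: string): Promise<string> {
  return model.complete(`Summarize the following document:\n\n${text}`);
}
```

Keeping the invocation behind one exported function is what makes the src: link on this page stable: there is exactly one call site to pin.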