User Story:
As a form creator (Maya), in order to digitize a paper form without technical skills, I want to upload a PDF form and review the structured specs the system extracted from it
Preconditions:
- Maya is authenticated (Slice 1)
- PDF form available for upload
Acceptance Criteria:
- Upload page accepts PDF files
- System extracts structure from PDF and produces a DataCollectionSpec
- System generates a default FormSpec based on the extracted DataCollectionSpec
- Both specs are displayed in the catalog as browsable, reviewable content
- Maya can see what fields were extracted, their types, grouping, and conditions
- Maya can see the proposed form layout (pages, sections, delivery modes)
- Extracted specs are persisted as a FormProject in git
- Extraction errors or low-confidence fields are flagged for review
- Form projects are stored as bare git repos with version history
- Project detail page shows version history with commit-level snapshots
- Projects are publicly viewable at user-scoped URLs (/:owner/:slug)
- Mutations (delete, re-extract) are restricted to project owners via service-layer permission checks
- Authenticated users can fork projects they do not own
- User profile pages list a user’s projects at /:owner
- Git repository browsing (tree, blob, commits) available at GitHub-style URLs
- Read-only git clone served over HTTP
- Home page shows dashboard for authenticated users, landing page for anonymous visitors
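The permission criteria above (public reads, owner-only mutations, forking by other authenticated users) could be sketched at the service layer roughly as follows. This is a minimal illustration, not the project's actual code: the Project and User shapes, the PermissionError class, the method names, and the in-memory map standing in for git-backed storage are all assumptions. Refusing to fork one's own project is also an assumed reading of "projects they do not own".

```typescript
// Hypothetical shapes; the story names only ProjectService and /:owner/:slug URLs.
interface Project {
  owner: string;
  slug: string;
}

interface User {
  username: string;
}

class PermissionError extends Error {}

class ProjectService {
  // In-memory store keyed by "owner/slug", standing in for the bare git repos.
  private projects = new Map<string, Project>();

  add(project: Project): void {
    this.projects.set(`${project.owner}/${project.slug}`, project);
  }

  // Reads are public, so no acting user is required.
  get(owner: string, slug: string): Project | undefined {
    return this.projects.get(`${owner}/${slug}`);
  }

  // Mutations require the acting user to be the project owner.
  delete(actor: User | null, owner: string, slug: string): void {
    const project = this.get(owner, slug);
    if (!project) throw new Error("project not found");
    if (!actor || actor.username !== project.owner) {
      throw new PermissionError("only the owner may delete this project");
    }
    this.projects.delete(`${owner}/${slug}`);
  }

  // Any authenticated user may fork a project they do not own (assumed rule).
  fork(actor: User | null, owner: string, slug: string): Project {
    const source = this.get(owner, slug);
    if (!source) throw new Error("project not found");
    if (!actor) throw new PermissionError("sign in to fork");
    if (actor.username === source.owner) {
      throw new PermissionError("cannot fork your own project");
    }
    const copy: Project = { owner: actor.username, slug: source.slug };
    this.add(copy);
    return copy;
  }
}
```

With checks placed here, a route handler only needs to resolve the current user and call the service, which keeps handlers thin.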
Success Metrics:
- Extraction accuracy: percentage of fields correctly identified vs. source PDF
- Time from upload to reviewable spec < 30 seconds
- Establish baseline evaluation metrics for LLM extraction quality
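One possible way to compute the extraction accuracy metric is sketched below. The story does not define what counts as "correctly identified", so this assumes a ground-truth field is correct when the extracted spec contains a field with the same name and the same type. The FieldSpec shape and the extractionAccuracy function are hypothetical names used only for illustration.

```typescript
// Hypothetical field shape; the real DataCollectionSpec may carry more detail.
interface FieldSpec {
  name: string;
  type: string;
}

// Returns the percentage of ground-truth fields that the extraction
// reproduced with a matching name and type (assumed matching rule).
function extractionAccuracy(extracted: FieldSpec[], truth: FieldSpec[]): number {
  if (truth.length === 0) {
    throw new Error("ground truth must contain at least one field");
  }
  const typeByName = new Map<string, string>(
    extracted.map((f): [string, string] => [f.name, f.type])
  );
  let correct = 0;
  for (const field of truth) {
    if (typeByName.get(field.name) === field.type) correct++;
  }
  return (correct / truth.length) * 100;
}
```

Under this rule, extra fields that appear only in the extraction do not lower the score; a stricter baseline might track them separately as false positives.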
Notes:
- First LLM integration point — uses Claude API (Opus/Sonnet baseline)
- LLM service uses strategy pattern: PdfExtractor interface with ApiPdfExtractor implementation
- Evaluation: compare extracted spec against manually created ground truth for test PDFs
- Future experiments: alternative models, prompting strategies, chunking approaches
- Form projects stored as bare git repos at data/repos/<slug>.git
- ProjectService layer enforces ownership permissions; route handlers are thin wrappers
- GitHub-style URL structure: /:owner/:slug, /:owner/:slug/tree/:ref/*, /:owner/:slug/settings, etc.
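The strategy pattern mentioned in the notes above might look roughly like the sketch below. Only the names PdfExtractor, ApiPdfExtractor, and DataCollectionSpec come from this story; the method signature, the spec shape, the injected LlmCall function (a stand-in for whatever client actually talks to the Claude API), and StubPdfExtractor are assumptions for illustration.

```typescript
// Hypothetical minimal spec shape; the real DataCollectionSpec is richer.
interface DataCollectionSpec {
  fields: { name: string; type: string }[];
}

// The strategy interface: callers depend on this, not on a concrete extractor.
interface PdfExtractor {
  extract(pdf: Uint8Array): Promise<DataCollectionSpec>;
}

// Hypothetical transport: sends PDF bytes to an LLM and resolves to its raw
// text reply, assumed here to be JSON.
type LlmCall = (pdf: Uint8Array) => Promise<string>;

class ApiPdfExtractor implements PdfExtractor {
  private readonly callLlm: LlmCall;

  constructor(callLlm: LlmCall) {
    this.callLlm = callLlm;
  }

  async extract(pdf: Uint8Array): Promise<DataCollectionSpec> {
    const reply = await this.callLlm(pdf);
    const parsed = JSON.parse(reply) as DataCollectionSpec | null;
    if (!parsed || !Array.isArray(parsed.fields)) {
      throw new Error("malformed extraction reply");
    }
    return parsed;
  }
}

// Fixed-output strategy, useful for tests and for ground-truth comparisons.
class StubPdfExtractor implements PdfExtractor {
  private readonly spec: DataCollectionSpec;

  constructor(spec: DataCollectionSpec) {
    this.spec = spec;
  }

  async extract(_pdf: Uint8Array): Promise<DataCollectionSpec> {
    return this.spec;
  }
}

// Example consumer that works with any strategy.
async function countExtractedFields(
  extractor: PdfExtractor,
  pdf: Uint8Array
): Promise<number> {
  const spec = await extractor.extract(pdf);
  return spec.fields.length;
}
```

Because consumers accept any PdfExtractor, the future experiments listed above (alternative models, prompting strategies, chunking approaches) could be added as new implementations without changing calling code.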
Definition of Done:
- Acceptance criteria met
- Threat model updated – any new trust boundaries, data flows, or attack surfaces are reflected in catalog/architecture/threat-model.md
- Technical documentation updated – architecture docs and decisions are current
- LLM extraction service has interface abstraction (swappable implementations)
- At least one test PDF with ground truth for evaluation
- Tests pass
- Type checking passes
- CI pipeline green
- Deployed and demoable
A digital services project by Flexion