Data Flow: From Upload to Rendered Review

This page traces the complete lifecycle of a document review through the dashboard, from file upload through provider analysis to rendered results.

A document, in transit — five lifelines from user to provider to storage and back

Figure 1 — the 2026 sequence diagram.

Data flow (broadsheet rebrand) — five lifelines from user to provider to storage

Figure 1.a — the broadsheet rebrand of the older framing.

Data flow from upload through AI analysis to rendered results

Figure 1.b — the original data-flow plate, kept for reference.

Overview Diagram

Mermaid source — the diagram above is generated from this
mermaid
sequenceDiagram
  autonumber
  participant U as User
  participant B as Browser
  participant API as Next.js API
  participant Prov as Provider<br/>(Anthropic / OpenAI / Codex)
  participant Store as Neon + R2

  U->>B: Upload PDF / DOCX / paste
  B->>API: POST /api/document/extract
  API-->>B: { text, ocrUsed? }
  B->>API: POST /api/review (Accept: SSE)
  Note over API: validate · matter ownership
  API->>Prov: stream(skill prompt + submit_review tool)
  Prov-->>API: progress chunks
  API-->>B: SSE: progress
  Prov-->>API: tool_use: review JSON
  API->>Store: INSERT review · PUT original.docx
  API-->>B: SSE: result
  B->>U: Render verdict

Step 1: Document Upload

The user uploads a file via the FileUpload component, which supports three input methods:

| Method | How It Works |
| --- | --- |
| Drag and drop | User drags a file onto the drop zone |
| File picker | User clicks the upload area to open a native file dialog |
| Paste | User pastes text directly into the input area |

Accepted file types: .pdf, .docx, .txt, .md
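
A minimal sketch of the client-side extension check against that list (the helper name `isSupportedUpload` is hypothetical; the real check lives in the FileUpload component):

```typescript
const SUPPORTED_EXTENSIONS = [".pdf", ".docx", ".txt", ".md"];

// Case-insensitive check on the final extension of a filename.
function isSupportedUpload(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return false;
  return SUPPORTED_EXTENSIONS.includes(filename.slice(dot).toLowerCase());
}
```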

If a file is uploaded (not pasted text), the client sends it to the extraction endpoint:

http
POST /api/document/extract
Content-Type: multipart/form-data
Body: { file: <uploaded file> }

Step 2: Server-Side Text Extraction

The route handler in /api/document/extract calls extractTextFromUpload() from document-extraction.ts.

Extraction Pipeline

File received

  ├─ Get file extension
  ├─ Validate against SUPPORTED_LIVE_DOCUMENT_EXTENSIONS
  ├─ Convert File to Buffer
  ├─ Write to temporary directory

  ├─ Switch on extension:
  │   ├─ .txt / .md → Raw UTF-8 read
  │   ├─ .docx → textutil (macOS) or python-docx fallback
  │   └─ .pdf → pdftotext (poppler) or pypdf fallback
  │             └─ If text is sparse → OCR fallback

  ├─ Clean extracted text (remove null bytes, normalise line endings)
  ├─ Delete temporary directory
  └─ Return text to client

Extraction Methods by Format

| Format | Primary Method | Command | Fallback |
| --- | --- | --- | --- |
| .txt, .md | Raw read | `Buffer.toString("utf8")` | None |
| .docx | macOS textutil | `/usr/bin/textutil -convert txt -stdout <path>` | `python3 -c "from docx import Document; ..."` |
| .pdf | poppler | `pdftotext -layout <path> -` | `python3 -c "from pypdf import PdfReader; ..."` |

Capability Detection

On first use, the server probes for available backends (textutil, pdftotext, python3, python-docx, pypdf) and caches the results. This avoids repeated subprocess calls to check availability.
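
The cache-once pattern can be sketched like this (synchronous for brevity; `probe` is a stand-in for the real subprocess checks in document-extraction.ts):

```typescript
type Capabilities = { textutil: boolean; pdftotext: boolean; python3: boolean };

let cached: Capabilities | null = null;

// Runs the (expensive) probe once; every later call returns the cached result.
function getCapabilities(probe: () => Capabilities): Capabilities {
  if (cached === null) cached = probe();
  return cached;
}
```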

OCR Fallback

If PDF text extraction returns sparse content (indicating a scanned document), shouldAttemptOcrForPdf() triggers performOcrFallback():

PDF text extraction → insufficient text?
  ├─ No  → return extracted text
  └─ Yes → performOcrFallback()
              ├─ Check OCR_PROVIDER env var
              ├─ Currently: OpenAI (sends image to GPT-4 Vision)
              └─ Returns OCR'd text

OCR requires additional configuration

OCR is only attempted when the OCR_PROVIDER (or AI_LEGAL_UK_OCR_PROVIDER) env var is set and the corresponding API key is available. Without configuration, scanned PDFs will return an error indicating insufficient extractable text.
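
A plausible sparsity heuristic for shouldAttemptOcrForPdf() might look like the following (the threshold of 32 characters per page is an illustrative guess, not the real value):

```typescript
// Treat a PDF as likely scanned when it yields very little text per page.
function isSparsePdfText(text: string, pageCount: number): boolean {
  const MIN_CHARS_PER_PAGE = 32; // hypothetical threshold
  const meaningful = text.replace(/\s+/g, "").length;
  return pageCount > 0 && meaningful / pageCount < MIN_CHARS_PER_PAGE;
}
```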

Step 3: Skill Selection and Request

The user selects an analysis skill from the UI. The client constructs the analysis request:

http
POST /api/review
Content-Type: application/json
Accept: text/event-stream

{
  "text": "<extracted document text>",
  "skill": "legal-review",
  "apiKey": "sk-ant-..."
}

SSE vs JSON

The Accept: text/event-stream header triggers SSE streaming mode. Without it, the server falls back to a single buffered JSON response. The client always sends this header for the best user experience (real-time progress updates).

Step 4: Server Validation

The route handler in app/api/review/route.ts calls validateReviewRouteBody() from api-route-utils.ts. Four checks are performed in sequence:

| Check | Validation | Failure Response |
| --- | --- | --- |
| API key | Must be present and start with sk- | 401 Unauthorized -- "Missing or invalid API key" |
| Skill | Must be in VALID_LIVE_REVIEW_SKILLS (the live-review allowlist defined in dashboard/lib/api-route-utils.ts) | 400 Bad Request -- "Invalid skill" |
| Text | Must be present and non-empty | 400 Bad Request -- "Missing or empty document text" |
| Length | Must not exceed MAX_LIVE_DOCUMENT_CHARS (50,000) | 413 Payload Too Large -- "Document is too large" |
typescript
// Validation result types
interface ReviewRouteValidationSuccess {
  ok: true;
  status: 200;
  apiKey: string;
  skill: ValidSkill;
  text: string;
}

interface ReviewRouteValidationFailure {
  ok: false;
  status: number;  // 400, 401, or 413
  error: string;
}
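
Stitched together, the four checks can be sketched as follows (the skill list is abbreviated and the helper name is illustrative; the real implementation is validateReviewRouteBody() in api-route-utils.ts):

```typescript
const MAX_LIVE_DOCUMENT_CHARS = 50_000;
// Abbreviated; the full allowlist lives in api-route-utils.ts.
const VALID_LIVE_REVIEW_SKILLS = new Set([
  "legal-review", "legal-employment", "legal-ir35", "legal-compliance",
]);

function validateReviewBody(body: { apiKey?: string; skill?: string; text?: string }) {
  if (!body.apiKey || !body.apiKey.startsWith("sk-"))
    return { ok: false as const, status: 401, error: "Missing or invalid API key" };
  if (!body.skill || !VALID_LIVE_REVIEW_SKILLS.has(body.skill))
    return { ok: false as const, status: 400, error: "Invalid skill" };
  if (!body.text || body.text.length === 0)
    return { ok: false as const, status: 400, error: "Missing or empty document text" };
  if (body.text.length > MAX_LIVE_DOCUMENT_CHARS)
    return { ok: false as const, status: 413, error: "Document is too large" };
  return { ok: true as const, status: 200, apiKey: body.apiKey, skill: body.skill, text: body.text };
}
```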

Step 5: Provider API Call

On successful validation, the server dispatches the request through the provider abstraction in dashboard/lib/model-providers/. The provider is selected from the prefix of the user's API key — Anthropic, OpenAI, or OpenAI Codex CLI — and each adapter exposes the same stream() contract.

For an Anthropic key, the resulting call looks like this:

typescript
const client = new Anthropic({ apiKey });

const stream = client.messages.stream({
  model: ANALYSIS_MODEL,               // env-configurable, default in lib/model-providers/anthropic.ts
  max_tokens: 8192,
  system: SKILL_PROMPTS[skill],        // Skill-specific system prompt
  tools: [REVIEW_TOOL],                // submit_review tool schema
  tool_choice: { type: "tool", name: "submit_review" },
  messages: [
    {
      role: "user",
      content: `Please analyse the following document and submit your structured review:\n\n${documentText}`,
    },
  ],
});

Key points:

  • tool_choice: { type: "tool", name: "submit_review" } forces the model to always call the submit_review tool, guaranteeing structured JSON output
  • max_tokens: 8192 allows sufficient space for detailed clause-by-clause analysis
  • System prompt defines the model's persona, assessment criteria, and legislation references specific to the selected skill
  • model: ANALYSIS_MODEL — the analysis model ID is read from dashboard/lib/model-providers/anthropic.ts, configurable via the ANALYSIS_MODEL env var. Use OPENAI_ANALYSIS_MODEL to override the OpenAI provider model ID.
  • The user's API key is used directly -- the server holds no keys of its own

Step 6: SSE Streaming Events

During analysis, the streamSkillAnalysis() function emits SSE events to the client via a ReadableStream:

Progress Events

event: progress
data: {"stage":"Connecting to Claude...","percent":5}

event: progress
data: {"stage":"Analysing clauses...","percent":15}

event: progress
data: {"stage":"Reviewing provisions...","percent":30}

event: progress
data: {"stage":"Scoring risks...","percent":55}

event: progress
data: {"stage":"Finalising recommendations...","percent":75}

event: progress
data: {"stage":"Building results...","percent":90}

event: progress
data: {"stage":"Complete","percent":100}

Progress stages are mapped from the chunk count as the streaming response arrives:

| Percent Range | Stage Label |
| --- | --- |
| 5% | Connecting to Claude... |
| 15--25% | Analysing clauses... |
| 25--40% | Reviewing provisions... |
| 40--60% | Scoring risks... |
| 60--85% | Finalising recommendations... |
| 90% | Building results... |
| 100% | Complete |
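
One way to express those boundaries in code (illustrative only; the real mapping derives the percent from the chunk count first):

```typescript
// Maps a progress percentage to the stage label shown to the user.
function stageFor(percent: number): string {
  if (percent <= 5) return "Connecting to Claude...";
  if (percent < 25) return "Analysing clauses...";
  if (percent < 40) return "Reviewing provisions...";
  if (percent < 60) return "Scoring risks...";
  if (percent <= 85) return "Finalising recommendations...";
  if (percent < 100) return "Building results...";
  return "Complete";
}
```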

Result Event

event: result
data: {"id":"abc123","type":"contract","score":72,"grade":"C","clauses":[...],...}

Error Event

event: error
data: {"error":"Rate limit exceeded. Please try again later."}

Step 7: Review Construction

When the stream completes, the server extracts the tool_use block from Claude's final message:

typescript
const toolBlock = finalMessage.content.find(
  (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
);

The buildReview() function constructs a typed Review object:

  1. Generates a UUID for the review (crypto.randomUUID())
  2. Looks up the review type from SKILL_TO_TYPE mapping
  3. Extracts common fields: documentName, summary, metadata
  4. Switches on review type to build the specific variant:
| Review Type | Constructed Object | Key Fields |
| --- | --- | --- |
| contract | ContractReview | score, grade, clauses[], recommendations[] |
| employment | EmploymentReview | score, grade, clauses[], era2025Dashboard[], equalityActMatrix[], obligations[], financialExposure |
| ir35 | IR35Assessment | ir35Score, status, confidence, factors[], riskIndicators[], contractAmendments[], financialExposure |
| compliance | ComplianceAudit | score, grade, frameworks[], checkItems[], recommendations[] |

Fallback Handling

If Claude does not return a tool_use block (e.g., returns plain text instead), buildFallbackReview() creates a minimal review object with default scores of 50 and the raw text as the summary. This ensures the UI always has a renderable result.
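
In outline (field values are illustrative; the real buildFallbackReview() fills the full Review shape):

```typescript
import { randomUUID } from "crypto";

// Minimal renderable review for when no tool_use block was returned.
function buildFallbackReview(documentName: string, rawText: string) {
  return {
    id: randomUUID(),
    type: "contract" as const, // hypothetical default; the real type follows the skill
    documentName,
    score: 50,               // the documented default score
    summary: rawText,        // raw model text stands in for the summary
    clauses: [],
    recommendations: [],
  };
}
```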

Skill-to-Type Mapping

| Skills | Review Type |
| --- | --- |
| legal-review, legal-risks, legal-missing, legal-plain, legal-freelancer, legal-property, legal-corporate, legal-negotiate, legal-dispute, legal-benchmark, legal-due-diligence, legal-tenancy, legal-ip, legal-debt, legal-wills | contract |
| legal-employment | employment |
| legal-ir35 | ir35 |
| legal-compliance, legal-aml, legal-consumer, legal-esg, legal-ai-compliance, legal-regulatory-calendar, legal-legislation-tracker, legal-gdpr, legal-immigration | compliance |
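
The mapping is a straightforward lookup table; an excerpt might look like this (see the table above for the full list):

```typescript
type ReviewType = "contract" | "employment" | "ir35" | "compliance";

// Excerpt of SKILL_TO_TYPE; most skills map to "contract".
const SKILL_TO_TYPE: Record<string, ReviewType> = {
  "legal-review": "contract",
  "legal-wills": "contract",
  "legal-employment": "employment",
  "legal-ir35": "ir35",
  "legal-compliance": "compliance",
  "legal-gdpr": "compliance",
};
```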

Step 8: Client-Side Storage

The client receives the result SSE event and:

  1. Parses the Review JSON
  2. Calls saveReview() from storage.ts
  3. saveReview() calls mergeSavedReviews() to add the new review to the existing array (deduplicating by ID)
  4. Serialises the updated array to localStorage under the key ai-legal-uk-reviews
  5. Emits an ai-legal-uk:storage-sync custom event for cross-tab synchronisation
typescript
// storage.ts
export function saveReview(review: Review): void {
  localStorage.setItem(
    KEYS.REVIEWS,
    JSON.stringify(mergeSavedReviews(getReviews(), review)),
  );
  emitStorageSync();
}
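
The ID-based deduplication in mergeSavedReviews() can be sketched as follows (assuming the newest review wins on an ID collision):

```typescript
interface StoredReview { id: string; score?: number; }

// Prepend the new review and drop any older copy with the same ID.
function mergeSavedReviews(existing: StoredReview[], incoming: StoredReview): StoredReview[] {
  return [incoming, ...existing.filter((r) => r.id !== incoming.id)];
}
```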

Step 9: Redirect and Rendering

After saving, the client redirects to /review/[id] where [id] is the review's UUID.

The review detail page (app/review/[id]/page.tsx) loads the review from localStorage and renders:

Common Elements (All Review Types)

| Component | What It Shows |
| --- | --- |
| Score gauge | Animated SVG circular display of the overall score (0--100). Colour: green (75+), amber (50--74), red (below 50). |
| Grade badge | Letter grade (A--F) prominently displayed next to the score. |
| Summary | Executive summary of findings with statute links (via linkify-statutes.tsx). |
| Metadata | Parties, effective date, governing law, total value, contract type. |
| Recommendations | Prioritised action items: critical (red), high (orange), medium (amber), low (green). Each shows current text and replacement text. |
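
The gauge's colour thresholds can be expressed as (a sketch; the real component styles the SVG directly):

```typescript
// Green at 75+, amber from 50-74, red below 50.
function gaugeColour(score: number): "green" | "amber" | "red" {
  if (score >= 75) return "green";
  if (score >= 50) return "amber";
  return "red";
}
```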

Variant-Specific Sections

| Review Type | Additional Sections |
| --- | --- |
| ContractReview | Clause cards (expandable, each with risk score, issues, and recommendation), risk heatmap (interactive grid) |
| EmploymentReview | ERA 2025 compliance dashboard (pass/fail/warning per right), Equality Act matrix (status per protected characteristic), obligations table (employer/employee with triggers and deadlines), financial exposure breakdown |
| IR35Assessment | Status badge (inside/outside/borderline), confidence percentage, factor-by-factor analysis (8 CEST factors with scores and evidence), risk indicators list, contract amendment recommendations, financial exposure if caught inside IR35 |
| ComplianceAudit | Framework scores table (name, score, max, weight, status per framework), individual check items (framework, reference, check description, pass/fail/warning status, notes) |

Clause Card Detail

Each clause card in the ContractReview view is expandable and shows:

┌─────────────────────────────────────────┐
│ [Ref] Clause Title              [Risk: 72]
│ ─────────────────────────────────────── │
│ Clause text (collapsed by default)      │
│                                         │
│ Issues:                                 │
│  • Issue 1                              │
│  • Issue 2                              │
│                                         │
│ Recommendation: [text]          [Copy]  │
└─────────────────────────────────────────┘

Error Handling

Errors can occur at multiple points in the flow. Each is handled gracefully:

| Stage | Error | Handling |
| --- | --- | --- |
| Upload | Unsupported file type | Client shows validation message before upload |
| Extraction | Backend unavailable | getMissingExtractionBackendMessage() returns a user-friendly message |
| Extraction | No extractable text (scanned PDF) | OCR fallback attempted; if that fails, error returned |
| Validation | Invalid API key | 401 with message "Missing or invalid API key" |
| Validation | Invalid skill | 400 with valid skill list |
| Validation | Document too large | 413 with character limit |
| API call | Authentication failure | mapReviewRouteError() maps to 401 |
| API call | Rate limit | mapReviewRouteError() maps to 429 |
| API call | Other Anthropic error | SSE error event with message |
| Streaming | Network interruption | Stream closes; client shows error state |
| Review build | No tool_use block | buildFallbackReview() creates minimal review with raw text |

Demo Mode Flow

When the dashboard is in demo mode (localStorage: ai-legal-uk-mode = "demo"), the flow is significantly shorter:

User selects skill → Client loads fixture from demo-data/ → Renders immediately

No API calls, no file upload to server, no SSE streaming. Demo fixtures are pre-built Review objects stored in dashboard/lib/demo-data/ that exercise all four review types and all UI components.
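
The mode check itself is a one-liner (storage is injected here so the sketch is testable outside a browser):

```typescript
// True when the dashboard should serve demo fixtures instead of calling providers.
function isDemoMode(storage: { getItem(key: string): string | null }): boolean {
  return storage.getItem("ai-legal-uk-mode") === "demo";
}
```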


Tracked-changes export — recommendations to redlines

Once a review is rendered, the user can export an amended Word document with the recommendations applied as Word tracked changes. Two variants:

  • tracked-changes — the user's original .docx is round-tripped: every recommendation that matches verbatim is applied as a <w:ins> / <w:del> pair under the configured author; recommendations that don't match are preserved as Word comments. Byte-for-byte fidelity to the original.
  • clean-draft — a fresh .docx is built from the recommendations alone via the pure-TS generator in dashboard/lib/word-draft-server.ts (no Python shellout — required for Vercel serverless).

Recommendations to redlines — the tracked-changes pipeline

Mermaid source — the diagram above is generated from this
mermaid
flowchart LR
  REC["Recommendations<br/>currentText · replacementText · author"]
  ORIG["Original .docx<br/>fetched from R2 by documentId"]
  PIZ["I · PizZip<br/>unzip word/document.xml"]
  MATCH["II · Three-layer fuzzy match<br/>verbatim · whitespace-tolerant · run-tolerant"]
  APPLY["III · Apply<br/>&lt;w:ins&gt; / &lt;w:del&gt; pairs"]
  SKIP["Unmatched<br/>→ Word comments<br/>(skipped[])"]
  OUT["Tracked-changes .docx<br/>byte-for-byte fidelity"]

  REC --> MATCH
  ORIG --> PIZ
  PIZ --> APPLY
  MATCH --> APPLY
  MATCH -.-> SKIP
  APPLY --> OUT
  SKIP --> OUT

The pipeline lives in dashboard/lib/track-changes.ts and is exercised by the tracked-changes variant of app/api/review/word/route.ts. The route fetches the original .docx bytes from R2 by documentId, runs them through track-changes.ts, and streams the resulting .docx back to the browser. Tests live alongside in track-changes.test.ts.
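
As an illustration, the whitespace-tolerant layer of the three-layer match (layer II in the diagram) might reduce to something like this (the real matcher in track-changes.ts also tolerates run boundaries and maps match positions back into the original XML):

```typescript
// Collapse whitespace runs so "a  b\nc" matches "a b c".
const normalise = (s: string) => s.replace(/\s+/g, " ").trim();

// Returns the index of the needle in the normalised haystack, or -1.
// An unmatched recommendation falls through to skipped[] / Word comments.
function whitespaceTolerantIndexOf(haystack: string, needle: string): number {
  return normalise(haystack).indexOf(normalise(needle));
}
```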

AI Legal UK · The Counsel — Established MMXXVI · Built for England & Wales · Not legal advice.