Data Protection Impact Assessment (DPIA)

This DPIA covers the document-review processing carried out by The Counsel (the hosted dashboard at the-counsel.co.uk). It follows the structure of the ICO DPIA template. Self-hosted deployments should adapt this document to their own infrastructure and complete their own sign-off.

Controller: The Counsel (the-counsel.co.uk) · Contact: privacy@the-counsel.co.uk · Last reviewed: 11 June 2026

Step 1 · Identify the need for a DPIA

The service analyses legal documents that users upload, and their content is unpredictable. A tenancy agreement, employment contract or dispute bundle can incidentally contain special category data (such as health or trade union membership), criminal offence data (such as allegations of an offence) and personal data about third parties other than the user. Processing involves several AI providers acting as processors. The use of innovative technology, combined with the potential for sensitive content, makes a DPIA appropriate under UK GDPR Article 35.

Step 2 · Describe the processing

Nature

The pipeline is user-initiated, per document:

Upload: the user uploads a PDF/DOCX file or pastes text on /review.
Extraction: text is extracted locally within the application (no third party). For scanned PDFs, page images are sent to an OCR provider: Cloudflare Workers AI, with chained fallback to OpenAI. A Firecrawl OCR option also exists and runs with its `zeroDataRetention` flag enabled.
AI analysis: the extracted text is sent to Anthropic or OpenAI (according to the user's provider selection) for clause, risk and compliance analysis.
Storage: review results are stored as rows in Neon Postgres and the original file in Cloudflare R2.
Audio briefing (optional): a review summary is sent to Google Gemini TTS to generate an audio briefing, which is cached in R2.
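The chained OCR fallback in the extraction step can be sketched as follows. This is a minimal illustration, not the codebase's implementation: the `OcrProvider` interface, `ocrWithFallback` and the provider name strings are hypothetical, and the provider calls are stubbed. It tries providers in order and records which ones failed. This matters for R3, because a page that falls through to the fallback may have been sent to more than one provider.

```typescript
// Hypothetical provider shape; the real codebase's types are not documented here.
interface OcrProvider {
  name: string;
  // Resolves with the text of one page image, or rejects on failure.
  ocrPage(image: Uint8Array): Promise<string>;
}

interface OcrResult {
  text: string;
  provider: string;          // provider that produced the text
  failedProviders: string[]; // providers tried earlier that failed, in order
}

// Try each provider in turn; the first success wins.
async function ocrWithFallback(
  providers: OcrProvider[],
  image: Uint8Array,
): Promise<OcrResult> {
  const failed: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.ocrPage(image);
      return { text, provider: provider.name, failedProviders: failed };
    } catch {
      // Record the failure and fall through to the next provider.
      failed.push(provider.name);
    }
  }
  throw new Error(`All OCR providers failed: ${failed.join(", ")}`);
}

// Usage with stubbed providers (no network calls are made).
const primary: OcrProvider = {
  name: "cloudflare-workers-ai",
  ocrPage: async () => {
    throw new Error("timeout");
  },
};
const fallback: OcrProvider = {
  name: "openai",
  ocrPage: async () => "Page 1 text",
};

ocrWithFallback([primary, fallback], new Uint8Array([0x25, 0x50])).then((r) => {
  console.log(r.provider, r.failedProviders);
});
```

Recording `failedProviders` alongside the result (but never the page text) is one way for extraction telemetry to show which processors actually received an image.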

Scope — data categories

Category	Examples	Source
Account data	Email, name, profile image	Clerk sign-up
Document content	Anything a legal document can contain — including potential special category data and third-party personal data	User upload
Derived analysis	Clause findings, risk scores, recommendations, audio narration	Generated
Operational data	Usage counters, review-run telemetry, extraction telemetry (no document text), request logs	Generated
Usage analytics	Aggregated, cookieless page views and funnel events (Vercel Web Analytics) — no cookies, no cross-site tracking, no identifiers	Generated

Context and purposes

Users are individuals and small businesses in England & Wales reviewing their own legal documents. The sole purpose is to deliver the analysis the user explicitly requests. Documents are not used to train models, either by us or (under their API business terms) by Anthropic or OpenAI.

Step 3 · Consultation

This is a single-developer product, and no DPO has been appointed (one is not required at current scale). User-facing processing is described in the public privacy notice, and users can raise concerns via privacy@the-counsel.co.uk. Processor terms (DPAs) are linked from the privacy notice.

Step 4 · Necessity and proportionality

Necessity: the service cannot function without processing the document — analysing its text is the service. There is no less-intrusive alternative that still delivers a clause-level review.
Lawful basis: performance of a contract (UK GDPR Article 6(1)(b)) for the review the user requests; legitimate interests (Article 6(1)(f)) for security and fair-use limits.
Proportionality: processing is user-initiated per document; nothing is analysed in the background. Document content goes to AI providers only at the moment of review. Telemetry deliberately excludes document text. Users control deletion (per review, per matter, or whole account) and can export their data from Settings.
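The principle that telemetry excludes document text can be enforced by the type system rather than by convention. A minimal sketch, assuming hypothetical `ExtractionRun` and `ExtractionTelemetry` types and a hypothetical `toTelemetry` builder (the codebase's actual telemetry fields are not documented here): the telemetry type has no field that can hold content, and the builder copies allow-listed fields explicitly.

```typescript
// Hypothetical internal result of an extraction run. It carries the
// document text, which must never reach telemetry.
interface ExtractionRun {
  documentId: string;
  text: string;
  pageCount: number;
  usedOcr: boolean;
  ocrProvider: string | null;
  durationMs: number;
}

// Hypothetical telemetry record: no field can hold document content.
interface ExtractionTelemetry {
  documentId: string;
  pageCount: number;
  charCount: number; // length of the text, never the text itself
  usedOcr: boolean;
  ocrProvider: string | null;
  durationMs: number;
  recordedAt: string; // ISO timestamp, so records can be expired later
}

function toTelemetry(run: ExtractionRun, now: Date): ExtractionTelemetry {
  // Copy allow-listed fields one by one instead of spreading `run`, so a
  // field later added to ExtractionRun is not logged by accident.
  return {
    documentId: run.documentId,
    pageCount: run.pageCount,
    charCount: run.text.length,
    usedOcr: run.usedOcr,
    ocrProvider: run.ocrProvider,
    durationMs: run.durationMs,
    recordedAt: now.toISOString(),
  };
}
```

Because the object literal is returned against a declared return type, TypeScript's excess-property check would reject an accidental `text: run.text` at compile time.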

Step 5 · Identify and assess risks

#	Risk	Likelihood	Severity	Overall
R1	Cross-tenant access — one user reads or writes another user's documents/reviews via a tampered identifier	Low	High	Medium
R2	Processor retention — document content persists at an AI provider beyond the review	Low	Medium	Medium
R3	Scanned personal documents — OCR sends page images (which may show ID documents, signatures, handwriting) to a third-party provider	Medium	Medium	Medium
R4	Output leakage / prompt injection — adversarial text inside an uploaded document steers the model into reproducing other context or off-purpose content in the review output	Low	Medium	Low–Medium
R5	Orphaned data after account deletion — files or rows survive an account erasure	Low	High	Medium
R6	Runaway hosted usage — abuse of hosted keys causes excessive processing volume	Medium	Low	Low

Step 6 · Measures to reduce risk

All measures below are implemented in the codebase as of the review date above.

Risk	Measure
R1	Every row in tables other than the user table carries a `userId` (defence-in-depth tenancy isolation); any route that accepts a client-supplied `matterId` must pass it through `resolveOwnedMatterId`, which confirms ownership before any write; `[id]` API routes guard against malformed identifiers.
R2	Providers are used under API business terms that exclude training on customer data; OpenAI's API abuse-monitoring retention is capped at ≤30 days; the Firecrawl OCR path runs with `zeroDataRetention` enabled.
R3	OCR is invoked only for scanned documents the user explicitly submits; document-extraction telemetry contains no document text or images and is rotated within 90 days.
R4	Reviews are scoped to a single document per run — the model receives only the submitted document and the system prompt, so there is no cross-user context to leak; outputs are rendered as data, not executed.
R5	Account deletion cascades: Postgres foreign keys cascade deletes from user → matter → review/document and from review → audio briefing, and the account-deletion route erases the corresponding R2 objects. A Clerk `user.deleted` webhook performs the same erasure if the account is deleted at the identity provider.
R6	Hosted-trial usage is quota-capped (monthly and per-feature daily limits, with per-company overrides), so a compromised session cannot drive unbounded processing.
All	Operational usage records and review-run telemetry are deleted after 365 days by a retention job; users can self-serve a full export of their data (Settings → Export my data) and full erasure at any time.
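The R1 ownership check can be illustrated as follows. This is not the codebase's `resolveOwnedMatterId`, whose signature is not documented here: `resolveOwnedMatterIdSketch`, the in-memory `MatterRow` store and the UUID identifier format are all assumptions for illustration. The sketch rejects malformed identifiers first, then looks the matter up by both its id and the caller's `userId`.

```typescript
// Hypothetical in-memory stand-in for the matters table.
interface MatterRow {
  id: string;
  userId: string;
}

// Assumption for illustration only: identifiers are UUIDs.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type MatterResolution =
  | { ok: true; matterId: string }
  | { ok: false; reason: "malformed" | "not_found" };

function resolveOwnedMatterIdSketch(
  matters: MatterRow[],
  sessionUserId: string,
  clientMatterId: unknown,
): MatterResolution {
  // Reject anything that is not a well-formed identifier before any lookup.
  if (typeof clientMatterId !== "string" || !UUID_RE.test(clientMatterId)) {
    return { ok: false, reason: "malformed" };
  }
  // Scope the lookup by BOTH the id and the caller's userId, so another
  // tenant's matter is indistinguishable from a missing one.
  const row = matters.find((m) => m.id === clientMatterId && m.userId === sessionUserId);
  return row ? { ok: true, matterId: row.id } : { ok: false, reason: "not_found" };
}

// Usage: Bob cannot resolve Alice's matter, even with its real id.
const exampleRows: MatterRow[] = [
  { id: "11111111-1111-4111-8111-111111111111", userId: "user_alice" },
];
const attempt = resolveOwnedMatterIdSketch(exampleRows, "user_bob", exampleRows[0].id);
console.log(attempt);
```

Returning `not_found` for another tenant's matter, rather than a distinct "forbidden" result, avoids confirming that the identifier exists at all.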

Step 7 · Residual risk and sign-off

Residual risk: low. The remaining exposure is inherent to the service: document content must transit AI providers to be analysed, and users may upload documents containing third-party or special-category data that we cannot screen in advance. These are accepted on the basis of the contractual safeguards above, user-initiated processing, and self-service erasure/export.

No residual high risk is identified, so prior consultation with the ICO under Article 36 is not required.

Item	Name / date	Notes
Measures approved by	[controller — name, date]	Integrate actions back into project plan
Residual risks approved by	[controller — name, date]	If accepting any high residual risk, consult the ICO before going live
DPIA review date	[controller — set + 12 months]	Re-run on any change to providers, retention windows or data categories
