Data Protection Impact Assessment (DPIA)
This DPIA covers the document-review processing carried out by The Counsel (the hosted dashboard at the-counsel.co.uk). It follows the structure of the ICO DPIA template. Self-hosted deployments should adapt this document to their own infrastructure and complete their own sign-off.
Controller: The Counsel (the-counsel.co.uk) · Contact: privacy@the-counsel.co.uk · Last reviewed: 11 June 2026
Step 1 · Identify the need for a DPIA
The service analyses legal documents that users upload. Legal documents are unpredictable in content: a tenancy agreement, employment contract or dispute bundle can incidentally contain special category data (health, union membership, criminal allegations) and data about third parties who are not the user. Processing involves several AI providers acting as processors. Innovative-technology use plus the potential for sensitive content makes a DPIA appropriate under UK GDPR Article 35.
Step 2 · Describe the processing
Nature
The pipeline is user-initiated, per document:
- Upload: the user uploads a PDF/DOCX or pastes text on /review.
- Extraction: text is extracted locally in the application (no third party). For scanned PDFs, page images go to an OCR provider: Cloudflare Workers AI, with chained fallback to OpenAI; a Firecrawl provider option also exists and is run with its zeroDataRetention flag enabled.
- AI analysis: the extracted text is sent to Anthropic or OpenAI (per the user's provider selection) for clause, risk and compliance analysis.
- Storage: review results are stored as rows in Neon Postgres; the original file in Cloudflare R2. Optionally, a review summary is sent to Google Gemini TTS to generate an audio briefing, which is cached in R2.
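
The pipeline above can be read as a map of who receives which data for a given review. The following is a minimal sketch of that map, not code from the product: the type names, field names and the function disclosuresFor are assumptions made for illustration, and the optional Firecrawl OCR path is left out for brevity.

```typescript
// Hypothetical request shape; field names are illustrative only.
type AnalysisProvider = "anthropic" | "openai";

interface ReviewRequest {
  scanned: boolean; // true when the upload needs OCR
  analysisProvider: AnalysisProvider; // the user's provider selection
  audioBriefing: boolean; // optional audio summary requested
}

interface Disclosure {
  recipient: string;
  data: string;
  stage: string;
}

// List every recipient that may receive data for one review, in pipeline order.
function disclosuresFor(req: ReviewRequest): Disclosure[] {
  const out: Disclosure[] = [];
  if (req.scanned) {
    out.push({ recipient: "Cloudflare Workers AI", data: "page images", stage: "OCR (primary)" });
    // The fallback only receives images if the primary provider fails.
    out.push({ recipient: "OpenAI", data: "page images", stage: "OCR (fallback)" });
  }
  out.push({
    recipient: req.analysisProvider === "anthropic" ? "Anthropic" : "OpenAI",
    data: "extracted text",
    stage: "AI analysis",
  });
  out.push({ recipient: "Neon Postgres", data: "review results", stage: "storage" });
  out.push({ recipient: "Cloudflare R2", data: "original file", stage: "storage" });
  if (req.audioBriefing) {
    out.push({ recipient: "Google Gemini TTS", data: "review summary", stage: "audio briefing" });
    out.push({ recipient: "Cloudflare R2", data: "audio briefing", stage: "audio cache" });
  }
  return out;
}
```

For a text-based PDF reviewed with Anthropic and no audio briefing, the sketch lists only Anthropic, Neon Postgres and Cloudflare R2, which matches the claim that extraction itself involves no third party.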
Scope — data categories
| Category | Examples | Source |
|---|---|---|
| Account data | Email, name, profile image | Clerk sign-up |
| Document content | Anything a legal document can contain — including potential special category data and third-party personal data | User upload |
| Derived analysis | Clause findings, risk scores, recommendations, audio narration | Generated |
| Operational data | Usage counters, review-run telemetry, extraction telemetry (no document text), request logs | Generated |
Context and purposes
Users are individuals and small businesses in England & Wales reviewing their own legal documents. The sole purpose is to deliver the analysis the user explicitly requests. Documents are not used to train models — by us or, under their API business terms, by Anthropic or OpenAI.
Step 3 · Consultation
Single-developer product; no DPO is appointed (not required at current scale). User-facing processing is described in the public privacy notice, and users can raise concerns via privacy@the-counsel.co.uk. Processor terms (DPAs) are linked from the privacy notice.
Step 4 · Necessity and proportionality
- Necessity: the service cannot function without processing the document — analysing its text is the service. There is no less-intrusive alternative that still delivers a clause-level review.
- Lawful basis: performance of a contract (the review the user requests); legitimate interests for security and fair-use limits.
- Proportionality: processing is user-initiated per document; nothing is analysed in the background. Document content goes to AI providers only at the moment of review. Telemetry deliberately excludes document text. Users control deletion (per review, per matter, or whole account) and can export their data from Settings.
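
One way to keep telemetry free of document text is to build the telemetry record field by field from the extraction result. The sketch below illustrates that idea under assumed names (ExtractionResult, ExtractionTelemetry, toTelemetry); it is not the product's schema or code.

```typescript
// Hypothetical shapes; the real telemetry schema is not shown in this document.
interface ExtractionResult {
  text: string; // document content: must never reach telemetry
  pageCount: number;
  usedOcr: boolean;
  durationMs: number;
}

interface ExtractionTelemetry {
  pageCount: number;
  charCount: number; // length of the text in UTF-16 code units, not the text itself
  usedOcr: boolean;
  durationMs: number;
}

// Copy fields one by one instead of spreading the result, so a content field
// added to ExtractionResult later cannot slip into telemetry unnoticed.
function toTelemetry(result: ExtractionResult): ExtractionTelemetry {
  return {
    pageCount: result.pageCount,
    charCount: result.text.length,
    usedOcr: result.usedOcr,
    durationMs: result.durationMs,
  };
}
```

An explicit allow-list of this kind fails safe: a new field is excluded from telemetry until someone deliberately adds it.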
Step 5 · Identify and assess risks
| # | Risk | Likelihood | Severity | Overall |
|---|---|---|---|---|
| R1 | Cross-tenant access — one user reads or writes another user's documents/reviews via a tampered identifier | Low | High | Medium |
| R2 | Processor retention — document content persists at an AI provider beyond the review | Low | Medium | Medium |
| R3 | Scanned personal documents — OCR sends page images (which may show ID documents, signatures, handwriting) to a third-party provider | Medium | Medium | Medium |
| R4 | Output leakage / prompt injection — adversarial text inside an uploaded document steers the model into reproducing other context or off-purpose content in the review output | Low | Medium | Low–Medium |
| R5 | Orphaned data after account deletion — files or rows survive an account erasure | Low | High | Medium |
| R6 | Runaway hosted usage — abuse of hosted keys causes excessive processing volume | Medium | Low | Low |
Step 6 · Measures to reduce risk
All measures below exist in the codebase today.
| Risk | Measure |
|---|---|
| R1 | Every non-user row carries a userId (defence-in-depth tenancy); routes that accept a client-supplied matterId must pass resolveOwnedMatterId, which confirms ownership before any write; [id] API routes guard against malformed identifiers. |
| R2 | Providers are used under API business terms that exclude training on customer data; OpenAI's API abuse-monitoring retention is capped at 30 days; the Firecrawl OCR path runs with zeroDataRetention enabled. |
| R3 | OCR is invoked only for scanned documents the user explicitly submits; document-extraction telemetry contains no document text or images and is rotated within 90 days. |
| R4 | Reviews are scoped to a single document per run — the model receives only the submitted document and the system prompt, so there is no cross-user context to leak; outputs are rendered as data, not executed. |
| R5 | Account deletion cascades: Postgres cascade deletes wire user → matter → review/document and review → audio briefing, and the account-deletion route erases the corresponding R2 objects; a Clerk user.deleted webhook performs the same erasure if the account is deleted at the identity provider. |
| R6 | Hosted-trial usage is quota-capped (monthly and per-feature daily limits, with per-company overrides), so a compromised session cannot drive unbounded processing. |
| All | Operational usage records and review-run telemetry are deleted after 365 days by a retention job; users can self-serve a full export of their data (Settings → Export my data) and full erasure at any time. |
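
The R1 measure in the table above relies on confirming ownership before any write. The following sketch shows the general shape of such a check. It is an illustrative stand-in and not the implementation of resolveOwnedMatterId: the function name resolveOwnedMatterIdSketch, the MatterRow shape, the identifier pattern and the in-memory Map (standing in for a Postgres lookup) are all assumptions.

```typescript
// Hypothetical row shape; the real schema is not shown in this document.
interface MatterRow {
  id: string;
  userId: string;
}

// Illustrative stand-in for an ownership check, not the codebase's function.
function resolveOwnedMatterIdSketch(
  matters: ReadonlyMap<string, MatterRow>, // stands in for a database lookup
  sessionUserId: string,
  clientMatterId: unknown,
): string {
  // Reject malformed identifiers before touching storage.
  // The allowed pattern here is an assumption for the example.
  if (typeof clientMatterId !== "string" || !/^[A-Za-z0-9_-]{1,64}$/.test(clientMatterId)) {
    throw new Error("Matter not found");
  }
  const row = matters.get(clientMatterId);
  // Same error for "missing" and "owned by someone else", so the response
  // does not reveal whether another user's matter exists.
  if (row === undefined || row.userId !== sessionUserId) {
    throw new Error("Matter not found");
  }
  return row.id;
}
```

The identifier returned is the one read from storage, not the one supplied by the client, so later writes use a value that has already passed the ownership test.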
Step 7 · Residual risk and sign-off
Residual risk: low. The remaining exposure is inherent to the service: document content must transit AI providers to be analysed, and users may upload documents containing third-party or special-category data that we cannot screen in advance. These are accepted on the basis of the contractual safeguards above, user-initiated processing, and self-service erasure/export.
No residual high risk is identified, so prior consultation with the ICO under Article 36 is not required.
| Item | Name / date | Notes |
|---|---|---|
| Measures approved by | [controller — name, date] | Integrate actions back into project plan |
| Residual risks approved by | [controller — name, date] | If accepting any high residual risk, consult the ICO before going live |
| DPIA review date | [controller — set + 12 months] | Re-run on any change to providers, retention windows or data categories |