- What it does — reads PDFs, images, and scanned documents; identifies the document type; and pulls out named fields with per-field confidence scores.
- Who it’s for — health-tech teams that need reliable, privacy-preserving extraction from clinical and administrative documents at scale.
- Where it runs — Azure Canada Central, Law 25-aligned, with optional de-identification built into the pipeline.
How the pipeline works
A `POST /v1/documents` call returns a 202 Accepted with a document ID almost instantly. The result is ready seconds to minutes later, depending on document size and pipeline configuration.
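Because the submit call returns before the result exists, a client has to wait for the document to finish. The sketch below shows one way to poll for that, under stated assumptions: the `status` field name, the `failed` state, and the `fetch` callable (which stands in for whatever request retrieves a document by ID) are all hypothetical and not taken from the API reference. Only the queued → processing → completed lifecycle comes from these docs.

```python
import time
from typing import Any, Callable, Dict

# "completed" comes from the documented lifecycle; "failed" is an assumed
# terminal state added so the loop cannot spin on an errored document.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_result(
    fetch: Callable[[str], Dict[str, Any]],
    document_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll `fetch(document_id)` until the document reaches a terminal state.

    `fetch` is a hypothetical callable that returns the document as a dict
    with a "status" key (assumed field name). Raises TimeoutError if no
    terminal state is seen before `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        document = fetch(document_id)
        if document.get("status") in TERMINAL_STATES:
            return document
        if clock() >= deadline:
            raise TimeoutError(f"document {document_id} not ready after {timeout}s")
        sleep(interval)


if __name__ == "__main__":
    # Simulated backend that walks through the documented lifecycle.
    lifecycle = iter(["queued", "processing", "completed"])

    def simulated_fetch(doc_id: str) -> Dict[str, Any]:
        return {"id": doc_id, "status": next(lifecycle)}

    final = wait_for_result(simulated_fetch, "doc_example", interval=0.0)
    print(final["status"])
```

Passing `fetch`, `sleep`, and `clock` as parameters keeps the loop independent of any HTTP client and easy to test. For production traffic, webhooks (covered in the Guides section) avoid polling altogether.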
How these docs are organised
| Section | What you’ll find |
|---|---|
| Get started | Quickstart walkthrough, API key authentication |
| Guides | Async model, webhooks, custom extraction schemas, de-identification, confidence & HITL, language support |
| API reference | Full endpoint specs auto-generated from the OpenAPI schema |
Start here
Quickstart
Submit your first document and retrieve a structured result in under five minutes.
Authentication
Learn how API keys work and how to keep them safe.
Async model
Understand the queued → processing → completed lifecycle and how to poll or subscribe to results.
Custom schemas
Define your own field list for any document type.