By default, Folio infers what to extract from the detected document type. Custom schemas let you declare the exact fields and table columns you need, regardless of document type, and get typed values with validation flags when extraction results don’t match expectations.
Two ways to supply a schema
Option 1, inline: pass the schema as a JSON string in the extraction_schema multipart form field. The schema applies to that single request only and is not saved.
curl -X POST https://api.glialhealth.com/v1/documents \
  -H "Authorization: Bearer sk_test_..." \
  -F file=@lab_report.pdf \
  -F document_type=lab_report \
  -F extraction_schema='{
    "fields": [
      {"key": "patient_name", "type": "string", "required": true},
      {"key": "test_date", "type": "date", "required": true},
      {"key": "hemoglobin", "type": "number", "required": false,
       "hint": "Hemoglobin value in g/dL"}
    ],
    "tables": [
      {"key": "test_results", "columns": ["test_name", "value", "reference_range"]}
    ]
  }'
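If you build the request in code rather than with curl, the inline schema is simply a JSON-encoded string placed in a form field. The following is a minimal Python sketch of that serialization step, assuming only the field names shown in the curl example. The helper name inline_schema_form_fields is hypothetical, and nothing is sent over the network.

```python
import json

# The schema from the curl example, expressed as a plain Python dict.
LAB_REPORT_SCHEMA = {
    "fields": [
        {"key": "patient_name", "type": "string", "required": True},
        {"key": "test_date", "type": "date", "required": True},
        {"key": "hemoglobin", "type": "number", "required": False,
         "hint": "Hemoglobin value in g/dL"},
    ],
    "tables": [
        {"key": "test_results",
         "columns": ["test_name", "value", "reference_range"]},
    ],
}


def inline_schema_form_fields(schema, document_type=None):
    """Return the non-file multipart form fields for an inline-schema request.

    Hypothetical helper: it only serializes the schema. Attaching the file and
    sending the request is left to whichever HTTP client you use.
    """
    fields = {"extraction_schema": json.dumps(schema)}
    if document_type is not None:
        fields["document_type"] = document_type
    return fields


form = inline_schema_form_fields(LAB_REPORT_SCHEMA, document_type="lab_report")
print(form["extraction_schema"])
```

Keeping the schema as a dict until the last moment avoids hand-escaping quotes inside a JSON string.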
Option 2, saved: save the schema once with POST /v1/extraction-schemas, then reference it by ID. Saved schemas get a stable schm_… ID that you can reuse across many submissions.
Create the schema
curl -X POST https://api.glialhealth.com/v1/extraction-schemas \
  -H "Authorization: Bearer sk_test_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Lab report v1",
    "document_type": "lab_report",
    "fields": [
      {"key": "patient_name", "type": "string", "required": true},
      {"key": "test_date", "type": "date", "required": true},
      {"key": "hemoglobin", "type": "number", "required": false,
       "hint": "Hemoglobin value in g/dL"}
    ],
    "tables": [
      {"key": "test_results", "columns": ["test_name", "value", "reference_range"]}
    ]
  }'
Response (201 Created):
{
  "id": "schm_01jaxyz...",
  "object": "extraction_schema",
  "name": "Lab report v1",
  "document_type": "lab_report",
  "fields": [...],
  "tables": [...],
  "created_at": "2025-10-14T18:23:00Z"
}
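The create call can be sketched in Python with the standard library alone. The example below builds the request without sending it and shows one way to read the schema ID out of a 201 response body. The function names, the placeholder API key, and the schm_example ID are all made up for illustration. The schm_ prefix check reflects only the ID shape described on this page.

```python
import json
import urllib.request

API_BASE = "https://api.glialhealth.com/v1"


def build_create_schema_request(api_key, payload):
    """Build (but do not send) the POST /v1/extraction-schemas request."""
    return urllib.request.Request(
        f"{API_BASE}/extraction-schemas",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def schema_id_from_response(body):
    """Return the saved schema ID from a response body given as a JSON string."""
    schema_id = json.loads(body)["id"]
    # Saved IDs are described as schm_...; treat anything else as unexpected.
    if not schema_id.startswith("schm_"):
        raise ValueError(f"unexpected schema id: {schema_id!r}")
    return schema_id


payload = {
    "name": "Lab report v1",
    "document_type": "lab_report",
    "fields": [
        {"key": "patient_name", "type": "string", "required": True},
        {"key": "test_date", "type": "date", "required": True},
    ],
    "tables": [
        {"key": "test_results",
         "columns": ["test_name", "value", "reference_range"]},
    ],
}

# "sk_test_placeholder" stands in for a real key.
request = build_create_schema_request("sk_test_placeholder", payload)
# To send it for real: urllib.request.urlopen(request)
```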
Reference by ID at submission time
curl -X POST https://api.glialhealth.com/v1/documents \
  -H "Authorization: Bearer sk_test_..." \
  -F file=@lab_report.pdf \
  -F extraction_schema_id=schm_01jaxyz...
Providing both extraction_schema and extraction_schema_id in the same request returns 422 with code schema_conflict. Use one or the other.
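A client can catch this conflict before the request leaves the machine. The sketch below is one possible guard, not part of any official SDK: the function name schema_form_fields is hypothetical, and it mirrors only the rule stated above.

```python
import json


def schema_form_fields(extraction_schema=None, extraction_schema_id=None):
    """Return the schema-related form fields, allowing at most one source.

    Raises ValueError locally in the case the API would answer with
    422 schema_conflict.
    """
    if extraction_schema is not None and extraction_schema_id is not None:
        raise ValueError(
            "pass extraction_schema or extraction_schema_id, not both"
        )
    fields = {}
    if extraction_schema is not None:
        # Inline schemas travel as a JSON string.
        fields["extraction_schema"] = json.dumps(extraction_schema)
    if extraction_schema_id is not None:
        fields["extraction_schema_id"] = extraction_schema_id
    return fields


# "schm_example" is a made-up ID used only for illustration.
by_id = schema_form_fields(extraction_schema_id="schm_example")
print(by_id)
```

Passing neither argument returns an empty dict, which corresponds to the default behavior where Folio infers what to extract.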
Field spec reference
Each field in fields is a CustomFieldSpec:
| Property | Type | Default | Description |
|---|---|---|---|
| key | string | none | Field name (required). Used as the key in the extract result object. |
| type | string \| number \| date \| boolean | string | Expected value type. |
| required | boolean | false | If true and the field is absent, a missing_field:<key> flag is raised. |
| pattern | string (regex) | none | Optional regex; mismatch raises invalid_field:<key>. |
| hint | string | none | Natural-language prompt hint for the extractor. |
Each entry in tables is a CustomTableSpec:
| Property | Type | Description |
|---|---|---|
| key | string | Table name (required). |
| columns | string[] | Column names (required; 1 to 20 entries). |
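The two specs map naturally onto small typed structures. The dataclasses below are an illustrative client-side mirror of the tables above, not an official type definition. They assume that "1–20" means a table needs between 1 and 20 column names, and that unset optional properties can simply be left out of the payload.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional

ALLOWED_TYPES = {"string", "number", "date", "boolean"}


@dataclass
class CustomFieldSpec:
    key: str
    type: str = "string"        # documented default
    required: bool = False      # documented default
    pattern: Optional[str] = None
    hint: Optional[str] = None

    def __post_init__(self):
        if not self.key:
            raise ValueError("key is required")
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {self.type!r}")

    def to_dict(self):
        # Drop unset optional properties so the payload stays minimal.
        return {k: v for k, v in asdict(self).items() if v is not None}


@dataclass
class CustomTableSpec:
    key: str
    columns: List[str]

    def __post_init__(self):
        if not self.key:
            raise ValueError("key is required")
        # Assumption: the limit applies to the number of column names.
        if not 1 <= len(self.columns) <= 20:
            raise ValueError("columns must contain between 1 and 20 names")

    def to_dict(self):
        return asdict(self)


spec = CustomFieldSpec(key="hemoglobin", type="number",
                       hint="Hemoglobin value in g/dL")
table = CustomTableSpec(key="test_results",
                        columns=["test_name", "value", "reference_range"])
print({"fields": [spec.to_dict()], "tables": [table.to_dict()]})
```

Validating at construction time surfaces mistakes such as a misspelled type before any document is uploaded.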
Validation flags
When a schema is present, Folio compares extracted values against the spec and adds flags to the result:
| Flag | Meaning |
|---|---|
| missing_field:<key> | A required field was not found in the document. |
| invalid_field:<key> | The extracted value failed the pattern check or could not be coerced to the declared type. |
Both flag types appear in the flags array of the result and also contribute to review_status. See Confidence and HITL for how flags interact with the human-review workflow.
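To make the flag semantics concrete, here is a rough client-side approximation in Python. It is not Folio's implementation. The coercion rules (ISO 8601 dates, float-parsable numbers), the use of a full-string regex match, and the mrn field in the sample data are all assumptions chosen for illustration.

```python
import re
from datetime import date


def _fits_type(value, declared_type):
    """Return True if value fits declared_type under this sketch's rules."""
    if declared_type == "string":
        return isinstance(value, str)
    if declared_type == "boolean":
        return isinstance(value, bool)
    if declared_type == "number":
        if isinstance(value, bool):  # bool is an int subclass; exclude it
            return False
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    if declared_type == "date":
        try:
            date.fromisoformat(value)  # assumption: ISO 8601 dates only
            return True
        except (TypeError, ValueError):
            return False
    return False


def validation_flags(fields, extracted):
    """Approximate the missing_field / invalid_field flags for one result."""
    flags = []
    for spec in fields:
        key = spec["key"]
        value = extracted.get(key)
        if value is None:
            if spec.get("required", False):
                flags.append(f"missing_field:{key}")
            continue
        ok = _fits_type(value, spec.get("type", "string"))
        pattern = spec.get("pattern")
        if ok and pattern is not None:
            # Assumption: the pattern must match the whole value.
            ok = re.fullmatch(pattern, str(value)) is not None
        if not ok:
            flags.append(f"invalid_field:{key}")
    return flags


def split_flag(flag):
    """Split 'missing_field:patient_name' into ('missing_field', 'patient_name')."""
    kind, _, key = flag.partition(":")
    return kind, key


fields = [
    {"key": "patient_name", "type": "string", "required": True},
    {"key": "test_date", "type": "date", "required": True},
    {"key": "hemoglobin", "type": "number", "required": False},
    {"key": "mrn", "type": "string", "pattern": r"\d{8}"},  # hypothetical field
]
extracted = {"test_date": "14/10/2025", "hemoglobin": "13.2", "mrn": "1234"}
flags = validation_flags(fields, extracted)
print(flags)
```

With this sample input the sketch reports a missing patient_name, an invalid test_date (not ISO 8601), and an invalid mrn (pattern mismatch). The optional hemoglobin value passes.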
Managing saved schemas
List
curl "https://api.glialhealth.com/v1/extraction-schemas?limit=20" \
  -H "Authorization: Bearer sk_test_..."
Get by ID
curl https://api.glialhealth.com/v1/extraction-schemas/schm_01jaxyz... \
  -H "Authorization: Bearer sk_test_..."
Delete (soft)
curl -X DELETE https://api.glialhealth.com/v1/extraction-schemas/schm_01jaxyz... \
  -H "Authorization: Bearer sk_test_..."
Deletion is a soft delete — the schema is deactivated and no longer returned in list results, but historical documents that referenced it are unaffected. Attempting to use the ID on a new submission after deletion returns 422.
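The list, get, and delete calls differ only in HTTP method and path, so a thin wrapper can cover all three. The Python sketch below builds the requests with the standard library and does not send them. The function names, the placeholder key, and the schm_example ID are illustrative, not part of any official client.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.glialhealth.com/v1"


def _request(api_key, method, path, query=None):
    """Build an authenticated request object without sending it."""
    url = f"{API_BASE}{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method=method,
    )


def list_schemas_request(api_key, limit=20):
    return _request(api_key, "GET", "/extraction-schemas", {"limit": limit})


def get_schema_request(api_key, schema_id):
    safe_id = urllib.parse.quote(schema_id, safe="")
    return _request(api_key, "GET", f"/extraction-schemas/{safe_id}")


def delete_schema_request(api_key, schema_id):
    safe_id = urllib.parse.quote(schema_id, safe="")
    return _request(api_key, "DELETE", f"/extraction-schemas/{safe_id}")


# Placeholder key and made-up ID, for illustration only.
req = delete_schema_request("sk_test_placeholder", "schm_example")
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req)
```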
Saved schemas are immutable once created. To change a schema, create a new one and update your submissions to reference the new ID.
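Because saved schemas cannot be edited, a change amounts to copying the old definition into a new create payload. The sketch below shows one way to derive that payload in Python. It assumes that id, object, and created_at are server-assigned and should be omitted when creating, since the create example above does not send them. The function name and the sample values are hypothetical.

```python
import copy

# Keys that appear in the create response but not in the create request.
SERVER_ASSIGNED = ("id", "object", "created_at")


def next_version_payload(saved_schema, new_name, extra_fields=()):
    """Derive a create payload for a new schema from a saved one.

    The saved schema is left untouched; the result is an independent copy.
    """
    payload = {
        key: copy.deepcopy(value)
        for key, value in saved_schema.items()
        if key not in SERVER_ASSIGNED
    }
    payload["name"] = new_name
    payload.setdefault("fields", []).extend(copy.deepcopy(list(extra_fields)))
    return payload


# Made-up saved schema, shaped like the create response shown earlier.
saved = {
    "id": "schm_example",
    "object": "extraction_schema",
    "name": "Lab report v1",
    "document_type": "lab_report",
    "fields": [{"key": "patient_name", "type": "string", "required": True}],
    "tables": [{"key": "test_results",
                "columns": ["test_name", "value", "reference_range"]}],
    "created_at": "2025-10-14T18:23:00Z",
}

payload = next_version_payload(
    saved, "Lab report v2", extra_fields=[{"key": "platelets", "type": "number"}]
)
print(payload["name"], [f["key"] for f in payload["fields"]])
```

After the new schema is created, submissions switch to the new schm_… ID, and the old schema can be deleted once nothing new references it.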