By default, Folio infers what to extract from the detected document type. Custom schemas let you declare the exact fields and table columns you need, regardless of document type, and return typed values with validation flags when extracted values don’t match the schema.

Two ways to supply a schema

Inline extraction_schema

Pass the schema as a JSON string in the extraction_schema multipart form field. The schema is used for that single request only and is not saved.
curl -X POST https://api.glialhealth.com/v1/documents \
  -H "Authorization: Bearer sk_test_..." \
  -F file=@lab_report.pdf \
  -F document_type=lab_report \
  -F extraction_schema='{
    "fields": [
      {"key": "patient_name",  "type": "string",  "required": true},
      {"key": "test_date",     "type": "date",    "required": true},
      {"key": "hemoglobin",    "type": "number",  "required": false,
       "hint": "Hemoglobin value in g/dL"}
    ],
    "tables": [
      {"key": "test_results", "columns": ["test_name", "value", "reference_range"]}
    ]
  }'
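If you assemble the inline schema in code rather than by hand, serializing a dictionary avoids shell-quoting mistakes. The following is a minimal sketch: `to_form_value` is a hypothetical helper name, not part of any Folio SDK, and the form field names are taken from the curl request above.

```python
import json


def to_form_value(schema):
    """Serialize a schema dict to the JSON string that goes in the
    extraction_schema multipart field (hypothetical helper)."""
    return json.dumps(schema)


schema = {
    "fields": [
        {"key": "patient_name", "type": "string", "required": True},
        {"key": "test_date", "type": "date", "required": True},
        {"key": "hemoglobin", "type": "number", "required": False,
         "hint": "Hemoglobin value in g/dL"},
    ],
    "tables": [
        {"key": "test_results",
         "columns": ["test_name", "value", "reference_range"]},
    ],
}

# Non-file form fields that mirror the curl request; the file part would be
# attached separately by whichever HTTP client you use.
form_fields = {
    "document_type": "lab_report",
    "extraction_schema": to_form_value(schema),
}
```

Python's `True` and `False` serialize to JSON `true` and `false`, so the resulting string carries the same content as the hand-written schema in the curl example.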

Saved schema (extraction_schema_id)

Save the schema once with POST /v1/extraction-schemas, then reference it by ID. Saved schemas get a stable schm_… ID that you can reuse across many submissions.
1. Create the schema

curl -X POST https://api.glialhealth.com/v1/extraction-schemas \
  -H "Authorization: Bearer sk_test_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Lab report v1",
    "document_type": "lab_report",
    "fields": [
      {"key": "patient_name", "type": "string",  "required": true},
      {"key": "test_date",    "type": "date",    "required": true},
      {"key": "hemoglobin",   "type": "number",  "required": false,
       "hint": "Hemoglobin value in g/dL"}
    ],
    "tables": [
      {"key": "test_results", "columns": ["test_name", "value", "reference_range"]}
    ]
  }'
Response (201 Created):
{
  "id": "schm_01jaxyz...",
  "object": "extraction_schema",
  "name": "Lab report v1",
  "document_type": "lab_report",
  "fields": [...],
  "tables": [...],
  "created_at": "2025-10-14T18:23:00Z"
}
2. Reference by ID at submission time

curl -X POST https://api.glialhealth.com/v1/documents \
  -H "Authorization: Bearer sk_test_..." \
  -F file=@lab_report.pdf \
  -F extraction_schema_id=schm_01jaxyz...
Providing both extraction_schema and extraction_schema_id in the same request returns 422 with code schema_conflict. Use one or the other.
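A client can catch this conflict before sending anything. The sketch below is one way to do it, assuming you build the form fields yourself; `schema_form_fields` is a hypothetical helper and the error message wording is illustrative, not the API's actual response body.

```python
def schema_form_fields(extraction_schema=None, extraction_schema_id=None):
    """Return the schema-related form fields for a submission.

    Hypothetical client-side guard: raises before any request is made if
    both options are supplied, mirroring the schema_conflict rule.
    """
    if extraction_schema is not None and extraction_schema_id is not None:
        raise ValueError(
            "schema_conflict: pass extraction_schema or "
            "extraction_schema_id, not both"
        )
    fields = {}
    if extraction_schema is not None:
        fields["extraction_schema"] = extraction_schema
    if extraction_schema_id is not None:
        fields["extraction_schema_id"] = extraction_schema_id
    return fields
```

Calling it with neither argument returns an empty dict, which corresponds to the default behavior where Folio infers what to extract.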

Field spec reference

Each field in fields is a CustomFieldSpec:
| Property | Type | Default | Description |
| --- | --- | --- | --- |
| key | string | | Field name (required). Used as the key in the extract result object. |
| type | string, number, date, or boolean | string | Expected value type. |
| required | boolean | false | If true and the field is absent, a missing_field:<key> flag is raised. |
| pattern | string (regex) | | Optional regex; mismatch raises invalid_field:<key>. |
| hint | string | | Natural-language prompt hint for the extractor. |

Each entry in tables is a CustomTableSpec:
| Property | Type | Description |
| --- | --- | --- |
| key | string | Table name (required). |
| columns | string[] | Column names (1–20 required). |
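
The rules in the two tables above can be checked locally before a schema is sent. This is a rough pre-flight sketch, not the API's own validation: `check_schema` is a hypothetical function, and the server may enforce rules that are not listed here.

```python
ALLOWED_TYPES = ("string", "number", "date", "boolean")


def check_schema(schema):
    """Return a list of problems found in a schema dict (empty if none).

    Encodes only the documented rules: key is required, type must be one
    of the four listed values, and a table needs 1 to 20 columns.
    """
    problems = []
    for i, field in enumerate(schema.get("fields", [])):
        if not field.get("key"):
            problems.append(f"fields[{i}]: key is required")
        # type defaults to "string" when omitted
        field_type = field.get("type", "string")
        if field_type not in ALLOWED_TYPES:
            problems.append(f"fields[{i}]: unknown type {field_type!r}")
    for i, table in enumerate(schema.get("tables", [])):
        if not table.get("key"):
            problems.append(f"tables[{i}]: key is required")
        count = len(table.get("columns", []))
        if not 1 <= count <= 20:
            problems.append(f"tables[{i}]: expected 1-20 columns, got {count}")
    return problems
```

The sketch does not try to compile `pattern`, because the regex dialect the extractor uses is not stated here.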

Validation flags

When a schema is present, Folio compares extracted values against the spec and adds flags to the result:
| Flag | Meaning |
| --- | --- |
| missing_field:<key> | A required field was not found in the document. |
| invalid_field:<key> | The extracted value failed the pattern check or could not be coerced to the declared type. |
Both flag types appear in the flags array of the result and also contribute to review_status. See Confidence and HITL for how flags interact with the human-review workflow.
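
Because both flags follow a `<kind>:<key>` shape, they are easy to group on the client. The sketch below assumes the flags array is a list of strings in that shape; `group_flags` is a hypothetical helper, and any flag that does not match the two documented prefixes is passed through unchanged.

```python
def group_flags(flags):
    """Group validation flags by kind.

    Returns a dict with the field keys for missing_field and invalid_field,
    plus an "other" list holding any unrecognized flags verbatim.
    """
    grouped = {"missing_field": [], "invalid_field": [], "other": []}
    for flag in flags:
        kind, sep, key = flag.partition(":")
        if sep and kind in ("missing_field", "invalid_field"):
            grouped[kind].append(key)
        else:
            grouped["other"].append(flag)
    return grouped


grouped = group_flags(["missing_field:test_date", "invalid_field:hemoglobin"])
# grouped["missing_field"] == ["test_date"]
# grouped["invalid_field"] == ["hemoglobin"]
```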

Managing saved schemas

List

curl "https://api.glialhealth.com/v1/extraction-schemas?limit=20" \
  -H "Authorization: Bearer sk_test_..."

Get by ID

curl https://api.glialhealth.com/v1/extraction-schemas/schm_01jaxyz... \
  -H "Authorization: Bearer sk_test_..."

Delete (soft)

curl -X DELETE https://api.glialhealth.com/v1/extraction-schemas/schm_01jaxyz... \
  -H "Authorization: Bearer sk_test_..."
Deletion is a soft delete: the schema is deactivated and no longer returned in list results, but historical documents that referenced it are unaffected. Using the ID on a new submission after deletion returns 422.
Saved schemas are immutable once created. To change a schema, create a new one and update your submissions to reference the new ID.
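
Since a schema cannot be edited in place, a revision starts from a copy of the existing one. The following sketch builds the body for a new POST /v1/extraction-schemas request from a previously returned schema object. It assumes that `id`, `object`, and `created_at` are server-set and should be left out of the new request; `next_version_payload` is a hypothetical helper name.

```python
import copy

# Keys that appear in the create response but not in the create request.
SERVER_FIELDS = ("id", "object", "created_at")


def next_version_payload(existing, name, **changes):
    """Build a create-request body for a revised schema.

    Copies the existing schema without its server-set keys, sets the new
    name, then applies any top-level overrides such as fields or tables.
    """
    payload = {
        key: copy.deepcopy(value)
        for key, value in existing.items()
        if key not in SERVER_FIELDS
    }
    payload["name"] = name
    payload.update(changes)
    return payload
```

The deep copy keeps the original object untouched, so the old schema definition can still be compared against the new one after the change.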