When you set deidentify=true on a submission, Folio detects and masks protected health information (PHI) across all surfaces of the extracted result (field values, table cells, and raw recognition output) before anything is stored or returned. The original document bytes are not retained once masking completes.

Enabling de-identification

Add deidentify=true as a form field:
curl -X POST https://api.glialhealth.com/v1/documents \
  -H "Authorization: Bearer sk_test_..." \
  -F file=@patient_record.pdf \
  -F document_type=lab_report \
  -F deidentify=true
The result’s deidentified field is set to true, confirming that masking was applied.
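A client may want to refuse any result whose flag is not set before passing it downstream. The following is a minimal sketch, assuming the response body has already been parsed into a Python dict shaped like the JSON examples on this page; require_deidentified is a hypothetical helper name, not part of Folio.

```python
def require_deidentified(result: dict) -> dict:
    """Return the result unchanged if masking was applied, otherwise raise.

    Hypothetical helper: assumes `result` is the parsed JSON result and that
    it carries the boolean `deidentified` field described above.
    """
    # Compare against True explicitly so that a missing key, None, or a
    # string such as "true" is rejected rather than silently accepted.
    if result.get("deidentified") is not True:
        raise ValueError("result is not marked as de-identified")
    return result
```

Failing loudly here keeps an unmasked result from reaching storage or analytics code by accident.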

Typed-token masking

Each detected PHI entity is replaced with a typed token that preserves the category of information without exposing the actual value:
Token       PHI category
[NAME]      Person names
[DATE]      Dates (birth dates, test dates, appointment dates)
[ADDRESS]   Street addresses, postal codes, cities
[PHONE]     Phone and fax numbers
[EMAIL]     Email addresses
[ID]        Patient IDs, health card numbers, MRNs, SSNs
[OTHER]     Any other PHI that doesn’t fit a specific category
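Because the tokens follow a fixed bracketed form, downstream code can recognise them with a simple pattern. The sketch below counts tokens by category in a masked string. The token names come from the table above; TOKEN_CATEGORIES, TOKEN_RE, and count_tokens are illustrative names, and the assumption that tokens always appear exactly as written (uppercase, square brackets) is ours.

```python
import re
from collections import Counter

# Token names and categories copied from the table above.
TOKEN_CATEGORIES = {
    "NAME": "Person names",
    "DATE": "Dates",
    "ADDRESS": "Street addresses, postal codes, cities",
    "PHONE": "Phone and fax numbers",
    "EMAIL": "Email addresses",
    "ID": "Patient IDs, health card numbers, MRNs, SSNs",
    "OTHER": "Any other PHI",
}

# Matches "[NAME]", "[DATE]", and so on; the capture group holds the bare name.
TOKEN_RE = re.compile(r"\[(" + "|".join(TOKEN_CATEGORIES) + r")\]")


def count_tokens(text: str) -> Counter:
    """Count typed tokens in a masked string, keyed by token name."""
    return Counter(TOKEN_RE.findall(text))
```

For example, count_tokens("Seen by [NAME] on [DATE]; follow-up [DATE]") reports two DATE tokens and one NAME token, while bracketed text that is not in the table is ignored.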

Before / after example

Before de-identification (deidentify=false):
{
  "extract": {
    "patient_name": {
      "value": "Jane Doe",
      "confidence": 0.99,
      "found": true,
      "page": 1,
      "bbox": {"x": 0.05, "y": 0.12, "width": 0.15, "height": 0.02},
      "block_ids": ["p1-b3"]
    },
    "date_of_birth": {
      "value": "1985-03-22",
      "confidence": 0.97,
      "found": true,
      "page": 1,
      "bbox": {"x": 0.05, "y": 0.14, "width": 0.11, "height": 0.02},
      "block_ids": ["p1-b5"]
    }
  },
  "deidentified": false
}
After de-identification (deidentify=true):
{
  "extract": {
    "patient_name": {
      "value": "[NAME]",
      "confidence": 0.99,
      "found": true,
      "page": 1,
      "bbox": {"x": 0.05, "y": 0.12, "width": 0.15, "height": 0.02},
      "block_ids": ["p1-b3"]
    },
    "date_of_birth": {
      "value": "[DATE]",
      "confidence": 0.97,
      "found": true,
      "page": 1,
      "bbox": {"x": 0.05, "y": 0.14, "width": 0.11, "height": 0.02},
      "block_ids": ["p1-b5"]
    }
  },
  "deidentified": true
}
Bounding boxes, confidence scores, and found flags are preserved — only the value is masked.
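One way to check this property on your own documents is to diff a masked result against an unmasked one for the same file. This is a sketch only: it assumes both results are parsed dicts with the extract layout shown in the examples above, and changed_keys is a hypothetical helper name.

```python
def changed_keys(before: dict, after: dict) -> dict:
    """Map each extracted field name to the set of sub-keys that differ.

    Assumes both arguments follow the `extract` layout shown above and
    contain the same field names.
    """
    diffs = {}
    for name, field in before["extract"].items():
        masked = after["extract"][name]
        # Look at keys from both sides so that added or dropped keys
        # are reported as differences too.
        keys = {
            key
            for key in field.keys() | masked.keys()
            if field.get(key) != masked.get(key)
        }
        if keys:
            diffs[name] = keys
    return diffs
```

Applied to the two examples above, every field that was masked should map to {"value"} and nothing else.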

Masking scope

De-identification applies to all three surfaces of the result:
  • extract field values: each field’s value string is scanned, and any detected PHI is replaced with its typed token.
  • tables cell values: every cell in every extracted table is scanned.
  • Recognition output: if include_recognition=true is also set, the raw OCR text in the recognition payload is masked before it is returned.
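To audit where masking landed across all three surfaces, a client can walk the whole result generically instead of depending on the exact layout of each section. This page does not show the layout of the tables and recognition payloads, so the sketch below assumes only that the result is ordinary parsed JSON (dicts, lists, strings, numbers, booleans). iter_strings and masked_strings are hypothetical helper names.

```python
import re
from typing import Any, Iterator

# Typed tokens as listed in the token table on this page.
TOKEN_RE = re.compile(r"\[(?:NAME|DATE|ADDRESS|PHONE|EMAIL|ID|OTHER)\]")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans, and None carry no text, so they are skipped.


def masked_strings(result: dict) -> list[str]:
    """Return every string in the result that contains a typed token."""
    return [s for s in iter_strings(result) if TOKEN_RE.search(s)]
```

A walk like this shows where tokens were substituted. It cannot show that no PHI was missed, so it complements a manual review and does not replace one.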

Secondary-use and regulatory context

De-identification is designed to support secondary use of health data under Quebec Law 25 (Act to modernize legislative provisions as regards the protection of personal information) and similar frameworks that require personal information to be anonymised or de-identified before use for research, analytics, or model training. Consult your privacy and legal team to confirm that Folio’s token-based approach meets the specific de-identification standard required for your use case.
De-identification is performed on extracted text, not on the document image itself. If you retain the original PDF or image file outside of Folio, those files still contain the original PHI.