Natural Language Processing

Welcome to Health Gorilla's Developer Portal. Access API documentation for our suite of clinical data APIs.

We provide a natural language processing (NLP) API that extracts structured clinical information from unstructured documents, such as JPEG and PNG images and PDF files. It is part of our RESTful API, which is based on FHIR Release 3 (STU3).

Workflow

The workflow of our NLP API consists of four stages:

  • Add the document to the patient’s chart. See the “Health Gorilla FHIR RESTful API specification” for details.
  • Create a Subscription to be notified when the job completes and to receive its results.
  • Start an NLP job for the document.
  • Receive the results.

Create Subscription

First, create a Subscription to receive operation results. The subscription must be a webhook: when the operation completes, Health Gorilla sends an HTTP POST to the specified URL, and the payload contains the result of the operation performed on the document.

The following attributes are required:

  • criteria: DocumentReference.OCR
  • channel.endpoint: an HTTPS URL that supports TLS 1.2 or later
  • channel.payload: application/hg-ocr+json
  • channel.header: the HTTP header used to authorize Health Gorilla's requests to your endpoint
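Before creating the Subscription, you may want to check it against the required attributes above. The following is a minimal Python sketch; check_subscription is a hypothetical helper, not part of any Health Gorilla SDK. It treats an https:// endpoint as meeting the TLS requirement, which is an assumption: the URL scheme alone cannot confirm that your server negotiates TLS 1.2 or later.

```python
from urllib.parse import urlparse

REQUIRED_CRITERIA = "DocumentReference.OCR"
REQUIRED_PAYLOAD = "application/hg-ocr+json"


def check_subscription(sub: dict) -> list[str]:
    """Return a list of problems with an NLP Subscription (empty if none)."""
    problems = []
    if sub.get("resourceType") != "Subscription":
        problems.append("resourceType must be 'Subscription'")
    if sub.get("criteria") != REQUIRED_CRITERIA:
        problems.append(f"criteria must be '{REQUIRED_CRITERIA}'")
    channel = sub.get("channel") or {}
    # Assumption: an https:// URL stands in for "TLS 1.2 or later"; the actual
    # TLS version depends on how your web server is configured.
    if urlparse(channel.get("endpoint", "")).scheme != "https":
        problems.append("channel.endpoint must be an HTTPS URL")
    if channel.get("payload") != REQUIRED_PAYLOAD:
        problems.append(f"channel.payload must be '{REQUIRED_PAYLOAD}'")
    # channel.header carries the header Health Gorilla sends to your webhook.
    if not channel.get("header"):
        problems.append("channel.header must contain an authorization header")
    return problems
```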

Example:

{
    "resourceType": "Subscription",
    "status": "requested",
    "end": "2022-01-01T00:00:00Z",
    "reason": "NLP",
    "criteria": "DocumentReference.OCR",
    "channel": {
        "type": "rest-hook",
        "endpoint": "https://your-site.com/hg/webhook",
        "payload": "application/hg-ocr+json",
        "header": [
            "Authorization: Bearer your-secret-token"
        ]
    }
}
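The Subscription is created like any other FHIR resource. The sketch below assumes the standard FHIR create interaction (an HTTP POST to [base]/Subscription), the base URL https://api.healthgorilla.com/fhir taken from the $ocr endpoints in this document, and an OAuth 2.0 bearer access token for API calls; confirm these against the Health Gorilla FHIR RESTful API specification. The helper name build_subscription_request is hypothetical. It only builds the request so you can inspect it before sending.

```python
import json
import urllib.request

# Assumption: FHIR base URL, taken from the $ocr endpoints in this document.
FHIR_BASE = "https://api.healthgorilla.com/fhir"


def build_subscription_request(access_token: str, webhook_url: str,
                               webhook_auth: str, end: str) -> urllib.request.Request:
    """Build (but do not send) a FHIR create request for an NLP Subscription."""
    subscription = {
        "resourceType": "Subscription",
        "status": "requested",
        "end": end,
        "reason": "NLP",
        "criteria": "DocumentReference.OCR",
        "channel": {
            "type": "rest-hook",
            "endpoint": webhook_url,
            "payload": "application/hg-ocr+json",
            # Header that Health Gorilla will include when calling your webhook.
            "header": [f"Authorization: {webhook_auth}"],
        },
    }
    return urllib.request.Request(
        url=f"{FHIR_BASE}/Subscription",
        data=json.dumps(subscription).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: API calls are authorized with an OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
            # FHIR STU3 JSON media type; plain application/json may also work.
            "Content-Type": "application/fhir+json",
        },
    )
```

To send it, pass the request to urllib.request.urlopen(req) and inspect the response status and body to see whether the Subscription was accepted.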

Start NLP job

Health Gorilla defines a custom $ocr operation on the DocumentReference resource.

To extract text from a document, send an HTTP GET request to:

https://api.healthgorilla.com/fhir/DocumentReference/DOCUMENT_ID/$ocr

To extract both text and medical information from a document, send an HTTP GET request to:

https://api.healthgorilla.com/fhir/DocumentReference/DOCUMENT_ID/$ocr?cm=true
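Building the operation URL only requires inserting the DocumentReference ID and, optionally, cm=true. The helper ocr_url below is hypothetical; the URL shapes are the ones documented above. Note that $ocr is a literal part of the path, not a template variable.

```python
from urllib.parse import quote, urlencode

FHIR_BASE = "https://api.healthgorilla.com/fhir"


def ocr_url(document_id: str, medical_info: bool = False) -> str:
    """Return the $ocr operation URL for a DocumentReference.

    medical_info=True adds cm=true, which requests medical information
    in addition to the extracted text.
    """
    # Percent-encode the ID so it is safe inside a single path segment.
    url = f"{FHIR_BASE}/DocumentReference/{quote(document_id, safe='')}/$ocr"
    if medical_info:
        url += "?" + urlencode({"cm": "true"})
    return url
```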

If you have at least one active subscription that can receive operation results, you get an HTTP 200 OK response containing the list of active subscriptions:

{
    "resourceType": "Parameters",
    "parameter": [
        {
            "name": "subscription",
            "valueId": "Subscription/2913165d0f881b6a0bd35afd"
        }
    ]
}

If you have no active subscriptions, you get an HTTP 405 Method Not Allowed response.
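A client can turn the two documented outcomes into a return value or an error. This sketch (hypothetical names subscriptions_from_ocr_response and NoActiveSubscriptionError) takes the HTTP status and response body, returns the subscription references from the Parameters resource on 200, and raises on 405. Any other status is treated as unexpected, since no other codes are described here.

```python
import json


class NoActiveSubscriptionError(Exception):
    """Raised when $ocr returns 405 because no active subscription exists."""


def subscriptions_from_ocr_response(status: int, body: str) -> list[str]:
    """Return the subscription references from an $ocr start response."""
    if status == 405:
        raise NoActiveSubscriptionError("create an active Subscription first")
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    params = json.loads(body)
    # The body is a FHIR Parameters resource; each "subscription" parameter
    # carries a reference such as "Subscription/<id>" in valueId.
    return [
        p["valueId"]
        for p in params.get("parameter", [])
        if p.get("name") == "subscription"
    ]
```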

Receive the Results

Once the operation completes, Health Gorilla sends an HTTP POST request to your webhook endpoint.

The request body is JSON and contains the following attributes:

  • resource (String): The resource type, DocumentReference.
  • id (String): The ID of the DocumentReference.
  • ocr (Array): List of OCR objects.
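A webhook endpoint needs to check the header you configured in channel.header and then parse the JSON body. The sketch below keeps that logic in a pure function (hypothetical name handle_webhook) and wraps it in the standard library's http.server for local testing. It assumes the Authorization value from the Subscription example above. A production endpoint must be served over HTTPS with TLS 1.2 or later, which this plain-HTTP sketch does not do.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Assumption: the value configured in the Subscription's channel.header,
# here "Authorization: Bearer your-secret-token" from the example.
EXPECTED_AUTH = "Bearer your-secret-token"


def handle_webhook(auth_header: Optional[str], body: bytes) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status to send back, parsed payload or None)."""
    # Compare as bytes in constant time so the secret does not leak via timing.
    if auth_header is None or not hmac.compare_digest(
        auth_header.encode("utf-8"), EXPECTED_AUTH.encode("utf-8")
    ):
        return 401, None
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, None
    if not isinstance(payload, dict) or payload.get("resource") != "DocumentReference":
        return 400, None
    return 200, payload


class WebhookHandler(BaseHTTPRequestHandler):
    """Minimal HTTP handler that delegates to handle_webhook."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_webhook(
            self.headers.get("Authorization"), self.rfile.read(length)
        )
        if payload is not None:
            # Replace with your own processing (queue, database, ...).
            print("received results for DocumentReference", payload.get("id"))
        self.send_response(status)
        self.end_headers()


def run(port: int = 8080) -> None:
    """Serve the webhook over plain HTTP on localhost (local testing only)."""
    HTTPServer(("localhost", port), WebhookHandler).serve_forever()
```

Call run() to start the local server; keeping the checks in handle_webhook makes them easy to unit-test without a network.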

OCR Object

  • name (String): The name of the document.
  • contentType (String): The MIME type of the document, for example application/pdf.
  • success (Boolean): true if the document was processed, false otherwise.
  • error_code (String): The error code if processing failed.
  • pages (Array): List of Page objects containing the text extracted from the document.
  • entities (Array): List of Entity objects.
  • unmappedAttributes (Array): List of Attribute objects that were extracted but not mapped to any entity.
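Because each OCR object reports its own success flag, a consumer usually separates processed documents from failures first. The sketch below (hypothetical name split_ocr_results) does that for a parsed webhook payload. No specific error_code values are documented here, so the code treats them as opaque strings.

```python
def split_ocr_results(payload: dict) -> tuple[list[dict], list[tuple[str, str]]]:
    """Separate processed documents from failures.

    Returns (successful OCR objects, [(document name, error_code), ...]).
    """
    succeeded, failed = [], []
    for item in payload.get("ocr", []):
        if item.get("success"):
            succeeded.append(item)
        else:
            # error_code is described only for failures; use a placeholder if absent.
            failed.append((item.get("name", "<unnamed>"), item.get("error_code", "<none>")))
    return succeeded, failed
```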

Page Object

  • number (String): The page number.
  • lines (Array): List of Line objects found on the page.

Line Object

  • confidence (Float): Confidence score for the accuracy of the recognized text, from 0 (minimum) to 100 (maximum).
  • text (String): The recognized text of the line.
  • page (Number): The number of the page that contains the line.
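To recover readable text, join the lines of each page. The sketch below (hypothetical name page_texts) does this for one OCR object and optionally drops lines below a confidence threshold on the documented 0 to 100 scale. It assumes lines are listed in reading order within each page, which this document does not state explicitly.

```python
def page_texts(ocr_item: dict, min_confidence: float = 0.0) -> dict[int, str]:
    """Join the recognized lines of one OCR object into text per page.

    Lines whose confidence (0-100) is below min_confidence are dropped.
    """
    texts = {}
    for page in ocr_item.get("pages", []):
        # The Page table lists number as a String while the example shows an
        # integer; int() accepts both.
        number = int(page["number"])
        lines = [
            line["text"]
            for line in page.get("lines", [])
            if line.get("confidence", 0.0) >= min_confidence
        ]
        texts[number] = "\n".join(lines)
    return texts
```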

Entity Object

An Entity provides information about an extracted medical term.

  • Id (Integer): Numeric identifier.
  • Category (String): Category of the entity. The following categories are currently supported: MEDICATION, MEDICAL_CONDITION, PROTECTED_HEALTH_INFORMATION, TEST_TREATMENT_PROCEDURE, ANATOMY.
  • Attributes (Array): List of Attribute objects related to this entity. The available attributes depend on the entity category.
  • BeginOffset (Integer): 0-based offset in the input text where the entity starts.
  • EndOffset (Integer): 0-based offset in the input text where the entity ends.
  • Score (Float): Level of confidence in the accuracy of the detection.
  • Text (String): Segment of the input text extracted as the entity.
  • Traits (Array): List of Trait objects.
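The Entity table lists capitalized names (Id, Category, Text, Score), while the example payload uses lowercase keys (id, category, text, score). The sketch below accepts either form, which is an assumption about the payload rather than documented behavior, and returns the text of entities in one category, such as MEDICATION or MEDICAL_CONDITION, at or above a chosen score. Both helper names are hypothetical.

```python
def get_field(obj: dict, name: str, default=None):
    """Read a field by its lowercase name, falling back to the capitalized form."""
    if name in obj:
        return obj[name]
    return obj.get(name[0].upper() + name[1:], default)


def entities_by_category(ocr_item: dict, category: str, min_score: float = 0.0) -> list[str]:
    """Return the text of entities in `category` whose score is >= min_score."""
    return [
        get_field(e, "text")
        for e in ocr_item.get("entities", [])
        if get_field(e, "category") == category
        and get_field(e, "score", 0.0) >= min_score
    ]
```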

Attribute Object

An Attribute is an extracted segment of text that is an attribute of an entity or is otherwise related to it, such as the dosage of a medication. It contains the attribute's ID, its begin and end offsets within the input text, and the corresponding segment of the input text.

  • Id (Integer): Numeric identifier.
  • BeginOffset (Integer): 0-based offset in the input text where the attribute starts.
  • EndOffset (Integer): 0-based offset in the input text where the attribute ends.
  • Score (Float): Level of confidence in the accuracy of the detection.
  • RelationshipScore (Float): Level of confidence that the attribute is related to the entity.
  • Text (String): Segment of the input text extracted as the attribute.
  • Traits (Array): List of Trait objects.
  • Type (String): Type of the attribute. The following types are currently supported: NAME, DOSAGE, FORM, FREQUENCY, DURATION, GENERIC_NAME, BRAND_NAME, STRENGTH, RATE, TEST_NAME, TEST_UNITS, PROCEDURE_NAME, TREATMENT_NAME, DATE, AGE, CONTACT_POINT, EMAIL, IDENTIFIER, URL, ADDRESS, SYSTEM_ORGAN_SITE, DIRECTION, QUALITY, QUANTITY.
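For medications in particular, the attributes carry the details (DOSAGE, FREQUENCY, STRENGTH, and so on). The sketch below groups an entity's attribute texts by Type, keeping only attributes whose RelationshipScore meets a threshold. Like the entity tables, it reads either the capitalized names shown above or lowercase-initial keys; that dual lookup is an assumption, and the helper names are hypothetical.

```python
from collections import defaultdict


def field(obj: dict, name: str, default=None):
    """Read `name` with a lowercase initial, falling back to the capitalized form."""
    if name in obj:
        return obj[name]
    return obj.get(name[0].upper() + name[1:], default)


def attributes_by_type(entity: dict, min_relationship: float = 0.0) -> dict[str, list[str]]:
    """Group an entity's attribute texts by attribute Type (e.g. DOSAGE, FREQUENCY)."""
    grouped = defaultdict(list)
    for attr in field(entity, "attributes", []):
        # RelationshipScore: confidence that the attribute belongs to this entity.
        if field(attr, "relationshipScore", 0.0) >= min_relationship:
            grouped[field(attr, "type")].append(field(attr, "text"))
    return dict(grouped)
```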

Trait Object

Provides contextual information about the extracted entity.

  • Name (String): Name or contextual description of the trait.
  • Score (Float): Level of confidence in the accuracy of the detection.
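Traits let you narrow entities by context, for example keeping only conditions that carry the DIAGNOSIS trait shown in the example below. This hypothetical helper returns the names of an entity's traits whose score meets a threshold. It reads either lowercase or capitalized keys, an assumption based on the mismatch between the tables and the example.

```python
def trait_names(entity: dict, min_score: float = 0.0) -> set[str]:
    """Return the names of an entity's traits whose score is at least min_score."""
    names = set()
    for trait in entity.get("traits", entity.get("Traits", [])):
        score = trait.get("score", trait.get("Score", 0.0))
        if score >= min_score:
            names.add(trait.get("name", trait.get("Name")))
    return names
```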

Example

{  
   "resource":"DocumentReference",
   "id":"7e00155db23e480d67609b59",
   "ocr":[  
      {  
         "name":"laboratory_result.pdf",
         "contentType":"application/pdf",
         "success":true,
         "pages":[  
            {  
               "number":1,
               "lines":[  
                  {  
                     "confidence":55.2390022277832,
                     "text":"From",
                     "page":1
                  },
                  {  
                     "confidence":93.70309448242188,
                     "text":"Mon 05 Nov 2012 12:10:19 PM PST",
                     "page":1
                  },
                  ...
               ]
            },
            ...
         ],
         "entities":[  
            {  
               "resource":"Entity",
               "id":0,
               "category":"PROTECTED_HEALTH_INFORMATION",
               "type":"DATE",
               "text":"Mon 05 Nov 2012",
               "score":0.99408334
            },
            {  
               "resource":"Entity",
               "id":52,
               "category":"MEDICAL_CONDITION",
               "type":"DX_NAME",
               "text":"venous thrombosis",
               "score":0.9674222,
               "traits":[  
                  {  
                     "resource":"Trait",
                     "name":"DIAGNOSIS",
                     "score":0.9529462
                  }
               ]
            },
            ...
         ]
      }
   ]
}

Limits

The Health Gorilla NLP API has the following limits:

  • Only PDF, JPEG, and PNG documents can be processed.
  • The maximum document image (JPEG/PNG) size is 5 MB.
  • The maximum PDF file size is 500 MB.
  • The maximum number of pages in a PDF file is 3,000.
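Checking a file against these limits before uploading can save a failed job. The sketch below is a hypothetical pre-flight check. It assumes the three formats are identified by the content types application/pdf, image/jpeg, and image/png, and that 1 MB means 1,048,576 bytes; the limits above do not say whether MB is binary or decimal, so use 1,000,000 if you want the more conservative reading. The PDF page count has to come from your own PDF tooling.

```python
from typing import Optional

ALLOWED_TYPES = {"application/pdf", "image/jpeg", "image/png"}
MB = 1024 * 1024  # Assumption: "MB" means 2**20 bytes.
MAX_IMAGE_BYTES = 5 * MB
MAX_PDF_BYTES = 500 * MB
MAX_PDF_PAGES = 3000


def limit_violations(content_type: str, size_bytes: int,
                     pages: Optional[int] = None) -> list[str]:
    """Return the documented limits a file would exceed (empty list if none)."""
    if content_type not in ALLOWED_TYPES:
        return [f"unsupported content type: {content_type}"]
    problems = []
    if content_type == "application/pdf":
        if size_bytes > MAX_PDF_BYTES:
            problems.append("PDF larger than 500 MB")
        # Page count is only checked when the caller supplies it.
        if pages is not None and pages > MAX_PDF_PAGES:
            problems.append("PDF has more than 3000 pages")
    elif size_bytes > MAX_IMAGE_BYTES:
        problems.append("image larger than 5 MB")
    return problems
```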