Natural Language Processing

We provide a natural language processing (NLP) API that extracts structured clinical information from unstructured text and image files, such as JPG, PNG, and PDF files. It is part of our RESTful API and is based on the FHIR STU3 standard.

Resources

FHIR http://hl7.org/fhir/
SMART on FHIR http://docs.smarthealthit.org/
Health Gorilla FHIR RESTful API specification
Health Gorilla Clinical Network OAuth 2.0 Guide

The workflow of our NLP API consists of four stages:

Add a document to the patient’s chart. See “Health Gorilla FHIR RESTful API specification” for more details.
Create a Subscription to be notified when the task completes and to receive the results.
Start an NLP job for the document.
Receive the results.

Create Subscription

First, add a subscription to receive the operation results. The subscription must be a webhook: Health Gorilla sends an HTTP POST request to the specified URL when the operation is complete. The payload contains the result of the operation performed on the document.

References:

FHIR STU3 Subscription https://www.hl7.org/fhir/STU3/subscription.html
Health Gorilla FHIR RESTful API specification, chapter ‘Subscriptions’

The required attributes are listed below.

Attribute	Value
criteria	DocumentReference.OCR
channel.endpoint	URL of your webhook. The endpoint must support TLS 1.2 or higher
channel.payload	application/hg-ocr+json
channel.header	HTTP header that is used to authorise the request

Example:

{
    "resourceType": "Subscription",
    "status": "requested",
    "end": "2022-01-01T00:00:00Z",
    "reason": "NLP",
    "criteria": "DocumentReference.OCR",
    "channel": {
        "type": "rest-hook",
        "endpoint": "https://your-site.com/hg/webhook",
        "payload": "application/hg-ocr+json",
        "header": [
            "Authorization: Bearer your-secret-token"
        ]
    }
}
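
The subscription above could be registered with a short script such as the following. This is a minimal sketch, not an official client. It assumes that the Subscription resource is created by POSTing to https://api.healthgorilla.com/fhir/Subscription (a path inferred from the DocumentReference URL used elsewhere in this document), that the API accepts an OAuth 2.0 bearer token, and that application/fhir+json is an accepted content type. The helper name build_subscription_request is hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed base URL, inferred from the DocumentReference endpoint in this document.
FHIR_BASE = "https://api.healthgorilla.com/fhir"


def build_subscription_request(access_token, webhook_url, webhook_secret, end):
    """Build (but do not send) a POST request that creates the NLP subscription."""
    subscription = {
        "resourceType": "Subscription",
        "status": "requested",
        "end": end,
        "reason": "NLP",
        "criteria": "DocumentReference.OCR",
        "channel": {
            "type": "rest-hook",
            "endpoint": webhook_url,
            "payload": "application/hg-ocr+json",
            # Health Gorilla echoes this header back when it calls the webhook.
            "header": ["Authorization: Bearer " + webhook_secret],
        },
    }
    return urllib.request.Request(
        FHIR_BASE + "/Subscription",  # assumed path
        data=json.dumps(subscription).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,  # assumed auth scheme
            "Content-Type": "application/fhir+json",  # assumed media type
        },
        method="POST",
    )


request = build_subscription_request(
    access_token="your-oauth-access-token",
    webhook_url="https://your-site.com/hg/webhook",
    webhook_secret="your-secret-token",
    end="2022-01-01T00:00:00Z",
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the body before it goes over the network.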

Start NLP job

Resources:

FHIR STU3 DocumentReference https://www.hl7.org/fhir/STU3/documentreference.html
Health Gorilla FHIR RESTful API specification, chapter ‘DocumentReference’

Health Gorilla defines a custom OCR operation applicable to the DocumentReference resource.

To extract text from a document, send an HTTP GET request to:

https://api.healthgorilla.com/fhir/DocumentReference/DOCUMENT_ID/$ocr

To extract both text and medical information from a document, send an HTTP GET request to:

https://api.healthgorilla.com/fhir/DocumentReference/DOCUMENT_ID/$ocr?cm=true

If you have active subscriptions that can receive the operation results, you will get an HTTP 200 OK response that contains the list of active subscriptions.

{
    "resourceType": "Parameters",
    "parameter": [
        {
            "name": "subscription",
            "valueId": "Subscription/2913165d0f881b6a0bd35afd"
        }
    ]
}

If you have no active subscriptions, you will get an HTTP 405 Method Not Allowed response.
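
The request and the two possible responses described above can be handled roughly as follows. This is a sketch under stated assumptions: the bearer-token Authorization header is assumed rather than taken from this section, and the function names ocr_url, subscription_ids, and start_nlp_job are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

FHIR_BASE = "https://api.healthgorilla.com/fhir"


def ocr_url(document_id, extract_medical_info=False):
    """Return the $ocr operation URL; cm=true also requests medical info."""
    url = "{}/DocumentReference/{}/$ocr".format(
        FHIR_BASE, urllib.parse.quote(document_id, safe="")
    )
    return url + "?cm=true" if extract_medical_info else url


def subscription_ids(parameters):
    """Collect the subscription references from a Parameters response body."""
    return [
        p["valueId"]
        for p in parameters.get("parameter", [])
        if p.get("name") == "subscription" and "valueId" in p
    ]


def start_nlp_job(document_id, access_token, extract_medical_info=False):
    """Send the GET request; return subscription references, or None on HTTP 405."""
    request = urllib.request.Request(
        ocr_url(document_id, extract_medical_info),
        headers={"Authorization": "Bearer " + access_token},  # assumed auth scheme
    )
    try:
        with urllib.request.urlopen(request) as response:
            return subscription_ids(json.load(response))
    except urllib.error.HTTPError as error:
        if error.code == 405:  # no active subscription can receive the results
            return None
        raise
```

A None return value signals that a subscription has to be created before the job is started again.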

Receive the Results

Once the operation completes, you will receive an HTTP POST request at your webhook endpoint.

The request body is in JSON format and consists of the following attributes:

Attribute	Type	Description
resource	String	DocumentReference
id	String	The ID of the resource
ocr	Array	List of OCR objects
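
A webhook receiver for this payload might look like the sketch below, built on Python's standard http.server module. It assumes the notification arrives as an HTTP POST (as stated in the Create Subscription section), that the Authorization header configured in channel.header is sent back unchanged, and that a 200 response is an acceptable acknowledgement. The names is_authorized, parse_notification, WebhookHandler, and run are hypothetical, and port 8080 is a placeholder.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Must match the value configured in the subscription's channel.header.
EXPECTED_AUTHORIZATION = "Bearer your-secret-token"


def is_authorized(header_value):
    """Compare the Authorization header with the expected value in constant time."""
    return hmac.compare_digest(
        (header_value or "").encode("utf-8"),
        EXPECTED_AUTHORIZATION.encode("utf-8"),
    )


def parse_notification(body):
    """Decode the JSON body and return (resource, id, list of OCR objects)."""
    payload = json.loads(body)
    return payload["resource"], payload["id"], payload.get("ocr", [])


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authorized(self.headers.get("Authorization")):
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        resource, resource_id, ocr = parse_notification(self.rfile.read(length))
        print("Received", len(ocr), "OCR result(s) for", resource, resource_id)
        self.send_response(200)  # assumed: a 2xx status acknowledges the delivery
        self.end_headers()


def run(port=8080):
    """Serve plain HTTP; TLS termination is assumed to happen in front of this."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling run() starts the server. In production the handler would sit behind a TLS-terminating proxy so that the endpoint meets the TLS requirement of the subscription.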

OCR Object

Attribute	Type	Description
name	String	The name of the document
contentType	String	Example: application/pdf
success	Boolean	True if the document was processed, false otherwise.
error_code	String	The error code in case of failure.
pages	Array	List of Page objects that contains text extracted from the document.
entities	Array	List of Entity objects
unmappedAttributes	Array	List of Attribute objects that were extracted but were not mapped to an entity.

Page Object

Attribute	Type	Description
number	Number	Page number
lines	Array	List of lines found.

Line Object

Attribute	Type	Description
confidence	Float	The confidence score that stores the accuracy of the recognized text. Minimum value of 0. Maximum value of 100.
text	String	The recognized text of the line
page	Number	Number of the page on which the line was found
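
Because confidence is reported on a scale of 0 to 100, a consumer may want to discard low-confidence lines before using the text. The sketch below shows one way to do that. The helper name confident_text and the threshold of 80 are illustrative assumptions, not values recommended by the API.

```python
def confident_text(page, min_confidence=80.0):
    """Join the text of lines whose confidence (0 to 100) meets the threshold."""
    return "\n".join(
        line["text"]
        for line in page.get("lines", [])
        if line.get("confidence", 0.0) >= min_confidence
    )


# Sample Page object using the two lines from the example payload in this document.
page = {
    "number": 1,
    "lines": [
        {"confidence": 55.2390022277832, "text": "From", "page": 1},
        {"confidence": 93.70309448242188, "text": "Mon 05 Nov 2012 12:10:19 PM PST", "page": 1},
    ],
}
print(confident_text(page))  # prints "Mon 05 Nov 2012 12:10:19 PM PST"
```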

Entity Object

An Entity provides information about an extracted medical term.

Attribute	Type	Description
Id	Integer	Number identifier
Category	String	Category of the entity. The following categories are currently supported: MEDICATION, MEDICAL_CONDITION, PROTECTED_HEALTH_INFORMATION, TEST_TREATMENT_PROCEDURE, ANATOMY
Attributes	Array	List of Attribute objects that relate to this entity. Dependent on entity category
BeginOffset	Integer	0-based offset in the input text that shows where the entity starts
EndOffset	Integer	0-based offset in the input text that shows where the entity ends
Score	Float	The level of confidence in the accuracy of the detection
Text	String	Segment of input text extracted as the entity
Traits	Array	Array of Trait objects

Attribute Object

An extracted segment of the text that is an attribute of an entity, or is otherwise related to an entity, such as the dosage of a medication taken. It contains information about the attribute, such as its ID, its begin and end offsets within the input text, and the corresponding segment of the input text.

Attribute	Type	Description
Id	Integer	Number identifier
BeginOffset	Integer	0-based offset in the input text that shows where the attribute starts
EndOffset	Integer	0-based offset in the input text that shows where the attribute ends
Score	Float	The level of confidence in the accuracy of the detection
RelationshipScore	Float	The level of confidence that the attribute relates to the entity
Text	String	Segment of input text extracted as the attribute
Traits	Array	Array of Trait objects
Type	String	Specific type of the attribute. The following types are currently supported: NAME, DOSAGE, FORM, FREQUENCY, DURATION, GENERIC_NAME, BRAND_NAME, STRENGTH, RATE, TEST_NAME, TEST_UNITS, PROCEDURE_NAME, TREATMENT_NAME, DATE, AGE, CONTACT_POINT, EMAIL, IDENTIFIER, URL, ADDRESS, SYSTEM_ORGAN_SITE, DIRECTION, QUALITY, QUANTITY

Trait Object

Provides contextual information about the extracted entity.

Attribute	Type	Description
Name	String	Name or contextual description about the trait
Score	Float	The level of confidence in the accuracy of the detection

Example

{  
   "resource":"DocumentReference",
   "id":"7e00155db23e480d67609b59",
   "ocr":[  
      {  
         "name":"laboratory_result.pdf",
         "contentType":"application/pdf",
         "success":true,
         "pages":[  
            {  
               "number":1,
               "lines":[  
                  {  
                     "confidence":55.2390022277832,
                     "text":"From",
                     "page":1
                  },
                  {  
                     "confidence":93.70309448242188,
                     "text":"Mon 05 Nov 2012 12:10:19 PM PST",
                     "page":1
                  },
                  ...
               ]
            },
            ...
         ],
         "entities":[  
            {  
               "resource":"Entity",
               "id":0,
               "category":"PROTECTED_HEALTH_INFORMATION",
               "type":"DATE",
               "text":"Mon 05 Nov 2012",
               "score":0.99408334
            },
            {  
               "resource":"Entity",
               "id":52,
               "category":"MEDICAL_CONDITION",
               "type":"DX_NAME",
               "text":"venous thrombosis",
               "score":0.9674222,
               "traits":[  
                  {  
                     "resource":"Trait",
                     "name":"DIAGNOSIS",
                     "score":0.9529462
                  }
               ]
            },
            ...
         ]
      }
   ]
}
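
A payload like the one above could be summarised as in the following sketch, which groups the extracted entity texts by category and drops low-scoring detections. It assumes the lowercase key names used in the example payload (category, text, score, success) rather than the capitalised names in the attribute tables. The function name entities_by_category and the 0.9 threshold are illustrative assumptions. The sample payload is a trimmed copy of the example with the elided parts left out.

```python
from collections import defaultdict


def entities_by_category(ocr_item, min_score=0.9):
    """Group entity texts by category, keeping detections at or above min_score."""
    grouped = defaultdict(list)
    for entity in ocr_item.get("entities", []):
        if entity.get("score", 0.0) >= min_score:
            grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


payload = {
    "resource": "DocumentReference",
    "id": "7e00155db23e480d67609b59",
    "ocr": [
        {
            "name": "laboratory_result.pdf",
            "contentType": "application/pdf",
            "success": True,
            "entities": [
                {"id": 0, "category": "PROTECTED_HEALTH_INFORMATION",
                 "type": "DATE", "text": "Mon 05 Nov 2012", "score": 0.99408334},
                {"id": 52, "category": "MEDICAL_CONDITION",
                 "type": "DX_NAME", "text": "venous thrombosis", "score": 0.9674222},
            ],
        }
    ],
}

for item in payload["ocr"]:
    if item.get("success"):  # skip documents that failed to process
        print(item["name"], entities_by_category(item))
```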

Limits

The Health Gorilla NLP API has the following restrictions:

Only PDF, JPEG, and PNG documents can be processed.
The maximum image (JPEG/PNG) file size is 5 MB.
The maximum PDF file size is 500 MB.
The maximum number of pages in a PDF file is 3000.
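
These limits can be checked on the client before a document is uploaded. The sketch below is one possible pre-flight check. It assumes that 1 MB means 1024 * 1024 bytes, that the file type can be judged from the file extension, and that the caller supplies the PDF page count (counting pages requires a PDF library, which is out of scope here). The function name check_limits is hypothetical.

```python
import os

MAX_IMAGE_BYTES = 5 * 1024 * 1024    # 5 MB, assuming 1 MB = 1024 * 1024 bytes
MAX_PDF_BYTES = 500 * 1024 * 1024    # 500 MB under the same assumption
MAX_PDF_PAGES = 3000
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_limits(filename, size_bytes, page_count=None):
    """Return a list of limit violations; an empty list means no problem was found."""
    extension = os.path.splitext(filename)[1].lower()
    problems = []
    if extension in IMAGE_EXTENSIONS:
        if size_bytes > MAX_IMAGE_BYTES:
            problems.append("image exceeds 5 MB")
    elif extension == ".pdf":
        if size_bytes > MAX_PDF_BYTES:
            problems.append("PDF exceeds 500 MB")
        # The page count is only checked when the caller provides it.
        if page_count is not None and page_count > MAX_PDF_PAGES:
            problems.append("PDF exceeds 3000 pages")
    else:
        problems.append("unsupported file type: " + (extension or "none"))
    return problems
```

For example, check_limits("scan.png", 6 * 1024 * 1024) reports that the image is too large, while check_limits("report.pdf", 1024, page_count=12) returns an empty list.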