Natural Language Processing
We provide a natural language processing (NLP) API that extracts structured clinical information from unstructured text and image files, such as JPG, PNG, and PDF files. It is part of our RESTful API and is based on FHIR STU3.
Resources
- FHIR http://hl7.org/fhir/
- SMART on FHIR http://docs.smarthealthit.org/
- Health Gorilla FHIR RESTful API specification
- Health Gorilla Clinical Network OAuth 2.0 Guide
The NLP API workflow consists of four stages:
- Add a document to the patient’s chart. See “Health Gorilla FHIR RESTful API specification” for more details.
- Create a Subscription to be notified when the task completes and to receive the results.
- Start an NLP job for the given document.
- Receive the results.
Create Subscription
First, add a subscription to receive operation results. It must be a webhook: Health Gorilla sends an HTTP POST to the specified URL when the operation is complete. The payload contains the result of the operation performed on the given document.
References:
- FHIR STU3 Subscription https://www.hl7.org/fhir/STU3/subscription.html
- Health Gorilla FHIR RESTful API specification, chapter ‘Subscriptions’
The required attributes are listed below.
Attribute | Value |
---|---|
criteria | DocumentReference.OCR |
channel.endpoint | URL of your webhook, served over HTTPS with TLS 1.2 or later |
channel.payload | application/hg-ocr+json |
channel.header | HTTP header that is used to authorise the request |
Example:
{
"resourceType": "Subscription",
"status": "requested",
"end": "2022-01-01T00:00:00Z",
"reason": "NLP",
"criteria": "DocumentReference.OCR",
"channel": {
"type": "rest-hook",
"endpoint": "https://your-site.com/hg/webhook",
"payload": "application/hg-ocr+json",
"header": [
"Authorization: Bearer your-secret-token"
]
}
}
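A minimal Python sketch of assembling this Subscription body. The helper name `build_nlp_subscription` and its parameters are hypothetical; only the field values come from the table and example above. Sending the resource to Health Gorilla is left out, since the request details belong to the Health Gorilla FHIR RESTful API specification and are not covered here.

```python
import json


def build_nlp_subscription(endpoint: str, token: str, end: str) -> dict:
    """Assemble a Subscription resource for NLP results (hypothetical helper)."""
    if not endpoint.startswith("https://"):
        # The endpoint has to be reachable over TLS, so plain HTTP is rejected here.
        raise ValueError("webhook endpoint must use https")
    return {
        "resourceType": "Subscription",
        "status": "requested",
        "end": end,
        "reason": "NLP",
        "criteria": "DocumentReference.OCR",
        "channel": {
            "type": "rest-hook",
            "endpoint": endpoint,
            "payload": "application/hg-ocr+json",
            # Health Gorilla is expected to send this header back with each webhook call.
            "header": [f"Authorization: Bearer {token}"],
        },
    }


subscription = build_nlp_subscription(
    "https://your-site.com/hg/webhook", "your-secret-token", "2022-01-01T00:00:00Z"
)
print(json.dumps(subscription, indent=2))
```

Keeping the token out of the source (for example, reading it from an environment variable) would be a sensible next step in real use.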
Start NLP job
Resources:
- FHIR STU3 DocumentReference https://www.hl7.org/fhir/STU3/documentreference.html
- Health Gorilla FHIR RESTful API specification, chapter ‘DocumentReference’
Health Gorilla defines a custom OCR operation applicable to the DocumentReference resource.
To extract text from the given document, send an HTTP GET request to:
https://api.healthgorilla.com/fhir/DocumentReference/DOCUMENT_ID/$ocr
To extract both text and medical information from the given document, send an HTTP GET request to:
https://api.healthgorilla.com/fhir/DocumentReference/DOCUMENT_ID/$ocr?cm=true
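The two URLs differ only in the `cm=true` query parameter, so they can be built by one small function. This is a sketch: `ocr_url` and its `extract_medical_info` flag are hypothetical names, and the authentication required for the actual GET request (see the OAuth 2.0 guide) is not shown.

```python
from urllib.parse import quote

BASE_URL = "https://api.healthgorilla.com/fhir"


def ocr_url(document_id: str, extract_medical_info: bool = False) -> str:
    """Build the $ocr operation URL for a DocumentReference (hypothetical helper)."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    url = f"{BASE_URL}/DocumentReference/{quote(document_id, safe='')}/$ocr"
    if extract_medical_info:
        # cm=true requests medical information in addition to the extracted text.
        url += "?cm=true"
    return url


print(ocr_url("DOCUMENT_ID"))
print(ocr_url("DOCUMENT_ID", extract_medical_info=True))
```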
If you have active subscriptions that can receive the operation results, you’ll get an HTTP 200 OK response that contains the list of active subscriptions.
{
"resourceType": "Parameters",
"parameter": [
{
"name": "subscription",
"valueId": "Subscription/2913165d0f881b6a0bd35afd"
}
]
}
If you have no active subscriptions, you’ll get an HTTP 405 Method Not Allowed response.
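The two outcomes of starting a job can be handled as in the sketch below. It assumes you already have the HTTP status code and the parsed JSON body from whatever HTTP client you use; the function name `subscriptions_from_response` is hypothetical.

```python
from typing import List


def subscriptions_from_response(status_code: int, body: dict) -> List[str]:
    """Return the subscription references that will receive the NLP results."""
    if status_code == 405:
        # No active subscription exists, so the job was not started.
        raise RuntimeError("no active subscription can receive NLP results")
    if status_code != 200:
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    return [
        param["valueId"]
        for param in body.get("parameter", [])
        if param.get("name") == "subscription" and "valueId" in param
    ]


response_body = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "subscription", "valueId": "Subscription/2913165d0f881b6a0bd35afd"}
    ],
}
print(subscriptions_from_response(200, response_body))
```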
Receive the Results
Once the operation completes, you will receive an HTTP POST request at your webhook endpoint.
The request body is in JSON format and consists of the following attributes:
Attribute | Type | Description |
---|---|---|
resource | String | DocumentReference |
id | String | The ID of the resource |
ocr | Array | List of OCR objects |
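On the receiving side, a handler might first check the authorization header and the top-level shape of the payload. The sketch below is framework-independent and makes several assumptions: the incoming request carries the header configured in `channel.header` of the Subscription, `EXPECTED_AUTH` and `handle_webhook` are hypothetical names, and the status codes returned are this sketch's own choice, since the source does not say which response Health Gorilla expects.

```python
import hmac
import json
from typing import Dict, Tuple

# Assumption: same value as the Authorization header set in the Subscription.
EXPECTED_AUTH = "Bearer your-secret-token"


def handle_webhook(headers: Dict[str, str], raw_body: bytes) -> Tuple[int, dict]:
    """Return (HTTP status to reply with, parsed payload or an empty dict)."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("authorization", "")
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied.encode(), EXPECTED_AUTH.encode()):
        return 401, {}
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400, {}
    if (
        not isinstance(payload, dict)
        or payload.get("resource") != "DocumentReference"
        or not isinstance(payload.get("ocr"), list)
    ):
        return 400, {}
    return 200, payload


status, payload = handle_webhook(
    {"Authorization": "Bearer your-secret-token"},
    b'{"resource": "DocumentReference", "id": "7e00155db23e480d67609b59", "ocr": []}',
)
print(status, payload["id"])
```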
OCR Object
Attribute | Type | Description |
---|---|---|
name | String | The name of the document |
contentType | String | Example: application/pdf |
success | Boolean | True if the document was processed, false otherwise. |
error_code | String | The error code in case of failure. |
pages | Array | List of Page objects that contain the text extracted from the document. |
entities | Array | List of Entity objects |
unmappedAttributes | Array | List of Attribute objects that were extracted but not mapped to an entity. |
Page Object
Attribute | Type | Description |
---|---|---|
number | String | Page number |
lines | Array | List of lines found. |
Line Object
Attribute | Type | Description |
---|---|---|
confidence | Float | Confidence score for the accuracy of the recognized text, from 0 (minimum) to 100 (maximum). |
text | String | The recognized text of the line |
page | Number | Number of the page that contains the line |
Entity Object
An Entity provides information about an extracted medical term.
Attribute | Type | Description |
---|---|---|
Id | Integer | Number identifier |
Category | String | Category of the entity. The following categories are currently supported: MEDICATION, MEDICAL_CONDITION, PROTECTED_HEALTH_INFORMATION, TEST_TREATMENT_PROCEDURE, ANATOMY |
Attributes | Array | List of Attribute objects that relate to this entity. The available attributes depend on the entity category |
BeginOffset | Integer | 0-based offset in the input text that shows where the entity starts |
EndOffset | Integer | 0-based offset in the input text that shows where the entity ends |
Score | Float | The level of confidence in the accuracy of the detection |
Text | String | Segment of input text extracted as the entity |
Traits | Array | Array of Trait objects |
Attribute Object
An extracted segment of the text that is an attribute of an entity, or is otherwise related to an entity, such as the dosage of a medication taken. It contains information about the attribute, such as its ID, its begin and end offsets within the input text, and the corresponding segment of the input text.
Attribute | Type | Description |
---|---|---|
Id | Integer | Number identifier |
BeginOffset | Integer | 0-based offset in the input text that shows where the attribute starts |
EndOffset | Integer | 0-based offset in the input text that shows where the attribute ends |
Score | Float | The level of confidence in the accuracy of the detection |
RelationshipScore | Float | The level of confidence that the attribute relates to the entity |
Text | String | Segment of input text extracted as the attribute |
Traits | Array | Array of Trait objects |
Type | String | Specific type of the attribute. The following types are currently supported: NAME, DOSAGE, FORM, FREQUENCY, DURATION, GENERIC_NAME, BRAND_NAME, STRENGTH, RATE, TEST_NAME, TEST_UNITS, PROCEDURE_NAME, TREATMENT_NAME, DATE, AGE, CONTACT_POINT, IDENTIFIER, URL, ADDRESS, SYSTEM_ORGAN_SITE, DIRECTION, QUALITY, QUANTITY |
Trait Object
Provides contextual information about the extracted entity.
Attribute | Type | Description |
---|---|---|
Name | String | Name or contextual description of the trait |
Score | Float | The level of confidence in the accuracy of the detection |
Example
{
"resource":"DocumentReference",
"id":"7e00155db23e480d67609b59",
"ocr":[
{
"name":"laboratory_result.pdf",
"contentType":"application/pdf",
"Success":true,
"pages":[
{
"number":1,
"lines":[
{
"confidence":55.2390022277832,
"text":"From",
"page":1
},
{
"confidence":93.70309448242188,
"text":"Mon 05 Nov 2012 12:10:19 PM PST",
"page":1
},
...
]
},
...
],
"entities":[
{
"resource":"Entity",
"id":0,
"category":"PROTECTED_HEALTH_INFORMATION",
"type":"DATE",
"text":"Mon 05 Nov 2012",
"score":0.99408334
},
{
"resource":"Entity",
"id":52,
"category":"MEDICAL_CONDITION",
"type":"DX_NAME",
"text":"venous thrombosis",
"score":0.9674222,
"traits":[
{
"resource":"Trait",
"name":"DIAGNOSIS",
"score":0.9529462
}
]
},
...
]
}
]
}
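A short Python sketch of reading such a payload: it collects the recognized lines above a confidence threshold and picks entities by category. The function names and thresholds are hypothetical, and the sample data is a trimmed copy of the example above with the elided parts left out. Key names follow the example (lower case); because the attribute table spells the status key `success` while the example shows `Success`, the sketch accepts either.

```python
from typing import List


def is_successful(doc: dict) -> bool:
    """True if the document was processed; accepts both spellings of the key."""
    return bool(doc.get("success", doc.get("Success", False)))


def recognized_lines(doc: dict, min_confidence: float = 0.0) -> List[str]:
    """Text of every line whose confidence (0 to 100) meets the threshold."""
    return [
        line["text"]
        for page in doc.get("pages", [])
        for line in page.get("lines", [])
        if line.get("confidence", 0.0) >= min_confidence
    ]


def entities_of(doc: dict, category: str, min_score: float = 0.0) -> List[dict]:
    """Entities of one category; in the example, entity scores lie between 0 and 1."""
    return [
        entity
        for entity in doc.get("entities", [])
        if entity.get("category") == category and entity.get("score", 0.0) >= min_score
    ]


sample = {
    "name": "laboratory_result.pdf",
    "contentType": "application/pdf",
    "Success": True,
    "pages": [
        {
            "number": 1,
            "lines": [
                {"confidence": 55.2390022277832, "text": "From", "page": 1},
                {"confidence": 93.70309448242188,
                 "text": "Mon 05 Nov 2012 12:10:19 PM PST", "page": 1},
            ],
        }
    ],
    "entities": [
        {"resource": "Entity", "id": 0, "category": "PROTECTED_HEALTH_INFORMATION",
         "type": "DATE", "text": "Mon 05 Nov 2012", "score": 0.99408334},
        {"resource": "Entity", "id": 52, "category": "MEDICAL_CONDITION",
         "type": "DX_NAME", "text": "venous thrombosis", "score": 0.9674222,
         "traits": [{"resource": "Trait", "name": "DIAGNOSIS", "score": 0.9529462}]},
    ],
}

if is_successful(sample):
    print(recognized_lines(sample, min_confidence=90.0))
    for entity in entities_of(sample, "MEDICAL_CONDITION", min_score=0.9):
        print(entity["text"], [trait["name"] for trait in entity.get("traits", [])])
```

The thresholds of 90 and 0.9 are arbitrary illustrations; suitable values depend on the quality of the source documents.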
Limits
The Health Gorilla NLP API has the following restrictions:
- Only PDF, JPEG, and PNG documents can be processed.
- The maximum image (JPEG/PNG) size is 5 MB.
- The maximum PDF file size is 500 MB.
- The maximum number of pages in a PDF file is 3000.
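These limits can be checked on the client before a document is uploaded. The sketch below is illustrative: `check_limits` is a hypothetical helper, the MIME types for JPEG and PNG are the standard ones rather than values stated above, and treating 1 MB as 1024 × 1024 bytes is an assumption because the source does not define the unit.

```python
from typing import List, Optional

MB = 1024 * 1024  # assumption: binary megabytes

MAX_SIZE_BYTES = {
    "application/pdf": 500 * MB,
    "image/jpeg": 5 * MB,
    "image/png": 5 * MB,
}
MAX_PDF_PAGES = 3000


def check_limits(content_type: str, size_bytes: int,
                 page_count: Optional[int] = None) -> List[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    if content_type not in MAX_SIZE_BYTES:
        return [f"unsupported content type: {content_type}"]
    problems = []
    if size_bytes > MAX_SIZE_BYTES[content_type]:
        problems.append(f"file exceeds {MAX_SIZE_BYTES[content_type] // MB} MB")
    if (content_type == "application/pdf" and page_count is not None
            and page_count > MAX_PDF_PAGES):
        problems.append(f"PDF has more than {MAX_PDF_PAGES} pages")
    return problems


print(check_limits("application/pdf", 12 * MB, page_count=40))
print(check_limits("image/png", 6 * MB))
```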