ETL Process: Patient Timeline Table
FHIR Resource to Patient Timeline and Patient Info (SQL-on-FHIR)
Insert something here about sql on fhir flow
Patient Timeline Base Table
This documentation aims to provide the reader with context and information about the methods used to transform the patient timeline data to the tables used in the dashboard. For this transformation Data Build Tool (DBT) is used.
[Github Code] (https://github.com/PharmAccess/open-mnch-dbt-analytics)
DBT workflow input
The input for the DBT workflow is a folder named “base_reports”. This folder contains two subfolders called “patient timeline” and “patient info”. These folders contain parquet files with patient timeline and patient information extracted from FHIR resources.
Patient timeline → contains information about patient encounters.
Patient info → contains patient information extracted from FHIR resources.
These Parquet files are used as sources in this DBT project. Additional descriptions for sources can be found in the DBT docs.
Filter, enrich and combine patient data
The next step is to apply the necessary filters to the patient timeline, enrich the patient timeline and patient info data and lastly combine the two datasets. This results in the “patient_timeline_combined” dataset. The code for the models (tables) can be found at models/value_points/staging. In the following sections, we will discuss all models displayed in the image below.
Filtering
The timeline data can include multiple pregnancy journeys per patient. For now, the timeline is filtered to include only the most recent pregnancy journey for every patient. Consequently, the timeline is also filtered to include only the events at the enrollment clinic.
DESCRIBE HOW ENROLLMENT CLINIC IS DETERMINED
So for example: if a mother has had multiple pregnancy journeys and visited multiple clinics within those journeys, only the events from the most recent pregnancy journey performed at the enrollment clinic in that journey are retained in the data.
The patient info data is filtered to only include patients with ages between 10 and 50. Any patients in the data with an age outside that range are assumed to be registered incorrectly and thus removed.
Enriching
To the patient info data two columns are added containing more info about the type of mother (teenage or high risk).
Teenage: the mother is younger than 20 years old at enrollment.
High risk: the mother is younger than 20 years old at enrollment or older than 40 years old. Or when she has one or more conditions indicated by a list of ICD10 codes.
`list_icd10_codes = ('O99.0', 'O98.7', 'B54', 'O95', 'P96.8', 'O64', 'O75', 'O70', 'O72', 'O60', 'P95', 'O98.0', 'O15.0', 'O13', 'O04', 'O24', 'O03', 'O14', 'P21', 'P36', 'O85')
Four new columns are added to the patient_timeline_filtered data: a column containing the visit number, a column containing the visit type number, a column containing the gestation week wherein the event happened based on the expected delivery date (edd) and a column containing the gestation week wherein the event happened based on the actual delivery date (if present in the data).
Combining
Lastly, patient_timeline_enriched and patient_info_enriched are combined to create the definite patient timeline data patient_timeline_combined.