Importing CSV data into a FHIR server

Suppose you have a csv file of data, and you want to load that into a FHIR server. There are a number of ways that you could do that – depending on the nature of the csv file and the capabilities of the FHIR server that you want to update.

Let’s work through a simple scenario.

Here’s a part of a simple csv file (maybe an extract from a database holding patient problems):

david,jones,ert5543,12/03/1976,m,asthma,195917001
david,jones,ert5543,12/03/1976,m,diabetes,7231109
wendy,smith,uiy6654,17/03/1977,f,hypertension,38341003
wendy,smith,uiy6654,17/03/1977,f,angina,194818000

It has the following fields:

  1. First name
  2. Last name
  3. Patient Identifier
  4. Date of birth (day / month / year)
  5. Gender
  6. Problem name
  7. SNOMED code

So, this is going to be a combination of Patient and Condition (with a reference from Condition back to the Patient).

The actual design for getting the data into the FHIR server, is going to depend on what the capabilities of the server are. If this is a very simple server, which supports the simple CRUD interactions only, then here’s what we’d need to do:

Parse the file, processing each line in turn.

Query the server to see if it has a patient that matches the one in the file. This would most likely be a query on identifier (field 3) – something like:

GET [host]/Patient?identifier={}.

If there’s a single patient then we’re good to go – we’ll be able to create the Condition then POST it to the server (It’ll have a reference to the patient we just retrieved). If there are no matching patients, then we create a Patient resource from fields 1 -> 5 and save it to the server. The server will return a patient id (in the location header) that we can use for the Condition to Patient reference. If there’s more than one matching patient, then we’ll either have to select one of them, or flag this as an error.

Construct a Condition resource from fields 6 & 7. The Condition.subject will be a reference to the patient. We’re going to have to set the Condition.clinicalStatus as this is required unless the verificationStatus is ‘entered-in-error’ (This is what’s called a constraint – see ‘con-3’ in the spec.)

This will work, but each line will require at least 2 HTTP calls – possibly 3, and there’s no easy way that we can treat the import as if it were a transaction – completely succeeds or fails. Of course we could do something like this:

Parse the file, checking for duplicate patients on the server (based on the identifier). If there are any duplicates then flag an error and stop. Update the file with the patient id’s that do exist.

Parse the file again, POSTing all the patients that are new to the server. Update the file with the patient ids returned by the server.

Parse the file again, creating and POSTing the Conditions.

But there are ways that this could be way more efficient – using transactions and conditional update.

In FHIR a transaction is implemented as a Bundle resource that contains a list of resources, and the actions to perform on those them (actually you can do a GET as well). It’s POSTed to the server root, and all actions must succeed, or the transaction is rolled back. There’s a variant of this called a Batch that will allow some to succeed, and others to fail. In either case, the server returns a Bundle with the results of the operation.

Conditional Update is a neat feature that allows a resource to be created or updated, depending on the outcome of a search query. Take the following operation:

PUT [host]/Patient?identifier=abc123

If a server supports conditional update then it will first perform the search on identifier. If there are no matches then a resource is created. If there is a single resource then it is updated. If there is more than one match, then an error is raised. Sound familiar?

So – with a server that supports these 2 operations, we can create a bundle that we can POST to the server, and it will do everything for us (We’ll use simple JavaScript with minimal validation to do this – there are other ways of course, depending on the environment you’re using).

Here’s what the flow would look like in Node-RED:nrIt’s a very simple flow, with most of the work being done in the ‘make bundle’ function. The CSV node simply converts the csv file to an array of json objects, with each column being a property in the object. Each column in the file is given a name of ‘col{index}’ – col1, col2 and so on. The array is then set to the msg.payload property for the next node to process.

This is what the make bundle code look like:

js.png

Lines 1 – 3 set up the bundle that will be the output

Line 4,  patientHash – is a hash keyed on the patient identifier. When we need to create a new patient, we add this to the hash as well as the bundle, so it only gets included in the bundle once.

Lines 9 – 22 are where a new Patient is created if this is the first time we’ve come across it in the input csv file array. It’s pretty straight forward – the most complicated bit being how to set the id of the Patient. In a transaction, if you don’t have a resource id (and we don’t here) then the solution is to set the Bundle.entry.fullUrl to a UUID. Then, other resources (like our condition) can use that as the resource reference. The server will replace the UUID with its own ID, and adjust all the other resource references accordingly. So line 10 creates the UUID (using a small function) and line 17 sets the fullUrl property.

Lines 23 -34 create a Condition resource (setting the subject property to the fullUrl) and adds it to the bundle.

There are a couple of helper functions makeUUid() and makeGender() that I haven’t listed here.

And that’s about it. In this flow we simply return the transaction bundle, but there’s nothing stopping us from calling the server directly. Note that this code is highly dependent on the structure of the input CSV – for example we assumed that:

  • There were a specific set of fields in the CSV file, in a given order
  • That there was a unique patient identifier available (we should really have put a system in there actually)
  • Specific format for Date of Birth
  • That the condition code was SNOMED

I’m not sure how that can be avoided. Certainly, we could create a configurable mapper function but that does seem like quite a lot of work (maybe justified if we’re dealing with a lot of CSV files in differing formats). We could consider the use of a FHIR library – like FHIR.js rather than plain javascript, which might be advisable for more complex use cases. Certainly, the use of the $validate operation is invaluable when debugging.

It’s also worth mentioning the bulk data project – with a goal to standardize the bulk export (and import) of data. I don’t believe that the intention is to support CSV – but does raise the possibility of transforming the CSV into an ndjson file and having the server import that. Worth keeping an eye on…

One thing we haven’t considered, is how to avoid creating the Condition resources multiple times if we process the transaction multiple times. Maybe a topic for a follow up post.

We might also need to consider inactive or merged patients – maybe including an ‘active=true’ in our search queries would be a good idea…

 

About David Hay
I'm an independent contractor working with companies like Orion Health and Rhapsody, HL7 Fellow, Chair emeritus of HL7 New Zealand and a co-chair of the FHIR Management Group. I have a keen interest in health IT, especially health interoperability with HL7 and the new FHIR standard.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: