Chapter 5 Harmonisation with synthetic data
In this chapter we describe how to use synthetic data on the client side to develop harmonisation code. This assumes that you have already generated a synthetic data set using one of the methods described in the previous chapters.
Recall that the aim is to use synthetic data on the client side to design harmonisation algorithms, and then implement these in Opal on the server side, where they run on the real data. This means the user never needs full access to the real data. Harmonisation algorithms are implemented in Opal using MagmaScript (JavaScript with some additional functions). The rationale is that writing JavaScript on the client side, with full access to the synthetic data, is easier than writing it on the server side with access only to summaries.
The steps for harmonisation following generation of synthetic data are:
- The user requests a synthetic copy of the real data
- The synthetic data are generated and made available on the client side
- The synthetic data are loaded into JavaScript (JS), and the user writes harmonisation code (MagmaScript) on the client side
- When complete, the MagmaScript code is implemented on the server side and run on the real data to generate a new, harmonised data set
5.1 Getting set up
First we start a JavaScript session and load the additional MagmaScript functionality that is found in Opal. We also load our synthetic data into the JavaScript session.
We then enter the interactive V8 JavaScript console.
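The loop in section 5.3 indexes `synth_data` by row position (`synth_data[j]`) and uses `synth_data.length` as the number of rows, so the synthetic data behave like an array with one entry per row. A minimal sketch of that shape, assuming each row is a plain object keyed by variable name (the exact row format expected by MagmaScript may differ) and using invented values for the `y3age` variable that appears later in this chapter:

```javascript
// Hypothetical example of the synthetic data once loaded into the
// JavaScript session: an array with one object per row, keyed by
// variable name. Values are invented for illustration only.
const synth_data = [
  { id: 1, y3age: 23 },
  { id: 2, y3age: 31 },
  { id: 3, y3age: null }, // a missing value
];

// Number of rows, as used in the loop bound later on
console.log(synth_data.length); // 3

// Access a single variable in a single row
console.log(synth_data[1].y3age); // 31
```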
5.2 Experiment with a single row
We first use a MagmaScript function to select the first row of data. We can then write some JavaScript that operates on that single row and display the result.
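Inside Opal, `$('variable')` is provided by MagmaScript. To experiment with one row outside Opal, something has to stand in for it. The sketch below is a hypothetical stand-in (`makeDollar` is an invented name, not part of MagmaScript) that supports only `.value()`, bound to the first row of some invented data, and then applies the age rule used later in this chapter:

```javascript
// Invented example rows; not real or generated data.
const synth_data = [
  { id: 1, y3age: 23 },
  { id: 2, y3age: 31 },
];

// Hypothetical stand-in for MagmaScript's $ accessor: returns an object
// whose value() method reads the named variable from the given row.
// Only value() is mimicked; real MagmaScript values offer more methods.
function makeDollar(row) {
  return function (name) {
    return { value: () => (name in row ? row[name] : null) };
  };
}

// Bind $ to the first row and try the harmonisation rule on it
const $ = makeDollar(synth_data[0]);
const firstResult = $('y3age').value() > 25 ? 1 : 0;
console.log(firstResult); // 0, since 23 is not greater than 25
```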
5.3 Test on whole dataset
Now we test our code against the whole dataset. This is done by:
- Defining the script as a string assigned to a variable
- Executing this script in a loop over each row of data
- Capturing the output each time
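// Harmonisation script stored as a string (template literal)
var myScript = `
if ($('y3age').value() > 25) {
  out = 1
} else {
  out = 0
}
`;
// Declare the variable assigned by the script, and an array for results
var out = null;
var my_out = [];
// Evaluate the script against each row of the synthetic data
for (var j = 0; j < synth_data.length; j++) {
  my_out.push(MagmaScript.evaluator(myScript, synth_data[j]));
}
// Leave the V8 console and return to R
exit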
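To run the same loop outside the V8 session (for example under Node.js), `MagmaScript.evaluator` needs a substitute. The sketch below is a hypothetical stand-in named `evaluator`; it is not the library's implementation. It assumes, as the script above suggests, that the result of a script is the value of the last statement evaluated, which is what JavaScript's direct `eval` returns for the `if`/`else` block (the value of `out = 1` or `out = 0`). The data values are invented.

```javascript
// Hypothetical stand-in for MagmaScript.evaluator(script, row): runs the
// script with $ bound to one row and returns the completion value of the
// last statement evaluated. A sketch for testing only.
function evaluator(script, row) {
  const $ = (name) => ({ value: () => (name in row ? row[name] : null) });
  let out = null; // the script assigns to out
  return eval(script); // direct eval can see $ and out in this scope
}

const myScript = `
if ($('y3age').value() > 25) {
  out = 1
} else {
  out = 0
}
`;

// Invented rows standing in for the synthetic data
const synth_data = [{ y3age: 23 }, { y3age: 31 }, { y3age: 40 }];

// Same loop as above: evaluate the script on each row, keep each result
const my_out = [];
for (let j = 0; j < synth_data.length; j++) {
  my_out.push(evaluator(myScript, synth_data[j]));
}
console.log(my_out); // [ 0, 1, 1 ]
```

Direct `eval` keeps the stand-in short and is acceptable here only because the script being evaluated is our own code.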
We then pull the results back into R for inspection.
5.4 Run the code on the real data
If we are happy with the code, we can paste the script (the contents of myScript) directly into the Opal script interface so that it is executed on the real data.
This generates a harmonised variable in the Opal view, which can then be used in analyses. The summary statistics of the harmonised variable should be checked to make sure the harmonisation is working as intended.
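One concrete check of this kind is to compare missingness before and after harmonisation. With the rule `value() > 25`, a missing age that reaches the script as `null` is coded 0, because `null > 25` is `false` in JavaScript. A minimal sketch, using invented values and a hypothetical `tabulate` helper, that counts values (including missing ones) and flags a change in the number of missing values:

```javascript
// Hypothetical summary check: tabulate values (treating null/undefined as
// "missing") so harmonised output can be compared with its input.
function tabulate(values) {
  const counts = new Map();
  for (const v of values) {
    const key = v === null || v === undefined ? "missing" : String(v);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Invented ages, and the codes the age > 25 rule assigns to them.
// null > 25 is false in JavaScript, so the missing age becomes 0.
const ages = [23, 31, null, 40];
const harmonised = ages.map((age) => (age > 25 ? 1 : 0));

const inCounts = tabulate(ages);
const outCounts = tabulate(harmonised);
console.log(outCounts); // Map(2) { '0' => 2, '1' => 2 }

// Flag any change in the number of missing values during harmonisation
const missingIn = inCounts.get("missing") || 0;
const missingOut = outCounts.get("missing") || 0;
if (missingIn !== missingOut) {
  console.log(`Missing values changed from ${missingIn} to ${missingOut}`);
}
```

In practice the equivalent counts for the real data would come from the summaries Opal reports for the view; here they are computed locally on invented values only to illustrate the comparison.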
A similar process could be followed on other platforms, such as MOLGENIS.