Chapter 1 Preliminaries

1.1 Synopsis

This book is a tutorial for the dsSynthetic and dsSyntheticClient DataSHIELD R packages. In DataSHIELD you cannot, by default, have full access to the data, which makes it hard to write, debug and check the logic of code that uses that data. dsSynthetic is designed to give you a realistic, non-disclosive copy of the real data against which you can perfect your code. Once your code works correctly, you can execute it against the real data held on a remote server.

1.2 Structure

The next chapter introduces the reasons for generating synthetic data via DataSHIELD and how such data can be used. The chapter after that explains how to generate a synthetic data set that can be used on the client side. This is followed by a use case that walks step by step through writing analysis code with the help of synthetic data. A similar example then shows how synthetic data can help with data harmonisation. The final chapter shows an alternative method for generating synthetic data via DataSHIELD.