Tutorial 1. Generate a fake data file

Aims of tutorial

This tutorial will take you through installing Headfake, creating a Headfake configuration file and using it to generate a data file.

Install Headfake

You will need Python 3.6 or later installed in order to use Headfake. The easiest way to install Headfake is with pip. In the terminal, enter:

pip install headfake

Check it is installed correctly by running:

headfake --help

You should get some information on the usage of the tool.

Creating a template file

Headfake is driven by a plain-text template file in YAML format. The easiest way to demonstrate its functionality is to try out an example.

Create a file named tutorial1.yml in your home directory, copy and paste the template below into it, and save it:

Warning

Indent the YAML file with spaces only. YAML does not allow tab characters for indentation, so a file that mixes tabs in is likely to make Headfake throw errors. It is also important to keep the indentation depth consistent throughout the file.

fieldset:
  class: headfake.Fieldset
  fields:
    - name: main_pat_id
      class: headfake.field.IdField
      prefix: S
      generator:
        class: headfake.field.IncrementIdGenerator
        length: 7
        min_value: 1000000

    - name: gender
      class: headfake.field.GenderField
      male_value: "M"
      female_value: "F"
      male_probability: 0.3

    - name: last_name
      class: headfake.field.LastNameField
      gender_field: gender

    - name: dob
      class: headfake.field.DateOfBirthField
      min: 0
      max: 105
      mean: 45
      sd: 13
      distribution: scipy.stats.norm
      date_format: "%Y-%m-%d"
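
Because YAML does not permit tab characters in indentation, it can save time to scan the template for stray tabs before running Headfake. The following is a minimal sketch using only the Python standard library. The function names find_tab_indented_lines and check_template are hypothetical helpers written for this tutorial and are not part of Headfake.

```python
from pathlib import Path


def find_tab_indented_lines(text):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Isolate the run of spaces/tabs at the start of the line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


def check_template(path):
    """Read a template file and report which lines are indented with tabs."""
    return find_tab_indented_lines(Path(path).read_text(encoding="utf-8"))
```

Calling check_template on the path where you saved the template should return an empty list if the file is indented with spaces only. Any numbers it returns are the lines to fix.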

Generating fake data

Now, on the command line, run:

headfake /path/to/tutorial1.yml -o /path/to/tutorial1.txt -n100

Here you are specifying the template to use, followed by an output file (/path/to/tutorial1.txt) and the number of rows to generate (100).

If you now open the output file you will see 100 rows of generated data.
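
To confirm that the row count matches the -n value without opening the file by hand, a short script can count the lines. This is a sketch only. It assumes the output is plain text with one record per line, and the has_header flag (an assumption, on by default) covers the case where the first line holds the field names. Both function names are hypothetical.

```python
def count_data_rows(lines, has_header=True):
    """Count non-empty lines, optionally skipping one header line."""
    non_empty = [line for line in lines if line.strip()]
    if has_header and non_empty:
        return len(non_empty) - 1
    return len(non_empty)


def count_rows_in_file(path, has_header=True):
    """Count the data rows in a generated output file."""
    with open(path, encoding="utf-8") as handle:
        return count_data_rows(handle, has_header=has_header)
```

For the command above, count_rows_in_file("/path/to/tutorial1.txt") would be expected to return 100 if the file starts with a header line. Pass has_header=False if it does not.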

You can also run without the -o option and the data will be output to the screen:

headfake /path/to/tutorial1.yml -n100

This latter approach is often better when building templates in Headfake, as it is more immediate.

Analysis

This is a simple example of what Headfake can do. You can also provide your fields as a dictionary in YAML, or define the template directly in Python as data structures and/or code. Additional information is available on the different ways of initialising Headfake.

The template you used defines a set of fields containing autogenerated IDs, gender, a last name and a date of birth following a normal distribution.
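
To make the dob settings more concrete, the sketch below shows one way a date of birth could be derived from a normally distributed age. It is an illustration of the idea, not Headfake's own implementation. It assumes that min, max, mean and sd are ages in years, and that out-of-range draws are simply redrawn; how Headfake handles the bounds is not stated here. The names sample_age and sample_dob are hypothetical.

```python
import random
from datetime import date, timedelta


def sample_age(mean=45.0, sd=13.0, min_age=0.0, max_age=105.0, rng=random):
    """Draw an age in years from a normal distribution, redrawing out-of-range values."""
    while True:
        age = rng.gauss(mean, sd)
        if min_age <= age <= max_age:
            return age


def sample_dob(today, rng=random, date_format="%Y-%m-%d"):
    """Convert a sampled age into a formatted date of birth relative to `today`."""
    age_in_days = round(sample_age(rng=rng) * 365.25)
    return (today - timedelta(days=age_in_days)).strftime(date_format)


if __name__ == "__main__":
    # Print a few example dates of birth; the values differ on every run.
    for _ in range(5):
        print(sample_dob(date.today()))
```

With a mean of 45 and a standard deviation of 13, most sampled ages fall between roughly 19 and 71 (two standard deviations either side of the mean), so most generated dates of birth cluster in that range of years before today.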

You can see from this tutorial how straightforward it would be to add further fields to the template and to generate more or fewer rows of data.

In the next tutorial we will take a closer look at the setup of the fields in the YAML file and modify them to create different data and formats.