Tutorial 2. Changing Headfake template files
Aims of tutorial
This tutorial takes a closer look at the YAML template: how changing the YAML file modifies the configuration, how to use some more advanced fields, and how transformers can help post-process data.
Adjust the proportion of male individuals
Open tutorial1.yml, which you created in Tutorial 1, in a text editor.
You will see that it consists of a hierarchy of Python classes, with the fieldset as the base element, which contains one or more fields. The properties of the fields are the arguments passed when the class is initialised, so by changing these values in the YAML file we change how the fields behave.
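The mapping from YAML properties to constructor arguments can be illustrated with a minimal sketch. This is not Headfake's actual loader: GenderFieldSketch, build_field and the registry dictionary are hypothetical names used only for illustration, and a real loader would presumably resolve the dotted class path by importing it rather than looking it up in a dictionary.

```python
class GenderFieldSketch:
    """Hypothetical stand-in for a field class (not Headfake's real code)."""

    def __init__(self, name, male_value="M", female_value="F", male_probability=0.5):
        self.name = name
        self.male_value = male_value
        self.female_value = female_value
        self.male_probability = male_probability


def build_field(config, registry):
    """Create the class named under 'class', passing every other key as a keyword argument."""
    params = dict(config)              # copy so the caller's config is left untouched
    class_path = params.pop("class")   # the 'class' entry selects which class to create
    field_class = registry[class_path]
    return field_class(**params)


# The same keys as the YAML entry, expressed as a Python dictionary.
config = {
    "name": "gender",
    "class": "example.GenderFieldSketch",
    "male_value": "M",
    "female_value": "F",
    "male_probability": 0.5,
}

field = build_field(config, {"example.GenderFieldSketch": GenderFieldSketch})
print(field.name, field.male_probability)
```

Changing a value in the configuration changes the argument the class receives, which is why editing the YAML file is enough to change how a field behaves.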
In the 'gender' field, change the default male_probability from 0.3 to 0.5 and save the file as tutorial2.yml, e.g.:
- name: gender
  class: headfake.field.GenderField
  male_value: "M"
  female_value: "F"
  male_probability: 0.5
Then re-run Headfake:
headfake /path/to/tutorial2.yml -o /path/to/tutorial2.txt -n100
If you open tutorial2.txt and compare it with tutorial1.txt, you should see that the proportion of males in tutorial1.txt is about a third (male_probability of 0.3), whereas in tutorial2.txt it is about half (male_probability of 0.5).
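To check the proportions without counting by hand, a short script can compute the share of male values in an output file. The sketch below assumes the output is tab-delimited with a header row and a column called gender; adjust the delimiter and column name if your files differ. The function name male_fraction is made up for this example.

```python
import csv
import io


def male_fraction(lines, column="gender", male_value="M", delimiter="\t"):
    """Return the share of rows whose `column` equals `male_value`."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    values = [row[column] for row in reader]
    if not values:
        return 0.0
    return sum(1 for value in values if value == male_value) / len(values)


# Small in-memory sample standing in for a generated file.
sample = io.StringIO("gender\tlast_name\nM\tSmith\nF\tJones\nF\tPatel\nM\tKhan\n")
print(male_fraction(sample))

# With real output files (paths are placeholders):
# with open("/path/to/tutorial2.txt", newline="") as handle:
#     print(male_fraction(handle))
```

With only 100 rows the observed proportion will vary from run to run, so expect values near 0.3 and 0.5 rather than exact matches.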
Add a first name field
You can also add additional fields by adding them into the 'fields' section of the YAML file.
You can try this by copying the following, pasting it into tutorial2.yml and saving:
- name: first_name
  class: headfake.field.FirstNameField
  gender_field: gender
Then re-run Headfake:
headfake /path/to/tutorial2.yml
The output should show that the new first_name field is generated according to the specified gender_field.
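The idea of a dependent field can be sketched in a few lines: the first name is chosen from a list selected by the value already generated for the gender field. This is only an illustration of the dependency. The name lists and the function first_name_for are invented here and do not reflect how Headfake stores or samples names.

```python
import random

# Hypothetical name lists keyed by the gender field's value.
NAMES_BY_GENDER = {
    "M": ["James", "Oliver", "Thomas"],
    "F": ["Emily", "Sophie", "Grace"],
}


def first_name_for(row, gender_field="gender", rng=random):
    """Pick a first name from the list matching the row's gender value."""
    return rng.choice(NAMES_BY_GENDER[row[gender_field]])


row = {"gender": "F"}
row["first_name"] = first_name_for(row, rng=random.Random(0))
print(row)
```

Because the first name reads the gender value from the same row, the gender field has to be generated before the field that depends on it.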
Generating patients who are deceased
Next you will add a field that is particularly useful for health data: a flag for deceased status.
Open tutorial2.yml and add a new field to the fieldset config as below:
- name: deceased
  class: headfake.field.DeceasedField
  dob_field: dob
  deceased_date_field: date_of_death
  age_field: age
  date_format: "%Y-%m-%d"
  # 1 in X risks taken from here. Used Male values. http://www.bandolier.org.uk/booth/Risk/dyingage.html
  risk_of_death:
    0-1: 177
    1-4: 4386
    5-14: 8333
    15-24: 1908
    25-34: 1215
    35-44: 663
    45-54: 279
    55-64: 112
    65-74: 42
    75-84: 15
    85-120: 6
Then re-run Headfake:
headfake /path/to/tutorial2.yml
The output will show some individuals are now flagged as deceased, along with additional fields containing the date of death, and the age at which they died (these are optional and can be omitted from the field). The likelihood of death is calculated according to the risk of death supplied for the particular age ranges. Internally, patient aging is simulated and the likelihood of death determined accordingly.
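One way to picture the aging simulation is sketched below. It is a simplified illustration, not Headfake's implementation: it assumes each "1 in X" value is an annual risk, steps through a person's life one year at a time, and the function names annual_risk and simulate_age_at_death are made up for this example.

```python
import random

# Same bands as the YAML above: "low-high" age range mapped to a 1-in-X risk.
RISK_OF_DEATH = {
    "0-1": 177, "1-4": 4386, "5-14": 8333, "15-24": 1908,
    "25-34": 1215, "35-44": 663, "45-54": 279, "55-64": 112,
    "65-74": 42, "75-84": 15, "85-120": 6,
}


def annual_risk(age, risk_table):
    """Return the probability of death for `age`, using the first band that contains it."""
    for band, one_in_x in risk_table.items():
        low, high = (int(part) for part in band.split("-"))
        if low <= age <= high:
            return 1.0 / one_in_x
    raise ValueError(f"no risk band covers age {age}")


def simulate_age_at_death(current_age, risk_table, rng):
    """Age year by year up to `current_age`; return the age at death, or None if still alive."""
    for age in range(current_age + 1):
        if rng.random() < annual_risk(age, risk_table):
            return age
    return None


rng = random.Random(2024)
outcomes = [simulate_age_at_death(80, RISK_OF_DEATH, rng) for _ in range(1000)]
deceased = [age for age in outcomes if age is not None]
print(f"{len(deceased)} of {len(outcomes)} simulated 80-year-olds are flagged as deceased")
```

In this sketch the bands "0-1" and "1-4" both contain age 1, and the first matching band is used. Older individuals accumulate more years of risk, so they are more likely to be flagged as deceased.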
Changing generated data using a transformer
Transformers are special classes which transform the data after it is generated. Here we are going to use two different ones to do two things: i) make last name uppercase and ii) create blank last name entries in our data.
Change the last_name field in tutorial2.yml to add both transformers:
- name: last_name
  class: headfake.field.LastNameField
  gender_field: gender
  transformers:
    - class: headfake.transformer.IntermittentBlanks
      blank_probability: 0.2
      blank_value: NULL
    - class: headfake.transformer.UpperCase
And re-run the generation:
headfake /path/to/tutorial2.yml -o /path/to/tutorial2d.txt -n100
As expected, ~20% of the values will now be blank and those which are not will now be uppercase. You can use any value in place of NULL as the blank value (e.g. NA).
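The effect of chaining the two transformers can be sketched as follows. The classes here are simplified stand-ins written for this example (IntermittentBlanksSketch, UpperCaseSketch and apply_transformers are hypothetical names), not the Headfake classes themselves. The sketch uses NA as the blank value.

```python
import random


class IntermittentBlanksSketch:
    """Replace a value with `blank_value` with probability `blank_probability`."""

    def __init__(self, blank_probability, blank_value, rng):
        self.blank_probability = blank_probability
        self.blank_value = blank_value
        self.rng = rng

    def transform(self, value):
        if self.rng.random() < self.blank_probability:
            return self.blank_value
        return value


class UpperCaseSketch:
    """Upper-case string values; leave anything else unchanged."""

    def transform(self, value):
        return value.upper() if isinstance(value, str) else value


def apply_transformers(value, transformers):
    """Pass the value through each transformer in the order listed."""
    for transformer in transformers:
        value = transformer.transform(value)
    return value


chain = [IntermittentBlanksSketch(0.2, "NA", random.Random(42)), UpperCaseSketch()]
names = [apply_transformers("Smith", chain) for _ in range(1000)]
blank_share = names.count("NA") / len(names)
print(f"share of blanked values: {blank_share:.2f}")
```

The transformers run in the order they are listed, so in this sketch the blanking step runs first and the upper-casing step then applies to whatever value it receives.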
Analysis
In this tutorial we adjusted the field parameters in the YAML file to change the data generated. We also added in-built dependent fields to generate gender-appropriate first names and a risk-based deceased status.
We added to this by showing how [transformers](../../api/transformer) can be used to pre- and post-process the generated field values.
In the final tutorial we will take a look at how conditional fields can be used to create a chain of fields dependent on each other.