Combine field values
Operation fields generate data by combining the values of two fields using a specified function. For example, you could use an operation field to add, subtract or multiply generated values. When using operation fields, it is best to use field lists rather than field maps, as this ensures that fields are generated in the correct order.
In general, functions from Python's operator module are used, but you can specify any function (including your own) that accepts two input values and returns a single output value.
The examples below show how this can be used to generate a hospital discharge date based on the admission date and a generated length of stay.
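As a rough illustration of the kind of function an operation field expects, the sketch below calls two functions from the operator module and one hypothetical custom function, percentage_of, which is not part of Headfake. How a custom function is referenced from a field definition (presumably by its dotted path, as with operator.add) is an assumption here.

```python
import operator

# Built-in two-argument functions from the operator module.
total = operator.add(3, 4)        # same as 3 + 4
difference = operator.sub(10, 4)  # same as 10 - 4


# Hypothetical custom function: any callable that takes two values and
# returns one has the same shape as the operator functions above.
def percentage_of(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


ratio = percentage_of(1, 8)
print(total, difference, ratio)
```

The only requirement illustrated is the shape of the function: two inputs, one output.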
Example 1 - 'length_of_stay' set up as a separate field and included in the data set:
fieldset:
  class: headfake.Fieldset
  fields:
    - name: admission_date
      class: headfake.field.DateField
      min: "2020-01-01"
      max: "2020-12-31"
      distribution: scipy.stats.norm
      mean: "2020-06-01"
      sd: 30
      min_format: "%Y-%m-%d"
      max_format: "%Y-%m-%d"
      mean_format: "%Y-%m-%d"
    - name: length_of_stay
      class: headfake.field.NumberField
      min: 18
      max: 80
      distribution: scipy.stats.norm
      mean: 50
      sd: 15
      dp: 0
      transformers:
        - class: headfake.transformer.ConvertToDaysDelta
      final_transformers:
        - class: headfake.transformer.GetProperty
          prop_name: days
    - name: discharge_date
      class: headfake.field.OperationField
      operator: operator.add
      first_value:
        class: headfake.field.LookupField
        field: admission_date
      second_value:
        class: headfake.field.LookupField
        field: length_of_stay
      final_transformers:
        - class: headfake.transformer.FormatDateTime
          format: "%Y-%m-%d"
Example 2 - length of stay embedded within the 'discharge_date' field, so it is not available separately:
fieldset:
  class: headfake.Fieldset
  fields:
    - name: admission_date
      class: headfake.field.DateField
      min: "2020-01-01"
      max: "2020-12-31"
      distribution: scipy.stats.norm
      mean: "2020-06-01"
      sd: 30
      min_format: "%Y-%m-%d"
      max_format: "%Y-%m-%d"
      mean_format: "%Y-%m-%d"
    - name: discharge_date
      class: headfake.field.OperationField
      operator: operator.add
      first_value:
        class: headfake.field.LookupField
        field: admission_date
      second_value:
        class: headfake.field.NumberField
        min: 18
        max: 80
        distribution: scipy.stats.norm
        mean: 50
        sd: 15
        dp: 0
        transformers:
          - class: headfake.transformer.ConvertToDaysDelta
      final_transformers:
        - class: headfake.transformer.FormatDateTime
          format: "%Y-%m-%d"
In both examples, the 'admission_date' field is a straightforward date field: it generates dates which follow a normal distribution. The 'discharge_date' field is more complex. It is an OperationField which receives two values, the admission date and the length of stay. In Example 1 the length of stay is looked up from the separate 'length_of_stay' field, while in Example 2 it is generated within the OperationField itself. In both cases, the length of stay is a random number which follows a normal distribution.
The same effect (i.e. not including the length of stay field in the output) could also have been achieved in Example 1 by setting the hidden property to true on the 'length_of_stay' field.
The length of stay is converted into a Python timedelta object using a transformer and is then added to the 'admission_date' by the OperationField.
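The sequence described above can be approximated in plain Python. This is only a sketch of the idea and not Headfake's implementation: the helper sample_length_of_stay is hypothetical, and clamping the sampled value to the min/max range is an assumption about how out-of-range values are handled.

```python
import random
from datetime import date, timedelta

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def sample_length_of_stay(rng, mean=50, sd=15, low=18, high=80):
    """Draw a whole number of days from a normal distribution.

    Clamping to [low, high] is an assumption made for this sketch.
    """
    value = round(rng.gauss(mean, sd))
    return max(low, min(high, value))


admission_date = date(2020, 6, 1)       # stand-in for a generated admission date
stay_days = sample_length_of_stay(rng)  # numeric length of stay
stay_delta = timedelta(days=stay_days)  # number converted into a timedelta
discharge_date = admission_date + stay_delta  # the operator.add step

# Final formatting, mirroring the "%Y-%m-%d" format used in the examples.
print(stay_delta.days, discharge_date.strftime("%Y-%m-%d"))
```

Reading stay_delta.days recovers the plain number of days, which is the role the 'days' property plays in Example 1.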
The critical thing within the operation is that the types of the two values need to be compatible with the operation function. This is why it was necessary to convert the numeric length of stay into a timedelta object, as this can be added to a date object using operator.add (operator.add(value1, value2) is equivalent to value1 + value2).
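The type requirement can be checked directly in standard Python, independently of Headfake. The short sketch below uses an arbitrary admission date and a 50-day stay as illustrative values.

```python
import operator
from datetime import date, timedelta

admission = date(2020, 6, 1)

# Adding a plain integer to a date is not defined, so operator.add raises TypeError.
try:
    operator.add(admission, 50)
    int_add_failed = False
except TypeError:
    int_add_failed = True

# Converting the number to a timedelta first makes the addition valid.
discharge = operator.add(admission, timedelta(days=50))

print(int_add_failed)         # True
print(discharge.isoformat())  # 2020-07-21
```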
Headfake comes with a number of conversion transformers which change values into dates, numbers or strings, and it is straightforward to create custom transformers to do the same.