In many data sets there will be correlated variables. This means that the value of one variable can be predicted from another variable. Some examples include:
- Date of birth of a baby and date of discharge from a hospital.
- Date of death and date of an autopsy.
- Weight at birth and weight of baby at discharge from a hospital.
- Age and date of graduation.
In the context of de-identification, correlated variables must be dealt with explicitly. For example, suppose the correlated variables are date of birth and date of discharge from hospital. If we generalize one of them to, say, month and year but leave the other as the full date, then the de-identification is meaningless: the full date of birth can be predicted from the full date of discharge, even if the date of birth has been generalized to month/year or just year of birth.
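To make the risk concrete, here is a minimal Python sketch. It is not part of PARAT; the function names and the assumed maximum hospital stay of 3 days are hypothetical and chosen only for illustration. It counts how many birth dates remain plausible when the birth date is released as month/year but the discharge date is released in full.

```python
from datetime import date, timedelta


def generalize_to_month(d):
    """Generalize a full date to a 'YYYY-MM' string (month/year)."""
    return f"{d.year:04d}-{d.month:02d}"


def candidate_birth_dates(birth_month, discharge, max_stay_days):
    """Return the birth dates consistent with the released data.

    Assumption (hypothetical): a baby is discharged between 0 and
    max_stay_days days after birth.
    """
    candidates = []
    for stay in range(max_stay_days + 1):
        d = discharge - timedelta(days=stay)
        if generalize_to_month(d) == birth_month:
            candidates.append(d)
    return candidates


# Made-up example record.
birth = date(2010, 3, 14)
discharge = date(2010, 3, 16)

released_birth = generalize_to_month(birth)  # only month/year is released
narrowed = candidate_birth_dates(released_birth, discharge, max_stay_days=3)

# March has 31 possible days, but the full discharge date leaves far fewer.
print(len(narrowed), "candidate birth dates:", narrowed)
```

Under these assumptions, the full discharge date narrows the generalized birth date from a whole month to a handful of days, which is why the two variables need to be generalized together.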
In PARAT it is possible to specify such relationships and the tool will automatically ensure that the generalizations are the same. The video below illustrates how to do that.
Note that in PARAT only variables of the same type can be correlated, and they must also have the same depth in their generalization hierarchies.
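The idea of linking correlated variables can be sketched in a few lines of Python. This is an illustration only and does not reflect PARAT's actual interface or internals: the hierarchy representation, the function `generalize_correlated`, and the field names are all assumptions. The sketch applies one generalization level to every variable in a correlated group and rejects groups whose types or hierarchy depths differ.

```python
from datetime import date

# Hypothetical generalization hierarchy for dates:
# level 0 = full date, level 1 = month/year, level 2 = year.
DATE_HIERARCHY = [
    lambda d: d.isoformat(),
    lambda d: f"{d.year:04d}-{d.month:02d}",
    lambda d: f"{d.year:04d}",
]


def generalize_correlated(record, correlated_fields, level, hierarchies):
    """Apply the same generalization level to every correlated field."""
    # Correlated variables must be of the same type.
    types = {type(record[f]) for f in correlated_fields}
    if len(types) != 1:
        raise ValueError("correlated variables must have the same type")
    # Their generalization hierarchies must have the same depth.
    depths = {len(hierarchies[f]) for f in correlated_fields}
    if len(depths) != 1:
        raise ValueError("correlated variables must share the same hierarchy depth")

    released = dict(record)  # leave the input record unchanged
    for f in correlated_fields:
        released[f] = hierarchies[f][level](record[f])
    return released


# Made-up example record.
record = {
    "date_of_birth": date(2010, 3, 14),
    "date_of_discharge": date(2010, 3, 16),
    "sex": "F",
}
hierarchies = {
    "date_of_birth": DATE_HIERARCHY,
    "date_of_discharge": DATE_HIERARCHY,
}

released = generalize_correlated(
    record, ["date_of_birth", "date_of_discharge"], level=1, hierarchies=hierarchies
)
print(released)
```

Because both dates are generalized to the same level, neither one can be used to recover a finer-grained value of the other.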
The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.