
Specifying and de-identifying correlated variables in PARAT

Author: Khaled El Emam Views: 290 Created: 05-03-2010 19:00 Last Updated: 06-03-2010 14:00

In many data sets there will be correlated variables. This means that the value of one variable can be predicted from another variable. Some examples include:

  • Date of birth of a baby and date of discharge from a hospital.
  • Date of death and date of an autopsy.
  • Weight at birth and weight of baby at discharge from a hospital.
  • Age and date of graduation.


In the context of de-identification, correlated variables must be dealt with explicitly. For example, suppose the correlated variables are date of birth and date of discharge from hospital. If we generalize one to, say, month and year and leave the other as the full date, then the de-identification is meaningless: the full date of birth can be predicted from the full date of discharge even if the date of birth is generalized to month/year or just year of birth.
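The leak described above can be sketched in a few lines of Python. This is only an illustration, not part of PARAT: the function name `candidate_birth_dates` and the assumption that a newborn is discharged within `max_stay_days` of birth are hypothetical, chosen to show how an untouched discharge date narrows a month/year date of birth.

```python
from datetime import date, timedelta


def candidate_birth_dates(birth_year, birth_month, discharge, max_stay_days):
    """Return birth dates consistent with a month/year date of birth
    and a full discharge date, assuming (hypothetically) that the stay
    lasted at most max_stay_days."""
    candidates = []
    for offset in range(max_stay_days + 1):
        d = discharge - timedelta(days=offset)
        # Keep only days that fall inside the released month/year.
        if d.year == birth_year and d.month == birth_month:
            candidates.append(d)
    return sorted(candidates)


# Date of birth released as "March 2010", discharge date released in full.
narrowed = candidate_birth_dates(2010, 3, date(2010, 3, 5), max_stay_days=3)
print(len(narrowed), "candidate birth dates instead of 31")
```

Under these assumptions the month/year generalization leaves only a handful of possible birth dates, which is why both variables need to be generalized together.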

In PARAT it is possible to specify such relationships, and the tool will then automatically ensure that the same generalization is applied to all of the correlated variables. The video below illustrates how to do that.

One thing to note is that in PARAT only variables of the same type can be correlated, and they must also have the same depth in their generalization hierarchies.
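The two constraints can be illustrated with a minimal Python sketch. It does not reflect how PARAT is implemented; the three-level date hierarchy and the helper names `check_correlatable` and `generalize_linked` are assumptions made for this example only.

```python
from datetime import date

# Hypothetical date hierarchy: level 0 = full date, 1 = month/year, 2 = year.
DATE_HIERARCHY = [
    lambda d: d.isoformat(),
    lambda d: f"{d.year:04d}-{d.month:02d}",
    lambda d: f"{d.year:04d}",
]


def check_correlatable(type_a, hierarchy_a, type_b, hierarchy_b):
    """Reject pairs that differ in type or in hierarchy depth."""
    if type_a != type_b:
        raise ValueError("correlated variables must have the same type")
    if len(hierarchy_a) != len(hierarchy_b):
        raise ValueError("generalization hierarchies must have the same depth")


def generalize_linked(record, fields, hierarchy, level):
    """Apply the same hierarchy level to every linked field."""
    out = dict(record)
    for field in fields:
        out[field] = hierarchy[level](record[field])
    return out


record = {"dob": date(2010, 3, 2), "discharge": date(2010, 3, 5)}
check_correlatable("date", DATE_HIERARCHY, "date", DATE_HIERARCHY)
result = generalize_linked(record, ["dob", "discharge"], DATE_HIERARCHY, level=1)
print(result)
```

Requiring equal depth means that a single level number has a meaning in both hierarchies, so one choice of level can be applied to every linked variable.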





The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.

