Data anonymization

Anonymizing data removes identifiers altogether.

Anonymization is intended to remove all links to the individual, as well as links across datasets. However, as with all de-identification methods, it may still be possible to re-identify individuals through indirect identifiers, links to related datasets, or both.
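
To illustrate how a link to a related dataset can undo anonymization, the following minimal Python sketch matches "anonymized" records against a second dataset that still contains names, using only indirect identifiers (postal code, year of birth and gender). Both datasets are hypothetical and reuse the values from the example dataset below; the function name `link_records` and the choice of linking columns are assumptions made for illustration, not the method of any particular tool.

```python
# Hypothetical "anonymized" release: names and addresses removed, but
# postal code, year of birth and gender (indirect identifiers) kept.
released = [
    {"Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female", "Salary": 90000},
    {"Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male", "Salary": 65000},
]

# Hypothetical related dataset (for example, a directory) that still has names.
related = [
    {"Name": "Sally Xi", "Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female"},
    {"Name": "Sam Cooper", "Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male"},
]

# Columns present in both datasets that can be used to link them (an assumption).
LINK_KEYS = ("Postal code", "Year of birth", "Gender")


def link_records(released_rows, related_rows, keys=LINK_KEYS):
    """Return (name, released_row) pairs where exactly one related row matches."""
    # Index the related dataset by its combination of linking values.
    index = {}
    for row in related_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["Name"])

    matches = []
    for row in released_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row))
    return matches


for name, row in link_records(released, related):
    print(name, "->", row["Salary"])
# prints:
# Sally Xi -> 90000
# Sam Cooper -> 65000
```

A match is only counted when exactly one person in the related dataset shares the linking values; if several people share them, the record is not uniquely re-identified by this simple join.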

For example, the following shows a small section of a dataset containing identifiers:

Name | Address | Postal code | Year of birth | Gender | Occupation | Salary
Sally Xi | 123 City Roadway, Vancouver, BC | V5V 1P2 | 1970 | Female | Manager | 90,000
Sam Cooper | 4576 Town Way, Smalltown, BC | V8A 1A5 | 1982 | Male | Machinist | 65,000

An anonymized version of that dataset might look like this:

Postal code | Year of birth | Gender | Occupation | Salary
V5V 1P2 | 1970 | Female | Manager | 90,000
V8A 1A5 | 1982 | Male | Machinist | 65,000
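
The step from the first table to the second can be sketched in Python as dropping the direct-identifier columns (Name and Address) from each record while keeping everything else. This is a minimal sketch, assuming each row is stored as a dictionary; the column set `DIRECT_IDENTIFIERS` and the function name `drop_direct_identifiers` are hypothetical choices for this example.

```python
# Direct identifiers to remove; this column set is an assumption for the example.
DIRECT_IDENTIFIERS = {"Name", "Address"}

records = [
    {"Name": "Sally Xi", "Address": "123 City Roadway, Vancouver, BC",
     "Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female",
     "Occupation": "Manager", "Salary": 90000},
    {"Name": "Sam Cooper", "Address": "4576 Town Way, Smalltown, BC",
     "Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male",
     "Occupation": "Machinist", "Salary": 65000},
]


def drop_direct_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return new rows with the direct-identifier columns removed.

    The input rows are left unchanged.
    """
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


anonymized = drop_direct_identifiers(records)
for row in anonymized:
    print(row)
```

Note that the indirect identifiers (postal code, year of birth, gender, occupation) are deliberately kept, which is exactly why re-identification can remain possible.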

In some cases, this might be enough to ensure that the data is not re-identified. However, anonymized data can often be re-identified easily. For example, if there are not many machinists in the V8A 1A5 postal code, then the combination of postal code and occupation could single out Sam Cooper, so there is a strong risk that his record could be re-identified.
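
One common way to quantify this risk is to count how many records share each combination of indirect identifiers (often called quasi-identifiers); a record in a group of size 1 is unique in the dataset. Requiring every group to contain at least k records is the idea behind k-anonymity. The minimal Python sketch below assumes that postal code and occupation are the relevant quasi-identifiers and uses an illustrative default of k = 5; both are assumptions, not standard values, and the function names are hypothetical.

```python
from collections import Counter

# Quasi-identifiers chosen for this example; which columns count as
# quasi-identifiers depends on the dataset and is an assumption here.
QUASI_IDENTIFIERS = ("Postal code", "Occupation")


def group_sizes(rows, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)


def risky_groups(rows, k=5, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the combinations shared by fewer than k rows (they fail k-anonymity)."""
    return {combo: n for combo, n in group_sizes(rows, quasi_identifiers).items() if n < k}


anonymized = [
    {"Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female",
     "Occupation": "Manager", "Salary": 90000},
    {"Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male",
     "Occupation": "Machinist", "Salary": 65000},
]

print(risky_groups(anonymized, k=2))
# prints: {('V5V 1P2', 'Manager'): 1, ('V8A 1A5', 'Machinist'): 1}
```

This check only counts records within the released dataset. It cannot tell you how many machinists actually live in V8A 1A5, so it is a proxy for the population-level risk described above rather than a guarantee.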

Researchers are increasingly using algorithm-based tools to help anonymize their data and manage the risk of re-identifying their anonymized data. Examples of anonymization tools include:

Need help? Contact research.data@ubc.ca