Anonymization | Research Data Management

Anonymizing data removes identifiers altogether

Anonymizing data removes all links to the individual, as well as links across datasets. However, as with all de-identification methods, it may still be possible to re-identify individuals through indirect identifiers and/or links to related datasets.

For example, the following shows a small section of a dataset containing identifiers:

Name	Address	Postal code	Year of birth	Gender	Occupation	Salary
Sally Xi	123 City Roadway, Vancouver, BC	V5V 1P2	1970	Female	Manager	90,000
Sam Cooper	4576 Town Way, Smalltown, BC	V8A 1A5	1982	Male	Machinist	65,000

An anonymized version of that dataset might look like this:

Postal code	Year of birth	Gender	Occupation	Salary
V5V 1P2	1970	Female	Manager	90,000
V8A 1A5	1982	Male	Machinist	65,000
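
As a minimal sketch of the step shown above, the following Python snippet drops the direct-identifier columns from each record. The function name `drop_direct_identifiers` and the choice of `Name` and `Address` as the direct identifiers are assumptions made for illustration, based on the example table; a real dataset may contain other direct identifiers that also need to be removed.

```python
# Columns treated as direct identifiers. This set is an assumption based on
# the example table; adjust it for your own dataset.
DIRECT_IDENTIFIERS = {"Name", "Address"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the direct-identifier columns."""
    return [
        {column: value for column, value in record.items() if column not in identifiers}
        for record in records
    ]


records = [
    {
        "Name": "Sally Xi",
        "Address": "123 City Roadway, Vancouver, BC",
        "Postal code": "V5V 1P2",
        "Year of birth": 1970,
        "Gender": "Female",
        "Occupation": "Manager",
        "Salary": 90000,
    },
    {
        "Name": "Sam Cooper",
        "Address": "4576 Town Way, Smalltown, BC",
        "Postal code": "V8A 1A5",
        "Year of birth": 1982,
        "Gender": "Male",
        "Occupation": "Machinist",
        "Salary": 65000,
    },
]

anonymized = drop_direct_identifiers(records)
for row in anonymized:
    print(row)
```

The original records are left unchanged, so the identified and anonymized versions can be stored separately.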

In some cases, removing the direct identifiers (here, name and address) might be enough to ensure that the data is not re-identified. Often, however, anonymized data can still be re-identified easily. For example, if there are not many machinists in the V8A 1A5 postal code, then there is a strong risk that the second record could be traced back to Sam Cooper.
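
One rough way to spot this kind of risk is to count how many records share each combination of indirect identifiers and flag the combinations that occur fewer than some threshold `k` times. This is the idea behind a k-anonymity check, which is named here as an illustrative technique and is not something this guide prescribes. The function name `small_groups`, the choice of `Postal code` and `Occupation` as indirect identifiers, and the threshold `k=2` are all assumptions for the sketch.

```python
from collections import Counter

# Indirect identifiers to check in combination. This choice is an assumption
# for illustration; which columns count depends on the dataset.
QUASI_IDENTIFIERS = ("Postal code", "Occupation")


def small_groups(records, quasi_identifiers=QUASI_IDENTIFIERS, k=2):
    """Return the identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


anonymized = [
    {"Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female",
     "Occupation": "Manager", "Salary": 90000},
    {"Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male",
     "Occupation": "Machinist", "Salary": 65000},
]

# Each flagged combination identifies a group small enough to be risky.
risky = small_groups(anonymized)
for combo, n in risky.items():
    print(combo, "appears in", n, "record(s)")
```

In this two-row example every combination is unique, so both records are flagged. A check like this only considers the dataset itself; it does not account for outside information that someone could link to the data.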

Researchers are increasingly using algorithm-based tools to help anonymize their data and manage the risk that the anonymized data will be re-identified. Examples of anonymization tools include:

Need help? Contact research.data@ubc.ca