Anonymizing data removes identifiers altogether
Anonymizing data removes all links to the individual, as well as links across datasets. However, as with all de-identification methods, it may still be possible to re-identify individuals through indirect identifiers (such as postal code, year of birth, or occupation) and/or links to related datasets.
For example, the following is an excerpt from a dataset that contains direct identifiers (name and address):
| Name | Address | Postal code | Year of birth | Gender | Occupation | Salary |
|---|---|---|---|---|---|---|
| Sally Xi | 123 City Roadway, Vancouver, BC | V5V 1P2 | 1970 | Female | Manager | 90,000 |
| Sam Cooper | 4576 Town Way, Smalltown, BC | V8A 1A5 | 1982 | Male | Machinist | 65,000 |
An anonymized version of that dataset might look like this:
| Postal code | Year of birth | Gender | Occupation | Salary |
|---|---|---|---|---|
| V5V 1P2 | 1970 | Female | Manager | 90,000 |
| V8A 1A5 | 1982 | Male | Machinist | 65,000 |
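The step from the first table to the second amounts to dropping the direct-identifier columns. A minimal sketch of that step follows, assuming the records are held as Python dictionaries keyed by the column names shown in the tables. The names `DIRECT_IDENTIFIERS` and `drop_direct_identifiers` are hypothetical and used only for illustration.

```python
# Hypothetical set of direct-identifier columns, taken from the example table.
DIRECT_IDENTIFIERS = {"Name", "Address"}

# The two example records, stored as dictionaries (an illustrative assumption).
records = [
    {"Name": "Sally Xi", "Address": "123 City Roadway, Vancouver, BC",
     "Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female",
     "Occupation": "Manager", "Salary": 90000},
    {"Name": "Sam Cooper", "Address": "4576 Town Way, Smalltown, BC",
     "Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male",
     "Occupation": "Machinist", "Salary": 65000},
]


def drop_direct_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return new dictionaries with the direct-identifier columns removed.

    The input rows are left unchanged.
    """
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


anonymized = drop_direct_identifiers(records)
for row in anonymized:
    print(row)
```

The remaining columns (postal code, year of birth, gender, occupation) are still indirect identifiers. Removing names and addresses alone does not guarantee that a record cannot be traced back to a person.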
In some cases, this might be enough to prevent the data from being re-identified. Often, however, anonymized data can be re-identified easily. For example, if there are not many machinists in the V8A 1A5 postal code, then the record describing Sam Cooper carries a high risk of re-identification.
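One common way to quantify this risk is a k-anonymity check. This technique is swapped in here for illustration and is not a method prescribed by this guide. The check counts how many records share each combination of indirect identifiers and flags combinations that appear fewer than *k* times. The sketch below assumes that postal code, year of birth, gender, and occupation are the indirect identifiers and that *k* = 5. Both choices are illustrative assumptions, and the names `class_sizes` and `risky_rows` are hypothetical. It also counts only within the dataset itself, whereas the Sam Cooper example concerns how many machinists actually live in V8A 1A5. A within-dataset count is therefore only a proxy for that population-level risk.

```python
from collections import Counter

# Assumed indirect identifiers (quasi-identifiers); the selection is illustrative.
QUASI_IDENTIFIERS = ("Postal code", "Year of birth", "Gender", "Occupation")

# The anonymized example records, stored as dictionaries.
anonymized = [
    {"Postal code": "V5V 1P2", "Year of birth": 1970, "Gender": "Female",
     "Occupation": "Manager", "Salary": 90000},
    {"Postal code": "V8A 1A5", "Year of birth": 1982, "Gender": "Male",
     "Occupation": "Machinist", "Salary": 65000},
]


def class_sizes(rows, quasi_ids=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_ids) for row in rows)


def risky_rows(rows, k=5, quasi_ids=QUASI_IDENTIFIERS):
    """Return rows whose quasi-identifier combination occurs fewer than k times."""
    sizes = class_sizes(rows, quasi_ids)
    return [row for row in rows if sizes[tuple(row[q] for q in quasi_ids)] < k]


# The threshold k=5 is an assumption; appropriate values depend on context.
for row in risky_rows(anonymized, k=5):
    print("Rare combination:", row["Postal code"], row["Occupation"])
```

In this two-row excerpt every combination is unique (each class has size 1), so both records are flagged. Anonymization tools typically respond by generalizing or suppressing values until every class reaches the chosen *k*.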
Researchers increasingly use algorithm-based tools to help anonymize their data and to manage the risk that their anonymized data will be re-identified. Examples of anonymization tools include:
Need help? Contact research.data@ubc.ca