Pseudonymization is a method of de-identification that replaces direct identifiers, such as names, with pseudonyms: substitute identifiers generated by the researcher. Using pseudonyms allows researchers to link de-identified records about the same individual across multiple datasets while maintaining that individual's confidentiality.
This means that, unlike anonymized data, pseudonymized data can be linked across datasets. Linking across datasets can make data more useful, but it can also increase the risk of re-identification. Researchers can also choose to use different pseudonyms for different datasets, which may remove some analytical value but also decrease the risk of re-identification.
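The trade-off between linkable and dataset-specific pseudonyms can be sketched in code. The example below is a minimal illustration, not a recommended implementation: it assumes pseudonyms are derived with a keyed hash (HMAC-SHA256), which is only one of several possible techniques. The function name `pseudonymize`, the key values, and the 12-character length are all hypothetical choices made for this sketch.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for an identifier under a given secret key."""
    # Normalize so that "Sam Cooper" and " sam cooper " map to the same pseudonym.
    normalized = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    # Truncating the digest keeps pseudonyms short; a longer value lowers collision risk.
    return "P" + digest[:length].upper()


# Hypothetical keys for illustration only. Real keys would be randomly
# generated and stored securely, separate from the data.
SHARED_KEY = b"example-shared-key"
SURVEY_KEY = b"example-survey-key"
CLINIC_KEY = b"example-clinic-key"

# Same key in both datasets: the pseudonyms match, so records can be linked.
linked_a = pseudonymize("Sam Cooper", SHARED_KEY)
linked_b = pseudonymize("Sam Cooper", SHARED_KEY)

# A different key per dataset: the pseudonyms differ, so the two datasets
# cannot be joined on the pseudonym without access to the keys.
survey_id = pseudonymize("Sam Cooper", SURVEY_KEY)
clinic_id = pseudonymize("Sam Cooper", CLINIC_KEY)
```

Under these assumptions, anyone holding a key can regenerate the pseudonyms from the original identifiers, so the key needs the same protection as the identifiers themselves.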
It is important to distinguish between anonymization and pseudonymization. The example below shows the same five records under each approach.
| Name | Anonymized | Pseudonymized |
|---|---|---|
| Sally Xi | ANON | P12L25 |
| Sam Cooper | ANON | P38Q27 |
| Sunil Gupta | ANON | P59M16 |
| Sam Cooper | ANON | P38Q27 |
| Sally Xi | ANON | P12L25 |
When the data is anonymized, the link between the individual and the data is removed altogether. Users of the dataset can no longer tell whether multiple records come from the same person. When the data is pseudonymized, records from the same person share a pseudonym (for example, P38Q27 appears twice), so it remains clear which records came from the same person and which came from different people. However, it is important to remember that both the anonymized and the pseudonymized datasets still carry re-identification risks.
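The difference shown in the example can be reproduced with a short sketch. It assumes researcher-generated random pseudonyms held in a lookup table; the helper `new_pseudonym` and the pseudonym format are hypothetical and will not reproduce the exact values shown in the example.

```python
import secrets

names = ["Sally Xi", "Sam Cooper", "Sunil Gupta", "Sam Cooper", "Sally Xi"]

# Anonymization: every record gets the same placeholder, so nothing
# connects a record to a person or to another record.
anonymized = ["ANON" for _ in names]


def new_pseudonym(taken):
    """Draw a random pseudonym that is not already in use."""
    while True:
        candidate = "P" + secrets.token_hex(3).upper()
        if candidate not in taken:
            return candidate


# Pseudonymization: each distinct person gets one random pseudonym,
# which is reused on every record belonging to that person.
lookup = {}
for name in names:
    if name not in lookup:
        lookup[name] = new_pseudonym(set(lookup.values()))

pseudonymized = [lookup[name] for name in names]

# Distinct values per column: the anonymized column collapses to one value,
# while the pseudonymized column still distinguishes the three respondents.
print(len(set(anonymized)))
print(len(set(pseudonymized)))
```

In this sketch the `lookup` table is what connects pseudonyms back to names, so it would need to be stored separately from the pseudonymized data and protected accordingly.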
Need help? Contact research.data@ubc.ca