Pseudonymization replaces personal identifiers with codes to protect participant privacy while allowing data linkage.
What is pseudonymization:
Pseudonymization is a method of de-identification that replaces direct identifiers, such as names, with codes generated by the researcher, called pseudonyms. Because the same individual keeps the same pseudonym, researchers can link that individual's de-identified records across multiple datasets while keeping the individual's identity confidential.
This means that, unlike anonymized data, pseudonymized data can be linked across datasets. Linking across datasets can make data more useful, but it can also increase the risk of re-identification. Researchers can instead choose to assign different pseudonyms to the same individual in different datasets. This prevents linking, which may remove some analytical value but also decreases the risk of re-identification.
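To make this trade-off concrete, here is a minimal Python sketch. It assumes records are dictionaries with a `name` field, and the helpers `pseudonym` and `pseudonymize` are hypothetical. Instead of a lookup table of randomly generated codes, it swaps in keyed hashing (HMAC-SHA256), one common way to derive stable pseudonyms. Here the secret plays the role of the re-identification key: anyone holding it and a list of names can recompute the codes.

```python
import hashlib
import hmac
import secrets


def pseudonym(identifier: str, secret: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256.

    The same identifier and secret always produce the same code; without the
    secret, the code cannot be recomputed from a name.
    """
    digest = hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:12].upper()


def pseudonymize(records: list[dict], secret: bytes) -> list[dict]:
    """Drop each record's 'name' field (hypothetical schema) and add a 'pid' pseudonym."""
    return [
        {**{k: v for k, v in rec.items() if k != "name"}, "pid": pseudonym(rec["name"], secret)}
        for rec in records
    ]


survey = [{"name": "Sally Xi", "score": 7}, {"name": "Sam Cooper", "score": 4}]
followup = [{"name": "Sally Xi", "score": 9}]

# Option 1: one secret shared across datasets, so the same person can be linked.
shared = secrets.token_bytes(32)
survey_linked = pseudonymize(survey, shared)
followup_linked = pseudonymize(followup, shared)
print(survey_linked[0]["pid"] == followup_linked[0]["pid"])  # True

# Option 2: a separate secret per dataset, so the same person gets unrelated codes.
survey_separate = pseudonymize(survey, secrets.token_bytes(32))
followup_separate = pseudonymize(followup, secrets.token_bytes(32))
print(survey_separate[0]["pid"] == followup_separate[0]["pid"])  # almost certainly False
```

Reusing one secret gives the linkable behaviour; generating a fresh secret per dataset gives the non-linkable option with lower re-identification risk. In either case the secret should be stored apart from the data, and destroying it prevents anyone from regenerating the codes.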
Difference between anonymization and pseudonymization:
It is important to distinguish between anonymization and pseudonymization.
| Name | Anonymized | Pseudonymized |
|---|---|---|
| Sally Xi | ANON | P12L25 |
| Sam Cooper | ANON | P38Q27 |
| Sunil Gupta | ANON | P59M16 |
When the data is anonymized, the link between the individual and the data is removed altogether, so users of the dataset can no longer tell whether multiple records come from the same person. When the data is pseudonymized, it is clear whether the same person or different people responded. Because a key linking pseudonyms back to individuals may still exist, pseudonymized data carries a higher residual risk than anonymized data and requires careful management. Neither approach removes risk entirely: both anonymized and pseudonymized datasets can still be re-identified, for example by combining the remaining variables with other available information.
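A short Python sketch shows the practical difference. The survey responses below are hypothetical, and the pseudonyms are taken from the example table. After anonymization every record carries the same label, while pseudonymization still reveals which records belong to the same respondent.

```python
# Hypothetical survey responses: (respondent name, question, answer).
responses = [
    ("Sally Xi", "Q1", 5),
    ("Sam Cooper", "Q1", 3),
    ("Sally Xi", "Q2", 4),
]

# Pseudonyms from the example table; in practice this key is stored separately from the data.
key = {"Sally Xi": "P12L25", "Sam Cooper": "P38Q27", "Sunil Gupta": "P59M16"}

anonymized = [("ANON", question, answer) for _, question, answer in responses]
pseudonymized = [(key[name], question, answer) for name, question, answer in responses]

# Anonymized: every record has the same label, so respondents cannot be told apart.
anon_labels = {label for label, _, _ in anonymized}
# Pseudonymized: distinct codes show how many people responded and which records belong together.
pseudo_labels = {label for label, _, _ in pseudonymized}

print(len(anon_labels))    # 1
print(len(pseudo_labels))  # 2
```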
When to choose pseudonymization vs. anonymization:
| Feature | Pseudonymization | Anonymization |
|---|---|---|
| Identity Risk | May be re-identifiable with a key | Not directly identifiable |
| Linkage Capability | Can link across datasets | No linkage across records |
| Research Utility | Retains analysis value | May reduce utility |
| Privacy Risk | Higher residual risk | Lower risk if well-executed |
You can use pseudonymization when longitudinal or linked analysis is essential, but always ensure that re-identification keys are securely managed or destroyed when no longer needed.
Need help? Contact research.data@ubc.ca