Pseudonymization

Pseudonymization replaces personal identifiers with codes to protect participant privacy while allowing data linkage.

What is pseudonymization?

Pseudonymization is a method of de-identification that replaces direct identifiers, such as names, with pseudonyms: codes generated by the researcher. When the same pseudonym is used consistently, researchers can link de-identified data belonging to one individual across multiple datasets while keeping that individual's identity confidential.
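A minimal Python sketch of the idea, assuming a simple lookup-table approach: each new person receives a random code, and the name-to-code mapping (`key`) serves as the re-identification key. The code format, the helper names `make_pseudonym` and `pseudonymize`, and the example records are hypothetical, chosen to resemble the example codes on this page. Names are used as identifiers only for brevity; a real project would more likely key on a unique participant ID, since names are not unique.

```python
import secrets
import string


def make_pseudonym(used):
    """Return a random code such as 'P12L25' that is not already in `used`.

    Assumed (hypothetical) format: 'P', two digits, one uppercase letter,
    two digits. Loops until an unused code is found, so it assumes far
    fewer than 260,000 participants.
    """
    while True:
        code = "P{:02d}{}{:02d}".format(
            secrets.randbelow(100),
            secrets.choice(string.ascii_uppercase),
            secrets.randbelow(100),
        )
        if code not in used:
            return code


def pseudonymize(records, key):
    """Replace each record's 'name' field with a pseudonym stored under 'id'.

    `key` maps name -> pseudonym and is updated in place. Reusing the same
    `key` across datasets gives each person the same pseudonym everywhere.
    """
    used = set(key.values())
    result = []
    for record in records:
        name = record["name"]
        if name not in key:
            key[name] = make_pseudonym(used)
            used.add(key[name])
        # Copy every field except the direct identifier, then add the pseudonym.
        row = {field: value for field, value in record.items() if field != "name"}
        row["id"] = key[name]
        result.append(row)
    return result


# Usage: the same key is applied to two hypothetical survey waves.
key = {}
wave_1 = pseudonymize([{"name": "Sally Xi", "score": 7}, {"name": "Sam Cooper", "score": 4}], key)
wave_2 = pseudonymize([{"name": "Sally Xi", "score": 9}], key)
print(wave_1)
print(wave_2)
```

Sally Xi receives the same code in both waves, so her responses can be linked. Anyone holding `key` can reverse the pseudonyms, which is why it would need to be stored securely and separately from the pseudonymized data.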

This means that, unlike anonymized data, pseudonymized data can be linked across datasets. Linking datasets can make the data more useful, but it also increases the risk of re-identification, because the combined records reveal more about each person. Researchers can instead assign different pseudonyms in different datasets; this gives up some analytical value, since the records can no longer be linked, but lowers the re-identification risk.
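One way to give the same person different pseudonyms in different datasets, sketched here as an assumption rather than a prescribed method, is a keyed hash (HMAC-SHA256 from Python's standard library) with a separate random secret per dataset. Pseudonyms derived with the same secret match and can be linked; pseudonyms derived with different secrets do not. The helper `keyed_pseudonym` and the 12-character code length are hypothetical choices for illustration.

```python
import hashlib
import hmac
import secrets


def keyed_pseudonym(identifier: str, secret: bytes) -> str:
    """Derive a pseudonym from an identifier using HMAC-SHA256 and a secret key."""
    digest = hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep the first 12 hex characters (48 bits) as a shorter code.
    return "P" + digest[:12]


# One randomly generated secret per dataset; each secret acts as that dataset's
# re-identification key and must be stored away from the data.
secret_a = secrets.token_bytes(32)
secret_b = secrets.token_bytes(32)

code_a1 = keyed_pseudonym("Sally Xi", secret_a)
code_a2 = keyed_pseudonym("Sally Xi", secret_a)
code_b = keyed_pseudonym("Sally Xi", secret_b)

print(code_a1 == code_a2)  # True: same secret, so records can be linked
print(code_a1 == code_b)   # different secrets: codes differ, so records cannot be linked
```

The secret matters: a plain, unkeyed hash of a name would be weak, because anyone could hash a list of candidate names and match the results. Shorter codes also raise the chance that two people share a code, so the truncation length is a trade-off.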

Difference between anonymization and pseudonymization:

It is important to distinguish between anonymization and pseudonymization.

Name          Anonymized    Pseudonymized
Sally Xi      ANON          P12L25
Sam Cooper    ANON          P38Q27
Sunil Gupta   ANON          P59M16

When the data is anonymized, the link between the individual and the data is removed altogether, so users of the dataset can no longer tell whether multiple records come from the same person. When the data is pseudonymized, each person keeps the same code, so it is clear whether the same person or different people responded. Because that link is retained, and can be reversed by anyone holding the re-identification key, pseudonymized data carries more residual risk than anonymized data and requires careful management. Neither approach removes risk entirely, however: both anonymized and pseudonymized datasets can still be re-identified, for example by combining them with other data sources.
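The contrast can be shown with a toy example in Python. The responses and the name-to-code key are hypothetical, reusing codes from the table above, and anonymization is modelled simply as replacing every name with the same placeholder, as in that table.

```python
from collections import Counter

# Hypothetical survey responses; Sally Xi answered twice.
responses = [
    {"name": "Sally Xi", "answer": "yes"},
    {"name": "Sam Cooper", "answer": "no"},
    {"name": "Sally Xi", "answer": "no"},
]

# Hypothetical re-identification key, matching the example table.
key = {"Sally Xi": "P12L25", "Sam Cooper": "P38Q27"}

# Anonymized: every name becomes the same placeholder, so the link is gone.
anonymized = [{"id": "ANON", "answer": r["answer"]} for r in responses]

# Pseudonymized: each name becomes that person's code, so the link survives.
pseudonymized = [{"id": key[r["name"]], "answer": r["answer"]} for r in responses]


def records_per_id(rows):
    """Count how many records share each identifier."""
    return Counter(row["id"] for row in rows)


print(records_per_id(anonymized))     # Counter({'ANON': 3})
print(records_per_id(pseudonymized))  # Counter({'P12L25': 2, 'P38Q27': 1})
```

In the anonymized output all three records look alike; in the pseudonymized output it is clear that one person responded twice.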

When to choose pseudonymization vs. anonymization

Feature              Pseudonymization                    Anonymization
Identity Risk        May be re-identifiable with a key   No key; direct link removed
Linkage Capability   Can link across datasets            No linkage across records
Research Utility     Retains analytical value            May reduce utility
Privacy Risk         Higher residual risk                Lower risk if well executed

You can use pseudonymization when longitudinal or linked analysis is essential, but always ensure that re-identification keys are securely managed while in use and destroyed once they are no longer needed.

Need help? Contact research.data@ubc.ca