Pseudonymization | Research Data Management

Pseudonymization replaces personal identifiers with codes to protect participant privacy while allowing data linkage.

What is pseudonymization:

Pseudonymization is a method of de-identification that replaces identifiers, such as names, with pseudonyms: codes generated by the researcher. Because each individual keeps the same pseudonym, researchers can link de-identified data to the same individual across multiple datasets while keeping that individual's identity confidential.

This means that, unlike anonymized data, pseudonymized data can be linked across datasets. Linking across datasets can make data more useful, but it can also increase the risk of re-identification. Researchers can also choose to use different pseudonyms for different datasets, which may remove some analytical value but also decrease the risk of re-identification.
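As a minimal sketch of this trade-off, the Python below assigns random codes and records them in a key (a mapping from identifier to pseudonym). The function names, the `pid` field, the code format, and the sample scores are assumptions made for illustration, not a prescribed procedure.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def new_code(used, length=6):
    """Draw a random code (hypothetical format: 'P' plus 5 characters) not in `used`."""
    while True:
        code = "P" + "".join(secrets.choice(ALPHABET) for _ in range(length - 1))
        if code not in used:
            return code

def pseudonymize(records, key, id_field="name"):
    """Return copies of `records` with `id_field` replaced by a pseudonym.

    `key` maps identifier -> pseudonym and is extended in place.
    """
    result = []
    for record in records:
        identifier = record[id_field]
        if identifier not in key:
            key[identifier] = new_code(set(key.values()))
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pid"] = key[identifier]
        result.append(cleaned)
    return result

# Illustrative data only.
baseline = [{"name": "Sally Xi", "score": 7}, {"name": "Sam Cooper", "score": 4}]
followup = [{"name": "Sally Xi", "score": 9}]

# One shared key: the same person gets the same pseudonym in both datasets.
shared_key = {}
baseline_linked = pseudonymize(baseline, shared_key)
followup_linked = pseudonymize(followup, shared_key)

# One key per dataset: codes are drawn independently, so a person's
# pseudonyms will almost certainly differ between the two datasets.
baseline_separate = pseudonymize(baseline, {})
followup_separate = pseudonymize(followup, {})
```

With the shared key, `baseline_linked[0]["pid"]` and `followup_linked[0]["pid"]` are identical, so the two scores can be analysed together. With separate keys that link is lost, which is the reduction in analytical value described above.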

Difference between anonymization and pseudonymization:

It is important to distinguish between anonymization and pseudonymization. The example below shows how the same three names appear under each approach.

Name	Anonymized	Pseudonymized
Sally Xi	ANON	P12L25
Sam Cooper	ANON	P38Q27
Sunil Gupta	ANON	P59M16

When data is anonymized, the link between the individual and the data is removed altogether: users of the dataset can no longer tell whether multiple records come from the same person. When data is pseudonymized, each person keeps a distinct code, so it remains clear whether records come from the same person or from different people. Because that link is retained, pseudonymized data carries a higher residual risk and requires careful management. Neither approach eliminates risk entirely, however: both anonymized and pseudonymized datasets can still be re-identified, for example through the other variables they contain.
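The difference can be sketched in a few lines of Python. The names and codes come from the table above; the responses and the repeated record for one participant are invented for illustration.

```python
from collections import Counter

# Hypothetical responses; one participant answers twice.
responses = [
    ("Sally Xi", "yes"),
    ("Sam Cooper", "no"),
    ("Sally Xi", "no"),
]

# Pseudonyms taken from the example table.
pseudonyms = {"Sally Xi": "P12L25", "Sam Cooper": "P38Q27"}

# Anonymized: every record carries the same label.
anonymized = [("ANON", answer) for _, answer in responses]

# Pseudonymized: every record carries its participant's code.
pseudonymized = [(pseudonyms[name], answer) for name, answer in responses]

records_per_label_anon = Counter(label for label, _ in anonymized)
records_per_label_pseudo = Counter(label for label, _ in pseudonymized)
```

Here `records_per_label_anon` contains only `ANON` with a count of 3, so there is no way to tell that two records share a respondent. `records_per_label_pseudo` shows two records for `P12L25` and one for `P38Q27`.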

When to choose pseudonymization vs. anonymization:

Feature	Pseudonymization	Anonymization
Identity Risk	May be re-identifiable with a key	No key to re-identify; links are removed
Linkage Capability	Can link across datasets	No linkage across records
Research Utility	Retains analysis value	May reduce utility
Privacy Risk	Higher residual risk	Lower risk if well-executed

Use pseudonymization when longitudinal or linked analysis is essential, and always ensure that re-identification keys are securely managed while in use and destroyed when no longer needed.
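One way to picture key management is to keep the key in its own file, apart from the research data, and delete that file once linkage is no longer needed. The sketch below rests on several assumptions: the JSON format, the file name, and the function names are hypothetical, the owner-only permission bits apply mainly on POSIX systems, and deleting a file does not guarantee secure erasure from the storage medium. Institutional storage and retention requirements take precedence over anything shown here.

```python
import json
import os
import tempfile
from pathlib import Path

def save_key(key, path):
    """Write the identifier -> pseudonym key to its own file, readable by the owner only."""
    # Mode 0o600 is honoured on POSIX; on Windows it has limited effect.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(key, handle)

def reidentify(pid, path):
    """Look up the identifier behind a pseudonym; returns None if it is unknown."""
    key = json.loads(Path(path).read_text())
    reverse = {pseudonym: identifier for identifier, pseudonym in key.items()}
    return reverse.get(pid)

def destroy_key(path):
    """Delete the key file (a plain delete, not a secure wipe)."""
    Path(path).unlink(missing_ok=True)

# Usage with a temporary directory standing in for a secure location.
key_path = Path(tempfile.mkdtemp()) / "pseudonym_key.json"
save_key({"Sally Xi": "P12L25", "Sam Cooper": "P38Q27"}, key_path)

found = reidentify("P12L25", key_path)   # lookup works while the key exists
destroy_key(key_path)                    # afterwards the lookup is no longer possible
```

Keeping the key in a separate file means the pseudonymized dataset can be shared or analysed without the means to reverse it travelling alongside.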

Need help? Contact research.data@ubc.ca