Anonymize and De-identify

Protecting sensitive data is not only a legal obligation; it is also a commitment to ethical, trustworthy, and responsible research.

Sharing research data is an important scholarly practice. Still, when working with sensitive or personal information, appropriate de-identification techniques are essential to protect participant confidentiality and comply with ethical and legal requirements.

Sensitive data refers to information that could identify individuals or groups, reveal private details, or pose risks if disclosed. This includes, but is not limited to:

  • Personally identifiable information (PII)
    (e.g., names, student numbers, contact details, health records)

  • Personal health information (PHI)
    (as defined under BC’s Freedom of Information and Protection of Privacy Act (FIPPA))

  • Indigenous knowledge or community-specific data

  • Confidential business or institutional data

  • Geospatial or ecological data that may be restricted for protection reasons

UBC research information classification

To help you understand whether your data is public, internal, confidential, or restricted, the UBC Advanced Research Computing (ARC) team has created a Research Information Classification framework that offers guidance on determining the appropriate security risk level for your data.

Anonymization and/or de-identification techniques

Before sharing or preserving sensitive data, researchers are strongly encouraged to apply anonymization or de-identification techniques:

  • De-identification involves removing or masking direct identifiers (e.g., names, contact info) and reducing indirect identifiers that could be used in combination to re-identify individuals (e.g., postal codes, birthdates).

  • Anonymization goes further by ensuring that re-identification is no longer reasonably possible, even when combining datasets.

While true anonymization is difficult to guarantee, careful de-identification significantly reduces risk and is often sufficient for data sharing and reuse.
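The de-identification steps described above can be sketched in Python. This is only a minimal illustration, not a complete or recommended procedure: the field names (name, student_number, email, postal_code, birthdate) and the function deidentify are hypothetical, and the chosen generalizations (keeping the first three characters of a postal code, keeping only the year of birth) are assumptions that may be too coarse or too fine for a given dataset.

```python
from datetime import date

# Hypothetical field names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "student_number", "email"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and indirect identifiers coarsened."""
    # Remove direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Reduce an indirect identifier: keep only the first three
    # characters of the postal code (an assumed level of generalization).
    if "postal_code" in out:
        out["postal_code"] = out["postal_code"].replace(" ", "").upper()[:3]

    # Reduce an indirect identifier: keep the year of birth only.
    if "birthdate" in out:
        out["birth_year"] = out.pop("birthdate").year

    return out


record = {
    "name": "Jane Doe",
    "student_number": "12345678",
    "email": "jane@example.org",
    "postal_code": "V6T 1Z4",
    "birthdate": date(1990, 5, 17),
    "survey_score": 42,
}

print(deidentify(record))
# {'postal_code': 'V6T', 'survey_score': 42, 'birth_year': 1990}
```

Removing and coarsening fields in this way lowers the risk of re-identification but does not by itself make a dataset anonymous; free-text fields and rare combinations of values still need separate review.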

Removing identifiers is important to protect the confidentiality of research participants. However, some risk of re-identification always remains, and new technologies continue to introduce new ways to re-identify data. Managing that risk is an important part of sharing research data.
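One rough way to gauge re-identification risk is to count how many records share each combination of indirect identifiers; this idea is commonly called k-anonymity. The sketch below is an assumption-laden illustration rather than a full risk assessment: the function smallest_group_size and the field names are hypothetical, and a small group size is only one indicator among many.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for every listed indirect identifier (0 if no records)."""
    if not records:
        return 0
    counts = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(counts.values())


# Hypothetical, already de-identified records.
records = [
    {"postal_code": "V6T", "birth_year": 1990},
    {"postal_code": "V6T", "birth_year": 1990},
    {"postal_code": "V5K", "birth_year": 1985},
]

k = smallest_group_size(records, ["postal_code", "birth_year"])
print(k)  # 1: one participant is unique on these two fields
```

A result of 1 means at least one participant is unique on the chosen fields and could be singled out by anyone who knows those values, which suggests further generalization or suppression before sharing.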

There are several ways of approaching de-identification, each of which has benefits and drawbacks:


Need help? Contact research.data@ubc.ca