Anonymize and De-identify

Making research data available for others to use is an important scholarly practice.

Some kinds of data are sensitive and cannot be shared for legal or ethical reasons. These can include:

  • Personal identifiers
  • Sensitive ecological data
  • Sacred or protected cultural practices

De-identification means removing identifying data from a dataset. Once a dataset has been de-identified, it can be shared without directly disclosing identifying information.
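
As a minimal sketch of what removing identifying data can look like in practice, the Python example below drops a set of direct identifiers from each record. The field names (name, email, phone) and the sample records are hypothetical, chosen only for illustration; a real dataset needs its own reviewed list of identifying fields.

```python
# Hypothetical field names; a real dataset needs its own list of identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def drop_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with the identifying fields removed."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Made-up sample data for illustration only.
participants = [
    {"name": "Participant A", "email": "a@example.org", "age": 34, "response": "agree"},
    {"name": "Participant B", "email": "b@example.org", "age": 51, "response": "disagree"},
]

shareable = drop_identifiers(participants)
print(shareable)
```

The function builds new records rather than editing the originals, so the identified copy stays intact for the research team while only the reduced copy is shared.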

Removing identifiers is important for protecting the confidentiality of research participants. However, there is always some risk that de-identified data can be re-identified, and changing technology introduces new ways to do so. Managing that risk is an important part of sharing research data.
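
One rough way to see this risk is to check whether any record is still unique on the fields that remain after the identifiers are removed. The Python sketch below counts such combinations. The field names (age, postcode) and the sample records are assumptions made for illustration, and this simple check is not a complete risk assessment.

```python
from collections import Counter

# Hypothetical fields that are not identifiers alone but may identify in combination.
INDIRECT_FIELDS = ("age", "postcode")


def unique_combinations(records, fields=INDIRECT_FIELDS):
    """Return the value combinations that occur in exactly one record."""
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    return [combo for combo, count in counts.items() if count == 1]


# Made-up sample data with names and contact details already removed.
deidentified = [
    {"age": 34, "postcode": "1000", "response": "agree"},
    {"age": 34, "postcode": "1000", "response": "neutral"},
    {"age": 51, "postcode": "2000", "response": "disagree"},
]

at_risk = unique_combinations(deidentified)
print(at_risk)
```

A combination that appears only once could be matched against another source that contains the same fields, so records like these may need further treatment before sharing.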

There are several ways of approaching de-identification, each of which has benefits and drawbacks:


