Research Data Management

Key Ethical Considerations

If you are conducting research with human participants, you will need to take extra steps when organizing your research data to protect confidentiality. When making a Data Management Plan (DMP), consider the following if you are working with sensitive data:

  • If your project includes sensitive data, how will you ensure that it is securely managed and accessible only to approved members of the project?
  • If applicable, what strategies will you undertake to address secondary uses of sensitive data?
  • How will you manage legal, ethical, and intellectual property issues?
  • Have participants given informed consent?

For more information see the links below.

Ethical Management of Data

There are several components to consider when ethically managing sensitive data.

Protecting Confidentiality

There are several types of variables that can endanger the confidentiality of research subjects.

1. Direct Identifiers: Variables that directly identify participants or place participants at immediate risk of being re-identified. These include:

  • Full or partial names or initials
  • Dates linked to individuals (e.g. birthdays, hospitalization dates, graduation dates)
  • Full or partial postal codes (first three characters may be acceptable)
  • Telephone or fax numbers
  • Email addresses
  • Web usernames or social media identifiers
  • IP addresses
  • License plates or other vehicle identifiers
  • Identifying numbers (e.g. SIN, Student or Staff ID Numbers, pet ID numbers)
  • Audiovisual materials of individuals, their locations, or medical images
  • Audio recordings of individuals
  • Biometric data
  • Any other unique or recognizable characteristics (e.g. job titles, prestigious awards)

Researchers should also check any digital files for embedded information that may be identifying.
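As one illustration of embedded information, a Word .docx file is a ZIP archive that normally stores author details in docProps/core.xml. The sketch below uses only the Python standard library to list two of those fields. The function name docx_author_fields and the sample names are hypothetical, and other file types (photos, PDFs, spreadsheets) store embedded information differently and would need their own checks.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of a .docx file.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def docx_author_fields(docx_file):
    """Return author-related properties stored inside a .docx file (hypothetical helper)."""
    with zipfile.ZipFile(docx_file) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, path in (("creator", "dc:creator"),
                        ("last_modified_by", "cp:lastModifiedBy")):
        element = root.find(path, NAMESPACES)
        if element is not None and element.text:
            found[label] = element.text
    return found

# Build a tiny stand-in for a .docx in memory so the example runs without a real file.
core_xml = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>Example Researcher</dc:creator>"
    "<cp:lastModifiedBy>Example Assistant</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)
sample.seek(0)

author_fields = docx_author_fields(sample)
print(author_fields)
```

With a real file, the same function can be called with a path, for example docx_author_fields("transcript.docx"), where the file name is a placeholder.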

2. Indirect Identifiers: Variables that do not directly identify a participant, but when combined with other variables, may reveal someone's identity. These include:

  • Age
  • Gender Identity
  • Income
  • Occupation or Industry
  • Geographic Variables
  • Ethnic or Immigration Variables
  • Membership in organizations
  • Use of Specific Services

3. Hidden Identifiers: When testing for re-identification risk, consider both the number of participants and the number of variables, because in large datasets confidentiality can be breached using machine learning approaches. For example, a variable showing distance from the nearest major city could be combined with the information that a survey respondent lived on a First Nations reserve to pinpoint the locations of some respondents. This would be difficult to do by hand, but would be easy with a computer.
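A simple way to see how indirect identifiers combine is to count how many participants share each combination of values; a combination held by very few people is a re-identification risk. The sketch below is a rough, illustrative check in Python, not a method prescribed by the guide. The function name risky_combinations, the threshold k, and the sample records are all assumptions, and a count like this does not detect the hidden identifiers described above.

```python
from collections import Counter

def risky_combinations(records, indirect_identifiers, k=5):
    """Return value combinations shared by fewer than k records (hypothetical helper)."""
    counts = Counter(
        tuple(record[name] for name in indirect_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up records for illustration only.
participants = [
    {"age_group": "30-39", "occupation": "nurse", "region": "north"},
    {"age_group": "30-39", "occupation": "nurse", "region": "north"},
    {"age_group": "30-39", "occupation": "nurse", "region": "north"},
    {"age_group": "60-69", "occupation": "judge", "region": "north"},
]

# The threshold k=3 is an arbitrary choice for this small example.
flagged = risky_combinations(
    participants, ["age_group", "occupation", "region"], k=3
)
print(flagged)  # {('60-69', 'judge', 'north'): 1}
```

Here the single judge aged 60-69 is flagged because no other record shares that combination, even though none of the three variables is identifying on its own.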

Consent Language and TCPS 2

In Canada, ethical guidelines for research involving human participants are outlined in the TCPS 2 policy (see the link in the Key Ethical Considerations section of this page). Consent language is highly scrutinized by Research Ethics Boards to ensure participants' privacy and confidentiality are preserved and that participants are informed about the scope and manner of their participation in the research. Consent forms should contain the following information:

  • Participation is voluntary
  • Participants may withdraw from the research even after the study is underway
  • A concise description of the study as well as the potential risks and benefits to participants, all in plain language
  • Whether the data will be available to other researchers or to the public, under what conditions, in which specific repository, and in what format or including what information (e.g. whether it may contain direct or quasi-identifiers)

Other Categories of Sensitive Data

Human participant data are the most common form of sensitive data, but they are not the only one. For example, when researchers collaborate with industry partners to develop technologies and inventions, data may be considered "trade secrets" and must be safeguarded according to contractual obligations. Here are some other categories of sensitive data:

  • Intellectual Property
  • Dual-use Data
  • Data subject to import/export control
  • Third-party Licensed Data
  • Locations of Endangered Species

This material has been adapted/revised from Research Data Management in the Canadian Context: A Guide for Practitioners and Learners created by Kristi Thompson; Elizabeth Hill; Emily Carlisle-Johnston; Danielle Dennie; and Émilie Fortin published with Pressbooks. The original is freely available under the terms of the CC BY-NC 4.0 license at https://ecampusontario.pressbooks.pub/canadardm.

 

Anonymization of Research Data

De-Identifying Qualitative Data

Because qualitative data are often stored and analyzed in unstructured formats (e.g. interviews, focus groups, transcripts, or field notes), they can be difficult to anonymize, but anonymization is still possible with some software programs and digital tools.

Qualitative data must be carefully reviewed, as a participant may inadvertently identify themself when responding to interview questions or discussing lived experiences. Predetermined categories for variables (e.g. age, religion, gender) are also less common in qualitative data, so you may not be able to predict how much identifying information is in a dataset prior to data collection and analysis.

As contextual information is often vital in qualitative studies, researchers will often assign categorical codes to replace identifying information. For example, the Finnish Social Science Data Archive recommends using square brackets to mark places in a transcript where de-identification has occurred, because square brackets are not commonly used as ordinary punctuation.

When redacting qualitative information or replacing detailed information with categories, document these decisions and the category definitions in a codebook that accompanies the dataset.
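The replace-and-document approach described above could be sketched as follows in Python. This is a minimal illustration under stated assumptions: the function name redact, the category labels, and the sample transcript are invented, and a list of known identifiers will miss anything a participant says that the researcher did not anticipate, so manual review is still needed. The tally it returns records only the bracketed codes and how often each was used, never the original identifying text; category definitions would be written into the codebook separately.

```python
import re

def redact(transcript, replacements):
    """Swap known identifiers for bracketed category codes (hypothetical helper).

    replacements maps identifying text to a category label. Returns the
    redacted transcript and a tally of how often each code was inserted.
    """
    tally = {}
    # Handle longer identifiers first so a short name inside a longer one is not split.
    ordered = sorted(replacements.items(), key=lambda item: len(item[0]), reverse=True)
    for identifier, category in ordered:
        code = f"[{category}]"
        # Word boundaries stop "Dana" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(identifier) + r"\b", re.IGNORECASE)
        transcript, hits = pattern.subn(lambda _match, code=code: code, transcript)
        if hits:
            tally[code] = tally.get(code, 0) + hits
    return transcript, tally

# Invented transcript excerpt and identifiers, for illustration only.
excerpt = "I started at St. Mary's Hospital after Dana hired me. Dana was my supervisor."
known_identifiers = {
    "St. Mary's Hospital": "workplace",
    "Dana": "supervisor name",
}

redacted, tally = redact(excerpt, known_identifiers)
print(redacted)
print(tally)
```

The redacted excerpt reads "I started at [workplace] after [supervisor name] hired me. [supervisor name] was my supervisor.", which keeps the context needed for analysis while removing the names.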

Interview transcripts should be anonymized even if the researcher doesn't intend to publish the data. This reduces the risk of harm in the case of a breach. Anonymization should be irreversible, and when anonymizing, researchers should consider both the potential harm to participants if identifiable information were made public and the researcher's ability to analyze the data at the necessary level of nuance. If the purpose of a research project is to analyze a sensitive topic, it might not make sense to de-identify the data, and the data may require additional safeguards.

This material has been adapted/revised from Research Data Management in the Canadian Context: A Guide for Practitioners and Learners created by Kristi Thompson; Elizabeth Hill; Emily Carlisle-Johnston; Danielle Dennie; and Émilie Fortin published with Pressbooks. The original is freely available under the terms of the CC BY-NC 4.0 license at https://ecampusontario.pressbooks.pub/canadardm.