If you are conducting research with human participants, you will need to take extra steps when organizing your research data to protect confidentiality. When writing a Data Management Plan (DMP) for a project that involves sensitive data, consider the following:
For more information, see the links below.
There are several components to consider when ethically managing sensitive data.
Protecting Confidentiality
There are several types of variables that can endanger the confidentiality of research subjects.
1. Direct Identifiers: Variables that directly identify participants or place participants at immediate risk of being re-identified. These include:
Researchers should also check any digital files for embedded information that may be identifying.
2. Indirect Identifiers: Variables that do not directly identify a participant, but when combined with other variables, may reveal someone's identity. These include:
3. Hidden Identifiers: When testing for re-identification risk, consider both the number of participants and the number of variables. In large datasets, confidentiality can be breached using machine learning approaches. For example, a variable showing distance from the nearest major city could be combined with the information that a survey respondent lived on a First Nations reserve to pinpoint the locations of some respondents. This would be difficult to do by hand but easy with a computer.
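The combination risk described above can be screened for with a simple group-size count. The following is a minimal Python sketch, not a complete disclosure-risk assessment: it counts how many records share each combination of indirect-identifier values and reports the combinations held by fewer than k records. The function name small_groups, the threshold k, the column names, and the sample rows are all assumptions invented for illustration; they do not come from any particular dataset or tool.

```python
from collections import Counter


def small_groups(records, indirect_identifiers, k=5):
    """Return combinations of indirect-identifier values shared by fewer than k records."""
    combos = Counter(
        tuple(record[name] for name in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in combos.items() if n < k}


# Hypothetical survey rows, invented for illustration only.
records = [
    {"lives_on_reserve": True, "km_to_major_city": 410, "age_group": "30-39"},
    {"lives_on_reserve": False, "km_to_major_city": 12, "age_group": "30-39"},
    {"lives_on_reserve": False, "km_to_major_city": 12, "age_group": "30-39"},
    {"lives_on_reserve": False, "km_to_major_city": 12, "age_group": "40-49"},
]

# Flag value combinations that describe fewer than 2 respondents.
risky = small_groups(records, ["lives_on_reserve", "km_to_major_city"], k=2)
for combo, n in risky.items():
    print(combo, n)
```

A combination that appears only once or twice is a candidate for coarsening (for example, replacing an exact distance with a range) or for suppression. A check like this looks only at the variables it is given, so it complements rather than replaces a careful review of the dataset.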
Consent Language and TCPS 2
In Canada, ethical guidelines for research involving human participants are outlined in the TCPS 2 policy (see the link in the Key Ethical Considerations section of this page). Research Ethics Boards scrutinize consent language closely to ensure that participants' privacy and confidentiality are preserved and that participants are informed about the scope and manner of their participation in the research. Consent forms should contain the following information:
Other Categories of Sensitive Data
Human participant data are the most common form of sensitive data, but they are not the only one. When researchers collaborate with industry partners to develop technologies and inventions, data may be considered "trade secrets" and must be safeguarded according to contractual obligations. Here are some other categories of sensitive data:
This material has been adapted/revised from Research Data Management in the Canadian Context: A Guide for Practitioners and Learners created by Kristi Thompson; Elizabeth Hill; Emily Carlisle-Johnston; Danielle Dennie; and Émilie Fortin published with Pressbooks. The original is freely available under the terms of the CC BY-NC 4.0 license at https://ecampusontario.pressbooks.pub/canadardm.
De-Identifying Qualitative Data
Because qualitative data are often stored and analyzed in unstructured formats (e.g. interview and focus group transcripts or field notes), they can be difficult to anonymize, but it is still possible with some software programs and digital tools.
Qualitative data must be carefully reviewed, as a participant may inadvertently identify themself when responding to interview questions or discussing lived experiences. Predetermined categories for variables (e.g. age, religion, gender) are also less common in qualitative work, so you may not be able to predict how much identifying information a dataset will contain before data collection and analysis.
As contextual information is often vital in qualitative studies, researchers will often assign categorical codes to replace identifying information. For example, the Finnish Social Science Data Archive recommends marking de-identified passages in a transcript with square brackets, which are unlikely to be confused with the punctuation commonly used in transcription.
When redacting qualitative information or replacing detailed information with categories, document these decisions and the category definitions in a codebook that accompanies the dataset.
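The bracket-and-codebook approach described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Finnish Social Science Data Archive's own tooling: the function name deidentify, the category labels, and the sample sentence are hypothetical, and the list of identifying phrases is assumed to come from a manual review of the transcript.

```python
import re


def deidentify(transcript, replacements):
    """Replace each identifying phrase with a bracketed category and log the decision."""
    codebook = []
    for phrase, category in replacements.items():
        tag = f"[{category}]"
        pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
        # A function replacement inserts the tag literally, with no escape processing.
        transcript, count = pattern.subn(lambda _match: tag, transcript)
        if count:
            # Record the category and how often it was applied, never the original text.
            codebook.append({"category": category, "occurrences": count})
    return transcript, codebook


# Hypothetical transcript excerpt and review decisions, invented for illustration.
text = "I grew up in Springfield and my sister Maria still lives in Springfield."
replacements = {"Springfield": "hometown", "Maria": "sibling name"}

clean_text, codebook = deidentify(text, replacements)
print(clean_text)
for entry in codebook:
    print(entry)
```

The codebook here deliberately stores only the category and a count so that it can accompany the dataset without reintroducing the identifying text. Exact-phrase matching will miss misspellings, nicknames, and indirect references, so the output still needs to be read through by a person.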
Interview transcripts should be anonymized even if the researcher doesn't intend to publish the data. This reduces the risk of harm in the case of a breach. Anonymization should be irreversible, and when anonymizing, researchers should consider both the potential harm to participants if identifiable information were made public and their own ability to analyze the data at the necessary level of nuance. If the purpose of a research project is to analyze a sensitive topic, it might not make sense to de-identify the data, and the data may require additional safeguards.