Qualitative Data
In qualitative data, we focus on individual experience rather than average effects, so the risk of disclosure is far higher in this form of data, and individuals can be highly identifiable. These data can come in the form of interviews, psychiatric case studies, videos and images, court records, and so on.
Qualitative data is much more likely to contain confidential information than quantitative data. This is because, by its nature, qualitative enquiry seeks to explore the individuality of experience; in contrast, quantitative data is more interested in average effects, and any one individual is less important than the analysis as a whole.
Because of this, it is worth considering how any qualitative data you collect sits in a wider data governance framework. For example, it could be that a quote from a senior politician, and the knowledge of who made that statement, is important for your research. In such a case, rather than trying to anonymise the quote, getting permission for the politician to be identified as the author of the quote may be the better research outcome.
Current solutions to this issue include natural language processing (NLP) tools and the automation of data analysis. These currently work only for text-based outputs and can apply only a broad approach, such as redacting names and locations. They cannot yet address the complexity of contextual identifiers.
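As a rough illustration of this broad approach, the Python sketch below redacts terms from a fixed list of names and locations. The lookup list stands in for the named-entity recognition that an NLP tool would perform; the `redact` function and the sample quote are hypothetical. Note that the contextual identifier in the example survives redaction.

```python
import re


def redact(text, names, locations):
    """Replace listed names and locations with placeholder tags.

    Hypothetical helper: a fixed lookup list stands in for the
    named-entity recognition a real NLP tool would perform.
    """
    terms = [(n, "[NAME]") for n in names] + [(loc, "[LOCATION]") for loc in locations]
    # Replace longer terms first so "New York" is not pre-empted by "York".
    for term, tag in sorted(terms, key=lambda pair: len(pair[0]), reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, tag, text, flags=re.IGNORECASE)
    return text


quote = ("Maria Lopez moved to Leeds after she became the only "
         "female surgeon at the hospital.")
print(redact(quote, names=["Maria Lopez"], locations=["Leeds"]))
# [NAME] moved to [LOCATION] after she became the only female surgeon at the hospital.
```

The name and place are removed, but "the only female surgeon at the hospital" may still identify her to anyone who knows the hospital. Catching contextual identifiers like this still requires human review.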
There are currently almost no guidelines for how to protect confidentiality in qualitative outputs. Hence this page is largely blank. We are aware of a number of projects investigating this, and we will be very pleased to update the information here as new guidance appears – please get in touch via the contact form.
Census Bureau Guidelines
The US Census Bureau suggested the following guidelines for protecting the identity of participants in qualitative research:
- Remove names and locations
- Remove dates
- Remove proper nouns
- Descriptive demographics can be provided if they are not combined with other information that could uniquely identify participants
- Remove all information that could be linked with publicly available databases
- Remove photos
- Remove videos/voice recordings
- Direct quotes are acceptable for release if they do not contain unique information
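The demographics guideline can be partly checked mechanically. The sketch below, using a hypothetical `risky_combinations` helper and made-up participant records, counts how many participants share each combination of attributes and flags combinations held by fewer than `min_count` people (the default of 2 flags unique combinations). It shows how attributes that are safe on their own can single someone out once combined.

```python
from collections import Counter


def risky_combinations(participants, attributes, min_count=2):
    """Return attribute combinations shared by fewer than min_count participants."""
    counts = Counter(tuple(p[a] for a in attributes) for p in participants)
    return {combo: n for combo, n in counts.items() if n < min_count}


# Made-up example records, for illustration only.
participants = [
    {"age_band": "30-39", "gender": "F", "role": "nurse"},
    {"age_band": "30-39", "gender": "F", "role": "nurse"},
    {"age_band": "50-59", "gender": "M", "role": "surgeon"},
    {"age_band": "50-59", "gender": "M", "role": "nurse"},
]

print(risky_combinations(participants, ["age_band", "gender"]))
# {}
print(risky_combinations(participants, ["age_band", "gender", "role"]))
# {('50-59', 'M', 'surgeon'): 1, ('50-59', 'M', 'nurse'): 1}
```

Age band and gender together identify no one here, but adding job role makes two participants unique.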
Note that these guidelines could also be applied to the data itself, to lower the risk should the data be mislaid or inappropriately shared. However, it is better to keep the data in a secure storage facility than to try to remove all potentially identifying information from it.
When guidance isn't available
When there is no guidance, we must rely on our own ethical compass. One established set of principles (DHHS, 1979) outlines the following research criteria:
- Respecting individuals involves safeguarding their autonomy and treating them with respect
- Beneficence dictates a commitment to minimising harm and maximising benefits for both the research and the participant.
- Justice requires fair and equitable procedures, ensuring the reasonable distribution of costs and benefits among participants.
Other complex types of output
Practices surrounding medical data vary, from depositing raw data in the public domain to secure access controls. Tools are being developed, but their uptake remains insufficient. Unforeseen consequences are frequently encountered, and maintaining disclosure control in medicine will benefit society.
Several types of data raise particular concerns; these are described below, along with potential solutions:
fMRI
fMRI scans contain a huge amount of data, including brain anatomy and structure. The disclosure concern arises because a participant's face can be reconstructed from this structure.
To help with disclosure control, the facial features could be removed; however, this can lead to a loss of data, and in fMRI scans disclosure concerns extend through all parts of the data.
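A toy NumPy sketch of this trade-off: given a 3D scan and a boolean mask marking facial voxels, the hypothetical `deface` function zeroes those voxels and reports what share of the image signal was removed. In practice the mask would come from a dedicated defacing tool aligned to the scan; here it is simply assumed to exist, and the box-shaped mask in the example is purely illustrative.

```python
import numpy as np


def deface(volume, face_mask):
    """Zero voxels flagged by face_mask; return the copy and the share of signal lost.

    Hypothetical helper: face_mask is assumed to be supplied by a separate
    defacing step and must have the same shape as volume.
    """
    if volume.shape != face_mask.shape:
        raise ValueError("volume and face_mask must have the same shape")
    defaced = volume.copy()
    defaced[face_mask] = 0
    total = np.abs(volume).sum()
    lost = np.abs(volume[face_mask]).sum()
    share_lost = float(lost / total) if total else 0.0
    return defaced, share_lost


# Illustrative 10x10x10 volume with a box-shaped "face" region at the front.
scan = np.ones((10, 10, 10))
mask = np.zeros(scan.shape, dtype=bool)
mask[:2, :, :] = True
defaced_scan, share_lost = deface(scan, mask)
print(share_lost)  # 0.2
```

Any anatomy that falls inside the mask is discarded along with the face, which is the data-loss concern described above.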
Genomic data
Microarray data from specimens are often shared in the public domain, and they contain descriptive variables such as age, gender and prognosis.
Applying basic SDC principles could improve disclosure control, as could online tools that generate Kaplan-Meier curves with SDC built in.
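One basic SDC rule that could be built into a Kaplan-Meier curve is to stop releasing the curve once too few individuals remain at risk, since late steps may then relate to single patients. The sketch below computes the product-limit estimate in plain Python and truncates it at a threshold `min_at_risk`. The threshold value and the helper names are assumptions for illustration, not a rule taken from any particular tool.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: (time, n_at_risk, n_events, survival) per event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    steps = []
    survival = 1.0
    for t in sorted(deaths):
        n_at_risk = sum(1 for x in times if x >= t)
        d = deaths[t]
        survival *= 1 - d / n_at_risk
        steps.append((t, n_at_risk, d, survival))
    return steps


def sdc_kaplan_meier(times, events, min_at_risk=10):
    """Hypothetical SDC rule: truncate the curve once fewer than min_at_risk remain."""
    released = []
    for step in kaplan_meier(times, events):
        if step[1] < min_at_risk:
            break
        released.append(step)
    return released


times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]  # 1 = event observed, 0 = censored
for t, at_risk, d, s in sdc_kaplan_meier(times, events, min_at_risk=4):
    print(t, at_risk, d, round(s, 3))
# 1 6 1 0.833
# 2 5 1 0.667
```

The unrestricted curve would continue to time 5, where only one person is at risk; the truncated version withholds those steps.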
Dermatology photographs
There is an online photo repository providing examples of skin disorders, which has a huge public benefit in helping individuals identify ailments.
Blurring the surrounding features, apart from the ailment itself, could reduce identifiable information.
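A minimal NumPy sketch of selective blurring, assuming the affected area has already been marked as a rectangle `(top, left, bottom, right)`: the hypothetical `blur_except` function blurs the whole image with a simple box (mean) filter and then restores the original pixels inside that rectangle. A box blur is used only to keep the example dependency-free; a stronger blur, or cropping, may be needed to remove identifying detail in real photographs.

```python
import numpy as np


def box_blur(img, k):
    """Mean filter over a (2k+1)x(2k+1) window; edges padded by replication."""
    pad = ((k, k), (k, k)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape[:2]
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * k + 1) ** 2


def blur_except(img, box, k=5):
    """Blur everything, then restore the unblurred pixels inside box."""
    top, left, bottom, right = box
    result = box_blur(img, k)
    result[top:bottom, left:right] = img[top:bottom, left:right]
    return np.rint(result).astype(img.dtype)


# Synthetic striped greyscale image standing in for a photograph.
img = np.zeros((20, 20), dtype=np.uint8)
img[::2, :] = 255
out = blur_except(img, box=(5, 5, 10, 10), k=2)
print(np.array_equal(out[5:10, 5:10], img[5:10, 5:10]))  # True
```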