How to best share and archive research data even when you think it can’t be shared openly

Author: Emilia Chiscop-Head, PhD

Interview with Jen Darragh and Sophia Lafferty-Hess, Senior Research Data Management Consultants at Duke University Libraries

Emilia Chiscop-Head: How many customers do you serve every year through your workshops and consultations? Are most of these faculty, staff or students?

Jen Darragh: Our data management workshops allow us to reach a large number of graduate students, faculty, and staff, as most, if not all, are typically included as responsible conduct of research credit courses for the Graduate School and the Duke Office of Scientific Integrity. We usually offer five workshops each semester (fall and spring), and sometimes a special topical event such as a panel discussion on data sharing or reproducibility. Our consultations are often with research staff - both graduate and professional - who are charged with managing the data of their lab or research unit. We have had around 25 consultations since January. Many customers come back when they have new data management challenges - which makes us proud of the work we’re doing and the relationships we’re building.

“Primarily people ask us about how to best share their data”

ECH: Which area of data management drives most interest? What do most people want to know or need help with?

JD: Primarily people ask us about how to best share their data, given the growth in requirements and discussions around open research. Guidance is not always provided as to how people should share, and given the proliferation of options, it can be confusing. At times researchers think they have to build their own solution because they are not aware of what resources are available to them. Providing researchers with guidance on how and where to publish, and having a resource here at Duke like the Research Data Repository that helps their data meet the FAIR guiding principles, makes for happy researchers: they have local advice and a sustainable local solution as well.

We also answer questions about optimal data workflow design, including storage, organization, and backups. We often partner with others in our department and on campus that support data science to provide the best advice.

Questions about data management planning come up less often, as more researchers are now familiar with the data management planning requirements of federal and private funders. However, we did partner with you all at ASIST on developing data management planning guidance for Duke researchers regardless of funding. We feel that this was a valuable addition to the scientific integrity landscape here and hope that many researchers will take advantage of this tailored advice.

“Since COVID we have seen an increase in researchers sharing data through the repository”

ECH: How has the current pandemic impacted the demand for DM services or training? Have you experienced an increase or a shift in the areas of interest?

JD: Since COVID we have seen an increase in researchers sharing data through the repository, perhaps because they have had more time to focus on article submissions, but I’d also like to think that the push for transparency and openness in COVID-related research is helping to drive a broader appreciation for open research. Since many Duke researchers were already regularly working in the digital space, there hasn’t been a huge change in areas of interest or help topics. Offering workshops online has been quite popular because of the flexibility it provides.

ECH: What are the most common data management challenges that you see across research disciplines and how can they be overcome?

Sophia Lafferty-Hess: I would say there are three common challenges we see. One is the challenge that comes with establishing workflows, especially with collaborative research. The more people engaged in a project, the more complex the workflow. Good data management practices rely upon consistency, commitments to good documentation, and establishing standardized procedures. When you are working on a team this requires buy-in to follow procedures, developing open lines of communication, and forethought in planning. We found many groups at Duke that are committed to doing it right, building consensus on a team, and then iterating to improve. All of these workflow choices have downstream implications for things like reproducibility, data sharing, and research integrity.

Another common challenge is sharing research data ethically and legally. With human subjects data and other sensitive data, you have to be really careful to make sure you are fully de-identifying data (which isn’t always straightforward). You also need to think ahead and make sure you consent participants to allow for data sharing. Often, researchers use data from secondary sources and they need to make sure that they have the rights to disseminate those data. And while we recognize the value of openness, some data just can’t be shared openly. In these situations we have to identify options for safely sharing or archiving.

The last challenge we have addressed is helping researchers to see their data through the lens of a secondary user. People are often so close to their data and their processes that it can be hard for them to step outside their own experience and see what documentation others need in order to understand and reuse their data. We try to help by curating data and by simplifying the publication process while still adhering to standards.

ECH: What are some of the long term goals/projects that your department is now working to implement?

SLH: One of our big projects this summer is collaborating with OIT and research computing to integrate Globus with the repository. Globus is a platform supported by Duke that facilitates large file transfers. The goal of this project is to make it easier for researchers to send us large data for deposit and for end users to download these data. We hope to roll these features out in early fall and we are really excited to be able to more effectively support the disciplines that are producing large scale data at Duke.

We are also working on revising our curriculum that teaches graduate students data management skills. As a partner institution within the Data Curation Network, we have been re-tooling DCN curriculum aimed at teaching librarians and archivists data curation skills. Researchers are the main curators of their own data, so we want to help students practice these skills through hands-on activities and discussion so they can apply them during their careers.

ECH: What is one message you would like to send the researchers, students and faculty who read this interview?

SLH: Good data management is not an all-or-nothing thing. I would encourage researchers to identify small goals. That could be “I am going to implement a file naming convention and stick with it for my next project,” “I am going to use Friday afternoons to make sure I keep my documentation up-to-date,” “For my next project, I am going to share my data openly through a data repository,” or “I am going to start learning a programming language to make my research more reproducible.” Think about the incremental changes you can make, build on that, and then those practices become the norm. Finally, remember that you don’t have to do it alone. We are lucky here at Duke to have a village to support data management.