
In dialogue with Dr. Patrick Charbonneau, faculty champion of the Duke Data Repository
Professor Patrick Charbonneau is Associate Professor of Chemistry and Physics at Duke University. He studies soft materials using theory and computer simulations, and he has co-authored more than 80 peer-reviewed papers on this topic. Professor Charbonneau has earned a National Science Foundation CAREER Award, a Sloan Fellowship, and an Oak Ridge National Lab Ralph E. Powe award.
Originally from Montreal, Patrick Charbonneau obtained his Ph.D. in chemical physics from Harvard University in 2006 and was then a Marie Curie Fellow at AMOLF, in Amsterdam, before joining Duke in 2008.
He has been actively engaged in building the Duke research data repository and holds the record for the largest number of data sets deposited in the repository so far.
Emilia Chiscop-Head: You were the first Duke researcher to request a data repository, and you are the one who uses it the most. What is the story behind this statement?
Patrick Charbonneau: While I was serving on ITAC in 2016, Tim McGeary, the Associate University Librarian for Digital Strategies and Technology, gave a talk about the Duke repository. Despite my best efforts to remember the data produced in the early days of my group, I was becoming more and more anxious when, once in a while and right out of the blue, someone would send me an email saying, “I read your paper, could you send me the data?” So, I emailed Tim and asked if it would be possible to deposit data every time I publish a paper. At that point, Duke did not have the infrastructure that exists today, and depositing data was a rather clumsy operation. But with help from a then graduate student in my group, Yuan Zhuang, who was very good at coding and organizing (he now works at Microsoft), we started depositing data after each publication. It quickly became the norm. Today Duke Libraries has data specialists who built a web interface that makes this process very easy: you essentially fill out a form and drop the data on Box. So, I'm the biggest depositor because I am the one who asked for this resource to exist at Duke and because we quickly figured out an optimal way to use it in our own research process.
“To shift from the old model to the routine deposit and sharing of data, one only needs to overcome a cultural barrier”
ECH: Is it hard to deposit data? What are the main benefits of using a repository?
PC: If you're just a bit careful about how you prepare data for publication, it’s really easy to deposit that data. It will even save you time down the road. Duke Libraries is currently setting up a bulk download function, and version control for datasets that get updated multiple times is in planning. In short, the infrastructure is available, and the team is now fine-tuning the process.
More funding agencies are asking for data openness, and some journals have started to require it as well. The statement we have long seen in journals that data “is available upon request” is not always technically true and is a very old-fashioned way of proceeding. The current infrastructure makes depositing data possible without effort. To shift from the old model to the routine deposit and sharing of data, one only needs to overcome a cultural barrier. If the word gets out and the idea becomes embedded in research practice, then more people will deposit data. I've given talks at faculty events, I’ve pitched the idea in faculty meetings, and I hope this interview will help further spread the word. I expect that with the growing use of electronic notebooks, data that is well-structured from the start will be even easier to deposit.
ECH: Do you think depositing data and making it publicly available should be required or optional?
PC: One could debate whether a specific data set should or shouldn’t be published because it may or may not be reused. I think, however, that nearly every paper should have some data deposited. If institutions mandate that some data be deposited for every paper, then the share of data that gets deposited will simply grow with time. After a first experience with depositing, you realize that the next time around you could deposit even more data because that additional information could also be useful to others. Data depositing then becomes the norm and becomes truly significant. As a first step, I think depositing data should be required.
ECH: Do you think that each discipline should establish good practices for depositing data?
PC: Every scholar who puts a plot in a paper or a book should be able to deposit the coordinates of the points in that plot, at the bare minimum. And that is essentially discipline-independent, from history to biology to economics and even literature. Whether you go deeper in the pipeline of data generation, however, is more discipline-specific.
ECH: How did you get where you are today in your career since you graduated?
PC: I'm from Montreal. My patronymic ancestor moved to what was called New France in 1659. I'm a native French speaker, and my education was in French up until I went to college. As an undergraduate at McGill, I majored in chemistry with a minor in computer science. (My parents, who are both computer scientists, would have preferred I do the opposite.) I then went to Harvard for graduate school in chemical physics. Computation and theory have thus been with me for a long while. After a postdoctoral fellowship in Amsterdam at an institute for atomic and molecular physics, I moved to Duke as a junior faculty member in chemistry. I work on soft matter theory and simulation: soft materials, gels, liquids, glasses.
ECH: How would you explain your science and its social impact in accessible language?
PC: We try to understand how small components, such as atoms, molecules or larger particles, come together and give a material its properties, like its squishiness, its response to heat, etc. At the theoretical level, we come up with simple models, hoping to explain universality classes of material assembly. Throughout my career, I've been examining different types of such assemblies, including glass formation and protein crystallization. The latter is an important step of the drug discovery process, a very popular theme during the Covid-19 pandemic. In short, in order to be able to screen drug-like molecules and understand whether these molecules have a chance of blocking the action of a viral protein, you need to know the shape of that protein. A classic way of determining that shape is to obtain a crystal of the purified protein, shine an X-ray beam on that crystal, and see how it diffracts the radiation. From the resulting diffraction patterns, you can work out the protein structure.
ECH: What role did mentors play in your career?
PC: There are many people from whom I draw very strong inspiration and who have been very generous and supportive throughout my career. I found my postdoctoral advisor Daan Frenkel, in particular, to be an extremely inspiring researcher. Even now, when I look at a problem or prepare to give a talk, I often wonder: how would Daan do this? I found his way of doing science to be extremely elegant, thoughtful and inspiring.
ECH: What are the challenges that researchers in your area face today which could have an impact on the integrity of the research?
PC: I'm fortunate enough to work in a subfield where the financial stakes are so low that cheating makes little sense given the reputational risk. The flip side is that some of my colleagues may think that I am not working on problems that are important enough. In other disciplines, the financial stakes can be really high, and this can create incredibly high pressures. High funding and revenue-oriented pressures do not always bring out the best in people.
ECH: What is your solution to this problem?
PC: Every researcher should get a minimum base funding they could always fall back on, a common model in many countries. I think that this structure weakens some of the misaligned financial incentives of a winner-takes-all model. But this issue falls well beyond my purview as a Duke faculty member. A more tractable issue is the accessibility and reviewability of research data. I invested effort in the repository because I thought it was a genuine opportunity to solve a problem. Another issue is the commercial publishing industry with the high cost of journals, on the one hand, and the existence of marginal journals, on the other. A solution will perhaps emerge from the tension between libraries refusing to support weak publications and the open-access model. As a current Library Council member, I hope to contribute there as well.
Meet Professor Charbonneau:
Duke Today: Chemistry in the Kitchen, Cooking in the Classroom