Research data management: Sharing data to advance knowledge

Research data management lets researchers record, use and share research data. It supports the development of new knowledge as well as the replication and validation of research findings.

What do Jean-François Lapierre, Simon Dufour, Frédérick Bastien, Chantal Gagnon and Nadia Gosselin have in common? They are the five Université de Montréal professors who have shared the most datasets, according to the latest Borealis report.

Borealis is a multidisciplinary data repository that uses Dataverse software, developed by Harvard University, and is supported by a partnership among Canadian academic libraries, institutions, research organizations and the Digital Research Alliance of Canada. The technical infrastructure is hosted by Scholars Portal and University of Toronto Libraries.

UdeM is among the 15 Canadian educational institutions and research centres that disseminate the most datasets on Borealis, where they can be reproduced and reused for academic or other purposes.

The Dataverse space of the Interuniversity Research Group in Limnology ranks first among UdeM contributors to Borealis (counting all categories, i.e. individual researchers and research groups), having shared more than 30 datasets. One of these has been downloaded more than 250 times since it was shared in December 2023.

30,000 nights of data

Nadia Gosselin, a professor in the Department of Psychology and director of the Centre for Advanced Research in Sleep Medicine (CARSM), is the UdeM researcher with the largest Dataverse collection, having shared 10 datasets collected over decades by the Montreal Archive of Sleep Studies (MASS) project. They cover 200 nights of sleep, each with millions of data points recorded using a standard protocol and devices that capture data every 4 milliseconds. Gosselin’s datasets have been downloaded more than 4,300 times.
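
To see where "millions of data points" comes from, the arithmetic can be sketched roughly as follows. One reading every 4 milliseconds works out to 250 readings per second. The 8-hour night and the single recording channel are assumptions made here for illustration, not figures from the MASS project.

```python
# Rough count of samples in one night of recording.
# Assumptions (not from the MASS project): an 8-hour night, one channel.
SAMPLE_INTERVAL_MS = 4   # from the article: one reading every 4 milliseconds
NIGHT_HOURS = 8          # assumed length of a night, for illustration only


def samples_per_second(interval_ms: int) -> int:
    """Convert a sampling interval in milliseconds to samples per second.

    Uses integer division, so it assumes the interval divides 1000 evenly.
    """
    return 1000 // interval_ms


def samples_per_night(interval_ms: int, hours: int) -> int:
    """Number of samples one channel records over the given number of hours."""
    return samples_per_second(interval_ms) * 3600 * hours


print(samples_per_second(SAMPLE_INTERVAL_MS))              # 250
print(samples_per_night(SAMPLE_INTERVAL_MS, NIGHT_HOURS))  # 7200000
```

Under these assumptions a single channel yields about 7.2 million readings per night, and a recording with several channels would multiply that figure accordingly.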

Over the past 24 years, CARSM research teams have collected and recorded a total of 30,000 nights of sleep data!

"At CARSM, our philosophy is that publicly funded research should be accessible to all," Gosselin said. "The 200 nights of data that we have made available through the Digital Research Alliance of Canada have attracted hundreds of requests from research teams around the world, who then publish their studies in scientific journals. Research data management helps us advance knowledge and develop new tools that promote interdisciplinarity, since these data are then used in different fields."

The datasets from the MASS project are in fact being reused for other research. For example, a search on Google Dataset Search turns up a computer science research project using the data for deep learning.

Gosselin and her colleague Aude Motulsky are on the Digital Research Alliance of Canada’s list of 18 Data Champions for 2022-2023.

RDM ushers in a new era in research

Research data management, or RDM, is now an essential part of any research project’s lifecycle.

Since 2021, federal funding agencies have had a policy that requires researchers to submit a data management plan with some grant applications. Before beginning their work, research teams must specify how the data will be collected and stored, and how they will be shared once the project is over. In addition, since Quebec’s Act 25 dealing with the protection of personal information came into effect, the data management plan must explicitly describe how data related to human participants, which may be sensitive, will be handled.

Similarly, federal policy requires educational and research institutions to adopt a public RDM strategy. UdeM submitted its strategy in spring 2023.

"People have misgivings about research data management and it’s understandable, since it requires financial and human resources," said Gosselin. "You have to hire experts. So it makes sense for researchers and institutions to join forces to create a strong ecosystem."

As part of the overall framework for open and responsible science, RDM facilitates data sharing and reuse, improves research organization, saves resources and money, ensures business continuity and reduces the risk of data loss.

Pivotal role for UdeM Libraries

UdeM library staff are already working with research teams to help phase in an "RDM culture."

An introductory webinar is offered regularly, and a support service helps researchers prepare a data management plan, organize and document their data, and select a data repository such as Borealis to publish their datasets and discover new ones.
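
Because Borealis runs on Dataverse software, datasets can in principle also be discovered programmatically through the Dataverse Search API. The sketch below only builds the search URL and sends no request. The base URL `https://borealisdata.ca` and the helper name `build_dataset_search_url` are assumptions made for illustration and should be checked against the repository's own documentation.

```python
from urllib.parse import urlencode

# Assumed base URL for Borealis; verify before use.
BOREALIS_BASE_URL = "https://borealisdata.ca"


def build_dataset_search_url(query: str, per_page: int = 10,
                             base_url: str = BOREALIS_BASE_URL) -> str:
    """Build a Dataverse Search API URL for datasets matching `query`.

    Hypothetical helper: it only assembles the URL and does not contact
    the server.
    """
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return f"{base_url}/api/search?{urlencode(params)}"


print(build_dataset_search_url("sleep"))
# https://borealisdata.ca/api/search?q=sleep&type=dataset&per_page=10
```

Opening the resulting URL in a browser or an HTTP client would be expected to return a JSON list of matching datasets, subject to the repository's access rules.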

The Interuniversity Research Group in Limnology’s Dataverse space is a good example of the personalized "FAIRification" service offered by UdeM Libraries. The datasets it has uploaded to Borealis have been enriched by metadata librarian Teresa Bascik in accordance with the four FAIR principles, namely that data should be:

* Findable

* Accessible

* Interoperable

* Reusable
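
One way to picture the four principles is as a checklist applied to a dataset's metadata record. The sketch below is a deliberate simplification and not the procedure used by UdeM Libraries. The field names, the sample record and the small vocabulary are all hypothetical.

```python
# Hypothetical controlled vocabulary; a real one would come from the
# research community's own standards.
CONTROLLED_VOCABULARY = {"limnology", "lake", "water quality"}


def fair_checklist(record: dict, vocabulary: set) -> dict:
    """Return a rough pass/fail flag for each FAIR principle."""
    keywords = [k.strip().lower() for k in record.get("keywords", [])]
    return {
        # Findable: the dataset has a persistent identifier.
        "findable": bool(record.get("doi")),
        # Accessible: there is an address where it can be retrieved.
        "accessible": bool(record.get("access_url")),
        # Interoperable: every keyword comes from the shared vocabulary.
        "interoperable": bool(keywords) and all(k in vocabulary for k in keywords),
        # Reusable: a licence states how the data may be used.
        "reusable": bool(record.get("license")),
    }


record = {
    "doi": "doi:10.0000/EXAMPLE",  # placeholder, not a real identifier
    "access_url": "https://example.org/dataset",
    "keywords": ["Lake", "limnology"],
    "license": "",                 # left empty on purpose
}
print(fair_checklist(record, CONTROLLED_VOCABULARY))
# {'findable': True, 'accessible': True, 'interoperable': True, 'reusable': False}
```

In this made-up record the missing licence is the only gap, which is the kind of detail that enriching the metadata is meant to catch.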

This service from UdeM Libraries supports the federal research data management policy, which refers to the FAIR principles.

"With a protocol that meets these principles, it’s easier for research teams to find, access, download and use datasets, whatever the platform," said librarian Stéphanie Pham-Dang from the Support for Success, Research and Teaching department. "A controlled vocabulary makes the datasets interoperable, so they can be reused for future research, in accordance with the standards of the research community."