Research Dataset-Level Metrics

From CASRAI

Research Dataset-Level Metrics (DLM) is a CASRAI pilot activity that brings together a diverse set of stakeholders with a common interest in better understanding and communicating the different kinds of metrics for assessing research datasets.

Interest Group

The DLM community-of-interest is organized via an Interest Group (IG). If you have an interest in this topic or an affinity for the problems being solved by this activity, we encourage you to take a first step by joining this Interest Group. You can see who is currently in this Interest Group.

Participation is free and open to all and has the following benefits:

  • lend your vote and voice to supporting an open, sustainable and transparent solution to a shared problem,
  • receive regular updates on DLM-related work, implementations and news,
  • receive regular progress reports from any Working Groups tasked with drafting DLM information agreements,
  • participate in the review of any new draft revisions produced by Working Groups,
  • participate in conversations on various DLM-related sub-topics with colleagues on the mailing-list.

Working Groups

A Working Group (WG) can be triggered by an Interest Group when there is a critical mass of input and interest in creating a new (or revising an existing) standard information agreement. The following Working Groups have been triggered for DLM:

Background

This Interest Group, initiated and led by Rebecca Lawrence (F1000Research) and Sunje Dallmeier-Tiessen (CERN), is convening discussion among related initiatives and interested stakeholders (information providers, managers and users, i.e. from policy makers to data(base) providers to publishers), with a focus on the “dataset-level metrics” challenge. The interest group is also charged with initiating projects to deliver specific outputs that support database- and vendor-neutral interoperability of information about research data between repositories, publishers, academic administrators, funding agencies and researchers. Projects launched by this interest group may be limited to delivering common agreements on terminology and vocabularies, or they may also deliver selected proofs-of-concept, with volunteers (platforms, publishers, funders and institutes) offering their facilities to implement suggested metrics and assess their usage, meaning and impact. Resourcing requirements and contributions will be determined as part of the creation and scoping of any projects. Such projects will collaborate closely with related projects and initiatives identified by the interest group (for example, the NISO Alternative Assessment Metrics (Altmetrics) Initiative, which addresses metrics for alternative output types).

A growing interest in enhanced metrics that reflect data and software publications is evident in a number of initiatives, for example by PLOS and CDL. The objective is to enable all stakeholders to better recognise the contribution and impact of producing good-quality research data.

The production of high-quality research data underpins most scientific research. There has been much progress over the past few years in the development of a range of new metrics to better recognise the impact of research articles, especially through the development of article-level metrics (ALMs) and altmetrics.

During this time, there has also been increased recognition by governments, funders, institutions and, ultimately, researchers of the value and importance of sharing the research data underpinning new research findings, and of sharing them in a way that enables others to replicate, reanalyse and reuse the data. However, there has not been much progress to date in the development of metrics that work well specifically for data. Article-level metrics typically do not transfer well to datasets, where user behaviour is very different; moreover, the range of data types, and the challenges posed by continually updated data or by datasets produced by large consortia, are not particularly amenable to existing metrics.

Many researchers are also concerned about sharing their data. Among the biggest concerns is not being able to extract all the major findings from the data before someone else benefits from (and potentially identifies a significant discovery in) the data they have produced. Because assessment of researchers’ output for future grants and career progression tends to focus on articles rather than data outputs, some researchers worry that they will lose much of the benefit of their hard work in producing the data. Additionally, researchers are often concerned about the sometimes extensive amount of time required to get the data (and associated metadata) into a form that enables others to really reanalyse and reuse it.

These concerns could be at least partly alleviated by the development of better tools to measure the impact and importance of generating a new dataset, which academic administrators and funding agencies could then use as part of their assessment processes. This may shift the risk/benefit balance for researchers. The purpose of this working group is therefore to recommend metrics that publishers and data repositories can implement and measure, and that academic administrators and funding agencies can then use as part of their assessments.

Potential use cases for such metrics include:

  • As a publisher wanting to display metrics on research data published as part of articles, so that authors can know the ‘impact’ of the data they have generated, I need detailed standards that have been agreed upon and are recognised by research funders and research administrators;
  • As a data repository or community platform wanting to provide data submitters and data users with additional information about how their data have been used and about the impact of particular datasets, I need to know what to show. Which metrics are suitable for cross-disciplinary or discipline-specific repositories? How do I obtain them, display them and possibly export them (see the illustrative sketch after this list)? I need recommendations that address these challenges across disciplines and stakeholders; they should be standardised, but leave flexibility to reflect disciplinary practices;
  • As a research funder wanting to evaluate results of the funding we provide, I need standardised recording of metrics that can provide information on the ‘impact’ that a new dataset has had on scientific progress or on the broader needs of society. This can contribute towards evaluation of the output from the research funding we provide, as well as provide valuable information towards our assessment of new grant applications;
  • As a university administrator, I need standardised recording of metrics that can provide information on the ‘impact’ that a new research dataset has had on scientific progress or on the broader needs of society. This can help contribute towards decisions on recruitment of new faculty and promotion, as well as provide information that may be used in institutional reviews that impact institution-level funding.
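
To make the repository and platform use case above more concrete, the following is a minimal, purely illustrative sketch (in Python) of what a dataset-level metrics record might contain and how it could be exported for harvesting by funders or administrators. The class name, field names and example values (including the identifier and counts) are hypothetical assumptions for discussion only; they do not represent an agreed CASRAI/DLM standard or any existing repository API.

  # Illustrative sketch only: a hypothetical, simplified shape for a
  # dataset-level metrics record that a repository or publisher might expose.
  # Field names and groupings are assumptions for discussion, not part of any
  # agreed CASRAI/DLM standard.
  from dataclasses import dataclass, field, asdict
  from typing import Dict
  import json


  @dataclass
  class DatasetMetricsRecord:
      dataset_id: str                      # e.g. a DOI or other persistent identifier
      version: str                         # datasets are often updated; version the metrics too
      usage: Dict[str, int] = field(default_factory=dict)      # e.g. views, downloads
      citations: Dict[str, int] = field(default_factory=dict)  # e.g. article or data citations
      altmetrics: Dict[str, int] = field(default_factory=dict) # e.g. social-media mentions
      discipline_notes: str = ""           # room for discipline-specific context

      def to_json(self) -> str:
          """Export the record, e.g. for harvesting by funders or administrators."""
          return json.dumps(asdict(self), indent=2)


  # Example: a repository assembling and exporting a record for one dataset
  # (all values below are invented for illustration).
  record = DatasetMetricsRecord(
      dataset_id="10.1234/example-dataset",
      version="2.0",
      usage={"views": 1500, "downloads": 320},
      citations={"data_citations": 12},
      altmetrics={"mentions": 8},
  )
  print(record.to_json())

In practice, any agreed standard would need to specify which counts are comparable across repositories, how continually updated or versioned datasets are handled, and what discipline-specific extensions are permitted.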