Snowball Metrics Meeting Mar 12 2015

Attendance

   George Chacko, University of Illinois
   Anna Clements, University of St Andrews (Co-chair)
   Lisa Colledge, Elsevier Inc (Co-chair)
   Mark Cox, King's College London
   Jorge Herskovic, University of Texas
   Linda Naughton, Jisc
   Simon Pratt, Thomson Reuters
   Beverley Sherbon, MRC
   Thomas Vestdam, Elsevier Inc
   David Baker, Casrai
   Sheri Belisle, Casrai

Agenda

   Welcome and orientation to new members (Chairs)
   Introduce draft Standard Profile for Awards Volume (David)
   How to handle code mappings, i.e. HCC in the UK (Anna)
   Institution ID Types (Lisa)
   Next steps (David)
   AOB (all)

Supporting Materials

   Snowball Metrics Charter
   Snowball Input Metrics Data Profile

Agreed Actions

   Edits to the profile per discussion
       Action owner: Casrai
       Completed by: Apr 2
   Extend meeting schedule
       Action owner: Sheri
       Completed by: Next Meeting
   Reach out and invite needed persons to the group
       Action owner: Chairs
       Completed by: Next Meeting

Discussion

Meeting brought to order at 1:30 PM GMT.

New members were welcomed; none had questions on the Charter, and everyone is very pleased with the scope.

Agenda Items 2 – 4: Draft Standard Profile for Awards Volume; Code Mappings; Institution ID Types

Casrai’s intent is to take the core contents of the Snowball Metrics and define them as a series of profiles in the Casrai dictionary along with any underlying terms and definitions that are needed to support them.

The main idea behind a Casrai profile is to define scope clearly by focusing on a single use case (user story) that the profile aims to solve when implemented as an interoperability tool between the databases of the stakeholders in that use case. In this case it is institutional benchmarking: institutions partnering on various research activities need to be able to privately benchmark their respective metrics in order to understand where they stand in comparison.

“Privately” is a key word in defining this benchmark (see the note about public ranking below). Institutions are free to decide what they want to make public, but the primary use case is private benchmarking.

Why is the use case so specific? The whole point of Snowball is that it is a bottom-up initiative from the institutions, so it looks at what the institutions want to benchmark themselves against. There are situations where others, such as funders, might be interested in this information as well, and that may well be something to move into, but the primary objective was for the institutions to have the information to benchmark one another in an apples-to-apples way. The aim is to keep that primary driver clear, as there are many other things to potentially move into. It is about exchanging metrics between institutions, not the data underlying them.

The Casrai focus is incremental. We are not saying this is the only use case, just that we are starting with this one to ensure we define the scope and deliver it properly. Any use case that Snowball Metrics could support becomes a use case for a profile to develop; we do them one at a time to contain the work.

What is “Date”? The date of award or something else? Perhaps not as critical since this is for internal use. A good point, as it came up during discussion: the only reliable date the institutions have is the date the award was entered into the system. Some funders have many others, such as start dates (when the award was expected to start, when it actually started, when it was awarded), while some might not have those dates available. Use one for now; the profile can be expanded to cater for others at a later date.

That is key: the use case is focussed on institutions. When a use case is introduced that widens the scope to ways funders could benefit from Snowball Metrics, that triggers a new process to determine which elements are needed or missing, and which existing elements need tweaking to fit the new use case.

The exchange of metrics without the underlying data implies trust without verification. How is that handled as the consortium grows larger? Will the accuracy of benchmark calculations be questioned? The whole concept of Snowball is that it is being designed from the ground up with institutions, so there has to be an element of trust. Metrics can be shared with chosen partners; this is private rather than public information, so institutions are dealing with their peers on the basis of trusting those peers. That is fundamental to the project. The objective is private sharing, which removes a major motivation to provide misleading information, and there is no intent to have a public Snowball ranking. Gaps are unavoidable, but notes can be added so that anyone receiving a metric also receives a note about the gaps affecting it, without seeing the actual data.

Organisations can share data if they want to. The initial discussion was that both options (sharing or not sharing the data) should be available; institutions that are not willing to share data can share just the metric, a quick benchmark for comparison. The profile can open up to sharing data, i.e. the aggregated information behind the metric. Implementers need a little more information on what data has to be dug out of systems.

Are there any agreements around confidentiality or non-disclosure after metrics have been exchanged? There are no formal agreements in place at the moment, but I think it is understood that recipients will use the metrics with sensitivity – and remember that others can do to your metrics as you do to theirs!

In the profile definition, some contextual information has been added that frames the use of the profile. That should be part of the dictionary. Questions should be anticipated and responses framed in the introduction of the dictionary.

Items end up in the dictionary at a couple of layers. Every single term the group feels needs to be included and clearly defined is added. Groupings of terms become dictionary entries as profiles (this one is Award Volume); there will be 25 or 26 upon completion, and many will reuse underlying terms. The idea is that a profile is a business or policy agreement on what information is needed for an exchange to be complete. The exchange is given a name, the Snowball Metric Award Volume Profile, and each component is identified and listed. Beyond the general terminology in the dictionary, clear definitions are provided and the terms are grouped together in an object/attribute collection. To understand more about, for example, Funding Award Currency, you might want to go “up” and understand the object Funding Award. Whatever your view (technology versus policy, etc.), this is not intended to dictate how data is stored; the goal is not a data model but an agreement model that business and policy users can understand.

Two objects are of interest for Award Volume: Research Institution and Funding Award. Each object has a list of needed attributes, which together give everything required to run a calculation on the metric.
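
As an illustration only, the two objects and a few of the attributes discussed here might be sketched in Python as follows; the attribute names are inferred from this discussion and are not the canonical profile definitions.

    from dataclasses import dataclass

    # Illustrative sketch only: attribute names are inferred from this
    # discussion and are not the canonical Snowball/Casrai definitions.

    @dataclass
    class ResearchInstitution:
        institution_id: str       # the identifier string itself
        institution_id_type: str  # qualifies the ID, e.g. "ISNI"

    @dataclass
    class FundingAward:
        funder: str          # the funder, kept as a property of the award
        amount: float        # the awarded amount
        currency: str        # ISO currency code, e.g. "GBP"
        classification: str  # e.g. a classification scheme code
        date_entered: str    # date the award entered the local system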

Why Research Institution and Funding Award, and not Research Institution, Award, and Funder? It is a question of what level of normalisation is needed to reach the main goal of a business and policy agreement. It could have been modelled that way, but it is simpler to have the Funder as a property of the Award. (This does not reflect how the data is stored.) Part of the group's role is to challenge this; the group might agree that it needs those three objects. It is about the award and the institution holding it, so for a business discussion that is what matters. You could go much deeper, looking at cost centres, funding types, and so on, but it needs to be kept at a level that is instantly recognisable as the metric “award volume”.

Casrai standards are useless if they do not get implemented in software systems, but if we express them at the physical layer of data modelling, they become useless for the stakeholder, the business or policy owner. We are trying to find a balance: storage is a local decision, but this presents the information at a report level.

Research Institution ID Type follows a general pattern evolving in the Casrai dictionary across all groups: there is rarely a single ID. Casrai does not advocate for any single ID system, so an element is needed that qualifies the ID, letting people know what to expect of the ID string itself (ORCID, ISNI, etc.). The policy is to let the community decide which IDs to use; the dictionary can evolve over time to reflect what the community is using. Only one ID would be captured, and its type tells implementers what it should look like.
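
A minimal sketch of that pattern, assuming a simple pair of type and value (the example value is a placeholder, not a real identifier):

    from dataclasses import dataclass

    # The "ID plus ID Type" pattern: the type qualifies the identifier so
    # a consumer knows which scheme the string belongs to.

    @dataclass
    class QualifiedId:
        id_type: str  # e.g. "ISNI" or "ORCID"; allowed types live in the dictionary
        value: str    # the identifier string, in that scheme's format

    inst_id = QualifiedId(id_type="ISNI", value="0000 0000 0000 0000")  # placeholder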

Source Org ID is listed; a Source Org ID Type is also needed. Action: add Source Org ID Type to the profile.

Classification includes HESA, which may be well known in the UK but not elsewhere, so this field is necessary for identification. It also allows the profile to be internationalised by adding classification schemes from other countries. NSF-HERD is mapped to HESA now. Action: add NSF-HERD to the list. Include the mapping if it is something that would be maintained and used in its own right. Action: add the Snowball Mapping to the list (this is the NSF-HERD to HESA mapping). Capture as many as possible; this can change in the future.
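
Purely as an illustration, a classification entry could carry the scheme alongside the code, with the Snowball Mapping maintained as its own table; the codes below are placeholders, not real HESA or NSF-HERD values.

    # Placeholder codes only; the scheme names follow the discussion above.

    SNOWBALL_MAPPING = {  # hypothetical NSF-HERD -> HESA mapping entries
        "NSF-HERD:ENG": "HESA:115",
        "NSF-HERD:MED": "HESA:101",
    }

    award_classification = {
        "scheme": "HESA",    # identifies which scheme the code belongs to
        "code": "HESA:101",  # a placeholder code, resolvable via the mapping
    }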

Should a version or date be attached to these? With respect to RC in Casrai, Uber Research propose that, given the number of taxonomies available for classifying things, there should be a standardised taxonomy profile that captures, at a point in time, which taxonomies were used and what their current versions and dates were. That might be of interest to the group. Action: add Classification Version to the profile. Is this part of the Classification Type? It will go in as Funding Award/Classification Version for now.

Currency conversion factors are important: to benchmark between institutions, everyone needs to be doing the same thing, using historical data to do conversions for metrics that span time. The options are to include an element or to frame it in the background material. Use the latter to start; if a need arises over time, it would become a clear change request to introduce something. The currency element should contain the full ISO listing. Action: make the currency element the full ISO listing; add background material regarding currency conversions.
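
A minimal sketch of what the background guidance might describe, assuming a shared table of historical rates; the rates and the helper below are invented for illustration.

    # Hypothetical historical conversion: all partners convert the same way
    # by using the agreed rate for the year in question. Rates are made up.

    RATES_TO_GBP = {  # (ISO currency code, year) -> rate into GBP
        ("USD", 2013): 0.62,
        ("USD", 2014): 0.64,
        ("EUR", 2014): 0.78,
    }

    def to_gbp(amount: float, currency: str, year: int) -> float:
        """Convert an awarded amount into GBP using that year's rate."""
        if currency == "GBP":
            return amount
        return amount * RATES_TO_GBP[(currency, year)]

    print(to_gbp(100000, "USD", 2014))  # 64000.0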

Financial year structures are needed to represent the different structures in different countries; some use several year-ends. The need is to record the local reality as a baseline. A central time zone was also an option, but capturing something local seemed better. Would adding the funding end date eliminate the need for this? What is meant by “year” must be known to compare the metric. Is financial quarter needed? Yes, for granularity of the awards and to keep the information as current as possible. In a year-by-year comparison these mechanisms make it clear what “year” means so adjustments can be made, and quarter allows for granularity. This should be taken back to the Snowball working group for clarity on why quarter was included in the first place. Is it then an attribute of the institution rather than of the award? Action: move all those elements to the institution. What is needed is the amount awarded in a given year, rather than the number of years over which it is awarded.
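
As a hedged sketch of why the local financial-year structure matters, the hypothetical helper below buckets an award date into a local financial year given the institution's year-start month:

    from datetime import date

    # Hypothetical helper: bucket a date into a local financial year given
    # the month in which that institution's financial year starts.

    def financial_year(d: date, fy_start_month: int = 4) -> int:
        """Return the calendar year in which the financial year begins;
        with an April start, 2015-03-12 falls in FY 2014."""
        return d.year if d.month >= fy_start_month else d.year - 1

    print(financial_year(date(2015, 3, 12)))     # 2014 (April year-start)
    print(financial_year(date(2015, 3, 12), 1))  # 2015 (calendar-year FY)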

Agenda Item 5: Next steps

One more meeting is scheduled. Action: Sheri to extend the meeting schedule. The call has been recorded; the minutes will go for review and then become public record on Casrai's website. The group is moving into a monthly pattern of calls to evolve the profile and its underlying definitions and get it ready for review. We will start mapping the other recipes and propose a sequence and schedule for rolling them out to the group (at the next meeting).

Agenda Item 6: AOB

Consider who else, if anyone, should be in the discussion, and reach out to those who could not make it. Action: Chairs will reach out and invite accordingly. The next meeting is April 9, same time.

Meeting adjourned at 1:36 PM GMT.