Dlm-wg-repositories - July 2015 Meeting
CASRAI DLM - Quality DLM - Institutional Repositories Working Group
MEETING NOTES Tue, Jul 28, 2015 9:00 AM - 10:00 AM EDT
- Alex Ball
- Rebecca Lawrence (Co-chair)
- Sünje Dallmeier-Tiessen (Co-chair)
- Mark Hahnel
- Monica Duke
- David Baker (Co-chair) (Meeting Chair)
Welcome & Introductions
Review CASRAI profiling process

We are defining information requirements, not software requirements. A profile is a business/policy-level agreement/specification about what information is needed and what the different elements mean. The focus is on the exchange of information between two parties: data user(s) and data provider(s).

Terms, objects, attributes, lists: everything that ends up in the CASRAI dictionary starts as a term that we all agree on. Terms can be further defined as objects that have attributes, with lists that constrain the values of those attributes.

Managing scope via use cases: specific use cases help to contain the work for a project and keep it in scope and manageable. We will use the MoSCoW method for priorities: Must, Should, Could, Won't.
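The term/object/attribute/list structure and the MoSCoW prioritisation described above could be captured as structured data. The following Python sketch is purely illustrative: the term name is taken from the early terms below, but the attribute names, the licence list, and its values are hypothetical assumptions, not agreed profile content.

```python
# Illustrative sketch of the profiling structure: a term becomes an object
# with attributes, and lists constrain attribute values. All attribute
# names and list values below are hypothetical, not agreed profile content.

MOSCOW = ["Must", "Should", "Could", "Won't"]  # MoSCoW priority list

# A constrained list (controlled vocabulary) for one attribute - hypothetical
LICENCE_TYPES = ["CC-BY", "CC0", "Other"]

# An object defined from an agreed term, with prioritised attributes
research_dataset_quality = {
    "term": "Research Dataset Quality",
    "attributes": [
        {"name": "Licence", "constrained_by": LICENCE_TYPES, "priority": "Must"},
        {"name": "Documentation For Reuse", "constrained_by": None, "priority": "Should"},
    ],
}

def check_priorities(obj):
    """Verify that every attribute carries a valid MoSCoW priority."""
    return all(a["priority"] in MOSCOW for a in obj["attributes"])

print(check_priorities(research_dataset_quality))  # True
```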
Discuss profiling qDLM for Repository (generic) use case

Who are the information 'provider(s)' and 'user(s)' in this use case? There are at least two repository-related use cases. In the first, the researcher is the provider and the repository the user; in the second, the repository manager is the provider and the funder/research manager/data re-user the user. We will start by focussing on the first use case. There is also the question of institutional vs generic repository: the plan is to start with institutional repository use case(s) and then expand out to generic repository use case(s).

Are we separating quality from impact? 'Impact' at this stage may be premature, but possibly 'anticipated impact'. What is the question-set for this use case (Institutional Repository Manager)? The question set is listed below; ALL to continue adding to this list in the pad below during the rest of this week. At what stage in what business processes is/would qDLM information be shared/exchanged? This use case focuses on dataset submission by a researcher to an IR.
Discuss next steps

CHAIRS to start writing initial definitions for the early terms below; ALL then edit the definitions. Wider consultation on the early draft: once the WG is comfortable with the outputs, CASRAI would like to add it as a draft profile to the dictionary for dissemination and structured capture of comments and feedback from the wider community - agreed. The next call is due in early September, then monthly until the draft is ready to publish.

Dissemination opportunity: Wellcome Trust workshop in early November - http://www.biomedbridges.eu/workshop-better-metrics-measuring-data-quality. This group to discuss in the September meeting what to present at that workshop and how best to maximise the opportunity. CHAIRS to discuss how to communicate the approach we are taking to the broader group on the DLM project so they know what is happening, and to decide how/when best to broaden out to other use cases to keep the interest of the broader group.
DOCUMENTATION ITEMS: External Links / Related Efforts:
- Data stewardship maturity matrix https://www.cicsnc.org/fact-sheets/scientific_data_stewardship
- NERC checklist http://www.nerc.ac.uk/research/sites/data/policy/data-value-checklist/
- FORCE11 FAIR Principles https://www.force11.org/group/fairgroup/fairprinciples
- UK Data archive data review (presentation) http://www.dcc.ac.uk/sites/default/files/documents/IDCC13presentations/Preparde_vdEynden_17Jan13.pdf
Early Term Definitions:
- Research Dataset Quality -
- Research Dataset Impact -
- Research Dataset - http://dictionary.casrai.org/Research_Dataset
Early Question Sets: UC 1 - IR Manager Use Case: What kinds of questions would an IR manager ask of a researcher regarding the quality of a research dataset being deposited?
- What potential impact/reuse do you anticipate from this dataset?
- How has the dataset been licensed?
- Are there legal or access requirements for the data?
- How has your dataset been documented for reuse?
- Is the data structured and labelled?
- To what degree is your dataset reproducible (processing raw > derived)?
- Have you provided additional disciplinary-associated metadata?
- Has the data been peer reviewed (externally)?
- What is the result?
- What kinds of format(s) is your dataset in? [associated idea of acceptable or supported formats] (http://dictionary.casrai.org/Output_ID_Types) - not an example for this use case but illustrative
- Related to Q4: is the format one that supports re-usability (e.g. does not require specialised software)?
- Has your dataset been checked/ "accepted" by a curator before publishing in the repository?
- Was the data collected according to an accepted methodology? [is the methodology referenced?]
- Are there automated checks/QA/QC mechanisms in the data repository that check "data quality"? Is there 'ingest' integrity?
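The UC 1 question set above could eventually feed the kind of deposit checklist the draft use case below mentions for repository administrators. As a purely hypothetical sketch, the questions can be abbreviated into a structured checklist and checked for unanswered items; the item labels and data structure are assumptions, not agreed terms.

```python
# Hypothetical sketch: the UC 1 question set abbreviated into a deposit
# checklist that administrators could work through. Labels and structure
# are illustrative assumptions, not agreed profile content.

UC1_CHECKLIST = [
    "Anticipated impact/reuse",
    "Licence",
    "Legal or access requirements",
    "Documentation for reuse",
    "Structured and labelled data",
    "Reproducibility (raw > derived)",
    "Disciplinary metadata",
    "External peer review",
    "File format(s)",
    "Curator check before publishing",
    "Accepted methodology (referenced?)",
    "Automated QA/QC at ingest",
]

def outstanding(answers):
    """Return the checklist items the depositor has not yet answered."""
    return [q for q in UC1_CHECKLIST if q not in answers]

# A partially completed deposit record (hypothetical answers)
answers = {"Licence": "CC-BY", "File format(s)": "CSV"}
print(len(outstanding(answers)))  # 10
```

A structure like this would also support the iterative exchange described in the draft use case: administrators return the outstanding items to the researcher until the list is empty.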
UC 2 - Funder/Research Manager Use Case: What kinds of questions would a Funder or Research Manager ask of an IR Manager regarding the quality of a research dataset already deposited?
- How many access requests were received before the embargo period was over?
- How many times has the dataset been cited or re-used? (and ideally, how has the data been used? i.e. What have the consequences been of making the data available -- note that this is a question the data creator also would often like to know)
Draft Use Case:

Summary: As an institution running an institutional data repository that accepts deposits of data from different disciplines, a definition of quality is needed in order to perform internal quality-checking processes and provide quality information downstream.

A repository at a UK HE institution accepts datasets deposited by its researchers when an external discipline-specific repository is not available, in order to meet funder requirements and institutional policies that safeguard data and help disseminate it. The repository administrators would like to ascertain that the deposited data meets a level of quality. The repository managers are not themselves experts in the discipline; they rely on the researcher’s knowledge of the tools and processes that produced the data, and of the accepted and known standards in their community. They are, however, able to provide an outline of the types of requirements that the data should meet.

The deposit is mediated: the researcher deposits the data, which is checked by the repository administrators before being made live. The process may involve some exchanges between the repository staff and the researcher [iteration] until all the information required to ascertain quality is collected. The repository administrators need to strike a balance between placing reasonable demands on researchers making the deposit and ensuring quality. They also need to give adequate guidance on what is required to make a deposit acceptable, in a way that is understandable to users (depositors). The administrators themselves may need a checklist and guidelines to go over when accepting a submission (examples here:). Depositors can also be pointed to external resources for guidance.

(Information above partly based on discussion and information provided on the Research-DataMan JISCMail list “Quality control of research data” started on 4 June 2015.)