Data management plans as linked open data: exploiting ARGOS FAIR and machine actionable outputs in the OpenAIRE research graph

The comparative study [14] of the OpenDMP, OpenAIRE RG and DMP Common Standard models revealed common entities, properties, and relationships, clearly portraying the state of the art for DMPs as well as the opportunities arising from the adoption and publication of ma-DMPs in OSGs.

Mapping between ARGOS and the DMP Common Standard

The mapping highlighted the updates ARGOS needs in order to be fully compliant with the RDA standard and to strengthen its ma-DMP outputs. Fig. 4 shows the entities, and their specific properties, that were found to be absent from ARGOS.

Fig. 4: DMP Common Standard elements not in ARGOS

Mapping between the OpenAIRE RG and DMP Common Standard

Given that some researchers publish their DMPs in repositories, the OpenAIRE RG already included metadata about a few DMPs made available by content providers as records compliant with the OpenAIRE guidelines [9, 10]. The metadata includes properties and relationships that are useful for findability (persistent identifiers), accessibility (landing pages and download URLs of different versions of DMPs, access rights information), citation and discovery (bibliographic metadata properties), and tracking (links to funding grants and involved organisations). Observation of the DMPs available in the OpenAIRE RG indicated that they were “disguised as project deliverables”. Hence, OpenAIRE exploited the COAR vocabulary, whose latest release introduced the resource type “DMP” (footnote 27), to enable proper typing of DMPs so that they are clearly identified in the OpenAIRE RG. Following that, Zenodo introduced a new resource type so that users can select “Data Management Plan” upon deposition. Because ARGOS uses Zenodo as its publishing mechanism and Zenodo is exploited by OpenAIRE as one of its core data sources, ma-DMPs published through ARGOS immediately become an integral part of the OpenAIRE RG. That way, ARGOS ma-DMPs are exposed with the proper resource_type and are treated as independent entities in the OpenAIRE RG, which helps make searching for DMPs and their links with other outputs an established process.

As shown in Fig. 5, when analysing the mapping between the OpenAIRE and DMP Common Standard models, we observed that most of the ma-DMP entities can be mapped directly to OpenAIRE RG entities.

Fig. 5: RDA standard and the OpenAIRE Research Graph Model

Those entities are:

Identifier (DMP_id), indicating the persistent identifier of the DMP record.

Title of the DMP.

Date of creation and modification of the DMP record.

Description of the context in which the DMP is created.
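The direct mapping of these properties can be illustrated with a minimal sketch. The input keys (dmp_id, title, created, modified, description) follow the JSON serialisation of the DMP Common Standard; the output keys and the function name are hypothetical stand-ins chosen for illustration, not actual OpenAIRE RG field names, and the DOI is a placeholder.

```python
# Source path inside the "dmp" object -> hypothetical target key.
# The target keys are illustrative assumptions, not the OpenAIRE RG schema.
DIRECT_MAPPING = {
    ("dmp_id", "identifier"): "pid",
    ("title",): "title",
    ("created",): "date_created",
    ("modified",): "date_modified",
    ("description",): "description",
}


def _get(node, path):
    """Walk a nested dict along path; return None if any key is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def map_dmp(madmp: dict) -> dict:
    """Copy the directly mappable ma-DMP properties into a flat record."""
    dmp = madmp.get("dmp", {})
    record = {}
    for path, target in DIRECT_MAPPING.items():
        value = _get(dmp, path)
        if value is not None:  # optional properties are simply skipped
            record[target] = value
    return record


example = {
    "dmp": {
        "dmp_id": {"identifier": "https://doi.org/10.1234/placeholder", "type": "doi"},
        "title": "Example DMP",
        "created": "2021-01-01T00:00:00Z",
        "modified": "2021-06-01T00:00:00Z",
        "description": "DMP for an example project",
    }
}

record = map_dmp(example)
```

Properties that have no direct counterpart (those shown in Fig. 6) would need a separate treatment and are deliberately left out of this sketch.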

Additionally, Fig. 6 shows that the mapping activity identified areas where the OpenAIRE RG can be strengthened.

Fig. 6: DMP Common Standard properties and entities not in OpenAIRE

ARGOS ma-DMPs enrich the OpenAIRE RG with the identified entities and properties using researchers’ input. This input may come from a selection made through an API, such as those of OpenAIRE, EOSC or Zenodo (authoritative source), or from free-text statements provided in the DMP (non-authoritative source). Fig. 7 shows which entities and properties ARGOS can provide, and in what way, to enrich the graph with missing information.

Fig. 7: ARGOS DMP elements enriching the OpenAIRE RG

Particularly for the dataset metadata about ethics, security, quality, and preservation available in ma-DMPs, OpenAIRE could consider revisiting its guidelines [9, 10] so that this information always reaches the OpenAIRE data model from data deposits in OpenAIRE-compliant repositories.

Besides expanding information captured in the OpenAIRE RG, additional efforts are needed to create value out of this captured information. Below we describe what relation types are available in the OpenAIRE RG and what changed to accommodate links between ma-DMPs and other entities.

The OpenAIRE Research Graph features a dedicated relation type between research products and project grants: “isProducedBy/produces” (a product is produced by a project; a project produces a research product). This type of relationship can exist between projects and research products of any type (publications, datasets, software and other types of research products), and is used by OpenAIRE to model the association between DMPs and project grants. Considering the importance assigned by funders to DMPs (e.g. for H2020 grants, the DMP is a “living deliverable” that must be updated frequently during the lifetime of the project), and to support the development of added-value services on top of machine-actionable DMPs, the OpenAIRE Research Graph will include a relationship with the dedicated semantics “hasDMP”/“hasProject”.
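A relation pair with forward and inverse semantics could be modelled as in the sketch below. Only the four relation names come from the text; the Relation class, the helper function, and the identifiers are hypothetical. The sketch also assumes that a project "hasDMP" a DMP and a DMP "hasProject" a project, and that the dedicated pair is used instead of the generic one for DMPs; the text does not spell out either point.

```python
from dataclasses import dataclass

# Relation names as given in the text; their pairing as mutual inverses
# follows the "a/b" notation used there.
INVERSE = {
    "isProducedBy": "produces",
    "produces": "isProducedBy",
    "hasProject": "hasDMP",
    "hasDMP": "hasProject",
}


@dataclass(frozen=True)
class Relation:
    """Hypothetical triple: source --semantics--> target."""
    source: str
    semantics: str
    target: str

    def inverse(self) -> "Relation":
        return Relation(self.target, INVERSE[self.semantics], self.source)


def link_product_to_project(product_id: str, project_id: str, is_dmp: bool) -> list:
    """Return the forward and inverse relations for one product/project pair."""
    # Assumption: DMPs get the dedicated semantics, other products the generic one.
    semantics = "hasProject" if is_dmp else "isProducedBy"
    forward = Relation(product_id, semantics, project_id)
    return [forward, forward.inverse()]


dmp_links = link_product_to_project("dmp:1", "project:42", is_dmp=True)
paper_links = link_product_to_project("pub:7", "project:42", is_dmp=False)
```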

Interestingly, the DMPs that are currently available in the OpenAIRE Research Graph do not include explicit links to the datasets they refer to. Clearly, such links may not exist when the first version of a DMP is published (because the datasets may not yet exist), but the expectation is that the datasets are made available during the lifetime of the project. Thanks to the different versions of the DMPs, therefore, it would be possible for OpenAIRE to add relationships with specific semantics between a DMP and the datasets it refers to. The relationships available in the OpenAIRE Research Graph, inspired by CERIF [8] and DataCite (footnote 28), do not include specific semantics that could depict the association between a DMP and its datasets. OpenAIRE is therefore planning to add a new relationship with the semantics “hasDataset”, drawn from the core ontology of the RDA standard [3], and a corresponding inverse relationship (“hasDMP”), which is not defined in the standard, to link datasets to their DMPs.
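How such links might be derived from successive versions of a ma-DMP is sketched below. The dataset and dataset_id keys follow the JSON serialisation of the DMP Common Standard; the function names, the triple representation, and the identifiers are hypothetical. Treating identifiers of type "other" as not yet resolvable is an assumption of this sketch, not a rule stated in the text.

```python
def dataset_links(dmp_pid: str, madmp: dict) -> list:
    """Build hasDataset / hasDMP triples for every identifiable dataset."""
    links = []
    for dataset in madmp.get("dmp", {}).get("dataset", []):
        dataset_id = dataset.get("dataset_id", {})
        identifier = dataset_id.get("identifier")
        # Assumption: type "other" marks a planned, not yet published dataset.
        if not identifier or dataset_id.get("type") == "other":
            continue
        links.append((dmp_pid, "hasDataset", identifier))
        links.append((identifier, "hasDMP", dmp_pid))
    return links


def links_added(dmp_pid: str, previous: dict, current: dict) -> list:
    """Return the links present in the current version but not the previous one."""
    before = set(dataset_links(dmp_pid, previous))
    return [link for link in dataset_links(dmp_pid, current) if link not in before]


# First version: the dataset is only planned and carries an internal identifier.
v1 = {"dmp": {"dataset": [{"title": "Survey data",
                           "dataset_id": {"identifier": "internal-1", "type": "other"}}]}}
# Later version: the dataset has been deposited (placeholder DOI).
v2 = {"dmp": {"dataset": [{"title": "Survey data",
                           "dataset_id": {"identifier": "https://doi.org/10.1234/data",
                                          "type": "doi"}}]}}

added = links_added("dmp:1", v1, v2)
```

Comparing versions in this way means a link is created only once the dataset it points to can actually be resolved.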

Utility and discussion

Observations made while mapping entities and properties between the three data models (OpenDMP, OpenAIRE RG and DMP Common Standard) showed that, when a direct mapping could not be made, the information could often still be found in more abstract or general fields of OpenAIRE or ARGOS, though with a different cardinality and/or data type. Other information might still be covered by ARGOS DMP outputs as they enter the OpenAIRE RG, or be tweaked to accommodate the needs of ma-DMP documentation; rarely, it may be omitted from the information exchange. OpenAIRE also highlighted the value added by contextualising ma-DMPs, as they contain structured and specialised information, especially about datasets, which cannot be found in traditional DMP documents.

Searching OpenAIRE for DMP outputs outside ARGOS showed that DMPs are still typed as generic publications or reports, and not as data management plans. The introduction of a specific term for DMPs in global vocabularies, such as COAR’s, is expected to improve the current status, although it may take some time for the new term to become widely adopted. The recent update of the DataCite metadata schema (footnote 29), which includes DMP IDs, will significantly contribute towards wider adoption by service providers.

Furthermore, in order to conform to the fixed ma-DMP schema without losing the versatility of its templating mechanism, the ARGOS software combines an extensible mechanism for attaching export format converters (ma-DMP being one of them) with semantic tagging of template elements, which those converters can use at will. The ma-DMP converter draws on its knowledge of the fixed part of the ARGOS data model, as well as on attributes attached to various dataset description fields, to pick the data required for a ma-DMP file.
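The general idea of pluggable converters that select template fields by semantic tag can be sketched as follows. This is not the ARGOS implementation: the registry, the decorator, the tag prefix "rda.dataset." and the input structure (label, dataset_descriptions, fields, semantic_tag) are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry of export format converters, keyed by format name.
CONVERTERS: Dict[str, Callable[[dict], dict]] = {}


def register_converter(name: str):
    """Decorator that attaches a converter to the registry."""
    def decorator(func):
        CONVERTERS[name] = func
        return func
    return decorator


TAG_PREFIX = "rda.dataset."  # assumed tagging convention, not taken from ARGOS


@register_converter("madmp")
def to_madmp(dmp: dict) -> dict:
    """Combine the fixed part of the model with semantically tagged fields."""
    datasets = []
    for description in dmp.get("dataset_descriptions", []):
        dataset = {}
        for field in description.get("fields", []):
            tag = field.get("semantic_tag", "")
            if tag.startswith(TAG_PREFIX):  # untagged fields are ignored
                dataset[tag[len(TAG_PREFIX):]] = field["value"]
        datasets.append(dataset)
    # "label" stands in for the fixed part of the data model.
    return {"dmp": {"title": dmp["label"], "dataset": datasets}}


def export(dmp: dict, fmt: str) -> dict:
    """Dispatch to the converter registered for the requested format."""
    return CONVERTERS[fmt](dmp)


sample = {
    "label": "Example DMP",
    "dataset_descriptions": [
        {"fields": [
            {"semantic_tag": "rda.dataset.title", "value": "Survey data"},
            {"semantic_tag": "rda.dataset.language", "value": "eng"},
            {"value": "free text with no tag"},
        ]}
    ],
}

exported = export(sample, "madmp")
```

Because the templates only carry tags, a new export format can be added by registering another converter without changing the templates themselves.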

The flexibility of the ARGOS machine-actionable templating system, combined with the integration of the Zenodo publication mechanism and the interlinking with the OpenAIRE RG, is crucial as it fosters change in the content, distribution, exploitation, and reusability of DMPs. According to a recent study of DMPs in Horizon 2020 [15], there are more than 1500 publicly available DMPs in CORDIS, most of which do not indicate how they can be redistributed and used. That practice reverts the situation to the known problem of open access. The licenses assigned to ARGOS DMPs and the access rights indicated in their context eliminate such confusion. Moreover, another study on Monitoring the Open Access Policy of Horizon 2020 [16] shows the importance of individual dataset descriptions in the exploitation of datasets and DMPs. With such descriptions, all datasets described in DMPs become individually identifiable in a granular way that links them with their associated metadata, repositories, backup policies, etc. Similarly, re-used datasets become identifiable and can be further quantified, analysed and validated in reproducibility studies.

Moreover, it is foreseen that the OpenAIRE data model will be complemented with information that is currently not present, such as DMP cost or metadata. Cost information is not crucial to improve the level of FAIRness of DMPs or datasets, but it might enable analysis about data management costs useful to funders, organisations, and project administrators. Similarly, this kind of information is potentially useful to studies about responsible research and may be integrated to serve specific analysis or use cases related to RRI (Responsible Research & Innovation) monitoring.

Finally, this case study explored the information carried in published DMPs, i.e. DMPs that have been assigned a DOI. An opportunity that ARGOS has already started to exploit with Zenodo is the pre-filling of ma-DMPs with information from deposited metadata records. This automation contributes strongly to the cultural shift that Open Science cultivates, and it incentivizes researchers to follow best practices while minimizing their compliance effort. Among the issues being examined in this implementation are capturing the state of datasets that are pre-filled (e.g. a reused dataset in a DMP) and adopting the pre-filling mechanism in other repository software, such as Dataverse. Further connections with data analysis PIDs are expected in the context of the Research Analysis Identifier System (RAISE) project.
