•AI development in biotechnology needs high-quality data.
•The FAIR principles, biobanking standards, IVDR and MDR define requirements on specimen and data provenance.
•A framework to record and publish provenance information is presented.
•A use case in computational pathology illustrates our approach.
Abstract
AI development in biotechnology relies on high-quality data to train and validate algorithms. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) and regulatory frameworks such as the In Vitro Diagnostic Regulation (IVDR) and the Medical Device Regulation (MDR) specify requirements on specimen and data provenance to ensure the quality and traceability of data used in AI development. In this paper, a framework is presented for recording and publishing provenance information to meet these requirements. The framework is based on the use of standardized models and protocols, such as the W3C PROV model and the ISO 23494 series, to capture and record provenance information at various stages of the data generation and analysis process. The framework and use case illustrate the role of provenance information in supporting the development of high-quality AI algorithms in biotechnology. Finally, the principles of the framework are illustrated in a simple computational pathology use case, showing how specimen and data provenance can be used in the development and documentation of an AI algorithm. The use case demonstrates the importance of managing and integrating distributed provenance information and highlights the complex task of considering factors such as semantic interoperability, confidentiality, and the verification of authenticity and integrity.
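As a rough illustration of what recording provenance along a computational pathology pipeline could look like, the following sketch borrows the W3C PROV concepts of entities, activities, and the `used` / `wasGeneratedBy` relations. It is not the framework presented in the paper and not a conformant PROV or ISO 23494 serialization. The class `ProvDocument`, the `ex:` identifiers, the attribute names, and the placeholder image bytes are all hypothetical, chosen only for this example.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex digest used here as a simple integrity fingerprint."""
    return hashlib.sha256(data).hexdigest()


class ProvDocument:
    """Hypothetical, minimal provenance store loosely modelled on W3C PROV."""

    def __init__(self):
        self.entities = {}          # entity id -> attributes
        self.activities = {}        # activity id -> attributes
        self.used = []              # (activity id, input entity id)
        self.was_generated_by = []  # (output entity id, activity id)

    def entity(self, eid, **attrs):
        self.entities[eid] = attrs
        return eid

    def activity(self, aid, **attrs):
        self.activities[aid] = attrs
        return aid

    def record(self, activity, inputs, outputs):
        """Link an activity to the entities it used and generated."""
        for e in inputs:
            self.used.append((activity, e))
        for e in outputs:
            self.was_generated_by.append((e, activity))

    def lineage(self, eid):
        """Return every entity that eid transitively depends on."""
        seen, stack = [], [eid]
        while stack:
            current = stack.pop()
            for ent, act in self.was_generated_by:
                if ent != current:
                    continue
                for a, src in self.used:
                    if a == act and src not in seen:
                        seen.append(src)
                        stack.append(src)
        return seen

    def verify(self, eid, data: bytes) -> bool:
        """Check that data matches the digest recorded for the entity."""
        return self.entities[eid].get("sha256") == sha256_hex(data)

    def to_json(self) -> str:
        """Plain JSON dump; key names only echo PROV terms."""
        return json.dumps({
            "entity": self.entities,
            "activity": self.activities,
            "used": self.used,
            "wasGeneratedBy": self.was_generated_by,
        }, indent=2)


# Assumed chain: specimen -> stained slide -> whole slide image -> model.
image_bytes = b"placeholder image bytes"  # stands in for a real image file

doc = ProvDocument()
doc.entity("ex:specimen-001", type="tissue specimen")
doc.entity("ex:slide-001", type="stained slide")
doc.entity("ex:wsi-001", type="whole slide image",
           sha256=sha256_hex(image_bytes))
doc.entity("ex:model-v1", type="trained model")
for name in ("ex:staining", "ex:scanning", "ex:training"):
    doc.activity(name)

doc.record("ex:staining", ["ex:specimen-001"], ["ex:slide-001"])
doc.record("ex:scanning", ["ex:slide-001"], ["ex:wsi-001"])
doc.record("ex:training", ["ex:wsi-001"], ["ex:model-v1"])

print(doc.lineage("ex:model-v1"))
print(doc.verify("ex:wsi-001", image_bytes))
```

Walking the relations backwards from the model recovers the image, slide, and specimen it depends on, which is the kind of traceability the abstract refers to. Comparing a stored digest against the current file is one simple way to approach integrity verification. Distributed provenance, confidentiality, and authenticity would need considerably more than this sketch covers.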
Abbreviations
AI: Artificial Intelligence
FAIR: Findable, Accessible, Interoperable, and Reusable
IVDR: In Vitro Diagnostic Regulation
MDR: Medical Device Regulation
IVMD: In Vitro Diagnostic Medical Device
SOP: Standard Operating Procedure
CPM: Common Provenance Model
SPREC: Standard PREanalytical Code
DAG: Directed Acyclic Graph
HDF: Hierarchical Data Format
JSON: JavaScript Object Notation
RO-Crate: Research Object Crate
EOSC: European Open Science Cloud
Keywords
Artificial intelligence
Provenance
Biological material
Traceability
© 2023 The Authors. Published by Elsevier B.V.