An Opportunity for Constructing the Future of Data Sharing in Otolaryngology

Although journals have long striven for data deposition in repositories, such efforts mainly pertain to genomics, transcriptomics, proteomics, and crystallography. JARO, for instance, recommends that data be included as supplemental material. However, for a field as broad as otolaryngology, which encompasses many sub-areas where datasets can reach terabytes (e.g., audiology, neuroimaging, genetics, metabolomics, modeling, and, more recently, cytometry and immunology), it is worth considering existing repositories. Publishers such as Springer and Elsevier maintain research data policies and lists of recommended repositories. Springer also launched the journal Scientific Data, which allows authors to describe their datasets in a format called Data Descriptors. JARO, too, could become a future platform for publishing Data Descriptors from the otolaryngology community. We also note that code sharing through platforms such as GitHub is important for facilitating the reproducibility of data analyses.

General-purpose data repositories, such as the Open Science Framework, provide another reasonable data sharing option. However, datasets from different studies within a research field are often scattered across various data sharing resources, making it difficult to find and aggregate data across studies for novel investigations of existing data. Moreover, such unstructured data sharing approaches can be limiting if researchers do not provide sufficient information for understanding the data, including best-use practices for its analysis and interpretation. Open access data repositories should therefore require clearly defined data and metadata, along with guidance on how best to use the data. For example, the Australian Data Archive is organized within a Dataverse platform that stores data from thousands of studies; metadata about sampling, data collection approaches, demographics, and participant response rates are available in an open access format prior to a formal request for data access. Similarly, the Zenodo platform allows the sharing of curated data, such as that from a European Union–supported multi-site project on tinnitus (UNITI), with data descriptors for the full dataset that optimize usability.
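To make the idea of "clearly defined metadata" concrete, the sketch below shows what a minimal study-level metadata record, with a simple completeness check of the kind a repository might run at deposit time, could look like. All field names and values here are hypothetical illustrations, not drawn from any specific platform's schema.

```python
# Hypothetical minimal study-level metadata record; field names are
# illustrative and do not reflect any particular repository's schema.
study_metadata = {
    "title": "Example tinnitus cohort study",
    "sampling": "Convenience sample recruited at three audiology clinics",
    "data_collection": ["pure-tone audiometry", "tinnitus questionnaire"],
    "demographics": {"n_participants": 120, "age_range": [18, 75]},
    "response_rate": 0.82,
    "best_use_notes": "Thresholds reported in dB HL; see the codebook "
                      "before pooling across sites.",
}

def missing_fields(record,
                   required=("title", "sampling",
                             "data_collection", "demographics")):
    """Return the required metadata fields absent from a record."""
    return [field for field in required if field not in record]

# A record passing the check has no missing required fields.
print(missing_fields(study_metadata))  # → []
```

The point of such a check is that a deposit lacking, say, sampling information would be flagged before the data become discoverable, rather than leaving secondary users to guess how the data were collected.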

Perhaps the most useful multi-site data sharing approach to date has been to create data repositories dedicated to model organisms (e.g., Mouse Genome Informatics) and to domain-specific data (e.g., gEAR) [7]. Here, datasets can be curated with common variable types, as well as integrated and analyzed using methods that handle missing data and differing measurements of the same construct. Moreover, a collaborative community can develop around these resources. Such resources can also provide secondary data to researchers as an incentive to share data and to facilitate replication. For example, the Dyslexia Data Consortium and Hearing Health Institute data repositories are emerging resources where researchers can share their neuroimaging data, which are automatically processed to provide visualization and secondary data generation (e.g., regional brain volume predictors of dyslexia). That is, contributors gain access to data processing functions that would otherwise require computational resources and personnel training, in addition to access to these important datasets in one location. Developing these resources is labor intensive and requires significant funding to ensure long-term viability, which some repositories have addressed with data access fees (e.g., UK Biobank). Ideally, limited data access would be free of charge, consistent with the findable, accessible, interoperable, and reusable (FAIR) principles, as well as the collective benefit, authority to control, responsibility, and ethics (CARE) principles. There are many potential solutions for limiting costs (e.g., sliding fee structures from academic to industry access) that will depend on the organizations housing the data.
