In reproducible computational research, “all details of the computations — code and data — are made conveniently available to others” \citep{Donoho_2009}. Strictly speaking, a promise in a published paper to make code and/or data “available upon request” is not a reproducible practice: digital artifacts should already be in a suitable repository. For code, we have the framework of open-source software, with its licensing and tooling ecosystem. The case of data is, in many ways, different. US law treats raw data as facts, which are not protected by copyright. But the manner of “selection, coordination or arrangement” of data can be considered an original work, to which copyright applies. Principles and practices for open data are better established than those for code, with several funding agencies now mandating a Data Management Plan as part of grant proposals.
Good data management is a basic responsibility of all researchers, but it remained a fuzzy concept until recently. Through the coordinated work of representatives from interested communities (researchers, publishers, funders, archivists, and more), we now have the
FAIR Principles: digital artifacts of research should be Findable, Accessible, Interoperable and Reusable (FAIR,
https://www.force11.org/group/fairgroup/fairprinciples) for machines and for people
\citep{Wilkinson_2016}. These high-level principles guide technology choices, practices, and standards. For example, the Findable principle leads to the practice of describing data with metadata, assigning them a unique and persistent identifier, and registering them in a searchable index. To be Accessible, data should be retrievable at any time using their unique identifier. By being Interoperable, data can integrate with other data and with standard workflows, which means using a shared language in metadata. The Reusable principle implies that data products should be released with a standard reuse license, typically a Creative Commons Attribution license (CC-BY,
https://creativecommons.org/licenses/by/4.0/) or a Creative Commons public-domain dedication (CC0,
https://creativecommons.org/publicdomain/zero/1.0/). Of course, exceptions to sharing data must be made for research involving human subjects, when privacy concerns take precedence.