Karen Stocks and 11 others

The Rolling Deck to Repository (R2R; www.rvdata.us) program is entering its second decade of managing underway data from US-operated academic research vessels to ensure preservation of, and access to, these national oceanographic research assets. Reflecting on the move from decentralized data submission by chief scientists to an operational centralized facility has yielded insights that may inform other communities served by distributed networks of data acquisition providers with diverse practices and resources. 4,000 cruises and 100+ TB of data later, here are the lessons R2R has learned.

- Managing data via a central aggregating system, where both curation and domain data expertise can be optimally leveraged, promotes more complete and efficient data preservation.
- Identifying key organizing elements for the data, and implementing persistent identifiers and metadata for those elements, facilitates management and usability. R2R developed authoritative DOIs and standard metadata for cruises to organize R2R data for discoverability and access, and to facilitate reciprocal linking to related data in external repositories (a minimal illustration follows this list). When data submissions from diverse providers are heterogeneous, standardizing data at ingest supports the aggregation and synthesis that promote broad data re-use.
- Providing tools and expertise to assist with standardization, such as recommended data structures and best-practice guidance for data acquisition, reduces heterogeneous practices over time even when compliance is voluntary.
- Developing organized and persistent communication mechanisms with all main stakeholders is central to success. R2R holds annual community-level meetings, as well as more frequent individual interactions, with vessel operators/technicians, NOAA National Centers for Environmental Information staff, and oceanographic research scientists. These communications have been critical for informing high-level priorities, overall approaches, and specific technical details and decisions.
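To make the identifier-and-metadata lesson concrete, the sketch below shows one way a cruise-level record could be organized around a persistent identifier, with standard descriptive fields and reciprocal links to related data in external repositories. The class, field names, and values are illustrative assumptions for this note, not R2R's actual schema or services.

```python
# Hypothetical sketch (not R2R's schema): a cruise-level metadata record
# keyed by a persistent identifier, with links out to related datasets
# held in external repositories.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CruiseRecord:
    doi: str                     # persistent identifier for the cruise (illustrative)
    cruise_id: str               # operator-assigned cruise designation
    vessel: str
    start_date: str              # ISO 8601 dates
    end_date: str
    chief_scientist: str
    related_datasets: List[str] = field(default_factory=list)  # DOIs/URLs in external repositories


# Example record; every value is a placeholder, not real cruise data.
record = CruiseRecord(
    doi="doi:10.xxxx/example-cruise",
    cruise_id="EX1234",
    vessel="R/V Example",
    start_date="2019-06-01",
    end_date="2019-06-21",
    chief_scientist="A. Researcher",
    related_datasets=["doi:10.yyyy/ctd-profiles"],
)

print(record.doi, "->", record.related_datasets)
```
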
Over time, I have worked on a suite of data infrastructure projects, ranging from small to large and from simple to complex. Some have been successful, others less so, and a couple are so notorious they are generally not mentioned in polite company. Sometimes the outcome was predictable: well-organized and well-managed projects are likely to succeed, and those with clear flaws often don't rise above them. Sometimes, though, it was unexpected. Building effective data infrastructure is not a solved problem; it is an area of research in its own right, and the larger the project, the more difficult and uncertain it is. For each of my current and past projects, I consider their greatest strength, their showcase best practice, their weaknesses, and their epic fails, as well as the external and internal factors that contributed to positive or negative outcomes. Despite the heterogeneity, certain patterns emerge as lessons learned:

- Listening to your users is critical, and there are no shortcuts; it requires a long-term investment, including cultivating individual relationships.
- Good is better than more; hardening infrastructure is time-consuming but critical for adoption.
- Have a clear mission with concrete benefits to a defined user community, and expand thoughtfully from there.
- In spite of our best intentions, meeting emerging community expectations usually requires a catalyst: external groups identifying best practices, mini-grants for implementation, and multi-project groups collaborating to rise together provide the needed nudges.
- Interestingly, I have never been involved with a project that failed because it picked the wrong technology. A poor choice can cause pain and consume resources, but I have not seen a terminal impact.
- Finally, invest in the people behind the infrastructure. Committed, engaged staff supported by ongoing professional development, rational management, and sufficient autonomy can, and regularly do, accomplish the impossible.