Karen Stocks

and 11 more

The Rolling Deck to Repository (R2R; www.rvdata.us) program is entering its second decade of managing underway data from US-operated academic research vessels to ensure the preservation of, and access to, these national oceanographic research assets. The transition from decentralized data submission by chief scientists to an operational, centralized facility has yielded insights that may inform other communities with distributed networks of data acquisition providers that have diverse practices and resources. After 4,000 cruises and more than 100 TB of data, R2R has learned the following lessons:
- Managing data through a central aggregating system, where both curation and domain data expertise can be leveraged optimally, promotes more complete and efficient data preservation.
- Identifying key organizing elements for the data, and implementing persistent identifiers and metadata for those elements, facilitates management and usability. R2R developed authoritative DOIs and standard metadata for cruises to organize R2R data for discovery and access, and to facilitate reciprocal linking to related data in external repositories. When data submissions from diverse providers are heterogeneous, standardizing data at ingest supports the aggregation and synthesis that promote broad data re-use.
- Providing tools and expertise to assist with standardization, such as recommended data structures and best-practice guidance for data acquisition, reduces heterogeneous practices over time, even when compliance is voluntary.
- Developing organized and persistent communication mechanisms with all main stakeholders is central to success. R2R holds annual community-level meetings and has more frequent individual interactions with vessel operators and technicians, NOAA National Centers for Environmental Information (NCEI) staff, and oceanographic research scientists. These communications have been critical in informing high-level priorities, overall approaches, and specific technical details and decisions.
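The abstract does not describe R2R's ingest pipeline or metadata schema. As a purely illustrative sketch of what "standardizing data at ingest" around a persistent cruise identifier can look like, the Python below normalizes navigation fixes from two hypothetical provider layouts (degrees plus decimal minutes with hemisphere letters, versus decimal degrees with 0-360 longitude and epoch seconds) into one record type. The `CruiseRecord` and `NavFix` types, all field names, both provider layouts, and the DOI string are assumptions for illustration, not R2R's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class NavFix:
    """One standardized fix: UTC time, decimal degrees, longitude in [-180, 180)."""
    time: datetime
    lat: float
    lon: float


@dataclass
class CruiseRecord:
    """Hypothetical cruise-level record keyed by a persistent identifier (DOI)."""
    cruise_id: str
    doi: str
    nav: list = field(default_factory=list)


def ddm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Degrees + decimal minutes with an N/S/E/W letter -> signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0
    return -value if hemisphere.upper() in ("S", "W") else value


def wrap_lon(lon: float) -> float:
    """Map a longitude in any convention (e.g. 0-360) into [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def standardize_fix(raw: dict) -> NavFix:
    """Normalize one raw fix from either of two hypothetical provider layouts."""
    if "lat_deg" in raw:
        # Provider A (hypothetical): degrees + minutes, hemisphere letters, ISO 8601 time.
        lat = ddm_to_decimal(raw["lat_deg"], raw["lat_min"], raw["lat_hem"])
        lon = ddm_to_decimal(raw["lon_deg"], raw["lon_min"], raw["lon_hem"])
        time = datetime.fromisoformat(raw["time"])
        if time.tzinfo is None:  # assumption: naive timestamps are already UTC
            time = time.replace(tzinfo=timezone.utc)
        time = time.astimezone(timezone.utc)
    else:
        # Provider B (hypothetical): decimal degrees, 0-360 longitude, Unix epoch seconds.
        lat, lon = raw["lat"], raw["lon"]
        time = datetime.fromtimestamp(raw["epoch"], tz=timezone.utc)
    return NavFix(time=time, lat=lat, lon=wrap_lon(lon))


# Usage: two differently formatted reports of the same position end up identical.
cruise = CruiseRecord(cruise_id="EX0000", doi="10.0000/hypothetical-cruise-doi")
cruise.nav.append(standardize_fix({
    "lat_deg": 41, "lat_min": 30.0, "lat_hem": "N",
    "lon_deg": 70, "lon_min": 45.0, "lon_hem": "W",
    "time": "2019-06-01T12:00:00+00:00",
}))
cruise.nav.append(standardize_fix({"lat": 41.5, "lon": 289.25, "epoch": 1559390400}))
print(cruise.nav[0] == cruise.nav[1])  # True
```

Converting every submission to one representation at ingest means downstream aggregation only has to understand a single format, whatever conventions each provider used.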

Vicki Ferrini

and 6 more

The Global Multi-Resolution Topography (GMRT) Synthesis is an elevation model that includes curated deep-water multibeam bathymetry data at ~100 m resolution covering more than 9% of the ocean. GMRT is built on a scalable tiled raster architecture that efficiently stores and presents high-resolution elevation data nested within lower-resolution data. A set of tools allows users to access the compilation through simple user interfaces (e.g., GMRT MapTool) and web services, while also providing full attribution and access to the source swath files. The availability of raw/unprocessed multibeam sonar data in the National Centers for Environmental Information (NCEI) archive has increased dramatically over the last decade, but transforming these data into high-quality integrated products suitable for use by scientists and the public alike requires significant effort. The GMRT Team has built workflows and tools for data preparation and review that are optimized for cleaning and integrating sparse, globally distributed multibeam data, enabling the addition of ~60-80 research cruises per year. Once raw swath files are cleaned and corrected, they are gridded and tiled with the GMRT Tiling tools so that they can be reviewed and quality-controlled in the context of other data in the GMRT Synthesis. When working with processed swath files generated by the community, we have observed that this review frequently reveals issues overlooked during data processing. To accelerate data integration and leverage the data processing efforts of the community, the GMRT Tiling tools are being adapted for distributed use. Ocean Exploration Trust (OET) is an initial partner in this effort: all processed swath files from the 2017-2019 Nautilus field seasons were prepared with the GMRT Tiling tools and reviewed by the OET team.
This review revealed problems in processed swath files from several cruises, which were addressed before submission to NCEI, improving the quality of data in the archive. We are now working to incorporate the GMRT Tiling tools into the standard at-sea operating procedures of Nautilus as a testbed for broader community distribution, to ensure consistent quality of processed multibeam data, and to accelerate the production of high-quality integrated data products, including GMRT and Seabed 2030.
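The abstract describes GMRT's tiled raster architecture only at a high level. As an illustrative sketch of the general idea of nesting high-resolution cells within lower-resolution ones, the Python below bins cleaned soundings into ~100 m cells, builds coarser levels by merging 2x2 blocks, and answers point queries from the finest level that has data. Planar x/y coordinates in metres (no map projection), median gridding, 2x2 averaging, and all function names are assumptions made for this example, not the GMRT Tiling tools' actual algorithms or interfaces.

```python
import math
from collections import defaultdict
from statistics import median


def grid_soundings(soundings, cell_size):
    """Bin (x, y, z) soundings (planar metres) into square cells; keep the median depth per cell."""
    bins = defaultdict(list)
    for x, y, z in soundings:
        bins[(math.floor(x / cell_size), math.floor(y / cell_size))].append(z)
    return {key: median(zs) for key, zs in bins.items()}


def build_pyramid(finest, levels):
    """Level 0 is the finest grid; each coarser level averages the populated cells of each 2x2 block."""
    pyramid = [finest]
    for _ in range(1, levels):
        parents = defaultdict(list)
        for (i, j), z in pyramid[-1].items():
            parents[(i // 2, j // 2)].append(z)
        pyramid.append({key: sum(zs) / len(zs) for key, zs in parents.items()})
    return pyramid


def lookup(pyramid, base_cell_size, x, y):
    """Return (level, depth) from the finest level with data at (x, y), or None if no level has data."""
    for level, grid in enumerate(pyramid):
        size = base_cell_size * 2 ** level
        key = (math.floor(x / size), math.floor(y / size))
        if key in grid:
            return level, grid[key]
    return None


# Usage with 100 m base cells and two levels (100 m and 200 m).
soundings = [(10, 10, -4000), (20, 30, -4010), (20, 20, -4020), (150, 40, -4100)]
pyramid = build_pyramid(grid_soundings(soundings, 100), levels=2)
print(lookup(pyramid, 100, 50, 50))    # (0, -4010): finest level has data
print(lookup(pyramid, 100, 50, 150))   # (1, -4055.0): falls back to the 200 m level
print(lookup(pyramid, 100, 500, 500))  # None: no data at any level
```

Because each coarse cell index is the floor division of its children's indices, every fine cell nests inside exactly one coarse cell, so a query can step up the pyramid until it finds data without any resampling.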