See Requirements for the Central Service. Please address your strategy for implementing each requirement, referring to the numbering described in that section. If a consortium, indicate which organizations would be responsible for which components. Requirements are itemized as “Done” for those in
production or “ToDo” for those to be built, naming
“COS” or “partners” as leading execution.
Preprints Commons Database
Act as a repository for life sciences preprints, which
includes:
Author’s original manuscript (.doc, .tex etc.) and the
converted manuscript (XML);
[Done: COS] OSF preprints become part of the OSF
repository along with any associated data, materials, protocols, and
connections with other repositories. Metadata is aggregated by SHARE (11
preprint sources, >2 million preprints).
[ToDo: COS] COS will ingest preprint full text when
allowed by license, index it for discovery, and archive it for preservation
and to facilitate programmatic access for data-mining research
applications.
All files associated with that manuscript, such as figures and
any supplementary data, including video and datasets, or links to data
stored in appropriate external repositories;
[Done: COS] As an interface to the OSF, The Commons can make use of all its repository capabilities. Users can
add supplementary data, materials, links, or other files. When made publicly accessible, files are displayed with the preprint and available via API. OSF can surface links to external repositories or integrate the external repositories to appear native to the OSF via add-ons. Eight storage add-ons are available now (figshare, Dataverse, Box, Dropbox, Google Drive, GitHub, ownCloud, and Amazon S3), and five more are planned for Q2 2017 (Dryad, Bitbucket, GitLab, OneDrive, and Fedora).
On registration (i.e., timestamped project snapshotting) in the OSF,
data is copied from all connected services into a preservation
environment. In the future, that environment can be an integrated,
external repository. For example, on sharing a preprint, (1) authors may
connect private storage to facilitate later sharing, or (2) authors may
transfer data from active storage to preservation storage.
[ToDo: COS] We will add domain-specific repositories. Letters of support from 15 repositories are attached:
Dryad,
Protein Data Bank,
TalkBank,
NIAGADS,
NeuroMorpho,
NAHDAP,
ICPSR,
Figshare,
FlyBase,
Mouse Phenome Database,
Dataverse,
Protocols,
Sage Bionetworks (Synapse),
DIP, and
VectorBase. An integrated, seamless mechanism for deposit of data, materials, and protocols to domain-relevant repositories will increase discovery and deposit by authors and discovery by readers.
Appropriate metadata
[Done: COS] The submission process collects metadata
including title, authors, abstract, discipline, keywords, peer-reviewed
DOI, and license. Metadata is available via API and HTML meta tags for
discoverability (e.g., Google Scholar, FAIR).
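As a minimal sketch of how such metadata could be exposed through HTML meta tags, the Python below renders a record as `citation_*` tags of the kind commonly read by scholarly indexers. The record keys and the `render_meta_tags` helper are hypothetical and chosen for illustration; this is not the OSF implementation.

```python
from html import escape


def render_meta_tags(record):
    """Render a preprint metadata record as HTML <meta> tags.

    `record` is a hypothetical dict; its keys are assumptions for this sketch.
    """
    tags = [("citation_title", record["title"])]
    # Emit one citation_author tag per author, preserving author order.
    tags += [("citation_author", name) for name in record.get("authors", [])]
    if record.get("doi"):
        tags.append(("citation_doi", record["doi"]))
    # Escape values so quotes or angle brackets cannot break the markup.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )
```

Escaping the attribute values matters here because titles and author names are user-supplied during submission.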
SHARE harvests data across preprint services regardless of metadata
schema. The data is queryable and normalized to a schema developed for
working with diverse sources (it draws heavily from schema.org and
DataCite). SHARE is relatively schema-agnostic, which keeps it flexible
to additions across sources.
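To illustrate what normalizing diverse sources to a common schema can look like, here is a small Python sketch. The source names, field mappings, and the `normalize` helper are assumptions made for this example and do not reflect SHARE's actual schema or code.

```python
# Hypothetical per-source mappings: source field name -> common field name.
FIELD_MAPS = {
    "source_a": {"title": "title", "creators": "authors", "summary": "abstract"},
    "source_b": {
        "dc:title": "title",
        "dc:creator": "authors",
        "dc:description": "abstract",
    },
}

COMMON_FIELDS = ("title", "authors", "abstract")


def normalize(source, raw):
    """Map a raw harvested record onto the common schema.

    Unmapped fields are kept under "extra" rather than dropped, so a source
    that adds new fields does not break harvesting.
    """
    mapping = FIELD_MAPS.get(source, {})
    record = {field: None for field in COMMON_FIELDS}
    record["source"] = source
    record["extra"] = {}
    for key, value in raw.items():
        target = mapping.get(key)
        if target is None:
            record["extra"][key] = value
        else:
            record[target] = value
    # Authors are always a list in the common schema.
    if isinstance(record["authors"], str):
        record["authors"] = [record["authors"]]
    return record
```

Keeping unmapped fields instead of rejecting them is one way to stay tolerant of additions across sources.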
[ToDo: COS] The Governance Body (GB) will define metadata standards
for The Commons. Fields will be added to relevant workflows (e.g.,
submission, moderation).
Includes a stable, long-term preservation strategy.
[ToDo: COS] We will add domain-specific repositories. Letters of support from 15 repositories are attached:
Dryad,
Protein Data Bank,
TalkBank,
NIAGADS,
NeuroMorpho,
NAHDAP,
ICPSR,
Figshare,
FlyBase,
Mouse Phenome Database,
Dataverse,
Protocols,
Sage Bionetworks (Synapse),
DIP, and
VectorBase. An integrated, seamless mechanism for deposit of data, materials, and protocols to domain-relevant repositories will increase discovery and deposit by authors and discovery by readers.
[ToDo: COS] COS is exploring technologies and
partnerships to add preservation features, including torrent
protocols for distributing publicly stored data, a blockchain to
create a guaranteed-immutable provenance record, mirroring of public
content that individuals could back up or host on cheap commodity
hardware, and partnerships with organizations such as the Internet
Archive. The latter includes connecting institutional systems for
automatic preservation (e.g., CurateND, Notre Dame).
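No design for the provenance record is specified here. As a simplified illustration of the tamper-evidence idea behind a blockchain-style record, the Python sketch below chains log entries by hash. It is a single-party hash chain, not a distributed ledger, and the `append_entry` and `verify` helpers and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with the event payload."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event to the provenance log and return the new entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify(chain):
    """Return True if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing any earlier event invalidates every later entry, which is what makes after-the-fact edits detectable.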
At inception of the preprint service, the database should be
populated with legacy preprints and associated metadata from existing
approved servers. Conversion of these legacy files to XML could be
considered but is not a specific requirement for this application.
[Done: COS] The GB will decide eligibility for
aggregated search. Metadata is already harvested by SHARE; adding
preprint services is trivial.
[ToDo: COS] Before launch, COS will harvest
full-text preprints from eligible services to accompany metadata
harvested by SHARE. Documents will run through a text-extraction
pipeline and be indexed for discovery purposes. Word and LaTeX should be
convertible to XML; PDF will be difficult (see section 4).
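The extract-then-index flow can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the LaTeX handling is a crude regular-expression pass rather than a real converter, and `extract_text_from_latex` and `build_index` are hypothetical names, not part of the planned pipeline.

```python
import re
from collections import defaultdict


def extract_text_from_latex(source):
    """Very rough LaTeX-to-text step: drop comments and commands, keep words."""
    source = re.sub(r"(?<!\\)%.*", "", source)  # strip unescaped % comments
    # Strip command names and any optional [..] argument, keeping {..} contents.
    source = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", " ", source)
    return re.sub(r"[{}]", " ", source)


def build_index(documents):
    """Build an inverted index mapping each lowercase token to document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index
```

A production pipeline would need format-specific extractors and a dedicated search engine; the sketch only shows how extracted text feeds a discovery index.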
The provider(s) of these services are also required to remove or
flag any manuscripts that violate the standards set forth by the
Governance Body.
[Done: COS] Moderation is manual; an administrator
moderation dashboard is prototyped and scheduled for July 2017 release.
[ToDo: COS] The GB will define moderation standards
for The Commons. The dashboard will enable administrators to apply those
standards to new submissions. The GB will decide whether to include
preprints from services that do not provide the necessary metadata
(e.g., license information). Ideally, The Commons will facilitate
standards for metadata across external services.
Preprints Commons Human Interface
Provide a web interface for browsing and searching.
Display abstracts and links to source
Potentially provide download functionality for content held
in the CS in a variety of formats: PDF, XML, HTML, etc.
Display snippets (like Google Books) to place full-text
search results in context
If the source is not available elsewhere, or with consent of
the ingestion source, display the full manuscript and
figures/supplementary files.
Display a clear statement indicating that the material is a
preprint
Link to other versions of the manuscript elsewhere,
especially journal versions, using Crossref metadata or information from other sources
Schema.org tags
Make available metrics and anonymized usage data to humans.
Be developed in line with good user-experience web
principles and be fully responsive
Support login functionality via ORCID
[Done: COS] The preprints search interface offers
aggregated search, faceting by preprint service and discipline, and
sorting by search relevance or upload date
(Figure \ref{493406}).
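The combination of aggregated search, facets, and sorting can be illustrated with a small in-memory Python sketch. The record keys (`title`, `abstract`, `service`, `discipline`, `uploaded`), the term-count scoring, and the `search` helper are assumptions for this example; the production interface is backed by SHARE rather than by code like this.

```python
from collections import Counter


def search(records, query, service=None, discipline=None, sort="relevance"):
    """Filter records by query and facets; return (results, facet_counts)."""
    terms = query.lower().split()

    def score(rec):
        # Naive relevance: number of query-term occurrences in title + abstract.
        text = (rec["title"] + " " + rec.get("abstract", "")).lower()
        return sum(text.count(term) for term in terms)

    matched = [rec for rec in records if score(rec) > 0]
    # Count facets before applying facet filters so all options stay visible.
    facets = {
        "service": Counter(rec["service"] for rec in matched),
        "discipline": Counter(rec["discipline"] for rec in matched),
    }
    if service:
        matched = [rec for rec in matched if rec["service"] == service]
    if discipline:
        matched = [rec for rec in matched if rec["discipline"] == discipline]
    # ISO-8601 date strings sort correctly as plain strings.
    key = score if sort == "relevance" else (lambda rec: rec["uploaded"])
    return sorted(matched, key=key, reverse=True), facets
```

Passing any value other than `"relevance"` for `sort` orders results by upload date, newest first.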