
Plan, Package and Post Your Data!

You can now publish the data produced by your scientific experiments and projects alongside your papers and articles. Publishing data in this way can boost citations of the associated publications and attract separate citations for the data records themselves. In turn, you can study and re-use other scientists’ data to support your own work, which reduces the risk of repeating experiments that others have already done and lets you concentrate on making advances. Analysing their data may also give you new ideas for your own research.

HYDRALAB+ has provided a simple, three-step process for achieving this:

1.     Plan
Take some time to plan and define the data outputs from your experiment(s).

2.     Package
Arrange your output data into a package that will be easy for other scientists to understand.

3.     Post
Upload your data package into the Zenodo repository through the HYDRALAB+ website.


 

 

Plan

Taking some time to plan and define the data that your experiments will produce helps to structure the experiments and maximise the impact of your findings. In particular, consider which datasets you would like to share with others. Your experiment may produce a vast amount of data. Is it necessary to share all of it?

Also, consider how the results data will be offered to future users. If the overall package is going to be large (greater than 1 GB, or containing many files) then break it down into sensible, well-described sub-packages. Users can then select those that they are interested in without having to download the whole package.
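
As a rough aid to this decision, the sketch below totals the size and number of files under a package folder and flags a package that exceeds the 1 GB guideline. It is only an illustration: the function names are hypothetical, 1 GB is taken here as 1,000,000,000 bytes, and the file-count limit of 500 is an assumption, since the guidance says only "many files".

```python
from pathlib import Path

# 1 GB guideline from the text above, interpreted as decimal gigabytes (assumption).
MAX_BYTES = 1_000_000_000
# Hypothetical cut-off: the guidance only says "many files".
MAX_FILES = 500


def summarise_package(root):
    """Return (total_bytes, file_count) for every file under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sum(p.stat().st_size for p in files), len(files)


def needs_splitting(total_bytes, file_count,
                    max_bytes=MAX_BYTES, max_files=MAX_FILES):
    """True when the package exceeds either the size or the file-count limit."""
    return total_bytes > max_bytes or file_count > max_files
```

For a package folder called `my_package` (a placeholder name), `needs_splitting(*summarise_package("my_package"))` indicates whether it is worth breaking the package into sub-packages.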

Your data will end up posted into a repository (see below). Each data package uploaded to the repository will automatically receive a unique Digital Object Identifier (DOI) to allow it to be referenced and located in the future: 

  • You may wish to create a single package and DOI for your project, containing results from a variety of experiments;
  • You may wish to create a single package and DOI for the results from each experiment;
  • You may wish to create a single data package and DOI for each results dataset.

You will know best how to present your data to future users, but make sure that all necessary data and supporting information are included in each data package. Your data package should be complete and coherent at the point it is published.

Document all of this information as part of your Data Storage Report. The HYDRALAB+ Data Storage Report template can be found here: https://zenodo.org/record/1318030.

Package

It is important to package your data so that it is easy for others to analyse and understand. Sufficient supporting information should be provided alongside the data itself, including the Data Storage Report. It is up to you how much data you include in your package and how it is arranged. Please include a ‘README.txt’ file describing what is in the package. An example of a data package which has been prepared according to these guidelines can be found here: https://zenodo.org/record/1197273.
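
One possible way to start a ‘README.txt’ is to generate the list of contents automatically and then add the explanatory text by hand. The sketch below assumes a simple layout (title, short description, then one line per file with its size); this layout and the function name `write_readme` are illustrative choices, not a HYDRALAB+ requirement.

```python
from pathlib import Path


def write_readme(package_dir, title, description):
    """Write README.txt in package_dir, listing every other file with its size."""
    root = Path(package_dir)
    readme = root / "README.txt"
    lines = [title, "=" * len(title), "", description, "", "Contents:"]
    for path in sorted(root.rglob("*")):
        # Skip folders and the README itself.
        if path.is_file() and path != readme:
            rel = path.relative_to(root).as_posix()
            lines.append(f"  {rel} ({path.stat().st_size} bytes)")
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

The generated file is only a starting point: a useful README also explains what each file contains, how the files relate to each other and where the Data Storage Report can be found.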

Data Format

Select a good file format for storing your data: one which is appropriate for the data structure and size; one which is usable and sustainable. Laboratory systems often have formats defined as standard, but you may have some opportunity to decide which you use.

  • Does the data format match the natural structure of your data (e.g. flat, hierarchical, multidimensional)?
  • Does the data format allow you to comfortably store the entire final dataset?
  • Is the data format broadly understood within your community and acceptable to funders?
  • Is the data format supported by other communities and likely to be compatible with future common operating systems and applications?
  • Is there a broad range of software that can read / write the data format?
  • Are the terms and conditions for use of the read / write software favourable? Is it free? Is it proprietary?
  • Is the conversion process from the data format to / from other formats cheap and easy?

Many experimenters output their data in simple formats such as CSV or XLSX. Leading, more complex formats for long timeseries data include TimeseriesML or WaterML2: Part 1 - Timeseries. If your dataset is too large to be stored in these formats then a leading format for larger, multi-dimensional array-based data is netCDF.

Supporting Information

Supporting information needs to be supplied together with the data itself. All data should be accompanied by adequate metadata so that future users can understand and apply the contents. Sometimes adequate metadata is stored within the data structure format, sometimes an additional file is required. If possible use an established metadata standard. Remember, the next person to use the dataset is likely to be you! How will you understand this data a year from now?

One established option is Dublin Core, which is an ISO standard (ISO 15836:2009). Another, more complex, ISO standard for describing metadata is ISO 19115/19139.
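
To give an idea of what a small Dublin Core record might look like, the sketch below builds one as XML using the Python standard library. The element names (title, creator, and so on) and the namespace come from the Dublin Core element set; the wrapping `metadata` element, the function name and all field values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Placeholder values only; replace them with the details of your own dataset.
example = dublin_core_record([
    ("title", "Example wave flume dataset"),
    ("creator", "A. Researcher"),
    ("date", "2018-01-01"),
    ("format", "text/csv"),
    ("rights", "CC BY 4.0"),
])
```

Calling `ET.tostring(example, encoding="unicode")` returns the record as a string that can be saved next to the data files.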

Parameter Names and Units

Too many data files have columns named ‘MyCol’ or ‘Col1’. Sometimes the units are omitted even when the column name is clear. Anyone who needs to understand your data needs both a meaningful parameter name and its units.

  • Avoid meaningless field names and remember to include the units.
  • If possible, take all parameter names and units from established vocabularies. When you use a vocabulary to describe parameters, include a reference to its on-line record.
  • If you use your own field names then make sure they are defined somewhere nearby.

Leading vocabularies include SeaDataNet, CF Standard Names, CSDMS Standard Names and ITTC Symbols and Terminology List.
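
A simple automated check can catch the most common problems before a package is published. The sketch below assumes a convention in which every CSV column header is written as "name [unit]", for example "water_depth [m]"; this convention, the list of placeholder names and the function name `check_headers` are assumptions for illustration, not part of any of the vocabularies above.

```python
import csv
import io
import re

# Assumed header convention: "name [unit]", e.g. "water_depth [m]".
HEADER_PATTERN = re.compile(r"^(?P<name>[^\[\]]+?)\s*\[(?P<unit>[^\[\]]+)\]$")
# Names such as 'MyCol' or 'Col1', plus a few similar placeholders (assumed list).
PLACEHOLDER = re.compile(r"^(mycol|col|column|field|var)\d*$", re.IGNORECASE)


def check_headers(csv_text):
    """Return a list of problems found in the header row of csv_text."""
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    for column in header:
        text = column.strip()
        match = HEADER_PATTERN.match(text)
        # Without a bracketed unit, the whole header is treated as the name.
        name = match.group("name").strip() if match else text
        if PLACEHOLDER.match(name):
            problems.append(f"{text}: meaningless name")
        if match is None:
            problems.append(f"{text}: unit missing")
    return problems
```

A header row such as `time [s],water_depth [m]` produces an empty list, while `Col1,velocity` is reported for a meaningless name and two missing units. The check cannot tell whether a name comes from an established vocabulary, so it complements rather than replaces a careful review.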

Permissions

Remember to include information giving future users permission to use the data they have obtained. If possible, include an open license, with as few restrictions as possible, to allow others to use your data. Sometimes an embargo period is applied, to give the originators time to publish articles based on the data.

  • A suitable list of licenses is given here: http://opendefinition.org/licenses/. The default license for HYDRALAB+ is the Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/legalcode).
  • If an embargo period is required for the publication of HYDRALAB+ experiments and research activities, it should be no longer than two years.
  • The embargo period should be included in the data management plan.
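
The two-year limit can be checked with simple date arithmetic. The sketch below assumes that the embargo period is counted from the date the data package is deposited; the guidance above does not state the starting point, so treat that, and the function names, as assumptions.

```python
from datetime import date

MAX_EMBARGO_YEARS = 2  # limit stated in the guidance above


def latest_embargo_end(deposit_date, years=MAX_EMBARGO_YEARS):
    """Return the last acceptable embargo end date for a given deposit date."""
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to the 28th.
        return deposit_date.replace(year=deposit_date.year + years, day=28)


def embargo_is_acceptable(deposit_date, embargo_end):
    """True when the embargo ends on or after deposit and within the limit."""
    return deposit_date <= embargo_end <= latest_embargo_end(deposit_date)
```

For a package deposited on 1 January 2019, an embargo ending on 1 January 2021 is accepted, while one ending a day later is not.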

 

Post

Those performing experiments as part of the HYDRALAB+ project are asked to store the dataset package from their experiment(s) in the Zenodo repository. This process will automatically give the dataset a DOI to allow it to be uniquely referenced. It can be done through the HYDRALAB+ website.

There are two ways to do this, depending on whether or not you are posting data associated with a Transnational Access Project. The Transnational Access interface contains a few more fields directly associated with those projects; otherwise it is the same.

1. If you are posting data associated with a Transnational Access Project:

  • Use the Transnational Access form provided on the HYDRALAB+ website (Add Dataset). Click on the ‘Add Dataset’ option under ‘TAKING PART’, ‘Transnational Access Projects’.
  • The form itself contains some additional metadata required by the Zenodo repository.
  • How-to instructions.
  • Video instructions.

2. If you are posting any other HYDRALAB+ data including JRA project data or project deliverables:

  • Go to http://hydralab.eu/. Log in, open the 'Participant Area' menu and select 'DOI Datasets'. This will take you to http://hydralab.eu/participant-area/zenodo-doi/.
  • Click on 'Add New Dataset' and fill in the form, attaching the dataset files you are uploading. Click on 'Save'.
  • A draft version of your data package will then appear in the 'DATASET LIST' shown. You can keep editing it until you are happy with the record.
  • When you are ready to publish it in Zenodo and do not want to make any more changes, click 'Publish' in the 'Actions' column of the 'DATASET LIST' table. Once you publish your data package you will not be able to change it, although you will be able to upload more recent versions.