Recommended Set

20. Data access

A data sharing statement describes how others can access the data on which the paper is based. Sharing adequately annotated data allows others to replicate data analyses, so that results can be independently tested and verified. Data sharing allows the data to be repurposed and new datasets to be created by combining data from multiple studies (e.g. to be used in secondary analyses). This allows others to explore new topics and increases the impact of the study, potentially preventing unnecessary use of animals and providing more value for money. Access to raw data also facilitates text and automated data mining [1].

An increasing number of publishers and funding bodies require authors or grant holders to make their data publicly available [2]. Journal articles with accompanying data may be cited more frequently [3,4]. Datasets can also be independently cited in their own right, which provides additional credit for authors. This practice is gaining increasing recognition and acceptance [5]. 

Where possible, make available all data that contribute to summary estimates or claims presented in the paper. Data should follow the FAIR guiding principles [6]: they should be findable, accessible (i.e. not stored in outdated file types), interoperable (usable on multiple platforms and with multiple software packages) and reusable (i.e. accompanied by adequate data descriptors).
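As one possible illustration of the "accessible" and "reusable" points, the sketch below writes a dataset to an open, plain-text format (CSV) together with a JSON data dictionary describing each column. It is a minimal sketch, not a prescribed workflow: the function name write_dataset, the file names data.csv and data_dictionary.json, and the example columns and values are all hypothetical and are not taken from the guidelines or from any repository's requirements.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_dataset(out_dir, rows, descriptors):
    """Write rows to a CSV file plus a JSON data dictionary.

    `descriptors` maps each column name to a free-form description
    (hypothetical structure, chosen for this sketch only).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(descriptors)

    # Refuse to write data unless every column is described, and vice versa.
    for row in rows:
        if set(row) != set(columns):
            raise ValueError(
                f"row columns {sorted(row)} do not match "
                f"described columns {sorted(columns)}"
            )

    # CSV is plain text and readable by most platforms and packages.
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # The data dictionary travels with the data as a sidecar file.
    dict_path = out / "data_dictionary.json"
    dict_path.write_text(json.dumps(descriptors, indent=2), encoding="utf-8")
    return data_path, dict_path


if __name__ == "__main__":
    # Made-up columns and values, for illustration only.
    descriptors = {
        "animal_id": {"description": "Unique animal identifier", "unit": None},
        "body_weight_g": {"description": "Body weight at baseline", "unit": "g"},
    }
    rows = [
        {"animal_id": "A01", "body_weight_g": 24.1},
        {"animal_id": "A02", "body_weight_g": 22.8},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        data_path, dict_path = write_dataset(tmp, rows, descriptors)
        print(data_path.read_text(encoding="utf-8"))
```

Keeping the descriptors in a separate machine-readable file is only one option; the same information could equally be supplied in a README or in the metadata fields of the chosen repository.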

Data can be made publicly available via a structured, specialised (domain-specific), open access repository such as those maintained by NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/) or EBI (European Bioinformatics Institute, https://www.ebi.ac.uk/). If such a repository is not available, data can be deposited in unstructured but publicly available repositories (e.g. Figshare (https://figshare.com/), Dryad (https://datadryad.org/), Zenodo (https://zenodo.org/) or Open Science Framework (https://osf.io/)). There are also search platforms to identify relevant repositories with rigorous standards, e.g. FairSharing (https://fairsharing.org/) and re3data (https://www.re3data.org/).

 

References

  1. Kafkafi N, Mayo CL and Elmer GI (2014). Mining mouse behavior for patterns predicting psychiatric drug classification. Psychopharmacology (Berl). http://dx.doi.org/10.1007/s00213-013-3230-6
  2. Stodden V, Guo P and Ma Z (2013). Toward reproducible computational research: an empirical analysis of data and code policy adoption by journals. PLOS ONE. http://dx.doi.org/10.1371/journal.pone.0067111
  3. Piwowar HA, Day RS and Fridsma DB (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE. http://dx.doi.org/10.1371/journal.pone.0000308
  4. McKiernan EC, Bourne PE, Brown CT, Buck S, Kenall A, Lin J, et al. (2016). How open science helps researchers succeed. eLife. https://doi.org/10.7554/eLife.16800
  5. Data Citation Synthesis Group. Joint declaration of data citation principles. (Access Date: 22 May). Available at: https://doi.org/10.25490/a97f-egyk
  6. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. http://dx.doi.org/10.1038/sdata.2016.18

 

Example 1

“Data Availability Statement: All data are available from Figshare at doi: 10.6084/m9.figshare.1288935.” [1]

Example 2

“A fundamental goal in generating this dataset is to facilitate access to spiny mouse transcript sequence information for external collaborators and researchers. The sequence reads and metadata are available from the NCBI (PRJNA342864) and assembled transcriptomes (Trinity_v2.3.2 and tr2aacds_v2) are available from the Zenodo repository (https://doi.org/10.5281/zenodo.808870), however accessing and utilizing this data can be challenging for researchers lacking bioinformatics expertise. To address this problem we are hosting a SequenceServer32 BLAST-search website (http://spinymouse.erc.monash.edu/sequenceserver/). This resource provides a user-friendly interface to access sequence information from the tr2aacds_v2 assembly (to explore annotated protein-coding transcripts) and/or the Trinity_v2.3.2 assembly (to explore non-coding transcripts).” [2]

 

References

  1. Federer LM, Lu Y-L, Joubert DJ, Welsh J and Brandys B (2015). Biomedical Data Sharing and Reuse: Attitudes and Practices of Clinical and Scientific Research Staff. PLoS ONE. doi: 10.1371/journal.pone.0129506
  2. Mamrot J, Legaie R, Ellery SJ, Wilson T, Seemann T, Powell DR, Gardner DK, Walker DW, Temple-Smith P, Papenfuss AT and Dickinson H (2017). De novo transcriptome assembly for the spiny mouse (Acomys cahirinus). Scientific Reports. doi: 10.1038/s41598-017-09334-7