Data access policy


General information

Experiments (sets of assays) and array designs in ArrayExpress are either 'public' or 'private' (often pre-publication and under peer review). Data will be kept private until a specified release date, or until an associated paper containing the experiment or array design accession number is published. Access to private data is possible via the BioStudies login created upon successful loading of an experiment or array design into the database.

In private status, a submitter or reviewer can view the full ArrayExpress record and download all associated files. There are two exceptions:

  1. For sequencing experiments, there is no access to the raw FASTQ data files until the experiment is made public. The reason is that raw data files are brokered to the European Nucleotide Archive (ENA), which is part of the Sequence Read Archive (SRA), a collaboration between ENA (EBI), GenBank (NCBI), and DDBJ (Japan). There is currently no infrastructure to access privately held data files in SRA.
  2. Experiments tagged for double-blind peer review will hide submitter-related metadata and block view of the full samples table, while data download remains unaffected. See submitter anonymity for details.

For sensitive data from human samples and individuals that could lead to the identification of the donors (e.g. genomic DNA sequences), ArrayExpress 'private' data sets and submitter/reviewer login accounts cannot provide the high level of data encryption and access control required. In such cases, a data access committee should oversee data privacy issues, and the data should be submitted to the European Genome-phenome Archive (EGA).

Human-identifiable data can still be submitted to ArrayExpress if consent has been given for its public release. Such approval is typically given by the relevant ethics committees, and it is the submitter's responsibility to ensure that it is in place.

 


Release policy

Experiments are made public when any of the following occurs:

  • the specified release date is reached
  • the submitter emails to tell us that the data can now be released (usually this is when an associated publication is accepted or published)
  • we identify a publication in which the ArrayExpress experiment accession number is cited

Array designs are made public when any of the following occurs:

  • the specified release date is reached
  • the submitter emails to tell us that the data can now be released
  • an experiment the array design is linked to is made public
  • we identify a publication in which the ArrayExpress array design accession number is cited
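
The release conditions above can be read as a simple "any of" check. The following Python sketch is only an illustration of that logic, not how ArrayExpress implements it; the `Record` class, its field names, and the `should_be_public` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical summary of an experiment or array design (illustrative only)."""
    release_date: Optional[date] = None        # release date specified by the submitter
    submitter_requested_release: bool = False  # submitter emailed to ask for release
    cited_in_publication: bool = False         # accession number found in a publication
    linked_experiment_public: bool = False     # only relevant for array designs


def should_be_public(record: Record, today: date, is_array_design: bool = False) -> bool:
    """Return True if any one of the listed release conditions is met."""
    if record.release_date is not None and today >= record.release_date:
        return True
    if record.submitter_requested_release or record.cited_in_publication:
        return True
    # Array designs are additionally released when a linked experiment goes public.
    return is_array_design and record.linked_experiment_public
```

For example, an array design with no release date still becomes public once an experiment linked to it is public, whereas the same flag has no effect on an experiment.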

 


Annotare data retention policy

  • The Annotare data retention policy describes how long Annotare stores the data files for submissions. This policy applies to all submissions created in Annotare and to all data files uploaded via Annotare or via FTP/Aspera.
  • We distinguish between two data types: archived data (which has been submitted and successfully archived in BioStudies) and non-archived data (which has not yet been successfully archived in BioStudies).

| Type | Status | Description | Retention time | Action |
|---|---|---|---|---|
| Non-archived data | In Progress | Any submission started in Annotare but not submitted for curation, or any submission reopened and awaiting the user's action. | 1 year | The data files stored in the Annotare datastore will be removed one year after the initial upload. |
| Non-archived data | Submitted / In Curation | Any submission submitted in Annotare and waiting for release to BioStudies. | NA | The data files stored in the Annotare datastore will not be deleted. |
| Archived data | Public | Any submission released to BioStudies and made public. | 2 months | The data files stored in the Annotare datastore will be removed two months after the submission is made public in BioStudies. |
| Archived data | Private | Any submission released to BioStudies but not yet made public. | 1 year | The data files stored in the Annotare datastore will be removed one year after the submission is released to BioStudies. |
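
As a rough illustration of the retention rules in the table above, the Python sketch below computes the date on which files would become eligible for removal from the Annotare datastore. It is not Annotare code: the `RETENTION_MONTHS` mapping, `add_months`, and `removal_date` are hypothetical names, and counting in calendar months (clamped to the last day of shorter months) is an assumption, since the policy does not say how the periods are measured.

```python
import calendar
from datetime import date
from typing import Optional

# Retention period in months per submission status; None means the files are not deleted.
# Status labels follow the table; the mapping itself is a hypothetical illustration.
RETENTION_MONTHS = {
    "In Progress": 12,                # counted from the initial upload
    "Submitted / In Curation": None,  # files are kept
    "Public": 2,                      # counted from the date made public in BioStudies
    "Private": 12,                    # counted from the release to BioStudies
}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month (assumption)."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def removal_date(status: str, reference: date) -> Optional[date]:
    """Return the earliest removal date, or None if the files are not deleted.

    `reference` is the starting event for the given status (see comments above).
    """
    months = RETENTION_MONTHS[status]
    if months is None:
        return None
    return add_months(reference, months)
```

The caller is responsible for passing the right reference date for each status, for example the initial upload date for a submission that is still in progress.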