Sequencing submissions


Types of sequencing data that can be submitted

ArrayExpress accepts submissions of functional genomics data generated using high-throughput sequencing (HTS) assays such as RNA-seq and ChIP-seq, mostly from non-human samples and non-identifiable human samples, with the following exceptions:

  1. Metagenomic/metatranscriptomic data: please submit to the EBI Metagenomics service for optimised organisation of your metadata (e.g. sample annotation).
  2. De novo transcriptome assembly: submit the raw RNA-seq reads to ArrayExpress. Once we have finished processing your raw reads, submit the assembled transcriptome file (often in FASTA format) directly to the European Nucleotide Archive.

If you have potentially identifiable human data, please see the section "Potentially identifiable human data" below.
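The routing rules above can be summarised as a small lookup table. The Python sketch below is illustrative only: the category labels and the function name `destination_for` are hypothetical and are not part of any ArrayExpress, ENA or EBI Metagenomics tool.

```python
# Illustrative only: the category labels below are made up for this sketch
# and are not identifiers used by ArrayExpress, the ENA or EBI Metagenomics.
DESTINATIONS = {
    "functional_genomics_raw_reads": "ArrayExpress",
    "metagenomic": "EBI Metagenomics",
    "metatranscriptomic": "EBI Metagenomics",
    # De novo transcriptome work is split: reads first, assembly afterwards.
    "de_novo_transcriptome_raw_reads": "ArrayExpress",
    "de_novo_transcriptome_assembly": "European Nucleotide Archive",
}


def destination_for(category: str) -> str:
    """Return the service that a given kind of data should be submitted to."""
    try:
        return DESTINATIONS[category]
    except KeyError:
        raise ValueError(f"Unknown data category: {category!r}") from None


print(destination_for("de_novo_transcriptome_assembly"))
```

Note that the table says nothing about identifiable human data, which follows separate rules.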

To submit to ArrayExpress, all you need to do is send us metadata for your experiment (e.g. experiment description, samples and their attributes, all protocols used) and the raw data files; see the sequencing file requirements. Submissions without raw data files will not be accepted unless there are exceptional circumstances.
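Before starting a submission, it can help to check that each required part is present. The following Python sketch shows one way to do that. The `Submission` class, its field names and the example values are assumptions made for illustration; they do not reflect the actual Annotare data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Submission:
    # Hypothetical container: field names are illustrative, not an Annotare schema.
    description: str = ""
    samples: List[dict] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)
    raw_files: List[str] = field(default_factory=list)


def missing_parts(sub: Submission) -> List[str]:
    """List the required parts that are absent from a draft submission."""
    checks = [
        ("experiment description", bool(sub.description.strip())),
        ("samples and their attributes", bool(sub.samples)),
        ("protocols", bool(sub.protocols)),
        ("raw data files", bool(sub.raw_files)),
    ]
    return [name for name, present in checks if not present]


# Made-up example draft: everything is filled in except the raw data files.
draft = Submission(
    description="RNA-seq of wild-type and mutant seedlings",
    samples=[{"name": "wt_rep1", "genotype": "wild type"}],
    protocols=["nucleic acid extraction", "library construction", "sequencing"],
)
print(missing_parts(draft))
```

An empty result means that all four parts are present. It does not mean that the files meet the sequencing file requirements, which still need to be checked separately.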

The metadata about your experiment will be stored at ArrayExpress, and the raw data files (e.g. fastq files) will be stored at the Sequence Read Archive (SRA) of the European Nucleotide Archive (ENA). ArrayExpress will transfer the raw data files to the ENA for you, so you do not need to submit them to the ENA separately. You can also send us processed data, i.e. data derived from the raw reads, such as BAM alignment files, differential expression data or expression values linked to genome coordinates. Depending on the file format, processed data will be stored either at ArrayExpress or at the ENA.

Because metadata and data files are split between ArrayExpress and the ENA, modifying or updating a submission once it has been fully processed is a lengthy process. Some changes (e.g. cancelling an ENA record which has been released to the public) will not be possible. Please take a look at our sequencing experiment update/cancellation policy before proceeding.



Potentially identifiable human data

Data from human samples and individuals that could lead to the identification of the donors (e.g. genomic DNA sequences) can be submitted to ArrayExpress if consent for public release of the data has been given. Such approval is typically given by the relevant ethics committee, and it is the submitter's responsibility to ensure that it is in place.

Identifiable data approved for controlled access should be submitted directly to the European Genome-Phenome Archive (EGA), not ArrayExpress. In some cases, identifiable data (e.g. raw sequences) can be submitted to the EGA while the related processed data (e.g. RPKM values) are submitted to ArrayExpress, but it is up to the submitter to ensure that such a submission complies with the respective ethics requirements.
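The choice of archive for human data can be expressed as a short decision function. This is a sketch under stated assumptions: the function name and the `approval` labels ("public", "controlled") are hypothetical, and the guidance above does not say what happens when no approval exists, so the sketch raises an error in that case rather than guessing.

```python
def archive_for_human_data(identifiable: bool, approval: str = "") -> str:
    """Pick an archive for human sequencing data.

    `approval` is a hypothetical label: "public" means consent for public
    release was given, "controlled" means approval for controlled access only.
    """
    if not identifiable:
        # Non-identifiable human data is handled like any other submission.
        return "ArrayExpress"
    if approval == "public":
        return "ArrayExpress"
    if approval == "controlled":
        return "EGA"
    # Assumption of this sketch: without documented approval, do not route at all.
    raise ValueError("identifiable human data needs documented approval before submission")


print(archive_for_human_data(identifiable=True, approval="controlled"))
```

The function covers only the data that carry the identifiability risk. Processed data derived from an EGA-held dataset may still go to ArrayExpress, subject to the ethics requirements mentioned above.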

To submit such processed data to ArrayExpress, please begin by creating a submission in Annotare. When you are ready, email us at annotare@ebi.ac.uk (or use the "Contact Us" form) and tell us the relevant EGA study and dataset accession numbers. Because Annotare normally requires raw data files, a curator will then approve the submission manually.

In certain cases, we may be able to import non-human-identifiable metadata from EGA, ENA or BioSamples and match the metadata with the submitted processed data. For this, make sure to add all relevant EGA, ENA or BioSamples accessions to the sample annotation table.



Experiment metadata

Apart from the experiment description and sample annotation, sequencing experiments require further details describing the sequencing library, because these are needed for the ENA submission. Please see this guide for more information about the library specifications.