ArrayExpress accepts all functional genomics data generated from microarray or next-generation sequencing (NGS) platforms. This includes single-cell sequencing experiments as well as sequencing-based spatial transcriptomics data (e.g. 10x Visium or GeoMx); see the Spatial Transcriptomics Submission Guide. Popular experiment types include transcription profiling (mRNA and miRNA), SNP genotyping, chromatin immunoprecipitation (ChIP), single-cell RNA-seq or ATAC-seq, and many others. A full list of experiment types in ArrayExpress is available.
For NGS submissions, raw data files are brokered to the European Nucleotide Archive (ENA), while ArrayExpress archives any processed data. Special rules apply to potentially identifiable human data (see below).
We currently do not accept metagenomics/metatranscriptomics data (please submit to the EBI Metagenomics service) or de novo transcriptome assembly data (the raw RNA-seq reads should be submitted to ArrayExpress, but the assembled transcriptome file should go directly to ENA).
The aim is for every ArrayExpress data set to be understandable and reproducible on its own, so that users have everything they need without referring to an associated paper.
Microarray submissions follow the "Minimum Information About a Microarray Experiment" (MIAME) guidelines. Sequencing submissions follow "Minimum Information About a Sequencing Experiment" (MINSEQE), and single-cell experiments follow "Minimum Information about a Single-Cell Experiment" (MINSCE).
As a submitter, you may need to consult with your colleagues — for example, collaborators or the core facility personnel performing microarray hybridisation or sequencing for you — to gather all the detailed information needed for a successful submission.
Sequencing: compressed raw sequence read files (e.g. fastq.gz). See accepted sequencing raw data files.
Microarray: files obtained directly from the microarray scanner (e.g. Affymetrix CEL files, Agilent feature extraction txt files, Illumina idat files). See the full list of accepted microarray raw data formats.
Submissions without raw data files will not be accepted except in exceptional circumstances. Special rules apply to potentially identifiable human data: see the Identifiable human material section below.
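Before uploading, it can help to check that every file you plan to attach looks like a raw data file rather than a processed one. The Python sketch below does this by file-name suffix only. The suffix table is an assumption built solely from the examples named above (fastq.gz, CEL, idat, Agilent txt); it is not the official list of accepted formats, which is linked from the pages mentioned.

```python
from pathlib import Path
from typing import List, Optional

# Assumed suffix table, built only from the examples in this guide.
# It is NOT the official list of accepted raw data formats.
RAW_SUFFIXES = {
    ".fastq.gz": "sequencing reads (compressed FASTQ)",
    ".cel": "microarray (Affymetrix CEL)",
    ".idat": "microarray (Illumina idat)",
    # Plain .txt is ambiguous: it may or may not be an Agilent
    # feature extraction file, so it is flagged for a manual check.
    ".txt": "possibly microarray (Agilent feature extraction); check manually",
}


def classify_raw_file(filename: str) -> Optional[str]:
    """Return a description of the raw data type, or None if unrecognised."""
    name = Path(filename).name.lower()
    for suffix, description in RAW_SUFFIXES.items():
        if name.endswith(suffix):
            return description
    return None


def unrecognised_files(filenames: List[str]) -> List[str]:
    """List files whose names match none of the assumed raw data suffixes."""
    return [f for f in filenames if classify_raw_file(f) is None]


if __name__ == "__main__":
    files = ["liver_R1.fastq.gz", "chip1.CEL", "counts_matrix.tsv", "reads.fastq"]
    for f in unrecognised_files(files):
        print(f"Not a recognised raw data file: {f}")
```

In this example, the processed `counts_matrix.tsv` and the uncompressed `reads.fastq` are both flagged, since the guide asks for compressed read files.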
Certain types of submissions need extra steps or workarounds. Please review the scenarios below before you begin your submission in Annotare. If none of these apply, you can go ahead and start your submission straight away, following technology-specific guidance (see the Annotare step-by-step section of our submission guide).
Data from human samples and individuals that could potentially lead to donor identification (e.g. genomic DNA sequences) can be submitted to ArrayExpress only if consent has been given for the data to be released publicly. Such approvals are typically given by the relevant ethics committees, and ensuring this consent is in place is the responsibility of the submitter.
Identifiable data approved for controlled access should be submitted directly to the European Genome-phenome Archive (EGA), not ArrayExpress. It is possible to submit identifiable data (e.g. raw sequences) to the EGA while the related processed data (e.g. RPKM values) are submitted to ArrayExpress. It is the submitter’s responsibility to ensure that such a split submission complies with the relevant ethics requirements.
To submit the processed data via Annotare, create a new submission and include the sample annotation and processed data files. Make sure to add the EGA study accession and, if available, the EGA sample, experiment, and run accessions as sample attributes. We can then add a link to the corresponding EGA study page. Email us at annotare@ebi.ac.uk if you have questions about this type of submission.
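When adding EGA accessions as sample attributes, a quick format check can catch copy-paste mistakes such as a sample accession pasted into the study field. The Python sketch below assumes the usual EGA accession shape ("EGA", a one-letter object type, then 11 digits, e.g. EGAS00001000001) with the prefixes EGAS (study), EGAN (sample), EGAX (experiment) and EGAR (run). The attribute labels it produces are hypothetical placeholders; use whatever attribute names Annotare presents.

```python
import re
from typing import Dict, Optional

# Assumed EGA accession prefixes; each is followed by 11 digits.
EGA_PREFIXES = {"study": "EGAS", "sample": "EGAN", "experiment": "EGAX", "run": "EGAR"}


def ega_sample_attributes(
    study: str,
    sample: Optional[str] = None,
    experiment: Optional[str] = None,
    run: Optional[str] = None,
) -> Dict[str, str]:
    """Build EGA accession attributes for one sample, checking each format.

    The study accession is required; the others are added only if given.
    The attribute labels ("EGA study accession", ...) are hypothetical.
    """
    if not study:
        raise ValueError("An EGA study accession is required")
    accessions = {"study": study, "sample": sample, "experiment": experiment, "run": run}
    attributes = {}
    for kind, accession in accessions.items():
        if accession is None:
            continue  # optional accession not available
        if not re.fullmatch(EGA_PREFIXES[kind] + r"\d{11}", accession):
            raise ValueError(f"{accession!r} does not look like an EGA {kind} accession")
        attributes[f"EGA {kind} accession"] = accession
    return attributes
```

For example, `ega_sample_attributes("EGAS00001000001", run="EGAR00001000002")` returns attributes for the study and run only, while passing an EGAN accession as the study raises a `ValueError`.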
Annotare currently does not handle more than 1,000 samples per submission, so a workaround is needed. Fill in as much information as possible in the Annotare form, including protocols and the experiment description. You do not need to create all 1,000+ samples — create just 4–10 representative samples with their attributes.
Upload all of your data files via the FTP server and connect the uploaded files to the first few samples you have created. This should allow you to pass validation and submit normally. Use the feedback box to let us know that you have more samples to add, and we will send you the Sample and Data Relationship Format (SDRF) spreadsheet so you can fill in the remaining sample information.
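Once you receive the SDRF spreadsheet, the remaining rows can be filled in by script rather than by hand. The Python sketch below assumes the SDRF arrives as tab-delimited text with a header row followed by the representative samples you created in Annotare. Each new row is copied from the last existing row (so shared values such as protocol references carry over), and only the columns you name are replaced. The column names in the usage example are illustrative.

```python
import csv
import io
from typing import Dict, List

ANNOTARE_SAMPLE_LIMIT = 1000  # per-submission limit stated in this guide


def needs_sdrf_workaround(n_samples: int) -> bool:
    """True if the sample count is above the Annotare limit."""
    return n_samples > ANNOTARE_SAMPLE_LIMIT


def append_samples(sdrf_text: str, new_samples: List[Dict[str, str]]) -> str:
    """Append one row per new sample to a tab-delimited SDRF.

    Each new row starts as a copy of the last existing row, then every
    column named in the sample dict is overwritten. If a column name
    appears more than once in the header, every occurrence is overwritten.
    """
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    if len(rows) < 2:
        raise ValueError("SDRF needs a header row and at least one sample row")
    header, template = rows[0], rows[-1]
    for sample in new_samples:
        unknown = set(sample) - set(header)
        if unknown:
            raise KeyError(f"columns not in SDRF header: {sorted(unknown)}")
        rows.append([sample.get(col, value) for col, value in zip(header, template)])
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sdrf = "Source Name\tCharacteristics[organism]\tProtocol REF\ns1\tHomo sapiens\texample protocol\n"
    print(append_samples(sdrf, [{"Source Name": "s2"}]))
```

Because the template row is copied, any per-sample value you do not override (for example a data file name) would be duplicated from the last sample, so name every sample-specific column in each dict.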
Annotare is designed for each submission to contain just one experiment technology type: you will be asked to specify the technology and, if applicable, an array design, and this choice then applies to the entire submission. If you have used different types of technology for the same set of samples (e.g. microarray plus sequencing, two different array platforms, or sequencing and single-cell sequencing), please create separate submissions in Annotare. Give each submission a distinct title, even if they belong to the same study, to avoid processing mistakes.
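If you are planning several submissions for one study, it can help to list which samples go into which submission before starting in Annotare. The Python sketch below groups (sample, technology) pairs into one planned submission per technology and derives a distinct title for each by appending the technology to the shared study title. The technology labels and the title pattern are hypothetical choices for illustration, not values Annotare requires.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_submissions(
    study_title: str, assays: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group (sample, technology) pairs into one submission per technology.

    Each title appends the technology label to the shared study title, so
    titles stay distinct even when the same samples appear in several
    submissions.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample, technology in assays:
        groups[technology].append(sample)
    return {f"{study_title} ({tech})": samples for tech, samples in groups.items()}


if __name__ == "__main__":
    assays = [("s1", "microarray"), ("s1", "sequencing"), ("s2", "microarray")]
    for title, samples in plan_submissions("Liver study", assays).items():
        print(title, samples)
```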
If your samples were run on more than one assay of broadly the same technology type (e.g. RNA-seq and ATAC-seq, which are different assays but both bulk high-throughput sequencing), you can deposit these data together in a single submission, but please observe the following: