This guide explains how to upload data files for your ArrayExpress submission through Annotare. The upload method you choose depends on your file sizes and the number of files in your submission.
Direct upload through the Annotare web interface is the simplest method and is suitable for individual files smaller than 1 GB. For larger files or submissions with many files, use one of the alternative upload methods described in Section 2.
Tip: Use time-saving features such as uploading multiple files in one go and bulk-assigning them to samples using the Paste Into Column button. (Watch a video tutorial here)
For files 1 GB or larger, or for submissions with many files (even if individual files are under 1 GB), use one of the following upload methods: FTP, Aspera, or Globus.
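The choice between direct upload and the alternative methods can be sketched as a small helper. This is only an illustration: the guide does not say how many files count as "many", so the MANY_FILES cut-off below is a hypothetical value, and treating 1 GB as 1024**3 bytes is likewise an assumption.

```python
ONE_GB = 1024 ** 3   # assumption: 1 GB interpreted as 2**30 bytes
MANY_FILES = 50      # hypothetical cut-off; the guide gives no exact number


def suggest_upload_method(sizes_in_bytes, many_files=MANY_FILES):
    """Suggest an upload route from a list of file sizes in bytes."""
    sizes = list(sizes_in_bytes)
    has_large_file = any(size >= ONE_GB for size in sizes)
    has_many_files = len(sizes) >= many_files
    if has_large_file or has_many_files:
        return "FTP, Aspera, or Globus"
    return "direct upload"
```

For example, suggest_upload_method([200_000_000, 3 * ONE_GB]) points to the alternative methods because one file is larger than 1 GB.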
Globus provides efficient and reliable transfer of large data files. It is the recommended method for very large datasets.
When accessing Globus for the first time, you have three authentication options:

Once you have a Globus account, you can transfer files using one of the following methods:
This is the recommended option for transferring multiple large files.
Download and install the Globus Connect Personal client for your operating system:
Follow the installation instructions to set up your local endpoint/collection.
Important: Please select only files, not directories.

For more information about sharing data via Globus: https://www.globus.org/globus-connect-personal
FTP (File Transfer Protocol) is a standard method for transferring files. Ensure you are connected to the internet via a physical (wired) connection rather than wireless for best results.
Server: ftp-private.ebi.ac.uk
Username: annotare
Password: annotare1
We recommend using FileZilla as an FTP client for Windows users. Windows File Explorer is not recommended as it may not support all FTP features and can be less secure.
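In an FTP client such as FileZilla, connect with the following settings:
Host: ftp-private.ebi.ac.uk
Username: annotare
Password: annotare1
Port: 21
Then open your submission's upload directory (e.g. /prod/ibtd1rmo-20r7k3g747sup) and transfer your files into it.
To upload from the command line instead:
Connect to the server: ftp -p ftp-private.ebi.ac.uk
Log in with the username annotare and password annotare1.
Change to your submission's upload directory (e.g. /prod/ibtd1rmo-20r7k3g747sup). Type: cd /prod/your-directory-name
Upload your files: put filename (single file) or mput * (multiple files).
Type quit to exit. You will see a message confirming whether the transfer was successful.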
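The same FTP steps can also be scripted. The following is a minimal sketch using Python's standard ftplib module, not an official Annotare tool. The server and credentials are the ones given in this guide; the function names, the directory placeholder and the example file name are hypothetical.

```python
import os
from ftplib import FTP


def upload_files(ftp, remote_directory, local_paths):
    """Upload local files into remote_directory over an open FTP connection."""
    ftp.cwd(remote_directory)  # equivalent to: cd /prod/your-directory-name
    uploaded = []
    for path in local_paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            # Binary transfer, equivalent to: put filename
            ftp.storbinary(f"STOR {name}", handle)
        uploaded.append(name)
    return uploaded


def main():
    # Placeholder directory and file name; replace with your own values.
    with FTP("ftp-private.ebi.ac.uk") as ftp:
        ftp.login("annotare", "annotare1")
        ftp.set_pasv(True)  # passive mode, as with "ftp -p"
        upload_files(ftp, "/prod/your-directory-name", ["sample1.fastq.gz"])
```

Call main() to run the transfer. Keeping upload_files separate from the connection setup makes it possible to try the logic against a stand-in connection object before touching the real server.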
Note for Mac users: The standard FTP client on Mac (embedded in Finder) may fail due to incompatible settings. Consider using an alternative FTP client such as Cyberduck.
Aspera is a commercial file transfer protocol that provides significantly better transfer speeds than FTP, especially over long distances. It is recommended for very large files or when FTP performance is inadequate.
Install the Aspera command-line client in a location of your choice (e.g. C:\aspera on Windows or /opt/aspera on Mac/Linux).
Open a terminal and change to the ./aspera/bin directory where the ascp command-line tool is located.
Note your submission's upload directory (e.g. prod/abcd1efg-hijk2lmno3pqr).
Run: ascp -P 33001 -c aes128gcm <yourfile.txt> annotare@fasp.ebi.ac.uk:<your upload directory>/
Password: annotare1
For more information on using Aspera, see: How to transfer files to and from EMBL-EBI using IBM Aspera (EMBL Service Now KB0011565).
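When several files need to be sent, the ascp call can be assembled programmatically. The sketch below only rebuilds the command shown in this guide (same port, cipher option, user and host); the function names are hypothetical, and it assumes ascp is on your PATH or that you pass its full path. The password is not handled here and is expected to be entered when ascp asks for it.

```python
import subprocess


def build_ascp_command(local_file, upload_directory, ascp="ascp"):
    """Return the ascp argument list for one file, mirroring the guide's command."""
    destination = f"annotare@fasp.ebi.ac.uk:{upload_directory.rstrip('/')}/"
    return [ascp, "-P", "33001", "-c", "aes128gcm", local_file, destination]


def run_ascp(local_file, upload_directory, ascp="ascp"):
    """Run ascp for one file and raise CalledProcessError if it fails."""
    command = build_ascp_command(local_file, upload_directory, ascp)
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.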
Before registering files uploaded via FTP, Aspera, or Globus, you must calculate the MD5 checksum for each file. The checksum is a hexadecimal "fingerprint" for a file (e.g., eef75461035fb66d9173799d4e26ea97). Annotare uses these checksums to verify file integrity after transfer.
You can watch a short video tutorial demonstrating how to calculate MD5 checksums for your files and use them to register your uploaded files in Annotare here.
Open Command Prompt or PowerShell and run:
certutil -hashfile <filename> MD5
Alternatively, use a tool such as WinMD5.
Open Terminal and run:
md5 <filename>
For more details: Mac MD5 guide.
Open a terminal and run:
md5sum <filename>
For more details: Linux MD5 guide.
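If you prefer one method that works the same way on Windows, Mac and Linux, the checksum can also be computed with Python's standard hashlib module. This is a minimal sketch; the function name md5_checksum is our own, and the file is read in chunks so that large sequencing files do not need to fit in memory.

```python
import hashlib


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 checksum of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same 32-character hexadecimal string that certutil, md5 and md5sum report for the file.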
This step is required for all files uploaded via FTP, Aspera, or Globus. After transferring your files, you must register them in Annotare with their MD5 checksums so that Annotare can verify the files are present and intact.
Note: Calculate each checksum on the exact file that was transferred (e.g. the compressed .fastq.gz file), not on the uncompressed file.
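For submissions with many files, it can help to prepare the file names and checksums in one go before registering them. The sketch below lists every file in a local directory together with its MD5 checksum. The two-column, tab-separated layout is an assumption made for illustration, so check the registration dialog in Annotare for the format it actually expects; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def checksum_table(directory, pattern="*"):
    """Return (file name, MD5 checksum) pairs for the files in directory."""
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip sub-directories
        digest = hashlib.md5()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        rows.append((path.name, digest.hexdigest()))
    return rows


def as_tab_separated(rows):
    """Format the pairs as one 'name<TAB>checksum' line per file (assumed layout)."""
    return "\n".join(f"{name}\t{checksum}" for name, checksum in rows)
```

Run it on the directory that holds the files exactly as they were transferred, for example print(as_tab_separated(checksum_table("uploads", "*.fastq.gz"))), where "uploads" is a placeholder directory name.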
Once the file registration is complete, your files will appear in the upload pane in Annotare with the status 'uploaded', which means they are ready to be assigned to samples - see Section 3.
A submission is not complete without specifying which data file belongs to which sample. After uploading (and registering, if applicable), you must assign files to samples.
Note: Files that are not assigned to any sample are not included in the submission and are not carried forward to processing.
Four types of data files are distinguished: raw, raw matrix, processed, and processed matrix.
Note: Data file types available for file assignment differ for different technology templates in Annotare.
Sequencing experiments should contain individual raw sequencing read files for each sample. To assign raw sequencing files, select the file type "Raw". Then fill the column with the name of the corresponding file for each sample.
For experiments using paired-end sequencing, two individual raw read files should be provided. To assign both files to one sample, create two "Raw" data file columns by clicking on Assign Files... twice. Then link the two corresponding sequencing files to each sample.
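Before filling the two "Raw" columns, it can be useful to work out which two files belong to each sample. The sketch below groups file names into pairs. It assumes a naming convention of <sample>_1 / <sample>_2 (or _R1 / _R2) followed by .fastq or .fq, optionally with .gz; this convention is not required by Annotare, and files that do not match it are simply ignored. The function and pattern names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: sampleA_1.fastq.gz / sampleA_2.fastq.gz, or _R1 / _R2.
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R?(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_reads(filenames):
    """Map each sample name to a (first mate, second mate) tuple of file names."""
    mates_by_sample = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            mates_by_sample[match.group("sample")][int(match.group("mate"))] = name
    # A missing mate is reported as None so incomplete pairs are easy to spot.
    return {
        sample: (mates.get(1), mates.get(2))
        for sample, mates in sorted(mates_by_sample.items())
    }
```

The first element of each tuple would go into the first "Raw" column and the second element into the second "Raw" column for that sample.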

For one-colour microarray experiments the raw output files can be either individual files per sample or a matrix file (e.g. Illumina arrays).
If you have individual files per sample, create a "Raw Data File" column and assign each sample to the respective file. For raw data matrices, all samples need to be linked to the same data file. To do so, create a "Raw Matrix Data File" column and select the name of the matrix file. Use Fill Down Value to propagate the file name to all samples.

Two-colour microarray experiments need extra attention when assigning data files to samples. The basic assumption is that the data for both channels are stored in the same file (2 samples → 1 hybridisation → 1 file). See the two-colour microarrays help page for more details.
The basic steps are:

For two-colour microarray experiments that use a common reference sample, duplicate the reference sample once for every array on which it was used. Then assign one test sample and one copy of the reference sample to each file. If the numbering of the duplicated reference samples should be removed, leave a comment about this in the feedback dialogue after submission.

For a dye-swap design, leave both labels selected for each sample in "Create labeled extracts and assign labels". The list of labelled extracts should show each sample twice, each time with the respective label. Follow the same instructions as above to link the data files.

"Processed" data refers to all data files and formats that are derived from the raw files and have been manipulated in any way (e.g. background correction, log2 transformation, normalisation, read trimming).
To add processed data files, create another column by pressing Assign Files.... Then select "Processed" if you have a separate file for each sample, or "Processed Matrix" if you have a file which contains processed data for more than one sample. Proceed to select the filenames of your uploaded processed data files for the relevant sample(s).
If you would like to include any additional data types, please upload the files and leave a note in the feedback dialogue after submission, and a curator will add the files as a supplement to the submission.


There are two different types of technical replicates:
If the same biological sample was tested on two different arrays, please create separate sample rows and assign one file to each of these technical replicates. Name your replicate samples with the same prefix and then add e.g. "_techrep1"/"_techrep2".
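The naming scheme for replicate rows can be generated rather than typed by hand. This small sketch follows the "_techrep1"/"_techrep2" suffix suggested above; the function names and the example sample and file names are hypothetical.

```python
def technical_replicate_names(sample_name, count):
    """Return sample names with _techrep1, _techrep2, ... suffixes."""
    return [f"{sample_name}_techrep{index}" for index in range(1, count + 1)]


def assign_replicate_files(sample_name, files):
    """Pair each data file with its own technical-replicate sample row."""
    names = technical_replicate_names(sample_name, len(files))
    return list(zip(names, files))
```

For example, assign_replicate_files("liver_A", ["array1.txt", "array2.txt"]) yields one row per array, each with a single file assigned.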