Please provide individual unprocessed raw data files for each sample, in FASTQ or BAM format, and prepare your files according to ENA specifications. This is a rapidly developing field, so please check the specifications every time you submit a new experiment. Data files that do not satisfy ENA's requirements will not be accepted.
To check that your BAM files contain only unaligned reads, you can run the following commands:
samtools view -c -F 4 bam_file
(counts how many reads are aligned and should return 0)
samtools view -c -f 4 bam_file
(counts how many reads are unaligned and should return at least 1)
If your BAM files contain mapped reads, then please either create unmapped BAM files, or send us the original read files (e.g. fastq.gz files) as raw data files. BAM files containing mapped reads can be included in your submission as processed files, as long as they satisfy ENA's specification and the reference genome used for alignment has been accessioned in the International Nucleotide Sequence Database Collaboration (INSDC, involving DDBJ, ENA, and GenBank).
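The two samtools checks described above can be combined into one pass/fail test per file. The following is a minimal sketch, not an official validation tool: the function name check_unaligned_bam is hypothetical, and it assumes samtools is installed and on your PATH.

```shell
# check_unaligned_bam FILE
# Hypothetical helper (not part of samtools or Annotare).
# Prints a one-line verdict; returns 0 only if FILE holds unaligned reads only,
# 1 if the counts fail the check, 2 if samtools itself fails.
check_unaligned_bam() {
    bam_file=$1

    # -F 4 excludes reads flagged as unmapped, so this counts aligned reads.
    aligned=$(samtools view -c -F 4 "$bam_file") || return 2
    # -f 4 keeps only reads flagged as unmapped, so this counts unaligned reads.
    unaligned=$(samtools view -c -f 4 "$bam_file") || return 2

    if [ "$aligned" -eq 0 ] && [ "$unaligned" -ge 1 ]; then
        echo "OK: $bam_file ($unaligned unaligned reads)"
        return 0
    fi

    echo "NOT OK: $bam_file ($aligned aligned, $unaligned unaligned reads)"
    return 1
}
```

After defining the function in your shell, you could loop over your raw files with something like: for f in *.bam; do check_unaligned_bam "$f"; done. Any file reported as NOT OK should be handled as described above.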
If you are submitting single-cell sequencing data, please check the single-cell raw data file requirements, as a few special rules apply to certain types of single-cell experiments.
We accept all commonly used processed sequencing data or analysis files. There is no need to compress or zip up these files one by one or as a bundle. Upload them in Annotare and assign them to your samples in the same way as you would for raw files, choosing "Processed" or "Processed Matrix" as the file type.
Data analysis commonly produces data matrices, e.g. a table with FPKM values, raw count values or output from differential expression analysis, with genes in rows and samples in columns.
Please save any matrix files in tab-delimited text format (not Excel) and use the file extension .txt.
Also make sure that the sample names in your matrix file match the sample names used in Annotare.
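One way to check the sample names before uploading is to compare the header row of the matrix against a list of the names you entered in Annotare. The sketch below is only an illustration under stated assumptions: the function name missing_samples is hypothetical, the matrix is assumed to have its sample names in the first row with gene identifiers in the first column, and the sample list is assumed to be a plain text file you prepare yourself, one name per line.

```shell
# missing_samples MATRIX SAMPLE_LIST
# Hypothetical helper (not part of Annotare).
# Prints every name in SAMPLE_LIST that does not appear in the header row
# of the tab-delimited MATRIX file. No output means every name was found.
missing_samples() {
    matrix=$1
    sample_list=$2

    # First row of the matrix, with any Windows carriage return removed.
    header=$(head -n 1 "$matrix" | tr -d '\r')

    HEADER=$header awk '
        BEGIN {
            n = split(ENVIRON["HEADER"], cols, "\t")
            # Column 1 is assumed to hold gene identifiers, so start at 2.
            for (i = 2; i <= n; i++) in_header[cols[i]] = 1
        }
        # Tolerate Windows line endings in the sample list as well.
        { sub(/\r$/, "") }
        # Report non-empty names that are absent from the header row.
        $0 != "" && !($0 in in_header) { print }
    ' "$sample_list"
}
```

For example, missing_samples counts_matrix.txt sample_names.txt (both file names are placeholders) would list the names that need correcting in either the matrix or Annotare. The comparison is exact, so differences in case or surrounding spaces count as mismatches.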
We also accept BAM alignment files, BED/bigWig files, and any other commonly used alignment data format.
As this is a rapidly evolving field, we also welcome other types of processed sequencing data. Ideally, the file formats should be standard in the field and non-proprietary.