Understanding Your File Validation Error Report


1. What is the validation error email?

When you submit sequencing data files, they are automatically checked before being forwarded to the European Nucleotide Archive (ENA). If any problems are detected, you will receive an email listing each error so that you can correct them and re-submit.

This guide explains how to read that email, what each error code means, and the steps you should take to fix the problem.


2. How to read the error report email

Each error is reported on its own line and follows this pattern:

[ERROR CODE]: filename( checksum ) - line N: error detail

Part                    What it tells you
[ERROR CODE]            The type of problem detected (see Section 3)
filename( checksum )    The name of your file, plus a unique fingerprint (MD5 checksum) to identify it
line N                  The line number inside the file where the problem was found
error detail            A short description of what went wrong

Example line from an error report:

[INVALID_FASTQ]: file.fastq.gz ( 3a81d5cb9606b6feceb9e5e8315ce214 ) - line 4: wrong header >novel.1

This tells you that the file file.fastq.gz is not in valid FASTQ format: line 4 contains a header (>novel.1) that is not recognised.
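If your email lists many errors, it can help to split each line into its parts automatically. The following is a minimal sketch based only on the pattern shown above. The names REPORT_LINE and parse_report_line are made up for this example, and treating the "line N:" part as optional is an assumption, since some error types may not report a line number.

```python
import re

# Layout assumed from the pattern described in this guide:
# [ERROR CODE]: filename ( checksum ) - line N: error detail
REPORT_LINE = re.compile(
    r"^\[(?P<code>[A-Z_]+)\]:\s*"
    r"(?P<filename>.+?)\s*"
    r"\(\s*(?P<md5>[0-9a-fA-F]{32})\s*\)\s*-\s*"
    r"(?:line\s+(?P<line>\d+):\s*)?"   # assumption: the line part may be absent
    r"(?P<detail>.*)$"
)


def parse_report_line(text):
    """Return the parts of one error line as a dict, or None if it does not match."""
    match = REPORT_LINE.match(text.strip())
    if match is None:
        return None
    parts = match.groupdict()
    if parts["line"] is not None:
        parts["line"] = int(parts["line"])
    return parts


example = (
    "[INVALID_FASTQ]: file.fastq.gz ( 3a81d5cb9606b6feceb9e5e8315ce214 ) "
    "- line 4: wrong header >novel.1"
)
print(parse_report_line(example))
```

For the example line, the result contains the error code, the filename, the MD5 checksum, the line number as an integer, and the error detail.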


3. Error code reference

[INVALID_FASTQ] — "wrong header"

What it means: A sequence record has a header line that does not match the expected FASTQ format.

Likely cause: The file is not in valid FASTQ or unaligned BAM format, or it was given the wrong file extension.

What to do: Check your data file assignments: only files in FASTQ or unaligned BAM format can be assigned as raw data files. Upload raw data files in genuine FASTQ format. Each record in a FASTQ file must have exactly four lines: a header line starting with @, the sequence, a line starting with +, and the quality scores.
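You can check the four-line record structure on your own computer before uploading. The sketch below is not the validator used by ENA. It only tests the two rules stated above (header lines start with @, third lines start with +) and assumes each record occupies exactly four lines with no blank lines in between. The function name first_structure_problem is made up for this example.

```python
import gzip


def first_structure_problem(path):
    """Return (line_number, message) for the first malformed line, or None."""
    # Files ending in .gz are read through gzip; anything else as plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    number = 0
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            position = (number - 1) % 4  # 0 header, 1 sequence, 2 plus line, 3 quality
            if position == 0 and not line.startswith("@"):
                return number, "header does not start with @: " + line.strip()
            if position == 2 and not line.startswith("+"):
                return number, "expected a line starting with +, found: " + line.strip()
    if number % 4 != 0:
        return number, "file ends in the middle of a record"
    return None
```

Calling first_structure_problem("file.fastq.gz") on a file that contains a FASTA-style header such as >novel.1 where a FASTQ header is expected returns the number of that line together with a short message.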


[INVALID_FASTQ] — "sequence and quality don't have the same length"

What it means: The number of bases in the sequence line does not match the number of quality score characters.

Likely cause: The file was truncated or corrupted during upload.

What to do: Re-upload the original file. If the error persists, regenerate the file from your source data.
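To find out whether the problem is already present in the file on your computer, you can compare the sequence and quality lengths yourself. This is a minimal sketch that assumes four lines per record. The function name length_mismatches is made up for this example.

```python
import gzip
from itertools import islice


def length_mismatches(path):
    """List (quality_line_number, header, sequence_length, quality_length) for bad records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    problems = []
    record_number = 0
    with opener(path, "rt") as handle:
        while True:
            # Read the next four lines; a shorter group means the end of the file.
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break
            record_number += 1
            header, sequence, _, quality = record
            if len(sequence) != len(quality):
                problems.append(
                    (record_number * 4, header, len(sequence), len(quality))
                )
    return problems
```

An empty list means every complete record has matching lengths. If the local file passes but the uploaded copy failed, the upload is the more likely culprit.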


[INVALID_FASTQ] — "unpaired read"

What it means: The FASTQ file validation process cannot identify matching reads across the two (or more) files expected for PAIRED library layout.

Likely cause: Files were assigned to the wrong samples, for example only one file was found for a paired-end library.

What to do: Check your file-to-sample assignment and library layout in Annotare. For paired-end libraries, both files must be linked to the same sample (see Assign Files to Samples). If both the file assignment and library layout are correct, check that the file names comply with the file naming requirements (see Accepted Sequencing Raw Data).
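If you want to confirm that two files really belong together, you can compare their read names. The sketch below rests on assumptions that may not match the actual validator: reads are taken to be in the same order in both files, and the read name is taken to be the first word of the header with any trailing /1 or /2 removed. The function names are made up for this example.

```python
import gzip
from itertools import islice, zip_longest


def read_names(path):
    """Yield the read name from every header line (every fourth line)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for header in islice(handle, 0, None, 4):
            fields = header.split()
            name = fields[0][1:] if fields else ""  # drop the leading @
            if name.endswith(("/1", "/2")):          # assumed pair suffix
                name = name[:-2]
            yield name


def first_unpaired(path_1, path_2):
    """Return (record_number, name_1, name_2) at the first mismatch, or None."""
    pairs = zip_longest(read_names(path_1), read_names(path_2))
    for index, (name_1, name_2) in enumerate(pairs, start=1):
        if name_1 != name_2:
            return index, name_1, name_2
    return None
```

A result in which one of the two names is None means that one file has fewer reads than the other.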


[INVALID_FASTQ] — "duplicated sequence"

What it means: The same read appears more than once in the same file.

Likely cause: FASTQ files are expected to contain unique read identifiers. Validation fails when the same identifier is found more than once within a file.

What to do: Check and correct the affected FASTQ files and then resubmit them to Annotare.
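To see which identifiers are repeated, you can count them on your own computer. This is a rough sketch: it treats the first word of each header line as the identifier, which is an assumption, and it keeps all identifiers in memory, so it may be slow or memory-hungry on very large files. The function name duplicated_read_names is made up for this example.

```python
import gzip
from collections import Counter
from itertools import islice


def duplicated_read_names(path):
    """Return {identifier: count} for identifiers that occur more than once."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Header lines are every fourth line, starting with the first.
        counts = Counter(
            header.split()[0]
            for header in islice(handle, 0, None, 4)
            if header.strip()
        )
    return {name: count for name, count in counts.items() if count > 1}
```

An empty dictionary means no identifier is repeated under this definition.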


[UNSUPPORTED_FILE_TYPE]

What it means: The file format is not one of the accepted types.

Likely cause: Wrong file format or an unsupported compression method.

What to do: Provide files in an accepted format (FASTQ or BAM) with a supported compression method. See Accepted Sequencing Raw Data.
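A file extension can be misleading, so it may help to look at the first few bytes of the file instead. The sketch below only makes a rough guess from well-known file signatures (gzip, bzip2, BAM). It does not decide whether a format is accepted; that is defined in Accepted Sequencing Raw Data. The function name and the labels it returns are made up for this example.

```python
import gzip


def sniff_file_type(path):
    """Guess the file type from its first bytes; returns a short label."""
    with open(path, "rb") as handle:
        start = handle.read(4)
    if start[:2] == b"\x1f\x8b":               # gzip signature (BAM uses it too)
        with gzip.open(path, "rb") as handle:
            inner = handle.read(4)
        if inner == b"BAM\x01":                # BAM magic after decompression
            return "bam"
        if inner[:1] == b"@":                  # FASTQ headers start with @
            return "gzip, content looks like FASTQ"
        return "gzip, unrecognised content"
    if start[:3] == b"BZh":                    # bzip2 signature
        return "bzip2"
    if start[:1] == b"@":
        return "uncompressed, looks like FASTQ"
    return "unrecognised"
```

A label that does not match the file's extension is a hint that the file was renamed or exported in a different format than intended.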


[PARAMETER_FAILURE] — "invalid compressed data" / "file truncated" / "unexpected end of file"

What it means: The compressed file cannot be read fully — it appears incomplete or damaged.

Likely cause: The file upload was interrupted before it completed.

What to do: Re-upload the same file. Make sure your internet connection is stable during the upload.
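Before re-uploading, it is worth confirming that the copy on your computer is intact, because re-uploading a damaged file will fail again. This minimal sketch reads a gzip-compressed file from start to finish with Python's standard gzip module and reports the first problem it meets. The function name gzip_problem is made up for this example.

```python
import gzip
import zlib


def gzip_problem(path, chunk_size=1 << 20):
    """Return None if the file decompresses fully, otherwise a short description."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read the whole stream in chunks without keeping it in memory.
            while handle.read(chunk_size):
                pass
    except EOFError:
        return "file truncated: compressed data ends early"
    except (gzip.BadGzipFile, zlib.error) as error:
        return "invalid compressed data: " + str(error)
    return None
```

If this returns None for your local file, the file itself is readable and the upload was the likely point of failure. Otherwise, regenerate or re-compress the file first.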


[PARAMETER_FAILURE] — "Aligned reads found"

What it means: A BAM file contains reads that have already been mapped to a reference genome. ENA only accepts unaligned (raw) reads in BAM format.

Likely cause: A pre-aligned BAM file was submitted by mistake.

What to do: Either generate a new BAM file containing only unaligned reads, or upload the original unprocessed FASTQ.gz files instead. See BAM specification.
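To check a BAM file yourself, you can count the records that are not flagged as unmapped. The sketch below reads the BAM binary layout directly with the Python standard library. It assumes that "aligned" means the unmapped flag (0x4) is not set, which may differ from the exact rule the validator applies. The function name count_aligned_reads is made up for this example.

```python
import gzip
import struct


def count_aligned_reads(path):
    """Count BAM records whose 'unmapped' flag (0x4) is not set."""
    aligned = 0
    # BAM files are BGZF-compressed, which the gzip module can read.
    with gzip.open(path, "rb") as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (header_length,) = struct.unpack("<i", bam.read(4))
        bam.read(header_length)                     # skip the header text
        (reference_count,) = struct.unpack("<i", bam.read(4))
        for _ in range(reference_count):
            (name_length,) = struct.unpack("<i", bam.read(4))
            bam.read(name_length + 4)               # name plus reference length
        while True:
            size_bytes = bam.read(4)
            if len(size_bytes) < 4:                 # end of file
                break
            (block_size,) = struct.unpack("<i", size_bytes)
            block = bam.read(block_size)
            # The flag is a 16-bit field at byte offset 14 of each record.
            (flag,) = struct.unpack_from("<H", block, 14)
            if not flag & 0x4:
                aligned += 1
    return aligned
```

A result of 0 means every record is flagged as unmapped. Any other number means the file contains reads that carry alignment information.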


4. Step-by-step: fixing and re-submitting

Step 1 — Read the email carefully
Note every error code and the filename(s) affected. If the same MD5 checksum appears on more than one line, those lines refer to identical file content, and that content has triggered more than one error.
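If you are unsure which of your local files an error line refers to, you can compute the MD5 checksum of each candidate and compare it with the value in the email. This sketch assumes the reported checksum was calculated over the file exactly as uploaded (for example the compressed .gz file, not its decompressed content). The function names are made up for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 checksum of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large sequencing files do not fill the memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_reported_files(local_paths, reported_md5):
    """Return the local paths whose checksum equals the one in the email."""
    wanted = reported_md5.strip().lower()
    return [path for path in local_paths if md5_of_file(path) == wanted]
```

If no local file matches, the copy on your computer differs from the one that was validated, which can itself be a sign that the upload was incomplete.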

Step 2 — Identify the problem
Match the error code in the email to the descriptions in Section 3 to understand what is wrong and what you need to do.

Step 3 — Fix or replace the file(s)
Make the necessary corrections on your computer before re-uploading. Do not simply rename the file — the underlying data must be corrected.

Step 4 — Wait for Annotare to re-open
You will receive a separate email from Annotare confirming that your submission form has been re-opened for editing.

Step 5 — Upload corrected files and re-assign them
Log in to Annotare, upload your corrected files, and assign them to the correct sample(s). For paired-end libraries, both files must be linked to the same sample.

Step 6 — Re-submit
Once all files are correctly assigned, submit again. The validation checks will re-run automatically.



5. Still need help?

If you have worked through the steps above and are still unsure how to resolve your error, reply directly to the validation error email and the Annotare/ArrayExpress team will be happy to assist you.