
Accepted processed microarray file formats

Important notes on file preparation:

Do not compress your microarray data files.
Make sure file names contain only alphanumeric characters [A-Z, a-z, 0-9], underscores [_] and dots [.], with no whitespace, brackets, other punctuation or symbols.
Save any spreadsheet/matrix file in tab-delimited text (*.txt) format, not Excel format (*.xls, *.xlsx). If you are unfamiliar with this format, please see the OpenOffice Calc or Microsoft Excel guide.
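The file-preparation rules above can be checked before upload with a short script. The following is a minimal Python sketch, not an Annotare tool: the function name `check_file` is hypothetical, and detecting compression from the leading bytes of gzip, zip and bzip2 files is an assumption about which archive formats you are likely to encounter.

```python
import re
from pathlib import Path

# Allowed characters per the guide: letters, digits, underscore and dot only.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.]+")

# Leading bytes of common archive formats (gzip, zip, bzip2); an assumed, non-exhaustive list.
COMPRESSED_MAGIC = (b"\x1f\x8b", b"PK\x03\x04", b"BZh")

# Excel workbook extensions that should be re-saved as tab-delimited .txt.
EXCEL_SUFFIXES = {".xls", ".xlsx"}


def check_file(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means the file passed these checks."""
    problems = []
    if not ALLOWED_NAME.fullmatch(path.name):
        problems.append(f"{path.name}: name has characters other than A-Z, a-z, 0-9, '_' or '.'")
    if path.suffix.lower() in EXCEL_SUFFIXES:
        problems.append(f"{path.name}: Excel workbook; save as tab-delimited .txt instead")
    with path.open("rb") as handle:
        head = handle.read(4)  # enough bytes to recognise the magic numbers above
    if head.startswith(COMPRESSED_MAGIC):
        problems.append(f"{path.name}: file appears to be compressed")
    return problems
```

For an existing file named `sample 1.txt`, `check_file` reports the space in the name. Note that .xlsx workbooks are themselves zip archives, so they are flagged both as Excel files and as compressed.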

What are "processed" files?

Processed files are generated from raw files by procedures such as background correction, normalisation, and further statistical analyses (e.g. calculating fold changes and associated p-values). We accept either "native" processed files from microarray scanner software (e.g. ".chp" files from Affymetrix scanners, or output files from GenomeStudio software for Illumina BeadChips), or two-dimensional spreadsheet files in tab-delimited text (.txt) format. In the latter, probes/probe sets/gene names are in rows, and data from one or more hybridisations are in columns. We accept processed files in any of the following scenarios:

one processed file per hybridisation, i.e. you have a series of processed files;
one spreadsheet ("matrix") file containing normalised data from all hybridisations;
several spreadsheet ("matrix") files containing normalised data from different stages of data processing, e.g. one file containing normalised probe intensities and another containing fold-change data summarised at the gene level.

What should a processed text file look like?

In the two-dimensional table, you should have probes/genes in rows and samples/data in columns:

Probes/genes in rows: Where possible, use official probe names/identifiers matching those in the array design file as row headers, so that each row of data can be mapped to the correct probe. Put the probe identifiers in the first column under the heading Reporter Identifier (for probes) or CompositeSequence Identifier (for "composite" collations of probes, the most common example being Affymetrix probe sets). If probe identifiers are not available, try to use proper gene symbols or other identifiers (e.g. GenBank cDNA accession, UniProt protein accession).
Samples/Data in columns: Where possible, label each data column with the same sample names that you declare on the Annotare forms, or use the terms as they appear in your manuscript. This allows each column of data to be mapped to the correct sample(s).
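To confirm that rows and columns can be mapped as described, you could compare a table's labels against your array design and sample list. This sketch assumes you have already loaded the array design's probe identifiers into one set and the sample names from your Annotare forms into another; the function name `unmatched_labels` and the simple substring test used to match column labels to sample names are illustrative assumptions (for instance, "sample 1" would also match "sample 10").

```python
import csv
from pathlib import Path

# First-column headings the guide names for probe and probe-set identifiers.
ID_HEADINGS = {"Reporter Identifier", "CompositeSequence Identifier"}


def unmatched_labels(path: Path, design_ids: set[str], sample_names: set[str]):
    """Return (uses probe-ID heading?, row IDs absent from the design, unmapped column labels)."""
    with path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        row_ids = [row[0] for row in reader if row]  # skip blank lines
    uses_probe_ids = header[0] in ID_HEADINGS
    missing_rows = [rid for rid in row_ids if rid not in design_ids]
    # Simplification: a column counts as mapped if its label contains a declared sample name.
    unmapped_cols = [col for col in header[1:] if not any(name in col for name in sample_names)]
    return uses_probe_ids, missing_rows, unmapped_cols
```

A `False` first value is not necessarily an error, since gene symbols or accessions are acceptable when probe identifiers are unavailable; in that case compare the rows against whatever identifier set you used instead.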

A processed .txt file containing data from one single hybridisation should look like this:

Reporter Identifier	sample 1 normalised intensity	sample 1 background
probe_name_1	233.5	69.1
probe_name_2	129.4	27.6
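A per-hybridisation file like the one above can be written with Python's standard `csv` module using a tab delimiter. The function name and the column-label pattern below are hypothetical, chosen only to reproduce the example table.

```python
import csv
from pathlib import Path


def write_hybridisation_file(path: Path, sample: str,
                             rows: list[tuple[str, float, float]]) -> None:
    """Write one hybridisation's probe-level values as a tab-delimited text file."""
    with path.open("w", newline="") as handle:
        # Tab delimiter and plain "\n" line endings give a simple .txt table.
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Reporter Identifier",
                         f"{sample} normalised intensity",
                         f"{sample} background"])
        writer.writerows(rows)  # one row per probe: identifier, intensity, background
```

Calling `write_hybridisation_file(Path("sample_1_processed.txt"), "sample 1", [("probe_name_1", 233.5, 69.1), ("probe_name_2", 129.4, 27.6)])` produces the table shown above.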

And here is an example where gene names are used as row headings:

Human HGNC gene name	sample 1 normalised intensity	sample 1 background
CDKN2A	233.5	69.1
BRCA2	129.4	27.6

Processed "matrices" summarising data from multiple hybridisations should look like the following examples. As with per-hybridisation processed files, if probe identifiers are not available, try to use proper gene symbols or other identifiers (e.g. GenBank cDNA accession, UniProt protein accession).

Matrix of normalised values per sample:

Reporter Identifier	sample 1 normalised	sample 2 normalised	sample 3 normalised	sample 4 normalised
probe_name_1	26.9	44.3	62.3	58.5
probe_name_2	22.9	43.7	58.2	67.4

GenBank accession	sample 1 normalised	sample 2 normalised	sample 3 normalised	sample 4 normalised
BC000578	26.9	44.3	62.3	58.5
M31642	22.9	43.7	58.2	67.4
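If you start from a series of per-hybridisation files, their values can be combined into one matrix with probes in rows and samples in columns. In this minimal sketch, `write_matrix` is a hypothetical helper; it assumes every column covers the same set of identifiers and raises an error otherwise, because the guide does not say how missing values should be represented.

```python
import csv
from pathlib import Path


def write_matrix(path: Path, id_heading: str, columns: dict[str, dict[str, float]]) -> None:
    """Write an identifiers-by-samples matrix; `columns` maps column label -> {identifier: value}."""
    labels = list(columns)
    identifiers = list(columns[labels[0]])  # row order follows the first column
    for label in labels[1:]:
        if set(columns[label]) != set(identifiers):
            raise ValueError(f"column {label!r} covers a different set of identifiers")
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([id_heading, *labels])
        for ident in identifiers:
            writer.writerow([ident, *(columns[label][ident] for label in labels)])
```

The same function writes either matrix above; only the first-column heading (for example `Reporter Identifier` or `GenBank accession`) and the row identifiers change.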

Matrix of summarised values (one column of data maps to multiple samples):

Reporter Identifier	drug A treated average	drug B treated average	untreated control average
probe_name_1	44.6	89.3	290.15
probe_name_2	98.3	36.7	100.52
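A summarised matrix such as the one above is derived from per-sample values. The guide does not prescribe a summary statistic; this sketch assumes a simple arithmetic mean, and the `groups` mapping from each summarised column to its samples is a hypothetical input you would define yourself. Writing that mapping down explicitly is what makes each summarised column traceable to the multiple samples it represents.

```python
from statistics import mean


def summarise_by_group(matrix: dict[str, dict[str, float]],
                       groups: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Average per-sample values into one column per group.

    matrix: identifier -> {sample label: value}
    groups: summarised column label -> sample labels it summarises
    """
    return {
        ident: {label: mean(values[s] for s in samples) for label, samples in groups.items()}
        for ident, values in matrix.items()
    }
```

The result has the same shape as the `columns` input of a per-sample matrix transposed to rows, so it can be written out as a tab-delimited table in the same way.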

Processed matrix files in strict MAGE-TAB format (for advanced users)

For submitters who are familiar with the MAGE-TAB specification, we also accept matrix files in strict MAGE-TAB format. In this format, each data point (at a given row and column) can be mapped to a particular assay in the experiment and to a particular probe/probe set in the array design file, both in a human-readable way and programmatically. See this guide on the strict matrix format for more information.