Important notes on file preparation:
Files should be saved in tab-delimited text (*.txt) format and not in Excel format (*.xls, *.xlsx). If you're unfamiliar with this format, please see the OpenOffice Calc or Microsoft Excel guide.
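A quick way to catch an Excel workbook that was merely renamed to .txt is to inspect the file's first bytes: .xlsx workbooks are ZIP archives and legacy .xls files are OLE2 compound documents, each starting with a fixed signature. The sketch below is a heuristic, not an official check; the function name `classify_upload` is hypothetical, and it assumes text files are UTF-8 and treats a file whose first line contains a tab as tab-delimited.

```python
# Signatures of the two Excel formats mentioned above:
# .xlsx workbooks are ZIP archives, legacy .xls files are OLE2 compound documents.
XLSX_MAGIC = b"PK\x03\x04"
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_upload(data: bytes) -> str:
    """Guess whether raw file contents are an Excel workbook or tab-delimited text."""
    # Any ZIP archive (including OpenDocument .ods spreadsheets) also matches here.
    if data.startswith(XLSX_MAGIC):
        return "xlsx"
    if data.startswith(XLS_MAGIC):
        return "xls"
    first_line = data.split(b"\n", 1)[0]
    try:
        # Assumption: text files are UTF-8 (plain ASCII is a subset of UTF-8).
        header = first_line.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    # Heuristic: a tab in the header row suggests tab-delimited text.
    return "tab-delimited text" if "\t" in header else "unknown"


print(classify_upload(b"Reporter Identifier\tsample 1 normalised intensity\n"))
```

To check a file on disk, pass it the bytes read in binary mode, for example `open(path, "rb").read()`.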
Processed files are generated from raw files by procedures such as background correction, normalisation, and further statistical analyses (e.g. calculating fold-changes and associated p-values). We accept either "native" processed files from microarray scanner software (e.g. ".chp" files from Affymetrix scanners, output files from GenomeStudio software for Illumina BeadChip), or two-dimensional spreadsheet files in tab-delimited text (.txt) format. For the latter, the probes/probesets/gene names are in rows, and data from one or more hybridisations are in columns. We accept processed files from the following scenarios:
In the two-dimensional table, probes/genes go in rows and samples/data in columns. The first column should be headed Reporter Identifier (for probes) or CompositeSequence Identifier (for a "composite" collation of probes, the most common example being Affymetrix probe sets). If probe identifiers are not available, try to use proper gene symbols or other identifiers (e.g. GenBank cDNA accession, UniProt protein accession).
A processed .txt file containing data from a single hybridisation should look like this:
| Reporter Identifier | sample 1 normalised intensity | sample 1 background |
|---|---|---|
| probe_name_1 | 233.5 | 69.1 |
| probe_name_2 | 129.4 | 27.6 |
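If you generate processed files from a script rather than a spreadsheet, Python's standard `csv` module can write the single-hybridisation layout shown above. This is a minimal sketch: the helper name `processed_table_text` is hypothetical, and the choice of "\n" line endings (instead of the module's default "\r\n") is a preference, not a stated requirement.

```python
import csv
import io


def processed_table_text(header, rows):
    """Render a header plus data rows as tab-delimited text (one row per probe)."""
    buffer = io.StringIO()
    # Tabs as separators; "\n" line endings instead of csv's default "\r\n".
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header = ["Reporter Identifier", "sample 1 normalised intensity", "sample 1 background"]
rows = [
    ("probe_name_1", 233.5, 69.1),
    ("probe_name_2", 129.4, 27.6),
]
text = processed_table_text(header, rows)
print(text, end="")
```

When writing straight to disk instead, open the file with `newline=""`, as the `csv` documentation recommends, and pass the file object to `csv.writer`.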
And here is an example where gene names are used as row headings:
| Human HGNC gene name | sample 1 normalised intensity | sample 1 background |
|---|---|---|
| CDKN2A | 233.5 | 69.1 |
| BRCA2 | 129.4 | 27.6 |
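Before submitting, a few structural checks can catch common problems such as ragged rows or commas used as decimal separators. The sketch below is a hypothetical checker, not an official validator: requiring unique identifiers and purely numeric data columns are assumptions made for illustration (some processed files legitimately carry non-numeric columns, such as detection calls), so adapt the checks to your own data.

```python
import csv
import io


def check_processed_table(text):
    """Return human-readable problems found in a tab-delimited processed table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, problems, seen = rows[0], [], set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} columns, found {len(row)}")
            continue
        identifier = row[0]
        # Assumption: each probe or gene identifier should appear only once.
        if identifier in seen:
            problems.append(f"line {line_no}: duplicate identifier {identifier!r}")
        seen.add(identifier)
        # Assumption: every data column holds numbers with "." as the decimal point.
        for column, value in zip(header[1:], row[1:]):
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {column!r} value {value!r} is not numeric")
    return problems


example = (
    "Human HGNC gene name\tsample 1 normalised intensity\tsample 1 background\n"
    "CDKN2A\t233.5\t69.1\n"
    "BRCA2\t129.4\t27.6\n"
)
print(check_processed_table(example))  # an empty list means no problems were found
```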
Processed "matrices" summarising data from multiple hybridisations should look like the following. Again, as for per-hybridisation processed files, if probe identifiers are not available, try to use proper gene symbols or other identifiers (e.g. GenBank cDNA accession, UniProt protein accession).
Matrix of normalised values per sample:
| Reporter Identifier | sample 1 normalised | sample 2 normalised | sample 3 normalised | sample 4 normalised |
|---|---|---|---|---|
| probe_name_1 | 26.9 | 44.3 | 62.3 | 58.5 |
| probe_name_2 | 22.9 | 43.7 | 58.2 | 67.4 |
An equivalent matrix using GenBank accessions as row identifiers:
| GenBank accession | sample 1 normalised | sample 2 normalised | sample 3 normalised | sample 4 normalised |
|---|---|---|---|---|
| BC000578 | 26.9 | 44.3 | 62.3 | 58.5 |
| M31642 | 22.9 | 43.7 | 58.2 | 67.4 |
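When per-hybridisation results live in separate files, they can be combined into one matrix with identifiers in rows and one column per sample. In this sketch (the helper name `matrix_text` is hypothetical) identifiers are sorted and a value missing from a sample is written as an empty cell; both are assumptions, so check how missing values should be represented before submitting.

```python
def matrix_text(per_sample, id_header="Reporter Identifier"):
    """Render {column name: {identifier: value}} as a tab-delimited matrix."""
    columns = list(per_sample)  # keeps the dict's insertion order
    identifiers = sorted({ident for values in per_sample.values() for ident in values})
    lines = ["\t".join([id_header] + columns)]
    for ident in identifiers:
        # Assumption: a value missing from one sample becomes an empty cell.
        cells = [ident] + [str(per_sample[col].get(ident, "")) for col in columns]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


per_sample = {
    "sample 1 normalised": {"probe_name_1": 26.9, "probe_name_2": 22.9},
    "sample 2 normalised": {"probe_name_1": 44.3, "probe_name_2": 43.7},
    "sample 3 normalised": {"probe_name_1": 62.3, "probe_name_2": 58.2},
    "sample 4 normalised": {"probe_name_1": 58.5, "probe_name_2": 67.4},
}
print(matrix_text(per_sample), end="")
```

For the accession-based layout, the same function can be called with `id_header="GenBank accession"` and accessions as the inner dictionary keys.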
Matrix of summarised values (one column of data maps to multiple samples):
| Reporter Identifier | drug A treated average | drug B treated average | untreated control average |
|---|---|---|---|
| probe_name_1 | 44.6 | 89.3 | 290.15 |
| probe_name_2 | 98.3 | 36.7 | 100.52 |
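A summarised matrix like this one can be derived from per-sample values by collapsing each group of replicate columns into a single column. This sketch assumes an arithmetic mean (the column headings only say "average") and uses hypothetical replicate sample names and values; the mapping from summary column to samples is something you would supply from your own experiment design.

```python
from statistics import mean


def summarise(matrix, groups):
    """Collapse replicate columns into one averaged column per group.

    matrix: {identifier: {sample name: value}}
    groups: {summary column name: [sample names in that group]}
    """
    return {
        identifier: {
            column: mean(values[sample] for sample in samples)
            for column, samples in groups.items()
        }
        for identifier, values in matrix.items()
    }


# Hypothetical replicate values for one probe (two replicates per group).
matrix = {
    "probe_name_1": {"A_rep1": 40.0, "A_rep2": 49.2, "B_rep1": 90.0,
                     "B_rep2": 88.6, "ctrl_rep1": 290.0, "ctrl_rep2": 290.3},
}
groups = {
    "drug A treated average": ["A_rep1", "A_rep2"],
    "drug B treated average": ["B_rep1", "B_rep2"],
    "untreated control average": ["ctrl_rep1", "ctrl_rep2"],
}
summary = summarise(matrix, groups)
print(summary["probe_name_1"])
```

If your analysis used a different summary (a median, say), swap `mean` for that function; the surrounding structure stays the same.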
For submitters who are familiar with the MAGE-TAB specification, we also accept matrix files in strict MAGE-TAB format. In this format, each data point in the file (at a given row and column) can be mapped to a particular assay in the experiment and to a particular probe or probe set in the array design file, both in a human-readable way and programmatically. See the guide on the strict matrix format for more information.