1. What is an array design format (ADF) file?
2. ADF metadata header
3. ADF table
3.1. Features, reporters and composite elements
3.2. Annotation of reporters
3.3. Reporter role - experimental or control?
3.4. Reporter grouping by species (multi-species ADFs only)
3.5. Annotation of composite elements
The ADF (Array Design Format) file captures information about a microarray chip.
The file contains two sections, the metadata header and the ADF table, separated by a [main] tag.
The ADF file is always in tab-delimited text (*.txt) format, and will open in any spreadsheet program for viewing or editing, e.g. OpenOffice Calc or Microsoft Excel.
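The two-section layout can be read with a few lines of Python. This is only a minimal sketch: the function name `split_adf` is hypothetical, and it assumes the [main] tag sits alone in the first cell of its own line and that the first line after the tag holds the table's column headings.

```python
import csv
import io


def split_adf(text):
    """Split tab-delimited ADF text into (header, table_rows).

    header maps each heading to the list of non-empty values on its row;
    table_rows is a list of dicts keyed by the table's column headings.
    Assumption: the [main] tag is the first cell of its own line.
    """
    header = {}
    table_lines = []
    in_table = False
    for line in text.splitlines():
        cells = line.split("\t")
        first = cells[0].strip()
        if in_table:
            table_lines.append(line)
        elif first.lower() == "[main]":
            in_table = True
        elif first:
            header[first] = [c.strip() for c in cells[1:] if c.strip()]
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return header, list(reader)
```

For example, `header, rows = split_adf(open("my_adf.txt", encoding="utf-8").read())` would give the header fields and the table rows of a file saved as `my_adf.txt` (a hypothetical file name).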
Here is a snippet of an ADF document:
Each row (field) of metadata starts with a heading. Each heading appears only once and comes from a controlled vocabulary. The headings are pre-filled when you download one of our ADF templates, so you don't need to insert them from scratch. Please do not edit the headings, but fill in each field as fully as possible.
ADF header field | What is it? | Allowed values | Example |
---|---|---|---|
* Array Design Name | An informative title of the array design. It should include the name of the manufacturer (e.g. Agilent) or the lab which designed it, the species, the version number, the purpose of the array (e.g. genotyping) and the number of probes/features (e.g. 450k). | | Agilent human microRNA microarray miRBase Release 14.0, 8x15k (GridName 029297, 82 cols x 192 rows) |
Version | Version (or revision) number of the array design. | | version 3.0 |
* Provider | Name and contact email address of the array design's submitter. | | Joe Bloggs (joebloggs@someemail.com) |
* Comment[Organism] | The species from which the probe sequences were derived. | Must use Latin binomial nomenclature. Separate names with semicolons for multi-species arrays. | Homo sapiens |
* Comment[Description] | A concise description of how the microarray is designed and what it is for. Include URLs to stable online resources if possible. | | This custom commercial microarray consists of 84881 (86205 total including 1324 control probes) 60-mer oligonucleotide probes derived from the genomes and EST database of Emiliania huxleyi strain (... etc...) |
* Comment[ArrayExpressReleaseDate] | Keep pre-published array designs private by inserting a future date (can be amended later). Otherwise, insert the date of submission. | Must use YYYY-MM-DD format. | 2014-04-25 |
Printing Protocol | How the probes were printed on the array. Include URLs to stable online resources if possible. Especially important if you are submitting a custom array printed with non-proprietary methods. | | The 60-mer oligonucleotides were synthesized in situ using Agilent inkjet SurePrint technology. Four arrays were printed on each 1 x 3-inch glass slide (...etc...) |
Technology Type | Printing/synthesis technology used when creating the microarray. | "in situ oligo features", "spotted antibody features", "spotted colony features", "spotted ds DNA features", "spotted protein features", "spotted_ss_PCR_amplicon_features", "spotted_ss_oligo_features" | in situ oligo features |
Surface Type | The chemical coating on the surface of the substrate (see below). Probes are immobilised by the coating. | polylysine, aminosilane, "other surface type" | polylysine |
Substrate Type | The material/substrate of the microarray. | glass, nitrocellulose, nylon, silicon, "other substrate type" | glass |
Sequence Polymer Type | The polymer that makes up the probes. | DNA, RNA, protein | DNA |
Term Source Name | In the ADF table, for each cross-reference provided, e.g. Reporter Database Entry [uniprot], the cross-referenced resource's name needs to be entered here, so the resource can be looked up via the URL in Term Source File (see the next row). | External database names from this allowed list | uniprot |
Term Source File | The URL where an external database resource mentioned in Term Source Name (see the previous row) is hosted. | A valid URL. | http://www.uniprot.org |
* denotes mandatory fields.
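Before submitting, the mandatory fields and the release date format can be checked with a short script. The sketch below is illustrative only: `check_header` is a hypothetical helper, it assumes the header has already been loaded into a plain dict mapping each heading to a single string, and it is not the validation that curators actually run.

```python
import re
from datetime import datetime

# Fields marked with * in the table above.
MANDATORY_FIELDS = [
    "Array Design Name",
    "Provider",
    "Comment[Organism]",
    "Comment[Description]",
    "Comment[ArrayExpressReleaseDate]",
]


def check_header(header):
    """Return a list of problems found in a header dict (heading -> value)."""
    problems = []
    for field in MANDATORY_FIELDS:
        if not header.get(field, "").strip():
            problems.append(f"missing mandatory field: {field}")

    date = header.get("Comment[ArrayExpressReleaseDate]", "").strip()
    if date:
        # The regex enforces zero-padded YYYY-MM-DD; strptime rejects
        # impossible dates such as 2014-02-30.
        valid = re.fullmatch(r"\d{4}-\d{2}-\d{2}", date) is not None
        if valid:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                valid = False
        if not valid:
            problems.append(f"release date is not a valid YYYY-MM-DD date: {date}")
    return problems
```

An empty list means that none of these particular checks failed.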
Just like the metadata header fields, column headings in the ADF table come from a controlled vocabulary. Please use one of the ADF templates, where the correct headings have been pre-filled for you, so you don't need to insert them from scratch. Please don't change the column headings in the template.
Features: each feature is located by four coordinate columns, Block Column, Block Row, Column and Row. Each set of 4 coordinates represents a unique position on the microarray, so features cannot be duplicated on an array. Include all features in your ADF file, even if there is nothing spotted there (e.g. control spots). Here is a schematic diagram showing the 4 coordinates:

Reporters: reporter names go in the Reporter Name column and must be unique, i.e. each name in the column should only appear once, except for spotted arrays (arrays with "features", see above), where the same probe can be printed at multiple locations on the same array, so the same reporter name can be repeated at different feature locations.

Composite elements: composite element names go in the Composite Element Name column. Affymetrix array designs will always have composite elements because they are based on "probe sets" and not simply "probes".
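Because each set of four coordinates must be unique, duplicated features are easy to detect once the table is loaded. The following is a minimal sketch, assuming the ADF table rows are available as dicts keyed by column heading; `duplicated_features` is a hypothetical helper name.

```python
from collections import Counter

# The four coordinate columns that locate a feature.
COORDINATE_COLUMNS = ("Block Column", "Block Row", "Column", "Row")


def duplicated_features(rows):
    """Return the coordinate tuples that occur on more than one table row."""
    counts = Counter(
        tuple(row[column].strip() for column in COORDINATE_COLUMNS)
        for row in rows
    )
    return sorted(coords for coords, n in counts.items() if n > 1)
```

Coordinates are compared as text, so "01" and "1" would count as different values in this sketch.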
Here is a snippet from a hypothetical example of an array design, showing seven features, two reporters and one composite element:
Annotate each reporter by providing its sequence, cross-referenced accessions* in external databases (e.g. RefSeq accession numbers of the cDNA sequences from which the reporters were designed), and/or its genomic mapping location. Use the Reporter Sequence column for the sequence, and one or more Reporter Database Entry [xxx] columns for the cross-references and mapping location.
If a reporter appears multiple times in an ADF table, e.g. for spotted arrays where the same reporter is printed at different locations on the microarray, annotation for the same reporter must be consistent (i.e. identical) across the entire ADF table.
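The consistency requirement can be checked mechanically. This sketch assumes the table rows are dicts keyed by column heading and treats the Reporter Sequence column plus every Reporter Database Entry [xxx] column as the annotation; `inconsistent_reporters` is a hypothetical helper name.

```python
from collections import defaultdict


def inconsistent_reporters(rows):
    """Return names of reporters whose annotation differs between rows."""
    variants = defaultdict(set)
    for row in rows:
        name = (row.get("Reporter Name") or "").strip()
        if not name:
            continue
        # Collect the annotation columns in a stable order so that
        # identical annotations always produce identical tuples.
        annotation = tuple(sorted(
            (column, (value or "").strip())
            for column, value in row.items()
            if isinstance(column, str)
            and (column == "Reporter Sequence"
                 or column.startswith("Reporter Database Entry"))
        ))
        variants[name].add(annotation)
    return sorted(name for name, seen in variants.items() if len(seen) > 1)
```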
For reporters with no cross-references, e.g. a blank control spot, leave the annotation blank. Do not enter custom values such as "NA", "empty" or "unmapped", as curators will validate the annotation values against expected formats and custom values will not be accepted.
Cross-references to external databases: name the column Reporter Database Entry [xxx], replacing xxx with the database name, e.g. Reporter Database Entry [genbank]. Then, fill the column with the appropriate accession*. You can enter multiple accessions from the same database by separating them with semicolons, e.g. AJ12345;BX45678. Annotations from different database sources must be provided in separate Reporter Database Entry [xxx] columns. Don't forget to add the database name and URL to the Term Source Name and Term Source File fields of the ADF header.

* Accessions must follow the format used by the source database, e.g. a RefSeq mRNA accession is NM_ followed by a few digits.

Genomic mapping locations: put chromosome_coordinate:<genome_assembly_name> inside the square brackets of Reporter Database Entry [xxx], e.g. Reporter Database Entry [chromosome_coordinate:GRCh38]. Try to use official genome assembly names from public databases (ENA, GenBank or DDBJ), e.g. GRCh38 for the human genome assembly released in December 2013. Genomic coordinates in the column should follow this format: chrName:start position-end position, e.g. chr1:1234-5678.
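The column heading and coordinate formats described above lend themselves to a quick pattern check. The sketch below is an assumption-laden illustration, not the curators' validator: the regular expressions and the helper names `parse_heading` and `parse_coordinate` are hypothetical, and the chromosome name is taken to be any run of characters without colons or whitespace.

```python
import re

# "Reporter Database Entry [xxx]", capturing whatever sits in the brackets.
HEADING_RE = re.compile(r"Reporter Database Entry\s*\[\s*([^\]]+?)\s*\]")
# "chrName:start-end", e.g. chr1:1234-5678.
COORDINATE_RE = re.compile(r"([^:\s]+):(\d+)-(\d+)")


def parse_heading(heading):
    """Return (source, assembly) for a cross-reference heading, else None.

    assembly is None unless the heading is a chromosome_coordinate column.
    """
    match = HEADING_RE.fullmatch(heading.strip())
    if match is None:
        return None
    source = match.group(1)
    if source.startswith("chromosome_coordinate:"):
        return "chromosome_coordinate", source.split(":", 1)[1]
    return source, None


def parse_coordinate(value):
    """Return (chromosome, start, end), or None if the format is wrong."""
    match = COORDINATE_RE.fullmatch(value.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))
```

A custom value such as "unmapped" makes `parse_coordinate` return None, which is one way to catch placeholder text before submission.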
Following on from the example in section 3.1, this is how the hypothetical reporter annotations would look. Notice how you can enter multiple accessions from the same database by separating them with semicolons:
For each reporter, enter its role, experimental or control, in the Reporter Group [role] column.
For each control reporter, describe what type of control it is in the Control Type column. (Do not fill in this column for any experimental reporters.) The allowed control type values are taken from the Experimental Factor Ontology (EFO). Here is a glossary:
Allowed terms | Meaning |
---|---|
array control biosequence | E.g. a spiked sequence from E. coli for a human microarray. Reporter Sequence must be provided. |
array control buffer | Buffer spotted on the array. Do not provide Reporter Sequence. |
array control empty | Nothing spotted on the array (blank). Do not provide Reporter Sequence. |
array control genomic DNA | E.g. salmon sperm DNA |
array control label | Landing lights. Do not provide Reporter Sequence. |
array control reporter size | Size standards. |
array control spike calibration | E.g. the same spike sequence introduced at varying concentrations |
array control design | Use when none of the above applies, i.e. a control was used but no further details are available. |
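The role and control type rules can be expressed as a small per-row check. This is a sketch under stated assumptions: rows are dicts keyed by column heading, `check_role` is a hypothetical helper, and the sequence rules cover only the control types for which the glossary states one.

```python
# Allowed Control Type values, copied from the glossary above.
CONTROL_TYPES = {
    "array control biosequence",
    "array control buffer",
    "array control empty",
    "array control genomic DNA",
    "array control label",
    "array control reporter size",
    "array control spike calibration",
    "array control design",
}
# Control types for which the glossary says not to provide a sequence.
NO_SEQUENCE = {"array control buffer", "array control empty", "array control label"}
# Control types for which the glossary says a sequence must be provided.
NEEDS_SEQUENCE = {"array control biosequence"}


def check_role(row):
    """Return a list of problems with one reporter row's role and control type."""
    role = row.get("Reporter Group [role]", "").strip()
    control_type = row.get("Control Type", "").strip()
    sequence = row.get("Reporter Sequence", "").strip()
    problems = []
    if role not in ("experimental", "control"):
        problems.append(f"role must be experimental or control, got {role!r}")
    elif role == "experimental":
        if control_type:
            problems.append("Control Type must be empty for experimental reporters")
    elif control_type not in CONTROL_TYPES:
        problems.append(f"unknown or missing Control Type: {control_type!r}")
    elif control_type in NO_SEQUENCE and sequence:
        problems.append(f"{control_type} must not have a Reporter Sequence")
    elif control_type in NEEDS_SEQUENCE and not sequence:
        problems.append(f"{control_type} requires a Reporter Sequence")
    return problems
```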
Here is an example of reporter roles in the ADF table:
Where reporters were designed from more than one species, you can indicate the source species in the Reporter Group [species] column. Use Latin binomial species names where possible, e.g. Homo sapiens.
An example of a microarray with probes derived from viral and human microRNA sequences:
Annotation of composite elements usually takes the form of cross-references to external databases, in much the same way as for reporters, using one or more Composite Element Database Entry [xxx] columns. In addition, a Composite Element Comment column can be added for free-text comments or descriptions of each element.