NOTE: The funding terminates July 30th 2007, and we are thus unable to accept new project applications.
Array Design
The current version of our array consists of ~9K features (spots) printed in duplicate (i.e. ~18K spots including controls).
Each feature has been designed to be as unique as possible, and together the features cover approximately 70% of the genes in the genome.
This estimate is based on the current assembly (Feb 2005), which gave a geneDB prediction of 13575 genes, from which we have designed enough features to represent ~9300 genes.
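As a quick check of the coverage figure, the arithmetic can be reproduced from the two numbers quoted above. This is only an illustration; the variable names are ours and are not part of the array design pipeline.

```python
# Figures quoted in the text above.
predicted_genes = 13575    # geneDB prediction from the Feb 2005 assembly
represented_genes = 9300   # approximate number of genes with a designed feature

# Fraction of predicted genes that have a feature on the array.
coverage = represented_genes / predicted_genes
print(f"Approximate coverage: {coverage:.1%}")
```

The result rounds to the "approximately 70%" stated above.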
Unique features are designed using the gene prediction consensus of three software algorithms where possible to produce likely gene candidates.
Each candidate gene is inspected manually using the Artemis genome viewer to remove unsuitable prediction matches. This process involves inspecting the coding regions for each gene and checking for possible shifts in DNA sequence that may alter prediction results.
Manual checking of gene predictions is a time-consuming step of the array design process, but it ensures confidence that the genes included are representative of the genome.
In the following Artemis figure, gene predictions can be seen from the three algorithms used (GeneID, Genefinder and HMMgene).
For the majority of genes the three algorithms agree on a consensus prediction. For others only one or two predictions may match; in these cases the nucleotide and amino acid sequences are analysed to determine likely candidates.
Also represented are EST sequences and primers that have been designed.
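The consensus step described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: it assumes each algorithm reports a gene as a (start, end) coordinate pair and treats two predictions as matching only when the coordinates are identical. The function name and the status labels are hypothetical.

```python
from collections import Counter

def consensus_call(predictions):
    """Classify one locus from per-algorithm predictions.

    predictions: dict mapping algorithm name -> (start, end) tuple,
    or None when that algorithm predicted no gene at the locus.
    Returns (coordinates, status), where status is a hypothetical label.
    """
    calls = [coords for coords in predictions.values() if coords is not None]
    if not calls:
        return None, "no_prediction"
    # Most frequently reported coordinates and how many algorithms gave them.
    coords, votes = Counter(calls).most_common(1)[0]
    if votes == len(predictions):
        return coords, "full_consensus"
    # Only some algorithms agree: flag for manual inspection in Artemis.
    return coords, "manual_review"

agree = {"GeneID": (100, 900), "Genefinder": (100, 900), "HMMgene": (100, 900)}
partial = {"GeneID": (100, 900), "Genefinder": (100, 900), "HMMgene": (100, 850)}
print(consensus_call(agree))
print(consensus_call(partial))
```

In practice predictions rarely need to match exactly, so a real implementation would probably compare overlap rather than identical coordinates.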
Primers are designed towards the 3' end of each gene candidate using Primer3. This process is made particularly difficult by the high A-T content of the Dictyostelium genome, which restricts the availability of suitable sites at which a primer can be designed.
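To illustrate why A-T content limits primer sites, the sketch below scans a sequence for windows whose A-T fraction stays under a threshold. It is not Primer3 and does not reproduce its scoring; the window length and the threshold are assumed values chosen only for the example.

```python
def at_fraction(seq):
    """Return the fraction of A and T bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)

def candidate_windows(seq, window=20, max_at=0.6):
    """Yield (start, subsequence) for each window with A-T fraction <= max_at.

    window and max_at are illustrative assumptions, not values from the
    array design process.
    """
    for start in range(len(seq) - window + 1):
        sub = seq[start:start + window]
        if at_fraction(sub) <= max_at:
            yield start, sub

# Toy sequence: the A-rich start offers no acceptable window.
print(list(candidate_windows("AAAAGCGC", window=4, max_at=0.5)))
```

In an A-T rich genome most windows fail such a filter, which leaves few positions near the 3' end where a primer can be placed.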
Other caveats of the process include small genes, which are difficult to detect and design for, and the number of multi-gene families, which often contain genes that appear in different contigs of the assembly.
Our approach here is to pick a gene that is representative of each gene family and to design one set of primers for that family.
The majority of the ~70% of genes that are found on our arrays have clear predictions and were simple to design primers for. The remaining 30% are a mixture of small genes and multi-gene families, both of which are included in the continuing design process.
In the future we intend to refine our approach so that as many genes from the genome as possible are represented.
As new features are designed, our arrays will be updated to accommodate the data that become available to us.
For successfully chosen candidates, genes are amplified and the resulting products are checked for clarity of the product, expected size and multiple banding. Any products not passing these criteria are removed from the feature set.
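The three checks above amount to a simple filter over the PCR products. The sketch below is one way to express it; the record fields, the 10% size tolerance and the example gene names are all assumptions made for illustration and are not taken from the actual workflow.

```python
from dataclasses import dataclass

@dataclass
class PcrProduct:
    gene_id: str
    expected_bp: int    # product size predicted from the primer design
    observed_bp: int    # size estimated from the gel
    band_count: int     # number of bands seen in the lane
    clear_band: bool    # True if the product band is clear

def passes_qc(product, size_tolerance=0.1):
    """Return True if the product is clear, of expected size and single-banded.

    size_tolerance is an assumed fractional tolerance on product size.
    """
    allowed = size_tolerance * product.expected_bp
    size_ok = abs(product.observed_bp - product.expected_bp) <= allowed
    return product.clear_band and size_ok and product.band_count == 1

products = [
    PcrProduct("geneA", 500, 520, 1, True),    # passes all three checks
    PcrProduct("geneB", 500, 700, 1, True),    # wrong size
    PcrProduct("geneC", 500, 500, 2, True),    # multiple banding
    PcrProduct("geneD", 500, 500, 1, False),   # unclear product
]
feature_set = [p.gene_id for p in products if passes_qc(p)]
print(feature_set)
```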
Printing
Good quality PCR products are resuspended in 50 mM sodium phosphate spotting buffer and filtered before being re-arrayed from 96-well to 384-well format for printing using a Biorobotics Microgrid II robot (Genomic Solutions).
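Re-arraying from 96-well to 384-well format is often done by interleaving four 96-well plates into the four quadrants of one 384-well plate. The sketch below shows that common convention only; the layout actually used with the robot is not stated here, so the mapping and the function name are assumptions.

```python
import string

def to_384(plate_index, well):
    """Map a 96-well position to a 384-well position (interleaved layout).

    plate_index: 0-3, which of the four source 96-well plates.
    well: 96-well name such as "A1" or "H12".
    Assumed convention: each source plate fills one position of every
    2 x 2 block of the 384-well plate.
    """
    row = string.ascii_uppercase.index(well[0].upper())  # A-H -> 0-7
    col = int(well[1:]) - 1                               # 1-12 -> 0-11
    if not (0 <= plate_index < 4 and 0 <= row < 8 and 0 <= col < 12):
        raise ValueError(f"invalid source position: plate {plate_index}, well {well}")
    row384 = 2 * row + plate_index // 2
    col384 = 2 * col + plate_index % 2
    return f"{string.ascii_uppercase[row384]}{col384 + 1}"

print(to_384(0, "A1"), to_384(1, "A1"), to_384(2, "A1"), to_384(3, "H12"))
```

With this layout every one of the 4 x 96 source wells lands on a distinct 384-well position.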
PCR products are attached covalently to the chemical surface of GE Healthcare (Amersham) Codelink
slides using modified amino groups.
48 pins are used to print the array in duplicate, with an entire copy of the represented genes printed on each side of the slide. Duplication of individual features gives statistically robust data when multiple slides are used within an experiment. It also allows most features to remain present if any printing errors occur.
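One simple way to use the duplicate spots is to average the two intensities for each feature and fall back on the surviving spot when the other is missing. The sketch below assumes spots arrive as (feature, intensity) pairs with None marking a failed spot; the data format and function name are hypothetical, and real analyses may combine duplicates differently.

```python
from statistics import mean

def combine_duplicates(spots):
    """Average duplicate spot intensities per feature.

    spots: iterable of (feature_id, intensity), where intensity is None
    for a spot lost to a printing error.
    Returns a dict feature_id -> mean intensity, or None if no spot survived.
    """
    by_feature = {}
    for feature_id, intensity in spots:
        values = by_feature.setdefault(feature_id, [])
        if intensity is not None:
            values.append(intensity)
    return {fid: (mean(vals) if vals else None) for fid, vals in by_feature.items()}

spots = [
    ("g1", 100.0), ("g1", 120.0),   # both duplicates present
    ("g2", None), ("g2", 80.0),     # one duplicate lost
    ("g3", None), ("g3", None),     # both duplicates lost
]
print(combine_duplicates(spots))
```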
The array includes ~9300 individual Dictyostelium genes, together with features from Bacillus subtilis that can be used as either positive or negative controls in subsequent normalisation and analysis.
