Frequently Asked Questions
1. How do I find out about a knockout in my favourite gene?
If you register with the Zebrafish Mutation Project, we will send you an update when we find a mutation in your gene of interest. There is also a searchable list of knockouts: you can search by Ensembl ID, gene name, or human and mouse orthologue.
2. What does each allele status mean?
A heterozygous nonsense or splice mutation has been detected in DNA prepared from F1 individuals. We expect about 90% of such mutations to be confirmed by KASPar genotyping. The time taken to reach this stage cannot be predicted because mutations arise at random, although a mutation tends to be found sooner in a larger gene.
An F2 population has been generated by outcrossing F1 carriers. The time taken to reach this stage after a mutation has been detected will vary, but we prioritise F1 carriers according to the number of registrations we have received for the mutations they contain. Depending on the priority, it may take a matter of weeks or up to several months for an allele to reach this stage.
The allele has been confirmed by KASPar genotyping in the F2 population. The time taken to reach this stage will also vary, and we again prioritise according to the number of registrations; depending on the priority, it may take anything from a few weeks to several months for an allele to reach this stage.
Carriers have been identified and the allele is available for shipment from ZIRC.
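The four stages above form an ordered pipeline, and F1 carriers are queued according to registrations. A minimal Python sketch of that bookkeeping follows. The status names, the `F1Carrier` class and the choice to rank carriers by the total number of registrations across all of their mutations are hypothetical illustrations, not the project's actual implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AlleleStatus(IntEnum):
    """Hypothetical labels for the four stages, in pipeline order."""
    DETECTED_IN_F1 = 1       # heterozygous nonsense/splice mutation seen in F1 DNA
    F2_GENERATED = 2         # F2 population made by outcrossing the F1 carrier
    CONFIRMED_IN_F2 = 3      # allele confirmed by KASPar genotyping in the F2s
    AVAILABLE_FROM_ZIRC = 4  # carriers identified; allele can be shipped by ZIRC


@dataclass
class F1Carrier:
    """Hypothetical record for one F1 carrier fish."""
    fish_id: str
    # Registrations received for each gene mutated in this carrier.
    registrations: dict[str, int] = field(default_factory=dict)

    def total_registrations(self) -> int:
        return sum(self.registrations.values())


def prioritise(carriers: list[F1Carrier]) -> list[F1Carrier]:
    """Return carriers ordered from most to fewest registrations (assumed rule)."""
    return sorted(carriers, key=lambda c: c.total_registrations(), reverse=True)


carriers = [
    F1Carrier("f1_a", {"geneA": 1}),
    F1Carrier("f1_b", {"geneB": 3, "geneC": 2}),
    F1Carrier("f1_c"),
]
print([c.fish_id for c in prioritise(carriers)])  # ['f1_b', 'f1_a', 'f1_c']
```

Python's `sorted` is stable even with `reverse=True`, so carriers with equal registration counts keep their original order.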
3. Where can I get fish carrying these alleles?
Currently from the Zebrafish International Resource Center (ZIRC). Each allele has a button for making requests. Alleles are no longer available directly from the Zebrafish Mutation Project, but we plan to make alleles available from the European Zebrafish Resource Center (EZRC) in the future.
4. Can you outline the mutation detection process?
The process is outlined opposite. We now screen every gene in the current Ensembl genebuild. DNA is prepared from F1 individuals, each of which is heterozygous for mutations across the genome; on average each F1 carries 10 nonsense and 5 splice mutations. A sequencing library is prepared, enriched for coding exons using Agilent SureSelect and sequenced on the Illumina HiSeq platform. An F2 population is generated by outcrossing F1 carriers, and each F2 individual is genotyped for all of the nonsense and splice mutations. Material from F2 carriers is cryopreserved to archive the mutations, and the lines are made available via ZIRC.
F2 carriers are also incrossed, and their offspring are phenotyped morphologically during the first five days of development. Data will be published for all mutations, with the phenotype described where applicable. Approximately 5 to 10% of mutations cause a morphological phenotype during this period.
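The averages quoted above allow a back-of-the-envelope estimate of the yield from a batch of F1 fish. The sketch below simply multiplies them out; the function names are hypothetical, and it assumes every F1 carries exactly the average number of mutations and that the 5 to 10% phenotypic rate applies uniformly, which real families will not do.

```python
# Averages quoted in the answer above.
NONSENSE_PER_F1 = 10
SPLICE_PER_F1 = 5
PHENOTYPIC_LOW, PHENOTYPIC_HIGH = 0.05, 0.10  # 5 to 10% of mutations


def expected_mutations(n_f1: int) -> int:
    """Expected nonsense plus splice mutations across n_f1 F1 fish."""
    return n_f1 * (NONSENSE_PER_F1 + SPLICE_PER_F1)


def expected_phenotypic(n_f1: int) -> tuple[float, float]:
    """Low and high estimates of mutations giving a morphological phenotype."""
    total = expected_mutations(n_f1)
    return total * PHENOTYPIC_LOW, total * PHENOTYPIC_HIGH


print(expected_mutations(100))  # 1500
print(expected_phenotypic(100))
```

For 100 F1 fish this gives 1,500 mutations, of which roughly 75 to 150 would be expected to show a morphological phenotype in the first five days.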
5. How do you determine the phenotypes of alleles?
We perform morphological and behavioural phenotypic analysis of F3 embryos from F2 incrosses during the first five days of development. This is followed by genotype analysis of phenotypic and non-phenotypic embryos to determine associations between observed phenotypes and nonsense or essential splice site mutations present in the family.
We consider a genotype-phenotype correlation to hold when all of at least 12 phenotypic embryos are homozygous mutant and their non-phenotypic siblings are either heterozygous or wild-type for that allele. A 10% error margin allows for partial penetrance and pipetting mistakes. Phenotypic alleles are then outcrossed, and the association is confirmed in F4 embryos from 12 independent clutches.
This linkage analysis does not constitute proof of causality. We cannot guarantee the association, and a phenotype might be reassigned to a different gene in the future.
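The acceptance rule above can be expressed as a small check. This is a sketch under assumptions: the genotype labels are hypothetical, and the way the 10% error margin is applied (separately to phenotypic embryos that are not homozygous and to non-phenotypic siblings that are homozygous) is one plausible reading rather than the project's documented procedure.

```python
from collections.abc import Sequence

# Hypothetical genotype labels.
HOM, HET, WT = "hom", "het", "wt"

MIN_PHENOTYPIC = 12   # at least 12 phenotypic embryos genotyped
ERROR_MARGIN = 0.10   # 10% allowance for partial penetrance and pipetting errors


def supports_association(phenotypic: Sequence[str], siblings: Sequence[str]) -> bool:
    """Check whether genotypes support linking a phenotype to one allele.

    Assumption: the 10% margin is applied separately to phenotypic embryos
    that are not homozygous mutant and to siblings that are homozygous mutant.
    """
    if len(phenotypic) < MIN_PHENOTYPIC:
        return False
    # Phenotypic embryos are expected to be homozygous mutant.
    pheno_mismatches = sum(1 for g in phenotypic if g != HOM)
    # Non-phenotypic siblings are expected to be heterozygous or wild-type.
    sibling_mismatches = sum(1 for g in siblings if g == HOM)
    pheno_ok = pheno_mismatches <= ERROR_MARGIN * len(phenotypic)
    siblings_ok = sibling_mismatches <= ERROR_MARGIN * len(siblings)
    return pheno_ok and siblings_ok


print(supports_association([HOM] * 12, [HET] * 20 + [WT] * 10))  # True
print(supports_association([HOM] * 11, [HET] * 20))               # False
```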
6. What can I do if my gene is not in Ensembl?
If you have a cDNA that is not represented in the current Ensembl genebuild, it needs to be submitted to a public database (ENA, GenBank or DDBJ). The gene model can then be annotated by the HAVANA group. HAVANA genes are merged with Ensembl genes every other Ensembl release (approximately every 6 months). New genes will be added to the exome enrichment baits periodically.
7. How long will the screening take?
Mutations arise at random, so it is not possible for us to give an estimate for any particular gene. However, the larger a gene is, the sooner we expect to find a mutation in it. For example, we obtained 5 alleles of titin (ttna) in the first 92 fish analysed; the longest ttna protein in Ensembl is over 30,000 amino acids.
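Why size matters can be illustrated with a simple Poisson model, in which the expected number of alleles grows with coding length and with the number of fish screened. This is a rough sketch, not the project's own estimate: it assumes nonsense and splice alleles fall uniformly across coding sequence, and it derives a per-base rate from the single titin example (taking the coding length as 30,000 codons, which, since the protein is over 30,000 amino acids, slightly overstates the rate).

```python
import math

# Crude per-base rate derived from the single titin example quoted above:
# 5 ttna alleles in the first 92 fish, coding length taken as 30,000 codons.
TTNA_ALLELES = 5
TTNA_FISH = 92
TTNA_CODING_BP = 30_000 * 3
RATE_PER_BP_PER_FISH = TTNA_ALLELES / (TTNA_FISH * TTNA_CODING_BP)


def p_at_least_one_allele(coding_bp: int, fish_screened: int) -> float:
    """Poisson probability of finding one or more alleles in a gene.

    Assumes nonsense/splice alleles land uniformly across coding sequence,
    so the expected count scales with coding length and fish screened.
    """
    expected = RATE_PER_BP_PER_FISH * coding_bp * fish_screened
    return 1.0 - math.exp(-expected)


for coding_bp in (1_500, 15_000, 90_000):
    print(coding_bp, round(p_at_least_one_allele(coding_bp, 92), 3))
```

Under these assumptions, after 92 fish a 500-codon gene has only about an 8% chance of a hit, against over 99% for ttna itself.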
8. Can other labs ask for alleles in genes I requested?
Yes, we are an open resource and release all lines to the community.
9. Are requests anonymous?
10. How do I genotype the fish I receive from the Zebrafish Mutation Project?
We routinely genotype individuals by allele-specific amplification using reagents from KBioscience. Each allele has an associated KASPar assay that can be ordered directly from KBioscience, and the ID of each assay is displayed on the Zebrafish Mutation Project website. Alternatively, standard PCR primers can be designed around the allele and the product sequenced to determine the genotype of each individual.
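For the sequencing route, each genotype is read from the relative signal of the reference and mutant base at the mutation site. The function below is a hypothetical sketch; it is not part of the KASPar assay or any KBioscience software, and the 0.2/0.8 thresholds are arbitrary placeholders that would need calibrating against known wild-type, heterozygous and homozygous controls.

```python
def call_genotype(ref_signal: float, mut_signal: float,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Call a genotype from reference and mutant allele signal at the mutation site.

    The signals might be Sanger peak heights at the mutated base; the default
    thresholds are illustrative assumptions, not validated cut-offs.
    """
    total = ref_signal + mut_signal
    if total <= 0:
        raise ValueError("no signal at the mutation site")
    mut_fraction = mut_signal / total
    if mut_fraction < het_low:
        return "wild-type"
    if mut_fraction > het_high:
        return "homozygous mutant"
    return "heterozygous"


print(call_genotype(950, 40))   # wild-type
print(call_genotype(480, 510))  # heterozygous
print(call_genotype(30, 900))   # homozygous mutant
```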
11. What is a "normal" phenotype?
In the Zebrafish Mutation Project, a phenotype is described as "normal" where no morphological or behavioural difference is observed using a dissecting microscope during the first five days of development.
12. The allele I was interested in is no longer listed. What happened?
This is a high-throughput project, and it is unfortunately inevitable that some alleles won't make it all the way through our pipeline. We may detect a mutation in F1 DNA that then fails to confirm in an F2 line. Even after we've made an allele available for distribution and you've requested it, there is still a very small chance that we may lose the F2 line and be unable to distribute it. For example, the only F2 fish carrying a particular allele might be lost during cryopreservation (although in this case the allele would become available again later from ZIRC).
13. How should I cite the Zebrafish Mutation Project?
The Zebrafish Mutation Project has been published as:
A systematic genome-wide analysis of zebrafish protein-coding gene function
Ross N. W. Kettleborough, Elisabeth M. Busch-Nentwich, Steven A. Harvey, Christopher M. Dooley, Ewart de Bruijn, Freek van Eeden, Ian Sealy, Richard J. White, Colin Herd, Isaac J. Nijman, Fruzsina Fényes, Selina Mehroke, Catherine Scahill, Richard Gibbons, Neha Wali, Samantha Carruthers, Amanda Hall, Jennifer Yen, Edwin Cuppen & Derek L. Stemple
Nature 496, 494-497 (2013)
PubMed: 23594742; DOI: 10.1038/nature11992
14. How are the transcriptome profiles produced?
Zebrafish carrying mutant alleles discovered in the Zebrafish Mutation Project are incrossed, and morphologically phenotypic embryos and morphologically normal sibling embryos are collected from multiple clutches. The samples collected match the pictures and ontology terms described for each allele. RNA is extracted and fragmented, and Illumina libraries enriched for short fragments with a polyA tail are prepared and sequenced on the Illumina HiSeq platform. The sequence data are available from the European Bioinformatics Institute via the links on each transcriptome profile page.
The reads are trimmed, non-zebrafish sequence is removed, the reads are mapped to the Zv9 reference sequence, and suspected duplicate reads created during library amplification are flagged. Genome coordinates defining the base directly 5' of a polyA tail are identified using mapped read 1s. Count data from the genomic region immediately 5' of these coordinates are collected from mapped read 2s for each sample. These data are strand-specific.
DESeq (Anders and Huber, Genome Biology 2010, 11:R106) is used to detect differential transcript abundance between mutant and sibling samples, reported as a p-value adjusted for multiple testing together with a log2 fold change. The experimentally identified genome coordinates are compared to the 3' end of the closest Ensembl transcript on the same strand; if they match within ±100 bases, the Ensembl transcript and gene IDs, gene name and gene description are associated with the region. Each transcriptome profile shows all genomic regions detected with a differential transcript abundance p-value < 0.05.
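The final annotation and filtering steps can be sketched as follows. This is a minimal illustration, not the project's pipeline: the `Transcript` and `Region` classes are hypothetical, "closest" is assumed to mean the transcript on the same chromosome and strand whose 3' end is nearest, and the p-value filter is assumed to use the adjusted p-values reported by DESeq.

```python
from dataclasses import dataclass
from typing import Optional

MATCH_WINDOW = 100   # bases either side of a transcript's 3' end
P_CUTOFF = 0.05      # adjusted p-value threshold for display


@dataclass
class Transcript:
    transcript_id: str
    chrom: str
    strand: str   # "+" or "-"
    start: int    # leftmost coordinate
    end: int      # rightmost coordinate

    def three_prime_end(self) -> int:
        # On the minus strand the 3' end is the leftmost coordinate.
        return self.end if self.strand == "+" else self.start


@dataclass
class Region:
    chrom: str
    strand: str
    position: int            # base directly 5' of the polyA tail
    padj: float              # multiple-testing-adjusted p-value
    log2_fold_change: float  # mutant vs sibling


def closest_transcript(region: Region, transcripts: list[Transcript]) -> Optional[Transcript]:
    """Same-strand transcript whose 3' end is nearest, if within the window."""
    candidates = [t for t in transcripts
                  if t.chrom == region.chrom and t.strand == region.strand]
    if not candidates:
        return None
    best = min(candidates, key=lambda t: abs(t.three_prime_end() - region.position))
    if abs(best.three_prime_end() - region.position) <= MATCH_WINDOW:
        return best
    return None


def build_profile(regions: list[Region],
                  transcripts: list[Transcript]) -> list[tuple[Region, Optional[Transcript]]]:
    """Keep regions with padj < P_CUTOFF and attach any matching transcript."""
    return [(r, closest_transcript(r, transcripts))
            for r in regions if r.padj < P_CUTOFF]
```

A linear scan keeps the sketch short; across a whole genebuild, sorting 3' end positions per chromosome and strand and using binary search would be the more usual choice.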