Artemis: visualising, analysing and browsing next generation sequence data

Examples and use cases are shown below.

Example 1: Plasmodium falciparum 3D7

P. falciparum 3D7 transcriptome data in Artemis.


A region of chromosome 1 of Plasmodium falciparum 3D7, displaying transcriptome data for expression levels at two time points, 8 hours and 24 hours. The sequence, annotation and BAM alignment files can be launched in Artemis (via Java Web Start) from this link:

When this example has launched, right click on the BAM view and open the coverage plot from the 'Graph' option in the pop-up menu. This shows the different levels of expression at 8 hours (red) and 24 hours (green). As well as showing the differences in expression between the two time points, this example also confirms the annotation of the exon boundaries.

  • New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, Lemieux J, Barrell B, Pain A, Berriman M, Newbold C and Llinás M

    Molecular microbiology 2010;76;1;12-24

Example 2: Chlamydia trachomatis

Chlamydia trachomatis in Artemis.


The Chlamydia trachomatis genome and plasmid are concatenated together in this example. The reads mapped to the plasmid sequence show a deletion of ca. 370 bp in the variant of this organism. The deletion removes the primer binding site used in the standard diagnostic test. This led to a significant increase in cases of what is now called the new variant C. trachomatis in Sweden, where it is found.

Changing the view to plot the reads by their inferred size shows the characteristic increase in this value for reads that span the deleted region in the variant. This view is illustrated in the image opposite.

  • Co-evolution of genomes and plasmids within Chlamydia trachomatis and the emergence in Sweden of a new variant strain.

    Seth-Smith HM, Harris SR, Persson K, Marsh P, Barron A, Bignell A, Bjartling C, Clark L, Cutcliffe LT, Lambden PR, Lennard N, Lockey SJ, Quail MA, Salim O, Skilton RJ, Wang Y, Holland MJ, Parkhill J, Thomson NR and Clarke IN

    BMC genomics 2009;10;239

Example 3: Mycobacterium tuberculosis H37Rv

Mycobacterium tuberculosis H37Rv genome showing Illumina re-sequencing data.


Mycobacterium tuberculosis H37Rv genome example with Illumina re-sequencing data. Right clicking on the BAM view gives the option 'Show' -> 'SNP marks', which draws vertical red lines on the reads to indicate differences from the reference. Additionally, a SNP density plot can be shown ('Graph' -> 'SNP'). Sequencing errors produce randomly distributed lines, whereas true SNPs can be seen where lines align vertically.

Example 4: Streptococcus pneumoniae

S. pneumoniae variation data in Artemis.


A region of the Streptococcus pneumoniae antibiotic-resistant pandemic clone PMEN1. This shows the variation data for 11 strains of this clone compared to the reference. The pspA gene in the reference is a pseudogene as a result of a frame shift. The magenta insertions in most strains show where this frame shift has been caused by a deletion in the reference strain (and one of the other strains). There are also some other points of interest: the gene has independently become a pseudogene in a second strain because of a SNP leading to a premature stop codon, and a third strain has a recombination that spans this whole region (shown by the increased SNP density). For single base changes, the colour represents the base it is being changed to, i.e. T black, G blue, A green, C red. Right clicking on the panel that displays the variation data provides a number of options, including filtering and different colour schemes.

Example 5: Plasmodium berghei

P. berghei example


An example in Plasmodium berghei of a repeat that was collapsed during the assembly. Clone the BAM panel and zoom out. Right click on the top BAM panel and change the view to show the coverage (Views -> Coverage). In the bottom BAM panel, right click and use the 'Filter' option to hide proper pairs. There are three ways of confirming the collapsed repeat in Artemis:

  • the coverage over the region is doubled
  • there are heterozygous SNPs in the VCF view
  • the number of reads that are not proper pairs increases at the break point of the duplication

Example 6: Salmonella bongori

S. bongori example


An example in Salmonella bongori of directional transcriptome sequencing using Illumina technology. Once the example has launched, change the view to 'Strand Stack' (right-click: Views -> Strand Stack). For this protocol, the reads indicating the direction are on the opposite strand.

  • A simple method for directional transcriptome sequencing using Illumina technology.

    Croucher NJ, Fookes MC, Perkins TT, Turner DJ, Marguerat SB, Keane T, Quail MA, He M, Assefa S, Bähler J, Kingsley RA, Parkhill J, Bentley SD, Dougan G and Thomson NR

    Nucleic acids research 2009;37;22;e148

Example 7: Plasmodium berghei

P. berghei ACT example


An example in Plasmodium berghei of a misassembly shown in ACT. A BAM panel is loaded for each assembly to identify a misassembly in the top sequence in the region 1060000-1080000 bp. The bottom sequence is the de novo assembly. To speed up the display of the BAM files (over HTTP) for the ACT launch below, the Illumina BAMs contain just 10% of the reads. The view can be cloned to show a coverage plot of mapped Illumina reads and the insert size view of mapped 454 reads. In the corrected genome the coverage is more even and does not drop except in the gap regions.

  • Launch P.berghei ACT Example

Example 8: Plasmodium falciparum

P. falciparum ACT example


Genetic crosses can be studied using ACT. In this example, reads from a genetic cross are mapped against the Pf3D7 and HB3 Plasmodium falciparum chromosome 14 sequences. This shows the variations found when compared to these two sequences, which can be used to identify crossovers in the progeny.

  • Launch P.falciparum ACT Example

Example 9: Mouse Genomes

Mouse Genomes example


Artemis showing data from the Mouse Genomes Project. The reference genome is loaded from an indexed FASTA file. Read alignments of 3 strains are shown in the 'paired stack' view, with read pairs joined by a grey line. The reads are colour coded by strain, with the coverage plot for each underneath. These reveal a deletion in chromosome 19 that is seen in the coverage of 129S1/SvImJ (red) but not in the other two strains, A/J (green) and C57BL/6NJ (blue).

Downloading Data

  1. Download the Mouse Genome sequence (NCBIM37_um.fa) and the FASTA index file (NCBIM37_um.fa.fai).
  2. The read alignment files (in BAM format) can be downloaded along with their index files (*.bai). These files are large (40-140 GB), which can make downloading them in full impractical. An alternative is to download only the region or chromosome of interest using samtools. For example, the script below downloads the read alignments for the 17 strains used in the Mouse Genomes Project, restricted to the region in chromosome 19 described above. It creates the index file for each downloaded BAM and creates a file 'artemis_bam.list' that can be used to load the read alignments into Artemis.
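    #!/bin/bash
    # Strains sequenced by the Mouse Genomes Project.
    STRAINS=(129P2_OlaHsd 129S1_SvImJ 129S5SvEvBrd AKR_J A_J BALB_cJ C3H_HeJ C57BL_6NJ CAST_EiJ CBA_J DBA_2J LP_J NOD_ShiLtJ NZO_HlLtJ PWK_PhJ SPRET_EiJ WSB_EiJ)
    # Region of chromosome 19 to extract.
    REGION=19:57270000-57300000
    for strain in "${STRAINS[@]}"
    do
       # Fetch only the reads overlapping REGION from the remote BAM.
       samtools view -bh "ftp://ftp-mouse.sanger.ac.uk/current_bams/$strain.bam" "$REGION" > "$strain.chr19.bam"
       # Remove any copy of the remote index left in the working directory.
       rm -f "$strain.bam.bai"
       # Index the extracted BAM and append its full path to the list file.
       samtools index "$strain.chr19.bam"
       echo "$PWD/$strain.chr19.bam" >> artemis_bam.list
    done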
    
  3. The gene sets can be downloaded for Mus musculus. Unzip the annotation file. This file contains the annotation for the genome and needs to be sorted and indexed with tabix:
    (grep ^"#" Mus_musculus.NCBIM37.64.gtf; grep -v ^"#" Mus_musculus.NCBIM37.64.gtf | sort -k1,1 -k4,4n) | bgzip > Mus_musculus.NCBIM37.64.sorted.gtf.gz;
    tabix -p gff Mus_musculus.NCBIM37.64.sorted.gtf.gz
    
  4. When Artemis has been launched open the sequence file NCBIM37_um.fa. From the drop down menu in the entry panel at the top of Artemis, change the view to chromosome 19.
    Artemis chromosome selection


  5. Read the sorted and indexed annotation file (Mus_musculus.NCBIM37.64.sorted.gtf.gz) into Artemis: from the 'File' menu, select the 'Read An Entry' option.
  6. Read the file containing the paths to the BAM files (artemis_bam.list) into Artemis ('File' → 'Read BAM /VCF...').
    Artemis BAM selection


  7. You will then get a message to indicate that Artemis is displaying reads for Chromosome 19 (rather than the default chromosome 1).
    Artemis warning


  8. Navigate to the region of the BAM that was downloaded (57270000-57300000). This can be done using the Navigator (from the menus 'Goto' → 'Navigator...').
    Mouse Genome example


  • Mouse genomic variation and its effect on phenotypes and gene regulation.

    Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellåker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assunção JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J and Adams DJ

    Nature 2011;477;7364;289-94
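
Before reading artemis_bam.list into Artemis in the mouse example above, it may be worth confirming that each listed BAM exists and has an index beside it. The following is only a minimal sketch: the function name check_bam_list is hypothetical (it is not part of Artemis or samtools), and it assumes the index is named <file>.bam.bai, as created by the 'samtools index' call in the download script.

```shell
#!/bin/bash
# Hypothetical helper (not part of Artemis or samtools): report entries in a
# BAM list file that are missing, or that lack a .bai index next to them.
# Usage: check_bam_list artemis_bam.list
check_bam_list() {
    local list=$1 bam status=0
    # Read one path per line; also handle a final line without a newline.
    while IFS= read -r bam || [ -n "$bam" ]; do
        [ -z "$bam" ] && continue            # skip blank lines
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam"
            status=1
        elif [ ! -f "$bam.bai" ]; then
            echo "missing index: $bam.bai"
            status=1
        fi
    done < "$list"
    # Return 0 only if every listed BAM and its index were found.
    return "$status"
}
```

When every entry is usable the function prints nothing and returns 0; otherwise it prints one line per problem and returns 1.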

Example 10: Human Genomes - protocol for loading next-gen data

  1. Download the reference file from the 1000 Genomes Project.
  2. Unzip the downloaded reference file.
  3. Then download the FASTA index file.
  4. Download the annotation. Unzip this annotation file. This file needs to be indexed for Artemis to be able to read it. The reference names need to match the names in the sequence file (human_g1k_v37.fasta). This can be achieved by stripping out the 'chr' string from the first column of the gtf file (e.g. chr1 becomes 1) using this perl command:
    perl -pi -e 's/^chr//' gencode_v4.annotation.GRCh37.gtf
    
    Then this is sorted and indexed with tabix:
    (grep ^"#" gencode_v4.annotation.GRCh37.gtf; grep -v ^"#" gencode_v4.annotation.GRCh37.gtf | sort -k1,1 -k4,4n ) | bgzip > gencode_v4.annotation.GRCh37.sorted.gtf.gz
    tabix -p gff gencode_v4.annotation.GRCh37.sorted.gtf.gz
    
  5. Download some of the alignment data. For example, download the BAM and associated index files of the low coverage ILLUMINA data for different samples for chromosome 20:
    HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
    and
    HG00236.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam HG00236.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
  6. When Artemis has been launched open the sequence file human_g1k_v37.fasta. From the drop down menu in the entry panel at the top of Artemis, change the view to chromosome 20.
    Artemis chromosome selection


  7. Read the annotation file (gencode_v4.annotation.GRCh37.sorted.gtf.gz) into Artemis: from the 'File' menu, select the 'Read An Entry' option.
  8. Read the BAM files into Artemis ('File' → 'Read BAM /VCF...').
    Artemis BAM selection


  9. You will then get a message to indicate that Artemis is displaying reads for Chromosome 20 (rather than the default chromosome 1).
    Artemis warning


  10. To differentiate reads from each BAM, right click in the BAM view and select 'Colour By' → 'Coverage Plot Colours'.
    Human Genome example
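
The name matching described in step 4 above can be checked before the files are loaded. The sketch below is an illustration only: gtf_names_not_in_fai is a hypothetical helper, and it assumes that the first tab-separated column of both the FASTA index file and the GTF file holds the sequence name.

```shell
#!/bin/bash
# Hypothetical helper: list sequence names used in a GTF file that are
# absent from a FASTA index (.fai) file.
# Usage: gtf_names_not_in_fai annotation.gtf reference.fasta.fai
gtf_names_not_in_fai() {
    local gtf=$1 fai=$2
    # First pass (the .fai file) records the known names; second pass (the
    # GTF file) prints each unknown name once, skipping '#' comment lines.
    awk -F'\t' '
        NR == FNR { known[$1] = 1; next }
        /^#/      { next }
        !($1 in known) && !reported[$1]++ { print $1 }
    ' "$fai" "$gtf"
}
```

An empty output means every sequence name in the GTF file was found in the index. Any name that is printed (for example chr1 before the 'chr' prefix has been stripped) does not match a sequence in the reference.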


Usage:

Note: A minimum Java version of 1.6 is required to load the next-generation file formats.

Artemis and ACT can be downloaded and run as standalone tools and used to browse indexed BAM, VCF (Variant Call Format) and BCF (Binary VCF) files on a user's local machine. Additional documentation is available on the Artemis and ACT home pages.

Artemis and ACT read BAM files that have been sorted and indexed using SAMtools. This provides an integrated BamView panel displaying sequence alignment mappings to a reference sequence (see Example 1 above).
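
As a rough illustration of the sorting and indexing step, the sketch below wraps the two SAMtools commands. The function name prepare_bam and the DRY_RUN variable are hypothetical, and the 'samtools sort -o' form assumes SAMtools 1.3 or later; older releases take different arguments.

```shell
#!/bin/bash
# Hypothetical wrapper: coordinate-sort a BAM file and index the result.
# Assumes SAMtools 1.3 or later for the 'samtools sort -o OUT IN' form.
# With DRY_RUN=1 the commands are printed instead of being executed.
# Usage: prepare_bam reads.bam    (writes reads.sorted.bam and its index)
prepare_bam() {
    local in=$1
    local out=${in%.bam}.sorted.bam
    local run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    # Index only if the sort succeeded.
    $run samtools sort -o "$out" "$in" &&
    $run samtools index "$out"
}
```

Running 'DRY_RUN=1 prepare_bam reads.bam' shows the two commands without needing SAMtools to be installed, which is a simple way to check the file names before starting a long sort.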

Variant Call Format (VCF) files can also be read (see Example 3 above). The VCF files need to be compressed and indexed using bgzip and tabix respectively (see the tabix manual and download page). It is the compressed file (e.g. file.vcf.gz) that is read in; the commands below generate it from a VCF file:

  bgzip file.vcf
  tabix -p vcf file.vcf.gz

Alternatively a BCF file can be indexed with BCFtools and read into Artemis or ACT.

As with reading in multiple BAM files, it is possible to read a number of (compressed and indexed) VCF files by listing their full paths in a single file. They then get displayed in separate rows in the VCF panel.
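
A list file of this kind could be generated along the following lines. This is a sketch under stated assumptions rather than a documented Artemis feature: make_vcf_list is a hypothetical helper, and it assumes the tabix index sits beside each compressed file as <file>.vcf.gz.tbi, which is where 'tabix -p vcf' writes it.

```shell
#!/bin/bash
# Hypothetical helper: write the full paths of all bgzip-compressed VCF
# files in a directory that have a tabix index (.tbi) to a list file.
# Usage: make_vcf_list /path/to/vcfs vcf.list
make_vcf_list() {
    local dir=$1 list=$2 vcf
    : > "$list"                               # start with an empty list
    for vcf in "$dir"/*.vcf.gz; do
        [ -f "$vcf" ] || continue             # no matches: pattern stays literal
        if [ -f "$vcf.tbi" ]; then
            # Resolve the directory to an absolute path before writing.
            echo "$(cd "$dir" && pwd)/$(basename "$vcf")" >> "$list"
        else
            echo "skipping $vcf: no .tbi index" >&2
        fi
    done
}
```

Files without an index are reported on standard error and left out of the list, so the resulting file only names VCFs that are ready to be read.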

* quick link - http://q.sanger.ac.uk/1410w4iq