Metagenomics classification and clinical diagnosis of brain infections

Florian Breitwieser
STAMPS 2016


Summary

Metagenomics sequencing for the diagnosis of neuropathological infections
centrifuge: Novel classification engine for microbial sequences
pavian: Interface for analyzing metagenomics data (‘PAthogen VIsualization ANd more’)

Sequencing for diagnosis of infections



  • > 50% of infections remain undiagnosed
  • Encephalitis
    • about 20k cases / year in the U.S.
    • mortality rate > 5%
  • infectious causes: viral, bacterial, fungal, parasitic
Sequencing can enable fast and ‘unbiased’ identification of pathogens.

Potentially: Find unknown unknowns (zoonotic pathogens).

Sequencing diagnosis of neuropathological infections

  • Ten patients with suspected neuropathological infections
  • Negative results with standard methods
  • Brain or spinal cord biopsies

  • Whole metagenome sequencing on a MiSeq
  • Computational pipeline to analyze and compare samples

Metagenomics classification

Current classifiers have at least one of these limitations:
  • long runtime, or
  • a very large index (50-100 GB, e.g. Kraken, CLARK, kallisto), or
  • only part of the genome indexed (e.g. MetaPhlAn2, mOTU)
[Figure: trade-off between speed, sensitivity, and memory requirements]

Aim: a fast and sensitive engine that can run on a desktop

Centrifuge for microbial classification

  • full genomes, compressed on the species level
  • based on FM index
  • small database: < 4GB for all bacterial genomes
  • sensitivity and precision comparable to best programs
  • very fast

https://github.com/infphilo/centrifuge
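
To make the "based on FM index" point concrete, here is a minimal toy sketch of how an FM index counts exact matches of a pattern by backward search over the Burrows-Wheeler transform. It is an illustration under simplifying assumptions (a tiny text, rotations sorted naively, occurrence counts recomputed by scanning) and not Centrifuge's implementation; the function names `bwt`, `occ`, `less_than`, and `count_matches` are made up for this example.

```shell
#!/usr/bin/env bash
# Toy FM-index: count exact occurrences of a pattern with backward search.
# Illustration only; this is not Centrifuge's implementation.
set -euo pipefail
export LC_ALL=C  # byte-wise ordering, so '$' sorts before A, C, G, T

# bwt TEXT: Burrows-Wheeler transform, built by sorting all rotations of TEXT$.
bwt() {
  local t="$1\$" i rot
  for ((i = 0; i < ${#t}; i++)); do
    printf '%s\n' "${t:i}${t:0:i}"
  done | sort | while IFS= read -r rot; do
    printf '%s' "${rot: -1}"   # last column of the sorted rotation matrix
  done
  printf '\n'
}

# occ CHAR POS BWT: how often CHAR occurs in the first POS characters of BWT.
occ() {
  local prefix="${3:0:$2}" only
  only="${prefix//[^$1]/}"
  printf '%s\n' "${#only}"
}

# less_than CHAR BWT: number of characters in BWT that sort before CHAR.
less_than() {
  local i n=0
  for ((i = 0; i < ${#2}; i++)); do
    if [[ "${2:i:1}" < "$1" ]]; then n=$((n + 1)); fi
  done
  printf '%s\n' "$n"
}

# count_matches PATTERN BWT: number of exact occurrences of PATTERN in the text.
count_matches() {
  local pattern="$1" b="$2" lo=0 hi=${#2} i c base
  # Walk the pattern right to left, narrowing the range [lo, hi) of sorted rows.
  for ((i = ${#pattern} - 1; i >= 0; i--)); do
    c="${pattern:i:1}"
    base=$(less_than "$c" "$b")
    lo=$((base + $(occ "$c" "$lo" "$b")))
    hi=$((base + $(occ "$c" "$hi" "$b")))
    if ((lo >= hi)); then echo 0; return; fi
  done
  echo $((hi - lo))
}

index=$(bwt "ACGTACGT")
echo "BWT of ACGTACGT\$: $index"
echo "ACG occurs $(count_matches ACG "$index") times"
```

The search only ever touches the transformed string, which is why the index can be stored compactly. A production index precomputes the occurrence counts instead of rescanning the string at every step.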

Pan-Genome compression


Many species have several completed genomes:
  • 268 Salmonella enterica genomes
  • 134 Escherichia coli genomes
  • 112 Mycobacterium tuberculosis genomes

compression ratio: up to 98%

Centrifuge performance compared to Kraken

On 450 bacterial SRA datasets (25% of species not in database)



Database size: 20 times smaller! Centrifuge DB is < 4 GB

With a database built on NCBI nt: > 1000 times faster than megablast

pavian interface, and results on patient data

Case report: 67-year-old woman

  • numerous brain and spinal cord lesions
  • worsening neurological condition

pavian

Overview of results on ten patients

  • three of the cases resolved
    • 52yo, m, with motor seizures: JC polyomavirus (confirmed 3d later)
    • 67yo, f, with brain lesions: M. tuberculosis (other tests negative)
    • 44yo, f, history of transplant, facial seizures: EBV (confirmed +10d)
  • two cases with tumor
  • one year later: re-analysis revealed 429 reads in one patient from Elizabethkingia, a newly emerging pathogen that caused significant morbidity in a cluster of cases in Wisconsin

Summary

Metagenomics is promising for infectious disease diagnosis
http://nn.neurology.org/content/3/4/e251.full
Centrifuge is a memory-efficient and fast metagenomics classifier
https://github.com/infphilo/centrifuge
pavian can help in the analysis of metagenomics samples
https://bitbucket.org/florianbw/pavian

Installing and running Pavian

Installation in R

source("https://bioconductor.org/biocLite.R")
biocLite("Rsamtools")

install.packages("devtools")
devtools::install_bitbucket("florianbw/pavian")

Running Pavian in R

pavian::runApp()

https://bitbucket.org/florianbw/pavian

Installing and running Centrifuge

Installation

URL=ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge
curl -o centrifuge.zip $URL/downloads/centrifuge-1.0.2-beta-Linux_x86_64.zip
unzip centrifuge.zip

curl -O $URL/data/b_compressed+h+v.tar.gz
tar xvvf b_compressed+h+v.tar.gz

Running Centrifuge (requires 6+GB of RAM)

centrifuge -x b_compressed+h+v -1 R1.fq -2 R2.fq

http://www.ccb.jhu.edu/software/centrifuge/
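
As a follow-up to the command above, the per-read classification table can be tallied per taxonomy ID with standard shell tools. The sketch below assumes the tab-separated Centrifuge output with a header line and the taxonomy ID in the third column. The helper `count_reads_per_taxon`, the file name `classification.tsv`, and the toy table (read names, placeholder sequence IDs, scores) are made up for illustration; 9606 and 1773 are the NCBI taxonomy IDs for Homo sapiens and Mycobacterium tuberculosis.

```shell
#!/usr/bin/env bash
# Tally classified reads per taxonomy ID from a Centrifuge-style table.
set -euo pipefail

# With Centrifuge installed, -S writes the per-read table to a file:
#   centrifuge -x b_compressed+h+v -1 R1.fq -2 R2.fq -S classification.tsv

# count_reads_per_taxon FILE: print "taxID<TAB>reads", most abundant first.
count_reads_per_taxon() {
  awk -F'\t' '
    NR == 1 { next }                        # skip the header line
    !seen[$1 "\t" $3]++ { reads[$3]++ }     # count a read once per taxID
    END { for (t in reads) printf "%s\t%d\n", t, reads[t] }
  ' "$1" | sort -t $'\t' -k2,2nr -k1,1n
}

# Made-up toy table (hypothetical reads and placeholder sequence IDs).
toy=$(mktemp)
{
  printf 'readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n'
  printf 'read1\tseqA\t9606\t4225\t0\t80\t80\t1\n'
  printf 'read2\tseqA\t9606\t4225\t0\t80\t80\t1\n'
  printf 'read3\tseqB\t1773\t3600\t0\t75\t80\t1\n'
  printf 'read4\tseqA\t9606\t4225\t0\t80\t80\t1\n'
} > "$toy"

summary=$(count_reads_per_taxon "$toy")
printf '%s\n' "$summary"
rm -f "$toy"
```

For real samples, Pavian offers a richer view of the same information; this tally is only a quick sanity check on the command line.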

Acknowledgments

  • Steven L Salzberg
  • Centrifuge: Daehwan Kim and Li Song
  • Brain infections: Anupama Kumar, Haiping Hao, Peter Burger, Fausto J. Rodriguez, Michael Lim, Alfredo Quiñones-Hinojosa, Gary L. Gallia, Jeffrey A Tornheim, Michael T. Melia, Cynthia L. Sears and Carlos A. Pardo
  • Peter Thielen, Thomas Mehoke at the JHU Applied Physics Laboratory
  • colleagues at CCB

Funding: National Institutes of Health/NHGRI, U.S. Army Research Office, and the Bart McLean Fund for Neuroimmunology Research