One million monkeys with typewriters
Annotations of the Genomic Data Deluge

Genome Informatics Alliance
Portland, 28/29 March 2012

Dr. Frank Schacherer, CTO, BIOBASE GmbH
frank.schacherer@biobase-international.com
Disclaimer: no actual monkeys involved
In 2003 the Arts Council for England paid £2,000 for a real-life test of the theorem involving six Sulawesi crested macaques, but the trial was abandoned after a month.
The monkeys produced five pages of text, mainly composed of the letter S, but failed to type anything close to a word of English, broke the computer and used the keyboard as a lavatory.
http://www.telegraph.co.uk/technology/news/8789894/Monkeys-at-typewriters-close-to-reproducing-Shakespeare.html
Agenda

• What annotation do we need?
• How can we get it?
A deluge of data
• deluge (plural deluges)
   – A great flood or rain.
     The deluge continued for hours,
     drenching the land and slowing traffic
     to a halt.
   – An overwhelming amount of
     something.
     The rock concert was a deluge of
     sound.
Media perception
• "The Power Of Digitizing Human Beings"
• "Soon, $1,000 Will Map Your Genes"
• "Cost of Gene Sequencing Falls, Raising Hopes for Medical Advances"
• "'Personalized Medicine' Hits a Bump"
Sources and dates shown: Science 2011; Health Affairs 2009; 10 Jan 2012; 17 Feb 2012; 7 March 2012; March 2012
Life cycle of data annotation
• Understand, Map, Annotate, Rank
• Derive, Analyze, Publish, Curate
How to predict mutation effects
• Overlap with other data
   – dbSNP, 1000 Genomes
   – Relatives and controls
• Algorithmically
   – Frameshift, nonsense, stop gain/loss, non-synonymous changes (SIFT, PolyPhen, ...)
• Based on annotation
   – Known functional regions (active sites, binding sites, ...)
• Directly known effects
   – HGMD

Bioinformatics, Vol. 26, No. 16, 2010, p. 2069; doi:10.1093/bioinformatics/btq330
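The "Algorithmically" bullet rests on a simple idea: translate the reference and the variant coding sequence and compare the two proteins. The Python sketch below is a minimal illustration under stated assumptions, not the method of the cited paper or of any tool named on the slide: the variant lies in a forward-strand coding sequence (CDS) that starts at the start codon, `pos` is a 0-based offset into that CDS, splicing and start-codon loss are ignored, and `translate` and `classify_coding_variant` are hypothetical helper names. Scoring the resulting missense changes, as SIFT or PolyPhen do, is out of scope.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate all complete codons of a CDS; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def classify_coding_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical helper: coarse consequence of replacing `ref` by `alt` at 0-based `pos`.

    Assumes a forward-strand CDS containing only A/C/G/T; ignores splicing and start loss.
    """
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS at this position")

    # Length-changing variants shift the reading frame unless the net change is a multiple of 3.
    if len(ref) != len(alt):
        return "frameshift" if (len(alt) - len(ref)) % 3 else "inframe_indel"

    # Substitutions: compare the translated reference and variant proteins.
    mutated = cds[:pos] + alt + cds[pos + len(ref):]
    ref_protein, alt_protein = translate(cds), translate(mutated)
    if ref_protein == alt_protein:
        return "synonymous"
    first_diff = next(
        i for i, (a, b) in enumerate(zip(ref_protein, alt_protein)) if a != b
    )
    if alt_protein[first_diff] == "*":
        return "stop_gained"  # a nonsense change
    if ref_protein[first_diff] == "*":
        return "stop_lost"
    return "missense"  # non-synonymous; predictors such as SIFT/PolyPhen would score it further
```

For example, with `cds = "ATGGCCTGGTAA"` (Met-Ala-Trp-Stop), changing the G at offset 8 turns TGG into TGA, so `classify_coding_variant(cds, 8, "G", "A")` returns `"stop_gained"`, while `classify_coding_variant(cds, 3, "G", "GA")` returns `"frameshift"`.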
Associating Genotype with Phenotype




http://www.gen2phen.org/
What data do we need for clinical application?




ACCE takes its name from the four main criteria for evaluating a genetic test: analytic validity, clinical validity, clinical utility, and associated ethical, legal and social implications.
Centers for Disease Control and Prevention, Office of Public Health Genomics (OPHG)
Ideal Annotation for clinical use?
• Variants
   – Pathogenic, Uncertain, Benign
   – Severities, if known
   – Ethnicities/Frequencies
   – Number of cases
   – Symptoms in conjunction with other mutations
• Evidence
   – Not weighted equally
   – Risks of incorrect classification not equal between genes
• N=12
   – Testing (Clinical Validity, Who/When, Methods, Interpretation, Cost): 4
   – Management, Clinical Significance, Implications: 4
   – Actionability, Clinical Utility: 3
   – Clinical manifestations (Pathophysiology, Phenotype, Prognosis, Severity, Penetrance, Pleiotropy): 3
   – Frequency (especially indicate most common variants): 2
   – Inheritance and de novo mutation rate: 2
   – Evidence-based: 2
   – Clinical Decision Support in EHR: 1

Data from: Howard P. Levy, MD, PhD, Johns Hopkins University
Data from: Elaine Lyon, Ph.D., FACMG, University of Utah & ARUP Laboratories
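The two evidence bullets (evidence is not weighted equally; the risk of an incorrect classification differs between genes) can be made concrete with a toy scoring sketch in Python. Every evidence category, weight, gene name and threshold below is hypothetical and chosen only for illustration; none of them comes from the slide, from a published classification guideline, or from any BIOBASE product.

```python
from typing import Mapping

# Hypothetical evidence weights: positive values support pathogenicity, negative ones benignity.
EVIDENCE_WEIGHTS = {
    "functional_assay": 3.0,
    "segregation_in_family": 2.0,
    "reported_case": 1.0,
    "in_silico_prediction": 0.5,
    "common_in_controls": -3.0,
}

# Hypothetical per-gene thresholds: a higher bar where a wrong call is costlier.
GENE_THRESHOLDS = {"GENE_A": 4.0, "GENE_B": 6.0}
DEFAULT_THRESHOLD = 5.0


def evidence_score(evidence: Mapping[str, int]) -> float:
    """Sum of weight x number of observations over all evidence types."""
    return sum(EVIDENCE_WEIGHTS[kind] * count for kind, count in evidence.items())


def classify_variant(gene: str, evidence: Mapping[str, int]) -> str:
    """Map the weighted score to Pathogenic / Uncertain / Benign with a gene-specific threshold."""
    score = evidence_score(evidence)
    threshold = GENE_THRESHOLDS.get(gene, DEFAULT_THRESHOLD)
    if score >= threshold:
        return "Pathogenic"
    if score <= -threshold:
        return "Benign"
    return "Uncertain"
```

With the same evidence (one functional assay plus one reported case, a score of 4.0), `classify_variant("GENE_A", ...)` returns `"Pathogenic"` while `classify_variant("GENE_B", ...)` returns `"Uncertain"`, which mirrors the point that the acceptable risk of misclassification differs between genes.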
Who provides annotation?
• Payor, Test Lab, Curator, Researcher
• Patient, MD/Geneticist, Anybody, Computer
Surveys & Patient Self-annotation
Knaus, William A.: Building a Genome Enabled Electronic Medical Record
Nature Biotechnology, Vol. 29, No. 5, May 2011:
Patients with serious diseases may experiment with drugs that have not received regulatory approval. Online patient communities structured around quantitative outcome data have the potential to provide an observational environment to monitor such drug usage and its consequences. Here we describe an analysis of data reported on the website PatientsLikeMe by patients with amyotrophic lateral sclerosis (ALS) who experimented with lithium carbonate …
DNA Variant Databases




Data (except for HGMD and DMuDB) courtesy of P. Willems, MutaBase
Data federation
Testing Lab data


A safe and secure route for sharing variant data
The Diagnostic Mutation Database (DMuDB) is a unique repository of high
quality variant data collected from accredited clinical genetic testing
laboratories in the UK National Health Service (NHS).
It provides a safe and secure way for variant data to be shared within and
between laboratories in order to support safer, more consistent
diagnoses. The database was established in order to address the lack of
data-sharing or publication in the genetic testing community.
DMuDB is used regularly by genetic scientists:
         • to check a new variant against existing reported variants from
              other laboratories
         • to check for co-reported variants
         • as a part of regular re-assessment of unclassified variants
         • via the Universal Browser as part of complex searches
              covering multiple databases




                                                 www.ngrl.org.uk/Manchester
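The first two DMuDB use cases, checking a new variant against reports from other laboratories and checking for co-reported variants, amount to two lookups on a shared index. The Python sketch below assumes that variants have already been normalized to a single string key (for example an HGVS-style description); the `Report` and `VariantIndex` names, the fields and the example variant strings are hypothetical and do not describe DMuDB's actual schema or interface.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Report:
    """One variant reported by one laboratory for one patient (hypothetical schema)."""
    lab: str
    patient_id: str
    variant: str          # assumed pre-normalized, e.g. an HGVS-style string
    classification: str   # e.g. "Pathogenic", "Uncertain", "Benign"


class VariantIndex:
    """Hypothetical in-memory index for 'seen elsewhere?' and 'co-reported with?' queries."""

    def __init__(self) -> None:
        self._by_variant = defaultdict(list)  # variant -> [Report, ...]
        self._by_patient = defaultdict(set)   # (lab, patient_id) -> {variant, ...}

    def add(self, report: Report) -> None:
        self._by_variant[report.variant].append(report)
        self._by_patient[(report.lab, report.patient_id)].add(report.variant)

    def reports_for(self, variant: str, exclude_lab: Optional[str] = None) -> list[Report]:
        """Existing reports of a variant, optionally ignoring the querying lab's own reports."""
        return [r for r in self._by_variant.get(variant, []) if r.lab != exclude_lab]

    def classifications(self, variant: str) -> set[str]:
        """Distinct classifications; more than one suggests the variant needs re-assessment."""
        return {r.classification for r in self._by_variant.get(variant, [])}

    def co_reported(self, variant: str) -> Counter:
        """How many patients carrying `variant` were also reported with each other variant."""
        patients = {(r.lab, r.patient_id) for r in self._by_variant.get(variant, [])}
        counts = Counter()
        for key in patients:
            counts.update(v for v in self._by_patient[key] if v != variant)
        return counts
```

Keying patients by `(lab, patient_id)` keeps identifiers local to each laboratory; a real shared resource would additionally need access control and de-identification, which this sketch does not attempt.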
LSDBs (Locus Specific Databases)




       http://www.hgvs.org/dblist/glsdb.html
Crowdsourcing genome annotation
Crowdsourcing reality




“The future of biocuration
To thrive, the field that links biologists and their data urgently needs structure, recognition and support.”
NATURE|Vol 455|2008

…biological databases can be curated by a diffuse network of volunteers? This is certainly not the case and at the core of every successful wiki database are a group of dedicated experts who do the bulk of the data curation.
Database curation
Data Annotation Professionals
•   Clear incentives
•   Background in life sciences (MSc/PhD)
•   Curation is sole focus of work
•   Knowledge of standards, databases, formats,
    specialized tools

Huge volumes of primary data are currently archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent than today. The lasting archiving, accurate curation, efficient analysis and precise interpretation of all of these data are a challenge. Collectively, database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data.
HGMD
HGMD: comprehensive disease-causing germline
Cleaning up the literature




                     Charts from: Jonathan S. Berg, U North Carolina, Chapel Hill
Applying annotation
Conclusions on annotation

• Clinical-grade annotation may be the most important task ahead
• NGS itself helps generate evidence
• Many different sources and ways of annotation exist
• Human, specialist annotation remains essential (monkeys notwithstanding)
Thank you!
• BIOBASE employees all around the world
• David Cooper, University of Cardiff
• Andrew Deveraux, NGRL
• Patrick Willems, MutaBase
• Johan den Dunnen, HVP & Leiden University Medical Center
• Anthony J. Brooks, GEN2PHEN & University of Leicester
• Samir K. Brahmachari, OSDD




Gene Regulation Analysis | Human Mutation & Variant Analysis | Functional Analysis
sales@biobase-international.com
www.biobase-international.com

From Family Reminiscence to Scholarly Archive .
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Genomic Data Annotation: Making Sense of the Deluge

  • 1. One million monkeys with typewriters: Annotations of the Genomic Data Deluge. Genome Informatics Alliance, Portland, 28/29 March 2012. Dr. Frank Schacherer, CTO, BIOBASE GmbH, frank.schacherer@biobase-international.com
  • 2. Disclaimer: no actual monkeys involved. In 2003 the Arts Council for England paid £2,000 for a real-life test of the theorem involving six Sulawesi crested macaques, but the trial was abandoned after a month. The monkeys produced five pages of text, mainly composed of the letter S, but failed to type anything close to a word of English, broke the computer and used the keyboard as a lavatory. (Source: http://www.telegraph.co.uk/technology/news/8789894/Monkeys-at-typewriters-close-to-reproducing-Shakespeare.html)
  • 3. Agenda • What annotation do we need? • How can we get it?
  • 4. A deluge of data • deluge (plural deluges) – A great flood or rain. The deluge continued for hours, drenching the land and slowing traffic to a halt. – An overwhelming amount of something. The rock concert was a deluge of sound.
  • 5. Media perception. Headlines: "The Power Of Digitizing Human Beings"; "Soon, $1,000 Will Map Your Genes"; "Cost of Gene Sequencing Falls, Raising Hopes for Medical Advances"; "'Personalized Medicine' Hits a Bump". Sources and dates: Science 2011; Health Affairs 2009; 17 Feb 2012; 10 Jan 2012; 7 March 2012; March 2012.
  • 6. Life cycle of data annotation: Understand, Derive, Map, Analyze, Annotate, Publish, Rank, Curate
  • 7. How to predict mutation effects • Overlap with other data – dbSNP, 1000 Genomes – relatives and controls • Algorithmically – frameshift, nonsense, stop gain/loss, non-synonymous changes (SIFT, PolyPhen, ...) • Based on annotation – known functional regions (active sites, binding sites, ...) • Directly known effects – HGMD. Source: Bioinformatics, Vol. 26 no. 16, 2010, pages 2069; doi:10.1093/bioinformatics/btq330
  • 8. Associating Genotype with Phenotype http://www.gen2phen.org/
  • 9. What data do we need for clinical application? ACCE takes its name from the four main criteria for evaluating a genetic test: analytic validity, clinical validity, clinical utility, and associated ethical, legal and social implications. (Centers for Disease Control and Prevention, Office of Public Health Genomics (OPHG))
  • 10. Ideal annotation for clinical use? • Variants – Pathogenic, Uncertain, Benign – Severities, if known – Ethnicities/Frequencies – Number of cases – Symptoms in conjunction with other mutations • Evidences – Not weighted equally – Risks of incorrect classification • N=12: 4 Testing (Clinical Validity, Who/When, Methods, Interpretation, Cost); 4 Management, Clinical Significance, Implications; 3 Actionability, Clinical Utility; 3 Clinical manifestations (Pathophysiology, Phenotype, Prognosis, Severity, Penetrance, Pleiotropy); 2 Frequency not equal between genes (especially indicate most common variants); 2 Inheritance and de novo mutation rate; 2 Evidence-based; 1 Clinical Decision Support in EHR. Data from: Howard P. Levy, MD, PhD, Johns Hopkins University; Elaine Lyon, Ph.D., FACMG, University of Utah & ARUP Laboratories
  • 11. Who provides annotation? Payor, test lab, curator, researcher, patient, MD/geneticist, anybody, computer
  • 12. Surveys & patient self-annotation • Knaus, William A.: "Building a Genome Enabled Electronic Medical Record" • Nature Biotechnology, Volume 29, Number 5, May 2011: "Patients with serious diseases may experiment with drugs that have not received regulatory approval. Online patient communities structured around quantitative outcome data have the potential to provide an observational environment to monitor such drug usage and its consequences. Here we describe an analysis of data reported on the website PatientsLikeMe by patients with amyotrophic lateral sclerosis (ALS) who experimented with lithium carbonate"
  • 13. DNA variant databases. Data (except for HGMD and DMuDB) courtesy of P. Willems, MutaBase
  • 15. Testing lab data: A safe and secure route for sharing variant data. The Diagnostic Mutation Database (DMuDB) is a unique repository of high-quality variant data collected from accredited clinical genetic testing laboratories in the UK National Health Service (NHS). It provides a safe and secure way for variant data to be shared within and between laboratories in order to support safer, more consistent diagnoses. The database was established in order to address the lack of data sharing or publication in the genetic testing community. DMuDB is used regularly by genetic scientists: • to check a new variant against existing reported variants from other laboratories • to check for co-reported variants • as part of regular re-assessment of unclassified variants • via the Universal Browser as part of complex searches covering multiple databases. www.ngrl.org.uk/Manchester
  • 16. LSDBs (Locus Specific Databases) http://www.hgvs.org/dblist/glsdb.html
  • 18. Crowdsourcing reality: "…biological databases can be curated by a diffuse network of volunteers? This is certainly not the case and at the core of every successful wiki database are a group of dedicated experts who do the bulk of the data curation." • "The future of biocuration: To thrive, the field that links biologists and their data urgently needs structure, recognition and support." (NATURE | Vol 455 | 2008)
  • 20. Data annotation professionals • Clear incentives • Background in life sciences (MSc/PhD) • Curation is sole focus of work • Knowledge of standards, databases, formats, specialized tools. Huge volumes of primary data are currently archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent than today. The lasting archiving, accurate curation, efficient analysis and precise interpretation of all of these data are a challenge. Collectively, database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data.
  • 21. HGMD
  • 22. HGMD: comprehensive disease-causing germline
  • 23. Cleaning up the literature. Charts from: Jonathan S. Berg, University of North Carolina, Chapel Hill
  • 25. Conclusions on annotation • Clinical-grade annotation may be the most important task ahead • NGS itself contributes to generating evidence • Many different sources and ways of annotation exist • Human, specialist annotation remains essential (monkeys notwithstanding)
  • 26. Thank you! • BIOBASE employees all around the world • David Cooper, University of Cardiff • Andrew Deveraux, NGRL • Patrick Willems, MutaBase • Johan den Dunnen, HVP & Leiden University Medical Center • Anthony J. Brooks, GEN2PHEN & University of Leicester • Samir K. Brahmachari, OSDD. Gene Regulation Analysis, Human Mutation & Functional Analysis, Variant Analysis. sales@biobase-international.com, www.biobase-international.com
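Slide 7 lists an "algorithmic" route to predicting mutation effects (frameshift, nonsense, stop gain/loss, non-synonymous changes). The Python sketch below illustrates that first-pass classification. It assumes a spliced coding sequence that begins at the start codon, 0-based positions, and only single-base substitutions or simple indels; the function names are hypothetical and not taken from any particular annotation tool. Predictors such as SIFT and PolyPhen go further by estimating how damaging a missense change is likely to be, which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, '*' = stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; a trailing partial codon is ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def classify_coding_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical first-pass consequence call for a variant at 0-based `pos` of a CDS."""
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")

    if len(ref) != len(alt):
        # Indel: a net length change that is not a multiple of 3 shifts the reading frame.
        return "frameshift" if (len(alt) - len(ref)) % 3 else "inframe_indel"
    if len(ref) != 1:
        raise ValueError("this sketch only handles single-base substitutions and indels")

    start = pos - pos % 3  # first base of the affected codon
    if start + 3 > len(cds):
        raise ValueError("variant falls in an incomplete trailing codon")
    ref_codon = cds[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"  # nonsense
    if ref_aa == "*":
        return "stop_lost"
    return "missense"  # non-synonymous: the class that SIFT/PolyPhen-style tools score


if __name__ == "__main__":
    cds = "ATGTGGTAA"  # toy CDS: Met-Trp-Stop
    print(translate(cds))                              # MW*
    print(classify_coding_variant(cds, 5, "G", "A"))   # TGG -> TGA: stop_gained
    print(classify_coding_variant(cds, 2, "G", "A"))   # ATG -> ATA: missense
    print(classify_coding_variant(cds, 3, "T", "TA"))  # 1-base insertion: frameshift
```

In practice such calls are made against transcript models and genomic coordinates, and a variant usually receives one consequence per overlapping transcript; the toy CDS here sidesteps that mapping step.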

Speaker notes

  1. A callback to last year's meeting in Verona, the setting of Shakespeare's Romeo and Juliet. "An infinite number of monkeys with typewriters (or one monkey with infinite time) would in principle be able to write all the works of Shakespeare": the idea of getting things right by throwing impractically large resources at them (the monkeys would take much longer than the age of the universe). How can we deal with millions of genomes, and how can we annotate them when we face the same limitation in resources? The idea goes back to Aristotle with regard to permutations, and more recently to Émile Borel's 1913 essay "Mécanique Statistique et Irréversibilité".
  2. Data is not bad; it is only bad if we do not know what to do with it.
  3. "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" (T. S. Eliot)
  4. Analytic validity: how accurately and reliably the test measures the genotype of interest. Clinical validity: how accurately the test detects or predicts the outcomes of interest. Clinical utility: how likely the test is to significantly improve patient outcomes.
  5. Facebook for genomes? eBay for genomes?
  6. Bad fit for whole-genome/exome data; quality consistency issues.
  7. Wikipedia is crowdsourcing's poster child for distributed curation: everybody can contribute, and its quality killed Britannica. Linus's Law: "given enough eyeballs, all bugs are shallow" (Eric S. Raymond, named in honor of Linus Torvalds). India: the incentive of getting published.
  8. This has not worked out that well in practice for biology and science, because researchers would rather spend their time doing research and getting published, and because of difficulties with maintaining standards. Suggested solutions: Force 'em? Journals mandate submission of data into databases; authors provide a machine-readable XML summary; journals require gene symbols, accessions for genes and isoforms, and descriptions of species, cell types and genotypes. Tie career advancement to annotation? Author IDs, microattribution. Fund community annotation (crowdfunding)? These are valuable suggestions that enable many tools, but they address only the lowest level of data curation. Changing career evaluation would mean changing the entire credit system for research, which rests on peer-reviewed papers.
  9. Use experts who are paid to collect information manually into databases.
  10. We train curators for up to half a year before they do live curation
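Slide 10 lists what an ideal clinical annotation should record for each variant: a classification (pathogenic, uncertain, benign), severity if known, frequencies by ethnicity, number of cases, symptoms in conjunction with other mutations, and the supporting evidence. One way to hold those fields is the Python sketch below; the class and field names are illustrative assumptions, not a schema from HGMD or any other database. Because the slide notes that evidence is not weighted equally and that misclassification carries risk, the sketch computes no score: it only surfaces evidence that conflicts with the current classification so a curator can review it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Classification(Enum):
    PATHOGENIC = "pathogenic"
    UNCERTAIN = "uncertain"
    BENIGN = "benign"


@dataclass
class Evidence:
    source: str                # e.g. a publication, lab report or database entry
    description: str
    supports: Classification   # the classification this item argues for


@dataclass
class VariantAnnotation:
    """Hypothetical record mirroring the fields listed on slide 10."""
    variant: str                                                   # e.g. an HGVS-style description
    classification: Classification = Classification.UNCERTAIN
    severity: Optional[str] = None                                 # only if known
    frequencies: dict[str, float] = field(default_factory=dict)   # ethnicity -> allele frequency
    case_count: int = 0
    symptoms: list[str] = field(default_factory=list)
    co_occurring_variants: list[str] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)

    def conflicting_evidence(self) -> list[Evidence]:
        """Evidence items that argue for a different classification than the current one."""
        return [e for e in self.evidence if e.supports is not self.classification]


if __name__ == "__main__":
    ann = VariantAnnotation("hypothetical variant", Classification.PATHOGENIC)
    ann.evidence.append(Evidence("source A", "reported in affected relatives", Classification.PATHOGENIC))
    ann.evidence.append(Evidence("source B", "observed in controls", Classification.BENIGN))
    print([e.source for e in ann.conflicting_evidence()])  # ['source B']
```

Leaving the weighting to a human reviewer reflects the talk's conclusion that specialist annotation remains essential; an automated scoring rule could be layered on top, but it would be a separate design decision.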