Using Ontologies to accelerate candidate gene identification


Copy of my slides from the AMIA Summit on Translational Bioinformatics, 2010. This outlines our work with the National Center for Biomedical Ontology (NCBO), where we are using their tools to index biological data repositories and then using these annotations to enable further discoveries.




1. Using Ontologies to accelerate candidate gene identification. Simon Twigger, Ph.D. AMIA Summit on Translational Bioinformatics, San Francisco, March 2010

2. http://rgd.mcw.edu

3. Meet the client

4-8. [Diagram built up over five slides: a Hypertensive rat, its QTL, the genes in the QTL (G G G), and Hypertension]

9-15. Rat researchers ask...
• Has anyone done any expression studies using congenic rats?
• What tissue is this gene expressed in?
• Are any of these genes associated with my phenotype?
• What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the ...)?
• What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
• Has this gene been seen in the brain?

16-17. Biological Data Warehouse. Really important piece of data...

18. NCBI GEO db

19. Data hidden in plain sight

20. NCBO Annotator http://www.bioontology.org/wiki/index.php/Annotator_Web_service
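The Annotator service linked on this slide has since been superseded by the BioPortal REST API. As a hedged sketch, assuming the current `https://data.bioontology.org/annotator` endpoint and its `text`, `ontologies` and `apikey` query parameters (check the BioPortal API documentation before relying on this), a request covering the two ontologies named in the speaker notes might be built as follows. The acronyms `RS` (Rat Strain Ontology) and `MA` (Mouse Adult Gross Anatomy), the API key and the function name are assumptions; the code only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumption: current BioPortal REST endpoint, not the 2010 service on the slide.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(text: str, ontologies: list[str], apikey: str) -> str:
    """Build a GET URL asking the Annotator to tag `text` with terms
    from the given ontology acronyms (hypothetical helper)."""
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),  # comma-separated acronyms
        "apikey": apikey,
    }
    return f"{ANNOTATOR_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR-API-KEY" is a placeholder; BioPortal issues real keys per account.
    print(build_annotator_url("Kidney from SHRSP rats", ["RS", "MA"], "YOUR-API-KEY"))
```

Building the URL separately from sending it keeps the request easy to log and inspect, which helps when queueing many GEO records.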

21. Parallel Annotation Workflow (queues managed by RabbitMQ)
• GEO Records → Create Annotation Jobs & Queue Up → Q-Out
• Q-Out → 1..n Annot. Workers → Index text at OBA → Parse Results → Put results into queue for save → Q-In
• Q-In → Results saved to GMiner database

22. Current Ontologies http://bioportal.bioontology.org/

23. gminer.mcw.edu

24. Using the ontology structure

25-28. Curation of results: NCBO Ontology Widgets http://www.bioontology.org/wiki/index.php/Ontology_Widgets

29. Explore Cardio data

30. Find Congenic data

31. Browse by annotation

32. SHRSP overview

33-34. Combine results

35-41. Linking annotations to data
• Gene list: Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
• Gene list + annotations give statements such as: Hbb is_expressed_in rat kidney; Tm2d1 is_expressed_in rat kidney
• Affymetrix arrays: Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
• 62,000 samples x ca. 25,000 genes/sample = 1.5B data points
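The estimate of 1.5B data points is simply the number of samples times the approximate number of genes measured per sample:

$$62{,}000\ \text{samples} \times 25{,}000\ \tfrac{\text{genes}}{\text{sample}} = 1.55 \times 10^{9} \approx 1.5\ \text{billion data points}$$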

42-44. Probeset results on GMiner. Probeset 1395269_s_at for Gabrd - gamma-aminobutyric acid (GABA) A receptor, delta; Hs GABRD

45-52. [Diagram built up over eight slides: a Hypertensive rat, its QTL and genes (G G G), linked to Hypertension through Pathway; Component, Function, Process; Anatomy (Kidney); and "Str 1 != Str 2"]

53. Ontology Advantages
• Unstructured to Structured (using OBA service)
• Structured (Faceted) browsing of data
• Encourages discussion of data & its meaning
• Integration with other data (via same ontologies or mappings to others)

54. Ontology Hurdles
• Managing ontology/vocabulary terms and structure
• Time to encode data using ontology vs free text
• Consistent use/annotation using ontologies
• Quite a few 'standards' to pick from...

55. Acknowledgements
• Joey Geiger - Development of GMiner
• Jennifer Smith - Video creation, data curation
• Rajni Nigam - Rat Strain Ontology
• Clement Jonquet - NCBO Annotator tools
• Mark Musen & NIH Roadmap Initiative - Our Funding!

56. Links
• GMiner web application: http://gminer.mcw.edu
• GMiner code: http://github.com/mcwbbc/gminer
• RDFizer code: http://github.com/simont/MCW-RDF
• BioPortal: http://bioportal.bioontology.org/
Email: simont@mcw.edu
Twitter: @simon_t

Speaker notes

The Rat Genome Database (RGD) is one of the main projects we have at MCW. It is the model organism database for the laboratory rat, Rattus norvegicus. We curate genes, strains, QTL, etc., and make extensive use of ontologies such as the Gene Ontology (GO) and the pathway, rat strain, disease and phenotype ontologies.
This is a typical use case for rat genomics: how do we identify the causes of hypertension in a hypertensive rat? Quite often a QTL (quantitative trait locus) has been mapped, identifying a chromosomal region statistically associated with the trait in question. But how do you go from the genes in that region to the cause of the disease? It is not an easy task; in practice this step is often ‘then a miracle happens’.
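The first step of that path, listing the genes that lie in a QTL region, is an interval-overlap query. A minimal sketch, with hypothetical gene names and coordinates and both intervals treated as inclusive:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def genes_in_qtl(genes, chrom, qtl_start, qtl_end):
    """Return symbols of genes on `chrom` that overlap [qtl_start, qtl_end]."""
    return [
        g.symbol
        for g in genes
        # Two inclusive intervals overlap when each starts before the other ends.
        if g.chrom == chrom and g.start <= qtl_end and g.end >= qtl_start
    ]


# Hypothetical coordinates, for illustration only.
genes = [
    Gene("GeneA", "1", 1_000, 5_000),
    Gene("GeneB", "1", 9_000, 12_000),
    Gene("GeneC", "1", 20_000, 25_000),
    Gene("GeneD", "2", 1_000, 5_000),
]

print(genes_in_qtl(genes, "1", 4_000, 10_000))  # ['GeneA', 'GeneB']
```

The resulting gene list is what the rest of the talk tries to connect, via ontology annotations, to the disease.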
Rat biologists ask many questions related to gene expression and disease; these are some typical examples. Many of these questions fall in areas covered by ontologies and would benefit from the additional search flexibility that ontologies provide.
Technical problem: lots of data is being stored, but it is hard to find again (hence the government warehouse image). Data is archived with good intentions, but once archived it is often not easy to find. If you can't find the data, it's not much use.
NCBI's Gene Expression Omnibus (GEO) has a lot of relevant data, either as text or as raw data.
Can we start to capture some of this information in a computationally tractable fashion, using ontologies and the Open Biomedical Annotator (OBA) tools at the National Center for Biomedical Ontology (NCBO) in an annotation pipeline? The red boxes highlight some concepts of interest: the rat strains and tissues used in this experiment. A human can read these and know what's going on, but what about a computer?
Driving biological project: use the NCBO Annotator web services to mark up the text in GEO records with ontology terms.
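The idea behind marking up GEO text with ontology terms is concept recognition: finding strings in free text that match a term's name or synonym. The sketch below is a toy dictionary matcher, not the NCBO Annotator's own algorithm; the lexicon and the term identifiers are invented for illustration.

```python
import re

# Hypothetical mini-lexicon: lower-case synonym -> made-up ontology term ID.
LEXICON = {
    "sprague dawley": "STRAIN:SD",
    "sd/nhsd": "STRAIN:SD",
    "shrsp": "STRAIN:SHRSP",
    "kidney": "ANAT:kidney",
    "brain": "ANAT:brain",
}


def annotate(text):
    """Return (term_id, matched_text, start, end) for every whole-word,
    case-insensitive lexicon hit, ordered by position in the text."""
    hits = []
    for surface, term in LEXICON.items():
        # Lookarounds stop "brain" from matching inside e.g. "forebrain".
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((term, m.group(0), m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[2])


for hit in annotate("Kidney and brain from SHRSP and Sprague Dawley rats"):
    print(hit)
```

Mapping several synonyms ("Sprague Dawley", "SD/NHsd") to one term ID is what lets later searches treat them as the same strain.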
Sections of text are taken from GEO records, turned into annotation jobs and placed on a queue. Workers take jobs off the queue and index the text against the appropriate ontologies at NCBO. Results are placed on the input queue for saving back to the database.
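The queue-based workflow can be sketched with Python's standard library. This is an in-process stand-in, not the GMiner code: `queue.Queue` plays the role of the RabbitMQ queues (Q-Out for jobs, Q-In for results), a dict plays the GMiner database, and `annotate_text` is a hypothetical placeholder for the call to the NCBO Annotator. Record IDs are illustrative.

```python
import queue
import threading


def annotate_text(text):
    """Hypothetical stand-in for the Annotator call: tags words found
    in a tiny made-up lexicon."""
    lexicon = {"kidney": "ANAT:kidney", "brain": "ANAT:brain"}
    return [lexicon[w] for w in text.lower().split() if w in lexicon]


def worker(q_out, q_in):
    """Take jobs off Q-Out, annotate them and put the results on Q-In."""
    while True:
        job = q_out.get()
        if job is None:  # sentinel: no more jobs for this worker
            break
        record_id, text = job
        q_in.put((record_id, annotate_text(text)))


def run_pipeline(records, n_workers=4):
    q_out = queue.Queue()  # annotation jobs (RabbitMQ in the original setup)
    q_in = queue.Queue()   # results waiting to be saved
    threads = [threading.Thread(target=worker, args=(q_out, q_in))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for record_id, text in records.items():  # create jobs & queue up
        q_out.put((record_id, text))
    for _ in threads:
        q_out.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so Q-In can be drained safely.
    saved = {}  # stands in for the GMiner database
    while not q_in.empty():
        record_id, terms = q_in.get()
        saved[record_id] = terms
    return saved


print(run_pipeline({"rec1": "Kidney cortex", "rec2": "Whole brain"}))
```

Decoupling job creation, annotation and saving through queues is what allows the number of workers (1..n) to be scaled independently of the rest of the pipeline.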
We are currently using two ontologies: the Rat Strain Ontology, created at RGD, and the Mouse Gross Anatomy Ontology, created at The Jackson Laboratory (JAX). Both are available from the NCBO BioPortal.
GEO data is run through the pipeline and loaded into GMiner for curation and analysis.
Searching for "brain" also returns results annotated with any of these child terms of the concept Brain.
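Returning results for child terms amounts to expanding the query term to all of its descendants in the ontology before matching. A minimal sketch, assuming a hypothetical hierarchy fragment stored as parent-to-children lists (is_a and part_of edges collapsed together) and datasets annotated with sets of term names:

```python
from collections import deque

# Hypothetical fragment of an anatomy hierarchy: parent -> direct children.
CHILDREN = {
    "brain": ["forebrain", "hindbrain"],
    "forebrain": ["cerebral cortex", "hippocampus"],
    "hindbrain": ["cerebellum"],
}


def descendants(term):
    """Return `term` plus every term below it (breadth-first search)."""
    seen = {term}
    todo = deque([term])
    while todo:
        for child in CHILDREN.get(todo.popleft(), []):
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return seen


def search(datasets, term):
    """IDs of datasets annotated with `term` or any of its descendants."""
    wanted = descendants(term)
    return sorted(ds for ds, terms in datasets.items() if terms & wanted)


# Hypothetical dataset annotations.
datasets = {
    "GDS_A": {"hippocampus"},
    "GDS_B": {"kidney"},
    "GDS_C": {"cerebellum", "liver"},
}
print(search(datasets, "brain"))  # ['GDS_A', 'GDS_C']
```

Expanding the query, rather than copying parent terms onto every annotation, keeps the stored annotations as specific as the curator made them.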
New annotations can be added using the NCBO ontology widgets
Enter the ontology hierarchy at a top level and then drill down.
Annotations and tag clouds can be used to explore the datasets. What do we know about SHRSP (spontaneously hypertensive rat, stroke-prone)? The annotations point to brain studies, and show that SHRSP is also used in conjunction with the SR and SR/JHsd rats.
Initial results from the GEO rat datasets have provided a lot of useful information and allowed us to create some handy navigational interfaces to the data, enabling queries that were not possible on any other site. Want to find expression data for SS rat kidney? Click the terms and the datasets appear.
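Clicking terms to narrow the dataset list is faceted filtering, and a tag cloud is essentially a count of datasets per term. A sketch under assumed data (the dataset IDs, strains and tissues below are hypothetical), combining the selected facets with AND:

```python
from collections import Counter

# Hypothetical annotations: dataset ID -> {facet name: set of terms}.
DATASETS = {
    "GSE_1": {"strain": {"SS"}, "tissue": {"kidney"}},
    "GSE_2": {"strain": {"SS", "SR"}, "tissue": {"heart"}},
    "GSE_3": {"strain": {"SHRSP"}, "tissue": {"brain"}},
    "GSE_4": {"strain": {"SS"}, "tissue": {"kidney", "liver"}},
}


def facet_filter(datasets, **selected):
    """Keep datasets that carry every selected term (facets combined with AND)."""
    return sorted(
        ds for ds, facets in datasets.items()
        if all(term in facets.get(facet, set()) for facet, term in selected.items())
    )


def facet_counts(datasets, facet):
    """Datasets per term within one facet: the numbers behind a tag cloud."""
    return Counter(t for facets in datasets.values() for t in facets.get(facet, set()))


print(facet_filter(DATASETS, strain="SS", tissue="kidney"))  # ['GSE_1', 'GSE_4']
print(facet_counts(DATASETS, "strain")["SS"])  # 3
```

Recomputing `facet_counts` on the filtered subset after each click is one way to keep the tag cloud in step with the current selection.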
Can we link from the annotations to the samples, down to the raw data in each sample, and from there to the genes involved? Affymetrix arrays provide a detection call: a fairly conservative present/absent call indicating whether the probe set was detected in that particular sample.
We can then relate the probesets to the genes and to the ontology annotations to create triples such as these. If we do this for the Affymetrix data in GEO for rat, mouse and human, we will have upwards of 1.5B data points to encode.
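The linking step can be sketched as a join: detection calls give (sample, probeset) pairs, a probeset-to-gene map gives the gene, and the sample's ontology annotation gives the tissue. This sketch assumes that only "present" calls yield an `is_expressed_in` statement; all identifiers and calls are hypothetical.

```python
# Hypothetical inputs, for illustration only.
PROBE_TO_GENE = {"probe_1": "Hbb", "probe_2": "Tm2d1", "probe_3": "Alb"}
SAMPLE_TISSUE = {"S1": "rat kidney", "S3": "rat liver"}
# Detection calls: (sample, probeset) -> "P" present, "M" marginal, "A" absent.
CALLS = {
    ("S1", "probe_1"): "P",
    ("S1", "probe_2"): "P",
    ("S1", "probe_3"): "A",
    ("S3", "probe_3"): "P",
}


def expression_triples(calls, probe_to_gene, sample_tissue):
    """Return sorted (gene, "is_expressed_in", tissue) triples from present calls."""
    triples = set()  # a set removes repeats when many samples share a tissue
    for (sample, probe), call in calls.items():
        if call != "P":
            continue  # assumption: only present calls count as evidence
        gene = probe_to_gene.get(probe)
        tissue = sample_tissue.get(sample)
        if gene and tissue:
            triples.add((gene, "is_expressed_in", tissue))
    return sorted(triples)


for subject, predicate, obj in expression_triples(CALLS, PROBE_TO_GENE, SAMPLE_TISSUE):
    print(subject, predicate, obj)
# Alb is_expressed_in rat liver
# Hbb is_expressed_in rat kidney
# Tm2d1 is_expressed_in rat kidney
```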
For each probe set we can look at the samples in which it was tested, see whether it was called present, absent or marginal, and compile these data to get a feel for how often a gene was seen in a particular tissue or organ.
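Compiling detection calls into a tissue distribution can be sketched as a per-tissue fraction of tested samples in which the probe set was called present. Counting marginal calls as "tested but not present" is an assumption here, as are the sample IDs, the probe set name and the calls.

```python
from collections import defaultdict


def tissue_distribution(calls, sample_tissue, probe):
    """Fraction of tested samples, per tissue, in which `probe` was called present.

    calls: (sample_id, probeset_id) -> "P" (present), "M" (marginal) or "A" (absent)
    sample_tissue: sample_id -> tissue term from the sample's ontology annotation
    """
    tested = defaultdict(int)
    present = defaultdict(int)
    for (sample, probeset), call in calls.items():
        if probeset != probe or sample not in sample_tissue:
            continue
        tissue = sample_tissue[sample]
        tested[tissue] += 1
        if call == "P":  # marginal and absent calls are not counted as present
            present[tissue] += 1
    return {t: present[t] / tested[t] for t in sorted(tested)}


# Hypothetical annotations and calls for a single probe set.
SAMPLE_TISSUE = {"S1": "brain", "S2": "brain", "S3": "brain", "S4": "kidney", "S5": "liver"}
CALLS = {
    ("S1", "probe_X"): "P", ("S2", "probe_X"): "P", ("S3", "probe_X"): "M",
    ("S4", "probe_X"): "A", ("S5", "probe_X"): "A",
}

dist = tissue_distribution(CALLS, SAMPLE_TISSUE, "probe_X")
print({t: round(f, 2) for t, f in dist.items()})
# {'brain': 0.67, 'kidney': 0.0, 'liver': 0.0}
```

Reporting a fraction rather than a raw count keeps tissues with many GEO samples from dominating the chart.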
This can be viewed as a chart of tissue distribution. The results are quite comparable to similar results from GeneCards/Novartis BioGPS, indicating that this approach has some merit.
As we start to create these triples, we can bridge the gap from the QTL and its genes to the disease, allowing scientists to identify or prioritize candidate genes in their QTL regions (or gene lists) and saving them, to some degree, from spending a lot of time manually searching databases online.
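One simple way to prioritize candidates, sketched here as an assumption rather than the method used in GMiner, is to count how many phenotype-relevant annotations (anatomy, pathway, GO process) link each QTL gene to the disease. The genes, their annotations and the set of relevant terms are all hypothetical.

```python
# Hypothetical evidence: gene -> set of (evidence type, term) annotations.
EVIDENCE = {
    "GeneA": {("anatomy", "kidney"), ("pathway", "renin-angiotensin"),
              ("go_process", "sodium ion transport")},
    "GeneB": {("anatomy", "liver")},
    "GeneC": {("anatomy", "kidney")},
}
# Annotations judged relevant to hypertension (an assumption).
RELEVANT = {("anatomy", "kidney"), ("pathway", "renin-angiotensin"),
            ("go_process", "sodium ion transport")}


def rank_candidates(qtl_genes, evidence, relevant):
    """Rank QTL genes by the number of relevant annotations they carry;
    ties are broken alphabetically."""
    scores = {g: len(evidence.get(g, set()) & relevant) for g in qtl_genes}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_candidates(["GeneC", "GeneB", "GeneA"], EVIDENCE, RELEVANT))
# [('GeneA', 3), ('GeneC', 1), ('GeneB', 0)]
```

An unweighted count is the simplest choice; a real ranking would likely weight evidence types differently, which this sketch does not attempt.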
Acknowledgements