MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease


The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data, when paired with pattern-matching algorithms, facilitate the discovery of connections between drugs, genes and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and accommodate frequent changes to the schema. We realized that a relational database did not fit our needs, and gravitated towards MongoDB for its ease of use, support for dynamic schemas and complex data structures, and expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the various schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.

MongoDB and the Connectivity Map
making connections between genetics and disease

Corey
cflynn@broadinstitute.org
@CoreyJFlynn

Gene Expression
a common language for biology

2006
~7,000 experiments
Over 19,000 registered users
Cited by over 1,200 scientific reports

Connectivity Map Dataset
1.4 million gene expression profiles
12,488 Compounds
• FDA approved drugs
• Bioactive tool compounds
• Screening hits
3,800 Genes (shRNA & cDNA)
• Targets/pathways of approved drugs
• Candidate disease genes
• Community nominations
15 Cell types
• Banked primary cell types
• Cancer cell lines
• Primary hTERT-immortalized
• Patient-derived iPS cells
• Community nominated

Connectivity Map Data
Easy to describe, tough to model
• Diverse users and use-cases
• Annotations are complex and often incomplete
• Frequent updates

Data Model
An agile philosophy keeps the model tractable
Store just what’s needed
Test and use daily
Refactor frequently

Data Model
An inventory of signatures
signature_info

Data Model
Shared fields as separate collections
signature_info
cell_info

Data Model
Shared fields as separate collections
signature_info treatment_info

Data Model
Add computed fields and external meta-data
signature_info cell_info

Data Model
Denormalize to optimize lookups
signature_info treatment_info
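
As a rough illustration of the denormalization step, the sketch below copies selected cell-level fields into each signature document so that a lookup by cell attribute needs no second query. Plain Python dicts stand in for MongoDB documents. Every field name and value here (`sig_id`, `cell_id`, `lineage`, `cell_lineage`, and so on) is hypothetical and not taken from the actual CMap schema.

```python
from copy import deepcopy

# Hypothetical stand-ins for documents in the two collections.
cell_info = [
    {"cell_id": "A", "lineage": "breast", "type": "cancer cell line"},
    {"cell_id": "B", "lineage": "lung", "type": "cancer cell line"},
]
signature_info = [
    {"sig_id": "sig1", "cell_id": "A", "treatment": "compound-1"},
    {"sig_id": "sig2", "cell_id": "B", "treatment": "compound-1"},
    {"sig_id": "sig3", "cell_id": "A", "treatment": "compound-2"},
]


def denormalize(signatures, cells, fields):
    """Return copies of the signatures with selected cell fields embedded."""
    cells_by_id = {c["cell_id"]: c for c in cells}
    out = []
    for sig in signatures:
        doc = deepcopy(sig)  # leave the normalized originals untouched
        cell = cells_by_id.get(sig["cell_id"], {})
        for field in fields:
            if field in cell:
                # Prefix the embedded field so its origin stays visible.
                doc["cell_" + field] = cell[field]
        out.append(doc)
    return out


denormalized = denormalize(signature_info, cell_info, ["lineage"])

# A lookup by cell lineage now touches only the signature documents.
breast_sigs = [d["sig_id"] for d in denormalized if d.get("cell_lineage") == "breast"]
print(breast_sigs)
```

The tradeoff is the usual one: reads get simpler and faster, while any change to a cell annotation has to be propagated to every signature that embeds it.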

APIs
Are awesome, life science needs more of them
/siginfo/cell/A

APIs
Are awesome, life science needs more of them
/siginfo?q={"cell":"A"}

API
MongoDB inspired a rich query syntax
Function          Example
Query             /siginfo?q={"cell":"A","name":"B"}
Field selection   /siginfo?q={}&f={"name":1}
Document count    /siginfo?q={}&c=true
Document limit    /siginfo?q={}&l=10
Skip documents    /siginfo?q={}&l=10&sk=10
Sort order        /siginfo?q={}&s={"name":-1,"cell":1}
Distinct values   /siginfo?q={}&d=name
Aggregation       /siginfo?q={}&g=name
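
A client could assemble these requests along the following lines. This is only a sketch: the parameter names (`q`, `f`, `c`, `l`, `sk`, `s`, `d`, `g`) come from the table above, while the host name `api.example.org` and the helper `build_url` are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.example.org"  # hypothetical host, not the real service


def build_url(resource, q=None, f=None, c=False, l=None, sk=None, s=None, d=None, g=None):
    """Build a query URL using the parameter names shown on the slide."""
    compact = {"separators": (",", ":")}
    params = {"q": json.dumps(q or {}, **compact)}  # query document
    if f is not None:
        params["f"] = json.dumps(f, **compact)      # field selection
    if c:
        params["c"] = "true"                        # document count
    if l is not None:
        params["l"] = str(l)                        # document limit
    if sk is not None:
        params["sk"] = str(sk)                      # documents to skip
    if s is not None:
        params["s"] = json.dumps(s, **compact)      # sort order
    if d is not None:
        params["d"] = d                             # distinct values of a field
    if g is not None:
        params["g"] = g                             # aggregation by a field
    return f"{BASE_URL}/{resource}?{urlencode(params)}"


url = build_url("siginfo", q={"cell": "A"}, f={"name": 1}, l=10, sk=10)
print(url)

# Round-trip check: decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded["q"][0]), decoded["l"][0], decoded["sk"][0])
```

`urlencode` percent-encodes the JSON braces and quotes, which is what lets the same request pass unchanged through browsers, scripts, and proxies.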

API
Node and Mongoose enable easy API creation
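
The slide credits Node and Mongoose for the real service. Purely to illustrate the idea in Python, the sketch below shows how a server might turn the URL parameters into the pieces of a MongoDB-style `find` call (filter, projection, skip, limit, sort). The function `parse_request` and the shape of its return value are assumptions, not the authors' implementation.

```python
import json
from urllib.parse import parse_qs, urlsplit


def parse_request(url):
    """Translate the API's URL parameters into find-style arguments."""
    params = parse_qs(urlsplit(url).query)

    def first(key, default):
        # parse_qs returns a list per key; take the first value if present.
        return params[key][0] if key in params else default

    return {
        "filter": json.loads(first("q", "{}")),
        "projection": json.loads(first("f", "null")),  # None means all fields
        "limit": int(first("l", "0")),                 # 0 means no limit
        "skip": int(first("sk", "0")),
        # A list of (field, direction) pairs keeps the sort keys in order.
        "sort": list(json.loads(first("s", "{}")).items()),
    }


request = parse_request('/siginfo?q={"cell":"A"}&f={"name":1}&l=10&sk=10&s={"name":-1,"cell":1}')
print(request)
```

Because the query document is already JSON, most of the translation is a direct pass-through, which is a large part of why a MongoDB-inspired syntax is cheap to expose over HTTP. A production service would also validate and restrict the incoming query before running it.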

Language Bindings
JSON as a universal format
JavaScript
Python
R

Analytic Tools
A compute API liberates command line scripts

Compute API
Message queuing via a capped collection
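
A capped collection in MongoDB has a fixed size, preserves insertion order, and overwrites its oldest documents once full; consumers can follow it with a tailable cursor. The sketch below imitates that behavior in memory with a bounded `deque` so it runs without a database. The class `CappedQueue`, the `seq` field, and the job names are hypothetical and only illustrate the queuing pattern.

```python
from collections import deque


class CappedQueue:
    """In-memory stand-in for a capped collection used as a message queue."""

    def __init__(self, max_docs):
        # A bounded deque drops the oldest entry when a new one overflows it.
        self._docs = deque(maxlen=max_docs)
        self._next_seq = 0

    def publish(self, job):
        """Append a job document and return its sequence number."""
        doc = {"seq": self._next_seq, **job}
        self._docs.append(doc)
        self._next_seq += 1
        return doc["seq"]

    def read_after(self, last_seq):
        """Return documents newer than last_seq, as a tailing consumer would."""
        return [d for d in self._docs if d["seq"] > last_seq]


queue = CappedQueue(max_docs=3)
for name in ["job-a", "job-b", "job-c", "job-d"]:
    queue.publish({"tool": name})

# The queue holds three documents, so the oldest job has been overwritten.
pending = [d["tool"] for d in queue.read_after(last_seq=-1)]
print(pending)

# A worker that already handled seq 2 only sees what arrived afterwards.
newer = [d["tool"] for d in queue.read_after(last_seq=2)]
print(newer)
```

The fixed size means no cleanup job is needed, at the cost that a consumer which falls too far behind can miss messages.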

A research platform for functional genomics

Predicting Drug Function
Diverse structures, common activities

Predicting Drug Function
Diverse structures, common activities
VEGFR inhibitor
PPARG agonist
PI3K/MTOR inhibitor
ROCK inhibitor
Estrogen agonist

Finding Novel Drug Targets
Repurposing failed drugs
Original target

Finding Novel Drug Targets
Repurposing failed drugs
Original target
Failed in Phase 2 clinical trial due to lack of efficacy

Finding Novel Drug Targets
Repurposing failed drugs
Original target
Novel Target A
Novel Target B
Novel Target C
Novel Target D
