Wikidata for biomedical
knowledge integration and
curation
Benjamin Good
The Scripps Research Institute
@bgood
bgood@scripps.edu
“knowledge”
• A lot
• Important
• Text
What are the
functions of
Fibronectin?
37186 articles
What are the functions of
the 238 ‘significant’ genes
that came up in my high
throughput screen??
…
Gene         Property               Value
Fibronectin  Biological Process     Angiogenesis
Fibronectin  Cellular Localization  Extracellular matrix
Fibronectin  Related Disease        Glomerulopathy
“knowledge integration”
“curation”
“knowledge base”
Answers
Knowledge Bases
1,500+ listed at http://www.oxfordjournals.org/nar/database/a/
Applications of knowledge bases
• Find information
• Plan research
• “Known unknowns?”
• Interpret data
• Gene Ontology
Enrichment Analysis
Interesting Gene List
Gene Ontology, Pathway,
Network interpretation
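A minimal sketch of the idea behind Gene Ontology enrichment analysis, assuming a one-sided hypergeometric test (a common choice, not necessarily what any particular tool uses). The function name and the small numbers in the usage line are hypothetical; only the list size of 238 genes comes from the slides.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, annotated_genes, list_size, list_hits):
    """Probability of seeing at least `list_hits` annotated genes when
    `list_size` genes are drawn without replacement from `total_genes`,
    of which `annotated_genes` carry the GO term of interest."""
    max_hits = min(annotated_genes, list_size)
    favourable = sum(
        comb(annotated_genes, k) * comb(total_genes - annotated_genes, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return favourable / comb(total_genes, list_size)


# Hypothetical background: 20,000 genes, 150 annotated with the term,
# and 12 of the 238 'significant' genes carry that annotation.
p_value = hypergeom_enrichment_p(20000, 150, 238, 12)
```

A small p-value suggests the term is over-represented in the gene list. The test can only score terms that are actually annotated, which is why gaps in the knowledge base matter.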
Knowledge bases are important tools
and will only grow more important
over time
Great!
BUT
1. Knowledge bases are not complete
2. (We will get to this later...)
Annotation
missing from
human GO
annotation.
Should be here!
(‘5-HT receptor’ means ‘serotonin receptor’)
Circa 2010
Added to GO
Jan. 2016
First characterized 1996
(Kohen et al J Neurochem)
Interesting Gene List
Gene Ontology, Pathway,
Network interpretation
We don’t know what we are missing
inflammatory response
defense response
Serotonin receptor activity?
response to wounding
immune response
Interesting Gene List
“Gene Ontology, it’s great, right?”
• “It sucks”
• “I only use it out of desperation”
WHY?!
Process of building knowledge bases
1. Do science
2. Publish it
3. Manually extract the knowledge
why does he look so down?
Many scientists, powerful tools,
comparatively little reward for
curating knowledge
100’s of thousands 100’s
More than 2 articles
published/minute
Professional biocuration does not scale
up to the rate of production
1. Knowledge bases are not complete
2. Knowledge needs integration
Knowledge is scattered,
integration brings it together
Merging knowledge bases:
the language barrier
“Methadone”
Interacts with:
“Moxifloxacin”
May treat:
Opioid-Related Disorders
ID:
N0000000174
ID:
4095
Molecular Weight:
309.44518 g/mol
…
= ?
= ?
= ?
= ?
= ?
= ?
ID:
DB00333
Manufactured by:
Roxane laboratories inc
Good for business, bad for science
Google Scholar search shows 469 papers about
“identifier mapping” in bioinformatics
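The identifier problem above can be sketched as follows. This is only an illustration: the source names (`source_a`, `source_b`, `source_c`), the pairing of each attribute with an ID, and the `XREF` table are assumptions made for the example. The IDs and attribute values are the ones shown on the slide.

```python
# Three records that describe the same drug under three different identifiers.
# Source names and the attribute-to-ID pairing are hypothetical.
RECORDS = [
    {"source": "source_a", "id": "N0000000174",
     "may_treat": "Opioid-Related Disorders"},
    {"source": "source_b", "id": "4095",
     "molecular_weight_g_per_mol": 309.44518},
    {"source": "source_c", "id": "DB00333",
     "manufactured_by": "Roxane laboratories inc"},
]

# Hand-built cross-reference table: (source, local ID) -> shared concept key.
# Building and maintaining tables like this is the "identifier mapping" work.
XREF = {
    ("source_a", "N0000000174"): "methadone",
    ("source_b", "4095"): "methadone",
    ("source_c", "DB00333"): "methadone",
}


def merge_records(records, xref):
    """Group records by shared concept and pool their attributes."""
    merged = {}
    for record in records:
        concept = xref.get((record["source"], record["id"]))
        if concept is None:
            continue  # no mapping known: the record stays unintegrated
        entry = merged.setdefault(concept, {"identifiers": {}})
        entry["identifiers"][record["source"]] = record["id"]
        for field, value in record.items():
            if field not in ("source", "id"):
                entry[field] = value
    return merged


integrated = merge_records(RECORDS, XREF)
```

Without the cross-reference table, none of the three records can be joined. With it, one entry carries all three identifiers and every attribute.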
What can we do?
Global Knowledge Platform
What would happen if everyone
was literally working on the same
database?
1. Split up work more effectively
2. Make integration the default
behavior
Wikidata is to data
as Wikipedia is to text
“Giving more people more access to more knowledge”
A free and open repository of knowledge
Managed by the Wikimedia Foundation,
which operates Wikipedia
It’s a
knowledge
base!
• Anyone
can edit
• Anyone
can use
Item: Q84
Item: Q414043
RELN
Genomic start: 103471784
GenLoc assembly:
GRCh38
Stated in:
Ensembl Release 83
Retrieved:
19 January 2016
Value (numeric)
Property
Claim Qualifiers
References
https://www.wikidata.org/wiki/Q414043
Statement
Item: Q414043
RELN
Encodes: Reelin (protein) Stated in:
NCBI homo sapiens
annotation release 107
Retrieved:
19 January 2016
Value (item)
Property
Claim Qualifiers
References
https://www.wikidata.org/wiki/Q414043
Statement
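The anatomy of a statement (item, property, value, qualifiers, references) can be sketched as a small data structure. This is an illustrative model only, not Wikidata's actual storage or JSON format. The class and field names are hypothetical, and human-readable labels stand in for property IDs. The values are the ones shown for RELN above.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item, plus its context and provenance."""
    item: str                                       # e.g. "Q414043" (RELN)
    prop: str                                       # property label
    value: object                                   # numeric value or another item
    qualifiers: dict = field(default_factory=dict)  # context for the claim
    references: dict = field(default_factory=dict)  # where the claim comes from


genomic_start = Statement(
    item="Q414043",
    prop="Genomic start",
    value=103471784,
    qualifiers={"GenLoc assembly": "GRCh38"},
    references={"Stated in": "Ensembl Release 83",
                "Retrieved": "19 January 2016"},
)

encodes = Statement(
    item="Q414043",
    prop="Encodes",
    value="Reelin (protein)",
    references={"Stated in": "NCBI homo sapiens annotation release 107",
                "Retrieved": "19 January 2016"},
)


def describe(statement):
    """Render a statement as one readable line."""
    quals = "; ".join(f"{k}: {v}" for k, v in statement.qualifiers.items())
    refs = "; ".join(f"{k}: {v}" for k, v in statement.references.items())
    return (f"{statement.item} | {statement.prop} = {statement.value}"
            f" | qualifiers: {quals or 'none'} | references: {refs or 'none'}")
```

Keeping references on every statement is what lets a reader trace each value back to its source database and retrieval date.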
A Giant Global Graph
These statements link together into a queryable graph
https://query.wikidata.org
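A minimal sketch of asking the graph a question, assuming the query service's SPARQL endpoint at https://query.wikidata.org/sparql with `query` and `format` URL parameters. The helper names are hypothetical. The code only builds the request URL, so it runs without network access.

```python
import re
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def statements_query(item_id, limit=20):
    """SPARQL that lists property/value pairs attached to one item."""
    if not re.fullmatch(r"Q\d+", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return f"""
SELECT ?property ?value WHERE {{
  wd:{item_id} ?property ?value .
}}
LIMIT {limit}
""".strip()


def query_url(sparql):
    """URL that returns the query results as JSON when fetched."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


# Q414043 is the RELN item shown above.
url = query_url(statements_query("Q414043", limit=5))
```

Fetching `url` with any HTTP client should return the matching statements. The same query text can also be pasted into the web interface at https://query.wikidata.org.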
We are seeding it with
biomedical data
• All human, mouse genes
and proteins
• All Gene Ontology terms
• All FDA approved drugs
• 9,000+ human diseases
Burgstaller et al (2016) Database (preprint in BioRxiv)
Mitraka et al (2015) Semantic Web Applications for the Life Sciences (best paper) (preprint in BioRxiv)
Our seeds are largely
concepts linked to many
identifier systems
N identifiers per item
• Genes: 8
• Drugs: 18
• Diseases: 11
Facilitate
integration
with key
external
knowledge
bases
Nurturing a multi-community
garden of biomedical knowledge
Gene Drug Disease
A Platform for knowledge integration and curation
Open data
Wikipedia(s)
Your Apps Here!
Application #1 (of many)
Burgstaller et al (2016) Database (preprint in BioRxiv)
Impact of Wikidata on Wikipedia
Gene Wiki
Version 1.
{{GNF_Protein_box | Name = Reelin| image = |
image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 |
MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 |
IUPHAR = | ChEMBL = | OMIM = None | ECnumber = |
Homologene = 9349 | GeneAtlas_image1 = |
GeneAtlas_image2 = | GeneAtlas_image3 = |
Protein_domain_image = | Function =
{{GNF_GO|id=GO:0005515 |text = protein binding}}
{{GNF_GO|id=GO:0016787 |text = hydrolase activity}}
{{GNF_GO|id=GO:0046872 |text = metal ion binding}} |
Component = {{GNF_GO|id=GO:0005739 |text =
mitochondrion}} | Process = {{GNF_GO|id=GO:0008152
|text = metabolic process}} | Hs_EntrezGene = 51110 |
Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA =
NM_016027 | Hs_RefseqProtein = NP_057111 |
Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 |
Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174
| Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 |
Mm_Ensembl = ENSMUSG00000025937 |
Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein =
NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr =
1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end =
13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}}
=
Gene Wiki
Version 2.
{{Infobox gene}}
• All data in
Wikidata
• 1 Lua script works
for all genes
=
(1 of these for every gene)
Application #2 Web Apollo Genome Browser
• Genome annotation data retrieved
from Wikidata via SPARQL queries
to https://query.wikidata.org
• Prototype achieved at recent San
Diego hackathon
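A sketch of how a client such as a genome browser might turn a SPARQL response into feature records, assuming the standard SPARQL JSON results layout (`head.vars` and `results.bindings`). The variable names and the sample payload are hypothetical. The coordinates are the LACTB2 values from the infobox example above. This is not the prototype's actual code.

```python
def rows_from_sparql_json(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = payload["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in payload["results"]["bindings"]
    ]


def to_feature(row):
    """Convert one result row into a typed genome feature record."""
    return {
        "name": row["geneLabel"],
        "chromosome": row["chromosome"],
        "start": int(row["start"]),  # SPARQL JSON delivers values as strings
        "end": int(row["end"]),
    }


# Hypothetical response, shaped like a SPARQL JSON result set.
SAMPLE = {
    "head": {"vars": ["geneLabel", "chromosome", "start", "end"]},
    "results": {"bindings": [{
        "geneLabel": {"type": "literal", "value": "LACTB2"},
        "chromosome": {"type": "literal", "value": "8"},
        "start": {"type": "literal", "value": "70635318"},
        "end": {"type": "literal", "value": "70669174"},
    }]},
}

features = [to_feature(row) for row in rows_from_sparql_json(SAMPLE)]
```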
Putman et al (2016) (under review) (preprint in BioRxiv)
Microbial Genetic Data
• Widely distributed
• Difficult to query
• Not structured in a meaningful way
• A lot of interest from this
community!
Microbial genomes in Wikidata
• Loading genes,
proteins,
annotations for
120 reference
genomes.
• Completed 21
genomes so far
Putman et al (2016) (under review) (preprint in BioRxiv)
Microbiome modeling in Wikidata
Putman et al (2016) (under review) (preprint in BioRxiv)
1. Knowledge bases are not complete
2. Knowledge needs integration
Wikidata can help
Centralizing content while distributing labor
Thanks!
Gene Wikidata Team
Andra Waagmeester (Micelio)
* Sebastian Burgstaller (Scripps)
* Tim Putman (Scripps)
* Elvira Mitraka (U Maryland)
Julia Turner (Scripps)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Andrew Su (Scripps)
Ginger Tsueng (Scripps)
Contact
bgood@scripps.edu
* First author on manuscript cited in this presentation
Ben Tim
Andra
Elvira
Sebastian
Some Gene Wiki team members
enjoying their best paper award
at SWAT4LS, Dec. 2015
Adapted logo

Weitere ähnliche Inhalte

Was ist angesagt?

2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbioBenjamin Good
 
Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Ravi Madduri
 
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceAndrew Su
 
CI4CC sustainability-panel
CI4CC sustainability-panelCI4CC sustainability-panel
CI4CC sustainability-panelRavi Madduri
 
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...dkNET
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grcc.titus.brown
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyChris Mungall
 
The Language of the Gene Ontology
The Language of the Gene OntologyThe Language of the Gene Ontology
The Language of the Gene Ontologyrobertstevens65
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesMelanie Courtot
 
GlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen
 
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...GigaScience, BGI Hong Kong
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
 
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open DataGraph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open DataMaulik Kamdar
 
Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011Rudy Potenzone
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyChris Mungall
 

Was ist angesagt? (20)

2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio
 
Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline
 
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
 
CI4CC sustainability-panel
CI4CC sustainability-panelCI4CC sustainability-panel
CI4CC sustainability-panel
 
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
 
David
DavidDavid
David
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grc
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation Ontology
 
The Language of the Gene Ontology
The Language of the Gene OntologyThe Language of the Gene Ontology
The Language of the Gene Ontology
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resources
 
GlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen Warren Workshop in Boston
GlyGen Warren Workshop in Boston
 
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
 
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organizatio...
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open DataGraph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
 
Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 

Andere mochten auch

Welcome to Ukraine - SunCity Travel LLC
Welcome to Ukraine - SunCity Travel LLCWelcome to Ukraine - SunCity Travel LLC
Welcome to Ukraine - SunCity Travel LLCAlex Faynin
 
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...Benjamin Good
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giantsBenjamin Good
 
Mark Hopper Product And Marketing Exec 2010
Mark Hopper Product And Marketing Exec 2010Mark Hopper Product And Marketing Exec 2010
Mark Hopper Product And Marketing Exec 2010Mark Hopper
 
Human Guided Forests (HGF)
Human Guided Forests (HGF)Human Guided Forests (HGF)
Human Guided Forests (HGF)Benjamin Good
 
Gene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meetingGene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meetingBenjamin Good
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KBenjamin Good
 
Short update on The Cure game first week
Short update on The Cure game first weekShort update on The Cure game first week
Short update on The Cure game first weekBenjamin Good
 
Open source breakfast norge findwise
Open source breakfast norge findwiseOpen source breakfast norge findwise
Open source breakfast norge findwiseCominvent AS
 
Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 schelby
 
The National Society For The Protection Of Hmmm
The National Society For The Protection Of HmmmThe National Society For The Protection Of Hmmm
The National Society For The Protection Of Hmmmguest0233e9d0
 
Bio Logical Mass Collaboration3
Bio Logical Mass Collaboration3Bio Logical Mass Collaboration3
Bio Logical Mass Collaboration3Benjamin Good
 
EISHI CO. main eps machine catalogue
EISHI CO. main eps machine catalogueEISHI CO. main eps machine catalogue
EISHI CO. main eps machine catalogueeishimachinery
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Light steel villa catalogue log
Light steel villa catalogue logLight steel villa catalogue log
Light steel villa catalogue logeishimachinery
 
Dagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søkDagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søkCominvent AS
 
B2B Branding Explained
B2B Branding ExplainedB2B Branding Explained
B2B Branding Explainedcsadhy
 
Buyer Remorse
Buyer RemorseBuyer Remorse
Buyer Remorsesmfox
 

Andere mochten auch (20)

Welcome to Ukraine - SunCity Travel LLC
Welcome to Ukraine - SunCity Travel LLCWelcome to Ukraine - SunCity Travel LLC
Welcome to Ukraine - SunCity Travel LLC
 
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Surviva...
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giants
 
Mark Hopper Product And Marketing Exec 2010
Mark Hopper Product And Marketing Exec 2010Mark Hopper Product And Marketing Exec 2010
Mark Hopper Product And Marketing Exec 2010
 
Human Guided Forests (HGF)
Human Guided Forests (HGF)Human Guided Forests (HGF)
Human Guided Forests (HGF)
 
Gene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meetingGene Wiki at Phenotype RCN annual meeting
Gene Wiki at Phenotype RCN annual meeting
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
 
Short update on The Cure game first week
Short update on The Cure game first weekShort update on The Cure game first week
Short update on The Cure game first week
 
Open source breakfast norge findwise
Open source breakfast norge findwiseOpen source breakfast norge findwise
Open source breakfast norge findwise
 
Gene wiki jamboree
Gene wiki jamboreeGene wiki jamboree
Gene wiki jamboree
 
Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1
 
genegames.org
genegames.orggenegames.org
genegames.org
 
The National Society For The Protection Of Hmmm
The National Society For The Protection Of HmmmThe National Society For The Protection Of Hmmm
The National Society For The Protection Of Hmmm
 
Bio Logical Mass Collaboration3
Bio Logical Mass Collaboration3Bio Logical Mass Collaboration3
Bio Logical Mass Collaboration3
 
EISHI CO. main eps machine catalogue
EISHI CO. main eps machine catalogueEISHI CO. main eps machine catalogue
EISHI CO. main eps machine catalogue
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
Light steel villa catalogue log
Light steel villa catalogue logLight steel villa catalogue log
Light steel villa catalogue log
 
Dagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søkDagens Næringslivs overgang til Lucene/Solr søk
Dagens Næringslivs overgang til Lucene/Solr søk
 
B2B Branding Explained
B2B Branding ExplainedB2B Branding Explained
B2B Branding Explained
 
Buyer Remorse
Buyer RemorseBuyer Remorse
Buyer Remorse
 

Ähnlich wie 2016 bd2k bgood_wikidata

Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgAndrew Su
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeChris Mungall
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchEuropean Bioinformatics Institute
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationBenjamin Good
 
Ontology for the Financial Services Industry
Ontology for the Financial Services IndustryOntology for the Financial Services Industry
Ontology for the Financial Services IndustryBarry Smith
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...Andrew Su
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOAEBI
 
Python Meetup2014 (Ying Liu)
Python Meetup2014 (Ying Liu)Python Meetup2014 (Ying Liu)
Python Meetup2014 (Ying Liu)eilosei
 
NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw Alexander Pico
 
Python meetup 2014
Python meetup 2014Python meetup 2014
Python meetup 2014eilosei
 
Biothings APIs: high-performance bioentity-centric web services
Biothings APIs: high-performance bioentity-centric web servicesBiothings APIs: high-performance bioentity-centric web services
Biothings APIs: high-performance bioentity-centric web servicesChunlei Wu
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
Functional annotation of invertebrate genomes
Functional annotation of invertebrate genomesFunctional annotation of invertebrate genomes
Functional annotation of invertebrate genomesSurya Saha
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformaticsChris Dwan
 
Genome science intermine
Genome science intermineGenome science intermine
Genome science intermineELIXIR UK
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!adcobb
 
Annotation Analysis for Testing Drug Safety Signals
Annotation Analysis for Testing Drug Safety SignalsAnnotation Analysis for Testing Drug Safety Signals
Annotation Analysis for Testing Drug Safety SignalsTrish Whetzel
 

Ähnlich wie 2016 bd2k bgood_wikidata (20)

Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03Open data genomics_palermo_2017_ver03
Open data genomics_palermo_2017_ver03
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocuration
 
Ontology for the Financial Services Industry
Ontology for the Financial Services IndustryOntology for the Financial Services Industry
Ontology for the Financial Services Industry
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
 
Python Meetup2014 (Ying Liu)
Python Meetup2014 (Ying Liu)Python Meetup2014 (Ying Liu)
Python Meetup2014 (Ying Liu)
 
NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw
 
Python meetup 2014
Python meetup 2014Python meetup 2014
Python meetup 2014
 
KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017
 
Biothings APIs: high-performance bioentity-centric web services
Biothings APIs: high-performance bioentity-centric web servicesBiothings APIs: high-performance bioentity-centric web services
Biothings APIs: high-performance bioentity-centric web services
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Functional annotation of invertebrate genomes
Functional annotation of invertebrate genomesFunctional annotation of invertebrate genomes
Functional annotation of invertebrate genomes
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformatics
 
Genome science intermine
Genome science intermineGenome science intermine
Genome science intermine
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!
 
Annotation Analysis for Testing Drug Safety Signals
Annotation Analysis for Testing Drug Safety SignalsAnnotation Analysis for Testing Drug Safety Signals
Annotation Analysis for Testing Drug Safety Signals
 

Mehr von Benjamin Good

Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledgeBenjamin Good
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsBenjamin Good
 
Pathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMsPathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMsBenjamin Good
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of FoodBenjamin Good
 
Scripps bioinformatics seminar_day_2
Scripps bioinformatics seminar_day_2Scripps bioinformatics seminar_day_2
Scripps bioinformatics seminar_day_2Benjamin Good
 
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery (Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery Benjamin Good
 
Citizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdfCitizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdfBenjamin Good
 
Building a massive biomedical knowledge graph with citizen science
Building a massive biomedical knowledge graph with citizen scienceBuilding a massive biomedical knowledge graph with citizen science
More from Benjamin Good (17)

Representing and reasoning with biological knowledge
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Pathways2GO: Converting BioPax pathways to GO-CAMs
Science Game Lab
Wikidata and the Semantic Web of Food
Scripps bioinformatics seminar_day_2
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Bio)Hackathons
Citizen sciencepanel2015 pdf
Building a massive biomedical knowledge graph with citizen science
Branch: An interactive, web-based tool for building decision tree classifiers
Serious games for bioinformatics education. ISMB 2014 education workshop
The Cure: Making a game of gene selection for breast cancer survival prediction
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst...
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Mark2Cure: a crowdsourcing platform for biomedical literature annotation
An online game for human phenotype prediction

2016 bd2k bgood_wikidata

  • 1. Wikidata for biomedical knowledge integration and curation Benjamin Good The Scripps Research Institute @bgood bgood@scripps.edu
  • 2. “knowledge” • A lot • Important • Text
  • 3. What are the functions of Fibronectin? 37186 articles What are the functions of the 238 ‘significant’ genes that came up in my high throughput screen??
  • 4. What are the functions of Fibronectin? 37186 articles …
       Gene | Property | Value
       Fibronectin | Biological Process | Angiogenesis
       Fibronectin | Cellular Localization | Extracellular matrix
       Fibronectin | Related Disease | Glomerulopathy
       “knowledge integration” “curation” “knowledge base” Answers
  • 5. Knowledge Bases: 1,500+ listed at http://www.oxfordjournals.org/nar/database/a/
  • 6. Applications of knowledge bases • Find information • Plan research • “Known unknowns?” • Interpret data • Gene Ontology Enrichment Analysis
  • 7. Interesting Gene List Gene Ontology, Pathway, Network interpretation
  • 8. Knowledge bases are important tools and will only grow more important over time
  • 11. 1. Knowledge bases are not complete 2. Will get to later...
  • 12. Annotation missing from human GO annotation. Should be here! (‘5 HT Receptor’ means ‘Serotonin Receptor’) Circa 2010
  • 13. Added to GO Jan. 2016 First characterized 1996 (Kohen et al J Neurochem)
  • 14. Interesting Gene List Gene Ontology, Pathway, Network interpretation
  • 15. We don’t know what we are missing. [Figure: Interesting Gene List annotated with inflammatory response, defense response, response to wounding, immune response; Serotonin receptor activity?]
  • 16. “Gene Ontology, it’s great, right?” • “It sucks” • “I only use it out of desperation”
  • 17. WHY?!
  • 18. Process of building knowledge bases 1. do science 2. publish it 3. Manually extract the knowledge
       Gene | Property | Value
       Fibronectin | Biological Process | Angiogenesis
       Fibronectin | Cellular Localization | Extracellular matrix
       Fibronectin | Related Disease | Glomerulopathy
  • 19. why does he look so down?
  • 20. Many scientists, powerful tools, comparatively little reward for curating knowledge 100’s of thousands 100’s
  • 21. More than 2 articles published/minute
  • 22. Professional biocuration does not scale up to the rate of production 1. do science 2. publish it 3. Manually extract the knowledge
       Gene | Property | Value
       Fibronectin | Biological Process | Angiogenesis
       Fibronectin | Cellular Localization | Extracellular matrix
       Fibronectin | Related Disease | Glomerulopathy
  • 23. 1. Knowledge bases are not complete 2. Knowledge needs integration
  • 25. Merging knowledge bases: the language barrier. “Methadone” | Interacts with: “Moxifloxacin” | May treat: Opioid-Related Disorders | ID: N0000000174 | ID: 4095 | Molecular Weight: 309.44518 g/mol | … | ID: DB00333 | Manufactured by: Roxane laboratories inc | Same entity? (= ?)
  • 26. Good for business, bad for science Google Scholar search shows 469 papers about “identifier mapping” in bioinformatics
  • 27. What can we do?
  • 28. Global Knowledge Platform What would happen if everyone was literally working on the same database? 1. Split up work more effectively 2. Make integration the default behavior
  • 29. Wikidata is to data as Wikipedia is to text. “Giving more people more access to more knowledge”. A free and open repository of knowledge, managed by the Wikimedia Foundation, which operates Wikipedia
  • 30. It’s a knowledge base! • Anyone can edit • Anyone can use
  • 32. Item: Q414043 (RELN). Statement = Claim + Qualifiers + References. Claim: Property: Genomic start; Value (numeric): 103471784. Qualifiers: GenLoc assembly: GRCh38. References: Stated in: Ensembl Release 83; Retrieved: 19 January 2016. https://www.wikidata.org/wiki/Q414043
  • 33. Item: Q414043 (RELN). Statement = Claim + Qualifiers + References. Claim: Property: Encodes; Value (item): Reelin (protein). References: Stated in: NCBI homo sapiens annotation release 107; Retrieved: 19 January 2016. https://www.wikidata.org/wiki/Q414043
  • 34. A Giant Global Graph These statements link together into a queryable graph https://query.wikidata.org
  • 35. We are seeding it with biomedical data • All human, mouse genes and proteins • All Gene Ontology terms • All FDA approved drugs • 9,000+ human diseases Burgstaller et al (2016) Database (preprint in BioRxiv) Mitraka et al (2015) Semantic Web Applications for the Life Sciences (best paper) (preprint in BioRxiv)
  • 36. Our seeds are largely concepts linked to many identifier systems N identifiers per item • Genes: 8 • Drugs: 18 • Diseases: 11 Burgstaller et al (2016) Database (preprint in BioRxiv) Mitraka et al (2015) Semantic Web Applications for the Life Sciences (best paper) (preprint in BioRxiv) Facilitate integration with key external knowledge bases
  • 37. Nurturing a multi-community garden of biomedical knowledge: Gene, Drug, Disease
  • 38. A platform for knowledge integration and curation. [Diagram labels: Open data, Wikipedia(s), Your Apps Here! (×4)]
  • 39. Application #1 (of many) Burgstaller et al (2016) Database (preprint in BioRxiv)
  • 40. Impact of Wikidata on Wikipedia. Gene Wiki Version 1 (1 of these for every gene): {{GNF_Protein_box | Name = Reelin | image = | image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 | MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 | IUPHAR = | ChEMBL = | OMIM = None | ECnumber = | Homologene = 9349 | GeneAtlas_image1 = | GeneAtlas_image2 = | GeneAtlas_image3 = | Protein_domain_image = | Function = {{GNF_GO|id=GO:0005515 |text = protein binding}} {{GNF_GO|id=GO:0016787 |text = hydrolase activity}} {{GNF_GO|id=GO:0046872 |text = metal ion binding}} | Component = {{GNF_GO|id=GO:0005739 |text = mitochondrion}} | Process = {{GNF_GO|id=GO:0008152 |text = metabolic process}} | Hs_EntrezGene = 51110 | Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA = NM_016027 | Hs_RefseqProtein = NP_057111 | Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 | Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174 | Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 | Mm_Ensembl = ENSMUSG00000025937 | Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein = NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr = 1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end = 13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}}. Gene Wiki Version 2: {{Infobox gene}} • All data in Wikidata • 1 Lua script works for all genes
  • 41. Application #2: Web Apollo Genome Browser • Genome annotation data retrieved from Wikidata via SPARQL queries to https://query.wikidata.org • Prototype achieved at recent San Diego hackathon. Putman et al (2016) (under review) (preprint in BioRxiv)
  • 42. Microbial Genetic Data • Widely distributed • Difficult to query • Not structured in a meaningful way • A lot of interest from this community!
  • 44. Microbial genomes in Wikidata • Loading genes, proteins, annotations for 120 reference genomes. • Completed 21 genomes so far Putman et al (2016) (under review) (preprint in BioRxiv)
  • 45. Microbiome modeling in Wikidata Putman et al (2016) (under review) (preprint in BioRxiv)
  • 46. 1. Knowledge bases are not complete 2. Knowledge needs integration. Wikidata can help
  • 47. Centralizing content while distributing labor. [Diagram labels: Open data, Wikipedia(s), Your Apps Here! (×4)]
  • 48. Thanks! Gene Wikidata Team: Andra Waagmeester (Micelio), * Sebastian Burgstaller (Scripps), * Tim Putman (Scripps), * Elvira Mitraka (U Maryland), Julia Turner (Scripps), Justin Leong (UBC), Lynn Schriml (U Maryland), Paul Pavlidis (UBC), Andrew Su (Scripps), Ginger Tsueng (Scripps). Contact: bgood@scripps.edu. * First author on manuscript cited in this presentation. [Photo: Ben, Tim, Andra, Elvira, Sebastian] Some Gene Wiki team members enjoying their best paper award at SWAT4LS, Dec. 2015. Adapted logo
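
The statement structure on slides 32 and 33 (an item, a claim made of a property and a value, plus optional qualifiers and references) can be sketched as plain data. This is only an illustration under stated assumptions, not Wikidata's storage format or API: the class names `Snak` and `Statement` and the helper `is_referenced` are chosen here for the example, and properties are written with the labels shown on the slides rather than with Wikidata property IDs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Snak:
    """One property/value pair (illustrative name, not an API type)."""
    prop: str
    value: object


@dataclass
class Statement:
    """A claim about an item, with optional qualifiers and references."""
    item: str                                   # item identifier from the slide
    claim: Snak                                 # the property/value being asserted
    qualifiers: List[Snak] = field(default_factory=list)
    references: List[Snak] = field(default_factory=list)

    def is_referenced(self) -> bool:
        # True when at least one reference pair backs the claim.
        return len(self.references) > 0


# Slide 32: numeric value with a qualifier and a two-part reference.
genomic_start = Statement(
    item="Q414043",
    claim=Snak("Genomic start", 103471784),
    qualifiers=[Snak("GenLoc assembly", "GRCh38")],
    references=[
        Snak("Stated in", "Ensembl Release 83"),
        Snak("Retrieved", "19 January 2016"),
    ],
)

# Slide 33: the value is another item (here kept as its label).
encodes = Statement(
    item="Q414043",
    claim=Snak("Encodes", "Reelin (protein)"),
    references=[
        Snak("Stated in", "NCBI homo sapiens annotation release 107"),
        Snak("Retrieved", "19 January 2016"),
    ],
)

# Both statements hang off the same item, which is what links them into a graph.
statements_about_reln = [s for s in (genomic_start, encodes) if s.item == "Q414043"]
```

Keeping qualifiers and references on each statement, rather than on the item, is what lets every individual value carry its own provenance.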

Editor's notes

  1. Databases. Obviously much more flexible. You can ask them questions (and make pretty pictures that are dynamic).
  2. “known unknowns” ?? If I want X, what Y should I test?
  3. Though it is a child of the more generic GO annotation to ‘G protein coupled receptor activity’ (Kohen 1996, J Neurochem).
  4. Given a list of active genes produced from an experiment: What key biological processes are happening in the cells? What diseases are these genes associated with? Given a list of genetic variations: What diseases is a patient more susceptible to? What drugs should they take or avoid? Etc.
  5. Given a list of active genes produced from an experiment: What key biological processes are happening in the cells? What diseases are these genes associated with? Given a list of genetic variations: What diseases is a patient more susceptible to? What drugs should they take or avoid? Etc.
  6. Knowledge is either not shared (stuck in your head or your notebook) or it is shared as text and images in journal articles. There are more than 1 million articles added to PubMed each year.
  7. Given a list of active genes produced from an experiment: What key biological processes are happening in the cells? What diseases are these genes associated with? Given a list of genetic variations: What diseases is a patient more susceptible to? What drugs should they take or avoid? Etc.
  8. Divide and conquer algorithm for creating the knowledge base of everything. Splitting is hard because it's very hard to know what other groups are doing, there is no centralized coordination, and decisions about what should be curated are based on what gets funded rather than what is most useful for the collective.
  9. The principal problem of knowledge integration is establishing which entities are shared between different systems. Methadone; N0000002109 (Opioid-Related Disorders); http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422823/
  10. It would be much easier to see what other people were doing. By operating in the same database, it is much more likely that you will end up re-using entities that already exist rather than creating new ones and merging them later. Just like in your own local database.
  11. This is the first application of the work that we have done
  12. https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Update
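
Note 9 and slide 25 describe the identifier-mapping problem: three systems each know “Methadone” under a different ID. A minimal sketch of how a shared item carrying several external identifiers makes integration a lookup rather than a matching exercise follows. The three IDs and the attribute values come from slide 25; everything else is an assumption made for illustration: the source names (`source_a`, `source_b`, `source_c`), the pairing of each attribute with each ID (inferred from the order of the flattened slide text), and the `integrate` helper.

```python
# Records as three hypothetical sources might hold them, keyed by each
# source's own identifier. Source names and attribute/ID pairing are assumed.
records = {
    "source_a": {"N0000000174": {"may_treat": "Opioid-Related Disorders"}},
    "source_b": {"4095": {"molecular_weight_g_per_mol": 309.44518}},
    "source_c": {"DB00333": {"manufactured_by": "Roxane laboratories inc"}},
}

# A shared hub item lists one identifier per external source.
hub_item = {
    "label": "Methadone",
    "external_ids": {
        "source_a": "N0000000174",
        "source_b": "4095",
        "source_c": "DB00333",
    },
}


def integrate(item, sources):
    """Collect attributes from every source the hub item points to."""
    merged = {"label": item["label"]}
    for source_name, external_id in item["external_ids"].items():
        # Unknown sources or IDs contribute nothing instead of failing.
        merged.update(sources.get(source_name, {}).get(external_id, {}))
    return merged


methadone = integrate(hub_item, records)
```

Without the hub item, the same merge would need pairwise mappings between every two sources, which is the work the “identifier mapping” literature on slide 26 is about.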