The Data Today
Alasdair Gray
Heriot-Watt University, Edinburgh, UK
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Big Data Integration
Dataset                    Downloaded   Version     Licence               Triples
-------------------------  -----------  ----------  --------------  -------------
Bio Assay Ontology         -            -           CC-By                  10,360
CALOHA                     8 Apr 2015   2014-01-22  CC-By-ND               14,552
ChEBI                      4 Mar 2015   125         CC-By-SA            1,012,056
ChEMBL                     18 Feb 2015  20.0        CC-By-SA          445,732,880
ConceptWiki                12 Dec 2013  -           CC-By-SA            4,331,760
DisGeNET                   31 Mar 2015  2.1.0       ODbL               15,011,136
Disease Ontology           -            2015-05-21  CC-By                 188,062
DrugBank                   19 Feb 2015  4.1         Non-commercial      4,028,767
ENZYME                     -            2015_11     CC-By-ND               61,467
FDA Adverse Events         9 Jul 2012   -           CC0                13,557,070
Gene Ontology              4 Mar 2015   -           CC-By               1,366,494
Gene Ontology Annotations  17 Feb 2015  -           CC-By             879,448,347
NCATS OPDDR                Nov 2015     Oct 2015    -                       2,643
neXTProt (NP)              1 Feb 2014   1.0         CC-By-ND          215,006,108
OPS Chemical Registry      4 Nov 2014   -           CC-By-SA          241,986,722
HMDB                       -            3.6         HMDB                        -
MeSH                       -            2015        MeSH                        -
PDB Ligands                -            2           PDB                         -
OPS Metadata               -            -           CC-By-SA                2,053
UniProt                    -            2015_11     CC-By-ND        1,131,186,434
WikiPathways               -            20151118    CC-By              11,781,627
Total: ~3 Billion triples
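The "~3 Billion" figure can be checked by summing the per-dataset triple counts listed above. The following is a minimal sketch; the dictionary name `TRIPLES` is only illustrative, and HMDB, MeSH and PDB Ligands are left out because no count is listed for them.

```python
# Triple counts copied from the dataset listing above.
# HMDB, MeSH and PDB Ligands are omitted: no count is given for them.
TRIPLES = {
    "Bio Assay Ontology": 10_360,
    "CALOHA": 14_552,
    "ChEBI": 1_012_056,
    "ChEMBL": 445_732_880,
    "ConceptWiki": 4_331_760,
    "DisGeNET": 15_011_136,
    "Disease Ontology": 188_062,
    "DrugBank": 4_028_767,
    "ENZYME": 61_467,
    "FDA Adverse Events": 13_557_070,
    "Gene Ontology": 1_366_494,
    "Gene Ontology Annotations": 879_448_347,
    "NCATS OPDDR": 2_643,
    "neXTProt (NP)": 215_006_108,
    "OPS Chemical Registry": 241_986_722,
    "OPS Metadata": 2_053,
    "UniProt": 1_131_186_434,
    "WikiPathways": 11_781_627,
}

total = sum(TRIPLES.values())
print(f"{total:,} triples, about {total / 1e9:.2f} billion")

# The three largest sources dominate the total.
for name, count in sorted(TRIPLES.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{name}: {count / total:.0%}")
```

The listed counts sum to just under three billion, which is consistent with the rounded total on the slide.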
John Wilbanks consulted for us
A framework built around STANDARD, well-understood Creative Commons licences, and how they interoperate
Deal with the problems by:
• Interoperable licences
• Appropriate terms
• Declare expectations to users and data publishers
One size won't fit all requirements
Data Licensing (Or Lack Of!)
Disease
Tissue
Target
Compound
Pathway
STANDARD_TYPE     UNIT_COUNT
----------------  ----------
AC50                       7
Activity                 421
EC50                      39
IC50                      46
ID50                      42
Ki                        23
Log IC50                   4
Log Ki                     7
Potency                   11
log IC50                   0
STANDARD_TYPE  STANDARD_UNITS  COUNT(*)
-------------  --------------  --------
IC50           nM                829448
IC50           ug.mL-1            41000
IC50                              38521
IC50           ug/ml               2038
IC50           ug ml-1              509
IC50           mg kg-1              295
IC50           molar ratio          178
IC50           ug                   117
IC50           %                    113
IC50           uM well-1             52
~ 100 units
>5000 types
Implemented using the Quantities, Units, Dimensions and Data Types ontology, QUDT (http://www.qudt.org/)
Quantitative Data Challenges
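The IC50 listing shows the same unit written several ways ("ug.mL-1", "ug/ml", "ug ml-1"). A minimal sketch of collapsing such spelling variants before counting follows. The alias table and the function name `normalise_unit` are hypothetical and are not the QUDT-based implementation used by the platform; converting between mass and molar concentrations would additionally need molecular weights and is not attempted here.

```python
from collections import Counter

# Hypothetical alias table: lower-cased spelling variants -> one canonical label.
UNIT_ALIASES = {
    "nm": "nM",
    "ug.ml-1": "ug.mL-1",
    "ug/ml": "ug.mL-1",
    "ug ml-1": "ug.mL-1",
}

def normalise_unit(raw: str) -> str:
    """Map a raw unit string to a canonical label, leaving unknown units as-is."""
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    return UNIT_ALIASES.get(key, raw.strip() or "(unspecified)")

# (unit, count) pairs for IC50 as listed on the slide.
IC50_COUNTS = [
    ("nM", 829448), ("ug.mL-1", 41000), ("", 38521), ("ug/ml", 2038),
    ("ug ml-1", 509), ("mg kg-1", 295), ("molar ratio", 178),
    ("ug", 117), ("%", 113), ("uM well-1", 52),
]

merged = Counter()
for unit, count in IC50_COUNTS:
    merged[normalise_unit(unit)] += count

for unit, count in merged.most_common():
    print(f"{unit:15s} {count}")
```

Merging only the spelling variants reduces the ten rows to eight distinct units; the remaining rows differ in dimension (for example a percentage or a dose per body weight) and cannot be unified by renaming alone.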
Quality Assurance
ops:OPS437281
ops:OPS380297 ops:OPS380292
is_stereoisomer_of [ci:CHEMINF_000461]
has_stereoundefined_parent [ci:CHEMINF_000456]
Other relationships
• has part
• is tautomer of
• uncharged counterpart
• isotope
…
Chemical Registration Service Data
Mappings (raw): 25,087,328
Mappings (computed): 200,000,000+
P12047
X31045
GB:29384
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
DrugBank ChemSpider PubChem
Mesylate Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
Are these records the same?
It depends upon your task!
skos:exactMatch (InChI)
Strict Relaxed
Analysing Browsing
I need to perform an analysis, give me details of the active compound in Gleevec.
skos:closeMatch (Drug Name)
skos:closeMatch (Drug Name)
skos:exactMatch (InChI)
Strict Relaxed
Analysing Browsing
Which targets are known to interact with Gleevec?
A lens defines a conceptual view over the data
Specifies operational equivalence conditions
Consists of:
• Identifier (URI)
• Title (dct:title)
• Description (dct:description)
• Documentation link (dcat:landingPage)
• Creator (pav:createdBy)
• Timestamp (pav:createdOn)
• Equivalence rules (bdb:linksetJustification)
Scientific Lens
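The effect of a lens can be illustrated with a small sketch: each link carries a justification, and a lens lists the justifications it accepts, so the same starting record expands to different sets of equivalent records under a strict and a relaxed lens. All names here (`Link`, `Lens`, `equivalents`, the `record:` identifiers, the example URIs and the justification labels) are hypothetical placeholders, not the Open PHACTS data model or API, and only a subset of the lens fields listed above is modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    source: str
    target: str
    predicate: str       # e.g. skos:exactMatch or skos:closeMatch
    justification: str   # why the link holds, e.g. shared InChI or shared name

@dataclass(frozen=True)
class Lens:
    uri: str
    title: str
    description: str
    justifications: frozenset  # justifications this lens treats as equivalence

def equivalents(start: str, links: list, lens: Lens) -> set:
    """Return every record reachable from start via links the lens enables."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for link in links:
            if link.justification not in lens.justifications:
                continue  # the lens disables this kind of link
            # Equivalence links are followed in both directions.
            for a, b in ((link.source, link.target), (link.target, link.source)):
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen

# Placeholder records standing in for three database entries for one drug.
LINKS = [
    Link("record:A", "record:B", "skos:exactMatch", "inchi"),
    Link("record:B", "record:C", "skos:closeMatch", "drug-name"),
]

STRICT = Lens("http://example.org/lens/strict", "Strict",
              "Structure only, for analysis", frozenset({"inchi"}))
RELAXED = Lens("http://example.org/lens/relaxed", "Relaxed",
               "Structure or name, for browsing", frozenset({"inchi", "drug-name"}))

print(sorted(equivalents("record:A", LINKS, STRICT)))
print(sorted(equivalents("record:A", LINKS, RELAXED)))
```

Under the strict lens only the structure-based link is followed, so the name-matched record stays separate; under the relaxed lens all three records are treated as the same thing.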
Lenses
34 in total
7 Public
25 Chemistry
2 Gene
Data Governance
Contribution must not be underestimated!!!
Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
www.macs.hw.ac.uk/~ajg33/
@gray_alasdair
Open PHACTS
contact@openphacts.org
openphacts.org
@open_phacts

Open PHACTS: The Data Today

Editor's notes

  1. Data is provided by many publishers; some cover other sets, e.g. ChemSpider. Originally in many formats: relational, SD files and RDF. Worked closely with publishers to get them to publish raw RDF, metadata descriptions of their data, and links between their data and others.
  2. ~3 billion triples; 42 GB as gzipped N-Quads; 400 GB uncompressed.
  3. Getting this information is still hard and manual! ~3 billion triples; 42 GB as gzipped N-Quads; 400 GB uncompressed.
  4. ~3 billion triples; 42 GB as gzipped N-Quads; 400 GB uncompressed.
  5. API: complex data interactions/relationships. Interactions needed to satisfy use cases. Gradually added additional types of data and interactions.
  6. Quantitative Data Challenges: no standard units, even in curated sources! Feed issues back to data providers.
  7. Quality Assurance: Validation & Standardization Platform, developed by the Royal Society of Chemistry. http://bit.ly/NZF5VB
  8. CRS dataset generation. Validate structure: source data is messy! Identify common problems: charge imbalance, stereochemistry. Compute physicochemical properties. Identify related properties based on structure. 17 relationship types.
  9. 230 MB gzipped N-Quads; 2 GB uncompressed. 238 mapping sets, 43 data sources, 11 predicates.
  10. Identity Mapping
  11. Example drug: Gleevec, a cancer drug for leukemia. A lookup in three popular public chemical databases gives different results. Chemistry is complicated, often simplified for convenience. Data is messy! Are these records the same? It depends on what you are doing with the data! Each captures a subtly different view of the world.
  12. Structure Lens: interested in physicochemical properties of Gleevec.
  13. Name Lens: interested in biomedical and pharmacological properties. sameAs != sameAs: it depends on your point of view. Links relate individual data instances: source, target, predicate, reason. Links are grouped into Linksets, which have a VoID header providing provenance and justification for the link.
  14. A lens enables certain relationships and disables others. It alters the links between the data.
  15. Builds on the OPS document: checklist and guidance notes! Covers a wider range of use cases. Large community buy-in, including EBI.
  16. Builds on the OPS document: checklist and guidance notes! Covers a wider range of use cases. Large community buy-in, including EBI.
  17. Verifying data. Verifying linkages. Investigating unexpected answers. Not to be