Linked Cancer Genome Atlas Database
Muhammad Saleem, Shanmukha Sampath Padmanabhuni, Axel-Cyrille Ngonga Ngomo, Jonas S. Almeida, Stefan Decker, Helena F. Deus
Linked Data Cup, I-Semantics 2013, September 04 - 06 2013, Graz, Austria
Agenda
• Cancer Genome Atlas (TCGA) introduction
• Problem statement
• Linked TCGA: a scalable solution
• Cancer treatment using Linked TCGA
• Demo of the use cases
• Conclusion
TCGA Introduction
• A publicly accessible atlas of cancer-related data from the National Cancer Institute (NCI)
– 9000 patients
– 33 cancer types
– 147,645 raw data files
– total of 12.7 terabytes of data
• Only 46% of the total expected data has been released so far, and new data is submitted every day
• The goal is to enable cancer researchers to make and validate important discoveries
Problem Statement
• Data in the TCGA is organized as text archives
with no remote querying interface
– Download very large archives, often after waiting in queues
– Parse the relevant text
– Collect the critical covariates necessary for analysis
• Various types of experimental results are not
connected biologically
• TCGA data should be made publicly available for
remote querying and virtual integration
Linked TCGA, a Scalable Solution: RDFization
Raw:
Composite Element REF | gene_symbol | chromosome | position | beta_value
cg00000292 | ATP2A1 | 16 | 28890100 | 0.439271303584937
cg00002426 | SLMAP | 3 | 57743543 | 0.245147665381461
cg00003994 | MEOX2 | 7 | 15725862 | 0.0440161061196347
cg00005847 | HOXD3 | 2 | 177029073 | 0.741342927038953
cg00006414 | ZNF425 | 7 | 148822837 | NA
cg00007981 | PANX1 | 11 | 93862594 | 0.0290713821114479
cg00008493 | COX8C | 14 | 93813777 | 0.985555436681019
cg00008713 | IMPA2 | 18 | 11980953 | 0.0109832005732912
cg00009407 | TTC8 | 14 | 89290921 | 0.0104525957219692
Refined:
chromosome | position | beta_value
16 | 28890100 | 0.439271303584937
3 | 57743543 | 0.245147665381461
7 | 15725862 | 0.0440161061196347
2 | 177029073 | 0.741342927038953
11 | 93862594 | 0.0290713821114479
14 | 93813777 | 0.985555436681019
18 | 11980953 | 0.0109832005732912
14 | 89290921 | 0.0104525957219692
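The Data Refiner step can be sketched in a few lines of Python. This is a minimal sketch, not the authors' refiner: it assumes the raw file is tab-separated with the five columns shown above, and that refinement keeps only chromosome, position and beta_value and drops rows whose beta_value is NA (the ZNF425 probe is the one row missing from the refined table). The function name refine is hypothetical.

```python
import csv
import io

# Columns kept by this sketch of the refiner (assumed from the refined table).
KEEP = ("chromosome", "position", "beta_value")


def refine(raw_text):
    """Keep chromosome, position and beta_value; drop rows with a missing (NA) beta value."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    rows = []
    for record in reader:
        if record["beta_value"] in ("NA", ""):
            continue  # incomplete measurement: leave it out of the refined file
        rows.append(tuple(record[col] for col in KEEP))
    return rows


# Three rows taken from the raw table, written as tab-separated text.
raw = (
    "Composite Element REF\tgene_symbol\tchromosome\tposition\tbeta_value\n"
    "cg00000292\tATP2A1\t16\t28890100\t0.439271303584937\n"
    "cg00006414\tZNF425\t7\t148822837\tNA\n"
    "cg00007981\tPANX1\t11\t93862594\t0.0290713821114479\n"
)

for row in refine(raw):
    print("\t".join(row))
```

Dropping the identifier and gene-symbol columns is what makes the refined files so much smaller than the originals, at the cost of needing the probe annotation elsewhere if gene-level queries are required.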
@prefix b:<http://tcga.deri.ie/>.
@prefix d:<http://tcga.deri.ie/schema/bcr_patient_barcode>.
@prefix r:<http://tcga.deri.ie/schema/result>.
@prefix c:<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.
@prefix w:<http://tcga.deri.ie/schema/dna_methylation_result>.
@prefix m:<http://tcga.deri.ie/schema/chromosome>.
@prefix v:<http://tcga.deri.ie/schema/position>.
@prefix u:<http://tcga.deri.ie/schema/beta_value>.
b:TCGA-A2-A0CX d: "TCGA-A2-A0CX".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d1 .
b:TCGA-A2-A0CX-d1 c: w: ; m: "16"; v: "28890100"; u: "0.439271303584937".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d2 .
b:TCGA-A2-A0CX-d2 c: w: ; m: "3"; v: "57743543"; u: "0.245147665381461".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d3 .
b:TCGA-A2-A0CX-d3 c: w: ; m: "7"; v: "15725862"; u: "0.0440161061196347".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d4 .
b:TCGA-A2-A0CX-d4 c: w: ; m: "2"; v: "177029073"; u: "0.741342927038953".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d5 .
b:TCGA-A2-A0CX-d5 c: w: ; m: "11"; v: "93862594"; u: "0.0290713821114479".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d6 .
b:TCGA-A2-A0CX-d6 c: w: ; m: "14"; v: "93813777"; u: "0.985555436681019".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d7 .
b:TCGA-A2-A0CX-d7 c: w: ; m: "18"; v: "11980953"; u: "0.0109832005732912".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d8 .
b:TCGA-A2-A0CX-d8 c: w: ; m: "14"; v: "89290921"; u: "0.0104525957219692".
Text to RDF Conversion: Raw → Data Refiner → Refined → RDFizer → RDFized
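The RDFizer step can likewise be sketched as a function that writes refined rows in the compact Turtle style of the example: each prefix is bound to a full predicate IRI, each measurement becomes its own resource (-d1, -d2, ...) linked from the patient, and all values are written as plain string literals. The function name rdfize and the two-row input are illustrative assumptions; the real RDFizer may differ, and this sketch does not escape quotes inside values.

```python
# Prefix bindings copied from the Turtle example: each prefix is bound to a
# full IRI, so a bare "d:" or "m:" names a complete predicate.
PREFIXES = {
    "b": "http://tcga.deri.ie/",
    "d": "http://tcga.deri.ie/schema/bcr_patient_barcode",
    "r": "http://tcga.deri.ie/schema/result",
    "c": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "w": "http://tcga.deri.ie/schema/dna_methylation_result",
    "m": "http://tcga.deri.ie/schema/chromosome",
    "v": "http://tcga.deri.ie/schema/position",
    "u": "http://tcga.deri.ie/schema/beta_value",
}


def rdfize(barcode, rows):
    """Turn refined (chromosome, position, beta_value) rows into Turtle lines."""
    lines = [f"@prefix {p}:<{iri}>." for p, iri in PREFIXES.items()]
    lines.append(f'b:{barcode} d: "{barcode}".')
    for i, (chrom, pos, beta) in enumerate(rows, start=1):
        node = f"b:{barcode}-d{i}"  # one resource per methylation measurement
        lines.append(f"b:{barcode} r: {node} .")
        lines.append(f'{node} c: w: ; m: "{chrom}"; v: "{pos}"; u: "{beta}".')
    return "\n".join(lines)


refined = [("16", "28890100", "0.439271303584937"), ("3", "57743543", "0.245147665381461")]
print(rdfize("TCGA-A2-A0CX", refined))
```

Binding a prefix to a whole predicate IRI keeps every triple short, which matters when the output runs to hundreds of millions of triples per tumor type.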
Linked TCGA Data Workflow
Linked TCGA Tumor Statistics
Tumor type | Original size (GB) | Refined size (GB) | RDFized size (GB) | Triples (million)
Cervical (CESC) | 8.75 | 2.44 | 8.86 | 400.19
Rectal adenocarcinoma (READ) | 8.07 | 2.25 | 9.04 | 413.31
Papillary kidney (KIRP) | 10.40 | 2.90 | 10.4 | 469.65
Bladder cancer (BLCA) | 12.16 | 3.39 | 12.3 | 556.38
Acute myeloid leukemia (LAML) | 14.85 | 4.14 | 15.1 | 684.05
Lower grade glioma (LGG) | 17.08 | 4.76 | 17.1 | 778.82
Prostate adenocarcinoma (PRAD) | 18.05 | 5.03 | 18.1 | 821.01
Lung squamous carcinoma (LUSC) | 20.63 | 5.75 | 20.5 | 927.08
Cutaneous melanoma (SKCM) | 23.22 | 6.47 | 23.2 | 1050.94
Head and neck squamous cell (HNSC) | 27.6 | 7.69 | 27.5 | 1245.37
• A total of about 7.35 billion triples for the 10 smaller tumor types listed above
• The complete Linked TCGA dataset has more than 30 billion triples (the largest dataset in the LOD cloud)
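The figures in the statistics table can be cross-checked with simple arithmetic. The sketch below copies the table rows, sums the triple counts (about 7.35 billion in total) and prints, per tumor type, the refined-to-original size ratio (roughly 0.28 throughout) and the number of triples per RDFized gigabyte (about 45 million).

```python
# (tumor code, original GB, refined GB, RDFized GB, triples in millions),
# copied from the Linked TCGA tumor statistics table.
STATS = [
    ("CESC", 8.75, 2.44, 8.86, 400.19),
    ("READ", 8.07, 2.25, 9.04, 413.31),
    ("KIRP", 10.40, 2.90, 10.4, 469.65),
    ("BLCA", 12.16, 3.39, 12.3, 556.38),
    ("LAML", 14.85, 4.14, 15.1, 684.05),
    ("LGG", 17.08, 4.76, 17.1, 778.82),
    ("PRAD", 18.05, 5.03, 18.1, 821.01),
    ("LUSC", 20.63, 5.75, 20.5, 927.08),
    ("SKCM", 23.22, 6.47, 23.2, 1050.94),
    ("HNSC", 27.6, 7.69, 27.5, 1245.37),
]

total_triples_millions = sum(row[4] for row in STATS)
print(f"total triples: {total_triples_millions / 1000:.2f} billion")

for code, original, refined, rdfized, triples in STATS:
    print(f"{code}: refined/original = {refined / original:.3f}, "
          f"triples per RDFized GB = {triples / rdfized:.1f} million")
```

The near-constant ratios suggest that refinement and RDFization scale linearly with input size, which is what makes extrapolating to the full dataset reasonable.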
Linking to Linked Open Data
Source | Target | Class | #Links
DNA27 | HGNC | Gene | 23181
DNA27 | Homologene | Gene | 27654
DNA27 | HGNC | Gene | 15171
DNA450 | Homologene | Gene | 489643
DNA450 | OMIM | Gene | 212284
DNA27 | HGNC | Chromosome | 108662
DNA27 | OMIM | Chromosome | 16039535
Methylation | HGNC | Chromosome | 97530
Methylation | OMIM | Chromosome | 14407269
Gene Expression | HGNC | Chromosome | 86052
Gene Expression | OMIM | Chromosome | 12535829
• Links were generated with LIMES: http://aksw.org/Projects/LIMES.html
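LIMES discovers links by comparing resources with configurable similarity measures and uses properties of metric spaces to avoid comparing every pair. As a much simpler stand-in (not LIMES itself), the sketch below links resources whose gene symbols match exactly after normalization, using a hash index on the target side. The IRIs are hypothetical, and the owl:sameAs predicate is an assumption, since the slides do not state which link predicate was used.

```python
# owl:sameAs is an assumption; the slides do not say which link predicate was used.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def link_by_symbol(source, target, predicate=OWL_SAME_AS):
    """Link resources whose gene symbols agree after trimming and upper-casing.

    source, target: dicts mapping a resource IRI to its gene symbol.
    Returns a list of (source IRI, predicate, target IRI) triples.
    """
    # Index the target side once so each source lookup is a dictionary hit,
    # instead of comparing every source/target pair.
    index = {}
    for iri, symbol in target.items():
        index.setdefault(symbol.strip().upper(), []).append(iri)

    links = []
    for iri, symbol in source.items():
        for match in index.get(symbol.strip().upper(), []):
            links.append((iri, predicate, match))
    return links


# Hypothetical IRIs for illustration only.
source = {
    "http://example.org/tcga/gene/ATP2A1": "ATP2A1",
    "http://example.org/tcga/gene/SLMAP": "slmap ",
}
target = {
    "http://example.org/hgnc/ATP2A1": "ATP2A1",
    "http://example.org/hgnc/SLMAP": "SLMAP",
    "http://example.org/hgnc/MEOX2": "MEOX2",
}
for link in link_by_symbol(source, target):
    print(link)
```

Exact matching only finds identical labels; approximate string or numeric similarity, as LIMES offers, is needed when sources spell identifiers differently.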
Cancer Treatment using Linked TCGA
Linked TCGA Use Cases
1. Targeted cancer treatment
– Determine whether a specific drug can treat a tumor, using genomic data from patients with the same tumor type
2. Mechanism-based treatment
– Determine whether a combination of drugs can be applied to a specific tumor, using data from similar patients
3. Survival outcome
– Use a mathematical model to predict outcomes, such as survival, for a new patient
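Use Cases 1 and 2: SPARQL Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tcga: <http://tcga.deri.ie/schema/>
# HER2 and ER expression (scaled estimate) for each breast cancer (BRCA) patient
SELECT ?patient ?gene ?mean
WHERE
{
?uri tcga:tumour_type "BRCA" .
?uri tcga:bcr_patient_barcode ?patient .
# the barcode is a literal, so results are reached from the patient resource
?uri tcga:result ?result .
?result rdf:type tcga:expression_gene_results .
?result tcga:gene_symbol ?gene .
?result tcga:scaled_estimate ?mean .
VALUES ?gene { "HER2" "ER" }
}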
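Since the point of Linked TCGA is remote querying, a query like this one can be sent to a SPARQL endpoint over HTTP using the SPARQL 1.1 Protocol. The stdlib sketch below builds such a GET request and flattens a JSON result document into rows; the endpoint URL is a hypothetical placeholder, the query is a cut-down version of the use-case query, and the sample response is illustrative, not real output.

```python
import json
import urllib.parse
import urllib.request

QUERY = """PREFIX tcga: <http://tcga.deri.ie/schema/>
SELECT ?patient WHERE {
  ?uri tcga:tumour_type "BRCA" .
  ?uri tcga:bcr_patient_barcode ?patient .
}"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def result_rows(results_json):
    """Flatten a SPARQL JSON results document into {variable: value} dicts.

    Variables left unbound in a solution (e.g. by OPTIONAL) are simply absent.
    """
    doc = json.loads(results_json)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in doc["results"]["bindings"]]


# Hypothetical endpoint URL; urllib.request.urlopen(req) would actually send the query.
req = build_request("http://example.org/sparql", QUERY)
print(req.full_url)

# Illustrative response in the SPARQL JSON results format (not real query output).
sample = ('{"head": {"vars": ["patient"]}, "results": {"bindings": '
          '[{"patient": {"type": "literal", "value": "TCGA-A2-A0CX"}}]}}')
print(result_rows(sample))
```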
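Use Cases 1 and 2: Querying DrugBank in the LOD Cloud
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX tcga: <http://tcga.deri.ie/schema/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT DISTINCT ?drugname
WHERE
{
VALUES ?threshold { 0.001 }  # hypothetical cut-off: set a study-specific value
?result rdf:type tcga:expression_gene_results .
?result tcga:gene_symbol ?targetname .
?result tcga:scaled_estimate ?mean .
# cast in case values are stored as plain string literals
FILTER (xsd:double(?mean) > ?threshold)
?drug drugbank:target ?target .
?drug drugbank:genericName ?drugname .
?target drugbank:synonym ?targetname .
# alternatives are separated by a single "|"; "||" would also match the empty string
FILTER REGEX (?targetname, "HER2|estrogen ?receptor|ERBB2", "i")
}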
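Regular-expression alternation uses a single | between branches. A doubled || inserts an empty branch, and an empty pattern matches every string, so a filter such as "HER2||estrogenreceptor||ERBB2" keeps every target. The quick check below uses Python's re module for convenience; SPARQL's REGEX follows XPath regular-expression syntax, which is expected to treat alternation the same way for patterns like these.

```python
import re

# With doubled bars the pattern contains empty alternatives.
DOUBLED = "HER2||estrogenreceptor||ERBB2"
# Single bars separate the intended alternatives; " ?" allows an optional space.
SINGLE = "HER2|estrogen ?receptor|ERBB2"

for name in ["ERBB2", "Estrogen receptor", "Thymidylate synthase"]:
    print(f"{name!r}: doubled={bool(re.search(DOUBLED, name, re.I))}, "
          f"single={bool(re.search(SINGLE, name, re.I))}")
```

The doubled pattern reports a match for all three names, including the unrelated enzyme, while the single-bar pattern rejects it.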
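Use Case 3: Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tcga: <http://tcga.deri.ie/schema/>
SELECT ?patient ?tumour_stage ?age ?biomarker ?mean
WHERE
{
?uri tcga:tumour_type "BRCA" .
?uri tcga:bcr_patient_barcode ?patient .
# clinical and methylation records are reached from the patient resource
?uri tcga:result ?clinical .
?clinical rdf:type tcga:clinical .
?clinical tcga:tumour_stage ?tumour_stage .
?clinical tcga:age_at_initial_pathologic_diagnosis ?age .
?clinical tcga:relevant_biomarker ?biomarker .
VALUES ?biomarker { "BRCA1" "CDKN2A" "CDH1" }
?uri tcga:result ?methylation .
?methylation rdf:type tcga:dna_methylation_result .
?methylation tcga:beta_value ?mean .
}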
Demo 1
Demo 2
Everything is Public
• TopFed: https://code.google.com/p/topfed/
• Linked TCGA : http://tcga.deri.ie/
saleem@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany
Thanks
Muhammad Saleem
saleem.muhammd@gmail.com

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

KĂźrzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Linked Cancer Genome Atlas Database

  • 1. Linked Cancer Genome Atlas Database
    Muhammad Saleem, Shanmukha Sampath Padmanabhuni, Axel-Cyrille Ngonga Ngomo, Jonas S. Almeida, Stefan Decker, Helena F. Deus
    Linked Data Cup, I-Semantics 2013, September 4-6, 2013, Graz, Austria
  • 2. Agenda
    • Cancer Genome Atlas (TCGA) introduction
    • Problem statement
    • Linked TCGA: a scalable solution
    • Cancer treatment using Linked TCGA
    • Demo of the use cases
    • Conclusion
  • 3. TCGA Introduction
    • A publicly accessible atlas of cancer-related data from the National Cancer Institute (NCI):
      – 9,000 patients
      – 33 cancer types
      – 147,645 raw data files
      – 12.7 terabytes of data in total
    • Only 46% of the total expected data is available so far, and new data is submitted every day
    • Goal: enable cancer researchers to make and validate important discoveries
  • 4. Problem Statement
    • TCGA data is organized as text archives with no remote querying interface, so researchers must:
      – download very large archives, often after waiting in queues
      – parse the relevant text
      – collect the critical covariates necessary for analysis
    • The different types of experimental results are not linked to each other biologically
    • TCGA data should be made publicly available for remote querying and virtual integration
  • 5. Linked TCGA, a Scalable Solution: RDFization
  • 6. Text to RDF Conversion: Raw → Data Refiner → Refined
    Raw (DNA methylation file):
      composite element REF  gene_symbol  chromosome  position   beta_value
      cg00000292             ATP2A1       16          28890100   0.439271303584937
      cg00002426             SLMAP        3           57743543   0.245147665381461
      cg00003994             MEOX2        7           15725862   0.0440161061196347
      cg00005847             HOXD3        2           177029073  0.741342927038953
      cg00006414             ZNF425       7           148822837  NA
      cg00007981             PANX1        11          93862594   0.0290713821114479
      cg00008493             COX8C        14          93813777   0.985555436681019
      cg00008713             IMPA2        18          11980953   0.0109832005732912
      cg00009407             TTC8         14          89290921   0.0104525957219692
    Refined (only chromosome, position and beta_value kept; the NA row is gone):
      chromosome  position   beta_value
      16          28890100   0.439271303584937
      3           57743543   0.245147665381461
      7           15725862   0.0440161061196347
      2           177029073  0.741342927038953
      11          93862594   0.0290713821114479
      14          93813777   0.985555436681019
      18          11980953   0.0109832005732912
      14          89290921   0.0104525957219692
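The slide shows only the Data Refiner's input and output. A minimal sketch of that step in Python, assuming the raw file is tab-separated with a single header row named as on the slide (real TCGA archive files may lay out their headers differently): it keeps chromosome, position and beta_value and skips rows whose beta value is NA, which is exactly the difference between the raw and refined tables. The function name `refine` is hypothetical.

```python
import csv
import io

# Assumption: one tab-separated header row using the column names shown on the
# slide; real TCGA archive files may use a different header layout.
KEEP = ("chromosome", "position", "beta_value")


def refine(raw_text):
    """Hypothetical Data Refiner: keep three columns, skip rows without a beta value."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    refined = []
    for row in reader:
        if row["beta_value"] == "NA":
            continue  # no measurement for this probe, so nothing to RDFize
        refined.append(tuple(row[column] for column in KEEP))
    return refined


if __name__ == "__main__":
    raw = (
        "composite element REF\tgene_symbol\tchromosome\tposition\tbeta_value\n"
        "cg00000292\tATP2A1\t16\t28890100\t0.439271303584937\n"
        "cg00006414\tZNF425\t7\t148822837\tNA\n"
    )
    print(refine(raw))  # [('16', '28890100', '0.439271303584937')]
```

Keeping values as strings mirrors the RDFized output on slide 9, where chromosome, position and beta value are all plain string literals.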
  • 9. Text to RDF Conversion: Raw → Data Refiner → Refined → RDFizer → RDFized
    RDFizer output (Turtle) for the refined rows of patient TCGA-A2-A0CX:
      @prefix b: <http://tcga.deri.ie/> .
      @prefix d: <http://tcga.deri.ie/schema/bcr_patient_barcode> .
      @prefix r: <http://tcga.deri.ie/schema/result> .
      @prefix c: <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> .
      @prefix w: <http://tcga.deri.ie/schema/dna_methylation_result> .
      @prefix m: <http://tcga.deri.ie/schema/chromosome> .
      @prefix v: <http://tcga.deri.ie/schema/position> .
      @prefix u: <http://tcga.deri.ie/schema/beta_value> .

      b:TCGA-A2-A0CX d: "TCGA-A2-A0CX" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d1 .
      b:TCGA-A2-A0CX-d1 c: w: ; m: "16" ; v: "28890100" ; u: "0.439271303584937" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d2 .
      b:TCGA-A2-A0CX-d2 c: w: ; m: "3" ; v: "57743543" ; u: "0.245147665381461" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d3 .
      b:TCGA-A2-A0CX-d3 c: w: ; m: "7" ; v: "15725862" ; u: "0.0440161061196347" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d4 .
      b:TCGA-A2-A0CX-d4 c: w: ; m: "2" ; v: "177029073" ; u: "0.741342927038953" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d5 .
      b:TCGA-A2-A0CX-d5 c: w: ; m: "11" ; v: "93862594" ; u: "0.0290713821114479" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d6 .
      b:TCGA-A2-A0CX-d6 c: w: ; m: "14" ; v: "93813777" ; u: "0.985555436681019" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d7 .
      b:TCGA-A2-A0CX-d7 c: w: ; m: "18" ; v: "11980953" ; u: "0.0109832005732912" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d8 .
      b:TCGA-A2-A0CX-d8 c: w: ; m: "14" ; v: "89290921" ; u: "0.0104525957219692" .
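The RDFized data on this slide follows a regular pattern: one bcr_patient_barcode triple for the patient, then, for every refined row, a result link from the patient to a numbered result node (TCGA-A2-A0CX-d1, -d2, ...) that carries the dna_methylation_result type plus chromosome, position and beta_value literals. A minimal sketch of an RDFizer reproducing that pattern, written in Python and emitting N-Triples with full IRIs instead of the slide's abbreviated prefixes. The function name is hypothetical, and literal escaping is omitted on the assumption that the values contain no quotes or backslashes.

```python
BASE = "http://tcga.deri.ie/"
SCHEMA = BASE + "schema/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rdfize(barcode, rows):
    """Return N-Triples lines for one patient's refined methylation rows.

    rows: iterable of (chromosome, position, beta_value) string tuples.
    Assumes values need no escaping (no quotes or backslashes).
    """
    patient = f"<{BASE}{barcode}>"
    lines = [f'{patient} <{SCHEMA}bcr_patient_barcode> "{barcode}" .']
    for i, (chromosome, position, beta) in enumerate(rows, start=1):
        result = f"<{BASE}{barcode}-d{i}>"  # one result node per row: -d1, -d2, ...
        lines.append(f"{patient} <{SCHEMA}result> {result} .")
        lines.append(f"{result} <{RDF_TYPE}> <{SCHEMA}dna_methylation_result> .")
        lines.append(f'{result} <{SCHEMA}chromosome> "{chromosome}" .')
        lines.append(f'{result} <{SCHEMA}position> "{position}" .')
        lines.append(f'{result} <{SCHEMA}beta_value> "{beta}" .')
    return lines


if __name__ == "__main__":
    for line in rdfize("TCGA-A2-A0CX", [("16", "28890100", "0.439271303584937")]):
        print(line)
```

Each refined row becomes five triples, plus one barcode triple per patient.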
  • 10. Linked TCGA Data Workflow
  • 11. Linked TCGA Tumor Statistics
      Tumor type                          Original (GB)  Refined (GB)  RDFized (GB)  Triples (million)
      Cervical (CESC)                     8.75           2.44          8.86          400.19
      Rectal adenocarcinoma (READ)        8.07           2.25          9.04          413.31
      Papillary Kidney (KIRP)             10.40          2.90          10.4          469.65
      Bladder cancer (BLCA)               12.16          3.39          12.3          556.38
      Acute Myeloid Leukemia (LAML)       14.85          4.14          15.1          684.05
      Lower Grade Glioma (LGG)            17.08          4.76          17.1          778.82
      Prostate adenocarcinoma (PRAD)      18.05          5.03          18.1          821.01
      Lung squamous carcinoma (LUSC)      20.63          5.75          20.5          927.08
      Cutaneous melanoma (SKCM)           23.22          6.47          23.2          1050.94
      Head and neck squamous cell (HNSC)  27.6           7.69          27.5          1245.37
    • A total of roughly 7.35 billion triples for these 10 small tumor datasets
    • The complete Linked TCGA exceeds 30 billion triples, making it the largest dataset in the Linked Open Data (LOD) cloud
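The figures in this table can be cross-checked with a few lines of Python. The sketch below copies the ten rows (keeping only the tumor abbreviations) and computes the total triple count, the refined-to-original size ratio and the number of triples per RDFized gigabyte; it derives nothing beyond the numbers on the slide.

```python
# (tumor, original GB, refined GB, RDFized GB, triples in millions), copied from slide 11
STATS = [
    ("CESC", 8.75, 2.44, 8.86, 400.19),
    ("READ", 8.07, 2.25, 9.04, 413.31),
    ("KIRP", 10.40, 2.90, 10.4, 469.65),
    ("BLCA", 12.16, 3.39, 12.3, 556.38),
    ("LAML", 14.85, 4.14, 15.1, 684.05),
    ("LGG", 17.08, 4.76, 17.1, 778.82),
    ("PRAD", 18.05, 5.03, 18.1, 821.01),
    ("LUSC", 20.63, 5.75, 20.5, 927.08),
    ("SKCM", 23.22, 6.47, 23.2, 1050.94),
    ("HNSC", 27.6, 7.69, 27.5, 1245.37),
]

# Sum of the "Triples (million)" column.
total_triples_m = sum(row[4] for row in STATS)

if __name__ == "__main__":
    print(f"total: {total_triples_m / 1000:.2f} billion triples")
    for tumor, original, refined, rdfized, triples in STATS:
        print(f"{tumor}: refined/original = {refined / original:.2f}, "
              f"triples per RDFized GB = {triples / rdfized:.1f} million")
```

On these rows the total comes to about 7.35 billion triples, the refined files are consistently about 28% of the original size, and each RDFized gigabyte holds roughly 45 million triples.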
  • 12. Linking to Linked Open Data
      Source           Target      Class       #Links
      DNA27            HGNC        Gene        23,181
      DNA27            Homologene  Gene        27,654
      DNA27            HGNC        Gene        15,171
      DNA450           Homologene  Gene        489,643
      DNA450           OMIM        Gene        212,284
      DNA27            HGNC        Chromosome  108,662
      DNA27            OMIM        Chromosome  16,039,535
      Methylation      HGNC        Chromosome  97,530
      Methylation      OMIM        Chromosome  14,407,269
      Gene Expression  HGNC        Chromosome  86,052
      Gene Expression  OMIM        Chromosome  12,535,829
    • Links are generated using LIMES: http://aksw.org/Projects/LIMES.html
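LIMES is AKSW's link discovery framework: given a source and a target dataset, it compares resources with a configurable similarity measure and keeps the pairs that pass an acceptance threshold. The sketch below does not use LIMES; it only illustrates the simplest possible case of the Gene links in this table, an exact, case-insensitive match between gene symbols. The namespaces, identifiers, function name and the owl:sameAs link predicate are all assumptions made for illustration.

```python
# Hypothetical namespaces, for illustration only.
TCGA_GENE = "http://example.org/tcga/gene/"
HGNC = "http://example.org/hgnc/"
# Assumed link predicate; the slide does not say which one Linked TCGA uses.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def link_by_symbol(tcga_symbols, hgnc_index):
    """Emit one N-Triples link per TCGA gene symbol that exactly matches an HGNC symbol.

    hgnc_index maps upper-case approved symbols to (hypothetical) HGNC identifiers.
    """
    links = []
    for symbol in sorted(set(tcga_symbols)):  # de-duplicate and keep output order stable
        hgnc_id = hgnc_index.get(symbol.upper())
        if hgnc_id is not None:
            links.append(f"<{TCGA_GENE}{symbol}> <{OWL_SAME_AS}> <{HGNC}{hgnc_id}> .")
    return links


if __name__ == "__main__":
    for link in link_by_symbol(["ATP2A1", "SLMAP", "ATP2A1"], {"ATP2A1": "ID1", "SLMAP": "ID2"}):
        print(link)
```

A real link specification would typically replace the exact match with a string-similarity measure and a threshold, which is where a framework such as LIMES pays off on millions of candidate pairs.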
  • 13. Cancer Treatment using Linked TCGA
  • 14. Linked TCGA Use Cases
    1. Targeted cancer treatment: whether a specific drug can be used to treat a tumor, based on the genomic data of patients with the same tumor type
    2. Mechanism-based treatment: whether a combination of drugs can be applied to treat a specific tumor, based on data from similar patients
    3. Survival outcome: using a mathematical model to predict outcomes such as survival for a new patient
  • 15. Use Cases 1 and 2: SPARQL Query
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX tcga: <http://tcga.deri.ie/schema/>
      SELECT ?patient ?mean
      WHERE {
        ?uri tcga:tumour_type "BRCA" .
        ?uri tcga:bcr_patient_barcode ?patient .
        ?patient rdf:type tcga:expression_gene_results .
        ?patient tcga:gene_symbol "HER2", "ER" .
        ?patient tcga:scaled_estimate ?mean .
      }
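Queries like this one are sent to a SPARQL endpoint over HTTP. A minimal client sketch in Python using only the standard library, assuming the endpoint follows the SPARQL 1.1 Protocol (query passed in the `query` parameter of a GET request) and can return the standard SPARQL 1.1 JSON results format. The function names are hypothetical, and the endpoint URL is left to the caller because the slide does not give one.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (assumes endpoint has no query string)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results_json):
    """Flatten a SPARQL 1.1 JSON result document into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Variables left unbound in a solution are simply absent from its binding.
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


def run_query(endpoint, query, timeout=60):
    """Send the query over HTTP GET and return the parsed result rows."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

A call would look like `run_query(endpoint_url, query_text)`, where `endpoint_url` is the SPARQL endpoint of a Linked TCGA deployment and `query_text` is the query on this slide including its PREFIX lines; each returned row is a dict such as `{"patient": ..., "mean": ...}`.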
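  • 16. Use Cases 1 and 2: Querying LOD (DrugBank)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX tcga: <http://tcga.deri.ie/schema/>
      # Assumed DrugBank namespace (FU Berlin Linked Data mirror of DrugBank)
      PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
      SELECT ?drugname
      WHERE {
        ?patient rdf:type tcga:expression_gene_results .
        ?patient tcga:gene_symbol ?targetname .
        ?patient tcga:scaled_estimate ?mean .
        # Threshold is a placeholder for an expression cut-off (not given on the slide);
        # the cast works whether values are stored as plain strings (as on slide 9) or as numbers
        FILTER (xsd:double(?mean) > Threshold)
        ?drug drugbank:target ?target .
        ?drug drugbank:genericName ?drugname .
        ?target drugbank:synonym ?targetname .
        FILTER REGEX(?targetname, "HER2|estrogen ?receptor|ERBB2", "i")
      }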
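The REGEX filter in this query is meant to keep only DrugBank target names mentioning HER2, the estrogen receptor or ERBB2. Written with doubled bars, as `HER2||estrogenreceptor||ERBB2`, the pattern contains empty alternatives; an empty alternative matches the empty string, so the filter accepts every target name. SPARQL's REGEX uses XPath regular expressions, which, like Python's `re`, treat `|` as alternation and search anywhere in the string, so the effect can be checked in Python. The single-bar pattern with an optional space (`estrogen ?receptor`) is an assumption about how receptor synonyms are spelled.

```python
import re

# Doubled '|' creates empty alternatives, and an empty alternative matches "".
BUGGY = "HER2||estrogenreceptor||ERBB2"
# Single '|' alternation; the optional space is an assumption about synonym spelling.
FIXED = "HER2|estrogen ?receptor|ERBB2"


def matches(pattern, text):
    """Unanchored, case-insensitive search, like SPARQL REGEX(text, pattern, "i")."""
    return re.search(pattern, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    for name in ["ERBB2", "Estrogen receptor", "Insulin receptor"]:
        print(f"{name}: buggy={matches(BUGGY, name)} fixed={matches(FIXED, name)}")
```

The doubled-bar pattern reports a match for "Insulin receptor" through an empty alternative, while the single-bar pattern does not.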
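  • 17. Use Case 3: Query
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX tcga: <http://tcga.deri.ie/schema/>
      # Stage and age are projected too, as covariates for a survival model
      SELECT ?patient ?tumour_stage ?age ?mean
      WHERE {
        ?uri tcga:tumour_type "BRCA" .
        ?uri tcga:bcr_patient_barcode ?patient .
        ?patient rdf:type tcga:clinical .
        ?patient tcga:tumour_stage ?tumour_stage .
        ?patient tcga:age_at_initial_patalogical_diagnosis ?age .
        ?patient tcga:relevant_biomarker "BRCA1", "CDKN2A", "CDH1" .
        ?patient tcga:beta_value ?mean .
      }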
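Although the variable is named ?mean, this query returns one beta value per matching triple rather than an average, so a patient with several beta values appears in several rows. If a per-patient mean is what the survival model needs, it can be computed in SPARQL with AVG and GROUP BY, or on the client. A minimal client-side sketch in Python, assuming result rows arrive as (patient, beta_value) string pairs and that missing values are written "NA" as in the raw methylation file; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_beta_per_patient(rows):
    """Average beta values per patient.

    rows: iterable of (patient, beta_value) string pairs, e.g. the ?patient and
    ?mean columns of the query results; "NA" values are skipped.
    """
    values = defaultdict(list)
    for patient, beta in rows:
        if beta != "NA":
            values[patient].append(float(beta))
    # Patients whose values were all "NA" never get an entry.
    return {patient: mean(betas) for patient, betas in values.items()}
```

Averaging on the client keeps the query unchanged; pushing AVG into the query instead would reduce the number of rows sent over the network.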
  • 19. Everything is Public
    • TopFed: https://code.google.com/p/topfed/
    • Linked TCGA: http://tcga.deri.ie/
    Contact: saleem@informatik.uni-leipzig.de, AKSW, University of Leipzig, Germany