1. Linked Cancer Genome Atlas Database
Muhammad Saleem, Shanmukha Sampath Padmanabhuni, Axel-Cyrille Ngonga Ngomo,
Jonas S. Almeida, Stefan Decker, Helena F. Deus.
Linked Data Cup, I-Semantics 2013, September 04 - 06 2013, Graz, Austria
2. Agenda
• Cancer Genome Atlas (TCGA) introduction
• Problem statement
• Linked TCGA: a scalable solution
• Cancer treatment using Linked TCGA
• Demo of the use cases
• Conclusion
3. TCGA Introduction
• A publicly accessible atlas of cancer-related data from the National Cancer Institute (NCI)
– 9000 patients
– 33 cancer types
– 147,645 raw data files
– A total of 12.7 terabytes of data
• This is only 46% of the total expected data, with new data being submitted every day
• The goal is to enable cancer researchers to make and validate important discoveries
4. Problem Statement
• Data in TCGA is organized as text archives with no remote querying interface
– Download very large archives and wait in queues
– Parse the relevant text
– Collect the critical covariates necessary for analysis
• The various types of experimental results are not connected biologically
• TCGA data should be made publicly available for remote querying and virtual integration
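The manual workflow listed on this slide (download, parse, collect) can be sketched in Python as follows. This is only a minimal illustration, not TCGA tooling: the tab-delimited layout and the values in SAMPLE_FILE are invented for the example, and the column names gene_symbol and scaled_estimate are borrowed from the properties used in the SPARQL query later in these slides.

```python
import csv
import io

# Hypothetical excerpt of one downloaded, tab-delimited result file.
# Real TCGA archives differ in layout and are far larger.
SAMPLE_FILE = (
    "gene_symbol\tscaled_estimate\n"
    "HER2\t0.00012\n"
    "ER\t0.00034\n"
    "TP53\t0.00008\n"
)


def collect_estimates(text, genes):
    """Parse the text and keep only the rows for the requested genes."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    wanted = set(genes)
    return {
        row["gene_symbol"]: float(row["scaled_estimate"])
        for row in reader
        if row["gene_symbol"] in wanted
    }


estimates = collect_estimates(SAMPLE_FILE, ["HER2", "ER"])
```

Every analysis has to repeat this download-and-parse step for each archive it needs, which is the overhead a remote querying interface would remove.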
14. Linked TCGA Use Cases
1. Targeted cancer treatment
– Determine whether a specific drug can be used to treat a tumour, using the genomic data of patients with the same tumour
2. Mechanism-based treatment
– Determine whether a combination of drugs can be applied to treat a specific tumour, using similar patients' data
3. Survival outcome
– Use a mathematical model to predict future signs, such as survival outcome, for a new patient
15. Use cases 1 and 2: SPARQL query
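PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# The tcga: prefix must also be declared; the slide does not give its namespace IRI.
SELECT ?patient ?mean
WHERE
{
  ?uri tcga:tumour_type "BRCA" .
  ?uri tcga:bcr_patient_barcode ?patient .
  ?patient rdf:type tcga:expression_gene_results .
  # Match either gene; the object list "HER2","ER" would require both values on one result.
  ?patient tcga:gene_symbol ?gene .
  FILTER (?gene IN ("HER2", "ER"))
  ?patient tcga:scaled_estimate ?mean
}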
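As a rough sketch of how such a query could be sent to a remote endpoint, the Python below builds the query text for a tumour type and a list of gene symbols, then encodes it as a GET request with the standard `query` parameter of the SPARQL Protocol. Several things here are assumptions rather than facts from the slides: ENDPOINT and TCGA_NS are placeholder values, build_query and build_request_url are hypothetical helper names, and matching either gene with FILTER ... IN is one reading of the intended meaning. No request is actually sent.

```python
from urllib.parse import urlencode

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholders: the slides give neither the tcga: namespace IRI nor the endpoint path.
TCGA_NS = "http://example.org/tcga/schema/"
ENDPOINT = "http://example.org/sparql"


def build_query(tcga_ns, tumour_type, genes):
    """Build the expression query for one tumour type and several gene symbols."""
    # Inputs are assumed to be trusted; no escaping of quotes is attempted.
    gene_list = ", ".join('"{}"'.format(g) for g in genes)
    return (
        "PREFIX rdf: <{rdf}>\n"
        "PREFIX tcga: <{tcga}>\n"
        "SELECT ?patient ?mean WHERE {{\n"
        '  ?uri tcga:tumour_type "{tumour}" .\n'
        "  ?uri tcga:bcr_patient_barcode ?patient .\n"
        "  ?patient rdf:type tcga:expression_gene_results .\n"
        "  ?patient tcga:gene_symbol ?gene .\n"
        "  FILTER (?gene IN ({genes}))\n"
        "  ?patient tcga:scaled_estimate ?mean\n"
        "}}"
    ).format(rdf=RDF_NS, tcga=tcga_ns, tumour=tumour_type, genes=gene_list)


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(TCGA_NS, "BRCA", ["HER2", "ER"])
url = build_request_url(ENDPOINT, query)
```

Changing the tumour type or the gene list only changes the arguments to build_query, so the same request shape would serve both use case 1 and use case 2.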
19. Everything is Public
• TopFed: https://code.google.com/p/topfed/
• Linked TCGA: http://tcga.deri.ie/
saleem@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany