Bespoke data integration using open source & semantic technologies

Nadia Anwar, Martijn van Iersel
  
What do we do?
    Direct support of scientists in research
    Data Acquisition, Management and Stewardship
    Data Integration
    Answering specific and complex Bioinformatics questions
    Tool kit deployment and maintenance for use in-house
  
Goals
    Integrate data
     -    Typically, using linked data
    Answer biological questions
     -    Through useful visualisations
    Reproducible
     -    Everything is scripted, version controlled and tracked
    Flexible
     -    Using Multiple Specialized Tools

Expert Bioinformatics from Bioinformatics Experts
  
You have data, I have data... now what?
    * Adapted from http://xkcd.com/208/ (CC-BY-NC)
  
http://fly.cloud.generalbioinformatics.com
    FlyAtlas, FlyCyc
    1) Transform into triples
         Fly Expression Data
         Fly Pathway Data
    2) Infer some more triples
    3) Visualize triples
         Pathways
         Networks
  
First, you need some triples
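
As a rough illustration of the "transform into triples" step, here is a minimal sketch in plain Python. It is not the pipeline used for the FlyAtlas data: the example.org namespace, the column names and the sample values are all made up for the example, and a real conversion would more likely use an RDF library.

```python
import csv
import io

# Hypothetical namespace, for illustration only (not the real FlyAtlas URIs).
FLYATLAS = "http://example.org/flyatlas/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Made-up sample rows standing in for a tab-separated expression table.
SAMPLE_TSV = (
    "probe\ttissue\tsignal\n"
    "1234_at\tbrain\t52.1\n"
    "1234_at\thead\t40.7\n"
)


def rows_to_triples(tsv_text):
    """Turn each table row into (subject, predicate, object) triples."""
    triples = set()  # a set, so repeated statements are stored once
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for i, row in enumerate(reader):
        probe = FLYATLAS + row["probe"]
        measurement = f"{FLYATLAS}measurement/{i}"
        triples.add((probe, RDF_TYPE, FLYATLAS + "ProbeData"))
        triples.add((measurement, FLYATLAS + "probe", probe))
        triples.add((measurement, FLYATLAS + "tissue", row["tissue"]))
        triples.add((measurement, FLYATLAS + "signal", float(row["signal"])))
    return triples


triples = rows_to_triples(SAMPLE_TSV)
for s, p, o in sorted(triples, key=str):
    print(s, p, o)
```

Each row becomes one measurement node that points at its probe, so the same probe can carry many measurements without losing which tissue each signal belongs to.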
  
Then, you need some magic
    Transitive Properties
     -    A-B-C → A-C
    Class Subsumption
     -    FlyAtlas to BioPAX
    Node Integration
     -    identifiers.org URIs

    Example of class subsumption:
         flyatlas:ProbeData subclass BP:DNARegion
         flyatlas:1234_at is a flyatlas:ProbeData,
         so flyatlas:1234_at is a BP:DNARegion
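
The two inference rules on this slide can be sketched as a small forward-chaining loop. This is only an illustration in plain Python, assuming triples are simple tuples; in practice a triple store or reasoner would do this work. The class ex:GenomicFeature is a made-up superclass added to show the transitive step; it is not taken from BioPAX.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples, transitive=(SUBCLASS,)):
    """Apply both rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p, b) in inferred:
            for (c, q, d) in inferred:
                if b != c:
                    continue
                # Transitive property: A-B and B-C give A-C.
                if p == q and p in transitive:
                    new.add((a, p, d))
                # Class subsumption: x is a B, B subclass of D, so x is a D.
                if p == TYPE and q == SUBCLASS:
                    new.add((a, TYPE, d))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("flyatlas:1234_at", TYPE, "flyatlas:ProbeData"),
    ("flyatlas:ProbeData", SUBCLASS, "BP:DNARegion"),
    # Hypothetical superclass, only here to exercise the transitive rule.
    ("BP:DNARegion", SUBCLASS, "ex:GenomicFeature"),
}

closure = infer(facts)
for triple in sorted(closure - facts):
    print(triple)
```

The loop stops when a pass adds nothing new, which is a simple way to reach the full closure on small graphs; it is quadratic per pass and not meant for large datasets.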
  
Next, the pretty bit: Visualisation
    PathVisio
    Cytoscape
    And other views ...
  
Finally...
                          	
  




  
<Position about:Flexibility/>
    Flexible data integration
      -    Use Linked Data
      -    Use Identifiers.org
      -    Don’t be afraid to extend ontologies; that’s the point
      -    Be reasonable about what you integrate; you can always add more later...
    Use the tools that best answer the biological questions
    Script everything; you will probably have to redo it!
    It needs to be flexible: this is research, and it is not about building the best enterprise-style “one size fits all” solution.
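
To illustrate why shared identifiers.org URIs help with integration, here is a minimal sketch in plain Python. It assumes the http://identifiers.org/<namespace>/<id> URI pattern and a "flybase" namespace; both should be checked against the identifiers.org registry before use. The records, the ex: predicates and the pathway name are made up for the example.

```python
from collections import defaultdict


def identifiers_org_uri(namespace, local_id):
    """Build a URI on the assumed http://identifiers.org/<namespace>/<id> pattern."""
    return f"http://identifiers.org/{namespace}/{local_id}"


# Made-up records from two sources that key genes by the same local identifier.
expression = [{"gene": "FBgn0000490", "tissue": "brain"}]
pathway = [{"gene": "FBgn0000490", "pathway": "example-pathway"}]


def merge(expression_rows, pathway_rows):
    """Attach edges from both sources to one node per shared URI."""
    graph = defaultdict(set)
    for rec in expression_rows:
        node = identifiers_org_uri("flybase", rec["gene"])
        graph[node].add(("ex:expressedIn", rec["tissue"]))
    for rec in pathway_rows:
        node = identifiers_org_uri("flybase", rec["gene"])
        graph[node].add(("ex:partOf", rec["pathway"]))
    return dict(graph)


graph = merge(expression, pathway)
for node, edges in graph.items():
    print(node, sorted(edges))
```

Because both sources mint the same URI for the gene, the expression and pathway statements land on a single node without any explicit mapping table.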
  
We are hiring!
    www.generalbioinformatics.com
    Nadia Anwar, Martijn van Iersel
  

SWAT4LS 2012
