
(ATS6-PLAT03) What's behind Discngine collections

  1. ATS6-PLAT03: What's behind Discngine collections. Accelrys Tech Summit 2013. Eric Le Roux, Vincent Le Guilloux | May 2013
  2. Agenda: Discngine Tibco Spotfire Connector ► How it works ► Integration challenges. Graph collection ► Quick introduction to graphs ► Implementation approaches (in-memory and graph databases) ► Quick demo / use case
  3. Discngine: scientific computing consulting services and solutions for pharmaceutical research. Customers: Sanofi, L’Oréal, IPSEN, Novartis, Roche, Pierre Fabre, CEREP, P&G, Servier, Cephalon, Tibotec-Virco, Galapagos, Biofocus… Founded in 2004, based in Paris, France, 17 consultants. Come visit our booth for more information & demos.
  4. Tibco Spotfire Pipeline Pilot Connector: How does it work?
  5. Tibco Spotfire Pipeline Pilot Connector: Demo
  6. Tibco Spotfire Pipeline Pilot Connector: Architecture (diagram: Pipeline Pilot Server, Tibco Spotfire Server, Discngine TS Connector Collection, Discngine Web Panel, Client Management, Template storage)
  7. Tibco Spotfire Pipeline Pilot Connector: Architecture (same diagram, highlighting the JavaScript/C# wrapper)
  8. Tibco Spotfire Pipeline Pilot Connector: Architecture (same diagram, highlighting the custom components based on the Reporting collection)
  9. Tibco Spotfire Pipeline Pilot Connector: Architecture (same diagram, adding Oracle Application Express and other web servers)
  10. Tibco Spotfire Pipeline Pilot Connector: Execution flow (basic protocol): 1. The Pipeline Pilot protocol runs. 2. The Pipeline Pilot protocol generates an HTML page. 3. The HTML page is rendered in an Internet Explorer .NET control inside the Discngine Web Panel. 4. A JavaScript instruction is executed. 5. A Spotfire C# API function is called. 6. Rendering of the HTML page ends.
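The six-step execution flow can be mimicked in a few lines to make the round trip concrete. This is a hypothetical sketch, not the connector's code: a "protocol" function renders an HTML page containing a script call, and a stand-in for the Web Panel extracts each call and dispatches it to a registered Python function playing the role of the Spotfire C# API. The `spotfire.call(...)` syntax, the `SetMarking` name and the registry are all invented for illustration.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the Spotfire C# API that the Web Panel exposes to page scripts.
API: Dict[str, Callable[..., Any]] = {}
calls_made: List[str] = []


def api_function(name: str) -> Callable:
    """Register a Python function under the name page scripts use to call it."""
    def register(fn: Callable) -> Callable:
        API[name] = fn
        return fn
    return register


@api_function("SetMarking")
def set_marking(table: str, ids: List[int]) -> str:
    calls_made.append(f"{table}:{ids}")
    return f"marked {len(ids)} rows in {table}"


def run_protocol(table: str, ids: List[int]) -> str:
    """Steps 1-2: the 'protocol' runs and generates an HTML page carrying a script call."""
    args = json.dumps([table, ids])
    return f"<html><body><script>spotfire.call('SetMarking', {args});</script></body></html>"


SCRIPT_CALL = re.compile(r"spotfire\.call\('(\w+)', (\[.*?\])\);")


def render(page: str) -> List[Any]:
    """Steps 3-6: 'render' the page, dispatching every script call to the registered API."""
    return [API[name](*json.loads(raw)) for name, raw in SCRIPT_CALL.findall(page)]


print(render(run_protocol("Compounds", [3, 7, 9])))  # ['marked 3 rows in Compounds']
```

Keeping the page as the only channel between server and client is what lets a plain Pipeline Pilot protocol drive Spotfire without any client-side plugin logic of its own.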
  11. Tibco Spotfire Pipeline Pilot Connector: Demo: building protocols
  12. Tibco Spotfire Pipeline Pilot Connector: Integration challenges
  13. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► API encapsulation: 9000+ methods & properties encapsulated in 28 components
  14. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► API encapsulation. Example: "Event Listener", a single component to • listen to marking events • create a hidden form • capture the identifiers of marked records • submit the marked records to a PP protocol
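A minimal sketch of how one component can bundle the four Event Listener steps, assuming a simple callback model. `MarkableTable` is a hypothetical stand-in for a Spotfire data table, the protocol URL is invented, and the form is only built and encoded here, never actually submitted.

```python
from html import escape
from typing import Callable, List
from urllib.parse import urlencode


class MarkableTable:
    """Hypothetical stand-in for a Spotfire data table that raises marking events."""

    def __init__(self, ids: List[str]):
        self.ids = ids
        self._listeners: List[Callable[[List[str]], None]] = []

    def on_marking_changed(self, callback: Callable[[List[str]], None]) -> None:
        self._listeners.append(callback)

    def mark(self, rows: List[int]) -> None:
        marked = [self.ids[i] for i in rows]
        for callback in self._listeners:
            callback(marked)


class EventListener:
    """One component bundling the four steps listed on the slide."""

    def __init__(self, table: MarkableTable, protocol_url: str, field: str = "ids"):
        self.protocol_url = protocol_url
        self.field = field
        self.last_form = ""
        self.submitted: List[str] = []
        table.on_marking_changed(self._on_marking)  # 1. listen to marking events

    def _hidden_form(self, ids: List[str]) -> str:
        action = escape(self.protocol_url, quote=True)
        value = escape(",".join(ids), quote=True)
        return (f'<form method="post" action="{action}" style="display:none">'
                f'<input type="hidden" name="{self.field}" value="{value}"></form>')

    def _on_marking(self, ids: List[str]) -> None:
        self.last_form = self._hidden_form(ids)      # 2. create a hidden form
        # 3. the marked record identifiers arrive with the event and fill the form field
        body = urlencode({self.field: ",".join(ids)})
        self.submitted.append(body)                  # 4. body the form would POST to the protocol


table = MarkableTable(["CPD-1", "CPD-2", "CPD-3"])
listener = EventListener(table, "http://pp-server.example/protocols/enrich")  # hypothetical URL
table.mark([0, 2])
print(listener.submitted)  # ['ids=CPD-1%2CCPD-3']
```

The point of the encapsulation is that the protocol author configures one component (table, URL, field name) instead of wiring four API calls by hand.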
  15. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Component parameter mapping & wording: Do you speak Pipelinish?
  16. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Component parameter mapping & wording: No, I speak Spotfirish!
  17. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Component parameter mapping & wording: How can advanced color gradients be captured with component parameters? Workaround: Spotfire templates
  18. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Client & server dataset synchronization: data consistency. End users can modify the data context on the client side: • computing new columns • adding & removing rows • dropping & creating data tables • initializing data sets on the client (new .dxp file)
  19. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Client & server dataset synchronization • Option 1: HTTP POST (diagram: Pipeline Pilot Server, Discngine Web Panel; data tables in .stdf format; implemented in v1.2)
  20. Tibco Spotfire Pipeline Pilot Connector: Integration challenges ► Client & server dataset synchronization • Option 2: File copy (diagram: Pipeline Pilot Server, Discngine Web Panel, filesystem, .stdf reader component; data tables in .stdf format; implemented in v1.2)
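The two synchronization options differ only in the transport. The sketch below, under loudly stated assumptions, serializes a client-side table and either builds an HTTP POST request (option 1) or writes a file into a shared folder (option 2). Tab-delimited text is used as a simplified stand-in for .stdf (the real .stdf syntax is not reproduced here), and the URL, folder and function names are hypothetical.

```python
import csv
import io
import tempfile
from pathlib import Path
from typing import Dict, List
from urllib.request import Request

Table = List[Dict[str, object]]


def serialize(table: Table) -> str:
    """Tab-delimited text: a simplified stand-in for .stdf, not the .stdf syntax itself."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(table[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buf.getvalue()


def sync_by_http_post(table: Table, url: str) -> Request:
    """Option 1: put the client-side table in the body of an HTTP POST (built here, not sent)."""
    return Request(url, data=serialize(table).encode("utf-8"),
                   headers={"Content-Type": "text/tab-separated-values"}, method="POST")


def sync_by_file_copy(table: Table, shared_dir: Path, name: str) -> Path:
    """Option 2: write the table to a shared folder; the server only needs the path
    and reads the file itself (the job of the .stdf reader component)."""
    path = shared_dir / f"{name}.txt"
    path.write_text(serialize(table), encoding="utf-8")
    return path


rows = [{"ID": "CPD-1", "pIC50": 6.8}, {"ID": "CPD-2", "pIC50": 5.1}]
request = sync_by_http_post(rows, "http://pp-server.example/upload")  # hypothetical URL
with tempfile.TemporaryDirectory() as shared:
    copied = sync_by_file_copy(rows, Path(shared), "compounds")
    copied_lines = copied.read_text(encoding="utf-8").splitlines()
```

The file-copy variant avoids pushing large tables through HTTP, at the cost of requiring a filesystem that both the client and the Pipeline Pilot Server can reach.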
  21. Web Mashups
  22. The Graph collection
  23. Agenda (quick): Introduction to graphs • Graphs in Pipeline Pilot • Demo • Graph databases in Pipeline Pilot • Demo
  24. What is a graph? A graph is a data structure representing objects (nodes) that are connected to each other by links (edges, or relationships).
  25. What is a graph? A graph is a data structure representing objects (nodes) that are connected to each other by links (edges, or relationships). (Diagram: node, undirected edge, directed edge)
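The node / undirected edge / directed edge distinction fits in a small adjacency-set structure. A minimal sketch, not the Graph Collection's data structure: an undirected edge is simply stored in both directions, a directed edge in one.

```python
from typing import Dict, Set


class Graph:
    """Adjacency-set graph holding both undirected and directed edges."""

    def __init__(self) -> None:
        self.out: Dict[str, Set[str]] = {}

    def add_node(self, node: str) -> None:
        self.out.setdefault(node, set())

    def add_edge(self, a: str, b: str, directed: bool = False) -> None:
        self.add_node(a)
        self.add_node(b)
        self.out[a].add(b)          # a -> b
        if not directed:
            self.out[b].add(a)      # an undirected edge is traversable both ways

    def neighbours(self, node: str) -> Set[str]:
        return set(self.out.get(node, set()))


g = Graph()
g.add_edge("A", "B")                 # undirected edge A - B
g.add_edge("B", "C", directed=True)  # directed edge B -> C
```

Here `g.neighbours("B")` contains both A and C, while `g.neighbours("C")` is empty because the B to C edge is directed.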
  26. Property Graph Data Model
  27. Property Graph Data Model (diagram nodes: Protein A, Protein B, Molecule 1, Molecule 2)
  28. Property Graph Data Model (adds the relationships interact, similar, inhibits and shareFragment)
  29. Property Graph Data Model (adds the properties LogP = 1.1 and pIC50 = 6.8)
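In a property graph, nodes and labeled edges both carry key/value properties. A minimal sketch rebuilding the slide's example; which nodes each relationship connects, and which element holds LogP and pIC50, are assumptions chosen for illustration since the diagram itself is not recoverable.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_id: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    label: str
    target: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, **props: Any) -> Node:
        self.nodes[node_id] = Node(node_id, props)
        return self.nodes[node_id]

    def add_edge(self, source: str, label: str, target: str, **props: Any) -> Edge:
        edge = Edge(source, label, target, props)
        self.edges.append(edge)
        return edge

    def out_edges(self, node_id: str, label: str) -> List[Edge]:
        return [e for e in self.edges if e.source == node_id and e.label == label]


g = PropertyGraph()
g.add_node("Protein A", type="protein")
g.add_node("Protein B", type="protein")
g.add_node("Molecule 1", type="molecule", LogP=1.1)  # assumed: LogP is a node property
g.add_node("Molecule 2", type="molecule")
g.add_edge("Protein A", "interact", "Protein B")
g.add_edge("Molecule 1", "similar", "Molecule 2")
g.add_edge("Molecule 1", "shareFragment", "Molecule 2")
g.add_edge("Molecule 1", "inhibits", "Protein A", pIC50=6.8)  # assumed: pIC50 is an edge property
```

Putting the measured activity on the `inhibits` edge rather than on either node reflects that pIC50 describes a molecule/protein pair, not one object alone.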
  30. Graphs: when and why? Graph problems ► You need graphs if your problem requires algorithms from graph theory: • Shortest path (GPS systems) • Motif search (substructure search in molecules) • Importance measures (Google’s PageRank)
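Shortest path is the textbook example of a graph problem. A minimal sketch of Dijkstra's algorithm on a small made-up weighted graph; it uses Python's `heapq` (a binary heap) rather than the Fibonacci heap mentioned later in the deck, and it requires non-negative edge weights.

```python
import heapq
from typing import Dict, List, Optional, Tuple

WeightedGraph = Dict[str, Dict[str, float]]


def shortest_path(graph: WeightedGraph, start: str, goal: str) -> Tuple[float, Optional[List[str]]]:
    """Dijkstra's algorithm with a binary heap; edge weights must be non-negative."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return float("inf"), None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


stations = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}, "D": {}}
print(shortest_path(stations, "A", "D"))  # (4.0, ['A', 'B', 'C', 'D'])
```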
  31. Graphs: when and why? Visualization ► You may want to use graphs as an intuitive way to represent objects and their relationships: • Subway maps • Metabolic pathways • Protein-protein interaction networks • Molecule depiction
  32. Graphs: when and why? Data modeling (NoSQL / Big Data hype) ► You can use graphs as a flexible data model when your data consists of objects and the relationships between them: • Google’s Knowledge Graph • Facebook Graph Search
  33. Discngine Graph Collection: manage graphs as Pipeline Pilot data records ► Creation and manipulation ► Algorithms ► Persistence / IO ► Visualization ► Traversals (the “SQL” of graphs)
  34. The big question: how can we represent graphs in the data flow? ► A graph is not flat ► A graph has different types of data ► Advanced data structures are required to operate efficiently on graphs
  35. The big question: how can we represent graphs in the data flow? Pipeline Pilot data model. Pros: native; user and developer friendly. Cons: no objects, methods, etc.; no Fibonacci heaps, FIFO / LIFO queues, etc.; the record hierarchy is a tree.
  36. The big question: how can we represent graphs in the data flow? Java / Perl API. Pros: advanced programming framework; exposes most of the functions required to deal with data records. Cons: performance overhead induced by interfacing C++ with Java / Perl. (Shown next to the Pipeline Pilot data model comparison.)
  37. The answer: how can we represent graphs in the data flow? A mixed solution: ► Java for performance, advanced data structures and an object-oriented API ► Expose part of the data and processes via the data record tree and PilotScript
  38. PilotGraph Hierarchy
  39. PilotGraph Hierarchy
  40. PilotGraph Hierarchy: root node of a data record
  41. PilotGraph Hierarchy: group node containing node records
  42. PilotGraph Hierarchy: group node containing edge records
  43. PilotGraph Hierarchy: nodes containing properties
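A graph is not a tree, but it can still be exposed through one, as in the hierarchy above: a root with one group of node records and one group of edge records, where edges point at nodes by identifier instead of nesting them. A minimal sketch using nested Python dicts; the group names "Nodes" and "Edges" and the fields `id`, `source`, `label`, `target` are assumptions, not PilotGraph's actual record layout.

```python
from typing import Any, Dict, List, Tuple

NodeProps = Dict[str, Dict[str, Any]]
EdgeRecords = List[Dict[str, Any]]


def to_record_tree(nodes: NodeProps, edges: EdgeRecords) -> Dict[str, Any]:
    """Root -> group of node records + group of edge records; edges refer to node ids."""
    return {
        "Nodes": [{"id": node_id, **props} for node_id, props in nodes.items()],
        "Edges": [dict(edge) for edge in edges],
    }


def from_record_tree(tree: Dict[str, Any]) -> Tuple[NodeProps, EdgeRecords]:
    """Inverse mapping: rebuild the node property table and edge list from the tree."""
    nodes: NodeProps = {}
    for record in tree["Nodes"]:
        props = dict(record)
        nodes[props.pop("id")] = props
    return nodes, [dict(edge) for edge in tree["Edges"]]


nodes = {"Molecule 1": {"LogP": 1.1}, "Protein A": {}}
edges = [{"source": "Molecule 1", "label": "inhibits", "target": "Protein A", "pIC50": 6.8}]
tree = to_record_tree(nodes, edges)
```

Because the mapping is reversible, the tree can serve as the PilotScript-friendly view while a richer in-memory structure does the algorithmic work.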
  44. PilotGraph in Java
  45. PilotGraph in Java
  46. PilotGraph in Java: https://github.com/tinkerpop/blueprints/wiki
  47. Demo
  48. PilotGraph model: cons. Java consumes memory, and Java has limited allocated memory per job ► 384 MB on a 64-bit server (see apps/scitegic/core/xml/Objects/JavaEnvironment.xml). Serialization is fine for small to medium graphs, but the bigger the graph, the longer serialization takes.
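The serialization cost grows with graph size because every node, edge and property must be written out each time the graph record is passed along. A rough illustration, using Python's `pickle` as a stand-in for Java serialization on randomly generated graphs; absolute sizes and timings are machine dependent and say nothing about PilotGraph itself.

```python
import pickle
import random
import time


def random_graph(n_nodes: int, n_edges: int, seed: int = 0):
    """Random property graph: node id -> properties, plus (source, target, properties) edges."""
    rng = random.Random(seed)
    nodes = {i: {"name": f"node{i}", "score": rng.random()} for i in range(n_nodes)}
    edges = [(rng.randrange(n_nodes), rng.randrange(n_nodes), {"weight": rng.random()})
             for _ in range(n_edges)]
    return nodes, edges


def serialization_cost(n_nodes: int, n_edges: int):
    """Size in bytes and wall-clock seconds needed to pickle a random graph."""
    graph = random_graph(n_nodes, n_edges)
    start = time.perf_counter()
    blob = pickle.dumps(graph)
    return len(blob), time.perf_counter() - start


for n in (1_000, 10_000, 50_000):
    size, seconds = serialization_cost(n, 3 * n)
    print(f"{n:>6} nodes, {3 * n:>6} edges: {size / 1e6:5.1f} MB in {seconds:.3f} s")
```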
  49. Graph databases: graph databases are persistent engines dedicated to storing graph data structures. The graph database stack (not exhaustive): ► Neo4j ► OrientDB ► HypergraphDB ► Titan ► Dex ► InfiniteGraph ► AllegroGraph
  50. PilotGraph vs. DatabaseGraph: PilotGraph (record): ~300,000 elements (depends on the amount of memory allocated to Java)
  51. PilotGraph vs. DatabaseGraph: PilotGraph (record): ~300,000 elements (depends on the amount of memory allocated to Java). DatabaseGraph (connection): millions to billions of elements
  52. Graph database workflow
  53. Demos
  54. Take-home message: what is the best way to manage graphs within Pipeline Pilot? ► Take advantage of the PP Java API, which offers the best tradeoff between performance and flexibility ► Expose as much of the data as possible via the data record hierarchy and PilotScript ► Use a common API to manage in-memory graphs and persistent graph databases transparently
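The "common API" idea means algorithms are written once against an interface and run unchanged on an in-memory graph or on a persistent store. A minimal sketch under that assumption: `GraphAPI` and both backends are hypothetical, and SQLite stands in for a real graph database such as Neo4j.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, Iterable, Set


class GraphAPI(ABC):
    """Hypothetical common interface; algorithms only use these methods."""

    @abstractmethod
    def add_edge(self, a: str, b: str) -> None: ...

    @abstractmethod
    def neighbours(self, node: str) -> Iterable[str]: ...


class InMemoryGraph(GraphAPI):
    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}

    def add_edge(self, a: str, b: str) -> None:
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def neighbours(self, node: str) -> Iterable[str]:
        return self.adj.get(node, set())


class SQLiteGraph(GraphAPI):
    """Persistent backend: undirected edges stored in both directions in one table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS edge (a TEXT, b TEXT)")

    def add_edge(self, a: str, b: str) -> None:
        with self.db:  # commits the transaction on success, rolls back on error
            self.db.executemany("INSERT INTO edge VALUES (?, ?)", [(a, b), (b, a)])

    def neighbours(self, node: str) -> Iterable[str]:
        return [row[0] for row in self.db.execute("SELECT b FROM edge WHERE a = ?", (node,))]


def hops_from(graph: GraphAPI, start: str) -> Dict[str, int]:
    """Breadth-first search written once against GraphAPI."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.neighbours(node):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


results = []
for backend in (InMemoryGraph(), SQLiteGraph()):
    for a, b in [("A", "B"), ("B", "C"), ("C", "D")]:
        backend.add_edge(a, b)
    results.append(hops_from(backend, "A"))
```

Both entries of `results` are identical, which is exactly the transparency the take-home message asks for; only the storage and its memory limits change.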
  55. Thank you for your attention. Traversals, visualization, reporting integration, algorithms, roadmap… Welcome to our booth!
  56. www.discngine.com Thanks!
  57. Graph Collection v2.0
      BASIC MANIPULATIONS ► Add / remove elements (from cache, from records) ► PilotScript facilities (remove elements with PilotScript, set property values) ► Add / remove / keep properties ► Join graph records ► Intersect graph records ► Extract edges and nodes ► Key-value property search ► Traversal framework
      GRAPH ALGORITHMS ► Shortest path (weighted / unweighted) ► Minimum spanning tree ► Cliques ► Disconnected subgraphs ► Articulators ► Subgraph matching
      IMPORTANCE MEASURES ► Degree centrality ► Closeness centrality ► Density ► Distance to query
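Degree centrality, closeness centrality and density have standard textbook definitions; the sketch below shows one common form of each on a plain adjacency-set graph. It is not the Graph Collection's implementation, and other normalizations of closeness are in use.

```python
from collections import deque
from typing import Dict, Set

Adj = Dict[str, Set[str]]


def degree_centrality(adj: Adj) -> Dict[str, float]:
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def bfs_distances(adj: Adj, source: str) -> Dict[str, int]:
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj: Adj) -> Dict[str, float]:
    """(reachable nodes - 1) / sum of shortest-path distances to them; 0 for isolated nodes."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def density(adj: Adj) -> float:
    """Edges present / edges possible, for a simple undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


star = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}  # hub H, three leaves
```

On this star, the hub H gets degree and closeness centrality 1.0, each leaf gets closeness 3/5 = 0.6, and the density is 3 of 6 possible edges, 0.5.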
  58. Graph Collection v2.0
      VISUALISATION ► Layouts (ARF, Fruchterman-Reingold, GraphViz) ► GraphViz integration ► HTML 5 interactive viewer ► Cytoscape Web report
      REPORTING INTEGRATION ► GraphViz image report ► HTML 5 graph report (prototype) ► Cytoscape Web report (prototype)
      READERS AND WRITERS ► GraphML ► SIF (Cytoscape) ► GEXF
      GRAPH DATABASE ► Neo4j integration ► ACID transactions ► Algorithms can be applied to graph databases transparently ► Scales to millions of nodes and edges
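SIF, the simplest of the listed formats, stores one relationship per line as source, interaction type and one or more targets, and a node without edges on a line of its own. A minimal writer/reader sketch using tab separators so that node names may contain spaces; it is not the Graph Collection's SIF component and ignores node and edge attributes, which SIF does not carry.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (source, interaction, target)


def write_sif(edges: Sequence[Edge], isolated: Sequence[str] = ()) -> str:
    """One 'source<TAB>interaction<TAB>target' line per edge, then lone nodes."""
    lines = ["\t".join(edge) for edge in edges]
    lines.extend(isolated)
    return "\n".join(lines) + "\n"


def read_sif(text: str) -> Tuple[List[Edge], List[str]]:
    """Parse SIF lines; a line may list several targets for the same interaction."""
    edges: List[Edge] = []
    isolated: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            isolated.append(fields[0])
        else:
            source, interaction, *targets = fields
            edges.extend((source, interaction, t) for t in targets)
    return edges, isolated


edges = [("Protein A", "interact", "Protein B"), ("Molecule 1", "inhibits", "Protein A")]
text = write_sif(edges, ["Molecule 3"])
```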
  59. Traversal? “I have an active molecule on protein P; which other protein(s) could potentially be inhibited by this molecule?” Step 0: find your query in the graph. (Diagram: Query)
  60. Traversal? (same question) Step 1: fetch similar molecules: walk through “similar” relationships. (Diagram: Query, similar)
  61. Traversal? (same question) Step 1: fetch similar molecules: save the molecules. (Diagram: Query, similar, Mol)
  62. Traversal? (same question) Step 2: fetch associated proteins: walk through “activates” and “inhibits” relationships (and anything else related to our problem). (Diagram: Query, similar, Mol, inhibits, pIC50 = 8.8)
  63. Traversal? (same question) Step 3: collect the (potential!) winners. (Diagram: Query, similar, Mol, inhibits, pIC50 = 8.8, Protein B, Protein C)
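The four traversal steps can be written as successive walks over labeled edges. A minimal sketch, not the Graph Collection's traversal framework: the edge list is invented so that the result mirrors the slide's winners (Protein B and Protein C), and only one "similar" hop is taken.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relationship label, target)


def walk(edges: Iterable[Edge], starts: Set[str], labels: Set[str],
         undirected: bool = False) -> Set[str]:
    """One traversal step: nodes reached from `starts` through edges carrying `labels`."""
    reached = set()
    for source, label, target in edges:
        if label not in labels:
            continue
        if source in starts:
            reached.add(target)
        if undirected and target in starts:
            reached.add(source)
    return reached


def candidate_targets(edges: List[Edge], query: str, known_target: str) -> Set[str]:
    # Step 0: find the query in the graph
    if not any(query in (s, t) for s, _, t in edges):
        raise KeyError(query)
    # Step 1: fetch and save similar molecules ("similar" is symmetric); keep the query too
    molecules = {query} | walk(edges, {query}, {"similar"}, undirected=True)
    # Step 2: fetch proteins linked to those molecules by activity relationships
    proteins = walk(edges, molecules, {"inhibits", "activates"})
    # Step 3: collect the potential winners, i.e. proteins other than the known target
    return proteins - {known_target}


edges = [  # invented example data
    ("Query", "inhibits", "Protein P"),
    ("Query", "similar", "Mol 1"),
    ("Mol 2", "similar", "Query"),
    ("Mol 1", "inhibits", "Protein B"),
    ("Mol 2", "activates", "Protein C"),
    ("Mol 3", "inhibits", "Protein D"),  # Mol 3 is not similar to the query
]
```

`candidate_targets(edges, "Query", "Protein P")` returns Protein B and Protein C; Protein D is never reached because its inhibitor is not connected to the query by a "similar" edge.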
  64. Protein-protein interaction networks: proteins are linked if they interact
  65. Protein-protein interaction networks: hubs are highly connected proteins
  66. Protein-protein interaction networks: articulators are central proteins that, if removed (i.e. inhibited), would disconnect two functional modules
  67. Protein-protein interaction networks: articulators are central proteins that, if removed (i.e. inhibited), would disconnect two functional modules. Candidates for inhibition? Side effects?
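"Articulators" are what graph theory calls articulation points (cut vertices). A sketch using the classic Hopcroft-Tarjan depth-first search, on a made-up network of two triangle modules joined through a bridging protein X. The recursion keeps it short but limits it to graphs of modest depth; an iterative DFS would be needed for large interaction networks.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def undirected(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def articulation_points(adj: Dict[str, Set[str]]) -> Set[str]:
    """Nodes whose removal disconnects their connected component (Hopcroft-Tarjan DFS)."""
    disc: Dict[str, int] = {}  # discovery order of each node
    low: Dict[str, int] = {}   # earliest discovery reachable via the DFS subtree plus one back edge
    points: Set[str] = set()
    counter = [0]

    def dfs(v: str, parent: Optional[str]) -> None:
        disc[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[v]:
            if w not in disc:
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                # a non-root v cuts off w's subtree if that subtree cannot reach above v
                if parent is not None and low[w] >= disc[v]:
                    points.add(v)
            elif w != parent:
                low[v] = min(low[v], disc[w])
        # the DFS root is an articulation point if it has more than one DFS child
        if parent is None and children > 1:
            points.add(v)

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return points


network = undirected([("A", "B"), ("B", "C"), ("C", "A"),   # module 1
                      ("D", "E"), ("E", "F"), ("F", "D"),   # module 2
                      ("C", "X"), ("X", "D")])              # X bridges the modules
```

Here `articulation_points(network)` is {C, X, D}: removing any one of them separates the two modules, which is the "candidate for inhibition, or source of side effects" question raised on the slide.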
  68. SAR analysis: similarity networks (diagram: similar, Tanimoto = 0.98)
  69. SAR analysis: similarity networks (PubChem CYP3A4 inhibition assay, AID 884): cluster of low activity, cluster of high activity
  70. SAR analysis: activity cliffs (pIC50 = 5.1 vs. pIC50 = 6.9)
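A similarity network links molecules whose fingerprint Tanimoto similarity passes a threshold; activity cliffs are linked pairs whose potencies still differ strongly. A minimal sketch with fingerprints as sets of "on" bit positions; the fingerprints and both thresholds are invented for illustration, and only the two pIC50 values 5.1 and 6.9 come from the slide.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_network(fps: Dict[str, Set[int]], min_sim: float) -> List[Tuple[str, str, float]]:
    """Edges (mol1, mol2, similarity) for every pair at or above the similarity threshold."""
    edges = []
    for m1, m2 in combinations(sorted(fps), 2):
        sim = tanimoto(fps[m1], fps[m2])
        if sim >= min_sim:
            edges.append((m1, m2, sim))
    return edges


def activity_cliffs(edges: List[Tuple[str, str, float]], pic50: Dict[str, float],
                    min_delta: float) -> List[Tuple[str, str]]:
    """Similar pairs whose potencies differ by at least `min_delta` log units."""
    return [(m1, m2) for m1, m2, _ in edges if abs(pic50[m1] - pic50[m2]) >= min_delta]


fps = {"Mol 1": set(range(1, 11)), "Mol 2": set(range(1, 10)) | {11}, "Mol 3": {20, 21, 22, 23}}
pic50 = {"Mol 1": 5.1, "Mol 2": 6.9, "Mol 3": 6.0}
network = similarity_network(fps, min_sim=0.7)          # threshold chosen for illustration
cliffs = activity_cliffs(network, pic50, min_delta=1.0)  # threshold chosen for illustration
```

Mol 1 and Mol 2 share 9 of 11 bits (similarity about 0.82) yet differ by 1.8 pIC50 units, so they form the only edge and the only cliff; Mol 3 stays unconnected.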
  71. SAR analysis: single-point substitution analysis
  72. Scaffold network display
