Prof. Arghya Das from the University of Wisconsin-Platteville presented this talk as part of the 3-day International Summit using OpenPOWER Systems.

- 1. Graphical Structure Learning Accelerated with POWER9. Arghya Kusum Das, Ph.D., Assistant Professor, University of Wisconsin-Platteville. In collaboration with Radha Nagarajan, Ph.D., Director, COSH, Marshfield Clinic Health System (Digital Health, Data Science, Bioinformatics, RWE).
- 2. Overview
  - Overview of Graphical Models
  - Implementation
  - Preliminary Findings
  - Healthcare Applications
- 3. Graphs/Networks: composed of nodes and edges.
  - Nodes (vertices): represent the entities of interest.
  - Edges: represent the associations/relationships between the nodes.
  - Graphical models: model the associations between the entities as a graph.
  - Example: the nodes are COVID subjects; the edges are associations between the COVID subjects (e.g. contact tracing).
  - (Image © searchengineland.com)
- 4. Why Graphical Models?
  - System-level abstractions: Graphical models can reveal system-level properties and behavior not apparent in a reductionist representation. System-level abstraction is especially critical in developing targeted interventions.
    - e.g. model COVID spread in a given community from contact tracing [1]; use the model to assist in targeted community-based interventions/policies.
    - e.g. model the signaling mechanism initiated by the COVID spike protein; use the model to identify potential target molecules for drugs to minimize disease severity/inflammation [2].
  - In-silico models: Graphical models can be experimented on in a controlled and cost-effective manner. This includes posing questions to these models (e.g. inference).
    - e.g. given the evidence that a subject has cough, fever, sore throat and shortness of breath, determine the probability that the subject is COVID +ve.
  - Causal associations: Graphical models may reveal causal associations [3] under certain implicit assumptions. (Note: we are attempting to decipher causality from observational data!)
  - References:
    - [1] https://www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/contact-tracing.html
    - [2] https://www.cebm.net/covid-19/dexamethasone/
    - [3] Pearl, J. [2009] Causality: Models, Reasoning and Inference.
- 5. Problem:
  - What we have: data (D) across an informed set of variables.
  - What we need: a graphical structure (G) representing the associations between these variables.
  - Pair-wise dependencies: direct associations between a given pair of nodes, determined using similarity measures.
  - Note: associations between a pair of variables may not be direct and can be mediated through a third variable. Conclusions based on pair-wise dependencies, while helpful, may be incomplete.
  - e.g. Loss of Taste (L) and Disease Severity (D) may not be associated as such (i.e. marginally independent). However, L and D may be associated given that the subject has COVID (C).
  - (Figure: graphs over the nodes L, D and C.)
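
The distinction between marginal and conditional association can be checked numerically. The sketch below is a synthetic construction, not a model of COVID data: it assumes L and D are independent fair binary variables and that C is 1 whenever either of them is 1, which is one simple way to obtain variables that are marginally independent but become associated once C is known. The probabilities and the helper names `joint` and `prob` are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy model (not from the slides): L and D are independent
# fair binary variables, and C is 1 whenever L or D is 1.
P_L = {0: 0.5, 1: 0.5}
P_D = {0: 0.5, 1: 0.5}
NAMES = ("l", "d", "c")


def joint():
    """Return the exact joint distribution {(l, d, c): probability}."""
    table = {}
    for l, d in product((0, 1), repeat=2):
        c = 1 if (l or d) else 0
        table[(l, d, c)] = P_L[l] * P_D[d]
    return table


def prob(table, **fixed):
    """Sum the joint over all states that match the fixed values."""
    return sum(
        p for state, p in table.items()
        if all(state[NAMES.index(k)] == v for k, v in fixed.items())
    )


table = joint()

# Marginal check: P(L=1, D=1) - P(L=1) P(D=1) is zero under independence.
marginal_gap = prob(table, l=1, d=1) - prob(table, l=1) * prob(table, d=1)

# Conditional check given C=1: the same difference is no longer zero.
p_c = prob(table, c=1)
cond_gap = (
    prob(table, l=1, d=1, c=1) / p_c
    - (prob(table, l=1, c=1) / p_c) * (prob(table, d=1, c=1) / p_c)
)
print(marginal_gap, cond_gap)
```

In this toy model the marginal gap is zero while the conditional gap is not, which is why a purely pair-wise analysis would miss the association.
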
- 6. What we need: a graphical structure. Approach: Bayesian structure learning.
  - Models the joint probability distribution across the given informed set of variables.
  - Incorporates conditional dependencies between a given set of variables in an iterative manner.
  - (Figure: graph over the nodes C, D and L.)
- 7. Data?
  - Multivariate: more than one variable is measured.
  - Can be longitudinal or cross-sectional.
    - Longitudinal: a continuous process is sampled as a function of time, resulting in a time series. Challenging to obtain, as several factors have to be controlled.
    - Cross-sectional: replicate measurements of a continuous process are sampled in a given time window (snapshot). Relatively easier to obtain.
  - Note: the approaches to be discussed implicitly assume that the properties of the data are preserved across the replicate realizations.
- 8. Question: given cross-sectional data on loss of taste (Yes/No), disease severity (Yes/No) and the result of a COVID test (+/-), can we model the association between them?
  - Three popular approaches for structure learning (static):
    - Constraint-based: learn the structure using conditional independence tests.
    - Search and score: learn the structure that best fits the data using a greedy search with a scoring criterion.
    - Hybrid: learn the structure using a combination of constraint-based and search-and-score approaches.

  | Subject | C (+/-) | D (Y/N) | L (Y/N) |
  |---|---|---|---|
  | 1 | + | Y | Y |
  | 2 | + | Y | N |
  | 3 | - | Y | Y |
  | 4 | - | N | Y |
  | ... | ... | ... | ... |

  (Figure: nodes C, D and L with the edges between them marked "?".)
- 9. Bayesian network structure learning:
  - Exhaustive enumeration: the number of possible structures (DAGs) grows super-exponentially with the number of nodes $n$ [1]:

    $$a_n = \sum_{k=1}^{n} (-1)^{k-1} \binom{n}{k} 2^{k(n-k)} a_{n-k}, \qquad a_0 = 1$$

  | Nodes | DAGs |
  |---|---|
  | 1 | 1 |
  | 2 | 3 |
  | 3 | 25 |
  | 4 | 543 |
  | 5 | 29281 |
  | ... | ... |

  - Note: exhaustive enumeration is in general not computationally feasible from a practical standpoint.
  - Reference: [1] Robinson, R. W. "Counting Labeled Acyclic Digraphs." In New Directions in Graph Theory (Ed. F. Harary). New
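
The recurrence on this slide is easy to evaluate directly, which makes the super-exponential growth concrete. The following is a minimal sketch; the function name `count_dags` is an illustrative choice, and memoization via `lru_cache` is just a convenience so each $a_{n-k}$ is computed once.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes via the recurrence a_n above."""
    if n == 0:
        return 1  # a_0 = 1
    return sum(
        (-1) ** (k - 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Reproduce the Nodes/DAGs table from the slide.
for n in range(1, 6):
    print(n, count_dags(n))
```

The first five values match the table on the slide (1, 3, 25, 543, 29281); calling `count_dags(28)` for the 28 HEPMASS variables gives a sense of why enumeration is not an option.
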
- 10. Markov Equivalence Class: probabilistically indistinguishable graphical structures.
  - $p(L, D, C) = p(L \mid C)\, p(C)\, p(D \mid C)$
  - $p(L, D, C) = p(L \mid C)\, p(D)\, p(C \mid D)$
  - $p(L, D, C) = p(L)\, p(C \mid L)\, p(D \mid C)$
  - Note: even if exhaustive enumeration were possible, structures can be learned only up to the Markov equivalence class.
  - (Figure: three graphs over the nodes C, D and L, one per factorization.)
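
A small numerical check can make the equivalence tangible. The sketch below builds a joint distribution from the first factorization using made-up parameters (all numbers are hypothetical), then recomputes the other two factorizations from that joint and confirms that all three assign the same probability to every state.

```python
from itertools import product

# Hypothetical parameters for the first factorization p(L|C) p(C) p(D|C).
p_c = {0: 0.7, 1: 0.3}
p_l_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # indexed [c][l]
p_d_given_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}  # indexed [c][d]

states = list(product((0, 1), repeat=3))  # each state is (l, d, c)
joint = {
    (l, d, c): p_l_given_c[c][l] * p_c[c] * p_d_given_c[c][d]
    for l, d, c in states
}


def marg(keep):
    """Marginalize the joint onto the variable positions listed in keep."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


p_L, p_D, p_C = marg([0]), marg([1]), marg([2])
p_LC, p_DC = marg([0, 2]), marg([1, 2])


def second_factorization(l, d, c):
    """p(L|C) p(D) p(C|D), computed from the joint's own marginals."""
    return (p_LC[(l, c)] / p_C[(c,)]) * p_D[(d,)] * (p_DC[(d, c)] / p_D[(d,)])


def third_factorization(l, d, c):
    """p(L) p(C|L) p(D|C), computed from the joint's own marginals."""
    return p_L[(l,)] * (p_LC[(l, c)] / p_L[(l,)]) * (p_DC[(d, c)] / p_C[(c,)])


max_gap = max(
    max(abs(joint[s] - second_factorization(*s)),
        abs(joint[s] - third_factorization(*s)))
    for s in states
)
print(max_gap)  # zero up to floating-point rounding
```

Because the three structures encode the same conditional independence (L and D independent given C), no amount of observational data generated this way can tell them apart.
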
- 11. Search and Score (Hill Climbing):
  - $P(G \mid D) \propto P(D \mid G)\, P(G)$, where $P(D \mid G)$ is the likelihood and $P(G)$ is the prior.
  - Theoretical considerations on the complexity of greedy search under certain assumptions have been investigated [1].
  - Reference: [1] Scutari, M. et al. [2018] Learning Bayesian Networks from Big Data with Greedy Search, Statistics and Computing.
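
As a rough illustration of the search-and-score idea, here is a minimal greedy hill-climbing sketch over DAGs. It is not the authors' implementation: graphs are plain sets of `(parent, child)` tuples, the moves are single-edge additions, deletions and reversals, and `score` is any function of the edge set. The toy score used at the end (closeness to a hypothetical target graph) only stands in for a data-driven score such as the log of $P(D \mid G)\, P(G)$.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Repeatedly strip nodes without incoming edges; a cycle blocks this."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = [v for v in remaining if not any(t == v for _, t in edges)]
        if not sources:
            return False
        remaining -= set(sources)
        edges = {(s, t) for s, t in edges if s in remaining and t in remaining}
    return True


def neighbours(nodes, edges):
    """Yield every DAG one edge addition, deletion or reversal away."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                      # deletion
            candidate = (edges - {(u, v)}) | {(v, u)}   # reversal
        elif (v, u) not in edges:
            candidate = edges | {(u, v)}                # addition
        else:
            continue
        if is_acyclic(nodes, candidate):
            yield candidate


def hill_climb(nodes, score, start=frozenset()):
    """Move to the best-scoring neighbour until no neighbour improves."""
    current = frozenset(start)
    best = score(current)
    while True:
        top = max(neighbours(nodes, current), key=score, default=None)
        if top is None or score(top) <= best:
            return current, best
        current, best = frozenset(top), score(top)


# Stand-in score (an assumption): closeness to a hypothetical target graph.
TARGET = frozenset({("A", "C"), ("C", "B")})


def toy_score(edges):
    return -len(frozenset(edges) ^ TARGET)


best_graph, best_score = hill_climb(["A", "B", "C"], toy_score)
print(sorted(best_graph), best_score)
```

With a real score the search can stop at a local optimum, which is the motivation for random restarts.
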
- 12. Search and Score (Hill Climbing)
  - Hill climbing is a sequential algorithm. The score of the present structure G* is generated by modifying the previous structure (G), as in Step 4, in an iterative manner.
  - BIC score, where $\Pi_{X_i}$ denotes the parents of $X_i$ and the regularization term uses $d$ = number of parameters:

    $$\text{BIC Score} = \sum_{i=1}^{n} \log\left[P(X_i \mid \Pi_{X_i})\right] - \frac{d}{2} \log n$$

  - Opportunities for distributing the computation in the hill-climbing approach:
    - The potential structures interrogated in Step 4(a) can be distributed.
    - The BIC score of a candidate structure is the sum of the scores of its local structures, hence it can be distributed.
    - The greedy aspect of hill climbing, in conjunction with Markov equivalence, can result in convergence to a local optimum. This encourages repeating the procedure with multiple random restarts, which in turn can be distributed.
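
The decomposability mentioned above can be sketched in a few lines. This is an illustrative standard-library version for discrete data, not the presenters' Pandas/NetworkX code: rows are dictionaries, every variable is assumed binary (`arity=2`), the log-likelihood uses maximum-likelihood conditional frequencies, and the penalty uses the number of samples, written `n_samples` in the code. The function names and the toy rows are assumptions made for this example.

```python
from collections import Counter
from math import log


def local_bic(rows, child, parents, arity=2):
    """BIC contribution of one node given its parents."""
    n_samples = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    # Maximised log-likelihood: sum of count * log(count / parent count).
    loglik = sum(n * log(n / parent_counts[pa]) for (pa, _), n in joint.items())
    d = (arity - 1) * arity ** len(parents)  # free parameters of this node
    return loglik - 0.5 * d * log(n_samples)


def bic(rows, parent_sets):
    """Total score as a sum of local scores; each term is independent."""
    return sum(local_bic(rows, child, parents)
               for child, parents in parent_sets.items())


# Hypothetical toy data: X and Y always agree.
rows = [{"X": 0, "Y": 0}] * 4 + [{"X": 1, "Y": 1}] * 4
with_edge = bic(rows, {"X": [], "Y": ["X"]})
without_edge = bic(rows, {"X": [], "Y": []})
print(with_edge, without_edge)
```

Since each `local_bic` call depends only on one node and its parents, the terms can be farmed out to separate workers and summed afterwards, and a single-edge move only requires rescoring the nodes whose parent sets changed.
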
- 13. Implementation: Architecture
  - x86: Server: HPE ProLiant DL580 servers; CPU type: Intel Xeon EX-series; Cores per node: 16; DRAM: 512 GB.
  - POWER9: Server: IC922; CPU type: DD2.3 POWER9 processor modules; Cores per node: 160 virtual cores; access to up to 32 DIMMs; sustained bandwidth: 28.8 GB.
  - (Image from IC922 Redbook.)
- 14. Implementation:
  - Data description: HEPMASS [1,2] ($10.5 \times 10^6$ samples comprising 28 variables; Baldi et al., 2016 [1]). All continuous normalized features were discretized into binary categorical variables by thresholding about their mean.
  - Python implementation: Bayesian network using Pandas and NetworkX.
  - (Figure: candidate graph structures over the nodes A, B, C, D and E.)
  - References:
    - [1] Baldi, P. et al. [2016] Parameterized Neural Networks for High-Energy Physics. The Eur. Phys. J. C 76(235).
    - [2] Scutari, M. et al. [2018] Learning Bayesian Networks from Big Data with Greedy Search, Statistics and Computing.
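
The discretization step described on this slide can be sketched as follows. This is a standard-library illustration rather than the presenters' code: columns are plain lists keyed by name, the column names and values are placeholders, and the choice of a strict `>` comparison (values equal to the mean map to 0) is an assumption, since the slide does not say how ties are handled.

```python
from statistics import fmean


def binarize_about_mean(columns):
    """Map each continuous column to 0/1: 1 where a value exceeds the mean."""
    out = {}
    for name, values in columns.items():
        mean = fmean(values)
        out[name] = [1 if v > mean else 0 for v in values]
    return out


# Placeholder columns, not HEPMASS values.
toy = {"f0": [-1.0, 0.0, 2.0, 3.0], "f1": [1.0, 2.0, 3.0, 6.0]}
binary = binarize_about_mean(toy)
print(binary)
```

With a pandas DataFrame `df`, the same rule can be written as `(df > df.mean()).astype(int)`.
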
- 15. Multiple Cores Architecture: Dask Distributed
  - Python/Dask APIs.
  - Parallel restart: spawning multiple hill-climbing instances over the data.
  - A SHA-256 hash confirms the uniqueness of each visited graph.
  - (Figure: candidate graph structures over the nodes A, B, C, D and E.)
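
One way to picture the parallel-restart scheme with hash-based de-duplication is sketched below. Several parts are assumptions: a `ThreadPoolExecutor` from the standard library stands in for the Dask scheduler, `random_start` is a hypothetical helper that only draws a random starting DAG (a real worker would run hill climbing from it), and the canonical form fed to SHA-256 is simply the sorted edge list. The slides do not specify how the graph is serialized before hashing.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def graph_digest(edges):
    """SHA-256 of a sorted edge list, so equal graphs get equal digests."""
    canonical = ";".join(f"{u}->{v}" for u, v in sorted(edges))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def random_start(nodes, seed):
    """Hypothetical restart: random DAG consistent with a shuffled order."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    return frozenset(
        (order[i], order[j])
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if rng.random() < 0.5
    )


def run_restarts(nodes, seeds, workers=4):
    """Run the restarts in parallel and keep one graph per distinct digest."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        graphs = list(pool.map(lambda s: random_start(nodes, s), seeds))
    unique = {}
    for g in graphs:
        unique.setdefault(graph_digest(g), g)
    return unique


unique_graphs = run_restarts(list("ABCDE"), range(8))
print(len(unique_graphs), "distinct graphs from 8 restarts")
```

Hashing a canonical form keeps the visited set small and makes membership checks cheap, which matters when many workers report graphs back to a single coordinator.
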
- 16. Performance of structure learning on POWER and x86:
  - Mean and standard deviation of the computational time across 5 runs of hill climbing on the HEPMASS data. A two-sample t-test with unequal variance was used to compare the times between the x86 and POWER architectures (# implies a significant difference).
  - The computational times differed significantly (p < 0.001) between the x86 and POWER architectures, with the POWER architecture taking considerably less time than x86.
  - As expected, the BIC score takes less computational time than the K2 score and these scores
  - (Figures: "Performance of x86 and POWER 9 on HEPMASS (BIC Score)" and "Performance of x86 and POWER 9 on HEPMASS (K2 Score)"; x-axis: Max Fan in (1, 2, 3); y-axis: Time (Seconds); series: x86 and POWER; # marks a significant difference.)
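
For readers unfamiliar with the test named on this slide, the two-sample t-test with unequal variance (Welch's test) can be sketched as below. The function name `welch_t` and the two input lists are placeholders chosen for easy arithmetic; they are not timings from the presentation. The sketch returns only the statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (fmean(a) - fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder samples of size 5, matching only the number of runs reported.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(sample_a, sample_b)
print(t_stat, dof)
```

Welch's variant is the natural choice when the two platforms show different run-to-run variability, because it does not pool the variances.
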
- 17. Performance of structure learning on POWER and x86 with varying Map Tasks:
  - Mean and standard deviation of the computational time across 5 runs of hill climbing on the HEPMASS data. A two-sample t-test with unequal variance was used to compare the times between the x86 and POWER architectures (# implies a significant difference).
  - There was a statistically significant difference in the computational time between the x86 and POWER architectures when random restarts were distributed as map task jobs.
  - As the number of map tasks was increased, the computation time decreased on both POWER and x86, and the separation in the average time between x86 and POWER increased.
  - # corresponds to p < 0.05; * corresponds to p < 0.0001.
  - (Figure: "Performance of POWER and x86 with varying Map Tasks (BIC Score)"; x-axis: Map Tasks (2, 4, 8, 16, 32, 64, 128); y-axis: Time (Seconds); series: x86 and POWER.)
- 18. Healthcare: current trends
  - Explosion in Digital Healthcare Data:
    - Source Systems: continued digitization from multiple sources (EHR, Claims, Registries, IoT) and multiple types (Text, Image, Signals).
    - Multiscale Profiles: emphasis on capturing the complete description of patients.
    - Common Data Models: develop approaches for sharing observational healthcare data (OMOP/OHDSI) across multiple organizations and research networks (e.g. HIE, PCORNet).
    - High-throughput: molecular data (e.g. Next Generation Sequencing).
    - FHIR: development of Fast Healthcare Interoperability Resources for enhanced interoperability across systems and devices.
  - Explosion in Analytics Adoption:
    - Descriptive, Predictive, Prescriptive Analytics.
    - Shift from storage to analytics and from consensus-based to evidence-based/data-driven approaches to impact outcomes/KPIs.
    - Surge in the adoption of Machine Learning (ML) and Artificial Intelligence (AI) approaches.
- 19. Healthcare Applications: Graphical Models, where do they fit in?
  - Healthcare data sets are inherently multivariate and noisy, which is attributed to several factors. Probabilistic graphical models are especially suited to handling noisy data.
  - Associations in multivariate healthcare data may be unknown. Graphical models can discover novel associations (hypothesis generation) in addition to validating known associations (hypothesis testing). Deciphering these associations is critical in prescribing targeted interventions.
  - Graphical models fall under ML and AI [1]. They can be used for descriptive, predictive and prescriptive analytics (e.g. Naïve Bayes Classifier). AI aspect of graphical models: answer queries posed from the evidence provided about a disease.
  - Healthcare applications of graphical models include diagnostic reasoning, prognostic reasoning and treatment selection, and discovering functional associations [2].
  - Emphasis on inferring causal associations from observational healthcare data, with the potential to complement classical approaches (e.g. RCT [3]), RCTs being idealizations.
  - Interpretable and easily visualized for critical evaluation in healthcare settings.
  - Need: Architectures and programming environment that can implement
  - References:
    - [1] Russell, S., Norvig, P. [2020] Artificial Intelligence: A Modern Approach, 4th ed.
    - [2] Lucas, P. J. F. et al. [2004] Bayesian networks in biomedicine and health-care. Artif. Intell. Med. 30(3):201-14.
    - [3] Berwick, D. [2008] The Science of Improvement, JAMA, 1182-1184.
    - [4] Mclachlan, S. et al. [2020] Bayesian networks in healthcare: Distribution by medical condition. Artificial Intelligence in Medicine. 107, 101912.
- 20. Summary
  - Structure learning is computationally intensive, especially across large data sets and large numbers of variables.
  - Preliminary findings revealed a marked improvement in performance using POWER architectures in addressing the computational challenges of structure learning approaches such as hill climbing.
  - There is a need for a more detailed investigation using a battery of data sets and across distinct graphical model algorithms.
  - Graphical modeling approaches in general have considerable healthcare applications. Their ability to reason under uncertainty makes them especially well suited to healthcare analytics.
  - https://onstituteacademy.herokuapp.com
  - Acknowledgements:
    - Marco Scutari, Ph.D., Senior Researcher, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Switzerland
    - Terry Leatherland, Trish Froeschle, Thomas Prokop, IBM, USA
    - Ganesan Narayanswami, OpenPOWER leader in Education and Research