Krishnaprasad Thirunarayan and Amit Sheth: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications, In: Proceedings of AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013.
With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect and convert big data so generated into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit community at large.
Paper: http://www.knoesis.org/library/resource.php?id=1903
Big data healthcare
1. Big Data and Smart Healthcare
Honors Institute Symposium on Visions of the Future
T. K. Prasad (Krishnaprasad Thirunarayan)
Professor of Computer Science and Engineering
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435
2. Big Data Processing and Smart Healthcare
Krishnaprasad Thirunarayan (T. K. Prasad)
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
3. Outline
• Extent and Economics of Healthcare Problem
• Nature of Health-related Big Data
• Cognitive Computing Goals
• Five V’s of Big Data Research
• Our Research
– Semantic Perception for Scalability
– Lightweight Semantics to Manage Heterogeneity
– Hybrid Knowledge Representation and Reasoning
• Anomaly, Correlation, Causation
03/20/2014 Prasad 3
4. Acute Decompensated Heart Failure (ADHF) Statistics
• Heart failure affects > 5 million people in the US.
• > 550,000 new cases are diagnosed each year.
• The estimated cost of heart failure in the US for
2008 is $34.8 billion.
• Approximately 25% of patients are re-hospitalized
within 30 days of discharge.
• Approximately 50% of patients are re-hospitalized
within 6 months of discharge.
5. Asthma Statistics
• Asthma affects > 25 million people in the US.
• > 7 million are children.
• The current reactive cost > $56 billion.
• Asthma is the third leading cause of hospitalization
with 800,000 emergency room visits among
children under the age of 15.
6. Obesity Statistics
• The number of severely obese (BMI ≥ 40)
patients has quadrupled between 1986 and
2000 from one in 200 to one in 50.
• Obesity-related medical treatment costs
> $150 billion a year.
• Hospitalizations of children and youths with
obesity doubled from 1999 to 2005.
7. Parkinson’s Disease (PD) Statistics
• In 2010, 630,000 people in the US had a
diagnosis of PD.
• The number of people with PD will
double by 2040.
• Medical costs alone for people with PD total
$8.1 billion.
8. The Patient of the Future
MIT Technology Review, 2012
http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
9. Healthcare Related Big Data for Potential Exploitation:
Assorted Examples
• Sensor data: M. J. Fox Foundation Parkinson
disease challenge
• Other Applications: The healthcare industry loses
roughly $250 billion per year to fraud.
10. Structured vs Unstructured Data
Structured:
Patient     Disorder                  ICD-9 Code
Patient1    Hypertension              401
Patient2    Atrial fibrillation       427.31
Patient1    Pulmonary hypertension    416
Patient3    Edema                     782.3
Patient4    Hyperthyroidism           242.9
VS
Unstructured:
Coronary artery disease, status post four-vessel coronary
artery bypass graft surgery on , by Dr. X with a left internal
mammary artery to the left anterior descending artery,
sequential vein graft to the ramus and first diagonal, and a vein
graft to the posterior descending artery. He had normal left
ventricular function. He is having some symptoms that are
unclear if they are angina or not. I am therefore going to get
him scheduled for an exercise Cardiolite stress test.
13. An Example
Raw Text:
He is off both Diovan and Lotrel. I am unsure if it is due to underlying renal insufficiency. He
has actually been on atenolol alone for his hypertension.
Concepts:
diovan, lotrel, renal insufficiency, atenolol, hypertension
Knowledge (nodes typed as drug or disorder; edges labeled "is a" and "is used to treat"):
– Drugs: diovan, valtuna, valsartan, antihypertensive agent, atenolol, tenormin, atenix
– Disorders: kidney failure, renal insufficiency, kidney disease, disorder, blood pressure
disorder, hypertension, systolic hypertension, pulmonary hypertension
Inference:
– Patient taking diovan for hypertension
– Patient has kidney disease
– Patient is on antihypertensive drugs
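The step from concepts to inferences on this slide can be sketched as a walk up "is a" links. This is only an illustrative sketch: the IS_A table below is a hypothetical reading of the slide's diagram (the exact edges are an assumption), and the names ancestors and infer are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical "is a" links, read off the slide's terms (assumed edges).
IS_A: Dict[str, str] = {
    "diovan": "valsartan",
    "valsartan": "antihypertensive agent",
    "atenolol": "antihypertensive agent",
    "renal insufficiency": "kidney disease",
    "kidney disease": "disorder",
}

def ancestors(term: str) -> List[str]:
    """Follow "is a" links upward and return every broader term."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def infer(concepts: List[str]) -> Set[str]:
    """Derive patient-level statements from the extracted concepts."""
    facts = set()
    for concept in concepts:
        broader = ancestors(concept)
        if "antihypertensive agent" in broader:
            facts.add("patient is on antihypertensive drugs")
        if "kidney disease" in broader:
            facts.add("patient has kidney disease")
    return facts

print(infer(["atenolol", "renal insufficiency"]))
```

A real system would draw these links from a curated vocabulary rather than a hand-written table.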
14. Purpose of Big Data Analytics Vetted by Domain Experts
Data can help compensate for our overconfidence
in our own intuitions and reduce the extent to
which our desires distort our perceptions.
-- David Brooks of New York Times
However, inferred correlations require clear
justification that they are not coincidental, to
inspire confidence.
15. Cognitive Computing Systems
• Leverage Big Data using human experts to
enable better decisions.
– Process natural language and unstructured
data.
– Use Artificial Intelligence (e.g., Machine
Learning algorithms) to sense, infer, predict,
abduce, and, in some ways, think.
Check engine light analogy
16. Research Challenges : 5V’s of Big Data
Volume
Velocity
Variety
Veracity
Value
Big Data => Smart Data
17. Volume : (1) Semantic Perception
Semantic Perception : Volume => Value
Distill voluminous machine-sensed data
into human comprehensible nuggets
necessary for decision-making using
background knowledge
22. Volume with a Twist
Resource-constrained reasoning on
mobile devices
23. Cory Henson’s Thesis Statement
Machine perception can be
formalized using semantic web
technologies to derive abstractions
from sensor data using background
knowledge on the Web, and
efficiently executed on resource-
constrained devices.
24. Perception Cycle* that exploits background knowledge / domain models
* based on Neisser’s cognitive model of perception
Figure: a cycle between "Observe Property" and "Perceive Feature", driven by Prior Knowledge
1. Explanation: abstracting raw data for human comprehension
2. Discrimination: focus generation for disambiguation and action
(incl. human in the loop)
25. Efficiency Improvement
Evaluation on a mobile device: O(n^3) < x < O(n^4) reduced to O(n)
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity reduced from polynomial to linear
26. kHealth: Health Signal Processing Architecture
Inputs: Personal level Signals, Public level Signals, Population level Signals, Events from
Social Streams
Pipeline: Data Acquisition & aggregation => Analysis => Personalized Actionable Information
Supporting components: Domain Experts, Domain Knowledge, Risk Model
Example actions:
• Take Medication before going to work
• Avoid going out in the evening due to high pollen levels
• Contact doctor
28. Variety
Syntactic and semantic heterogeneity
• in textual and sensor data,
• in social media and Web forums data,
• in Electronic Medical Records
29. Variety (How?): (1) Granularity of Semantics & Applications
• Lightweight semantics: File and document-level
annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and
extraction for semantic search and summarization
• Fine-grained semantics: Data
integration, interoperability and reasoning in
Linked Open Data
Cost-benefit trade-off and continuum
30. Variety (How?): (2) Hybrid KRR
Blending data-driven models with declarative
knowledge
– Data-gleaned models: Bottom-up, correlation-based,
statistical
– Expert-given KBs: Top-down, causal/taxonomical, logical
– Refine structure to better estimate parameters
E.g., Medical Data Analytics using PGMs + KBs
31. Veracity
Scalable and Agile Big Data Analytics cannot
deliver value unless we have confidence and
trust in our data.
Open Problem:
Develop expressive frameworks for trust to
make explicit all aspects that go into trust
formation and inferences.
32. Veracity: Confession of sorts!
Trust is well-known,
but is not well-understood.
The utility of a notion testifies
not to its clarity but rather to the
philosophical importance of
clarifying it.
-- Nelson Goodman
(Fact, Fiction and Forecast, 1955)
33. (More on) Value
Discovering gaps and enriching domain models using
data
E.g., Semantics Driven Approach for Knowledge
Acquisition from EMRs
34. (More on) Value
Discovering drug-drug interaction by analyzing
search query logs
• E.g., the antidepressant paroxetine and the
cholesterol-lowering drug pravastatin were
shown to interact, causing high blood sugar, based on
correlated searches for “hyperglycemia”, “high
blood sugar” or “blurry vision”.
35. Conclusions
• Glimpse of our research organized around
the 5 V’s of Big Data
• Discussed role in harnessing Value
– Semantic Perception (Volume)
– Continuum of Semantic models to manage
Heterogeneity (Variety)
– Hybrid KRR: Probabilistic + Logical (Variety)
– Trust Models (Veracity)
36. Thank you, and please visit us at
http://knoesis.org/
Department of Computer Science and Engineering
Wright State University, Dayton, Ohio, USA
Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
Special Thanks to: Pramod Anantharam, Sujan Perera,
Dr. Cory Henson, Professor Amit Sheth
Editor's Notes
EVENT: Wright State Honors Institute Symposium “Visions of the Future” on Thursday, March 20, 2014.
ABSTRACT: With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect and convert big data so generated into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit community at large.
Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications
Big Data Research: Sensor, Social, and Cyber-Physical Systems
Our research through the lens of big data.
Statistics in terms of the number of people affected and the costs involved.
Heterogeneity: sensor data, social media data, text documents / forum posts, semi-structured Electronic Medical Records.
IBM vision: machine-sensed data to human action by distilling the data into nuggets of actionable information and progressively improving decision making by learning.
Nature of the computational problems to be addressed.
Our technical work: Web 3.0.
Population of the US: 315 million. GDP: $16 trillion.
Affordable Care Act (Obama legislation): a hospital will not be reimbursed by Medicare/Medicaid insurance if a patient is readmitted within 30 days.
Chronic condition: can we help reduce preventable readmissions?
CHF: Congestive Heart Failure
Can we determine causes/potential triggers and predict asthma exacerbation, in order to avoid, treat, or control symptoms?
Chronic obstructive pulmonary disease (COPD)
Awareness is important because it impacts overall health.
Quantified Self
Quality of life
Larry Smarr is a professor at the University of California, San Diego, and he diagnosed himself with Crohn’s disease. He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms. Through this self-tracking process he discovered inflammation, which led him to the discovery of Crohn’s disease.
EMR: captures information exchanged during a doctor’s visit and test data: disease/symptom/prescribed medications/suggested regimen. (PHR: Personal Health Record)
Social media engagement: self-reported data from the public at large.
Huge amount of raw data generated by continuous monitoring => (what we are lacking is) actionable nuggets of information for decision making (treatment/control/avoidance/change in lifestyle).
Quantified Self: monitoring for disease diagnosis, severity, and progression.
Semantics-based approaches are needed to deal with variety or to transcend abstraction levels.
Discovering “unexpected” correlations, and then seeking a transparent basis for them, seems worthy of pursuit. For instance, consider the controversies surrounding assertions such as ‘smoking causes cancer’, ‘high debt causes low growth’, ‘low growth causes high debt’, and ‘religious fanaticism breeds terrorists’.
Jeopardy: WATSON beat out (crème de la crème) human competitors.
Big Data growth is accelerating as more of the world's activity is expressed digitally. The goal is to process and make sense of it, and to enhance and extend the expertise of humans.
http://www.forbes.com/sites/matthewherper/2014/03/19/what-watson-cant-tell-us-about-our-genes-yet/
Check engine light signals/alerts: on detecting an anomaly/problem => further analysis/action.
Size, rate of flow/accumulation and change, (syntactic and semantic) heterogeneity, trustworthiness/quality (signal-to-noise ratio), end use (nuggets of wisdom).
(Develop techniques to harness data to derive value for decision making in the presence of these challenges.)
What does semantic perception entail? Making sense of large amounts of low-level data and communicating it in a meaningful way, e.g., ranges, aggregate/statistical measures.
Semantic Perception: converting sensory observations to abstractions.
Using the perception cycle and domain models: derive an explanation, then determine the focus to disambiguate and discriminate for taking actions.
Hybrid reasoning: interleaved abductive and deductive components.
[**complex domain models reflecting comorbidities : high-fidelity models**] [**Gleaning Patterns from data**] [**Personalization**]
Saffir-Simpson Hurricane Wind Scale
Hurricane/Typhoon/Cyclone (5 categories) / Tropical storm / Tropical depression vs Tsunami
National Oceanic and Atmospheric Administration (NOAA)
ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person)
ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person)
ParkinsonAdvanced(person) = Fall(person)
Loss of speech / food intake impossible / lack of balance => is there value in continuous monitoring? => Signatures for proactive control?
Dataset characteristics: 8 weeks of data from 5 sensors on a smartphone, collected for 12 patients, resulting in ~150 GB (with a lot of missing data).
Control group vs PD patients distinguished on the basis of restricted motion, monotone speech, etc.
Main idea: prior knowledge of PD was used to facilitate its detection from massive sensor data by reducing the search space.
Details:
• Declarative knowledge of PD includes the PD severity levels and their symptoms, as shown in the logical rules above.
• Each PD severity level is a conjunction of a set of PD symptoms.
• Each symptom was mapped to its manifestation in sensor observations.
• The availability of declarative knowledge significantly improved the analytics by aiding the feature selection process.
The graphs above contrast the physical movements and voice of two control group members and two PD patients.
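The severity rules in these notes, each a conjunction of symptoms, can be sketched as a small rule table. This is a minimal sketch under stated assumptions: the symptom flags are taken as already extracted from sensor data (that mapping is not shown), the identifier names are invented, and checking the most severe level first is an assumption, not something the notes specify.

```python
from typing import Dict, List, Tuple

# Rules transcribed from the notes; ordered most severe first (an assumption).
RULES: List[Tuple[str, List[str]]] = [
    ("advanced", ["fall"]),
    ("moderate", ["move_slow", "poor_sleep", "monotone_speech"]),
    ("mild", ["tremor", "poor_balance"]),
]

def pd_severity(symptoms: Dict[str, bool]) -> str:
    """Return the first level whose required symptoms are all present."""
    for level, required in RULES:
        if all(symptoms.get(name, False) for name in required):
            return level
    return "none"

print(pd_severity({"tremor": True, "poor_balance": True}))
```

Because each rule names only a few symptoms, the rule table also indicates which sensor-derived features are worth computing, which is the search-space reduction the notes describe.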
Congestive heart failure / acute decompensated heart failure: weight change due to water retention.
Cardiologists evaluate risk based on periodic monitoring data (+ human-sensed health info inputs).
Reduce preventable readmissions: 25% of patients are readmitted within 30 days of discharge; 50% of patients are readmitted within 6 months.
EVIDENCE-BASED approach to diagnosis, treatment and control (IRB)
Environmental: CO, CO2, NO, pollen counts, mold, dust, smoke, humidity, temperature, pressure, etc. (Sensordrone, dust-smoke sensor, Air Quality Egg)
Physiological: Wheezometer (breathing), heart rate, etc.
25 million people in the U.S. are diagnosed with asthma (7 million are children).
300 million people suffer from asthma worldwide.
Asthma-related healthcare costs alone are around $50 billion a year.
155,000 hospital admissions and 593,000 emergency department visits in 2006.
Volume: (1) semantic perception (2) parallelism
An Efficient Bit Vector Approach to Semantics-Based Machine Perception in Resource-Constrained Devices.
Resources: memory, CPU, power, …
Healthcare use case: privacy, mobility, cheap onboard sensors, personalization, power, and convenience considerations dominate.
Abstracting and summarizing multimodal machine-sensed observations + human observations for actionable and human-accessible situational awareness and decision making.
Characteristics of a big data problem: the size of the data exceeds the resources available/needed to compute.
The perception cycle contains interleaved, iterative execution of two primary phases.
Explanation (abductive)
– translating low-level signals into high-level abstractions
– inference to the best explanation
Discrimination (deductive)
– focusing attention on those properties that will help distinguish between multiple possible explanations
– used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations)
Ask the human relevant questions.
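The two phases can be sketched with bit vectors, in the spirit of the bit vector approach mentioned in these notes. This is an illustrative simplification, not the published algorithm: the knowledge base KB is a hypothetical toy, all names are invented, and a feature counts as an explanation only if it accounts for every observed property.

```python
from typing import Dict, List

# Hypothetical toy knowledge base: feature -> properties it can account for.
KB: Dict[str, List[str]] = {
    "flu": ["fever", "cough"],
    "asthma": ["cough", "wheeze"],
    "cold": ["cough"],
}
FEATURES = sorted(KB)  # fixes one bit position per feature
PROPERTIES = sorted({p for props in KB.values() for p in props})

# One bit vector (a Python int) per property: bit i is set when
# FEATURES[i] can explain that property.
EXPLAINS: Dict[str, int] = {
    prop: sum(1 << i for i, f in enumerate(FEATURES) if prop in KB[f])
    for prop in PROPERTIES
}
ALL_FEATURES = (1 << len(FEATURES)) - 1

def explain(observed: List[str]) -> int:
    """Explanation: AND together the vectors of all observed properties."""
    candidates = ALL_FEATURES
    for prop in observed:
        candidates &= EXPLAINS[prop]
    return candidates

def discriminate(candidates: int, observed: List[str]) -> List[str]:
    """Discrimination: unobserved properties that split the candidates."""
    useful = []
    for prop in PROPERTIES:
        if prop in observed:
            continue
        hit = candidates & EXPLAINS[prop]
        if hit != 0 and hit != candidates:
            useful.append(prop)
    return useful

def names(vector: int) -> List[str]:
    """Decode a bit vector back into feature names."""
    return [f for i, f in enumerate(FEATURES) if (vector >> i) & 1]

candidates = explain(["cough"])
print(names(candidates), discriminate(candidates, ["cough"]))
```

Each phase is a single pass of bitwise operations over the properties, which is one way to see how such an encoding can keep the work linear in the number of nodes.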
Observe the units on the x and y axes: small vs large problem size; small vs large amount of time.
Step as opposed to linear, which reflects allocation in quanta of 1 word (32 or 64 bits).
The size of the graph is plotted in terms of number of nodes, as we hold one of feature/property fixed. Otherwise, the size of the graph is O(n^2).
Research on asthma has three phases:
• Data collection: what signals to collect?
• Analysis: what analysis is to be done?
• Actionable information: what action to recommend?
In the next slide, we take a peek into the analysis that we do for asthma.
Syntactic: different data formats
Semantic: conceptual models
Semantic: multimodal sensing + different conceptual models
Complementary and corroborative information => complete and reliable/robust
“Semantics Empowered Web 3.0” book
Semantics at different levels of detail and developed in stages:
• Ease of use by domain experts
• Faster and wider adoption, promoting evolution
• Low upfront cost to support
• Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community
Bottom line: “Learn to Walk before we Run”
Controlled vocabularies <= Lightweight ontologies [legacy vocab + community-agreed semantic relationships] <= Formal ontologies
Original document vs its translation => traceability (provenance)
Past research: we have dealt with the top-down UMLS ontology vs bottom-up facts from PubMed in HPCO (Literature-based discovery -> LBD).
RECALL: materials and process specs typically describe composition, processing, testing, and packaging of material.
Formalizing a procedure (a process or a test) as an aggregation of characteristic/parameter-value pairs = LOD. Eventually allows combining and comparing specs.
Biomaterials use case: gold surface affinity of peptide sequence
Semantic Perception and Hybrid KRR => event, disease, human-comprehensible features … (e.g., Parkinson, Asthma)
Slow traffic vs the reason for it (accident vs tree fall): semantics to data: sensors monitoring traffic space
Cardiology use case: how a patient is feeling (giddy, depressed, etc.)
Idea: glean statistical correlations from data (PGM) and enrich/validate them using symbolic knowledge (manually curated): orient undirected links, delete conflicting links, and complement nodes and links.
Explicit declarative knowledge obviates the need to generate it, especially in the context of sparse/skewed data, PLUS it will be reliable.
Structure learning uncovers qualitative conditional dependencies; integrate with declarative information using progressively expressive graphical models at the same abstraction level.
Parameter learning uses the refined structure to estimate a better-fitting model.
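The three refinement operations named in this note (orient undirected links, delete conflicting links, complement missing links) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the variable names X, Y, Z, W are placeholders, the inputs are hypothetical, and parameter learning on the refined structure is not shown.

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]   # directed: (cause, effect)
Link = FrozenSet[str]    # undirected correlation between two variables

def refine_structure(
    data_links: Set[Link],
    expert_edges: Set[Edge],
    forbidden: Set[Link],
) -> Tuple[Set[Edge], Set[Link]]:
    """Combine data-gleaned links with expert-given causal edges."""
    directed: Set[Edge] = set()
    unresolved: Set[Link] = set()
    for link in data_links:
        if link in forbidden:
            continue  # delete links that conflict with expert knowledge
        a, b = sorted(link)
        if (a, b) in expert_edges:
            directed.add((a, b))   # orient using expert knowledge
        elif (b, a) in expert_edges:
            directed.add((b, a))
        else:
            unresolved.add(link)   # direction still unknown
    for edge in expert_edges:
        if frozenset(edge) not in forbidden:
            directed.add(edge)     # complement links the data missed
    return directed, unresolved

# Hypothetical example: three correlations found in data, two expert edges.
links = {frozenset({"X", "Y"}), frozenset({"Y", "Z"}), frozenset({"X", "Z"})}
expert = {("X", "Y"), ("Z", "W")}
banned = {frozenset({"X", "Z"})}
print(refine_structure(links, expert, banned))
```

Links left in the unresolved set are the ones for which the data shows a dependency but the knowledge base offers no direction.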
Taxonomic : relating and organizing terms : nomenclature
E.g., tides and ebbs caused by the alignment of earth, sun and moon, around full moon and new moon; “anomalous” orbits of Solar system planets w.r.t. the “circular” motion of stars in geocentric theory (‘planet’ is ‘wanderer’ in Greek) explained by heliocentrism (Copernicus) and the theory of gravitation; correlation of time period and distance of planets (Kepler); and the “anomalous” precession of Mercury’s orbit clarified by the General Theory of Relativity (Einstein).
C-peptide protein can be used to estimate insulin produced by a patient’s pancreas.
=> ANOMALY (Copernicus) and REGULARITY (Kepler) => CAUSE (Newton) => (Newtonian Mechanics) => (General Theory of Relativity)
Bold claims all the time in politics.
Beer vs diaper; Walmart’s hurricanes vs Pop-Tarts.
(4) Stress/spicy foods are correlated with peptic ulcers, but the latter are caused by Helicobacter pylori, as demonstrated by the Nobel Prize-winning work of Marshall and Warren.
ORIENTATION UNCLEAR: ‘high debt causes low growth’, ‘low growth causes high debt’.
(5) Since the 1950s, both the atmospheric carbon dioxide level and obesity levels have increased sharply.
(6) Pavlovian learning induced conditional reflex, and some of the financial market moves, seem to be classic cases of correlation turning into causation!
PARADOXES: THE SEEDS OF PROGRESS
Zeno’s paradox, hydrostatic paradox, light speed constant in all reference frames, CBR, expanding universe, …
complementary and corroboratory
EMR
http://www.nytimes.com/2013/03/07/science/unreported-side-effects-of-drugs-found-using-internet-data-study-finds.html?_r=0
They determined that people who searched for both drugs during the 12-month period were significantly more likely to search for terms related to hyperglycemia than were those who searched for just one of the drugs. They also found that people who did the searches for symptoms relating to both drugs were likely to do the searches in a short time period: 30 percent did the search on the same day, 40 percent during the same week and 50 percent during the same month.
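The comparison described in this note, symptom searches among people who searched for both drugs versus only one, can be sketched as a pair of rates. This is a toy sketch only: the LOGS data below is made up for illustration, the function name is invented, and the actual study's method and significance testing are not reproduced here.

```python
from typing import Dict, Set, Tuple

def symptom_search_rates(
    logs: Dict[str, Set[str]],
    drug_a: str,
    drug_b: str,
    symptoms: Set[str],
) -> Tuple[float, float]:
    """Return (rate among both-drug searchers, rate among one-drug searchers)."""
    both, one = [], []
    for terms in logs.values():
        has_a, has_b = drug_a in terms, drug_b in terms
        if has_a and has_b:
            both.append(terms)
        elif has_a or has_b:
            one.append(terms)

    def share(group) -> float:
        # Fraction of the group that searched at least one symptom term.
        if not group:
            return 0.0
        return sum(1 for terms in group if terms & symptoms) / len(group)

    return share(both), share(one)

# Made-up per-user search terms, for illustration only.
LOGS = {
    "u1": {"paroxetine", "pravastatin", "hyperglycemia"},
    "u2": {"paroxetine", "pravastatin"},
    "u3": {"paroxetine"},
    "u4": {"pravastatin", "high blood sugar"},
    "u5": {"pravastatin"},
    "u6": {"paroxetine"},
}
SYMPTOMS = {"hyperglycemia", "high blood sugar", "blurry vision"}
print(symptom_search_rates(LOGS, "paroxetine", "pravastatin", SYMPTOMS))
```

A higher rate in the first group is only a correlation; it would still need the kind of justification the Anomaly, Correlation, Causation discussion calls for.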
Semantic Perception: Hybrid Abductive/Deductive Reasoning (Volume)
Cost-benefit trade-off and Continuum of Semantic models to manage Heterogeneity (Variety)
Hybrid Knowledge Representation and Reasoning: Probabilistic + Logical: structure + parameter estimation (Variety)