Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth
Video of the talk at: http://youtu.be/2991W7OBLqU
Big Data has captured a lot of interest in industry, with emphasis on the challenges of the four Vs of Big Data (Volume, Variety, Velocity, and Veracity) and on their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or the Internet of Things around, on, and inside humans), public health signals (information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual can process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will put forward the concept of Smart Data, which is realized by extracting value from Big Data to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.
Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data
1. Smart Data for you and me: Personalized and
Actionable Physical Cyber Social Big Data
Keynote at WorldComp 2014, July 21, 2014
Amit Sheth
LexisNexis Ohio Eminent Scholar & Exec. Director,
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State, USA
3. Only 0.5% to 1% of
the data is used for
analysis.
http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode
http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
4. Variety – not just structure but modality: multimodal, multisensory
Semi structured
6. What has changed now?
About 2 billion of the 5+ billion have data connections, so they perform “citizen sensing”.
And there are more devices connected to the Internet than the entire human population.
These ~2 billion citizen sensors and 10 billion devices and objects connected to the Internet
make this an era of the Internet of Things (IoT) and the Internet of Everything (IoE).
http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
7. “The next wave of dramatic Internet growth will come through the confluence of
people, process, data, and things — the Internet of Everything (IoE).”
- CISCO IBSG, 2013
http://www.cisco.com/web/about/ac79/docs/innov/IoE_Economy.pdf
Beyond the IoE-based infrastructure, it is the possibility of developing applications that span
the Physical, Cyber, and Social Worlds that is very exciting.
What has changed now?
8. What has not changed?
We need computational paradigms to tap into the
rich pulse of the human populace, and utilize
diverse data
We are still working on simpler representations of the real world!
Represent, capture, and compute with richer and fine-grained
representations of real-world problems
What should change?
9. Current focus on Big Data is on meeting Enterprise/Company
needs.
Significant opportunity exists in applications for individual and
community needs, many of these in complex domains
such as health, fitness, and well-being; better disaster coordination;
and personalized smart energy. These need to exploit diverse data
types and sources: Physical (sensor/IoT), Cyber (Web), and Social
data.
Smart data (personalized, contextually relevant, actionable
information) provides a better computational paradigm.
My take on thinking beyond the Big Data buzz
10. • Not just data to information, not just analysis, but actionable
information, delivering insight and supporting better decision
making right in the context of human activities
What is needed?
Data Information
Actionable: An apple a day
keeps the doctor away
A blood test has ~30 bio markers…how will a doctor cope with a test with 300K data points?
11. What is needed? Taking inspiration from cognitive models
• Bottom up and top down cognitive
processes:
– Bottom up: find patterns, mine (ML, …)
– Top down: Infusion of models and background
knowledge (data + knowledge + reasoning)
Left(plans)/Right(perceives) Brain
Top(plans)/Bottom(perceives) Brain
http://online.wsj.com/news/articles/SB10001424052702304410204579139423079198270
12. • Ambient processing as much as possible while enabling
natural human involvement to guide the system
What is needed?
Smart Refrigerator: Low on Apples
Adapting the Plan:
shopping for apples
13. Makes Sense to a human
Is actionable –
timely and better decisions/outcomes
14. My 2004-2005 formulation of SMART DATA - Semagix
Formulation of Smart Data
strategy providing services
for Search, Explore, Notify.
“Use of Ontologies and
Data repositories to gain
relevant insights”
15. Smart Data (2014 retake)
Smart data makes sense out of Big data
It provides value from harnessing the
challenges posed by volume, velocity, variety
and veracity of big data, in turn providing
actionable information and improving decision
making.
16. OF human, BY human FOR human
Smart data is about extracting value by
improving human involvement in data creation,
processing and consumption.
It is about (improving)
computing for human experience.
Another perspective on Smart Data
17. Petabytes of Physical (sensory)-Cyber-Social data every day!
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
‘OF human’ : Relevant Real-time Data Streams for Human Experience
18. Use of Prior Human-created Knowledge Models
‘BY human’: Involving Crowd Intelligence in data processing
Crowdsourcing and Domain-expert guided
Machine Learning Modeling
19. Detection of events, such as wheezing
sound, indoor temperature, humidity,
dust, and CO level
Weather Application
Asthma Healthcare
Application
Close the window at home
during day to avoid CO in
gush, to avoid asthma attacks
at night
‘FOR human’ : Improving Human Experience (Smart Health)
Population Level
Personal
Public Health
Action in the Physical World
Luminosity
CO level
CO in gush
during day time
20. Electricity usage over a day, device at
work, power consumption, cost/kWh,
heat index, relative humidity, and public
events from social stream
Weather Application
Power Monitoring
Application
‘FOR human’ : Improving Human Experience (Smart Energy)
Population Level Observations
Personal Level Observations
Action in the Physical World
Washing and drying has
resulted in significant cost
since it was done during peak
load period. Consider
changing this time to night.
21. Everyone and everything has Big Data.
It is Smart Data that matters!
24. PCS Computing
People live in the physical world while interacting with the cyber and
social worlds
Physical World Cyber World
Social World
25. Computations leverage observations from
sensors, knowledge and experiences from
people to understand, correlate, and personalize
solutions.
Physical-
Cyber
Social-Cyber
Physical-Cyber-Social
What if?
26. Sensors around, on, and in humans will bridge the physical
and cyber world.
Cyber
Physical
We believe that current CPS should view the physical world
by incorporating solutions from the (knowledge) cyber world
with a lens of social context.
There are silos of knowledge in the cyber
world which are underutilized.
Social
Social networks bridge the social interactions
in the physical and cyber world.
Mark’s discomfort sensed by:
galvanic skin response, heart rate, fitbit, and Microsoft Kinect
Physical Cyber Social Computing involves: (1) Comparing physiological observations from people similar to him (age, weight, lifestyle,
ethnicity, etc.) (2) Analyzing health experiences of similar people reporting heartburn (3) Incorporating history of ailments of Mark
(4) Leveraging medical domain knowledge of diseases and symptoms.
• He is advised to visit a doctor since he had a heart condition (from EMR) in the past, and heartburn in similar people (social) was a
symptom of arterial blockage.
Mark is experiencing heartburn.
Alert to contact his doctor.
Physical
Sensing
Actuating
Computing
Rich
knowledge of
the medical
domain
EMR and
PHR
Physiological
sensor data from
human population
Health related
experiences
shared by
humans
PCS Computing: Health Scenario
27. Vertical operators facilitate
transcending from data-information-knowledge-wisdom
using background knowledge
Horizontal operators facilitate semantic
integration of multimodal and multisensory
observations
PCS Computing
PCS computing is a holistic treatment of data, information, and knowledge
from physical, cyber, and social worlds to integrate, understand, correlate,
and provide contextually relevant abstractions to humans. Think of
PCS Computing as the application/semantic layer for
the IoE-based infrastructure.
http://wiki.knoesis.org/index.php/PCS
29. What if we could automate this sense making ability?
… and do it efficiently and at scale
30. Making sense of sensor data with
Henson et al., “An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web,” Applied Ontology, 2011
31. People are good at making sense of sensory input
What can we learn from cognitive models of perception?
The key ingredient is prior knowledge
32. * based on Neisser’s cognitive model of perception
Observe
Property
Perceive
Feature
Explanation
Discrimination
1
2
Translating low-level signals
into high-level knowledge
Focusing attention on those
aspects of the environment that
provide useful information
Prior Knowledge
Perception Cycle*
Convert large number of observations to semantic
abstractions that provide insights and translate into
decisions
33. To enable machine perception,
Semantic Web technology is used to integrate
sensor data with prior knowledge on the Web
W3C SSN XG 2010-2011, SSN Ontology
37. Inference to the best explanation
• In general, explanation is an abductive problem;
and hard to compute
Finding the sweet spot between abduction and OWL
• Single-feature assumption* enables use of
OWL-DL deductive reasoner
* An explanation must be a single feature which accounts for
all observed properties
Explanation
Explanation is the act of choosing the objects or events that best account
for a set of observations; often referred to as hypothesis building
Representation of Parsimonious Covering Theory in OWL-DL
38. ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
Explanation
Explanatory Feature: a feature that explains the set of observed
properties
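The single-feature explanation step can be sketched with plain set operations. This is a minimal illustration, not the OWL-DL encoding used in the talk: the links between features and properties below are assumed for the example, and `explanatory_features` is a hypothetical helper name.

```python
# Hypothetical prior knowledge: which properties each feature (disorder)
# can account for. These associations are assumed for illustration only.
PRIOR_KNOWLEDGE = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(observed, knowledge=PRIOR_KNOWLEDGE):
    """Return the features that account for every observed property.

    Under the single-feature assumption, a feature is explanatory only
    if its property set covers all of the observations.
    """
    observed = set(observed)
    return sorted(f for f, props in knowledge.items() if observed <= props)


print(explanatory_features(["elevated blood pressure"]))
print(explanatory_features(["elevated blood pressure", "palpitations"]))
print(explanatory_features(["elevated blood pressure", "clammy skin", "palpitations"]))
```

Each additional observation can only shrink the candidate set, which is why a single covering test per feature is enough under this assumption.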
43. Qualities
-High BP
-Increased Weight
Entities
-Hypertension
-Hypothyroidism
kHealth
Machine Sensors
Personal Input
EMR/PHR
Comorbidity risk
score e.g.,
Charlson Index
Longitudinal studies
of cardiovascular
risks
- Find risk factors
- Validation
- domain knowledge
- domain expert
Find contribution of
each risk factor
Risk Assessment Model
Current
Observations
-Physical
-Physiological
-History
Risk Score
(e.g., 1 => continue
3 => contact clinic)
Model Creation
Validate correlations
Historical
observations e.g.,
EMR, sensor
observations
Risk Score: from Data to Abstraction and Actionable Information
44. Use of OWL reasoner is resource intensive
(especially on resource-constrained devices),
in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n^3)
How do we implement machine perception efficiently on a
resource-constrained device?
45. intelligence at the edge
Approach 1: Send all sensor
observations to the cloud for
processing
Approach 2: downscale semantic
processing so that each device is
capable of machine perception
47. O(n^3) < x < O(n^4) → O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of
nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to
linear
Evaluation on a mobile device
48. 2 Prior knowledge is the key to perception
Using SW technologies, machine perception can be
formalized and integrated with prior knowledge on the
Web
3 Intelligence at the edge
By downscaling semantic inference, machine
perception can execute efficiently on resource-constrained
devices
1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level
sensory signals into high-level knowledge useful for
decision making
Semantic Perception for smarter analytics: 3 ideas to
take away
49. • Healthcare:
ADHF, Asthma, GI
– Using kHealth system
• Smart Cities:
Traffic management
I will use applications in 2 domains to demonstrate
• Social Media Analysis*:
Crisis coordination
Using Twitris platform
52. Through physical monitoring and
analysis, our cellphones could act as
an early warning system to detect
serious health conditions, and
provide actionable information
canary in a coal mine
Empowering Individuals (who are not Larry Smarr!) for their own health
kHealth: knowledge-enabled healthcare
55. Sensordrone
(Carbon monoxide,
temperature, humidity)
Node Sensor
(exhaled Nitric
Oxide)
Sensors
Android Device
(w/ kHealth App)
Total cost: ~ $500
kHealth Kit for the application for Asthma management
*Along with two sensors in the kit, the application uses a variety of population level signals from the web:
Pollen level Air Quality
Temperature & Humidity
56. What can we do to avoid an asthma episode?
Real-time health signals from personal level (e.g., Wheezometer, NO in breath,
accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and
population level (e.g., pollen level, CO2) arriving continuously in fine grained
samples potentially with missing information and uneven sampling frequencies.
Variety Volume
Veracity Velocity
Value
What risk factors influence asthma control?
What is the contribution of each risk factor?
semantics
Understanding relationships between
health signals and asthma attacks
for providing actionable information
WHY Big Data to Smart Data: Asthma example
57. kHealth: Health Signal Processing Architecture
Personal level
Signals
Public level
Signals
Population level
Signals
Domain
Knowledge
Risk Model
Events from
Social Streams
Take Medication before
going to work
Avoid going out in the
evening due to high pollen
levels
Contact doctor
Analysis
Personalized
Actionable
Information
Data Acquisition &
aggregation
58. Asthma Domain Knowledge
Asthma Control and Actionable Information
Asthma Control → Daily Medication (recommended action per column)

Severity Level of Asthma | Choices for starting therapy | Not Well Controlled | Poorly Controlled
Intermittent Asthma | SABA prn | - | -
Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS
Moderate Persistent Asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/montelukast or high dose ICS | Medium ICS + LABA/montelukast or high dose ICS*
Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care

ICS = inhaled corticosteroid; LABA = inhaled long-acting beta2-agonist; SABA = inhaled short-acting beta2-agonist
*consider referral to specialist
59. Patient Health Score (diagnostic)
Risk assessment
model
Semantic
Perception
Personal level
Signals
Public level
Signals
Domain
Knowledge
Population level
Signals
GREEN – Well controlled
YELLOW – Not well controlled
RED – Poorly controlled
How controlled is my asthma?
60. Patient Vulnerability Score (prognostic)
Risk assessment
model
Semantic
Perception
Personal level
Signals
Public level
Signals
Domain
Knowledge
Population level
Signals
Patient health
Score
How vulnerable* is my control level today?
*considering changing environmental conditions and current control level
61. Sensordrone – for monitoring
environmental air quality
Wheezometer – for monitoring
wheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers? What is the wheezing level?
What is the propensity toward asthma?
What is the exposure level over a day?
Commute to Work
Asthma: Actionable Information for Asthma Patients
Luminosity
CO level
CO in gush
during day time
Actionable
Information
Personal level
Signals
Public level
Signals
Population level
Signals
What is the air quality indoors?
62. Population Level
Personal
Wheeze – Yes
Do you have tightness of chest? –Yes
Observations | Physical-Cyber-Social System | Health Signal Extraction | Health Signal Understanding
<Wheezing=Yes, time, location>
<ChestTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
RiskCategory
<PollenLevel, ChestTightness, Pollution,
Activity, Wheezing, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
.
.
.
Expert
Knowledge
Background
Knowledge
tweet reporting pollution level
and asthma attacks
Acceleration readings from
on-phone sensors
Sensor and personal
observations
Signals from personal, personal
spaces, and community spaces
Risk Category assigned by
doctors
Qualify
Quantify
Enrich
Outdoor pollen and pollution
Public Health
Health Signal Extraction to Understanding
Well Controlled – continue
Not Well Controlled – contact nurse
Poorly Controlled – contact doctor
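The quantify step on this slide, turning qualitative observations into the numeric tuple <2, 1, 1, 3, 1>, can be sketched as a simple lookup. The numeric coding (No=0, Yes=1, Low=1, Medium=2, High=3) is an assumption chosen because it reproduces the tuple shown; the talk does not state the coding, and `quantify` and `recommended_action` are hypothetical names.

```python
# Assumed coding of qualitative levels; not stated in the talk.
LEVELS = {"No": 0, "Yes": 1, "Low": 1, "Medium": 2, "High": 3}

# Order of the features in the tuple shown on the slide.
FEATURE_ORDER = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]

# Control level to action, as listed on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poorly Controlled": "contact doctor",
}


def quantify(observations):
    """Map qualitative health signals to a numeric feature tuple."""
    return tuple(LEVELS[observations[name]] for name in FEATURE_ORDER)


def recommended_action(control_level):
    """Look up the action for a control level assigned by the risk model."""
    return ACTIONS[control_level]


signals = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
print(quantify(signals))  # (2, 1, 1, 3, 1)
print(recommended_action("Not Well Controlled"))  # contact nurse
```

The RiskCategory itself is assigned by doctors in the pipeline shown on the slide, so it is left as an input here and not computed.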
63. How are machines supposed to integrate and interpret sensor data?
RDF OWL
Semantic Sensor Networks (SSN)
64. W3C Semantic Sensor Network Ontology
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K.,
Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
66. SSN
Ontology
2 Interpreted data
(deductive)
[in OWL]
e.g., threshold
1 Annotated Data
[in RDF]
e.g., label
0 Raw Data
[in TEXT]
e.g., number
Levels of Abstraction
3 Interpreted data
(abductive)
[in OWL]
e.g., diagnosis
Intellego
“150”
Systolic blood pressure of 150 mmHg
Elevated
Blood
Pressure
Hyperthyroidism
……
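The first three levels of abstraction (raw number, annotated observation, deductive interpretation) can be sketched as below. This is only an illustration: the 140 mmHg threshold and the names `Observation`, `annotate`, and `interpret` are assumptions, and the sketch uses plain Python objects where the talk uses RDF and OWL.

```python
from dataclasses import dataclass

# Hypothetical threshold for the deductive rule; not taken from the talk.
ELEVATED_SYSTOLIC_MMHG = 140


@dataclass
class Observation:
    property_name: str
    value: float
    unit: str


def annotate(raw, property_name="systolic blood pressure", unit="mmHg"):
    """Level 0 -> 1: attach a label and unit to a raw reading."""
    return Observation(property_name, float(raw), unit)


def interpret(obs):
    """Level 1 -> 2: deductive interpretation via a threshold rule."""
    if (obs.property_name == "systolic blood pressure"
            and obs.value >= ELEVATED_SYSTOLIC_MMHG):
        return "elevated blood pressure"
    return None


reading = annotate("150")
print(reading)
print(interpret(reading))
```

Level 3 (abductive interpretation, e.g., a diagnosis) would take one or more such interpreted properties and search prior knowledge for features that explain them.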
71. Prior knowledge on the Web
W3C Semantic Sensor
Network (SSN) Ontology Bi-partite Graph
74. Discrimination is the act of finding those properties that, if observed, would help distinguish
between multiple explanatory features
Observe
Property
Perceive
Feature
Explanation
Discrimination
2
Focusing attention on those
aspects of the environment that
provide useful information
Discrimination
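Discrimination can be sketched as a set computation: among the remaining candidate explanations, find the properties that some candidates predict and others do not, and that have not been observed yet. The feature-to-property links are assumed for illustration, and `discriminating_properties` is a hypothetical helper, not the talk's OWL-based formulation.

```python
# Hypothetical prior knowledge, assumed for illustration only.
PRIOR_KNOWLEDGE = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def discriminating_properties(candidates, observed, knowledge=PRIOR_KNOWLEDGE):
    """Return unobserved properties that would narrow down the candidates.

    A property discriminates if at least one candidate expects it and at
    least one candidate does not.
    """
    expected = [knowledge[c] for c in candidates]
    if not expected:
        return []
    expected_by_some = set().union(*expected)
    expected_by_all = set.intersection(*expected)
    return sorted((expected_by_some - expected_by_all) - set(observed))


candidates = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]
print(discriminating_properties(candidates, ["elevated blood pressure"]))
```

With only elevated blood pressure observed, the sketch suggests checking clammy skin and palpitations next, which is the kind of focused sensor tasking the slide describes.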
76. Semantic scalability: Resource savings of abstracting sensor data
Orders of magnitude resource savings for generating and storing relevant
abstractions vs. raw observations.
Relevant abstractions
Raw observations
77. How do we implement machine perception efficiently on a
resource-constrained device?
Use of OWL reasoner is resource intensive
(especially on resource-constrained devices),
in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n^3)
78. intelligence at the edge
Approach 1: Send all sensor observations
to the cloud for processing
Approach 2: downscale semantic
processing so that each device is capable
of machine perception
Henson et al., “An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices,” ISWC 2012.
79. Efficient execution of machine perception
Use bit vector encodings and their operations to encode prior knowledge and
execute semantic reasoning
010110001101
0011110010101
1000110110110
101100011010
0111100101011
000110101100
0110100111
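The idea of encoding prior knowledge as bit vectors and reasoning with bitwise operations can be sketched as follows. This is an assumption-laden illustration, not the encoding from the cited paper: the feature and property links are invented for the example, and `encode` and `explain` are hypothetical names.

```python
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# Hypothetical prior knowledge, assumed for illustration only.
PRIOR_KNOWLEDGE = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def encode(knowledge, features):
    """Build one bit vector per property; bit i is set if features[i] explains it."""
    bits = {}
    for i, feature in enumerate(features):
        for prop in knowledge[feature]:
            bits[prop] = bits.get(prop, 0) | (1 << i)
    return bits


def explain(observed, property_bits, features):
    """Intersect candidate sets with one bitwise AND per observed property."""
    mask = (1 << len(features)) - 1  # start with every feature as a candidate
    for prop in observed:
        mask &= property_bits.get(prop, 0)
    return [f for i, f in enumerate(features) if (mask >> i) & 1]


PROPERTY_BITS = encode(PRIOR_KNOWLEDGE, FEATURES)
print(explain(["elevated blood pressure", "palpitations"], PROPERTY_BITS, FEATURES))
```

Each observation costs a single AND over a fixed-width integer, which hints at why a bit vector formulation suits resource-constrained devices.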
80. O(n^3) < x < O(n^4) → O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear
Evaluation on a mobile device
81. 2 Prior knowledge is the key to perception
Using SW technologies, machine perception can be formalized and
integrated with prior knowledge on the Web
3 Intelligence at the edge
By downscaling semantic inference, machine perception can
execute efficiently on resource-constrained devices
Semantic Perception for smarter analytics: 3 ideas to take away
1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level sensory
signals into high-level knowledge useful for decision making
84. Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors
(numerical data/Physical), incident reports (textual/Cyber) and Tweets (Social)
http://511.org/
Every minute update of speed, volume, travel time, and occupancy resulting in
178 million link status observations, 8 million tweets, 738 active events, and
146 scheduled events with many unevenly sampled observations collected
over 3 months.
Variety Volume
Veracity Velocity
Value
Can we detect the onset of traffic congestion?
Can we characterize traffic congestion based on events?
Can we estimate traffic delays in a road network?
semantics
Representing prior knowledge of
traffic leads to a focused exploration
of this massive dataset
Big Data to Smart Data: Traffic Management example
87. City Event Annotation – CRF Annotation Examples
Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION
Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O
B-LOCATION
I-LOCATION
B-EVENT
I-EVENT
O
Tags used in our approach:
These are the annotations provided
by a Conditional Random Field model
trained on tweet corpus to spot
city related events and location
BIO – Beginning, Intermediate, and Other is a notation used in multi-phrase entity spotting
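Once a tagger has produced BIO tags, the tagged tokens still have to be grouped into phrases. The sketch below shows that decoding step on the tweet from this slide; it does not train or run a CRF, and `decode_bio` is a hypothetical helper name.

```python
def decode_bio(tagged):
    """Group (token, tag) pairs into (label, phrase) spans.

    A B- tag starts a new span, a matching I- tag extends it, and
    anything else (O, or an I- tag with no open span) closes it.
    """
    spans, label, tokens = [], None, []
    for token, tag in tagged:
        if tag.startswith("B-"):
            if tokens:
                spans.append((label, " ".join(tokens)))
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            tokens.append(token)
        else:
            if tokens:
                spans.append((label, " ".join(tokens)))
            label, tokens = None, []
    if tokens:
        spans.append((label, " ".join(tokens)))
    return spans


# Token/tag sequence exactly as shown on the slide.
text = ("Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION "
        "Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O "
        "http://t.co/w0eGEJjApY O")
parts = text.split()
tagged = list(zip(parts[0::2], parts[1::2]))
print(decode_bio(tagged))  # [('LOCATION', 'Half Moon'), ('LOCATION', 'Bay Brewing')]
```

With the tags as shown, the decoder yields two location spans, because "Bay" carries a B-LOCATION tag and therefore starts a new span.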
88. Accident
Music
event
Sporting
event Road Work
Theatre event
External events
<ActiveEvents, ScheduledEvents>
Internal observations
<speed, volume, travelTime>
Weather
Time of Day
Modeling Traffic Events: Pictorial representation
90. Domain Experts
ColdWeather
PoorVisibility
SlowTraffic
IcyRoad
Declarative domain knowledge
Causal
knowledge
Linked Open Data
ColdWeather (YES/NO) | IcyRoad (ON/OFF) | PoorVisibility (YES/NO) | SlowTraffic (YES/NO)
1 | 0 | 1 | 1
1 | 1 | 1 | 0
1 | 1 | 1 | 1
1 | 0 | 1 | 0
Domain Observations
Domain Knowledge
Structure and parameters
WinterSeason
Correlations to causations using
Declarative knowledge on the
Semantic Web
Combining Data and Knowledge Graph
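One way to read "structure and parameters" is that declarative knowledge fixes which variables depend on which, while the observation table supplies the probabilities. The sketch below estimates conditional probabilities by counting rows. The parent sets in `PARENTS` are assumed for illustration (the slide's exact edges are not recoverable from the text), and `conditional_probability` is a hypothetical name.

```python
from collections import Counter

COLUMNS = ("ColdWeather", "IcyRoad", "PoorVisibility", "SlowTraffic")

# Domain observations as listed on the slide.
ROWS = [
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 0, 1, 0),
]

# Assumed structure (child -> parents), standing in for declarative knowledge.
PARENTS = {
    "IcyRoad": ["ColdWeather"],
    "SlowTraffic": ["IcyRoad", "PoorVisibility"],
}


def conditional_probability(child, rows=ROWS, columns=COLUMNS, parents=PARENTS):
    """Estimate P(child = 1 | parent values) by counting matching rows."""
    index = {name: i for i, name in enumerate(columns)}
    totals, positives = Counter(), Counter()
    for row in rows:
        key = tuple(row[index[p]] for p in parents[child])
        totals[key] += 1
        positives[key] += row[index[child]]
    return {key: positives[key] / totals[key] for key in totals}


print(conditional_probability("IcyRoad"))      # {(1,): 0.5}
print(conditional_probability("SlowTraffic"))  # {(0, 1): 0.5, (1, 1): 0.5}
```

With only four rows the estimates are uninformative, which illustrates why the slide pairs the data with declarative knowledge instead of relying on correlations alone.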
91. Three Operations: Complementing graphical model structure extraction
Traffic data from sensors deployed on the road network in the San Francisco Bay Area
Figure labels: Link Description, Scheduled Event
Graph nodes shown: traffic jam, baseball game, time of day, slow traffic, bad weather
Operation 1: Add missing random variables
Operation 2: Add missing links
Operation 3: Add link direction
Knowledge from ConceptNet5:
bad weather CapableOf slow traffic
go to baseball game Causes traffic jam
traffic jam CapableOf occur twice each day
traffic jam CapableOf slow traffic
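The three operations can be sketched on a small edge set. This is an illustration under stated assumptions: the starting graph is invented, both `CapableOf` and `Causes` are treated as subject-to-object directed links, the `ALIASES` mapping from knowledge-base phrases to model variables is hypothetical, and the "occur twice each day" triple is left out because it has no obvious variable to attach to.

```python
# Triples as listed on the slide (one omitted, see the note above).
KNOWLEDGE = [
    ("bad weather", "CapableOf", "slow traffic"),
    ("go to baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
]

# Hypothetical mapping from knowledge-base concepts to model variables.
ALIASES = {"go to baseball game": "baseball game"}


def complement_structure(nodes, edges, triples=KNOWLEDGE, aliases=ALIASES):
    """Add missing variables, missing links, and link direction from triples."""
    nodes, edges = set(nodes), set(edges)
    for subject, _relation, obj in triples:
        src, dst = aliases.get(subject, subject), aliases.get(obj, obj)
        nodes.update((src, dst))   # operation 1: add missing random variables
        edges.discard((dst, src))  # operation 3: drop a reversed link if present
        edges.add((src, dst))      # operations 2 and 3: add the directed link
    return nodes, edges


# Assumed structure learned from data alone, with one link oriented the wrong way.
data_nodes = {"baseball game", "traffic jam", "time of day", "slow traffic"}
data_edges = {("traffic jam", "baseball game"), ("time of day", "traffic jam")}

nodes, edges = complement_structure(data_nodes, data_edges)
print(sorted(nodes))
print(sorted(edges))
```

In this toy run, "bad weather" is added as a new variable, its link to "slow traffic" is added, and the baseball game link is reoriented to point at the traffic jam.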
92. City Infrastructure
Tweets from a city
POS
Tagging
Hybrid NER+
Event term
extraction
Geohashing
Temporal
Estimation
Impact
Assessment
Event
Aggregation
OSM
Locations
SCRIBE
ontology
511.org hierarchy
City Event Extraction
City Event Extraction Solution Architecture
City Event Annotation
OSM – OpenStreetMap
NER – Named Entity Recognition
93. City Events from Sensor and Social Streams can be…
• Complementary
• Additional information
• e.g., slow traffic from sensor data and accident from textual data
• Corroborative
• Additional confidence
• e.g., accident event supporting an accident report from ground truth
• Timely
• Additional insight
• e.g., knowing poor visibility before formal report from ground truth
94. Evaluation – Extracted Events AND Ground Truth Verification
Complementary Events
Event Sources
City events extracted from tweets
511.org, Active events e.g., accidents, breakdowns
511.org, Scheduled events e.g., football game, parade
City event extracted from twitter reporting about traffic
complementing the road construction event reported on 511.org
95. Evaluation – Extracted Events AND Ground Truth Verification
Corroborative Events
City event from twitter providing corroborative evidence for fog
reported by 511.org
96. Evaluation – Extracted Events AND Ground Truth Verification
City event from twitter providing report of a tornado before an event
related to strong winds is reported by 511.org
Timeliness
97. Events from Social Streams and City Department*
Corroborative Events and Complementary Events
City event from twitter providing complementary and
corroborative evidence for fog reported by 511.org
*511.org
98. Actionable Information in City Management
Tweets from a City
Traffic Sensor Data
OSM
Locations
SCRIBE
ontology
511.org hierarchy
Web of Data
How can issues in a city be resolved?
e.g., what should I do in foggy conditions?
99. Two excellent videos
• Vinod Khosla: the Power of Storytelling and
the Future of Healthcare
• Larry Smarr: The Human Microbiome and the
Revolution in Digital Health
Wrapping up: For more on the importance of what we talked about
100. • Big Data is everywhere
– at individual and community levels, not just
limited to corporations
– with growing complexity: Physical-Cyber-Social
• Analysis is not sufficient
• Bottom up techniques are not sufficient, need
top down processing, need background
knowledge
Wrapping up: Take Away
101. Wrapping up: Take Away
• Focus on Humans and Improve human life and
experience with SMART Data.
– Data to Information to Contextually Relevant
Abstractions (Semantic Perception)
– Actionable Information (Value from data) to assist
and support humans in decision making.
• Focus on Value -- SMART Data
– Big Data Challenges without the intention of deriving
Value is a “Journey without GOAL”.
102. • Collaborators: Clinicians: Dr. William Abrahams (OSU-
Wexner), Dr. Shalini Forbis (Dayton Childrens), Dr.
Sangeeta Agrawal (VA), Valerie Shalin (WSU Cognitive
Scientists ), Payam Barnaghi (U-Surrey), Ramesh
Jain(UCI), …
• Funding: NSF (esp. IIS-1111183 “SoCS: Social Media
Enhanced Organizational Sensemaking in Emergency
Response,”), AFRL, NIH, Industry….
Acknowledgment
103. Amit Sheth’s PhD students
Ashutosh Jadhav*
Hemant Purohit
Vinh Nguyen
Lu Chen
Pavan Kapanipathi*
Pramod Anantharam*
Sujan Perera
Maryam Panahiazar
Sarasi Lalithsena
Shreyansh Batt
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Special thanks: Ashu. This presentation covers some of the work of my PhD students.
Key contributors: Pramod Anantharam, Cory Henson and TK Prasad.
Special thanks
104. • Among top universities in the world in World Wide Web (cf: 10-yr impact,
Microsoft Academic Search: among top 10 in June 2014)
• Among the largest academic groups in the US in Semantic Web + Social/Sensor
Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical &
Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM
Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research
universities, NLM, startups )
• 100 researchers including 15 World Class faculty (>3K citations/faculty avg) and
~45 PhD students- practically all funded
• Extensive research for largely multidisciplinary projects; world class resources;
industry sponsorships/collaborations (Google, IBM, …)
Top and bottom part of the brain -- http://online.wsj.com/news/articles/SB10001424052702304410204579139423079198270
Top part of the brain is known for generating plans
Bottom part of the brain deals with current situational awareness
Perception through senses happens in the primitive part of the brain (mostly subconsciously)
Machine perception allows us to transform low level sensor observations to higher level abstractions that are directly communicable to the upper part of the brain (non-subconscious)
Thus, people can understand/adapt their plan quickly with abstractions
The left brain here is generating the plan of having an apple a day for healthy living
The right part of the brain identifies an apple through the senses
Communicating the “abstraction” of fewer apples at home through “Ambient processing/intelligence”
The left/top part of the brain will adapt the plan to shop for apples soon so that the overall plan of having an apple a day can be achieved
Smart data makes sense out of big data – it provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, to provide actionable information and improve decision making.
- HUMAN CENTRIC!!
All the data related to human activity, existence and experiences
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
Information is CREATED by human with the Machinery available – Wikipedia tool, sensors and social networks
Information is STORED in Man+Machine readable format, LOD
Information is PROCESSED using the LOD and Human assisted Knowledge-based
Higher level abstraction on info is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans
Example of a human guided modeling and improved performance
http://research.microsoft.com/en-us/um/people/akapoor/papers/IJCAI%202011a.pdf
- what if we could automate this sense making ability?
- and what if we could do this at scale?
sense making based on human cognitive models
perception cycle contains two primary phases
explanation
translating low-level signals into high-level abstractions
inference to the best explanation
discrimination
focusing attention on those properties that will help distinguish between multiple possible explanations
used to intelligently task sensors and collect additional observations (rather than brute force approach of blindly collecting all observations)
A single-feature (disease) assumption means that all the observed properties (symptoms) must be explained by a single feature.
i.e., this framework is not expressive enough to model comorbidity where there may be more than one feature (disease) co-existing
For example, if there are two diseases causing disjoint symptoms, and all the symptoms of both the diseases are
observed, then this framework will not be able to find the coverage and returns no diseases.
- With this ability, many problems could be solved
- For example: we could help solve health problems (before they become serious health problems) through monitoring symptoms and real-time sense making, acting as an early warning system to detect problematic health conditions
ADHF – Acute Decompensated Heart Failure
1) www.pollen.com (for pollen levels)
2) http://www.airnow.gov/ (for air quality levels)
3) http://www.weatherforyou.com/ (for temperature and humidity)
Data overload in the context of asthma
So check galvanic skin response sensor
Intelligence distributed at the edge of the network
Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize SW technologies