Semantics-empowered Big Data Processing for PCS Applications
Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435
Outline
• 5 V’s of Big Data Research
• Semantic Perception for Scalability
• Lightweight semantics to manage heterogeneity
– Cost-benefit trade-off and continuum
• Hybrid Knowledge Representation and Reasoning
– Anomaly, Correlation, Causation
11/15/2013 Prasad
5V’s of Big Data Research
Volume
Velocity
Variety
Veracity
Value
Big Data => Smart Data
Volume : Assorted Examples
Check engine light analogy
Volume : Semantic Perception
Weather Use Case
Parkinson’s Disease Use Case
Heart Failure Use Case
Asthma Use Case
Traffic Use Case
[Figure: traffic monitoring with 511.org data, combining link descriptions (e.g., slow-moving traffic), scheduled events, and schedule information]
Heterogeneity in a Physical-Cyber-Social System
Volume with a Twist
Resource-constrained reasoning on mobile devices
Perception Cycle* that exploits background knowledge / domain models
Observe Property → Perceive Feature, guided by Prior Knowledge
1. Explanation: abstracting raw data for human comprehension
2. Discrimination: focus generation for disambiguation and action (incl. human in the loop)
* based on Neisser’s cognitive model of perception
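The two steps of the cycle can be sketched as set operations over a small background knowledge base. This is only a minimal illustration, not the authors' implementation: the weather features and properties in `KB`, and the function names, are hypothetical.

```python
# Hypothetical background knowledge: feature -> properties it can account for.
KB = {
    "blizzard": {"low_temperature", "snowfall", "high_wind_speed"},
    "flurry": {"low_temperature", "snowfall"},
    "rain_shower": {"rainfall"},
}

def explain(observed, kb=KB):
    """Explanation: features whose known properties cover every observation."""
    return {f for f, props in kb.items() if observed <= props}

def discriminate(observed, candidates, kb=KB):
    """Discrimination: unobserved properties that some, but not all,
    candidate features account for, so observing them narrows the set."""
    unobserved = set().union(*(kb[f] for f in candidates)) - observed
    useful = set()
    for prop in unobserved:
        supporters = {f for f in candidates if prop in kb[f]}
        if 0 < len(supporters) < len(candidates):
            useful.add(prop)
    return useful

candidates = explain({"snowfall"})              # {"blizzard", "flurry"}
focus = discriminate({"snowfall"}, candidates)  # {"high_wind_speed"}
```

Here `low_temperature` is not worth observing next because both candidates predict it, while `high_wind_speed` separates a blizzard from a flurry.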
Virtues of Our Approach to Semantic Perception
Blends simplicity, effectiveness, and scalability.
• Declarative specification of explanation and discrimination;
• With applications (e.g., to healthcare) that are interdisciplinary
and of contemporary relevance;
• Using encodings/algorithms that are significant (asymptotic
order of magnitude gain) and necessary (“tractable” due to
time/memory reduction for typical problem sizes); and
• Prototyped using extant PCs and mobile devices.
Efficiency Improvement
Evaluation on a mobile device
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear: O(n³) < x < O(n⁴) → O(n)
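The slides do not say which encoding produced this gain. As one plausible illustration (an assumption, not the authors' stated method), properties can be mapped to bit positions so that checking whether a feature accounts for all observations is a single bitwise AND per feature. The property list, feature table, and function names below are hypothetical.

```python
# Hypothetical vocabulary: each property gets one bit position.
PROPERTIES = ["low_temperature", "snowfall", "high_wind_speed", "rainfall"]
BIT = {p: 1 << i for i, p in enumerate(PROPERTIES)}

def encode(props):
    """Pack a collection of property names into one integer bit mask."""
    mask = 0
    for p in props:
        mask |= BIT[p]
    return mask

# Hypothetical background knowledge, stored as one mask per feature.
FEATURES = {
    "blizzard": encode(["low_temperature", "snowfall", "high_wind_speed"]),
    "flurry": encode(["low_temperature", "snowfall"]),
    "rain_shower": encode(["rainfall"]),
}

def explain_bits(observed_mask):
    """One pass over the features; each coverage test is a bitwise AND."""
    return [f for f, m in FEATURES.items() if (observed_mask & m) == observed_mask]

print(explain_bits(encode(["snowfall"])))                     # ['blizzard', 'flurry']
print(explain_bits(encode(["snowfall", "high_wind_speed"])))  # ['blizzard']
```

The loop visits each feature once, which is the kind of linear growth the slide reports, though the constants on a real device depend on the encoding actually used.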
Volume and Velocity
• Lightweight semantics-based Adaptive/Continuous
Filtering
Disaster response use-case
• Building domain models dynamically
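A rough sketch of what adaptive filtering with a dynamically growing domain model could look like is given below. It is an assumption for illustration only: the slides do not describe the algorithm, and the promotion rule (adding hashtags that repeatedly co-occur with known terms), the function name, and the sample messages are all hypothetical.

```python
from collections import Counter

def adaptive_filter(messages, seed_terms, min_count=2):
    """Keep messages that mention a known term, and promote hashtags that
    co-occur with known terms at least `min_count` times into the model."""
    terms = {t.lower() for t in seed_terms}
    cooccur = Counter()
    kept = []
    for msg in messages:
        tokens = msg.lower().split()
        if terms & set(tokens):
            kept.append(msg)
            for tok in tokens:
                if tok.startswith("#") and tok not in terms:
                    cooccur[tok] += 1
                    if cooccur[tok] >= min_count:
                        terms.add(tok)  # the domain model grows with the stream
    return kept, terms

stream = [
    "flood waters rising downtown #riverwatch",
    "lunch was great today",
    "flood shelter open at the school #riverwatch",
    "#riverwatch volunteers needed tonight",
]
kept, terms = adaptive_filter(stream, {"flood"})
```

The last message contains no seed term, yet it is kept because `#riverwatch` was learned from the two earlier flood messages.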
Dynamic Model Creation
Continuous Semantics
Variety
Syntactic and semantic heterogeneity
• in textual and sensor data,
• in (legacy) materials data
• in (long tail) geosciences data
Variety (What?): Materials/Geosciences Use Case
• Structured Data (e.g., relational)
• Semi-structured, Heterogeneous Documents
(e.g., Publications and technical specs, which
usually include text, numerics, maps and images)
• Tabular data (e.g., ad hoc spreadsheets and
complex tables incorporating “irregular” entries)
Variety (How?/Why?): Granularity of Semantics & Applications
• Lightweight semantics: File and document-level
annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and
extraction for semantic search and summarization
• Fine-grained semantics: Data
integration, interoperability and reasoning in
Linked Open Data
Cost-benefit trade-off and continuum
Challenges Associated with Typical Spreadsheet/Table
• Meant for human consumption
• Irregular
– Not a simple rectangular grid
• Heterogeneous
– Not all rows are interpreted the same way
• Complex
– Meaning of each row and each column is context-dependent
• Footnotes modify meaning of entries (esp. in materials
and process specifications)
Practical Semi-Automatic Content Extraction
• DESIGN: Develop regular data structures that
can be used to formalize tabular information.
– Provide a natural expression of data
– Provide semantics to data, thereby removing potential
ambiguities
– Enable automatic translation
• USE: Manual population of regular tables and
automatic translation into LOD
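The USE step can be sketched as follows: once a table is in a regular form (one record per row, one meaning per column), translating it into Linked Open Data is mechanical. This is a minimal sketch under stated assumptions: the `http://example.org/materials/` namespace, the column names, and the sample row are hypothetical, and every value is emitted as a plain string literal in N-Triples syntax.

```python
BASE = "http://example.org/materials/"  # hypothetical namespace

def row_to_ntriples(row, key):
    """Turn one regular table row into N-Triples lines.
    Assumes the key value and the column names are safe to use in an IRI."""
    subject = f"<{BASE}{row[key]}>"
    lines = []
    for column, value in row.items():
        if column == key:
            continue
        predicate = f"<{BASE}{column}>"
        # Escape backslashes and double quotes inside the literal.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{text}" .')
    return lines

table = [
    {"alloy": "A1", "tensile_strength_mpa": 310, "heat_treatment": "T6"},
]
triples = [t for row in table for t in row_to_ntriples(row, "alloy")]
for t in triples:
    print(t)
```

Footnotes and irregular cells are exactly what this sketch cannot handle, which is why the slide pairs automatic translation with manual population of the regular tables.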
Variety (What?) : Sensor Data Use Case
Develop/learn domain models to exploit
complementary and corroborative
information
• To relate patterns in multimodal data to
“situation”
• To integrate machine sensed and human
sensed data
Variety: Hybrid KRR
Blending data-driven models with declarative
knowledge
– Data-driven: Bottom-up, correlation-based, statistical
– Declarative: Top-down, causal/taxonomical, logical
– Refine structure to better estimate parameters
E.g., Traffic Analytics using PGMs + KBs
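The division of labour can be illustrated with a toy example: declarative knowledge fixes which variables depend on which, and data supplies the numbers. The sketch below is an assumption for illustration, not the authors' traffic model; the variable names, the single edge in `STRUCTURE`, and the records are hypothetical.

```python
from collections import Counter

# Declarative knowledge (hypothetical): scheduled events influence traffic.
STRUCTURE = {"slow_traffic": ["scheduled_event"]}  # child -> parents

def estimate_cpt(records, child, parents):
    """Maximum-likelihood estimate of P(child=True | parents) by counting."""
    totals, hits = Counter(), Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        totals[key] += 1
        if r[child]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

# Hypothetical observations.
records = [
    {"scheduled_event": True, "slow_traffic": True},
    {"scheduled_event": True, "slow_traffic": True},
    {"scheduled_event": True, "slow_traffic": False},
    {"scheduled_event": False, "slow_traffic": False},
    {"scheduled_event": False, "slow_traffic": True},
    {"scheduled_event": False, "slow_traffic": False},
    {"scheduled_event": False, "slow_traffic": False},
]
cpt = estimate_cpt(records, "slow_traffic", STRUCTURE["slow_traffic"])
# cpt[(True,)] is 2/3 and cpt[(False,)] is 0.25
```

Because the knowledge base names only one parent, the data has to estimate two parameters instead of one per combination of every observed variable.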
Variety (Why?): Hybrid KRR
Data can help compensate for our overconfidence
in our own intuitions and reduce the extent to
which our desires distort our perceptions.
-- David Brooks, The New York Times
However, inferred correlations require clear
justification that they are not coincidental, to
inspire confidence.
• Correlations due to common cause or origin
– E.g., Planets: Copernicus > Kepler > Newton > Einstein
• Coincidental due to data skew or misrepresentation
– E.g., Tall policy claims made by politicians!
• Coincidental new discovery
– E.g., Hurricanes and Strawberry Pop-Tarts sales
• Strong correlation vs causation
– E.g., Spicy foods vs Helicobacter pylori: stomach ulcers
• Anomalous and accidental
– E.g., CO2 levels and obesity
• Correlation turning into causation
– E.g., Pavlovian learning: conditional reflex
Correlations vs Causation vs Anomalies
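The "common cause" case can be made concrete with a small simulation. In this sketch, which uses synthetic data and hypothetical variable names, a hidden factor `z` drives both `x` and `y`. The two are clearly correlated, yet the correlation nearly vanishes once `z` is controlled for, so neither causes the other.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)                       # fixed seed for repeatability
z = [rng.gauss(0, 1) for _ in range(5000)]   # the common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]       # depends on z only

print(round(pearson(x, y), 2))          # close to 0.5
print(round(partial_corr(x, y, z), 2))  # close to 0
```

Data alone reports the first number; knowing to compute the second requires the background knowledge that `z` exists, which is the argument for blending statistical and declarative models.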
Veracity
A lot of existing work on trust ontologies, metrics and
models, and on provenance tracking
• Homogeneous data: Statistical techniques
• Heterogeneous data: Semantic models
Veracity
Machine sensing: objective, quantitative,
but prone to environmental effects, battery life, …
Human sensing: subjective, qualitative,
but prone to bias, perceptual errors, rumors, …
Open problem: Improving trustworthiness by
combining machine sensing and human sensing
– E.g., 2002 Überlingen mid-air collision: a pilot followed the
air traffic controller's instruction over the conflicting
recommendation of the electronic TCAS system
(More on) Value
Learning domain models from “big data” for
prediction
E.g., Harnessing Twitter "Big Data" for Automatic
Emotion Identification
(More on) Value
Discovering gaps and enriching domain models
using data
E.g., Data driven knowledge acquisition method for
domain knowledge enrichment in the healthcare
Conclusions
• Glimpse of our research organized around
the 5 V’s of Big Data
• Discussed role in harnessing Value
– Semantic Perception (Volume)
– Continuum of Semantic models to manage
Heterogeneity (Variety)
– Hybrid KRR: Probabilistic + Logical (Variety)
– Continuous Semantics (Velocity)
– Trust Models (Veracity)
Thank you, and please visit us at
http://knoesis.org/
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Special Thanks to: Pramod Anantharam and Cory Henson

Streamlining Python Development: A Guide to a Modern Project Setup
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

  • 1. Semantics-empowered Big Data Processing for PCS Applications Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, OH-45435
  • 2. Outline • 5 V’s of Big Data Research • Semantic Perception for Scalability • Lightweight semantics to manage heterogeneity – Cost-benefit trade-off and continuum • Hybrid Knowledge Representation and Reasoning – Anomaly, Correlation, Causation 11/15/2013 Prasad
  • 3. 5 V’s of Big Data Research: Volume, Velocity, Variety, Veracity, Value. Big Data => Smart Data
  • 4. Volume: Assorted Examples. Check engine light analogy
  • 5. Volume: Semantic Perception
  • 7. Parkinson’s Disease Use Case
  • 8. Heart Failure Use Case
  • 12. Volume with a Twist: Resource-constrained reasoning on mobile devices
  • 13. Perception Cycle* that exploits background knowledge / domain models. Diagram nodes: Observe Property, Perceive Feature, Prior Knowledge. (1) Explanation: abstracting raw data for human comprehension. (2) Discrimination: focus generation for disambiguation and action (incl. human in the loop). * Based on Neisser’s cognitive model of perception
  • 14. Virtues of Our Approach to Semantic Perception Blends simplicity, effectiveness, and scalability. • Declarative specification of explanation and discrimination; • With applications (e.g., to healthcare) that are of contemporary relevance and interdisciplinary; • Using encodings/algorithms that are significant (asymptotic order of magnitude gain) and necessary (“tractable” due to time/memory reduction for typical problem sizes); and • Prototyped using extant PCs and mobile devices.
  • 15. Efficiency Improvement (evaluation on a mobile device) • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial, between O(n^3) and O(n^4), to linear, O(n)
  • 16. Volume and Velocity • Lightweight semantics-based Adaptive/Continuous Filtering – Disaster response use case • Building domain models dynamically
  • 18. Variety: Syntactic and semantic heterogeneity • in textual and sensor data, • in (legacy) materials data, • in (long tail) geosciences data
  • 19. Variety (What?): Materials/Geosciences Use Case • Structured Data (e.g., relational) • Semi-structured, Heterogeneous Documents (e.g., Publications and technical specs, which usually include text, numerics, maps and images) • Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
  • 20. Variety (How?/Why?): Granularity of Semantics & Applications • Lightweight semantics: File and document-level annotation to enable discovery and sharing • Richer semantics: Data-level annotation and extraction for semantic search and summarization • Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data. Cost-benefit trade-off and continuum
  • 21. Challenges Associated with Typical Spreadsheet/Table • Meant for human consumption • Irregular: – Not simple rectangular grid • Heterogeneous: – All rows not interpreted similarly • Complex: – Meaning of each row and each column context dependent • Footnotes modify meaning of entries (esp. in materials and process specifications)
  • 23. Practical Semi-Automatic Content Extraction • DESIGN: Develop regular data structures that can be used to formalize tabular information. – Provide a natural expression of data – Provide semantics to data, thereby removing potential ambiguities – Enable automatic translation • USE: Manual population of regular tables and automatic translation into LOD
  • 24. Variety (What?): Sensor Data Use Case. Develop/learn domain models to exploit complementary and corroborative information • To relate patterns in multimodal data to “situation” • To integrate machine sensed and human sensed data
  • 25. Variety: Hybrid KRR. Blending data-driven models with declarative knowledge – Data-driven: Bottom-up, correlation-based, statistical – Declarative: Top-down, causal/taxonomical, logical – Refine structure to better estimate parameters. E.g., Traffic Analytics using PGMs + KBs
  • 26. Variety (Why?): Hybrid KRR. “Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.” -- David Brooks of New York Times. However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.
  • 27. Correlations vs Causation vs Anomalies • Correlations due to common cause or origin • Coincidental due to data skew or misrepresentation • Coincidental new discovery • Strong correlation vs causation • Anomalous and accidental • Correlation turning into causation
  • 28. Correlations vs Causation vs Anomalies • Correlations due to common cause or origin – E.g., Planets: Copernicus > Kepler > Newton > Einstein • Coincidental due to data skew or misrepresentation – E.g., Tall policy claims made by politicians! • Coincidental new discovery – E.g., Hurricanes and Strawberry Pop-Tarts sales • Strong correlation vs causation – E.g., Spicy foods vs Helicobacter pylori: stomach ulcers • Anomalous and accidental – E.g., CO2 levels and obesity • Correlation turning into causation – E.g., Pavlovian learning: conditional reflex
  • 30. Veracity. A lot of existing work on trust ontologies, metrics and models, and on provenance tracking • Homogeneous data: Statistical techniques • Heterogeneous data: Semantic models
  • 31. Veracity. Machine sensing: objective, quantitative, but prone to environmental effects, battery life, … Human sensing: subjective, qualitative, but prone to bias, perceptual errors, rumors, … Open problem: Improving trustworthiness by combining machine sensing and human sensing – E.g., 2002 Überlingen mid-air collision: the pilot incorrectly followed the traffic controller's advice over the electronic TCAS recommendation
  • 32. (More on) Value. Learning domain models from “big data” for prediction. E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification
  • 33. (More on) Value. Discovering gaps and enriching domain models using data. E.g., Data driven knowledge acquisition method for domain knowledge enrichment in the healthcare
  • 34. Conclusions • Glimpse of our research organized around the 5 V’s of Big Data • Discussed role in harnessing Value – Semantic Perception (Volume) – Continuum of Semantic models to manage Heterogeneity (Variety) – Hybrid KRR: Probabilistic + Logical (Variety) – Continuous Semantics (Velocity) – Trust Models (Veracity)
  • 35. Thank you, and please visit us at http://knoesis.org/ Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA. Special thanks to: Pramod Anantharam and Cory Henson
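
The perception cycle on slide 13 and the linear-time result on slide 15 can be illustrated with a small bit-vector sketch. This is not the authors' implementation: the weather domain model, the property and feature names, and the helper functions `explain`, `discriminate`, and `decode` are all assumptions made for illustration. Each property gets a bit vector marking the features that can account for it, so explanation becomes a bitwise AND over the observed properties and discrimination becomes a single pass over the unobserved ones.

```python
# Hypothetical toy domain model (an assumption, not taken from the slides):
# each feature maps to the properties it can account for.
DOMAIN_MODEL = {
    "blizzard": {"freezing_temperature", "high_wind", "snowfall"},
    "flurry": {"freezing_temperature", "snowfall"},
    "rain_storm": {"high_wind", "rainfall"},
}

FEATURES = sorted(DOMAIN_MODEL)
FEATURE_BIT = {name: 1 << i for i, name in enumerate(FEATURES)}
ALL_FEATURES = (1 << len(FEATURES)) - 1

# For each property, a bit vector with one bit set per feature that explains it.
PROPERTY_MASK = {}
for feature, properties in DOMAIN_MODEL.items():
    for prop in properties:
        PROPERTY_MASK[prop] = PROPERTY_MASK.get(prop, 0) | FEATURE_BIT[feature]


def explain(observed):
    """Explanation step: keep the features that account for every observed property."""
    candidates = ALL_FEATURES
    for prop in observed:
        candidates &= PROPERTY_MASK.get(prop, 0)
    return candidates


def discriminate(observed, candidates):
    """Discrimination step: unobserved properties expected by some, but not all, candidates."""
    return sorted(
        prop
        for prop, mask in PROPERTY_MASK.items()
        if prop not in observed and 0 < (mask & candidates) < candidates
    )


def decode(candidates):
    """Turn a bit vector of features back into a list of feature names."""
    return [name for name in FEATURES if candidates & FEATURE_BIT[name]]


observed = {"freezing_temperature"}
candidates = explain(observed)
next_properties = discriminate(observed, candidates)
```

In this toy run, observing only `freezing_temperature` leaves two candidate features, and the discrimination step singles out `high_wind` as the one property worth observing next, which is the "focus generation" role the slide assigns to discrimination.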

Editor's Notes

  1. Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications. Big Data Research: Sensor, Social, and Cyber-Physical Systems
  2. Relevance of our research to Big Data and PCS applications by organizing it around the 5 V’s. Role in overcoming … challenge: Volume, Variety, both by combining probabilistic and logical knowledge
  3. Size, rate of flow/accumulation and change, (syntactic and semantic) heterogeneity, trustworthiness, end-use (develop techniques to harness data to derive value in the presence of these challenges)
  4. Huge amount of raw data generated by continuous monitoring => actionable nuggets for decision making. MJFox Foundation Parkinson disease challenge: diagnosis and progression. Embarrassingly parallel computations (Map-Reduce programming model) can be implemented on distributed fault-tolerant architectures/systems (HDFS + Hadoop) using redundant storage and computations: an answer for homogeneous data. Semantics-based approaches are needed to deal with variety or to transcend abstraction levels. Check engine light signals/alerts: on detecting -> anomaly / problem => for further analysis / action
  5. What does semantic perception entail? Making sense of large amounts of low-level data and communicating it in a meaningful way, e.g., ranges, aggregate/statistical measures. GOAL: “Buzz it up!” Semantic Perception: converting sensory observations to abstractions. Using perception cycle and domain models: derive explanation, determine focus to disambiguate and discriminate for taking actions. Hybrid reasoning: interleaved abductive and deductive components. [Complex domain models reflecting comorbidities: high-fidelity models] [Gleaning patterns from data] [Personalization]
  6. Saffir-Simpson Hurricane Wind Scale. Hurricane/Typhoon/Cyclone (5 categories) / Tropical storm / Tropical depression vs Tsunami
  7. ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person); ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person); ParkinsonAdvanced(person) = Fall(person). Loss of speech / food intake impossible / lack of balance => is there value in continuous monitoring? => Signatures for proactive control? Dataset characteristics: 8 weeks of data from 5 sensors on a smartphone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data).
  8. Cardiologist evaluates the risk based on periodic monitoring data (+ human-sensed health info inputs). Reduce preventable readmissions: 25% of patients readmitted 30 days after discharge; 50% of patients readmitted after 60mo
  9. EVIDENCE-BASED approach to diagnosis, treatment and control. Environmental: CO, CO2, NO, pollen counts, mold, dust, smoke, etc. Physiological: Wheezometer (breathing), heart rate, etc. 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1]. 300 million people suffer from asthma worldwide [2]. Asthma-related healthcare costs alone are around $50 billion a year [2]. 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3].
  10. Current predictions and long-term planning
  11. Point of this slide: correlations
  12. An Efficient Bit Vector Approach to Semantics-Based Machine Perception in Resource-Constrained Devices. Resources: memory, CPU, power, … Healthcare use case: privacy, mobility, cheap onboard sensors, personalization, power, and convenience considerations dominate. Abstracting and summarizing multimodal machine-sensed observations + human observations for actionable and human-accessible situational awareness and decision making. Characteristics of a big data problem
  13. The perception cycle contains interleaved, iterative execution of two primary phases. Explanation (abductive): translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination (declarative): focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations). Ask the human relevant questions
  14. Solving the information overload problem: improving relevance (both recall and precision). E.g., in the context of important/unfolding events, disaster scenarios, … learn to rank and select relevant hashtags for improved crawling and filtering. Use keywords to carve out a relevant model of the domain for scalable and more focused information crawling, disambiguation and extraction in the face of a rapidly unfolding event. Leveraging Semantics for Detection of Event-Descriptors on Twitter
  15. Use seed keywords and tweets to carve out a relevant model from Wikipedia pages: Doozer. Track dynamically unfolding events
  16. Syntactic: different data formats. Semantic: conceptual models. Semantic: multimodal sensing + different conceptual models. Complementary and corroborative information => complete and reliable/robust. “Semantics Empowered Web 3.0” book
  17. Variety challenge: sources of heterogeneity (addl.: UOM, table captions). Use text-based metadata to help mediate
  18. Semantics at different levels of detail and developed in stages: ease of use by domain experts; faster and wider adoption, promoting evolution; low upfront cost to support. Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community. Bottom line: “Learn to Walk before we Run”. Controlled vocabularies <= Lightweight ontologies [legacy vocab + community-agreed semantic relationships] <= Formal ontologies. Original document vs its translation => traceability (provenance). Past research: we have dealt with top-down UMLS ontology vs bottom-up facts from PubMed in HPCO (literature-based discovery -> LBD). RECALL: materials and process specs typically describe composition, processing, testing, and packaging of material. Formalizing a procedure (a process or a test) as an aggregation of characteristic/parameter-value pairs = LOD; eventually allows combining and comparing specs. Biomaterials use case: gold surface affinity of peptide sequence
  19. Use case: materials and process specs. Compact structures for sharing information: minimize duplication
  20. AMS 4928N. http://www.youtube.com/watch?v=D8U4G5kcpcM http://www.ndt-ed.org/EducationResources/CommunityCollege/Materials/Mechanical/Mechanical.htm Most structural materials are anisotropic, which means that their material properties vary with orientation. In products such as sheet and plate, the rolling direction is called the longitudinal direction, the width of the product is called the (long) transverse direction, and the thickness is called the short transverse direction.
  21. In content extraction from tables, a human extractor formalizes the data using “predefined” tables, and a wizard then generates LOD from it. The human extractor is responsible for gleaning the semantics (manual part); the wizard is responsible for the mechanical translation (automatic part). The yardstick of success is the extent to which regular parts of the table can be automatically assimilated and translated, while leaving more complex parts for manual guidance.
  22. Event, disease, human-comprehensible features, … Slow traffic vs the reason for it (accident vs tree fall): semantics to data: sensors monitoring traffic space. Cardiology use case: how a patient is feeling (giddy, depressed, etc.)
  23. Idea: glean statistical correlations from data (PGM) and enrich/validate them using symbolic knowledge (manually curated): orient undirected links, delete conflicting links, and complement nodes and links. Explicit declarative knowledge obviates the need to generate it, especially in the context of sparse/skewed data, plus it will be reliable. Structure learning uncovers qualitative conditional dependencies; integrate with declarative information using progressively expressive graphical models at the same abstraction level. Parameter learning uses the refined structure to estimate a better-fitting model
  24. Discovering “unexpected” correlations, and then seeking a transparent basis for them, seems worthy of pursuit. For instance, consider the controversies surrounding assertions such as ‘smoking causes cancer’, ‘high debt causes low growth’, ‘low growth causes high debt’, and ‘religious fanaticism breeds terrorists’.
  25. E.g., tides and ebbs caused by the alignment of earth, sun and moon, around full moon and new moon; “anomalous” orbits of Solar system planets w.r.t. the “circular” motion of stars in geocentric theory (‘planet’ is ‘wanderer’ in Greek) explained by heliocentrism and theory of gravitation (Copernicus); correlation of time period and distance of planets (Kepler); and the “anomalous” precession of Mercury’s orbit clarified by General Theory of Relativity (Einstein). C-peptide protein can be used to estimate insulin produced by a patient’s pancreas. => ANOMALY (Copernicus) and REGULARITY (Kepler) => CAUSE (Newton) => (Newtonian Mechanics) => (General Theory of Relativity). Bold claims all the time in politics. Beer vs diaper; Walmart’s hurricanes vs Pop-Tarts. (4) Stress/spicy foods are correlated with peptic ulcers, but the latter are caused by Helicobacter pylori, as demonstrated by the Nobel Prize winning work of Marshall and Warren. ORIENTATION UNCLEAR: ‘high debt causes low growth’, ‘low growth causes high debt’. (5) Since the 1950s, both the atmospheric carbon dioxide level and obesity levels have increased sharply. (6) Pavlovian learning induced conditional reflex, and some of the financial market moves, seem to be classic cases of correlation turning into causation! PARADOXES: THE SEEDS OF PROGRESS. Zeno’s paradox, Hydrostatic paradox, light speed constant in all reference frames, CBR, Expanding universe, …
  28. Different forms of trust; what features contribute to trust; how do we combine trust; trust propagation: aggregation and chaining; application-specific basis / axiomatic. E-commerce examples: risk tolerance (propensity to trust) + trustworthiness = trust
  29. complementary and corroboratory
  30. Biggest hurdle in ML: significant training dataset. Training big data: tweets with emotion hashtags (provided by the tweet creator). Learn domain model to associate emotion hashtags with tweet content. Glean/predict emotions from “untagged” tweets using this model
  31. EMR
  32. Semantic Perception: Hybrid Abductive/Deductive Reasoning (Volume). Cost-benefit trade-off and Continuum of Semantic models to manage Heterogeneity (Variety). Hybrid Knowledge Representation and Reasoning: Probabilistic + Logical: structure + parameter estimation (Variety)
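
The structure-refinement idea in the Hybrid KRR notes (orient undirected links, delete conflicting links, complement nodes and links) can be sketched as a small function. This is only an illustration under stated assumptions: the function `refine_structure`, the traffic variable names, and the sample links are hypothetical and do not come from the deck, and no probabilistic graphical model library is used. The input `learned_links` stands in for an undirected skeleton that structure learning might produce, and `causal_links` for manually curated declarative knowledge.

```python
def refine_structure(learned_links, causal_links, conflicting_links):
    """Apply declarative knowledge to an undirected, data-driven skeleton.

    learned_links: undirected (a, b) pairs suggested by the data.
    causal_links: directed (cause, effect) pairs from curated knowledge.
    conflicting_links: (a, b) pairs that the knowledge base rules out.
    Returns (directed links, links whose orientation is still unknown).
    """
    known = {frozenset(link): link for link in causal_links}
    conflicts = {frozenset(link) for link in conflicting_links}
    directed, undirected = set(), set()
    for a, b in learned_links:
        pair = frozenset((a, b))
        if pair in conflicts:
            continue  # delete links that conflict with the knowledge base
        if pair in known:
            directed.add(known[pair])  # orient the link using known causality
        else:
            undirected.add(tuple(sorted((a, b))))  # orientation left open
    directed.update(causal_links)  # complement links (and nodes) the data missed
    return directed, undirected


# Hypothetical traffic example; every name below is an assumption.
learned = [
    ("slow_traffic", "accident"),
    ("scheduled_event", "slow_traffic"),
    ("accident", "scheduled_event"),
    ("slow_traffic", "rush_hour"),
]
causal = [
    ("accident", "slow_traffic"),
    ("scheduled_event", "slow_traffic"),
    ("bad_weather", "accident"),
]
conflicting = [("accident", "scheduled_event")]

directed, undirected = refine_structure(learned, causal, conflicting)
```

In this example the learned link between `accident` and `scheduled_event` is dropped, two links are oriented toward `slow_traffic`, and `bad_weather` enters the model only through curated knowledge. Parameter estimation on the refined structure would follow as a separate step and is not shown.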