1. Challenges in Analytics for BIG Data
Dr. Prasant Misra
W: https://sites.google.com/site/prasantmisra
Disclaimer:
The opinions expressed in this presentation and on the following slides are solely those
of the presenter and not necessarily those of the organization that he works for.
2. A simple narrative to BIG DATA
8/26/2016 2
DATA whose characteristics exceed the capabilities of conventional
algorithms, systems and techniques to derive useful value is considered BIG
datascience.berkeley.edu
The term is very fuzzy and means different things to different groups of people ...
9. Local Search - II
Context -> Service (Example)
Current Location -> Local business and directions
+ Time Tracks -> Businesses in driving direction
10. Local Search - III
Context -> Service (Example)
Current Location -> Local business and directions
+ Time Tracks -> Businesses in driving direction
+ History -> Personalized directions (Example: Take 520 East)
11. Local Search - IV
Context -> Service (Example)
Current Location -> Local business and directions
+ Time Tracks -> Businesses in driving direction
+ History -> Personalized directions
+ Community -> Tourist recommendation (Example: 35% of people pick the scenic route)
12. Local Search - V
Context -> Service (Example)
Current Location -> Local business and directions
+ Time Tracks -> Businesses in driving direction
+ History -> Personalized directions
+ Community -> Tourist recommendation
+ Push -> Alerts, triggers, reminders (Example: Alert: Bad Traffic. Consider alternate route)
BIG Data for Location Analytics …
13. Analytics: Span across Verticals & Horizontals
Depending on the type and quality of analytics, systems could manifest themselves as:
User-centric Systems: Systems That Know / Are Aware
Adaptive Systems: Systems That Learn
Cognitive Systems: Systems That Reason
Verticals: ENERGY | WATER | RETAIL | TELCOM | HEALTH
Horizontals:
Time, Location Management
Sensor, Device Management
Network Management
Cloud Infra Management
Customer Management
17. Analytics : Category
(Figure: Value vs. Skill, progressing from Information to Optimization)
Hindsight and Insight / Insights into the PAST
Descriptive: “WHAT has happened ?”
Diagnostic: “WHY did this happen ?”
Foresight / Insights into the FUTURE
Predictive: “WHAT could happen ?”
Prescriptive: “WHAT should we do ?”
Outputs: DASHBOARD; FORECAST; ACTIONS, RULES, RECOMMs
18. Example: Energy Analytics for a PV Microgrid
Descriptive: What is the total energy, instantaneous energy and power, etc. ?
Diagnostic: Why is the panel temperature decreasing when the solar irradiance is high and the wind speed is very low ?
Predictive: Can I forecast the plant output for tomorrow, or can I generate 4 kWh net energy ?
Prescriptive: What actions should be undertaken for the plant to reach 4 kW generation capacity from its current 2 kW ?
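A minimal sketch of how the descriptive and predictive questions might be approached. All readings, the sampling interval and the function names are hypothetical and chosen only for illustration; the moving-average forecast is a naive baseline, not the plant's actual model.

```python
from typing import List

def total_energy_kwh(power_kw: List[float], interval_h: float = 1.0) -> float:
    """Descriptive: integrate power samples (kW) over time with the trapezoidal rule."""
    if len(power_kw) < 2:
        return 0.0
    return sum((a + b) / 2.0 * interval_h for a, b in zip(power_kw, power_kw[1:]))

def moving_average_forecast_kwh(daily_energy_kwh: List[float], window: int = 3) -> float:
    """Predictive (naive baseline): forecast tomorrow as the mean of the last `window` days."""
    recent = daily_energy_kwh[-window:]
    return sum(recent) / len(recent)

# Hypothetical hourly power readings (kW) for one day, sunrise to sunset.
power_kw = [0.0, 0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0]
today_kwh = total_energy_kwh(power_kw)

# Hypothetical daily totals (kWh) for two earlier days, followed by today.
history_kwh = [7.0, 9.0, today_kwh]
forecast_kwh = moving_average_forecast_kwh(history_kwh)

# "Can I generate 4 kWh net energy ?" answered against the naive forecast.
meets_target = forecast_kwh >= 4.0
print(today_kwh, forecast_kwh, meets_target)
```

The diagnostic and prescriptive questions need a model of the plant (panel temperature, irradiance, wind speed), which this sketch deliberately leaves out.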
19. Analytics : Methodology
Reason and Plan with Uncertain Knowledge:
Quantify uncertainty & Probabilistic reasoning: Bayesian networks, Conditional distributions
Probabilistic reasoning over time: Hidden Markov models, Kalman filters, Dynamic Bayesian networks
Simple decisions: Utility theory, Decision networks
Complex decisions: Partially Observable Markov Decision Processes (POMDPs), Game-theoretic models
Planning graphs
Learning and Data Mining:
[Supervised | Semi-supervised | Unsupervised | Reinforcement] learning: Classification, Clustering
Different types of ANN | Deep Learning Networks | Support Vector Machines
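As one illustration of probabilistic reasoning over time, here is a minimal sketch of a one-dimensional Kalman filter. It assumes a scalar state that drifts as a random walk; the noise variances, initial values and sample readings are hypothetical.

```python
from typing import List, Tuple

def kalman_1d(
    measurements: List[float],
    process_var: float,
    meas_var: float,
    x0: float = 0.0,
    p0: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (estimate, variance) after each measurement for a random-walk state."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, so only the uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates

# Hypothetical noisy sensor readings around a true value near 5.
readings = [4.8, 5.3, 5.1, 4.7, 5.2, 4.9]
for estimate, variance in kalman_1d(readings, process_var=1e-4, meas_var=0.5):
    print(round(estimate, 3), round(variance, 4))
```

The variance returned alongside each estimate is what makes the uncertainty explicit: it shrinks as evidence accumulates and grows again if measurements stop.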
21. Data to Knowledge Pipeline
Stages: Data source -> Data Ingestion -> Data Analysis -> Applications
Data source: Cyber & Physical Space Entities
Infrastructure: Edge, Global Infra; “Big” data Infra, “Little” data Infra
Applications: Decision making with Knowledge
NATURE of INGESTED DATA
DATA @ REST (VOLUME): Archival/Static data (TBs) in Data stores
DATA @ MOTION (VELOCITY): Streaming data
DATA @ MANY FORMS (VARIETY): Structured/Unstructured, Text, Multimedia, Audio, Video
DATA @ DOUBT (VERACITY): Data with uncertainty that may be due to incompleteness, missing points, etc.
NATURE of ANALYSIS
COGNITIVE: Learn dynamically ?
PRESCRIPTIVE: What are the best outcomes ?
PREDICTIVE: What could happen ?
DESCRIPTIVE: What has happened ?
DISCOVERY: What do we have ?
22. A first list of challenges derived from the V’s
Volume:
How much data is really relevant to the problem solution & what is the cost of processing ?
Can you really afford to store and process all that data ?
Velocity:
A lot of data is coming in at high speed
Need for streaming versus block approach to data analysis
How to analyze data in-flight and combine with data at-rest
Variety:
A small fraction is in structured formats (e.g., relational, XML, etc.)
A fair amount is semi-structured (e.g., web logs, etc.)
The rest of the data is unstructured (e.g., text, photographs, etc.)
No single data model can currently handle the diversity
Veracity:
Cover term for: Accuracy, Precision, Reliability, Integrity
What is it that you don’t know about the data ?
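The Velocity question of analyzing data in-flight and combining it with data at-rest can be sketched with mergeable running statistics. This is an illustrative sketch only: the class name and sample values are hypothetical, and it uses Welford's online update with the standard pairwise merge of two summaries.

```python
class RunningStats:
    """Count, mean and sum of squared deviations, updatable one sample at a time."""

    def __init__(self, count: int = 0, mean: float = 0.0, m2: float = 0.0) -> None:
        self.count = count
        self.mean = mean
        self.m2 = m2

    def update(self, x: float) -> None:
        # Welford's update: no need to keep the raw samples.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two summaries as if all samples had been seen by one object.
        n = self.count + other.count
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 when no samples have been seen.
        return self.m2 / self.count if self.count else 0.0

# Hypothetical archived batch (data at rest), summarized once.
at_rest = RunningStats()
for value in [1.0, 2.0, 3.0, 4.0]:
    at_rest.update(value)

# Hypothetical live stream (data in motion), summarized as samples arrive.
in_flight = RunningStats()
for value in [5.0, 6.0, 7.0, 8.0]:
    in_flight.update(value)

combined = at_rest.merge(in_flight)
print(combined.count, combined.mean, combined.variance)
```

Only three numbers per summary are stored, which also speaks to the Volume question of whether all raw data must be kept.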
23. Top Challenges
Data acquisition
Is raw data of interest in totality ?
Challenge: design efficient filters and compression techniques that do not discard useful information; automatically generate the right metadata to describe the data
Data reduction
Will traditional data reduction approaches (via compression) become overwhelmed ?
Challenge: introduction of new data collection practices and models as per analytical needs; compact (space, time) representations/dictionary/basis; parsimonious models (low-dimensionality, compressed sensing and sparse data capture models)
“Big-Little” Data
Device cloud vs. Conventional cloud; Distributed data and Peer-to-Peer Federation
Challenge: how to combine Big and Little data for meaningful analytics (often in real time)
Analytics from the Edge to the Cloud
Will the current model of pushing all data to a central cloud for analytics scale, remain efficient, and alleviate privacy concerns ?
Challenge: how to automate distributed analytics and decision making on subsets of “Little” and “Big” data, within the constraints of device capability, privacy needs, energy and network costs, and application QoS
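One simple way to filter at the edge before pushing data to the cloud is a deadband (report-on-change) filter. The sketch below is an assumption-laden illustration, not a recommended design: the function name, threshold and readings are hypothetical, and the filter is lossy by construction.

```python
from typing import List, Tuple

def deadband_filter(samples: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Keep (index, value) only when the value moved more than `threshold`
    away from the last value that was kept."""
    kept: List[Tuple[int, float]] = []
    last = None
    for i, value in enumerate(samples):
        if last is None or abs(value - last) > threshold:
            kept.append((i, value))
            last = value
    return kept

# Hypothetical temperature readings from an edge sensor.
readings = [20.0, 20.1, 20.05, 20.6, 20.7, 21.5, 21.4]
kept = deadband_filter(readings, threshold=0.5)
print(kept)  # 3 of 7 samples are forwarded
```

Holding the last forwarded value reconstructs every dropped sample to within the threshold, so the threshold is the knob that trades network and energy cost against fidelity.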
24. Top Challenges
Handling inconsistent/incomplete/missing data and outliers
Is this critical ?
Challenge: design robust imputation algorithms
Heterogeneous Data Fusion
Is there a need to analyze the relationship between heterogeneous data objects/streams ?
Challenge: extract the right amount of semantics; sequential data fusion via transform spaces
Scalability with multi-level hierarchy
Will traditional methods of data navigation and search in deep hierarchies be scalable ?
Challenge: design newer alternatives
Data summarization for interactive Query
Will examination of datasets (all at once) become difficult ?
Data summarization lets users request data with particular characteristics
Data summarization: organize data based on the presence/type of features
Scientific data features: geometrical, topological, statistical
Non-scientific data features: related to semantic/syntactic components of the data
Challenge:
extraction of meaningful features from both high- and low-dimensional data
data storage and indexing in an I/O-efficient format for rapid runtime retrieval
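For the first challenge on this slide (missing data and outliers), a minimal sketch of a robust baseline: median imputation followed by outlier flagging with the modified z-score based on the median absolute deviation (MAD). The readings and function names are hypothetical, and the 3.5 cutoff is a commonly used convention rather than a value taken from these slides.

```python
import statistics
from typing import List, Optional

def impute_with_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def mad_outliers(values: List[float], cutoff: float = 3.5) -> List[int]:
    """Return indices whose modified z-score (0.6745 * |x - median| / MAD) exceeds cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > cutoff]

# Hypothetical sensor readings with one gap and one spike.
raw = [10.0, None, 11.0, 9.0, 100.0, 10.0]
filled = impute_with_median(raw)
outliers = mad_outliers(filled)
print(filled, outliers)
```

The median is used in both steps because, unlike the mean, it is barely moved by the spike it is supposed to help detect.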
25. Top Challenges
Analytics of temporally/spatially evolving features
Do data features occur at different spatial and temporal scales ?
Challenge: effective visual techniques that are computationally practical and that can take advantage of humans' unique cognitive ability to track those feature changes
Representation of evidence and uncertainty
Interpretation of evidence is subject to the person performing the task, and depends on their prior knowledge, subjective settings and viewpoint
Uncertainty quantification models the consequence based on the presented evidence and then
predicts the qualities of the corresponding outcome
Challenge: how to represent evidence and uncertainty clearly and without bias through
visualization
Sense making to users/decision makers
Involves examining all the assumptions made and retracing the analysis
There can be many sources of error: computer systems can have bugs, models almost always have
assumptions, and results can be based on erroneous data. For all of these reasons, users will try
to understand, and verify, the results produced by the computer.
Challenge: what should the man-machine interface for this look like ?
29. References
Stephen H. Kaisler et al., “Big data and analytics: challenges and issues”
Pak Chung Wong, Han-Wei Shen, Chaomei Chen, “Top Ten Interaction Challenges in Extreme-Scale Visual Analytics”
http://link.springer.com/chapter/10.1007/978-1-4471-2804-5_12#page-1
Other infographics from the web