Extent 2013 Obninsk Managing Uncertain Data at Scale
- 1. Managing Uncertain Data at Scale
Nikolay Marin
© 2013 IBM Corporation
- 2. Managing Uncertain Data at Scale
By 2015, 80% of the world’s data will be uncertain
Trend: Most of the world’s analyzed data will be uncertain
  Uncertain data management requires new techniques
  These techniques are necessary for real-world Big Data Analytics
Opportunity: Business leadership using Big Data Analytics
  Robust, business-aware uncertain data management
  Use analytics over uncertain web, sensor, and human-generated data
  Enable good business decisions by understanding analysis confidence
Challenge: Taking Big Data Analytics into an uncertain world
  Analysis of text is highly nuanced; sensor-based data is imprecise
  Timely business decisions require efficient large-scale analytics
  It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain
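One way to see why insight about a group is easier to obtain than insight about an individual is a short worked equation. It is only a sketch under an assumption the slide does not state: each record carries independent, zero-mean measurement error with the same variance. The symbols x_i, mu_i, epsilon_i, sigma and n are introduced here for illustration.

```latex
% Observed value = true value + measurement error (assumed independent, zero mean)
x_i = \mu_i + \varepsilon_i, \qquad
\mathbb{E}[\varepsilon_i] = 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2

% Error of a single individual's record
\operatorname{Var}(x_i - \mu_i) = \sigma^2

% Error of the average over a group of n records
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i - \frac{1}{n}\sum_{i=1}^{n} \mu_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(\varepsilon_i)
  = \frac{\sigma^2}{n}
```

Under these assumptions the uncertainty of a group average shrinks as the group grows, while the uncertainty about any single individual stays at sigma squared.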
- 3. Managing Uncertain Data at Scale
The fourth dimension of Big Data: Veracity – handling data in doubt
Volume (Data at Rest): Terabytes to exabytes of existing data to process
Velocity (Data in Motion): Streaming data, milliseconds to seconds to respond
Variety (Data in Many Forms): Structured, unstructured, text, multimedia
Veracity* (Data in Doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
* Truthfulness, accuracy or precision, correctness
- 4. Managing Uncertain Data at Scale
Uncertainty arises from many sources
Process Uncertainty: Processes contain “randomness”
  Examples: uncertain travel times; semiconductor yield
Data Uncertainty: Data input is uncertain
  Examples: text entry (intended spelling vs. actual spelling); GPS uncertainty; testimony; ambiguity ({Paris Airport}; {John Smith, Dallas} vs. {John Smith, Kansas}); contaminated?; rumors; conflicting data
Model Uncertainty: All modeling is approximate
  Examples: fitting a curve to data; forecasting a hurricane (www.noaa.gov)
- 5. Managing Uncertain Data at Scale
By 2015, 80% of all available data will be uncertain
By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
[Chart: Global Data Volume in Exabytes (left axis, 0 to 9000) and Aggregate Uncertainty % (right axis, 0 to 100), 2005 to 2015. Series: Sensors (Internet of Things); Social Media (video, audio and text); VoIP; Enterprise Data.]
Multiple sources: IDC, Cisco
- 6. Managing Uncertain Data at Scale
How to reduce uncertainty in processes, models, and data
Constructing context for better understanding
Extract as much information as feasible from each source
Combine (condense) data from multiple sources
More data from more sources is better
– Gathers more evidence for statistical methods
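As an illustration of why combining data from more sources helps, here is a minimal sketch of inverse-variance weighting, one standard way to fuse independent noisy estimates of the same quantity. The technique, the function name `fuse`, and the sensor readings are assumptions chosen for illustration; the slides do not name a specific method.

```python
import math

def fuse(measurements):
    """Combine independent (value, std_dev) estimates of one quantity.

    Each estimate is weighted by 1 / variance, so more precise sources
    count for more. Assumes the errors are independent and unbiased.
    """
    weights = [1.0 / (sd ** 2) for _, sd in measurements]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    fused_sd = math.sqrt(1.0 / total)
    return value, fused_sd

# Hypothetical readings of the same temperature from three sensors:
# (reading, standard deviation of that sensor's error)
readings = [(20.4, 0.5), (19.8, 1.0), (20.9, 2.0)]
value, fused_sd = fuse(readings)
print(f"fused estimate: {value:.2f} +/- {fused_sd:.2f}")
```

Under the independence assumption, the fused standard deviation is smaller than that of the best single source, which is the sense in which more sources gather more evidence.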
Using statistical methods scaled for Big Data
Stochastic techniques efficiently reason about uncertainty
Monte Carlo techniques explore many possible scenarios in order to gain insight
Requires specific business process and industry context
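The Monte Carlo idea above can be sketched with the "uncertain travel times" example from the earlier slide. This is an illustrative sketch only: the three trip legs, their distributions, and the 70-minute deadline are all assumed values, not figures from the talk.

```python
import random

def simulate_trip(rng):
    """Draw one possible scenario: total minutes for a trip with three uncertain legs."""
    drive = rng.gauss(30, 5)               # normal: mean 30 min, std dev 5 min
    security = rng.triangular(10, 45, 15)  # low 10, high 45, most likely 15
    walk = rng.uniform(5, 10)              # anywhere between 5 and 10 min
    return drive + security + walk

def on_time_probability(deadline, trials=100_000, seed=42):
    """Estimate P(total time <= deadline) as the fraction of simulated scenarios that meet it."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(simulate_trip(rng) <= deadline for _ in range(trials))
    return hits / trials

p = on_time_probability(deadline=70)
print(f"estimated probability of arriving within 70 minutes: {p:.3f}")
```

The result is an answer with a confidence attached (a probability), rather than a single point estimate, and the distributions are where the business process and industry context would enter.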
- 7. Managing Uncertain Data at Scale
Statistical techniques reduce uncertainty in analytical models
[Diagram: trouble tickets plotted by their attributes. Help agent find similar tickets; use stochastic search to find trouble tickets that are similar.]
Trouble ticket attributes
  Some attributes such as server type are precise
  Other attributes such as words in trouble tickets may be imprecise indicators of the problem
Model approximation
  Treat N attributes as N dimensions in space
  Model similarity as closeness in the N-dimensional space
Prediction
  Improve predictability by getting agent feedback
Improve suggestions for similar problems using corroborating data and better mathematical techniques
Analyze all the data – do not subset
Use related techniques to automate Level 1 support, finding problem clusters, etc.
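The "N attributes as N dimensions" model can be sketched as follows. This is a minimal illustration, not the method used in the talk: it compares the query against every ticket with cosine similarity instead of the stochastic search the slide mentions, and the ticket data, field names, and the attribute weight of 2 are hypothetical.

```python
import math
from collections import Counter

def vectorize(ticket):
    """Map a ticket to a sparse vector: one dimension per word or attribute value."""
    vec = Counter(ticket["text"].lower().split())
    # Precise attribute (server type) gets a higher weight than words (assumed weight).
    vec["server=" + ticket["server"]] += 2
    return vec

def cosine(a, b):
    """Cosine similarity: closeness of two vectors in the N-dimensional space."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(query, tickets, top=2):
    """Score every ticket against the query and return the closest ones."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(t)), t["id"]) for t in tickets]
    return sorted(scored, reverse=True)[:top]

tickets = [
    {"id": "T1", "server": "db", "text": "database connection timeout after restart"},
    {"id": "T2", "server": "web", "text": "login page returns error 500"},
    {"id": "T3", "server": "db", "text": "connection pool exhausted timeout on queries"},
]
query = {"id": "Q", "server": "db", "text": "timeout connecting to database"}
results = most_similar(query, tickets)
print([ticket_id for _, ticket_id in results])
```

Here T1 ranks first and T3 second, while T2 shares no dimension with the query and scores zero. Agent feedback on which suggestions were useful could be used to adjust the weights.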
- 8. Managing Uncertain Data at Scale
Analytics is broadly defined as the use of data and computation to make
smart decisions
[Diagram: data feeds a decision point that leads to possible outcomes (Option 1, Option 2, Option 3).]
Data: historical and simulated; text, video, images, audio
Diagram labels: data instances; reports and queries on data aggregates; predictive models; answers and confidence; feedback and learning
- 9. Managing Uncertain Data at Scale
Future of Analytics
Explosion of unstructured data
  Creates new analytics opportunities
  Addresses new enterprise needs
Consistent, extensible, and consumable analytics platform
  Reduces cost-to-value for enterprises
  Increases analytics solution coverage with limited supply of skills
Optimizing across the stack to deploy analytics at scale
  Analytics becomes a dominant IT workload and drives HW design
  Opportunity to seamlessly scale from terascale to exascale
- 10. Managing Uncertain Data at Scale
Analytics toolkits will be expanded to support ingestion and interpretation of
unstructured data, and enable adaptation and learning
New Methods
  Learn (in the context of the decision process)
    Adaptive Analysis: Responding to context
    Continual Analysis: Responding to local change/feedback
  Decide and Act
    Optimization under Uncertainty: Quantifying or mitigating risk
Traditional
  Decide and Act
    Optimization: Decision complexity, solution speed
  Understand and Predict
    Predictive Modeling: Causality, probabilistic, confidence levels
    Simulation: High fidelity, games, data farming
    Forecasting: Larger data sets, nonlinear regression
  Report
    Alerts: Rules/triggers, context sensitive, complex events
    Query/Drill Down: In memory data, fuzzy search, geo spatial
    Ad hoc Reporting: Query by example, user defined reports
    Standard Reporting: Real time, visualizations, user interaction
New Data
  Collect and Ingest/Interpret (decide what to count; enable accurate counting)
    Entity Resolution: People, roles, locations, things
    Relationship, Feature Extraction: Rules, semantic inferencing, matching
    Annotation and Tokenization: Automated, crowd sourced
Extended from: Competing on Analytics, Davenport and Harris, 2007
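To make the Entity Resolution layer concrete, here is a small sketch that reuses the {John Smith, Dallas} / {John Smith, Kansas} ambiguity from the earlier slide. It is an assumption-laden illustration: the matching rule (exact location, fuzzy name), the 0.85 threshold, and the misspelled record are invented for the example; only `difflib.SequenceMatcher` from the Python standard library is a real API.

```python
from difflib import SequenceMatcher

def same_entity(a, b, threshold=0.85):
    """Decide whether two (name, location) records refer to the same person.

    Assumed rule: locations must match exactly, and names must be
    similar enough to tolerate spelling errors in text entry.
    """
    if a[1].lower() != b[1].lower():
        return False
    score = SequenceMatcher(None, a[0].lower(), b[0].lower()).ratio()
    return score >= threshold

records = [
    ("John Smith", "Dallas"),
    ("Jon Smith", "Dallas"),   # hypothetical misspelling of the first record
    ("John Smith", "Kansas"),  # same name, different location
]

# Compare every pair of records and keep the index pairs judged to be one entity.
pairs = [
    (i, j)
    for i in range(len(records))
    for j in range(i + 1, len(records))
    if same_entity(records[i], records[j])
]
print(pairs)
```

Only the two Dallas records are merged, so later counting treats them as one person and the Kansas record as another, which is what "decide what to count; enable accurate counting" refers to.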
- 11. Managing Uncertain Data at Scale
Finally, what about a longer-term view, say the next 10-50 years?
1. Artificial Intelligence
2. Nano-“everything”
3. Cognitive Computing
4. Deep (Exascale) Computing
5. Atomic & Quantum Computing
6. Human / Computer Interaction
7. Machine to Machine Interaction
8. BioTech / Human Augmentation
9. Robots & Robotics
10. Advanced / Predictive Analytics
11. Security & Privacy
12. 3-D Printing
13. Video-enabled Business Processes
14. Personalized Web/Assistants
15. Ubiquitous Computing
16. Gaming
17. Simulation
18. Virtual Computing (including virtual worlds, tele-presence, etc.)
19. Augmented Reality
IBM Academy of Technology and Global Technology Outlook can help you find some answers