© Cloudera, Inc. All rights reserved.
Mirko Kämpf | Solutions Architect
mirko@cloudera.com
From Events to Networks:
Apply Time Series Analysis at Scale.
Who is speaking?
• Mirko Kämpf
• Solutions Architect, EMEA
• Data Analysis Projects:
• Econodiagnostics: Relation between Social Media & Economy
• Analysis of network growth processes
• Github: kamir
• gephi-hadoop-connector: store networks in Hadoop and plot layouts in Gephi
• fuseki-cloud: scale out the RDF meta(data)store
• Hadoop.TS3: simplify complex time series analysis processes
Agenda
Recap: The Data Science Process (DSP)
Time Series: What, Why, How?
What are Similarity Graphs?
Applications of TSA
Hadoop.TS and HDGS
HDGS: History & High Level Architecture
Outlook
Time Series Analysis on Hadoop:
• Data Driven Business
• Efficient Operations
• Security
[Diagram labels: Domain Knowledge; Science, Math; Data Engineering; Intuition; Algorithms; Interpretation; ETL, Workflows; Application]
Where are the time series?
Image from: http://semanticommunity.info/Data_Science/Doing_Data_Science
Network Analysis on Hadoop: What is it?
Process collected raw data
Scalable graph analysis in distributed heterogeneous environments + time evolution
Multiple data sets of any kind …
Obvious and hidden relations between variables.
> Structure is not accessible in many cases.
From our experience: The gas laws
• The ideal gas law relates the pressure, volume, and temperature of an ideal gas in one compact equation.
History of gas laws: Three names in particular are associated with gas laws.
(1) Robert Boyle (1627 - 1691),
(2) Jacques Charles (1746 - 1823), and
(3) J.L. Gay-Lussac (1778 - 1850).
The gas laws
• Boyle showed that for a fixed amount of gas at constant temperature, the pressure and volume are inversely proportional to one another.
• Boyle's law: PV = constant.
• In Charles' law, it is the pressure that is kept constant. Under this constraint, the volume is proportional to the absolute temperature.
• Charles' law: V1 / T1 = V2 / T2
• When the volume is kept constant, it is the pressure of the gas that is proportional to the absolute temperature:
• Gay-Lussac's law: P1 / T1 = P2 / T2
Indices 1 and 2 represent points in time.
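The three laws can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function names and the sample readings are illustrative assumptions, not part of the talk. Temperatures are taken in kelvin.

```python
def boyle_pressure(p1: float, v1: float, v2: float) -> float:
    """Boyle's law at constant temperature: P1 * V1 = P2 * V2, solved for P2."""
    return p1 * v1 / v2


def charles_volume(v1: float, t1: float, t2: float) -> float:
    """Charles' law at constant pressure: V1 / T1 = V2 / T2, solved for V2."""
    return v1 * t2 / t1


def gay_lussac_pressure(p1: float, t1: float, t2: float) -> float:
    """Gay-Lussac's law at constant volume: P1 / T1 = P2 / T2, solved for P2."""
    return p1 * t2 / t1


# Hypothetical readings at two points in time (index 1 and index 2).
p2_boyle = boyle_pressure(p1=100.0, v1=2.0, v2=4.0)        # volume doubles
v2_charles = charles_volume(v1=2.0, t1=300.0, t2=450.0)    # gas is heated
p2_gay_lussac = gay_lussac_pressure(p1=100.0, t1=300.0, t2=450.0)
```

Each function relates the state at time 1 to the state at time 2, which is the same "two points in time" view that the later slides apply to measured time series.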
Recap:
• We use time-dependent variables to describe the system.
• Relations between the variables are characteristic for a given system.
• Learning or identifying such relations means understanding the system.
• Instead of pressure, volume, and temperature we use:
• IT-Operations:
• I/O rates
• available RAM
• system utilization
• Financial markets:
• trading volume
• price
• volatility
Network Analysis on Hadoop:
Process collected raw data
Analyze results from previous phases
Scalable graph analysis in distributed heterogeneous environments + time evolution
Relations among variables can be expressed as formulas (analytical approach).
A data-driven approach uses pairwise correlations and other statistical measures.
Final results are model parameters, which can be used in analytical models and for forecasting.
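The data-driven approach can be sketched as a pairwise Pearson correlation over a set of named series. This is an illustrative sketch in plain Python, not the Hadoop.TS API; the series names and values are made up, and a constant series is reported as 0.0 by convention because its correlation is undefined.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # undefined for a constant series; 0.0 is a convention here
    return cov / (norm_x * norm_y)


def pairwise_correlations(series):
    """Map each unordered pair of series names to its correlation."""
    return {
        (a, b): pearson(series[a], series[b])
        for a, b in combinations(sorted(series), 2)
    }


# Hypothetical IT-operations measurements on a common time grid.
series = {
    "io_rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "utilization": [2.0, 4.0, 6.0, 8.0, 10.0],
    "free_ram": [5.0, 4.0, 3.0, 2.0, 1.0],
}
links = pairwise_correlations(series)
```

Every entry of `links` is a candidate edge of a correlation network; with N series there are N(N-1)/2 pairs, which is why the pairwise step is the part that needs to scale.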
Time Series Analysis on Hadoop:
• Hadoop.TS provides data containers & operations:
• time series bucket
• time series classes
• transformations
• extractions
• HDGS exposes results as a semantic network in RDF, a flexible and generic format.
Goals of Hadoop.TS:
• Provides abstraction to separate:
• data science from data engineering
• data from algorithms
• results from implementation
• Reuse existing analysis algorithms in data driven applications.
• Build Time Series related Data Products faster.
Time Series:
What is it?
What is a time series?
• y=f(x) … a function?
• Let x be time t: y=f(t)
• A time series is simply a measurement of something as a function of time.
What is t?
• Continuous
• Discrete (fixed points in time with constant distance)
• Unknown points in time
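Events that occur at irregular points in time can be turned into a discrete, equidistant series by counting them per fixed-width interval. The sketch below shows this in plain Python; the function name, the bin width, and the sample timestamps are assumptions chosen for illustration.

```python
def to_equidistant(event_times, start, step, n_bins):
    """Count events per bin [start + k*step, start + (k+1)*step)."""
    counts = [0] * n_bins
    for t in event_times:
        k = int((t - start) // step)
        if 0 <= k < n_bins:  # events outside the observed range are ignored
            counts[k] += 1
    return counts


# Hypothetical event timestamps in seconds, irregularly spaced.
event_times = [0.4, 0.9, 1.1, 3.5, 3.6, 3.7]
counts = to_equidistant(event_times, start=0.0, step=1.0, n_bins=5)
# counts == [2, 1, 0, 3, 0]
```

The bin width fixes the time resolution of the resulting series: anything that happens inside one bin can no longer be ordered.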
Typical Approaches for Time Based Analysis
• Events => single event can be compared with an intent
• No history
• Complex Event Processing
• A series of events
• Needs small amount of historical data
• Continuous time series processing
• Equidistant measures
• Needs huge amount of historical data
From Complex Events to Time Series
• Univariate:
• A series of events / measurements
• Limited by a time range
• CEP: A known pattern
• TSA: A known property such as:
• average, volatility, or other parameters of the distribution of values
• Multivariate:
• CEP: Co-occurrence of events
• TSA: Correlation measures
Why should I care about time series analysis?
“A time series describes a thing over time.”
Many time series describe many things over time.
Correlation networks are derived from time series.
Correlation networks describe systems.
Time Series:
Available in multiple flavors ...
Typical Time Series
(a, c, e) continuous time; (b, d, f) spontaneous events
Transformations: TS > ETS > TS
Networks for structural analysis
What is similar among nodes?
(a) static properties
(b) dynamic properties
Inspection of topological system properties: data quality screening (1)
Visualization of topological structure.
Figures are based on term-vectors, stored in a Lucene Index.
Inspection of static system properties:
data quality screening (1)
• Network nodes are articles, each represented as one term-vector stored in a Lucene index.
• Links are weighted by the pairwise cosine similarity of the term-vectors.
• The Gephi toolkit provides a force-directed layout.
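The link construction described above can be sketched as follows. This is a plain Python illustration with sparse term-vectors held in dictionaries; it does not use the Lucene or Gephi APIs, and the article names, term weights, and the threshold are made-up assumptions.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two sparse term-vectors (term -> weight)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def similarity_links(vectors, threshold):
    """Return (node_a, node_b, similarity) for all pairs at or above threshold."""
    names = sorted(vectors)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine_similarity(vectors[a], vectors[b])
            if s >= threshold:
                links.append((a, b, s))
    return links


# Hypothetical articles with term weights.
articles = {
    "A": {"hadoop": 2.0, "graph": 1.0},
    "B": {"hadoop": 4.0, "graph": 2.0},
    "C": {"finance": 3.0},
}
links = similarity_links(articles, threshold=0.5)
```

Here A and B point in the same direction in term space and are linked, while C shares no terms with either and stays isolated. The resulting edge list is the kind of input a layout tool would consume.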
Inspection of dynamic system properties: data quality screening (2)
Visualization of the context
Comparison of subsystems
Motivation for Hadoop.TS & HDGS
Overview & Concepts
Challenge:
Study properties per time series
Uni-Variate Time Series Analysis
Distribution of values (PDF) …
Warning: Correlations are not visible in a probability distribution chart!
Impact of Long-Term Correlations:
[Figure: PDF]
Warning: Correlations cause non-stationarity.
Detect Long Term Correlation in Time Series
Detrended Fluctuation Analysis (DFA) and Return Interval Statistics
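Detrended Fluctuation Analysis can be sketched in a few steps: integrate the mean-removed series into a profile, split the profile into windows of length n, remove a linear trend in each window, and measure the remaining root-mean-square fluctuation F(n). The slope of log F(n) against log n is the fluctuation exponent. The code below is a minimal first-order DFA in plain Python, written as an assumption-laden illustration rather than the implementation used in Hadoop.TS; the window sizes and the synthetic input are arbitrary choices.

```python
import math
import random


def dfa_fluctuation(values, window):
    """RMS fluctuation of the integrated profile around per-window linear trends."""
    mean = sum(values) / len(values)
    profile, total = [], 0.0
    for v in values:
        total += v - mean
        profile.append(total)

    xs = list(range(window))
    x_mean = sum(xs) / window
    sxx = sum((x - x_mean) ** 2 for x in xs)

    n_windows = len(profile) // window  # a trailing remainder is dropped
    squared_sum = 0.0
    for w in range(n_windows):
        segment = profile[w * window:(w + 1) * window]
        y_mean = sum(segment) / window
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, segment)) / sxx
        intercept = y_mean - slope * x_mean
        squared_sum += sum((y - (slope * x + intercept)) ** 2
                           for x, y in zip(xs, segment))
    return math.sqrt(squared_sum / (n_windows * window))


def dfa_exponent(values, windows):
    """Least-squares slope of log F(n) versus log n."""
    log_n = [math.log(n) for n in windows]
    log_f = [math.log(dfa_fluctuation(values, n)) for n in windows]
    mean_n, mean_f = sum(log_n) / len(log_n), sum(log_f) / len(log_f)
    num = sum((a - mean_n) * (b - mean_f) for a, b in zip(log_n, log_f))
    den = sum((a - mean_n) ** 2 for a in log_n)
    return num / den


# Synthetic uncorrelated noise as a reference case (fixed seed for repeatability).
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_exponent(white_noise, windows=[8, 16, 32, 64, 128])
```

For uncorrelated data the exponent is expected to lie near 0.5; values clearly above 0.5 indicate long-term correlations, which is the property the value distribution alone cannot show.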
More Time Series Properties:
• Is a time series stationary?
• Peak detection
• Find frequency patterns
Images: pixel lines and rows can be handled like time series
Sound files: sound analysis and signal analysis are common in engineering and industry
More Time Series Properties:
• Time Series Models:
• Auto-Regressive (AR)
• Moving average (MA)
• Combined: ARMA
• Extended: ARMA+TOPOLOGICAL INFORMATION (work in progress)
How to get this structural information?
>>> see next part: Multivariate TSA
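To make the AR and MA models listed above concrete, the sketch below simulates a first-order process of each kind and estimates its lag-1 autocorrelation. It assumes Gaussian noise and first-order models only; the function names and parameter values are illustrative and not taken from Hadoop.TS.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """AR(1): x[t] = phi * x[t-1] + e[t], with standard normal noise e."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def simulate_ma1(theta, n, seed=0):
    """MA(1): x[t] = e[t] + theta * e[t-1], with standard normal noise e."""
    rng = random.Random(seed)
    previous = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        current = rng.gauss(0.0, 1.0)
        out.append(current + theta * previous)
        previous = current
    return out


def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


ar_series = simulate_ar1(phi=0.7, n=5000, seed=1)
ma_series = simulate_ma1(theta=0.5, n=5000, seed=1)

# Theory: an AR(1) process has lag-1 autocorrelation phi,
# an MA(1) process has theta / (1 + theta**2).
phi_hat = lag1_autocorrelation(ar_series)
rho_ma = lag1_autocorrelation(ma_series)
```

Both models describe a single series in isolation. Structural information about how series influence each other has to come from the multivariate analysis.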
Information, derived from time series pairs
Multi-Variate Time Series Analysis
But: Multivariate TSA allows you to reconstruct networks.
https://imgs.xkcd.com/comics/compass_and_straightedge.png
Network Reconstruction
• Content Networks:
• Cosine-Similarity
• Functional Network:
• Cross-Correlation
• Event-Synchronization
• Dependency and Impact:
• Granger Causality
• Mutual Information
Question: How can I identify significant links?
Modifications and variations lead to better results in special use cases.
[Figure labels: intra-correlation, inter-correlation]
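One common way to approach the significance question is a surrogate test: compute the link strength for the real pair, then compare it with the strengths obtained after shuffling one of the series, which destroys any temporal relation. The sketch below uses the maximum absolute lagged cross-correlation as link strength. It is an illustration under stated assumptions, not the method of the talk; the lag range, the number of surrogates, the quantile, and the synthetic data are all arbitrary choices.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; 0.0 by convention if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 0.0 if norm_x == 0 or norm_y == 0 else cov / (norm_x * norm_y)


def max_lagged_correlation(x, y, max_lag):
    """Largest absolute correlation over relative shifts in [-max_lag, max_lag]."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:lag], y[-lag:]
        best = max(best, abs(pearson(a, b)))
    return best


def is_significant(x, y, max_lag=3, n_surrogates=200, quantile=0.95, seed=0):
    """Compare the observed link strength with shuffled surrogates of y."""
    rng = random.Random(seed)
    observed = max_lagged_correlation(x, y, max_lag)
    null = []
    for _ in range(n_surrogates):
        shuffled = y[:]
        rng.shuffle(shuffled)
        null.append(max_lagged_correlation(x, shuffled, max_lag))
    null.sort()
    threshold = null[int(quantile * (n_surrogates - 1))]
    return observed > threshold, observed, threshold


# Synthetic pair: the follower repeats the driver two steps later, plus noise.
rng = random.Random(7)
driver = [rng.gauss(0.0, 1.0) for _ in range(300)]
follower = [0.0, 0.0] + [d + 0.3 * rng.gauss(0.0, 1.0) for d in driver[:-2]]

significant, observed, threshold = is_significant(driver, follower)
```

Only pairs whose observed strength exceeds the surrogate threshold would be kept as links. Shuffling is the simplest surrogate; it also destroys autocorrelation, so it can be too permissive for strongly autocorrelated series.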
Get Meaning out of Correlation Metrics …
1D vs. 2D approach: Using multiple independent metrics allows the separation of disjoint groups of node pairs (or links), as shown by areas (A) and (B) in panel b).
Application of Hadoop.TS:
Results
(1) Usage of Online Content
Usage of Online Content
Even if the distribution of links is stable, we see structural changes.
(2) Understand Financial Markets
Interconnected Financial Markets:
We can identify which nodes connect the markets …
HDGS: History & Current Status
Data Flow, Prototype & Architecture Overview
Hadoop.TS
Historical Approach (2012):
Hadoop.TS (2013)
Modern Time Series Analysis:
• End-2-end applications need multiple technologies (HBase, Kudu, SOLR, Spark, Impala)
• Multiple algorithms are combined (Cross-correlation, Rank-correlation, Wavelet analysis, Frequency analysis, Poisson- or Hawkes-process)
• Parameters are often unknown
Enhanced Time Series Representations
TSA on Apache Spark
Time Series Analysis: using spark shell or applications (TSA-workbench)
Hadoop.TS provides domain specific functions.
Etosha exposes metadata and dataset properties as "linked data" using RDF.
Hadoop.TS
Etosha
HDGS: Outlook
... towards an econo-diagnostics toolbox
Hadoop Distributed Graph Space (HDGS)
• Reconstruction of networks
• Profiling of networks
• Support for:
• Multi-layer networks
• Time-dependent multi-layer networks
An Oscilloscope for Business Data on Hadoop …
Enjoy your time ...
Enjoy your data …
Thank you!
Practical Tips
Collecting Sensor Data with Spark Streaming …
• Spark Streaming works on fixed time slices only.
• Use the original time stamp?
• Requires additional storage and bandwidth
• Original system clock defines resolution
• Use "Spark-Time" or a local time reference:
• You may lose information!
• You have a limited resolution, defined by batch size.
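The trade-off between the two choices can be illustrated without Spark. The sketch below models micro-batches as plain Python tuples and counts events once by the time of the batch they arrived in and once by their original time stamp. The data layout, function names, and numbers are assumptions made for illustration; this is not the Spark Streaming API.

```python
from collections import defaultdict


def count_by_batch_time(batches):
    """Every event inherits the time of the micro-batch it arrived in."""
    return {batch_time: len(events) for batch_time, events in batches}


def count_by_event_time(batches, resolution):
    """Events are binned by their original time stamp at the given resolution."""
    counts = defaultdict(int)
    for _, events in batches:
        for event_time in events:
            counts[int(event_time // resolution) * resolution] += 1
    return dict(counts)


# Hypothetical micro-batches: (batch time in s, original event time stamps in s).
batches = [
    (10, [1.2, 3.4, 9.9]),
    (20, [9.95, 12.0]),  # the event at 9.95 s arrived late, in the second batch
]

by_batch = count_by_batch_time(batches)                 # {10: 3, 20: 2}
by_event = count_by_event_time(batches, resolution=5)   # {0: 2, 5: 2, 10: 1}
```

With batch time, the late event at 9.95 s is attributed to the wrong interval and nothing finer than the 10 s batch size can be resolved. With the original time stamp, the resolution is free to choose, at the cost of carrying the time stamp with every record.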
Data Management
• Think about typical access patterns:
• random access to each event, record or field?
• access to entire groups of records?
• variable size or fixed size sets?
• In general, prepare for a "full table scan"
• OPTIMIZE FOR YOUR DOMINANT ACCESS PATTERN!
• Select efficient storage formats: Avro, Parquet
• Index your data in SOLR for random access and data exploration
• Indexing can be done by just a few clicks in HUE …
Visualization of Large Correlation Networks
• How to manage metadata for time-dependent multi-layer networks?
• Mediawiki or Fuseki/Jena are available
• Gephi-Hadoop-Connector provides access to raw data:
• using SQL queries on Impala
• using SOLR queries
Gephi-Hadoop-Connector in Action …
Metadata for Multi-Layer Networks

 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

From Events to Networks: Time Series Analysis on Scale

  • 1. 1© Cloudera, Inc. All rights reserved. Mirko Kämpf | Solutions Architect mirko@cloudera.com From Events to Networks: Apply Time Series Analysis at Scale.
  • 2. Who is speaking? • Mirko Kämpf • Solutions Architect, EMEA • Data Analysis Projects: • Econodiagnostics: Relation between Social Media & Economy • Analysis of network growth processes • GitHub: kamir • gephi-hadoop-connector: store networks in Hadoop and plot layouts in Gephi • fuseki-cloud: scale out the RDF meta(data)store • Hadoop.TS3: simplify complex time series analysis processes
  • 3. Agenda: • Recap: The Data Science Process (DSP) • Time Series: What, Why, How? • What are Similarity Graphs? • Applications of TSA • Hadoop.TS and HDGS • HDGS: History & High Level Architecture • Outlook
  • 4. Time Series Analysis on Hadoop: • Data Driven Business: • Domain Knowledge, Science, Math Data Engineering • Efficient Operations • Security Intuition Algorithms Interpretation ETL, Workflows Application
  • 5. Where are the time series? Image from: http://semanticommunity.info/Data_Science/Doing_Data_Science
  • 6. Where are the time series? Image from: http://semanticommunity.info/Data_Science/Doing_Data_Science
  • 7. Network Analysis on Hadoop: What is it? Process collected raw data scalable graph analysis in distributed heterogeneous environments + time evolution Multiple data sets of any kind … Obvious and hidden relations between variables. > Structure is not accessible in many cases.
  • 8. From our experience: The gas laws • The ideal gas law relates the pressure, volume, and temperature of an ideal gas in a compact equation. History of gas laws: Three names in particular are associated with gas laws. (1) Robert Boyle (1627 - 1691), (2) Jacques Charles (1746 - 1823), and (3) J.L. Gay-Lussac (1778 - 1850).
  • 9. The gas laws • Boyle showed that for a fixed amount of gas at constant temperature, the pressure and volume are inversely proportional to one another. • Boyle's law : PV = constant. • In Charles' law, it is the pressure that is kept constant. Under this constraint, the volume is proportional to the temperature. • Charles' law : V1 / T1 = V2 / T2 • When the volume is kept constant, it is the pressure of the gas that is proportional to temperature: • Gay-Lussac's law : P1 / T1 = P2 / T2 Indices 1 and 2 represent points in time.
  • 10. Recap: • We use time dependent variables to describe the system. • Relations between the variables are characteristic for a given system. • Learning or identifying such relations means understanding the system. • Instead of pressure, volume, and temperature we use: • IT-Operations: • I/O rates • available RAM • system utilization • Financial markets: • trading volume • price • volatility
  • 11. Network Analysis on Hadoop: Process collected raw data Analyze results from previous phases scalable graph analysis in distributed heterogeneous environments + time evolution Relations among variables can be expressed as formulas (analytical approach). A data driven approach uses pairwise correlations and other statistical measures. Final results are model parameters, which can be used in analytical models and for forecasting.
  • 12. Network Analysis on Hadoop: Process collected raw data Analyze results from previous phases scalable graph analysis in distributed heterogeneous environments + time evolution
  • 13. Time Series Analysis on Hadoop: • Hadoop.TS provides data containers & operations: • time series bucket • time series classes • transformations • extractions • HDGS exposes results as a semantic network in a flexible, generic format based on RDF
  • 14. Goals of Hadoop.TS: • Provides abstraction to separate: • data science from data engineering • data from algorithms • results from implementation • Reuse existing analysis algorithms in data driven applications. • Build Time Series related Data Products faster.
  • 15. Time Series: What is it?
  • 16. What is a time series? • y=f(x) … a function? • Let x be time t: y=f(t) • A time series is simply a measure of something as a function of time.
  • 17. What is a time series? • y=f(x) … a function? • Let x be time t: y=f(t) • A time series is simply a measure of something as a function of time. What is t? • Continuous • Discrete (fixed points in time with constant distance) • Unknown points in time
  • 18. Typical Approaches for Time Based Analysis • Events => single event can be compared with an intent • No history • Complex Event Processing • A series of events • Needs small amount of historical data • Continuous time series processing • Equidistant measures • Needs huge amount of historical data
  • 19. From Complex Events to Time Series • Univariate: • A series of events / measurements • Limited by a time range • CEP: A known pattern • TSA: A known property such as: • average, volatility, or other parameters of the distribution of values • Multivariate: • CEP: Co-occurrence of events • TSA: Correlation measures
  • 20. Why should I care about time series analysis? “A time series describes a thing over time.” Many time series describe many things over time.
  • 21. Why should I care about time series analysis? “A time series describes a thing over time.” Many time series describe many things over time. Correlation networks are derived from time series.
  • 22. Why should I care about time series analysis? “A time series describes a thing over time.” Many time series describe many things over time. Correlation networks are derived from time series. Correlation networks describe systems.
  • 23. Time Series: Available in multiple flavors ...
  • 24. Typical Time Series (a,c,e) continuous time (b,d,f) spontaneous events
  • 25. Transformations: TS > ETS > TS
  • 26. Networks for structural analysis What is similar among nodes? (a) static properties (b) dynamic properties
  • 27. Inspection of topological system properties: data quality screening (1) Visualization of topological structure. Figures are based on term-vectors, stored in a Lucene index.
  • 28. Inspection of static system properties: data quality screening (1) • Network nodes are articles (represented as term-vectors). One term-vector per article: … stored in a Lucene index. • Links are given by pairwise distance: cosine-similarity. • The Gephi toolkit provides a force-directed layout.
  • 29. Inspection of dynamic system properties: data quality screening (2) Visualization of the context. Comparison of subsystems.
  • 30. Motivation for Hadoop.TS & HDGS: Overview & Concepts
  • 31. Challenge:
  • 32. Uni-Variate Time Series Analysis: Study properties per time series
  • 33. Distribution of values (PDF) … Warning: Correlations are not visible in a probability distribution chart!
  • 34. Impact of Long-Term Correlations (PDF chart). Warning: Correlations cause non-stationarity.
  • 35. Detect Long Term Correlation in Time Series: Detrended Fluctuation Analysis, Return Interval Statistics
  • 36. More Time Series Properties: • Is a time series stationary? • Peak detection • Find frequency patterns Images: - pixel lines and rows can be handled like time series Sound files: - sound analysis and signal analysis are common in engineering and industry
  • 37. More Time Series Properties: • Time Series Models: • Auto-Regressive (AR) • Moving average (MA) • Combined: ARMA • Extended: ARMA+TOPOLOGICAL INFORMATION (work in progress) How to get this structural information? >>> see next part: Multivariate TSA
  • 38. Multi-Variate Time Series Analysis: Information derived from time series pairs
  • 39. https://imgs.xkcd.com/comics/compass_and_straightedge.png
  • 40. But: Multivariate TSA allows you … to reconstruct networks. https://imgs.xkcd.com/comics/compass_and_straightedge.png
  • 41. Network Reconstruction • Content Networks: • Cosine-Similarity • Functional Network: • Cross-Correlation • Event-Synchronization • Dependency and Impact: • Granger Causality • Mutual Information Question: How can I identify significant links? Modifications and variations lead to better results in special use cases. Figure labels: INTRA CORRELATION, INTRA CORRELATION, INTER CORRELATION
  • 42.
  • 43. Get Meaning out of Correlation Metrics … 1D vs. 2D approach: Using multiple independent metrics allows separation of disjoint groups of node pairs (or links), as shown by areas (A) and (B) in panel b).
  • 44. Application of Hadoop.TS: Results
  • 45. (1) Usage of Online Content
  • 46. Usage of Online Content: Even if the distribution of links is stable, we see structural changes.
  • 47. (2) Understand Financial Markets
  • 48. Interconnected Financial Markets: We can identify which nodes connect the markets …
  • 49. HDGS: History & Current Status. Data Flow, Prototype & Architecture Overview
  • 50. Hadoop.TS Historical Approach (2012):
  • 51. Hadoop.TS (2013)
  • 52. Modern Time Series Analysis: • End-to-end applications need multiple technologies (HBase, Kudu, SOLR, Spark, Impala) • Multiple algorithms are combined (Cross-correlation, Rank-correlation, Wavelet analysis, Frequency analysis, Poisson- or Hawkes-process) • Parameters are often unknown
  • 53. Enhanced Time Series Representations
  • 54. TSA on Apache Spark. Time Series Analysis: using spark shell or applications (TSA-workbench). Hadoop.TS provides domain specific functions. Etosha exposes metadata and dataset properties as "linked data" using RDF. Figure labels: Hadoop.TS, Etosha
  • 55. HDGS: Outlook ... towards an econo-diagnostics toolbox
  • 56. Hadoop Distributed Graph Space (HDGS) • Reconstruction of networks • Profiling of networks • Support for: • Multi-layer networks • Time-dependent multi-layer networks
  • 57.
  • 58. An Oscilloscope for Business Data on Hadoop …
  • 59. Replace by screen shots ...
  • 60. Enjoy your time ... Enjoy your data … Thank you !
  • 61. Practical Tips
  • 62. Collecting Sensor Data with Spark Streaming … • Spark Streaming works on fixed time slices only. • Use the original time stamp? • Requires additional storage and bandwidth • Original system clock defines resolution • Use "Spark-Time" or a local time reference: • You may lose information! • You have a limited resolution, defined by batch size.
  • 63. Data Management • Think about typical access patterns: • random access to each event, record or field? • access to entire groups of records? • variable size or fixed size sets? • In general, prepare for "full table scan" • OPTIMIZE FOR YOUR DOMINANT ACCESS PATTERN! • Select efficient storage formats: Avro, Parquet • Index your data in SOLR for random access and data exploration • Indexing can be done with just a few clicks in HUE …
  • 64. Visualization of Large Correlation Networks • How to manage metadata for time dependent multi-layer networks? • Mediawiki or Fuseki/Jena are available • Gephi-Hadoop-Connector provides access to raw data: • using SQL queries on Impala • using SOLR queries
  • 65. Gephi-Hadoop-Connector in Action …
  • 66. Metadata for Multi-Layer Networks
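
The network reconstruction step named on slide 41 (a functional network from cross-correlation) can be illustrated with a small sketch. It is not taken from Hadoop.TS or HDGS: the function names `pearson` and `correlation_network`, the zero-lag Pearson coefficient, the synthetic series, and the fixed threshold of 0.8 are all assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Zero-lag Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(series, threshold=0.8):
    """Return edges (i, j, r) for all pairs with |r| >= threshold.

    The threshold is a hypothetical cutoff, not a value from the slides.
    """
    edges = []
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            r = pearson(series[i], series[j])
            if abs(r) >= threshold:
                edges.append((i, j, r))
    return edges

# Synthetic data: nodes 0 and 1 share a common driver, node 2 is pure noise.
random.seed(0)
n = 500
base = [math.sin(2 * math.pi * t / 50) for t in range(n)]
series = [
    [b + random.gauss(0, 0.1) for b in base],  # node 0
    [b + random.gauss(0, 0.1) for b in base],  # node 1: same driver as node 0
    [random.gauss(0, 1) for _ in range(n)],    # node 2: independent noise
]
edges = correlation_network(series)
print(edges)  # one edge between nodes 0 and 1 is expected
```

A fixed threshold is the simplest possible answer to the slide's question about significant links. A more careful approach might compare each coefficient against values obtained from shuffled copies of the series, which this sketch does not attempt.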

Editor's notes

  1. It all starts with a question or problem: How has … changed? (descriptive) What will happen if … changes? (impact) How will … evolve? (forecast) Domain knowledge and intuition help us find a starting point. TSA offers multiple specialties; one has to select the right ingredients.
  2. Source for info is: Measured data from …. http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  3. Results tell us about very specific properties of the system. Let's look into thermodynamics: http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  4. Results tell us about very specific properties of the system. Let's look into thermodynamics: http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  5. There are some open questions … (see yellow bubble)
  6. The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary: $Y_t = (1-L)^d X_t$, while the second is wide-sense stationary: $\left(1-\sum_{i=1}^{p}\phi_i L^i\right) Y_t = \left(1+\sum_{i=1}^{q}\theta_i L^i\right)\varepsilon_t$. Now forecasts can be made for the process $Y_t$, using a generalization of the method of autoregressive forecasting.
  7. The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary: $Y_t = (1-L)^d X_t$, while the second is wide-sense stationary: $\left(1-\sum_{i=1}^{p}\phi_i L^i\right) Y_t = \left(1+\sum_{i=1}^{q}\theta_i L^i\right)\varepsilon_t$. Now forecasts can be made for the process $Y_t$, using a generalization of the method of autoregressive forecasting.
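
The cascade described in the ARIMA notes can be made concrete with a minimal sketch for the simplest non-trivial case. It assumes p = 1, d = 1, q = 0, no intercept, and a plain least-squares estimate of the AR coefficient. The function names `difference`, `fit_ar1`, and `forecast_next` are hypothetical and are not part of Hadoop.TS.

```python
def difference(x, d=1):
    """Apply (1 - L)^d: difference the series d times."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

def fit_ar1(y):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + eps_t (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(y, y[1:]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den

def forecast_next(x):
    """One-step forecast for x under an assumed ARIMA(1,1,0) model."""
    y = difference(x, d=1)   # first stage: Y_t = (1 - L) X_t
    phi = fit_ar1(y)         # second stage: AR(1) on the differenced series
    y_next = phi * y[-1]     # forecast the differenced process
    return x[-1] + y_next    # undo the differencing to get back to X

# Noise-free toy series whose increments halve at every step: 8, 4, 2, 1, 0.5
x = [0.0, 8.0, 12.0, 14.0, 15.0, 15.5]
print(forecast_next(x))  # 15.75
```

On this toy series the estimated coefficient is exactly 0.5, so the next increment is 0.25. Real data would call for order selection and a moving-average part, which the sketch leaves out.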