© Cloudera, Inc. All rights reserved.
Mirko Kämpf | Solutions Architect
mirko@cloudera.com
From Events to Networks:
Apply Time Series Analysis at Scale.
Who is speaking?
• Mirko Kämpf
• Solutions Architect, EMEA
• Data Analysis Projects:
• Econodiagnostics: Relation between Social Media & Economy
• Analysis of network growth processes
• Github: kamir
• gephi-hadoop-connector: store networks in Hadoop and plot layouts in Gephi
• fuseki-cloud: scale out the RDF meta(data)store
• Hadoop.TS3: simplify complex time series analysis processes
Agenda
• Recap: The Data Science Process (DSP)
• Time Series: What, Why, How?
• What are Similarity Graphs?
• Applications of TSA
• Hadoop.TS and HDGS
• HDGS: History & High Level Architecture
• Outlook
Time Series Analysis on Hadoop:
• Data Driven Business:
Domain Knowledge,
Science, Math
Data Engineering
• Efficient Operations
Security
Intuition
Algorithms
Interpretation
ETL,
Workflows
Application
Where are the time series?
Image from: http://semanticommunity.info/Data_Science/Doing_Data_Science
Network Analysis on Hadoop: What is it?
Process collected
raw data
scalable graph
analysis in
distributed
heterogeneous
environments
+ time evolution
Multiple data sets of any kind …
Obvious and hidden relations between variables.
> Structure is not accessible in many cases.
From our experience: The gas laws
• The ideal gas law relates the pressure, volume, and temperature of an ideal gas in a
compact equation.
History of gas laws: Three names in particular are associated with gas laws.
(1) Robert Boyle (1627 - 1691),
(2) Jacques Charles (1746 - 1823), and
(3) J.L. Gay-Lussac (1778 - 1850).
The gas laws
• Boyle showed that for a fixed amount of gas at constant temperature, the
pressure and volume are inversely proportional to one another.
• Boyle's law : PV = constant.
• In Charles' law, it is the pressure that is kept constant. Under this
constraint, the volume is proportional to the temperature.
• Charles' law : V1 / T1 = V2 / T2
• When the volume is kept constant, it is the pressure of the gas that is
proportional to temperature:
• Gay-Lussac's law : P1 / T1 = P2 / T2
Indices 1 and 2 represent points in time.
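As a small worked example of the two proportionality laws, the sketch below solves each for the state at the second point in time. The function names and the sample values are illustrative assumptions and do not come from the slides; temperatures are assumed to be absolute (Kelvin).

```python
def charles_volume(v1, t1, t2):
    """Charles' law at constant pressure: V1 / T1 = V2 / T2, solved for V2."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("temperatures must be absolute (Kelvin) and positive")
    return v1 * t2 / t1


def gay_lussac_pressure(p1, t1, t2):
    """Gay-Lussac's law at constant volume: P1 / T1 = P2 / T2, solved for P2."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("temperatures must be absolute (Kelvin) and positive")
    return p1 * t2 / t1


# Hypothetical sample values: heating a gas from 300 K to 450 K or 330 K.
v2 = charles_volume(2.0, 300.0, 450.0)          # volume at time 2
p2 = gay_lussac_pressure(100.0, 300.0, 330.0)   # pressure at time 2
print(v2, p2)
```

Both functions only rearrange the formulas above; they say nothing about how the measurements at times 1 and 2 are obtained.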
Recap:
• We use time-dependent variables to describe the system.
• Relations between the variables are characteristic for a given system.
• Learning or identifying such relations means understanding the system.
• Instead of pressure, volume, and temperature we use:
• IT-Operations:
• I/O rates
• available RAM
• system utilization
• Financial markets:
• trading volume
• price
• volatility
Network Analysis on Hadoop:
Process collected
raw data
Analyze results from
previous phases
scalable graph
analysis in
distributed
heterogeneous
environments
+ time evolution
Relations among variables can be expressed as
formulas. (analytical approach)
A data driven approach uses pairwise correlations
and other statistical measures.
Final results are model parameters, which can be
used in analytical models and for forecasting.
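The data-driven approach can be sketched as follows: compute the Pearson correlation for every pair of variables and keep the pairs whose absolute correlation exceeds a threshold. This is a plain-Python illustration, not the Hadoop.TS implementation; the variable names, the sample values, and the threshold of 0.9 are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_links(series, threshold):
    """Return (name_a, name_b, r) for all pairs with |r| >= threshold."""
    names = sorted(series)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                links.append((a, b, r))
    return links


# Hypothetical IT-operations measurements, one value per time step.
series = {
    "io_rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cpu_load": [2.0, 4.0, 6.0, 8.0, 10.0],
    "free_ram": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noise": [1.0, -1.0, 0.0, 1.0, -1.0],
}
links = correlation_links(series, threshold=0.9)
for a, b, r in links:
    print(a, b, round(r, 3))
```

The number of pairs grows quadratically with the number of variables, which is one reason to run this step on a distributed platform.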
Time Series Analysis on Hadoop:
• Hadoop.TS provides data
containers & operations:
• time series bucket
• time series classes
• transformations
• extractions
• HDGS exposes results as a
semantic network,
using RDF as a flexible and
generic format
Goals of Hadoop.TS:
• Provides abstraction to separate:
• data science from data engineering
• data from algorithms
• results from implementation
• Reuse existing analysis algorithms in data driven applications.
• Build Time Series related Data Products faster.
Time Series:
What is it?
What is a time series?
• y=f(x) … a function?
• Let x be time t: y=f(t)
• A time series is simply a measure of something as a function of time.
What is t?
• Continuous
• Discrete (fixed points in time with constant distance)
• Unknown points in time
Typical Approaches for Time Based Analysis
• Events => single event can be compared with an intent
• No history
• Complex Event Processing
• A series of events
• Needs small amount of historical data
• Continuous time series processing
• Equidistant measures
• Needs huge amount of historical data
From Complex Events to Time Series
• Univariate:
• A series of events / measurements
• Limited by a time range
• CEP: A known pattern
• TSA: A known property such as:
• average, volatility, or other parameters of the distribution of values
• Multivariate:
• CEP: Co-occurrence of events
• TSA: Correlation measures
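A minimal sketch of the step from a series of events to an equidistant time series: events inside a time range are counted per fixed-width bin, and the resulting series is summarized by its average and volatility (here the population standard deviation). The function names, the bin width, and the sample timestamps are assumptions for illustration.

```python
import math


def events_to_series(timestamps, start, end, step):
    """Count events per bin [start + k*step, start + (k+1)*step) within [start, end)."""
    if step <= 0 or end <= start:
        raise ValueError("need step > 0 and end > start")
    n_bins = math.ceil((end - start) / step)
    counts = [0] * n_bins
    for t in timestamps:
        if start <= t < end:
            # min() guards against floating point rounding at the upper edge
            index = min(int((t - start) // step), n_bins - 1)
            counts[index] += 1
    return counts


def mean_and_volatility(values):
    """Average and population standard deviation of a series."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


# Hypothetical event timestamps in seconds, binned into 2-second intervals.
events = [0.5, 1.2, 1.7, 3.1, 3.2, 3.9, 7.5]
counts = events_to_series(events, start=0.0, end=8.0, step=2.0)
mean, volatility = mean_and_volatility(counts)
print(counts, mean, round(volatility, 3))
```

The bin width decides how much of the original timing survives: wide bins give smooth series but hide bursts, narrow bins give sparse series dominated by zeros.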
Why should I care about time series analysis?
“A time series describes a thing over time.”
Many time series describe many things over time.
Correlation networks are derived from time series.
Correlation networks describe systems.
Time Series:
Available in multiple flavors ...
Typical Time Series
(a,c,e) continuous time (b,d,f) spontaneous events
Transformations: TS > ETS > TS
Networks for structural analysis
What is similar among nodes?
(a) static properties
(b) dynamic properties
Visualization of topological structure.
Figures are based on term-vectors, stored in a Lucene Index.
Inspection of topological system properties:
data quality screening (1)
Inspection of static system properties:
data quality screening (1)
• Network nodes are articles (represented as term-vectors).
One term-vector per article, stored in a Lucene index.
• Links are given by pairwise cosine similarity.
• The Gephi toolkit provides a force-directed layout.
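The link computation described above can be sketched without Lucene or Gephi: each article becomes a term-frequency vector, and two articles are linked when their cosine similarity reaches a threshold. The plain `Counter` stands in for a Lucene term vector; the article texts, the whitespace tokenizer, and the threshold of 0.5 are assumptions for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency vector of a text, using a naive whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_links(articles, threshold):
    """Return (id_a, id_b, weight) for article pairs with similarity >= threshold."""
    vectors = {name: term_vector(text) for name, text in articles.items()}
    names = sorted(vectors)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = cosine_similarity(vectors[a], vectors[b])
            if weight >= threshold:
                links.append((a, b, weight))
    return links


# Hypothetical article texts.
articles = {
    "a1": "hadoop time series",
    "a2": "time series analysis",
    "a3": "gephi graph layout",
}
links = similarity_links(articles, threshold=0.5)
print(links)
```

The resulting weighted edge list is the kind of input a force-directed layout expects.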
Visualization of the context
Comparison of subsystems
Inspection of dynamic system properties:
data quality screening (2)
Motivation for Hadoop.TS & HDGS
Overview & Concepts
Challenge:
Study properties per time series
Uni-Variate Time Series Analysis
Distribution of values (PDF) …
Warning: Correlations are not visible in a probability distribution chart!
Impact of Long-Term-Correlations:
PDF
Warning: Correlations cause non-stationarity.
Detect Long Term Correlation in Time Series
• Detrended Fluctuation Analysis
• Return Interval Statistics
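A compact sketch of first-order Detrended Fluctuation Analysis (DFA) in plain Python, assuming the common textbook procedure: integrate the mean-free series into a profile, split it into non-overlapping windows of size s, remove a linear trend per window, and fit the slope of log F(s) versus log s. The function names, the window sizes, and the synthetic test signals are assumptions; this is not the Hadoop.TS implementation, and return interval statistics are not covered here.

```python
import math
import random


def detrended_ms(segment):
    """Mean squared residual after removing a least-squares line from a segment."""
    n = len(segment)
    x_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (i - x_mean)) ** 2
               for i, y in enumerate(segment)) / n


def dfa_exponent(values, scales):
    """Estimate the DFA fluctuation exponent alpha from F(s) ~ s**alpha."""
    mean = sum(values) / len(values)
    profile, total = [], 0.0
    for v in values:  # cumulative sum of the mean-free series
        total += v - mean
        profile.append(total)

    points = []
    for s in scales:
        n_seg = len(profile) // s
        if s < 4 or n_seg < 2:
            continue  # too few points for a meaningful fit
        ms = [detrended_ms(profile[k * s:(k + 1) * s]) for k in range(n_seg)]
        f = math.sqrt(sum(ms) / n_seg)
        if f > 0:
            points.append((math.log(s), math.log(f)))
    if len(points) < 2:
        raise ValueError("not enough usable scales")

    # Least-squares slope of log F(s) against log s.
    lx = sum(p[0] for p in points) / len(points)
    ly = sum(p[1] for p in points) / len(points)
    num = sum((x - lx) * (y - ly) for x, y in points)
    den = sum((x - lx) ** 2 for x, _ in points)
    return num / den


# Synthetic signals: uncorrelated noise and its running sum (a random walk).
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
walk, level = [], 0.0
for v in noise:
    level += v
    walk.append(level)

scales = [8, 16, 32, 64, 128, 256]
alpha_noise = dfa_exponent(noise, scales)
alpha_walk = dfa_exponent(walk, scales)
print(round(alpha_noise, 2), round(alpha_walk, 2))
```

For uncorrelated data the exponent is expected to be close to 0.5 and for a random walk close to 1.5; values between 0.5 and 1 are commonly read as a sign of long-term correlations.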
More Time Series Properties:
• Is a time series stationary?
• Peak detection
• Find frequency patterns
Images:
- pixel lines and rows can be handled like time series
Sound files:
- sound analysis and signal analysis are common in
engineering and industry
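Finding frequency patterns can be illustrated with a direct discrete Fourier transform that reports the period with the highest spectral power. The sketch below uses only the standard library and a quadratic-time transform, which is fine for short series; the function name and the synthetic signal are assumptions, and a production setup would normally use an FFT library instead.

```python
import cmath
import math


def dominant_period(values):
    """Return the period (in samples) of the strongest frequency component."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # drop the constant component
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None  # constant series: no frequency pattern
    return n / best_k


# Synthetic signal: a strong 16-sample cycle plus a weaker 5-sample cycle.
signal = [math.sin(2 * math.pi * t / 16) + 0.3 * math.sin(2 * math.pi * t / 5)
          for t in range(128)]
print(dominant_period(signal))
```

The same function can be applied to a pixel row of an image or to a window of audio samples, since both are just sequences of numbers.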
More Time Series Properties:
• Time Series Models:
• Auto-Regressive (AR)
• Moving average (MA)
• Combined: ARMA
• Extended: ARMA+TOPOLOGICAL INFORMATION (work in progress)
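To make the auto-regressive model concrete, the sketch below simulates an AR(1) process x[t] = phi * x[t-1] + e[t] with Gaussian noise and recovers phi by least squares. The function names, the coefficient 0.7, and the series length are assumptions for illustration; MA and ARMA models need more elaborate estimators and are not shown.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Generate n values of x[t] = phi * x[t-1] + e[t] with standard normal e[t]."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def estimate_ar1(x):
    """Least-squares estimate of phi from consecutive pairs (x[t-1], x[t])."""
    num = sum(curr * prev for curr, prev in zip(x[1:], x[:-1]))
    den = sum(prev * prev for prev in x[:-1])
    if den == 0:
        raise ValueError("series has no variation")
    return num / den


series = simulate_ar1(phi=0.7, n=5000, seed=1)
phi_hat = estimate_ar1(series)
print(round(phi_hat, 3))
```

The estimate should land near the simulated coefficient, with an error that shrinks as the series gets longer.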
How to get this structural information?
>>> see next part: Multivariate TSA
Information, derived from time series pairs
Multi-Variate Time Series Analysis
https://imgs.xkcd.com/comics/compass_and_straightedge.png
But: Multivariate TSA allows you to reconstruct networks.
Network Reconstruction
• Content Networks:
• Cosine-Similarity
• Functional Network:
• Cross-Correlation
• Event-Synchronization
• Dependency and Impact:
• Granger Causality
• Mutual Information
Question:
How can I identify significant links?
Modifications and variation lead to
better results in special use cases.
Figure labels: INTRA CORRELATION, INTER CORRELATION
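One possible way to approach the question of significant links, sketched under simple assumptions: take the strongest lagged cross-correlation between two series as the link strength and compare it with the same quantity computed on shuffled surrogates. The function names, the maximum lag, the number of surrogates, and the 95 % quantile are all assumptions for illustration. Shuffling destroys autocorrelation, so this null model is only adequate for series without strong memory.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def max_lagged_correlation(x, y, max_lag):
    """Return (lag, r) with the largest |r|; lag > 0 means y follows x."""
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def is_significant(x, y, max_lag, n_surrogates=100, quantile=0.95, seed=0):
    """Compare the observed link strength with shuffled-surrogate link strengths."""
    rng = random.Random(seed)
    observed = abs(max_lagged_correlation(x, y, max_lag)[1])
    null = []
    for _ in range(n_surrogates):
        shuffled = list(y)
        rng.shuffle(shuffled)
        null.append(abs(max_lagged_correlation(x, shuffled, max_lag)[1]))
    null.sort()
    threshold = null[int(quantile * (len(null) - 1))]
    return observed > threshold


# Synthetic pair: y repeats x three steps later, plus a little noise.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [(x[t - 3] if t >= 3 else 0.0) + 0.1 * rng.gauss(0.0, 1.0) for t in range(300)]

lag, r = max_lagged_correlation(x, y, max_lag=5)
print(lag, round(r, 3), is_significant(x, y, max_lag=5))
```

Applying the test to every node pair yields the link list of a functional network; the other measures named above (event synchronization, Granger causality, mutual information) would replace `max_lagged_correlation` as the link strength.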
Get Meaning out of Correlation Metrics …
1D vs. 2D approach: Using multiple independent metrics allows the separation of disjoint groups of
node pairs (or links), as shown by areas (A) and (B) in panel b).
Application of Hadoop.TS:
Results
(1) Usage of Online Content
Usage of Online Content
Even if the distribution of links is stable, we see structural changes.
(2) Understand Financial Markets
Interconnected Financial Markets:
We can identify which nodes connect the markets …
HDGS: History & Current Status
Data Flow, Prototype & Architecture Overview
Hadoop.TS
Historical Approach (2012):
Hadoop.TS (2013)
Modern Time Series Analysis:
• End-2-end applications need multiple technologies (HBase, Kudu, SOLR, Spark, Impala)
• Multiple algorithms are combined (Cross-correlation, Rank-correlation, Wavelet analysis, Frequency analysis, Poisson- or Hawkes-process)
• Parameters are often unknown
Enhanced Time Series Representations
TSA on Apache Spark
Time Series Analysis: using spark shell or applications (TSA-workbench)
Hadoop.TS provides domain specific functions.
Etosha exposes metadata and dataset properties as "linked data" using RDF.
Hadoop.TS
Etosha
HDGS: Outlook
... towards an econo-diagnostics toolbox
Hadoop Distributed Graph Space (HDGS)
• Reconstruction of networks
• Profiling of networks
• Support for:
• Multi-layer networks
• Time-dependent multi-layer
networks
An Oscilloscope for Business Data on Hadoop …
Replace by screen shots ...
Enjoy your time ...
Enjoy your data …
Thank you !
Practical Tips
Collecting Sensor Data with Spark Streaming …
• Spark Streaming works on fixed time slices only.
• Use the original time stamp?
• Requires additional storage and bandwidth
• Original system clock defines resolution
• Use "Spark-Time" or a local time reference:
• You may lose information!
• You have a limited resolution, defined by batch size.
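The loss of resolution can be illustrated without any Spark API: if every event is stamped with the start of the batch interval it arrived in, all events within one batch collapse onto the same time. The function name, the 5-second batch size, and the sample timestamps are assumptions for illustration; integer milliseconds avoid floating point edge cases.

```python
def batch_start_ms(event_ms, batch_ms):
    """Start time of the fixed batch interval that contains the event."""
    if batch_ms <= 0:
        raise ValueError("batch size must be positive")
    return (event_ms // batch_ms) * batch_ms


# Hypothetical sensor events with their original time stamps (milliseconds).
events_ms = [10200, 10900, 11400, 14990, 15010]
batch_ms = 5000

assigned = [batch_start_ms(t, batch_ms) for t in events_ms]
original_gaps = [b - a for a, b in zip(events_ms, events_ms[1:])]
assigned_gaps = [b - a for a, b in zip(assigned, assigned[1:])]

print(assigned)
print(original_gaps)
print(assigned_gaps)
```

The last two events are only 20 ms apart in the original data but appear a full batch interval apart after the assignment, while the first four become indistinguishable in time.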
Data Management
• Think about typical access patterns:
• random access to each event, record or field?
• access to entire groups of records?
• variable size or fixed size sets?
• In general, prepare for a "full table scan"
• OPTIMIZE FOR YOUR DOMINANT ACCESS PATTERN!
• Select efficient storage formats: Avro, Parquet
• Index your data in SOLR for random access and data exploration
• Indexing can be done with just a few clicks in HUE …
Visualization of
Large Correlation Networks
• How to manage metadata for time dependent
multi-layer networks?
• Mediawiki or Fuseki/Jena are available
• Gephi-Hadoop-Connector provides access
to raw data:
• using SQL queries on Impala
• using SOLR queries
Gephi-Hadoop-Connector in Action …
Metadata for Multi-Layer Networks

 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
 
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
 
mysql_pn_heatwave.pdf
mysql_pn_heatwave.pdfmysql_pn_heatwave.pdf
mysql_pn_heatwave.pdf
 

Mehr von Dr. Mirko Kämpf

Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale  (Strata Data NYC)Improving computer vision models at scale  (Strata Data NYC)
Improving computer vision models at scale (Strata Data NYC)Dr. Mirko Kämpf
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentationDr. Mirko Kämpf
 
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapEtosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapDr. Mirko Kämpf
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4Dr. Mirko Kämpf
 
Information Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation OptimizationInformation Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation OptimizationDr. Mirko Kämpf
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems ResearchDr. Mirko Kämpf
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"Dr. Mirko Kämpf
 

Mehr von Dr. Mirko Kämpf (9)

Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale  (Strata Data NYC)Improving computer vision models at scale  (Strata Data NYC)
Improving computer vision models at scale (Strata Data NYC)
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentation
 
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapEtosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road map
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4
 
Information Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation OptimizationInformation Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation Optimization
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
 

Kürzlich hochgeladen

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Kürzlich hochgeladen (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

From Events to Networks: Time Series Analysis on Scale

  • 1. 1© Cloudera, Inc. All rights reserved. Mirko Kämpf | Solutions Architect mirko@cloudera.com From Events to Networks: Apply Time Series Analysis at Scale.
  • 2. Who is speaking? • Mirko Kämpf • Solutions Architect, EMEA • Data Analysis Projects: • Econodiagnostics: Relation between Social Media & Economy • Analysis of network growth processes • GitHub: kamir • gephi-hadoop-connector: store networks in Hadoop and plot layouts in Gephi • fuseki-cloud: scale out the RDF meta(data)store • Hadoop.TS3: simplify complex time series analysis processes
  • 3. Agenda • Recap: The Data Science Process (DSP) • Time Series: What, Why, How? • What are Similarity Graphs? • Applications of TSA • Hadoop.TS and HDGS • HDGS: History & High Level Architecture • Outlook
  • 4. Time Series Analysis on Hadoop: • Data Driven Business • Efficient Operations • Security. Diagram labels: Domain Knowledge, Science, Math; Data Engineering; Intuition; Algorithms; Interpretation; ETL, Workflows; Application
  • 5. Where are the time series? Image from: http://semanticommunity.info/Data_Science/Doing_Data_Science
  • 6. Where are the time series? Image from: http://semanticommunity.info/Data_Science/Doing_Data_Science
  • 7. Network Analysis on Hadoop: What is it? • Process collected raw data • Scalable graph analysis in distributed heterogeneous environments + time evolution • Multiple data sets of any kind … • Obvious and hidden relations between variables. > Structure is not accessible in many cases.
  • 8. From our experience: The gas laws • The ideal gas law relates the pressure, volume, and temperature of an ideal gas in a compact equation. • History of gas laws: Three names in particular are associated with gas laws: (1) Robert Boyle (1627 - 1691), (2) Jacques Charles (1746 - 1823), and (3) J.L. Gay-Lussac (1778 - 1850).
  • 9. The gas laws • Boyle showed that for a fixed amount of gas at constant temperature, the pressure and volume are inversely proportional to one another. • Boyle's law: PV = constant. • In Charles' law, it is the pressure that is kept constant. Under this constraint, the volume is proportional to the temperature. • Charles' law: V1 / T1 = V2 / T2 • When the volume is kept constant, it is the pressure of the gas that is proportional to temperature: • Gay-Lussac's law: P1 / T1 = P2 / T2 • Indices 1 and 2 represent points in time.
  • 10. Recap: • We use time dependent variables to describe the system. • Relations between the variables are characteristic for a given system. • Learning or identifying such relations means understanding the system. • Instead of pressure, volume, and temperature we use: • IT-Operations: I/O rates, available RAM, system utilization • Financial markets: trading volume, price, volatility
  • 11. Network Analysis on Hadoop: • Process collected raw data • Analyze results from previous phases • Scalable graph analysis in distributed heterogeneous environments + time evolution • Relations among variables can be expressed as formulas (analytical approach). • A data driven approach uses pairwise correlations and other statistical measures. • Final results are model parameters, which can be used in analytical models and for forecasting.
  • 12. Network Analysis on Hadoop: • Process collected raw data • Analyze results from previous phases • Scalable graph analysis in distributed heterogeneous environments + time evolution
  • 13. Time Series Analysis on Hadoop: • Hadoop.TS provides data containers & operations: • time series bucket • time series classes • transformations • extractions • HDGS exposes results as a semantic network, using RDF as a flexible and generic format
  • 14. Goals of Hadoop.TS: • Provide abstractions to separate: • data science from data engineering • data from algorithms • results from implementation • Reuse existing analysis algorithms in data driven applications. • Build time series related data products faster.
  • 15. Time Series: What is it?
  • 16. What is a time series? • y=f(x) … a function? • Let x be time t: y=f(t) • A time series is simply a measure of something as a function of time.
  • 17. What is a time series? • y=f(x) … a function? • Let x be time t: y=f(t) • A time series is simply a measure of something as a function of time. What is t? • Continuous • Discrete (fixed points in time with constant distance) • Unknown points in time
  • 18. Typical Approaches for Time Based Analysis • Events => single event can be compared with an intent • No history • Complex Event Processing • A series of events • Needs small amount of historical data • Continuous time series processing • Equidistant measures • Needs huge amount of historical data
  • 19. From Complex Events to Time Series • Univariate: • A series of events / measurements • Limited by a time range • CEP: A known pattern • TSA: A known property such as average, volatility, or other parameters of the distribution of values • Multivariate: • CEP: Co-occurrence of events • TSA: Correlation measures
  • 20. Why should I care about time series analysis? “A time series describes a thing over time.” Many time series describe many things over time.
  • 21. Why should I care about time series analysis? “A time series describes a thing over time.” Many time series describe many things over time. Correlation networks are derived from time series.
  • 22. Why should I care about time series analysis? “A time series describes a thing over time.” Many time series describe many things over time. Correlation networks are derived from time series. Correlation networks describe systems.
  • 23. Time Series: Available in multiple flavors ...
  • 24. Typical Time Series (a,c,e) continuous time (b,d,f) spontaneous events
  • 25. Transformations: TS > ETS > TS
  • 26. Networks for structural analysis: What is similar among nodes? (a) static properties (b) dynamic properties
  • 27. Inspection of topological system properties: data quality screening (1) • Visualization of topological structure. • Figures are based on term-vectors, stored in a Lucene index.
  • 28. Inspection of static system properties: data quality screening (1) • Network nodes are articles (represented as term-vectors). One term-vector per article, stored in a Lucene index. • Links are given by pairwise distance: cosine-similarity. • Gephi toolkit provides force-directed layout.
  • 29. Inspection of dynamic system properties: data quality screening (2) • Visualization of the context • Comparison of subsystems
  • 30. Motivation for Hadoop.TS & HDGS: Overview & Concepts
  • 31. Challenge:
  • 32. Uni-Variate Time Series Analysis: Study properties per time series
  • 33. Distribution of values (PDF) … Warning: Correlations are not visible in a probability distribution chart!
  • 34. Impact of Long-Term Correlations (figure: PDF). Warning: Correlations cause non-stationarity.
  • 35. Detect Long Term Correlation in Time Series • Detrended Fluctuation Analysis • Return Interval Statistics
  • 36. More Time Series Properties: • Is a time series stationary? • Peak detection • Find frequency patterns • Images: pixel lines and rows can be handled like time series • Sound files: sound analysis and signal analysis are common in engineering and industry
  • 37. More Time Series Properties: • Time Series Models: • Auto-Regressive (AR) • Moving Average (MA) • Combined: ARMA • Extended: ARMA + topological information (work in progress) How to get this structural information? >>> see next part: Multivariate TSA
  • 38. Multi-Variate Time Series Analysis: Information derived from time series pairs
  • 39. https://imgs.xkcd.com/comics/compass_and_straightedge.png
  • 40. But: Multivariate TSA allows you … to reconstruct networks. https://imgs.xkcd.com/comics/compass_and_straightedge.png
  • 41. Network Reconstruction • Content Networks: • Cosine-Similarity • Functional Network: • Cross-Correlation • Event-Synchronization • Dependency and Impact: • Granger Causality • Mutual Information Question: How can I identify significant links? Modifications and variations lead to better results in special use cases. Figure labels: intra-correlation, inter-correlation.
  • 43. Get Meaning out of Correlation Metrics … 1D vs. 2D approach: Using multiple independent metrics allows separation of disjoint groups of node pairs (or links), as shown by areas (A) and (B) in panel b).
  • 44. Application of Hadoop.TS: Results
  • 45. (1) Usage of Online Content
  • 46. Usage of Online Content: Even if the distribution of links is stable, we see structural changes
  • 47. (2) Understand Financial Markets
  • 48. Interconnected Financial Markets: We can identify which nodes connect the markets …
  • 49. HDGS: History & Current Status: Data Flow, Prototype & Architecture Overview
  • 50. Hadoop.TS Historical Approach (2012)
  • 51. Hadoop.TS (2013)
  • 52. Modern Time Series Analysis: • End-to-end applications need multiple technologies (HBase, Kudu, SOLR, Spark, Impala) • Multiple algorithms are combined (cross-correlation, rank-correlation, wavelet analysis, frequency analysis, Poisson or Hawkes process) • Parameters are often unknown
  • 53. Enhanced Time Series Representations
  • 54. TSA on Apache Spark • Time Series Analysis: using spark shell or applications (TSA-workbench) • Hadoop.TS provides domain specific functions. • Etosha exposes metadata and dataset properties as "linked data" using RDF.
  • 55. HDGS: Outlook ... towards an econo-diagnostics toolbox
  • 56. Hadoop Distributed Graph Space (HDGS) • Reconstruction of networks • Profiling of networks • Support for: • Multi-layer networks • Time-dependent multi-layer networks
  • 58. An Oscilloscope for Business Data on Hadoop …
  • 59. Replace by screen shots ...
  • 60. Enjoy your time ... Enjoy your data … Thank you!
  • 61. Practical Tips
  • 62. Collecting Sensor Data with Spark Streaming … • Spark Streaming works on fixed time slices only. • Use the original time stamp? • Requires additional storage and bandwidth • Original system clock defines resolution • Use "Spark-Time" or a local time reference: • You may lose information! • You have a limited resolution, defined by batch size.
  • 63. Data Management • Think about typical access patterns: • random access to each event, record or field? • access to entire groups of records? • variable size or fixed size sets? • In general, prepare for "full table scan" • Optimize for your dominant access pattern! • Select efficient storage formats: Avro, Parquet • Index your data in SOLR for random access and data exploration • Indexing can be done with just a few clicks in HUE …
  • 64. Visualization of Large Correlation Networks • How to manage metadata for time dependent multi-layer networks? • Mediawiki or Fuseki/Jena are available • Gephi-Hadoop-Connector provides access to raw data: • using SQL queries on Impala • using SOLR queries
  • 65. Gephi-Hadoop-Connector in Action …
  • 66. Metadata for Multi-Layer Networks
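
The data driven approach named on slides 11, 21 and 41 (pairwise correlations between time series become links of a network) can be sketched in a few lines. This is only an illustration with NumPy on a single machine, not the Hadoop.TS or HDGS implementation; the function name `correlation_network`, the toy series and the threshold of 0.8 are assumptions made for this example, and a fixed threshold is just one simple answer to the question of which links are significant.

```python
import numpy as np

def correlation_network(series, threshold=0.8):
    """Build a correlation network from equally long time series.

    Returns the Pearson correlation matrix and a list of links (i, j, r)
    for all pairs i < j whose absolute correlation is at least `threshold`.
    The threshold is an illustrative choice, not a recommended value.
    """
    data = np.asarray(series, dtype=float)   # shape: (n_series, n_samples)
    corr = np.corrcoef(data)                 # each row is one variable
    n = corr.shape[0]
    edges = [(i, j, float(corr[i, j]))
             for i in range(n) for j in range(i + 1, n)
             if abs(corr[i, j]) >= threshold]
    return corr, edges

# Toy data: three series driven by the same signal plus one noise series.
t = np.arange(200)
base = np.sin(2 * np.pi * t / 50)
rng = np.random.default_rng(0)
series = [
    base,                      # node 0
    2.0 * base + 1.0,          # node 1: linear transform of node 0
    -base,                     # node 2: anti-correlated with node 0
    rng.normal(size=t.size),   # node 3: independent noise
]

corr, edges = correlation_network(series, threshold=0.8)
for i, j, r in edges:
    print(f"link {i} -- {j}: r = {r:+.2f}")
```

Nodes 0, 1 and 2 end up linked to each other, while the noise series stays isolated. Other link measures from slide 41 (event synchronization, mutual information, Granger causality) would replace the `np.corrcoef` step; the thresholding step stays the same.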

Editor's Notes

  1. It all starts with a question or problem: How has … changed? (descriptive) What will happen if … changes? (impact) How will … evolve? (forecast) Domain knowledge and intuition help us find a starting point. TSA offers multiple specialties; one has to select the right ingredients.
  2. Source for info is: Measured data from …. http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  3. Results tell us about very specific properties of the system. Let's look into thermodynamics: http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  4. Results tell us about very specific properties of the system. Let's look into thermodynamics: http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  5. There are some open questions … (see yellow bubble)
  6. The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary: $Y_t = (1 - L)^d X_t$, while the second is wide-sense stationary: $\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$. Now forecasts can be made for the process $Y_t$, using a generalization of the method of autoregressive forecasting.
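
The cascade described in the ARIMA note can be illustrated with a small NumPy sketch: first apply $(1 - L)^d$ by differencing, then fit the autoregressive part to the differenced series. The helper names `difference` and `fit_ar`, the simulated data and the restriction to a pure AR(p) model (the moving average part is left out, i.e. $q = 0$) are assumptions made for this example; a real analysis would more likely rely on a dedicated time series library.

```python
import numpy as np

def difference(x, d=1):
    """Apply (1 - L)^d to x; the result is d samples shorter than x."""
    return np.diff(np.asarray(x, dtype=float), n=d)

def fit_ar(y, p=1):
    """Least-squares estimate of phi_1..phi_p in
    y_t = phi_1 * y_{t-1} + ... + phi_p * y_{t-p} + eps_t."""
    y = np.asarray(y, dtype=float)
    # Column i holds the lag-(i+1) values aligned with the targets y[p:].
    lags = np.column_stack([y[p - i - 1:len(y) - i - 1] for i in range(p)])
    phi, *_ = np.linalg.lstsq(lags, y[p:], rcond=None)
    return phi

# Simulate a stationary AR(1) process Y with phi = 0.6 ...
rng = np.random.default_rng(42)
n = 5000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + eps[t]

# ... and integrate it once, so that (1 - L) X_t = Y_t (the case d = 1).
x = np.cumsum(y)

y_hat = difference(x, d=1)        # step 1: remove the non-stationary part
phi = fit_ar(y_hat, p=1)          # step 2: fit the stationary AR model
next_y = phi[0] * y_hat[-1]       # one-step forecast of Y
next_x = x[-1] + next_y           # undo the differencing to forecast X

print(f"estimated phi_1 = {phi[0]:.3f}")
```

Differencing recovers the stationary series exactly (up to rounding), and the estimated coefficient should be close to the 0.6 used in the simulation. The forecast for $X_t$ is obtained by adding the forecast increment back onto the last observed value.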