SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
SPARK SUMMIT
EUROPE2016
Distributed Computing with Spark
for Actionable Business Insights!
Stephan Kessler
SAP SE, Spark Developer
Who I am
• Stephan Kessler
• SAP HANA Vora Team, Walldorf,
Germany
• Integrating SAP engines into Apache Spark
since almost two years
• 2nd Spark Summit as a speaker
Today’s talk
On average, between
60% and 73% of all data
within an enterprise goes
unused for business
intelligence (BI) and
analytics.
Skills gaps continue to be a
major adoption inhibitor for
57% of respondents, while
deciding how to get value
from Hadoop was cited by
49% of respondents.
The Forrester Wave™:Big Data Hadoop Distributions,Q1
2016,” January 19,2016 by Mike Gualtieri and Noel
Yuhanna
Gartner:“Survey Analysis:Hadoop AdoptionDrivers
and Challenges,” May 12,2015 by Nick Heudecker
and Merv Adrian
Current System Landscape
SAP
SAP HANA
Platform
Other Apps
What is missing?
• Business application perspective:
• Access to Big Data Landscape in a standardized way
• Similar SQL expressiveness
• Big Data / Data Science perspective:
• Access to specialized engines to perform analysis
close to the data
• Integration of ‘business engines’ into Spark
SAP Hana Vora
SAP
SAP HANA
Platform
Other Apps
SAP HANA
Vora
SAP Hana Vora – 10k ft POV
Data Science, Predictive, Business Intelligence, Visualization Apps
SAP HANA Vora
Data Modeler
OLAP Time Series Graph Doc Store
Disk-to-Memory
Accelerator
Distributed Transaction Log
ERP Systems
SAP HANA
Platform
Goals
• Make data and functionality available in
enterprise applications as well as Spark
applications
• Allow an easy consumption, i.e., allow users to
write SQL for computation jobs
Agenda
• Business functionality integration in Spark
• Utilizing different data sources in Spark
• HANA Vora 1.3
• Summary & Outlook
Focus on Spark User POV
Data Science, Predictive, Business Intelligence, Visualization Apps
SAP HANA Vora
Data Modeler
OLAP Time Series Graph Doc Store
Disk-to-Memory
Accelerator
Distributed Transaction Log
ERP Systems
SAP HANA
Platform
Business Functionality in Spark
• Questions answered in Spark:
– What is the sentiment of users for product XYZ?
– When will a certain part of this machinery fail?
• You might also want to know…
– What is the sales volume for XYZ?
– How much did this part cost?
Example: SAP Predictive Analytics
and SAP HANA Vora
1. Business Analyst friendly – No coding or
in-depth Big Data expertise required
2. Support for end-to-end operationalisation
of predictive models on Hadoop
– Data preparation of Analytical Dataset for
modelling
– Native Spark Modelling – Ultra wide
datasets
– Scoring using In Database Apply or Spark
Stream API
• Usage of different sources (Vora, Hana,
Spark, …)
http://go.sap.com/solution/platform-
technology/analytics/predictive-analytics.html
HDFS
(Hadoop Distributed File System)
Hive/Spark
SQL
SAP
HANA
Vora
Model Lifecycle Manager (Factory)
Scorer
Predictive Analytics Data Manager
In-DB
scoring
Analytics
Dataset
Definition
Layer
Advanced
Analytics
Execution
Layer Spark
Streaming
Modeler -
Training
Native
Spark
SAP
HANA
Business Functionality in Spark
• Important typical ERP function
– Currency Conversion (i.e., EUR à GBP)
– Done via SQL UDF
• Required to analyze enterprise data in Spark
Currency Conversion
• Single currency makes transactions comparable
• Conversion not trivial: rates change over time
TID USERID CURRENCY AMOUNT ORDERDATE
100 User1 USD 120.10 2014-12-15
101 User1 USD 24.99 2015-01-01
102 User5 EUR 24.11 2015-01-02
103 User3 DBP 542.00 2015-01-02
Currency Conversion
• Introducing an UDF
implemented in Spark
• Converting everything in USD
CC( AMOUNT Double,
SOURCE_CURRENCY String,
TARGET_CURRENCY String,
REF_DATE String )
SELECT TID, USERID, ORDERDATE,
CC( AMOUNT, CURRENCY, "USD", ORDERDATE )
FROM ORDERS
Currency Conversion
• Conversion backed by a ‘rates’ table
• Calculation simple, maintenance difficult
• Rates maintained in ERP system
– Couldn’t we use that?
SOURCE_CUR TARGET_CUR REF_DATE RATE
EUR USD 2015-01-01 1.32113
EURO USD 2015-01-02 1.30121
USD GBP 2015-01-01 0.68960
Specialized Engine: Time Series
• HANA Vora Time Series Engine in a nutshell:
– Effective Model-Based Compression
– Multi-representation storage for time series
• Optimized usage for IOT applications
– Fast injections paths
– Long running processes in a cluster
Time Series
Specialized Engine: Time Series
• Query Language: SQL Dialect
• How to use that in Spark?
-30
-25
-20
-15
-10
-5
0
5
Temperature °C
Halifax Waterloo
SELECT STDDEV(val1)
FROM SERIES ts
BETWEEN "2000-01-01", "2001-12-31”
SELECT TREND(val1) OVER (SERIES)
FROM SERIES ts
BETWEEN "2000-01-01", "2001-12-31”
Agenda
• Business functionality integration in Spark
• Utilizing different data sources in Spark
• HANA Vora 1.3
• Summary & Outlook
What we have seen so far
• Currency Conversion:
– Implementation in Spark SQL
– Computation could be deferred
• Time Series Engine:
– Special query language
– .. but no implementation in Spark
à “The pushdown
of everything”
à Raw SQL
The Pushdown of Everything
• Spark datasource API
• Limited to Filters and Projects
Spark SQL
Spark Core Engine
Data Sources
MLlib Streaming …
HANA
HANA VORA
Vora Extension to Datasource API
• Datasource indicates its processing capabilities
• Arbitrary parts of logical plan can be computed
where the data is
• Details in Spark Summit Europe Talk 2015
– https://www.youtube.com/watch?v=QNaf2Z8l8IY
– “The Pushdown of Everything”
Vora Extension to Datasource API
• Consider this query
• Pushing down filters an projects: SCAN on orders
• Pushing down arbitrary parts returns:
– One row per orderdate
– Converted currency
SELECT ORDERDATE,
AVG(CC( AMOUNT, CURRENCY, "USD", ORDERDATE ))
FROM ORDERS
GROUP BY ORDERDATE
• Query Syntax on SparkSQL not supported but in the
datasource à Raw SQL
Raw SQL Extension
``select trend(val1) from ts …`` using com.sap.spark.engines
Logical
SelectUsing
Node
get schema for query
Col1: Int, Col2: String
Physical
Node
Execute Query
Fetch results
Time Series
Pushdown & Raw SQL
• Both extensions allow to incorporate other data
sources extensively
• Computation happens where the data is
• Integration is mostly seamless for Spark
developer
• Interfaces are open source:
– https://github.com/SAP/HANAVora-Extensions
Agenda
• Business functionality integration in Spark
• Utilizing different data sources in Spark
• HANA Vora 1.3
• Summary & Outlook
HANA Vora 1.3 - Relational
• Relational Engines in memory and disk
• In-memory
– Query compilation
– Columnar data layout
• Disk based
– Indices for fast data access
SAP HANA Vora
Data Modeler
OLAP Time Series Graph Doc Store
Disk-to-Memory
Accelerator
Distributed
Transaction Log
HANA Vora 1.3 – Graph & Doc Store
• Graph:
– In-memory and distributed
– SQL-Like interface for graph analysis
– Combination of Graph patterns with relational operators
• Doc Store
– Stored semi structured JSON
– Compresses in-memory
representation
– Compiled queries with NUMA
awareness
SAP HANA Vora
Data Modeler
OLAP Time Series Graph Doc Store
Disk-to-Memory
Accelerator
Distributed
Transaction Log
Agenda
• Business functionality integration in Spark
• Utilizing different data sources in Spark
• HANA Vora 1.3
• Summary & Outlook
Summary and Outlook
• Vora allows to combine all data source in an
enterprise environment
– Across different query languages
– While moving computation close to the data
• Business Insights are driven by all the available
data in the enterprise
• Integration into SQL makes it easily consumable
SPARK SUMMIT
EUROPE2016
THANK YOU.
Stephan Kessler – stephan.kessler@sap.com
BACKUP
HANA Vora 1.3
• Project started 2013 by HANA Research Teams
• Shared concepts and libraries with HANA but
independently developed
• Concepts
– In memory
– Distributed engines
– Low memory footprint SAP HANA
Vora
SAP Predictive Analytics: Optimised for
Big Data
üNative (scala code) Spark
approach goes deeper than
SQL
üPerformanceand Scalability
with Ultra wide datasets
üProcessing close to the data
distributed across the cluster
üNo data transfer
CPU/Memory scales dynamically
Native Modeling
Processing on 100’s-1000’s of nodes
No Training Data Transfer
Predictive Analytics
(Server or Desktop)
Limited CPU/Memory

Weitere ähnliche Inhalte

Was ist angesagt?

Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
Data Science with Spark & Zeppelin
Data Science with Spark & ZeppelinData Science with Spark & Zeppelin
Data Science with Spark & ZeppelinVinay Shukla
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingDatabricks
 
Spark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon WhitearSpark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon WhitearSpark Summit
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowDatabricks
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIDatabricks
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
 
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkOperational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkDatabricks
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Databricks
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkBurak Yavuz
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Spark Summit
 

Was ist angesagt? (20)

Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub Hava
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Data Science with Spark & Zeppelin
Data Science with Spark & ZeppelinData Science with Spark & Zeppelin
Data Science with Spark & Zeppelin
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
 
Spark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon WhitearSpark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon Whitear
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkOperational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 

Andere mochten auch

Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit
 
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...Spark Summit
 
Spark Summit EU talk by Johnathan Mercer
Spark Summit EU talk by Johnathan MercerSpark Summit EU talk by Johnathan Mercer
Spark Summit EU talk by Johnathan MercerSpark Summit
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit
 
Spark Summit EU talk by Yaroslav Nedashkovsky and Andy Starzhinsky
Spark Summit EU talk by Yaroslav Nedashkovsky and Andy StarzhinskySpark Summit EU talk by Yaroslav Nedashkovsky and Andy Starzhinsky
Spark Summit EU talk by Yaroslav Nedashkovsky and Andy StarzhinskySpark Summit
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Spark Summit
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Spark Summit
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit
 
Using Apache Spark for Intelligent Services by Alexis Roos
Using Apache Spark for Intelligent Services by Alexis RoosUsing Apache Spark for Intelligent Services by Alexis Roos
Using Apache Spark for Intelligent Services by Alexis RoosSpark Summit
 
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit
 
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Spark Summit
 
Spark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteAmr Awadallah
 
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug GrallSpark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug GrallSpark Summit
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Spark Summit
 
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...Spark Summit
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Spark Summit
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkJen Aman
 

Andere mochten auch (20)

Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
 
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
 
Spark Summit EU talk by Johnathan Mercer
Spark Summit EU talk by Johnathan MercerSpark Summit EU talk by Johnathan Mercer
Spark Summit EU talk by Johnathan Mercer
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
 
Spark Summit EU talk by Yaroslav Nedashkovsky and Andy Starzhinsky
Spark Summit EU talk by Yaroslav Nedashkovsky and Andy StarzhinskySpark Summit EU talk by Yaroslav Nedashkovsky and Andy Starzhinsky
Spark Summit EU talk by Yaroslav Nedashkovsky and Andy Starzhinsky
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat Patterson
 
Using Apache Spark for Intelligent Services by Alexis Roos
Using Apache Spark for Intelligent Services by Alexis RoosUsing Apache Spark for Intelligent Services by Alexis Roos
Using Apache Spark for Intelligent Services by Alexis Roos
 
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
 
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
 
Spark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken Tsai
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug GrallSpark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug Grall
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
 
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 

Ähnlich wie Spark Summit EU talk by Stephan Kessler

2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...
2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...
2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...blaisecheuteu1
 
SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707Henrique Pinto
 
SAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptxSAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptxSingbBablu
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...Impetus Technologies
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...IBM
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
HANA SPS07 Architecture & Landscape
HANA SPS07 Architecture & LandscapeHANA SPS07 Architecture & Landscape
HANA SPS07 Architecture & LandscapeSAP Technology
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15MLconf
 

Ähnlich wie Spark Summit EU talk by Stephan Kessler (20)

2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...
2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...
2020.04.28-ASUG_Introduction-to-Extracting-data-from-S4HANA-with-ABAP-CDS-vie...
 
SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707
 
SAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptxSAP HANA Migration Deck.pptx
SAP HANA Migration Deck.pptx
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
 
Apache drill
Apache drillApache drill
Apache drill
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
HANA a PoV
HANA a PoVHANA a PoV
HANA a PoV
 
HANA SPS07 Architecture & Landscape
HANA SPS07 Architecture & LandscapeHANA SPS07 Architecture & Landscape
HANA SPS07 Architecture & Landscape
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
 

Mehr von Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Spark Summit
 

Mehr von Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 

Kürzlich hochgeladen

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 

Kürzlich hochgeladen (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 

Spark Summit EU talk by Stephan Kessler

  • 1. SPARK SUMMIT EUROPE2016 Distributed Computing with Spark for Actionable Business Insights! Stephan Kessler SAP SE, Spark Developer
  • 2. Who I am • Stephan Kessler • SAP HANA Vora Team, Walldorf, Germany • Integrating SAP engines into Apache Spark since almost two years • 2nd Spark Summit as a speaker
  • 3. Today’s talk On average, between 60% and 73% of all data within an enterprise goes unused for business intelligence (BI) and analytics. Skills gaps continue to be a major adoption inhibitor for 57% of respondents, while deciding how to get value from Hadoop was cited by 49% of respondents. The Forrester Wave™:Big Data Hadoop Distributions,Q1 2016,” January 19,2016 by Mike Gualtieri and Noel Yuhanna Gartner:“Survey Analysis:Hadoop AdoptionDrivers and Challenges,” May 12,2015 by Nick Heudecker and Merv Adrian
  • 4. Current System Landscape SAP SAP HANA Platform Other Apps
  • 5. What is missing? • Business application perspective: • Access to Big Data Landscape in a standardized way • Similar SQL expressiveness • Big Data / Data Science perspective: • Access to specialized engines to perform analysis close to the data • Integration of ‘business engines’ into Spark
  • 6. SAP Hana Vora SAP SAP HANA Platform Other Apps SAP HANA Vora
  • 7. SAP Hana Vora – 10k ft POV Data Science, Predictive, Business Intelligence, Visualization Apps SAP HANA Vora Data Modeler OLAP Time Series Graph Doc Store Disk-to-Memory Accelerator Distributed Transaction Log ERP Systems SAP HANA Platform
  • 8. Goals • Make data and functionality available in enterprise applications as well as Spark applications • Allow an easy consumption, i.e., allow users to write SQL for computation jobs
  • 9. Agenda • Business functionality integration in Spark • Utilizing different data sources in Spark • HANA Vora 1.3 • Summary & Outlook
  • 10. Focus on Spark User POV Data Science, Predictive, Business Intelligence, Visualization Apps SAP HANA Vora Data Modeler OLAP Time Series Graph Doc Store Disk-to-Memory Accelerator Distributed Transaction Log ERP Systems SAP HANA Platform
  • 11. Business Functionality in Spark • Questions answered in Spark: – What is the sentiment of users for product XYZ? – When will a certain part of this machinery fail? • You might also want to know… – What is the sales volume for XYZ? – How much did this part cost?
  • 12. Example: SAP Predictive Analytics and SAP HANA Vora 1. Business Analyst friendly – No coding or in-depth Big Data expertise required 2. Support for end-to-end operationalisation of predictive models on Hadoop – Data preparation of Analytical Dataset for modelling – Native Spark Modelling – Ultra wide datasets – Scoring using In Database Apply or Spark Stream API • Usage of different sources (Vora, Hana, Spark, …) http://go.sap.com/solution/platform- technology/analytics/predictive-analytics.html HDFS (Hadoop Distributed File System) Hive/Spark SQL SAP HANA Vora Model Lifecycle Manager (Factory) Scorer Predictive Analytics Data Manager In-DB scoring Analytics Dataset Definition Layer Advanced Analytics Execution Layer Spark Streaming Modeler - Training Native Spark SAP HANA
  • 13. Business Functionality in Spark • Important typical ERP function – Currency Conversion (i.e., EUR à GBP) – Done via SQL UDF • Required to analyze enterprise data in Spark
  • 14. Currency Conversion • Single currency makes transactions comparable • Conversion not trivial: rates change over time TID USERID CURRENCY AMOUNT ORDERDATE 100 User1 USD 120.10 2014-12-15 101 User1 USD 24.99 2015-01-01 102 User5 EUR 24.11 2015-01-02 103 User3 DBP 542.00 2015-01-02
  • 15. Currency Conversion • Introducing an UDF implemented in Spark • Converting everything in USD CC( AMOUNT Double, SOURCE_CURRENCY String, TARGET_CURRENCY String, REF_DATE String ) SELECT TID, USERID, ORDERDATE, CC( AMOUNT, CURRENCY, "USD", ORDERDATE ) FROM ORDERS
  • 16. Currency Conversion • Conversion backed by a ‘rates’ table • Calculation simple, maintenance difficult • Rates maintained in ERP system – Couldn’t we use that? SOURCE_CUR TARGET_CUR REF_DATE RATE EUR USD 2015-01-01 1.32113 EURO USD 2015-01-02 1.30121 USD GBP 2015-01-01 0.68960
  • 17. Specialized Engine: Time Series • HANA Vora Time Series Engine in a nutshell: – Effective Model-Based Compression – Multi-representation storage for time series • Optimized usage for IOT applications – Fast injections paths – Long running processes in a cluster Time Series
  • 18. Specialized Engine: Time Series • Query Language: SQL Dialect • How to use that in Spark? -30 -25 -20 -15 -10 -5 0 5 Temperature °C Halifax Waterloo SELECT STDDEV(val1) FROM SERIES ts BETWEEN "2000-01-01", "2001-12-31” SELECT TREND(val1) OVER (SERIES) FROM SERIES ts BETWEEN "2000-01-01", "2001-12-31”
  • 19. Agenda • Business functionality integration in Spark • Utilizing different data sources in Spark • HANA Vora 1.3 • Summary & Outlook
  • 20. What we have seen so far • Currency Conversion: – Implementation in Spark SQL – Computation could be deferred • Time Series Engine: – Special query language – .. but no implementation in Spark à “The pushdown of everything” à Raw SQL
  • 21. The Pushdown of Everything • Spark datasource API • Limited to Filters and Projects Spark SQL Spark Core Engine Data Sources MLlib Streaming … HANA HANA VORA
  • 22. Vora Extension to Datasource API • Datasource indicates its processing capabilities • Arbitrary parts of logical plan can be computed where the data is • Details in Spark Summit Europe Talk 2015 – https://www.youtube.com/watch?v=QNaf2Z8l8IY – “The Pushdown of Everything”
  • 23. Vora Extension to Datasource API • Consider this query • Pushing down filters an projects: SCAN on orders • Pushing down arbitrary parts returns: – One row per orderdate – Converted currency SELECT ORDERDATE, AVG(CC( AMOUNT, CURRENCY, "USD", ORDERDATE )) FROM ORDERS GROUP BY ORDERDATE
  • 24. • Query Syntax on SparkSQL not supported but in the datasource à Raw SQL Raw SQL Extension ``select trend(val1) from ts …`` using com.sap.spark.engines Logical SelectUsing Node get schema for query Col1: Int, Col2: String Physical Node Execute Query Fetch results Time Series
  • 25. Pushdown & Raw SQL • Both extensions allow to incorporate other data sources extensively • Computation happens where the data is • Integration is mostly seamless for Spark developer • Interfaces are open source: – https://github.com/SAP/HANAVora-Extensions
  • 26. Agenda • Business functionality integration in Spark • Utilizing different data sources in Spark • HANA Vora 1.3 • Summary & Outlook
  • 27. HANA Vora 1.3 - Relational • Relational Engines in memory and disk • In-memory – Query compilation – Columnar data layout • Disk based – Indices for fast data access SAP HANA Vora Data Modeler OLAP Time Series Graph Doc Store Disk-to-Memory Accelerator Distributed Transaction Log
  • 28. HANA Vora 1.3 – Graph & Doc Store • Graph: – In-memory and distributed – SQL-Like interface for graph analysis – Combination of Graph patterns with relational operators • Doc Store – Stored semi structured JSON – Compresses in-memory representation – Compiled queries with NUMA awareness SAP HANA Vora Data Modeler OLAP Time Series Graph Doc Store Disk-to-Memory Accelerator Distributed Transaction Log
  • 29. Agenda • Business functionality integration in Spark • Utilizing different data sources in Spark • HANA Vora 1.3 • Summary & Outlook
  • 30. Summary and Outlook • Vora allows to combine all data source in an enterprise environment – Across different query languages – While moving computation close to the data • Business Insights are driven by all the available data in the enterprise • Integration into SQL makes it easily consumable
  • 31. SPARK SUMMIT EUROPE2016 THANK YOU. Stephan Kessler – stephan.kessler@sap.com
  • 33. HANA Vora 1.3 • Project started 2013 by HANA Research Teams • Shared concepts and libraries with HANA but independently developed • Concepts – In memory – Distributed engines – Low memory footprint SAP HANA Vora
  • 34. SAP Predictive Analytics: Optimised for Big Data üNative (scala code) Spark approach goes deeper than SQL üPerformanceand Scalability with Ultra wide datasets üProcessing close to the data distributed across the cluster üNo data transfer CPU/Memory scales dynamically Native Modeling Processing on 100’s-1000’s of nodes No Training Data Transfer Predictive Analytics (Server or Desktop) Limited CPU/Memory