SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1
Oracle Stream Analytics
Complex Event Processing for Apache Spark Streaming
Complex Event Processing for Spark Streaming
Prabhu Thukkaram, Senior Director, Oracle Product Development
Hoyong Park, Architect, Oracle Product Development
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 2
Complex Event Processing with Continuous Query Processor
Continuous
Query Processor
Pre-registered Queries
E3@T3, E2@T2, E1@T1
ResultsInput Events
R3@T3, R2@T2, R1@T1
Heartbeats
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Why Continuous Query Processor ?
Oracle Confidential – Internal/Restricted/Highly Restricted 3
• Complex event processing requires events to be processed one at a time
– Each event must be processed as identified by its individual timestamp
– Real world events originate at different times and must be processed as such
– CEP applications seek correlation and patterns across events in time within or across
batches and irrespective of batch boundaries
• Window length can span fractional batches
• Micro-batching with Spark Streaming
– All events in the batch are identified by same time (RDD Time)
– No progression of time between events in the same batch
– No progression of time when RDD partitions are empty
• Critical for missing event scenarios. E.g. alert when order status “Received” is not followed by
“Shipped” for order Id 10001 within 1 hour
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Pattern Detection in CQP
Oracle Confidential – Internal/Restricted/Highly Restricted 4
Checks if temperature readings from a power sensor are wobbling
during a certain time interval.
The CQP code checks for a W-Pattern in temperature readings during a
10 minute interval and selects support levels as output
SELECT LAST(A.value), LAST(C.value) FROM TEMP_STREAM
MATCH_RECOGNIZE (
PARTITION BY DEVICE_ID
PATTERN (A+ B+ C+ D+) DURATION OF 10 MINUTES
DEFINE
A AS (value < PREV(value))
B AS (value > PREV(value))
C AS (value < PREV(value))
D AS (value > PREV(value))
)
A
B
C D
10 Minutes
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Distributed Complex Event Processing
Oracle Confidential – Internal/Restricted/Highly Restricted 5
• Continuous Query Processor
– Event by event processing
• Each event is assigned a unique timestamp
• Apache Spark
– Distributed computing with scale out and fault tolerance
Spark Streaming + Continuous Query Processor
=
Distributed, Scalable, and Fault Tolerant CEP Platform
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
How CQP complements Spark Streaming ?
Oracle Confidential – Internal/Restricted/Highly Restricted 6
• Continuous, event-by-event, and stateful processing
• Flexible temporal windows (Time & Rows)
• Automatic progression of time
– Heartbeat propagation to advance time
• Pattern detection without batch boundaries
– Built-in finite state automaton
• Declarative SQL like language
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
How Spark complements CQP ?
• Distributed computing framework for CQP
• OOTB sources for data Ingestion
• Horizontal scale-out
• Fault tolerance and high availability
• Spark can detect failures, restart required resources, and automatically replay parts
of the stream that experienced failure
Oracle Confidential – Internal
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
High-Level Architecture
Data Stream Oracle’s CQP Engine
Finite State Automaton for
Pattern Detection across
Discrete Events
HDFS
Journaled CQP engine State serialized to HDFS
after computing each partition
CQP Engine State restored on
Executor Restart and Recompute of a
partition
Geo Sensing Cartridge for
Spatial Analytics
RETE Rules for Conditional
Logic
Complex Pattern Detection, Temporal Queries, Spatial
Queries, Contextual Queries, and Conditional Logic are all
executed in Oracle’s CEP Engine
Distributed Cache for
Ultra-fast Context Lookup
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
CQP Spark Application
Oracle Confidential – Internal
Cluster
Spark Standalone or YARN or Mesos
Spark Driver
Spark Executor
Spark Executor
Spark Executor
• Transformation delegated to CQP
• CQP application parsed in Spark’s Driver process
• Driver builds a DAG (job definition) from CQP application
• Driver runs job for each micro-batch
CQP
CQP
CQP
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Spark Integration - High Level
• Extensions
– CQLDStream extends DStream overriding compute()
– CQLRDD extends RDD overriding compute()
• Initialization/DAG creation phase
– Create Spark Context, Streaming Context, and start CQP on each Executor
– Setup DAG of CQLDStream
• Physical plan execution phase
– Assign unique timestamp to each event in micro-batch based on RDD time
– Register query with CQP on RDD compute if not already registered
– Send micro-batch to CQP and accept returned results
– Checkpoint CQP state if HA enabled
Oracle Confidential – Internal
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Spark Integration - High Availability
• Executor Failure
– Restart CQP on newly restarted Executor
– On RDD compute, re-register query and restore query state from snapshots
– Continue processing stream
Oracle Confidential – Internal
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
Demo
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
Q & A
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
CQP scalability in Spark
Oracle Confidential – Internal/Restricted/Highly Restricted 14
0
20000
40000
60000
80000
w1(c3) w2(c5) w3(c7) w4(c9)
Processing Time (seconds) for 40 Million
Records
0
5000
10000
15000
20000
w1(c3) w2(c5) w3(c7) w4(c9)
Avg. Processing Time Per Batch
(milliseconds)
0
20
40
60
80
100
120
w1(c3) w2(c5) w3(c7) w4(c9)
Number of Batches
Processed over 10
Minutes
Legend
Wn = n number of workers or
executors
W2(c5) means 2 executors
and a total of 5 cores across
both executors

Weitere ähnliche Inhalte

Was ist angesagt?

Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...Amazon Web Services Japan
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsGuozhang Wang
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino ProjectMartin Traverso
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Databricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisHostedbyConfluent
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBMariaDB plc
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 

Was ist angesagt? (20)

Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 

Ähnlich wie Bringing complex event processing to Spark streaming

Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsPrabhu Thukkaram
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics Franco Ucci
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Kellyn Pot'Vin-Gorman
 
The Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result CacheThe Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result CacheSteven Feuerstein
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Eren Avşaroğulları
 
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute InfodeckServlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute InfodeckEdward Burns
 
Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2gaougorg
 
Kellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and AshKellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and Ashgaougorg
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Courtney Llamas
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduDataWorks Summit
 
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EECON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EEMasoud Kalali
 

Ähnlich wie Bringing complex event processing to Spark streaming (20)

Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream Analytics
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
 
The Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result CacheThe Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result Cache
 
ODTUG Webinar AWR Warehouse
ODTUG Webinar AWR WarehouseODTUG Webinar AWR Warehouse
ODTUG Webinar AWR Warehouse
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
Apouc 2014-enterprise-manager-12c
Apouc 2014-enterprise-manager-12cApouc 2014-enterprise-manager-12c
Apouc 2014-enterprise-manager-12c
 
Developer day v2
Developer day v2Developer day v2
Developer day v2
 
MySQL Replication
MySQL ReplicationMySQL Replication
MySQL Replication
 
Power of the AWR Warehouse
Power of the AWR WarehousePower of the AWR Warehouse
Power of the AWR Warehouse
 
UKOUG
UKOUG UKOUG
UKOUG
 
AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
 
AWR and ASH in an EM12c World
AWR and ASH in an EM12c WorldAWR and ASH in an EM12c World
AWR and ASH in an EM12c World
 
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute InfodeckServlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
 
Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2
 
Kellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and AshKellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and Ash
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EECON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Kürzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Bringing complex event processing to Spark streaming

  • 1. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1 Oracle Stream Analytics Complex Event Processing for Apache Spark Streaming Complex Event Processing for Spark Streaming Prabhu Thukkaram, Senior Director, Oracle Product Development Hoyong Park, Architect, Oracle Product Development
  • 2. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 2 Complex Event Processing with Continuous Query Processor Continuous Query Processor Pre-registered Queries E3@T3, E2@T2, E1@T1 ResultsInput Events R3@T3, R2@T2, R1@T1 Heartbeats
  • 3. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Why Continuous Query Processor ? Oracle Confidential – Internal/Restricted/Highly Restricted 3 • Complex event processing requires events to be processed one at a time – Each event must be processed as identified by its individual timestamp – Real world events originate at different times and must be processed as such – CEP applications seek correlation and patterns across events in time within or across batches and irrespective of batch boundaries • Window length can span fractional batches • Micro-batching with Spark Streaming – All events in the batch are identified by same time (RDD Time) – No progression of time between events in the same batch – No progression of time when RDD partitions are empty • Critical for missing event scenarios. E.g. alert when order status “Received” is not followed by “Shipped” for order Id 10001 within 1 hour
  • 4. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Pattern Detection in CQP Oracle Confidential – Internal/Restricted/Highly Restricted 4 Checks if temperature readings from a power sensor are wobbling during a certain time interval. The CQP code checks for a W-Pattern in temperature readings during a 10 minute interval and selects support levels as output SELECT LAST(A.value), LAST(C.value) FROM TEMP_STREAM MATCH_RECOGNIZE ( PARTITION BY DEVICE_ID PATTERN (A+ B+ C+ D+) DURATION OF 10 MINUTES DEFINE A AS (value < PREV(value)) B AS (value > PREV(value)) C AS (value < PREV(value)) D AS (value > PREV(value)) ) A B C D 10 Minutes
  • 5. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Distributed Complex Event Processing Oracle Confidential – Internal/Restricted/Highly Restricted 5 • Continuous Query Processor – Event by event processing • Each event is assigned a unique timestamp • Apache Spark – Distributed computing with scale out and fault tolerance Spark Streaming + Continuous Query Processor = Distributed, Scalable, and Fault Tolerant CEP Platform
  • 6. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | How CQP complements Spark Streaming ? Oracle Confidential – Internal/Restricted/Highly Restricted 6 • Continuous, event-by-event, and stateful processing • Flexible temporal windows (Time & Rows) • Automatic progression of time – Heartbeat propagation to advance time • Pattern detection without batch boundaries – Built-in finite state automaton • Declarative SQL like language
  • 7. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | How Spark complements CQP ? • Distributed computing framework for CQP • OOTB sources for data Ingestion • Horizontal scale-out • Fault tolerance and high availability • Spark can detect failures, restart required resources, and automatically replay parts of the stream that experienced failure Oracle Confidential – Internal
  • 8. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | High-Level Architecture Data Stream Oracle’s CQP Engine Finite State Automaton for Pattern Detection across Discrete Events HDFS Journaled CQP engine State serialized to HDFS after computing each partition CQP Engine State restored on Executor Restart and Recompute of a partition Geo Sensing Cartridge for Spatial Analytics RETE Rules for Conditional Logic Complex Pattern Detection, Temporal Queries, Spatial Queries, Contextual Queries, and Conditional Logic are all executed in Oracle’s CEP Engine Distributed Cache for Ultra-fast Context Lookup
  • 9. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | CQP Spark Application Oracle Confidential – Internal Cluster Spark Standalone or YARN or Mesos Spark Driver Spark Executor Spark Executor Spark Executor • Transformation delegated to CQP • CQP application parsed in Spark’s Driver process • Driver builds a DAG (job definition) from CQP application • Driver runs job for each micro-batch CQP CQP CQP
  • 10. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Spark Integration - High Level • Extensions – CQLDStream extends DStream overriding compute() – CQLRDD extends RDD overriding compute() • Initialization/DAG creation phase – Create Spark Context, Streaming Context, and start CQP on each Executor – Setup DAG of CQLDStream • Physical plan execution phase – Assign unique timestamp to each event in micro-batch based on RDD time – Register query with CQP on RDD compute if not already registered – Send micro-batch to CQP and accept returned results – Checkpoint CQP state if HA enabled Oracle Confidential – Internal
  • 11. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Spark Integration - High Availability • Executor Failure – Restart CQP on newly restarted Executor – On RDD compute, re-register query and restore query state from snapshots – Continue processing stream Oracle Confidential – Internal
  • 12. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal Demo
  • 13. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal Q & A
  • 14. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | CQP scalability in Spark Oracle Confidential – Internal/Restricted/Highly Restricted 14 0 20000 40000 60000 80000 w1(c3) w2(c5) w3(c7) w4(c9) Processing Time (seconds) for 40 Million Records 0 5000 10000 15000 20000 w1(c3) w2(c5) w3(c7) w4(c9) Avg. Processing Time Per Batch (milliseconds) 0 20 40 60 80 100 120 w1(c3) w2(c5) w3(c7) w4(c9) Number of Batches Processed over 10 Minutes Legend Wn = n number of workers or executors W2(c5) means 2 executors and a total of 5 cores across both executors