Structured Streaming in Spark
Vikram Agrawal
Qubole
About Me
● Studied Computer Science and Engineering at IIT Delhi
● Co-founded a web conferencing solution company before joining Qubole
● Over the last 5 years at Qubole, I have worn multiple hats and worked across stacks to
provide big-data solutions in the cloud
● Currently leading the Streaming Team at Qubole
Who should watch this?
● Big Data Engineer (DevOps, Architect, Software Engineer, Admin)
● Data Platform Manager
● Big Data Enthusiast (Consultant, Executive, Data User, Analyst)
How is streaming used in production?
● Identifying sessions based on user behavior from real-time activity streams
● Anomaly and fraud detection: running ML predictions on incoming data while
keeping the model continuously updated as new data arrives
● Time-based window aggregations: using window functions to compute associative
aggregations and real-time statistics
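A time-based window aggregation of the kind listed above could look roughly like the following. This is a minimal sketch, not taken from the talk: it assumes a local Spark session and uses Spark's built-in "rate" test source (columns `timestamp` and `value`) as a stand-in for a real activity stream. The window length, watermark delay, and application name are illustrative choices.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

val spark = SparkSession.builder
  .appName("windowed-counts-sketch")
  .master("local[2]") // assumption: local run for illustration only
  .getOrCreate()

// Stand-in input: the "rate" source emits rows with (timestamp, value).
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()

// Count events per 10-second tumbling window. The watermark bounds how long
// Spark keeps window state around while waiting for late rows.
val counts = events
  .withWatermark("timestamp", "30 seconds")
  .groupBy(window(col("timestamp"), "10 seconds"))
  .count()

val query = counts.writeStream
  .format("console")
  .outputMode("update") // emit only the windows whose counts changed
  .start()

query.awaitTermination(10000) // let it run for about 10 seconds
query.stop()
spark.stop()
```

Counting is associative, so Spark can update each window's result as new rows arrive instead of rescanning earlier data.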
Data Processing Architecture
Streaming Paradigm
● Stream In Stream out
○ Low Latency - How Low?
○ Complexity of Analytics
○ Volume - How high?
● Stream In Batch out
○ No Tight Latency Constraint
○ Higher Ingestion Rate
○ Aggregation / Data or Schema Transformation / Data Enrichment
○ Downstream ETL Operation
Why use Spark Streaming?
● No ultra-low latency requirement
○ Processing time of a few seconds is acceptable
● Scalable and Mature Processing engine
● Higher Level API abstraction
○ Ease of Code Reuse from Batch jobs
○ Simple and Modular
● Vibrant Community
○ Active Development on new features
Spark’s Functionality
Structured Streaming - under the hood
● Abstraction of Repeated Queries
○ Data stream as an unbounded table
○ A streaming query is a batch-like operation on this table
Structured Streaming - under the hood
● Query Planning & Execution
○ In batch execution, the planner creates a code- and memory-optimized execution plan
○ For a streaming query, the planner converts the streaming logical plan into a series of
incremental execution plans, each processing the next chunk of data
[Diagram: DataFrame, Logical Plan, Planner, Execution Plan (batch); Planner producing Incremental Execution 1, 2, 3 (streaming)]
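The idea of incremental execution can be illustrated without Spark at all. The sketch below is a conceptual analogy in plain Scala, not Spark's actual planner or state store; the names `IncrementalCounts`, `processBatch`, and `batchCounts` are hypothetical. Each micro-batch is folded into state carried over from the previous one, and the result matches a batch-style query over the whole table seen so far.

```scala
// Conceptual sketch only: plain Scala collections, not Spark internals.
object IncrementalCounts {
  // State carried between micro-batches: a running count per key.
  type State = Map[String, Long]

  // One "incremental execution": fold only the new chunk into the prior state.
  def processBatch(state: State, batch: Seq[String]): State =
    batch.foldLeft(state) { (acc, key) =>
      acc.updated(key, acc.getOrElse(key, 0L) + 1L)
    }

  // The batch-style query over the entire table, for comparison.
  def batchCounts(table: Seq[String]): State =
    table.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }

  def main(args: Array[String]): Unit = {
    val microBatches = Seq(Seq("a", "b"), Seq("a"), Seq("c", "a"))
    val incremental = microBatches.foldLeft(Map.empty[String, Long])(processBatch)
    println(incremental)
    // Both approaches agree on the counts for the data seen so far.
    println(incremental == batchCounts(microBatches.flatten))
  }
}
```

The incremental version touches only the new chunk on each step, which is what lets a streaming query keep up with an unbounded input.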
Programming Paradigm
● Start with a Spark Session
● Specify the data source, schema and other options (create input df)
● Write your incremental query to generate output
● Specify the data sink and other options to export your data
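import org.apache.spark.sql.SparkSession

// brokers and topics are assumed to be defined elsewhere
val spark = SparkSession.builder.appName("kafka streaming Example").getOrCreate()
import spark.implicits._ // encoders required by .as[(String, String)]

// Input: read key/value records from Kafka as strings
val ds = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", brokers)
  .option("subscribe", topics)
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]

// Incremental query: running count per value
val c = ds.groupBy("value").count()

// Sink: in-memory table named "aggregates"
c.writeStream.queryName("aggregates").format("memory").outputMode("complete").start()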
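With the memory sink, the query name doubles as the name of an in-memory table that can be read with ordinary SQL. The following is a minimal sketch of that usage, not part of the original slides. So that it runs without a Kafka broker, it swaps the Kafka source for Spark's built-in "rate" test source and buckets its values into ten keys. The local master setting and the fixed wait are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("memory-sink-sketch")
  .master("local[2]") // assumption: local run for illustration only
  .getOrCreate()

// Stand-in for the Kafka input: "rate" emits (timestamp, value); keep a
// string column named "value" with ten distinct keys.
val input = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .selectExpr("CAST(value % 10 AS STRING) AS value")

val query = input.groupBy("value").count()
  .writeStream
  .queryName("aggregates") // becomes the name of the in-memory table
  .format("memory")
  .outputMode("complete")
  .start()

query.awaitTermination(5000) // give the stream a few seconds to produce batches

// Read the current aggregates like any other table.
spark.sql("SELECT value, `count` FROM aggregates ORDER BY value").show()

query.stop()
spark.stop()
```

The memory sink keeps the full result in driver memory, so it suits debugging and demos rather than production output.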
Productionizing Streaming Application
● Monitoring
○ Throughput
○ Latency
○ Time Lag
● Fault Tolerance
○ Checkpointing
○ Exactly Once or At Least Once
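The monitoring and fault-tolerance items above map onto two standard Structured Streaming hooks: a `StreamingQueryListener` for per-batch metrics and the `checkpointLocation` option for recovery. The sketch below is illustrative rather than the speaker's setup. It assumes a local session, the built-in "rate" source, a console sink, and a temporary checkpoint directory; a production job would point the checkpoint at durable storage.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{
  QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent
}

val spark = SparkSession.builder
  .appName("production-sketch")
  .master("local[2]") // assumption: local run for illustration only
  .getOrCreate()

// Monitoring: report throughput and batch duration after every micro-batch.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    println(s"batch=${p.batchId} rows=${p.numInputRows} " +
      s"input/s=${p.inputRowsPerSecond} processed/s=${p.processedRowsPerSecond} " +
      s"triggerMs=${p.durationMs.get("triggerExecution")}")
  }
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
})

val input = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()

// Fault tolerance: the checkpoint directory records source offsets (and any
// state) so a restarted query can resume from where it stopped.
val checkpointDir = Files.createTempDirectory("checkpoint-sketch").toString

val query = input.writeStream
  .format("console")
  .option("checkpointLocation", checkpointDir)
  .start()

query.awaitTermination(10000) // run for about 10 seconds
query.stop()
spark.stop()
```

Checkpointing alone gives replay after failure. Whether the end-to-end result is exactly-once or at-least-once also depends on the source being replayable and the sink handling reprocessed batches idempotently.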
Q&A
Structured Streaming in Spark

More Related Content

What's hot

Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...Flink Forward
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward
 
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward
 
Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafanatorkelo
 
The Future of Real-Time in Spark
The Future of Real-Time in SparkThe Future of Real-Time in Spark
The Future of Real-Time in SparkReynold Xin
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward
 
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...Flink Forward
 
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate Ido Shilon
 
GraphQL API on a Serverless Environment
GraphQL API on a Serverless EnvironmentGraphQL API on a Serverless Environment
GraphQL API on a Serverless EnvironmentItai Yaffe
 
Logging in The World of DevOps
Logging in The World of DevOps Logging in The World of DevOps
Logging in The World of DevOps DevOps Indonesia
 
How We Migrate PBs Data from Beijing to Shanghai
How We Migrate PBs Data from Beijing to ShanghaiHow We Migrate PBs Data from Beijing to Shanghai
How We Migrate PBs Data from Beijing to ShanghaiElmer Brown
 
Bus ticket management system
Bus ticket management systemBus ticket management system
Bus ticket management systemAbu Kaisar
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsYaroslav Tkachenko
 
Weavework Flagger Demo- AWS Container Day 2019 Barcelona
Weavework Flagger Demo- AWS Container Day 2019 BarcelonaWeavework Flagger Demo- AWS Container Day 2019 Barcelona
Weavework Flagger Demo- AWS Container Day 2019 BarcelonaAmazon Web Services
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid arupmalakar
 
Migrating batch ETLs to streaming Flink
Migrating batch ETLs to streaming FlinkMigrating batch ETLs to streaming Flink
Migrating batch ETLs to streaming FlinkWilliam Saar
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelFaisal Siddiqi
 
Spline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured StreamingSpline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured StreamingVaclav Kosar
 

What's hot (20)

Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
Flink Forward Berlin 2018: Brian Wolfe - "Upshot: distributed tracing using F...
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
 
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
 
Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafana
 
The Future of Real-Time in Spark
The Future of Real-Time in SparkThe Future of Real-Time in Spark
The Future of Real-Time in Spark
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
 
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ...
 
Implementing Real-Time IoT Stream Processing in Azure
Implementing Real-Time IoT Stream Processing in Azure Implementing Real-Time IoT Stream Processing in Azure
Implementing Real-Time IoT Stream Processing in Azure
 
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
 
GraphQL API on a Serverless Environment
GraphQL API on a Serverless EnvironmentGraphQL API on a Serverless Environment
GraphQL API on a Serverless Environment
 
Logging in The World of DevOps
Logging in The World of DevOps Logging in The World of DevOps
Logging in The World of DevOps
 
How We Migrate PBs Data from Beijing to Shanghai
How We Migrate PBs Data from Beijing to ShanghaiHow We Migrate PBs Data from Beijing to Shanghai
How We Migrate PBs Data from Beijing to Shanghai
 
Bus ticket management system
Bus ticket management systemBus ticket management system
Bus ticket management system
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
Weavework Flagger Demo- AWS Container Day 2019 Barcelona
Weavework Flagger Demo- AWS Container Day 2019 BarcelonaWeavework Flagger Demo- AWS Container Day 2019 Barcelona
Weavework Flagger Demo- AWS Container Day 2019 Barcelona
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
 
Migrating batch ETLs to streaming Flink
Migrating batch ETLs to streaming FlinkMigrating batch ETLs to streaming Flink
Migrating batch ETLs to streaming Flink
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
 
Spline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured StreamingSpline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured Streaming
 

Similar to Structured Streaming in Spark

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Khai Tran
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...NETWAYS
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Nelson Calero
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open SourceAll Things Open
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015aspyker
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalSub Szabolcs Feczak
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 

Similar to Structured Streaming in Spark (20)

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by...
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 

More from Digital Vidya

Emerging Trends in Marketing-Role of AI & Data Science
Emerging Trends in Marketing-Role of AI & Data ScienceEmerging Trends in Marketing-Role of AI & Data Science
Emerging Trends in Marketing-Role of AI & Data ScienceDigital Vidya
 
Digital Marketing Beyond Facebook & Google
Digital Marketing Beyond Facebook & GoogleDigital Marketing Beyond Facebook & Google
Digital Marketing Beyond Facebook & GoogleDigital Vidya
 
Making Money Out of Data
Making Money Out of DataMaking Money Out of Data
Making Money Out of DataDigital Vidya
 
Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...
Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...
Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...Digital Vidya
 
How To Set-up An SEO Agency From Scratch As A Newbie
How To Set-up An SEO Agency From Scratch As A NewbieHow To Set-up An SEO Agency From Scratch As A Newbie
How To Set-up An SEO Agency From Scratch As A NewbieDigital Vidya
 
Lifecycle of a Data Science Project
Lifecycle of a Data Science ProjectLifecycle of a Data Science Project
Lifecycle of a Data Science ProjectDigital Vidya
 
7 B2B Marketing Trends for Driving Growth
7 B2B Marketing Trends for Driving Growth7 B2B Marketing Trends for Driving Growth
7 B2B Marketing Trends for Driving GrowthDigital Vidya
 
Social Video Analytics: From Demography to Psychography of User Behaviour
Social Video Analytics: From Demography to Psychography of User BehaviourSocial Video Analytics: From Demography to Psychography of User Behaviour
Social Video Analytics: From Demography to Psychography of User BehaviourDigital Vidya
 
How to Use Marketing Automation to Convert More Leads to Sales
How to Use Marketing Automation to Convert More Leads to SalesHow to Use Marketing Automation to Convert More Leads to Sales
How to Use Marketing Automation to Convert More Leads to SalesDigital Vidya
 
Native Advertising: Changing Digital Advertising Landscape
Native Advertising: Changing Digital Advertising LandscapeNative Advertising: Changing Digital Advertising Landscape
Native Advertising: Changing Digital Advertising LandscapeDigital Vidya
 
Personal Branding Using Social Media
Personal Branding Using Social MediaPersonal Branding Using Social Media
Personal Branding Using Social MediaDigital Vidya
 
Anomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoTAnomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoTDigital Vidya
 
Community Development with Social Media
Community Development with Social MediaCommunity Development with Social Media
Community Development with Social MediaDigital Vidya
 
Framework of Digital Media Marketing in India
Framework of Digital Media Marketing in IndiaFramework of Digital Media Marketing in India
Framework of Digital Media Marketing in IndiaDigital Vidya
 
The Secret to Search Engine Marketing Success in 2018
The Secret to Search Engine Marketing Success in 2018The Secret to Search Engine Marketing Success in 2018
The Secret to Search Engine Marketing Success in 2018Digital Vidya
 
People Centric Marketing - Create Impact by Putting People First
People Centric Marketing - Create Impact by Putting People First People Centric Marketing - Create Impact by Putting People First
People Centric Marketing - Create Impact by Putting People First Digital Vidya
 
Going Global? Key Steps to Expanding Your Business Globally
Going Global? Key Steps to Expanding Your Business GloballyGoing Global? Key Steps to Expanding Your Business Globally
Going Global? Key Steps to Expanding Your Business GloballyDigital Vidya
 
How to Optimize your Online Presence for 6X Growth in Sales?
 How to Optimize your Online Presence for 6X Growth in Sales? How to Optimize your Online Presence for 6X Growth in Sales?
How to Optimize your Online Presence for 6X Growth in Sales?Digital Vidya
 

More from Digital Vidya (20)

Emerging Trends in Marketing-Role of AI & Data Science
Emerging Trends in Marketing-Role of AI & Data ScienceEmerging Trends in Marketing-Role of AI & Data Science
Emerging Trends in Marketing-Role of AI & Data Science
 
Digital Marketing Beyond Facebook & Google
Digital Marketing Beyond Facebook & GoogleDigital Marketing Beyond Facebook & Google
Digital Marketing Beyond Facebook & Google
 
Making Money Out of Data
Making Money Out of DataMaking Money Out of Data
Making Money Out of Data
 
Say Yes To No SQL
Say Yes To No SQLSay Yes To No SQL
Say Yes To No SQL
 
Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...
Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...
Persuasion Strategies That Work Building Influence To Open Up Your Revenue St...
 
How To Set-up An SEO Agency From Scratch As A Newbie
How To Set-up An SEO Agency From Scratch As A NewbieHow To Set-up An SEO Agency From Scratch As A Newbie
How To Set-up An SEO Agency From Scratch As A Newbie
 
Lifecycle of a Data Science Project
Lifecycle of a Data Science ProjectLifecycle of a Data Science Project
Lifecycle of a Data Science Project
 
7 B2B Marketing Trends for Driving Growth
7 B2B Marketing Trends for Driving Growth7 B2B Marketing Trends for Driving Growth
7 B2B Marketing Trends for Driving Growth
 
Social Video Analytics: From Demography to Psychography of User Behaviour
Social Video Analytics: From Demography to Psychography of User BehaviourSocial Video Analytics: From Demography to Psychography of User Behaviour
Social Video Analytics: From Demography to Psychography of User Behaviour
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
 
How to Use Marketing Automation to Convert More Leads to Sales
How to Use Marketing Automation to Convert More Leads to SalesHow to Use Marketing Automation to Convert More Leads to Sales
How to Use Marketing Automation to Convert More Leads to Sales
 
Native Advertising: Changing Digital Advertising Landscape
Native Advertising: Changing Digital Advertising LandscapeNative Advertising: Changing Digital Advertising Landscape
Native Advertising: Changing Digital Advertising Landscape
 
Personal Branding Using Social Media
Personal Branding Using Social MediaPersonal Branding Using Social Media
Personal Branding Using Social Media
 
Anomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoTAnomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoT
 
Community Development with Social Media
Community Development with Social MediaCommunity Development with Social Media
Community Development with Social Media
 
Framework of Digital Media Marketing in India
Framework of Digital Media Marketing in IndiaFramework of Digital Media Marketing in India
Framework of Digital Media Marketing in India
 
The Secret to Search Engine Marketing Success in 2018
The Secret to Search Engine Marketing Success in 2018The Secret to Search Engine Marketing Success in 2018
The Secret to Search Engine Marketing Success in 2018
 
People Centric Marketing - Create Impact by Putting People First
People Centric Marketing - Create Impact by Putting People First People Centric Marketing - Create Impact by Putting People First
People Centric Marketing - Create Impact by Putting People First
 
Going Global? Key Steps to Expanding Your Business Globally
Going Global? Key Steps to Expanding Your Business GloballyGoing Global? Key Steps to Expanding Your Business Globally
Going Global? Key Steps to Expanding Your Business Globally
 
How to Optimize your Online Presence for 6X Growth in Sales?
 How to Optimize your Online Presence for 6X Growth in Sales? How to Optimize your Online Presence for 6X Growth in Sales?
How to Optimize your Online Presence for 6X Growth in Sales?
 

Recently uploaded

ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdfssuserdda66b
 
ICT role in 21st century education and it's challenges.


Structured Streaming in Spark

  • 3. About Me
    ● Studied Computer Science and Engineering at IIT Delhi
    ● Co-founded a web conferencing solution company before joining Qubole
    ● In the last 5 years at Qubole, I have worn multiple hats and worked across stacks to provide big-data solutions in the cloud
    ● Currently leading the Streaming Team at Qubole
  • 4. Who should watch this?
    ● Big Data Engineer (DevOps, Architect, Software, Engineer, Admin)
    ● Data Platform Manager
    ● Big Data Enthusiast (Consultant, Executive, Data User, Analyst)
  • 5. How is streaming used in production?
    ● Identifying sessions based on user behavior from real-time activity streams
    ● Anomaly and fraud detection: running ML predictions on incoming data and keeping the model updated continuously as new data arrives
    ● Time-based window aggregations: using window functions to do associative aggregations and compute real-time stats
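The time-based window aggregation use case can be illustrated without Spark. What follows is a minimal plain-Scala sketch, not the Spark API: `Event`, `windowStart`, `countPerWindow` and `merge` are hypothetical names introduced here, and timestamps are assumed to be non-negative epoch milliseconds. It shows why associativity matters: per-chunk partial counts can be merged in any grouping and still give the same per-window totals.

```scala
object TumblingWindowSketch {
  // Hypothetical event type: epoch-millisecond timestamp plus a user id.
  final case class Event(timestampMs: Long, userId: String)

  // Start of the fixed-size (tumbling) window that contains the timestamp.
  // Assumes timestampMs >= 0 and widthMs > 0.
  def windowStart(timestampMs: Long, widthMs: Long): Long =
    (timestampMs / widthMs) * widthMs

  // Partial aggregate for one chunk of events: event count per window.
  def countPerWindow(events: Seq[Event], widthMs: Long): Map[Long, Long] =
    events
      .groupBy(e => windowStart(e.timestampMs, widthMs))
      .map { case (start, es) => start -> es.size.toLong }

  // Counting is associative, so partial results can be merged in any grouping.
  def merge(a: Map[Long, Long], b: Map[Long, Long]): Map[Long, Long] =
    b.foldLeft(a) { case (acc, (start, n)) =>
      acc.updated(start, acc.getOrElse(start, 0L) + n)
    }

  def main(args: Array[String]): Unit = {
    val widthMs = 10000L // 10-second windows
    val chunk1 = Seq(Event(1000L, "a"), Event(4000L, "b"), Event(11000L, "a"))
    val chunk2 = Seq(Event(9000L, "c"), Event(12000L, "b"))
    val merged = merge(countPerWindow(chunk1, widthMs), countPerWindow(chunk2, widthMs))
    merged.toSeq.sortBy(_._1).foreach { case (start, n) =>
      println(s"window starting at $start ms: $n events")
    }
  }
}
```

In Spark itself the equivalent grouping is expressed with the `window` function on a timestamp column; the sketch only mirrors the idea.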
  • 8. Streaming Paradigm
    ● Stream In Stream out
      ○ Low Latency - How Low?
      ○ Complexity of Analytics
      ○ Volume - How high?
    ● Stream In Batch out
      ○ No Tight Latency Constraint
      ○ Higher Ingestion Rate
      ○ Aggregation/Data or Schema Transformation/Data Enrichment
      ○ Downstream ETL Operation
  • 9. Why use Spark Streaming
    ● No ultra-low latency requirement
      ○ Processing time of a few seconds is acceptable
    ● Scalable and mature processing engine
    ● Higher-level API abstraction
      ○ Ease of code reuse from batch jobs
      ○ Simple and modular
    ● Vibrant community
      ○ Active development of new features
  • 11. Structured Streaming - under the hood
    ● Abstractions of Repeated Queries
      ○ Data streams as an unbounded table
      ○ A streaming query is a batch-like operation on this table
  • 12. Structured Streaming - under the hood
    ● Query Planning & Execution
      ○ In batch execution, the Planner creates a code- and memory-optimized execution plan
      ○ For a streaming query, the Planner converts the streaming logical plan into a series of incremental execution plans, each processing the next chunk of data
    [Diagram labels: DataFrame, Logical Plan, Planner, Execution Plan; Planner, Incremental Execution 1, Incremental Execution 2, Incremental Execution 3]
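The idea of a series of incremental executions can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's planner: `batchCount` and `incrementalStep` are hypothetical names, and the "query" is a simple count per value. Each step folds only the new chunk into the previous state, and the final state equals what a single batch query over all the data would return.

```scala
object IncrementalCountSketch {
  // Batch-style query over the whole table: count rows per value.
  def batchCount(table: Seq[String]): Map[String, Long] =
    table.groupBy(identity).map { case (v, rows) => v -> rows.size.toLong }

  // One incremental execution: fold only the new chunk into the previous state.
  def incrementalStep(state: Map[String, Long], chunk: Seq[String]): Map[String, Long] =
    chunk.foldLeft(state) { (acc, v) => acc.updated(v, acc.getOrElse(v, 0L) + 1L) }

  def main(args: Array[String]): Unit = {
    // Three chunks of newly arrived rows, one per incremental execution.
    val chunks = Seq(Seq("a", "b"), Seq("a"), Seq("c", "a"))

    // scanLeft keeps the result table after every incremental execution.
    val results = chunks.scanLeft(Map.empty[String, Long])(incrementalStep)
    results.tail.zipWithIndex.foreach { case (r, i) =>
      println(s"after incremental execution ${i + 1}: $r")
    }

    // The final incremental result matches one batch query over all the data.
    assert(results.last == batchCount(chunks.flatten))
  }
}
```

The state carried between steps is what makes the computation incremental: no step re-reads earlier chunks.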
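  • 13. Programming Paradigm
    ● Start with a Spark Session
    ● Specify the data source, schema and other options (create the input DataFrame)
    ● Write your incremental query to generate output
    ● Specify the data sink and other options to export your data

    import org.apache.spark.sql.SparkSession

    // brokers and topics are assumed to be defined elsewhere
    // (Kafka bootstrap servers and the topic list to subscribe to).
    val spark = SparkSession.builder.appName("kafka streaming Example").getOrCreate()
    import spark.implicits._ // encoders needed for .as[(String, String)]

    // Input: read the Kafka topic as a stream of (key, value) strings.
    val ds = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", topics)
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .as[(String, String)]

    // Incremental query: running count per value.
    val c = ds.groupBy("value").count()

    // Sink: in-memory table named "aggregates", rewritten in full on each trigger.
    c.writeStream.queryName("aggregates")
      .format("memory")
      .outputMode("complete")
      .start()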
  • 14. Productionizing Streaming Application
    ● Monitoring
      ○ Throughput
      ○ Latency
      ○ Time Lag
    ● Fault Tolerance
      ○ Checkpointing
      ○ Exactly Once or At Least Once
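The difference between at-least-once and exactly-once delivery after a failure can be sketched in plain Scala. This is a toy model, not Spark's checkpointing mechanism: `source`, `Checkpoint`, `runBatch` and `runWithOneFailure` are hypothetical names, and the failure is simulated as a crash after the sink write but before the checkpoint commit. On restart the batch is replayed from the last checkpoint, so an append-only sink sees duplicates while a sink keyed by offset (idempotent writes) ends up with each record once.

```scala
import scala.collection.mutable

object DeliverySemanticsSketch {
  // Hypothetical replayable source: records are addressed by offset.
  val source: Vector[String] = Vector("r0", "r1", "r2", "r3")

  // Checkpoint: offset of the next record to read, committed after each batch.
  final case class Checkpoint(nextOffset: Int)

  // Processes up to `size` records starting at the checkpointed offset.
  // If crashBeforeCommit is true, the records are written to the sink but
  // the checkpoint does not advance, as after a failure between the two.
  def runBatch(cp: Checkpoint, size: Int, write: (Int, String) => Unit,
               crashBeforeCommit: Boolean): Checkpoint = {
    val end = math.min(cp.nextOffset + size, source.length)
    (cp.nextOffset until end).foreach(o => write(o, source(o)))
    if (crashBeforeCommit) cp else Checkpoint(end)
  }

  // First batch fails before commit, is retried, then the rest is processed.
  def runWithOneFailure(write: (Int, String) => Unit): Checkpoint = {
    val afterCrash = runBatch(Checkpoint(0), 2, write, crashBeforeCommit = true)
    val retried = runBatch(afterCrash, 2, write, crashBeforeCommit = false)
    runBatch(retried, 2, write, crashBeforeCommit = false)
  }

  def main(args: Array[String]): Unit = {
    // At-least-once: the append-only sink receives the replayed batch twice.
    val appendSink = mutable.ArrayBuffer.empty[String]
    runWithOneFailure((_, r) => appendSink += r)
    println(s"append-only sink holds ${appendSink.size} records")

    // Exactly-once effect: writes keyed by offset overwrite the duplicates.
    val keyedSink = mutable.Map.empty[Int, String]
    runWithOneFailure((o, r) => keyedSink(o) = r)
    println(s"offset-keyed sink holds ${keyedSink.size} records")
  }
}
```

In Spark Structured Streaming, the checkpoint directory is set with the `checkpointLocation` option on `writeStream`; whether the end-to-end result is exactly-once also depends on the source being replayable and the sink being idempotent or transactional.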
  • 15. Q&A