Spark Streaming
Kafka in Action
Dori Waldman
Big Data Lead
- Spark Streaming with Kafka – Receiver Based
- Spark Streaming with Kafka – Direct (No Receiver)
- Stateful Spark Streaming (Demo)
Agenda
What we do … Ad-Exchange
Real-time trading (150 ms average response time) and campaign optimization over ad spaces.
Tech Stack :
Why Spark ...
Use Case
Tens of Millions of transactions per minute (and growing …)
~15 TB daily (24/7, 99.99999% resiliency)
Data Aggregation: (#Video Success Rate)
Real time Aggregation and DB update
Raw data persistency as recovery backup
Retrospective aggregation updates (recalculate)
Analytic Data :
- Persist incoming events (raw data persistency)
- Real-time analytics and ML algorithm (inside)
- Based on the high-level Kafka consumer
- The receiver stores Kafka messages in executors/workers
- Write-Ahead Logs to recover data on failures – Recommended
- ZK offsets are updated by Spark
- Data is duplicated (stored in both the WAL and Kafka)
Receiver Approach - ”KafkaUtils.createStream”
Receiver Approach - Code
Spark Partition != Kafka Partition
val kafkaStream = { …
Basic
Advanced
Receiver Approach – Code (continue)
Architecture 1.0
[Diagram labels: Stream, Events, Raw Data, Consumer, Aggregation, Spark Batch, Spark Stream]
Architecture
Pros:
- Worked just fine with a single MySQL server
- Simplicity – legacy code stays the same
- Real-time DB updates
- Partial aggregation was done in Spark; the DB was updated via "INSERT ... ON DUPLICATE KEY UPDATE"
Cons:
- MySQL limitations (MySQL sharding is an issue, Cassandra is optimal)
- Storing raw data in S3 (in standard formats) is not trivial when using Spark
Monitoring
Architecture 2.0
[Diagram labels: Stream, Events, Raw Data, Consumer, Aggregation]
Diagram notes:
- Starts from the largest "offset" by default
- Parquet – columnar format (FS not DB)
- Batch update C* every few minutes (overwrite)
Architecture
Pros:
- Parquet is ideal for Spark analytics
- Backup data requires less disk space
Cons:
- The DB is not updated in real time (streaming); we could combine it with MySQL for the current hour...
What has been changed:
- C* uses counters for "sum/update", which is a "bad" practice (there is no equivalent of MySQL's "insert on duplicate key")
- Parquet conversion is a heavy job, and converting hourly in streaming (with batch as a fallback on failure) seems to be a better approach
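One likely reason counters are seen as risky (an assumption, the slide does not spell it out): an increment is not idempotent, so replaying a batch after a failure counts it twice, while overwriting with a recomputed total can be repeated safely. A minimal sketch with an in-memory map standing in for the C* table; all names are hypothetical:

```scala
object CounterVsOverwrite {
  // Hypothetical in-memory stand-in for a table keyed by aggregation key.
  type Store = Map[String, Long]

  // Counter-style update: adds the delta to whatever is already stored.
  def increment(store: Store, key: String, delta: Long): Store =
    store.updated(key, store.getOrElse(key, 0L) + delta)

  // Overwrite-style update: stores the already aggregated total.
  def overwrite(store: Store, key: String, total: Long): Store =
    store.updated(key, total)
}
```

Applying the same batch twice doubles the value with increment and leaves it unchanged with overwrite.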
Direct Approach – ”KafkaUtils.createDirectStream”
- Based on the Kafka simple consumer
- Queries Kafka for the latest offsets in each topic+partition and defines the offset range for each batch
- No need to create multiple input Kafka streams and consolidate them
- Spark creates an RDD partition for each Kafka partition, so data is consumed in parallel
- ZK offsets are not updated by Spark; offsets are tracked by Spark within its checkpoints (might not recover)
- No data duplication (no WAL)
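The offset-range idea can be modelled in plain Scala. This is only a sketch of the concept, not Spark's implementation: PartitionRange is a hypothetical local type (Spark exposes its own OffsetRange), and maxPerPartition is an assumed per-batch cap.

```scala
object OffsetRangeSketch {
  // Hypothetical local type describing what one batch reads from one partition.
  final case class PartitionRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
    def count: Long = untilOffset - fromOffset
  }

  // One range per topic+partition: from the last consumed offset up to the
  // latest offset, capped at maxPerPartition messages. A partition with no
  // consumed offset starts at the latest offset, so its first range is empty.
  def nextBatch(
      consumed: Map[(String, Int), Long],
      latest: Map[(String, Int), Long],
      maxPerPartition: Long): Seq[PartitionRange] =
    latest.toSeq.sortBy(_._1).map { case ((topic, partition), end) =>
      val from = consumed.getOrElse((topic, partition), end)
      PartitionRange(topic, partition, from, math.min(end, from + maxPerPartition))
    }
}
```

Each range maps to one RDD partition, which is why the partitions can be read in parallel without a receiver.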
- S3 / HDFS
- Save metadata – needed for recovery from driver failures
- RDD for stateful transformations (RDDs of previous batches)
Checkpoint...
Transfer data from driver to workers
Broadcast – keeps a read-only variable cached on each machine rather than shipping a copy of it with tasks
Accumulator – used to implement counters/sums; workers can only add to an accumulator, and the driver can read its value (you can extend AccumulatorParam[Vector])
Static (Scala Object)
Context (rdd) – get data after recovery
Direct Approach - Code
def start(sparkConfig: SparkConfiguration, decoder: String): Unit = {
  // Recover the context from the checkpoint directory, or create a new one
  val ssc = StreamingContext.getOrCreate(
    sparkCheckpointDirectory(sparkConfig),
    () => functionToCreateContext(decoder, sparkConfig))
  // Stop gracefully when the JVM shuts down
  sys.ShutdownHookThread {
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
  ssc.start()
  ssc.awaitTermination()
}
In house code
def functionToCreateContext(decoder: String, sparkConfig: SparkConfiguration): StreamingContext = {
  val sparkConf = new SparkConf().setMaster(sparkClusterHost).setAppName(sparkConfig.jobName)
  sparkConf.set(S3_KEY, sparkConfig.awsKey)
  sparkConf.set(S3_CREDS, sparkConfig.awsSecret)
  sparkConf.set(PARQUET_OUTPUT_DIRECTORY, sparkConfig.parquetOutputDirectory)
  val sparkContext = SparkContext.getOrCreate(sparkConf)
  // Hadoop S3 writer optimization
  sparkContext.hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
    "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
  // Like Avro, Parquet supports schema evolution. This work happens in the driver and takes
  // a relatively long time
  sparkContext.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
  sparkContext.hadoopConfiguration.setInt("parquet.metadata.read.parallelism", 100)
  val ssc = new StreamingContext(sparkContext, Seconds(sparkConfig.batchTime))
  ssc.checkpoint(sparkCheckpointDirectory(sparkConfig))
In house code (continue)
  // The streams are created only when the checkpoint folder does not exist
  val streams = sparkConfig.kafkaConfig.streams map { c =>
    val topic = c.topic.split(",").toSet
    KafkaUtils.createDirectStream[String, String, StringDecoder, JsonDecoder](ssc, c.kafkaParams, topic)
  }
  streams.foreach { dsStream => {
      dsStream.foreachRDD { rdd =>
        val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        for (o <- offsetRanges) {
          // Log each range once instead of the whole array on every iteration
          logInfo(s"Offset on the driver: $o")
        }
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
        // Data recovery after crash: read the settings back from the RDD's context
        val s3Accesskey = rdd.context.getConf.get(S3_KEY)
        val s3SecretKey = rdd.context.getConf.get(S3_CREDS)
        val outputDirectory = rdd.context.getConf.get(PARQUET_OUTPUT_DIRECTORY)
In house code (continue)
        val data = sqlContext.read.json(rdd.map(_._2))
        val carpetData = data.count()
        if (carpetData > 0) {
          // coalesce(1) merges the partitions without a full shuffle, so each batch writes one file per day/hour directory
          data.coalesce(1).write.mode(SaveMode.Append).partitionBy("day", "hour").parquet("s3a://...")
          // If the S3 write throws (e.g. an S3Exception), the ZK update below is never reached
          offsetRanges.foreach { o =>
            zk.updateNode(o.topic, o.partition.toString, kafkaConsumerGroup, o.untilOffset.toString.getBytes)
          }
        }
      }
    }
  }
  ssc
}
In house code (continue)
SaveMode (Append/Overwrite) controls how existing data is handled (add a new file / overwrite)
Spark Streaming does not update ZK (a client such as http://curator.apache.org/ can be used to update it)
Spark Streaming saves offsets in its checkpoint folder. After a crash it continues from the last saved offset
You can avoid using the checkpoint for offsets and manage them manually
Config...
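Managing offsets manually usually means "write first, commit the offset after". A minimal sketch of that ordering, assuming an in-memory OffsetStore as a hypothetical stand-in for the ZK node and a caller-supplied write step:

```scala
import scala.util.{Failure, Success, Try}

object ManualOffsetSketch {
  // Hypothetical in-memory stand-in for the ZooKeeper node that holds the offset.
  final class OffsetStore {
    private var committed = 0L
    def get: Long = committed
    def commit(offset: Long): Unit = committed = offset
  }

  // Runs the write first and commits the offset only if the write succeeded,
  // so a failed batch is read again on the next attempt (at-least-once).
  def processBatch(store: OffsetStore, untilOffset: Long)(write: => Unit): Boolean =
    Try(write) match {
      case Success(_) => store.commit(untilOffset); true
      case Failure(_) => false
    }
}
```

Because the commit happens after the write, a crash between the two steps replays the batch, so the sink has to tolerate duplicates.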
val sparkConf = new SparkConf().setMaster("local[4]").setAppName("demo")
val sparkContext = SparkContext.getOrCreate(sparkConf)
val sqlContext = SQLContext.getOrCreate(sparkContext)
val data = sqlContext.read.json(path)
data.coalesce(1).write.mode(SaveMode.Overwrite).partitionBy("table", "day").parquet(outputFolder)
Batch Code
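partitionBy writes one directory per column value, in the column=value style. A small sketch of the resulting path; the root folder and the values used below are made up for illustration:

```scala
object PartitionPathSketch {
  // Builds the directory that a partitioned write targets for one combination
  // of partition values, e.g. <root>/table=events/day=2016-03-01.
  def partitionDir(root: String, columns: Seq[(String, String)]): String =
    (root +: columns.map { case (name, value) => s"$name=$value" }).mkString("/")
}
```

For example, partitionDir("out", Seq("table" -> "events", "day" -> "2016-03-01")) gives "out/table=events/day=2016-03-01".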
- Built-in support for backpressure since Spark 1.5 (disabled by default)
- Receiver – spark.streaming.receiver.maxRate
- Direct – spark.streaming.kafka.maxRatePerPartition
Back Pressure
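A sketch of how these settings might be collected in one place. The property names are Spark's; the numeric limits are placeholders to tune, not recommendations:

```scala
object BackPressureSettings {
  // Property names are Spark's; the numeric values are placeholders.
  val settings: Map[String, String] = Map(
    "spark.streaming.backpressure.enabled"      -> "true",  // disabled by default
    "spark.streaming.receiver.maxRate"          -> "10000", // records/sec per receiver
    "spark.streaming.kafka.maxRatePerPartition" -> "1000"   // records/sec per Kafka partition (direct)
  )
}
```

The map can be applied with new SparkConf().setAll(BackPressureSettings.settings) before the StreamingContext is created.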
https://www.youtube.com/watch?v=fXnNEq1v3VA&list=PL-x35fyliRwgfhffEpywn4q23ykotgQJ6&index=16
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
https://spark.apache.org/docs/1.6.0/streaming-programming-guide.html
http://spark.apache.org/docs/latest/streaming-programming-guide.html#deploying-applications
http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
http://koeninger.github.io/kafka-exactly-once/#1
http://www.slideshare.net/miguno/being-ready-for-apache-kafka-apache-big-data-europe-2015
http://www.slideshare.net/SparkSummit/recipes-for-running-spark-streaming-apploications-in-production-tathagata-daspptx
http://www.slideshare.net/databricks/strata-sj-everyday-im-shuffling-tips-for-writing-better-spark-programs
https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md
https://dzone.com/articles/uniting-spark-parquet-and-s3-as-an-alternative-to
http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
Links – Spark & Kafka integration
Architecture – other spark options
We can use an hourly window, do the aggregation in Spark and overwrite the C* row in real time …
https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-spark-streaming.html
https://docs.cloud.databricks.com/docs/spark/1.6/examples/Streaming%20mapWithState.html
Stateful Spark Streaming
Architecture 3.0
[Diagram labels: Stream, Events, Raw Data, Consumer, Aggregation]
Analytic data uses Spark Streaming to transfer Kafka raw data to Parquet.
A regular Kafka consumer saves a raw data backup in S3 (if streaming fails, a Spark batch job converts it to Parquet).
Aggregation data uses stateful Spark Streaming (mapWithState) to update C*.
If streaming fails, a Spark batch job updates C* from Parquet.
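The stateful update can be pictured with plain Scala collections. This models the shape of a mapWithState update function (key, optional new value, previous state) and is not the Spark API itself; in Spark the function would be wrapped in StateSpec.function and receive a State handle. The keys and values below are hypothetical:

```scala
object StatefulAggregationSketch {
  // Same inputs as a mapWithState update: the key, the new value (if any)
  // and the previous state. Returns the new running total.
  def update(key: String, value: Option[Long], state: Option[Long]): Long =
    state.getOrElse(0L) + value.getOrElse(0L)

  // Folds a sequence of micro-batches into running totals per key.
  def run(batches: Seq[Seq[(String, Long)]]): Map[String, Long] =
    batches.flatten.foldLeft(Map.empty[String, Long]) { case (totals, (k, v)) =>
      totals.updated(k, update(k, Some(v), totals.get(k)))
    }
}
```

Because the running total lives in the state, each batch can overwrite the C* row with the current total instead of incrementing a counter.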
Architecture
Pros:
Real-time DB updates
Cons:
- Too many components, relatively expensive (compared with phase 1)
- According to the documentation, a Spark upgrade has an issue with checkpoints
http://www.slideshare.net/planetcassandra/tuplejump-breakthrough-olap-performance-on-cassandra-and-spark?ref=http://www.planetcassandra.org/blog/introducing-filodb/
What's next … FiloDB? (probably not, lots of nodes)
Parquet performance based on C*
Questions?
val ssc = new StreamingContext(sparkConfig.sparkConf, Seconds(batchTime))
val kafkaStreams = (1 to sparkConfig.workers) map { i =>
  new FixedKafkaInputDStream[String, AggregationEvent, StringDecoder, SerializedDecoder[AggregationEvent]](
    ssc,
    kafkaConfiguration.kafkaMapParams,
    topicMap,
    StorageLevel.MEMORY_ONLY_SER // for write ahead log
  ).map(_._2)
}
val unifiedStream = ssc.union(kafkaStreams) // manage all streams as one
val mapped = unifiedStream flatMap { event =>
  // Convert the event to an aggregation object that contains a
  // key ("advertiserId", "countryId") and values ("click", "impression")
  Aggregations.getEventAggregationsKeysAndValues(Option(event))
}
val reduced = mapped.reduceByKey {
  // For each aggregation type we created a "+" method that
  // describes how to do the aggregation
  _ + _
}
K1 = (advertiserId = 5, countryId = 8)
V1 = (clicks = 2, impression = 17)
Before reduceByKey: k1(e), v1(e); k1(e), v2(e); k2(e), v3(e)
After reduceByKey: k1(e), v1+v2; k2(e), v3(e)
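The "+" method and the per-key reduction can be tried without a cluster. A minimal sketch with plain collections; AggKey and AggValue are hypothetical stand-ins for the in-house aggregation types:

```scala
// Hypothetical stand-ins for the in-house aggregation key and value.
final case class AggKey(advertiserId: Int, countryId: Int)

final case class AggValue(clicks: Long, impressions: Long) {
  // Field-wise sum; this is the "+" that the reduction applies per key.
  def +(other: AggValue): AggValue =
    AggValue(clicks + other.clicks, impressions + other.impressions)
}

object ReduceByKeySketch {
  // Plain-collection equivalent of mapped.reduceByKey(_ + _).
  def reduceByKey(pairs: Seq[(AggKey, AggValue)]): Map[AggKey, AggValue] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(_ + _) }
}
```

Two values under the same key collapse into their field-wise sum, and a key with a single value passes through unchanged.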
In house Code –
Kafka messages semantics
(offset)