SlideShare ist ein Scribd-Unternehmen logo
1 von 51
Downloaden Sie, um offline zu lesen
@tgrall#Devoxx #sparkstreaming
Build a Time Series Application
with Spark and HBase
Tugdual Grall
@tgrall
MapR
Carol McDonald
@caroljmcdonald
MapR
@tgrall#Devoxx #sparkstreaming
Agenda
• Time Series
• Apache Spark & Spark Streaming
• Apache HBase
• Lab
@tgrall#Devoxx #sparkstreaming
About the Lab
• Use Spark & HBase in MapR Cluster
• Option 1: Use a SandBox (Virtual Box VM located on USB
Key)
• Option 2: Use Cloud Instance (SSH/SCP only)
• Content:
• Option 1: spark-streaming-hbase-workshop.zip on USB
• Option 2: download zip from
https://github.com/tgrall/spark-streaming-hbase-workshop
@tgrall#Devoxx #sparkstreaming
Time Series
@tgrall#Devoxx #sparkstreaming
What is a Time Series?
• Stuff with timestamps
• sensor measurements
• system stats
• log files
• ….
@tgrall#Devoxx #sparkstreaming
Got Some Examples?
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
What do we need to do?
• Acquire
• Measurement, transmission, reception
• Store
• Individually, or grouped for some amount of time
• Retrieve
• Ad hoc, flexible, correlate and aggregate
• Analyze and visualize
• We facilitate this via retrieval
@tgrall#Devoxx #sparkstreaming
Acquisition
Not usually our problem
• Sensors
• Data collection – agents, raspberry pi
• Transmission – via LAN/Wan, Mobile Network, Satellites
• Receipt into system – listening daemon or queue, or
depending on use case writing directly to the database
@tgrall#Devoxx #sparkstreaming
Storage Choice
• Flat files
• Great for rapid ingest with massive data
• Handles essentially any data type
• Less good for data requiring frequent updates
• Harder to find specific ranges
• Traditional RDBMS
• Ingests up to ~10,000/ sec; prefers well structured (numerical) data;
expensive
• NoSQL (such as MapR-DB or HBase)
• Easily handle 10,000 rows / sec / node – True linear scaling
• Handles wide variety of data
• Good for frequent updates
• Easily scanned in a range
@tgrall#Devoxx #sparkstreaming
Specific Example
Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics, machine vibration
levels, salinity, voltage, currents, many others
• Typical project has 100 rigs
@tgrall#Devoxx #sparkstreaming
General Outline
10K samples / second / rig
x 100 rigs
= 1M samples / second
• But wait, there’s more
• Suppose you want to test your system
• Perhaps with a year of data
• And you want to load that data in << 1 year
• 100x real-time = 100M samples / second
@tgrall#Devoxx #sparkstreaming
Data Storage
• Typical time window is one hour
• Column names are offsets in time window
• Find series-uid in separate table
Key 13 43 73 103 …
…
series-uid.time-window 4.5 5.2 6.1 4.9
…
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
Why do we need NoSQL / HBase?
Relational Model
bottleneck
Key colB colC
val val val
xxx val val
Key colB colC
val val val
xxx val val
Key colB colC
val val val
xxx val val
Storage ModelRDBMS HBase
Distributed Joins, Transactions do
not scale
Data that is accessed together is
stored together
@tgrall#Devoxx #sparkstreaming
HBase is a ColumnFamily oriented Database
• Data is accessed and stored together:
• RowKey is the primary index
• Column Families group similar data by row key
CF_DATA
colA colB colC
Val val
val
CF_STATS
colA colB colC
val val
val
RowKey
series-abc.time-
window
series-efg.time-
window
Customer id Raw Data Stats
@tgrall#Devoxx #sparkstreaming
HBase is a Distributed Database
Key
Range
xxxx
xxxx
CF1
colA colB colC
val val
val
CF2
colA colB colC
val val
val
Key
Range
xxxx
xxxx
CF1
colA colB colC
val val
val
CF2
colA colB colC
val val
val
Key
Range
xxxx
xxxx
CF1
colA colB colC
val val
val
CF2
colA colB colC
val val
val
Put, Get by Key
Data is automatically
distributed across the cluster
• Key range is used for horizontal
partitioning
@tgrall#Devoxx #sparkstreaming
Basic Table Operations
• Create Table, define Column Families before data is
imported
• but not the rows keys or number/names of columns
• Low level API, technically more demanding
• Basic data access operations (CRUD):
put Inserts data into rows (both create and update)
get Accesses data from one row
scan Accesses data from a range of rows
delete Delete a row or a range of rows or columns
@tgrall#Devoxx #sparkstreaming
Learn More
• Free Online Training: http://learn.mapr.com
• DEV 320 - Apache HBase Data Model and Architecture
• DEV 325 - Apache HBase Schema Design
• DEV 330 - Developing Apache HBase Applications: Basics
• DEV 335 - Developing Apache HBase Applications: Advanced
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
What is Spark?
• Cluster Computing Platform
• Extends “MapReduce” with
extensions
• Streaming
• Interactive Analytics
• Run in Memory
@tgrall#Devoxx #sparkstreaming
What is Spark?
Fast
• 100x faster than M/R
Logistic regression in Hadoop and Spark
@tgrall#Devoxx #sparkstreaming
What is Spark?
Ease of Development
• Write programs quickly
• More Operators
• Interactive Shell
• Less Code
@tgrall#Devoxx #sparkstreaming
What is Spark?
Multi Language Support
• Scala
• Python
• Java
• SparkR
@tgrall#Devoxx #sparkstreaming
What is Spark?
Deployment Flexibility
• Deployment
• Local
• Standalone
• Storage
• HDFS
• MapR-FS
• S3
• Cassandra
• YARN
• Mesos
@tgrall#Devoxx #sparkstreaming
Unified Platform
Spark SQL
Spark Streaming
(Streaming)
MLlib
(Machine Learning)
Spark Core (General execution engine)
GraphX
(Graph Computation)
@tgrall#Devoxx #sparkstreaming
Spark Components
Driver Program
(application)
SparkContext
Cluster Manager
Worker
Executor
Task Task
Worker
Executor
Task Task
@tgrall#Devoxx #sparkstreaming
Spark Resilient Distributed Datasets
Sensor RDD
W
Executor
P4
W
Executor
P1 P3
W
Executor
P2
sc.textFile P1
8213034705, 95,
2.927373,
jake7870, 0……
P2
8213034705,
115, 2.943484,
Davidbresler2,
1….
P3
8213034705,
100, 2.951285,
gladimacowgirl,
58…
P4
8213034705,
117, 2.998947,
daysrus, 95….
@tgrall#Devoxx #sparkstreaming
Spark Resilient Distributed Datasets
Transformation
Filter()
Action
Count()
RDD
newRDD
Value
@tgrall#Devoxx #sparkstreaming
Spark Streaming
Spark SQL
Spark Streaming
(Streaming)
MLlib
(Machine Learning)
Spark Core (General execution engine)
GraphX
(Graph Computation)
@tgrall#Devoxx #sparkstreaming
What is Streaming?
• Data Stream:
• Unbounded sequence of data arriving continuously
• Stream processing:
• Low latency processing, querying, and analyzing of real time
streaming data
@tgrall#Devoxx #sparkstreaming
Why Spark Streaming
• Many applications must process
streaming data
• With the following Requirements:
• Results in near-real-time
• Handle large workloads
• latencies of few seconds
• Use Cases
• Website statistics, monitoring
• IoT
• Fraud detection
• Social network trends
• Advertising click monetization
put
put
put
put
Time stamped data
data
• Sensor, System Metrics, Events, log files
• Stock Ticker, User Activity
• Hi Volume, Velocity
Data for real-time
monitoring
@tgrall#Devoxx #sparkstreaming
What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream
processing of live data
• Extension of the core Spark
Data Sources Data Sinks
@tgrall#Devoxx #sparkstreaming
Spark Streaming Architecture
• Divide data stream into batches of X seconds
• Called DStream = sequence of RDDs
Spark
Streaming
input data
stream
DStream RDD batches
Batch
interval
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
@tgrall#Devoxx #sparkstreaming
Process DStream
• Process using transformations
• creates new RDDs
transform
Transform
map
reduceByValue
count
DStream
RDDs
Dstream
RDDs
transformtransform
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
RDD @ time 1 RDD @ time 2 RDD @ time 3
@tgrall#Devoxx #sparkstreaming
Time Series
Data for
real-time monitoring
read
Sensor
Time stamped data
HBase
Processing
data
@tgrall#Devoxx #sparkstreaming
Lab “flow”
@tgrall#Devoxx #sparkstreaming
Convert Line of CSV data to Sensor
Object
case class Sensor(resid: String, date: String, time: String,
hz: Double, disp: Double, flo: Double, sedPPM: Double,
psi: Double, chlPPM: Double)
def parseSensor(str: String): Sensor = {
val p = str.split(",")
Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
p(6).toDouble, p(7).toDouble, p(8).toDouble)
}
@tgrall#Devoxx #sparkstreaming
Create a DStream
val ssc = new StreamingContext(sparkConf, Seconds(2))
val linesDStream = ssc.textFileStream(“/mapr/stream")
batch
time 0-1
linesDStream
batch
time 1-2
batch
time 1-2
DStream: a sequence of RDDs representing a
stream of data
stored in memory as an
RDD
@tgrall#Devoxx #sparkstreaming
Process DStream
val linesDStream = ssc.textFileStream(”directory path")
val sensorDStream = linesDStream.map(parseSensor)
map
new RDDs created for
every batch
batch
time 0-1
linesDStream RDDs
sensorDstream RDDs
batch
time 1-2
mapmap
batch
time 1-2
@tgrall#Devoxx #sparkstreaming
Save to HBase
rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)
Put objects written
To HBase
batch
time 0-1
linesRDD DStream
sensorRDD Dstream
batch
time 1-2
map
batch
time 1-2
HBase
save save save
output operation: persist data to external storage
map map
@tgrall#Devoxx #sparkstreaming
Learn More
• Free Spark Online Training: http://learn.mapr.com

Weitere ähnliche Inhalte

Was ist angesagt?

NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
Carol McDonald
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
MapR Technologies
 

Was ist angesagt? (20)

Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Hadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA confHadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA conf
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 

Andere mochten auch

A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915
Dan Han
 
MongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingMongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: Benchmarking
Olga Lavrentieva
 
Galvanise NYC - Scaling R with Hadoop & Spark. V1.0
Galvanise NYC - Scaling R with Hadoop & Spark. V1.0Galvanise NYC - Scaling R with Hadoop & Spark. V1.0
Galvanise NYC - Scaling R with Hadoop & Spark. V1.0
vithakur
 
HBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsHBase Consistency and Performance Improvements
HBase Consistency and Performance Improvements
DataWorks Summit
 
Search Analytics with Flume and HBase
Search Analytics with Flume and HBaseSearch Analytics with Flume and HBase
Search Analytics with Flume and HBase
Sematext Group, Inc.
 

Andere mochten auch (20)

Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
C* Summit 2013: High Throughput Analytics with Cassandra by Aaron Stannard
C* Summit 2013: High Throughput Analytics with Cassandra by Aaron StannardC* Summit 2013: High Throughput Analytics with Cassandra by Aaron Stannard
C* Summit 2013: High Throughput Analytics with Cassandra by Aaron Stannard
 
Spark!
Spark!Spark!
Spark!
 
Apache spark core
Apache spark coreApache spark core
Apache spark core
 
SparkSQL et Cassandra - Tool In Action Devoxx 2015
 SparkSQL et Cassandra - Tool In Action Devoxx 2015 SparkSQL et Cassandra - Tool In Action Devoxx 2015
SparkSQL et Cassandra - Tool In Action Devoxx 2015
 
The SparkSQL things you maybe confuse
The SparkSQL things you maybe confuseThe SparkSQL things you maybe confuse
The SparkSQL things you maybe confuse
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
 
Making Scrum Work Inside Small Businesses
Making Scrum Work Inside Small Businesses Making Scrum Work Inside Small Businesses
Making Scrum Work Inside Small Businesses
 
Streaming map reduce
Streaming map reduceStreaming map reduce
Streaming map reduce
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBase
 
MongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingMongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: Benchmarking
 
Galvanise NYC - Scaling R with Hadoop & Spark. V1.0
Galvanise NYC - Scaling R with Hadoop & Spark. V1.0Galvanise NYC - Scaling R with Hadoop & Spark. V1.0
Galvanise NYC - Scaling R with Hadoop & Spark. V1.0
 
Lspe
LspeLspe
Lspe
 
HBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsHBase Consistency and Performance Improvements
HBase Consistency and Performance Improvements
 
Apache HBase 0.98
Apache HBase 0.98Apache HBase 0.98
Apache HBase 0.98
 
Search Analytics with Flume and HBase
Search Analytics with Flume and HBaseSearch Analytics with Flume and HBase
Search Analytics with Flume and HBase
 

Ähnlich wie Build a Time Series Application with Apache Spark and Apache HBase

Ähnlich wie Build a Time Series Application with Apache Spark and Apache HBase (20)

ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
 
Big Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David GiardBig Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David Giard
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Devnexus 2018
Devnexus 2018Devnexus 2018
Devnexus 2018
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
Spark to DocumentDB connector
Spark to DocumentDB connectorSpark to DocumentDB connector
Spark to DocumentDB connector
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 

Mehr von Carol McDonald

Mehr von Carol McDonald (18)

Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
 
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DBAnalyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
 
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
Spark machine learning predicting customer churn
Spark machine learning predicting customer churnSpark machine learning predicting customer churn
Spark machine learning predicting customer churn
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine Learning
 
Machine Learning Recommendations with Spark
Machine Learning Recommendations with SparkMachine Learning Recommendations with Spark
Machine Learning Recommendations with Spark
 
CU9411MW.DOC
CU9411MW.DOCCU9411MW.DOC
CU9411MW.DOC
 
Getting started with HBase
Getting started with HBaseGetting started with HBase
Getting started with HBase
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Build a Time Series Application with Apache Spark and Apache HBase