Apache Cassandra and Python
For streaming Big Data
Prajod S Vettiyattil
Architect, Wipro
@prajods
https://in.linkedin.com/in/prajod
Nishant Sahay
Architect, Wipro
@nsahaytech
https://in.linkedin.com/in/nishantsahay
Open Source India
Nov 2015
Database track
Agenda
1. Time Series Data Analysis
2. Spark, Python, Cassandra and D3
3. Business problem
4. Solution using Logical Architecture
5. Data Processor
6. Data Persistence
7. Data Visualization
What this session is about
What: Big Data, Streaming, Time Series
How: Spark, Python, Cassandra, D3.js, Node.js
Tools: Python, Spark, Cassandra, Node and D3
• Python and Spark for Big data processing
• Cassandra for persistence and serving
• D3 for visualization
• Node for enabling scalability and data aggregation
Python
• Popular with Open source projects
• Wide support base
• Strong in data science
• Visualization libraries
• Statistics functions
Cassandra
• NoSQL database
• Column family
• Dynamic columns
• AP in CAP theorem
• Tunable consistency
• Suited for time series storage
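The "tunable consistency" bullet can be illustrated with a little arithmetic. The sketch below is not from the talk: the function names are hypothetical, and it only encodes the commonly cited rule that reads see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF).

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # replication factor used later in the deck (RF=3)
    q = quorum(rf)
    print(f"RF={rf}: quorum is {q} replicas")
    print("QUORUM read + QUORUM write overlap:", is_strongly_consistent(q, q, rf))
    print("ONE read + ONE write overlap:", is_strongly_consistent(1, 1, rf))
```

Lower consistency levels trade this overlap for latency and availability, which is the "AP" side of the CAP bullet above.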
D3.js
• Data driven documents
• SVG, HTML, CSS and JavaScript
• Fine grained control of screen elements
• Plethora of UI widgets
Business Problem
•Handle streaming data
•Stock ticks
•Weather movements
•Satellite captures
•Astronomical observations
•Large Hadron Collider
•Ingest
•Persist
•Visualize
•Analysing stock prices
Logical Solution Architecture
Time Series Data Producer (IoT devices, stock ticks)
→ Data Processor (pySpark)
→ Data Persistence (Cassandra)
→ Visualization Aggregator (Node.js)
→ Visualization (D3.js)
Data Processor: pySpark
•Apache Spark is a big data processor
•Streaming data
•Batch data
•Lambda architecture
•pySpark for using Python's power on top of Spark
•Python
•Machine learning
•Statistics
•Visualization
•Cassandra integration
•pyspark-cassandra adapter from TargetHoldings
Logical Architecture diagram of Spark
Apache Spark core, with components built on top of it:
Spark SQL, Spark Streaming, MLlib, GraphX, SparkR, pySpark
Apache Spark: Core
• In memory processing for Big Data
• Cached intermediate data sets
• Multi-step DAG based execution
• Resilient Distributed Datasets (RDDs)
pySpark and Cassandra
(Diagram labels: Java, Python, Cassandra)
Apache Spark: Processing stock ticks
• Ingest stock tick stream, coming in at a high rate
• Calculate moving average of stock prices
• Insert the average of prices into Cassandra
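The moving-average step above can be sketched in plain Python. This is only an illustration of the calculation, not the pySpark streaming job from the talk: the function name and the window size are hypothetical, and the Cassandra insert is left out.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(prices: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` prices after each new tick."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds at most `window` prices
    running_sum = 0.0
    for price in prices:
        if len(recent) == window:
            # The oldest price is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(price)
        running_sum += price
        yield running_sum / len(recent)


if __name__ == "__main__":
    ticks = [10.0, 20.0, 30.0, 40.0]  # made-up prices for illustration
    for tick, average in zip(ticks, moving_average(ticks, window=2)):
        print(f"price={tick:.2f} moving_average={average:.2f}")
```

In a streaming setting the same idea is usually expressed as a windowed aggregation per stock symbol, with each resulting average written to the database.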
Data Persistence - Cassandra
• Masterless: peer to peer
• Built to Scale: Scales to support millions of operations per second
• High Availability: No single point of failure
• Ease of Use: Operational simplicity, CQL for developers
• It is supposedly battle tested at Facebook, Apple and Netflix :-)
Data Persistence - Cassandra
(Diagram: ring of eight nodes, n1 to n8)
Write request: the partition key hashes to a value owned by n1
• n8: coordinator node
• n1: primary node responsible for handling the request
• n2, n3: replica nodes (RF=3)
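The write path in the ring diagram can be approximated with a toy model. The sketch below is a simplification under stated assumptions: it uses MD5 and a small 32-bit token space as stand-ins (Cassandra's default partitioner is Murmur3 and real clusters typically use virtual nodes), tokens are spaced evenly, and replicas are simply the next nodes clockwise. All function names are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple

RING_SIZE = 2 ** 32  # hypothetical token space, far smaller than Cassandra's


def token_for(partition_key: str) -> int:
    """Hash a partition key onto the ring (MD5 used only as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes: List[str]) -> List[Tuple[int, str]]:
    """Give each node an evenly spaced token; a node owns keys up to its token."""
    step = RING_SIZE // len(nodes)
    return [((i + 1) * step - 1, node) for i, node in enumerate(nodes)]


def replicas_for(partition_key: str, ring: List[Tuple[int, str]],
                 replication_factor: int) -> List[str]:
    """Return the primary node followed by the next nodes clockwise."""
    if replication_factor > len(ring):
        raise ValueError("replication factor exceeds the number of nodes")
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    ring = build_ring([f"n{i}" for i in range(1, 9)])  # n1 .. n8 as in the diagram
    print("replicas for GAZP:", replicas_for("GAZP", ring, replication_factor=3))
```

Any node can act as coordinator: it computes the same placement and forwards the write to the replicas it finds.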
Cassandra Data Model – Skinny Rows
Skinny Rows: the primary key consists of the partition key only (here a composite partition key).
CREATE TABLE stock_info (stock_id text, date text, price double, PRIMARY KEY ((stock_id, date)));
Logical view:
stock_id  date        price
GAZP      2015-11-11  556.50
GAZP      2015-11-10  556.65
Disk view (partitions spread across nodes n1 and n4):
GAZP:2015-11-11  price = 556.50
GAZP:2015-11-10  price = 556.65
Cassandra Data Model – Wide Rows
Wide Rows: the primary key contains columns other than the partition key (clustering columns), giving a compound primary key (partition + clustering).
CREATE TABLE stock_ticker (stock_id text, price double, event_time timestamp, PRIMARY KEY (stock_id, event_time));
Logical view:
stock_id  price   date        event_time
GAZP      559.45  2015-11-10  2015-11-10 09:30:00
GAZP      556.45  2015-11-10  2015-11-10 13:30:00
GAZP      556.65  2015-11-11  2015-11-11 18:00:00
Disk view (single partition GAZP on node n1):
2015-11-10 13:30:00:price = 556.45
2015-11-10 09:30:00:price = 559.45
2015-11-11 16:00:00:price = 556.65
Time Series – Cassandra Data Model
Wide Row + Row Partition
CREATE TABLE stock_info (stock_id text, date text, price double, event_time timestamp, PRIMARY KEY ((stock_id, date), event_time));
Logical view:
stock_id  price   date        event_time
GAZP      559.45  2015-11-10  2015-11-10 09:30:00
GAZP      556.45  2015-11-10  2015-11-10 13:30:00
GAZP      556.65  2015-11-11  2015-11-11 18:00:00
Disk view (partitions spread across nodes n1 and n6):
GAZP:2015-11-10
  2015-11-10 13:30:00:price = 556.45
  2015-11-10 09:30:00:price = 559.45
GAZP:2015-11-11
  2015-11-11 18:00:00:price = 556.65
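The difference between the wide-row and the day-partitioned models comes down to which columns form the partition key. The sketch below mimics that grouping in plain Python using the three sample ticks from the slide. It is an in-memory illustration only: `layout` is a hypothetical helper, and sorting by `event_time` stands in for the clustering order.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Tick = Tuple[str, str, str, float]  # (stock_id, date, event_time, price)

# Sample rows taken from the slide's logical view.
TICKS: List[Tick] = [
    ("GAZP", "2015-11-10", "2015-11-10 13:30:00", 556.45),
    ("GAZP", "2015-11-10", "2015-11-10 09:30:00", 559.45),
    ("GAZP", "2015-11-11", "2015-11-11 18:00:00", 556.65),
]


def layout(ticks: List[Tick],
           partition_key: Callable[[Tick], str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group ticks by partition key and sort each partition by event_time."""
    partitions: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for tick in ticks:
        _, _, event_time, price = tick
        partitions[partition_key(tick)].append((event_time, price))
    return {key: sorted(cells) for key, cells in partitions.items()}


if __name__ == "__main__":
    wide = layout(TICKS, lambda t: t[0])                  # PRIMARY KEY (stock_id, event_time)
    bucketed = layout(TICKS, lambda t: f"{t[0]}:{t[1]}")  # PRIMARY KEY ((stock_id, date), event_time)
    print("wide row:", wide)
    print("wide row + day partition:", bucketed)
```

With the wide-row key every tick for a stock lands in one ever-growing partition, while the composite key caps each partition at one day of ticks and lets different days live on different nodes.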
Summary – Cassandra Data Model
• Skinny Row: partitions GAZP:2015-11-11 and GAZP:2015-11-10, one price each (nodes n1 and n4)
• Wide Row: a single partition GAZP holding all timestamped prices (node n1)
• Wide Row + Row Partition: partitions GAZP:2015-11-10 and GAZP:2015-11-11, each holding that day's timestamped prices (nodes n1 and n6)
• Optimize with expiring columns, or split the day bucket into multiple rows
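Splitting the day bucket into multiple rows, as suggested in the summary, amounts to adding a finer time component to the partition key. The sketch below shows one possible way to derive such a key. The `bucket_key` helper, its key format and the six-hour bucket size are assumptions for illustration, not something the slides specify.

```python
from datetime import datetime


def bucket_key(stock_id: str, event_time: datetime, hours_per_bucket: int = 24) -> str:
    """Build a partition key of the form stock:date or stock:date:start_hour."""
    if not 1 <= hours_per_bucket <= 24 or 24 % hours_per_bucket != 0:
        raise ValueError("hours_per_bucket must divide 24 evenly")
    day = f"{stock_id}:{event_time:%Y-%m-%d}"
    if hours_per_bucket == 24:
        return day  # one partition per stock per day, as in the slide
    bucket_start = (event_time.hour // hours_per_bucket) * hours_per_bucket
    return f"{day}:{bucket_start:02d}"


if __name__ == "__main__":
    tick_time = datetime(2015, 11, 10, 13, 30)
    print(bucket_key("GAZP", tick_time))                      # day bucket
    print(bucket_key("GAZP", tick_time, hours_per_bucket=6))  # six-hour bucket
```

Smaller buckets keep individual partitions bounded for high-rate tickers, at the cost of reading several partitions to cover a full day.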
Node.js, Cassandra and D3.js
• Web UI layer (browser): D3.js graph
• Server layer: ExpressJS with cassandra-driver
• Database layer: Cassandra DB
• Browser to server: REST-based polling, get JSON data
• Server to database: CQL SELECT of time series data
Data Aggregator
• Node.js acts as a proxy for data aggregation
• Exposes a REST endpoint for visualization
• Retrieves data from Cassandra
• Transforms data as per business need
• ExpressJS: flexible web application framework
• DataStax cassandra-driver: client library for Apache Cassandra
• EJS: quick, on-the-fly templating for the Node application
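The transformation step of the aggregator can be sketched as follows. The talk implements this layer in Node.js with ExpressJS. The version below is written in Python purely for illustration and leaves out the HTTP endpoint and the Cassandra query. The `rows_to_series` helper is hypothetical, the `{x, y}` point shape with epoch seconds is an assumption modelled on what time-series charting toolkits commonly consume, and naive timestamps are assumed to be UTC.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def rows_to_series(rows: Iterable[Tuple[datetime, float]], name: str) -> Dict:
    """Convert (event_time, price) rows into a JSON-friendly, time-ordered series."""
    data = [
        {
            # Naive datetimes are assumed to be UTC (an assumption of this sketch).
            "x": int(event_time.replace(tzinfo=timezone.utc).timestamp()),
            "y": price,
        }
        for event_time, price in sorted(rows)
    ]
    return {"name": name, "data": data}


if __name__ == "__main__":
    # Rows as they might come back from a CQL SELECT on one partition.
    rows = [
        (datetime(2015, 11, 10, 13, 30), 556.45),
        (datetime(2015, 11, 10, 9, 30), 559.45),
    ]
    print(json.dumps(rows_to_series(rows, "GAZP"), indent=2))
```

Keeping this reshaping on the server side means the browser only polls for ready-to-plot JSON.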
Visualization - Frameworks
• D3 for transformation of time series data into visual information
• Consume REST API
• Generate customized data driven graphs and visualization
• Rickshaw is a JavaScript toolkit for creating interactive time series graphs
• Built on D3.js
• Generate time-series graph
Visualization – Graphs
(Graphs shown: Price, Moving Average, Trade Volume, Stock Price)
Summary
• Processing time series data
• Apache Spark
• Cassandra
• Node.js
• D3.js
QUESTIONS

 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 

Apache Cassandra and Python for Analyzing Streaming Big Data

  • 1. Apache Cassandra and Python for streaming Big Data
      Prajod S Vettiyattil, Architect, Wipro, @prajods, https://in.linkedin.com/in/prajod
      Nishant Sahay, Architect, Wipro, @nsahaytech, https://in.linkedin.com/in/nishantsahay
      Open Source India, Nov 2015, Database track
  • 2. Agenda
      1. Time Series Data Analysis
      2. Spark, Python, Cassandra and D3
      3. Business problem
      4. Solution using Logical Architecture
      5. Data Processor
      6. Data Persistence
      7. Data Visualization
  • 3. What this session is about
      • What: Big Data, Streaming, Time Series
      • How: Spark, Python, Cassandra, D3.js, Node.js
  • 4. Tools: Python, Spark, Cassandra, Node and D3
      • Python and Spark for big data processing
      • Cassandra for persistence and serving
      • D3 for visualization
      • Node for enabling scalability and data aggregation
  • 5. Python
      • Popular with open source projects
      • Wide support base
      • Strong in data science
      • Visualization libraries
      • Statistics functions
  • 6. Cassandra
      • NoSQL database
      • Column family
      • Dynamic columns
      • AP in CAP theorem
      • Tunable consistency
      • Suited for time series storage
  • 7. D3.js
      • Data driven documents
      • SVG, HTML, CSS and JavaScript
      • Fine-grained control of screen elements
      • Plethora of UI widgets
  • 8. Business Problem
      • Handle streaming data: stock ticks, weather movements, satellite captures, astronomical observations, Large Hadron Collider
      • Ingest
      • Persist
      • Visualize
      • Analysing stock prices
  • 9. Logical Solution Architecture
      Time Series Data Producer (IoT devices, stock ticks) -> Data Processor (pySpark) -> Data Persistence (Cassandra) -> Visualization Aggregator (Node.js) -> Visualization (D3.js)
  • 10. Data Processor: pySpark
      • Apache Spark is a big data processor: streaming data, batch data, Lambda architecture
      • pySpark brings Python's power on top of Spark
      • Python: machine learning, statistics, visualization
      • Cassandra integration: pyspark-cassandra adapter from TargetHoldings
  • 11. Logical Architecture diagram of Spark
      Components on top of Apache Spark: Spark SQL, MLlib, GraphX, SparkR, pySpark, Spark Streaming
  • 12. Apache Spark: Core
      • In-memory processing for Big Data
      • Cached intermediate data sets
      • Multi-step, DAG-based execution
      • Resilient Distributed Datasets (RDDs)
  • 14. Apache Spark: Processing stock ticks
      • Ingest the stock tick stream, which arrives at a high rate
      • Calculate the moving average of stock prices
      • Insert the average of prices into Cassandra
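The moving-average step on slide 14 can be illustrated with a minimal plain-Python sketch. This is not the talk's pySpark code: the function name `moving_averages`, the window size, and the sample prices are assumptions for illustration. In Spark Streaming, a windowed operation over the tick stream would play the role of the `deque` used here.

```python
from collections import deque


def moving_averages(prices, window=3):
    """Yield the simple moving average over the last `window` prices.

    Until `window` prices have been seen, the average covers the prices so far.
    """
    recent = deque(maxlen=window)  # old prices fall off automatically
    for price in prices:
        recent.append(price)
        yield sum(recent) / len(recent)


# Hypothetical sample ticks (prices only), in arrival order.
ticks = [559.45, 556.45, 556.65, 556.50]
averages = list(moving_averages(ticks, window=2))
```

Each value in `averages` is what would then be written to Cassandra alongside the stock id and the time of the latest tick in the window.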
  • 15. Data Persistence - Cassandra
      • Masterless: peer to peer
      • Built to scale: supports millions of operations per second
      • High availability: no single point of failure
      • Ease of use: operational simplicity, CQL for developers
      • Reportedly battle-tested at Facebook, Apple and Netflix :-)
  • 16. Data Persistence - Cassandra
      Ring of eight nodes: n1 to n8
      • Write request: the partition key hashes to a value owned by n1
      • n8: coordinator node
      • n1: primary node responsible for handling the request
      • n2, n3: replication nodes (RF=3)
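The write path on slide 16 can be sketched as a toy ring in plain Python. This is an illustration under stated assumptions, not Cassandra's implementation: the MD5-based `token` function, the equal-sized token slices, and the names `NODES`, `RING_SIZE` and `replicas_for` are all hypothetical. Real clusters use a configurable partitioner and replication strategy.

```python
import hashlib

NODES = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
RING_SIZE = 2 ** 32  # assumed token space for this toy ring


def token(partition_key):
    """Hash a partition key to a position on the ring (toy hash, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def replicas_for(partition_key, nodes=NODES, rf=3):
    """Return the primary node and the next rf-1 nodes clockwise on the ring."""
    slice_size = RING_SIZE // len(nodes)  # each node owns an equal slice
    primary = min(token(partition_key) // slice_size, len(nodes) - 1)
    return [nodes[(primary + i) % len(nodes)] for i in range(rf)]


# Any node can act as coordinator; it forwards the write to these replicas.
replicas = replicas_for("GAZP")
```

With RF=3 the list always holds three distinct, adjacent nodes, which mirrors the n1, n2, n3 example on the slide. Which nodes come out for a given key depends on the hash.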
  • 17. Cassandra Data Model – Skinny Rows
      Skinny rows: the primary key consists of the partition key only (here a composite partition key)
      CREATE TABLE stock_info(stock_id text, date text, price double, PRIMARY KEY ((stock_id, date)));
      Logical view:
        stock_id  date        price
        GAZP      2015-11-11  556.50
        GAZP      2015-11-10  556.65
      Disk view:
        Node n1: GAZP:2015-11-11 -> price 556.50
        Node n4: GAZP:2015-11-10 -> price 556.65
  • 18. Cassandra Data Model – Wide Rows
      Wide rows: the primary key contains clustering columns in addition to the partition key (compound primary key: partition + clustering)
      CREATE TABLE stock_ticker(stock_id text, price double, event_time timestamp, PRIMARY KEY (stock_id, event_time));
      Logical view:
        stock_id  price   date        event_time
        GAZP      559.45  2015-11-10  2015-11-10 09:30:00
        GAZP      556.45  2015-11-10  2015-11-10 13:30:00
        GAZP      556.65  2015-11-11  2015-11-11 18:00:00
      Disk view (Node n1), a single partition for GAZP:
        2015-11-10 13:30:00:price 556.45
        2015-11-10 09:30:00:price 559.45
        2015-11-11 16:00:00:price 556.65
  • 19. Time Series – Cassandra Data Model
      Wide row + row partition
      CREATE TABLE stock_info(stock_id text, date text, price double, event_time timestamp, PRIMARY KEY ((stock_id, date), event_time));
      Logical view:
        stock_id  price   date        event_time
        GAZP      559.45  2015-11-10  2015-11-10 09:30:00
        GAZP      556.45  2015-11-10  2015-11-10 13:30:00
        GAZP      556.65  2015-11-11  2015-11-11 18:00:00
      Disk view:
        Node n1: GAZP:2015-11-10
          2015-11-10 13:30:00:price 556.45
          2015-11-10 09:30:00:price 559.45
        Node n6: GAZP:2015-11-11
          2015-11-11 18:00:00:price 556.65
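The day-bucketed layout on slide 19 can be mimicked in plain Python to show how ticks end up grouped. This is a sketch only: `partition_key` and `group_into_partitions` are hypothetical helper names, and sorting each partition in ascending `event_time` order is an assumption, since the table definition on the slide does not state a clustering order.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(stock_id, event_time):
    """Composite partition key (stock_id, date): one partition per stock per day."""
    return (stock_id, event_time.strftime("%Y-%m-%d"))


def group_into_partitions(ticks):
    """Group (stock_id, event_time, price) ticks the way the table would store them."""
    partitions = defaultdict(list)
    for stock_id, event_time, price in ticks:
        partitions[partition_key(stock_id, event_time)].append((event_time, price))
    for rows in partitions.values():
        rows.sort()  # event_time acts as the clustering column within a partition
    return dict(partitions)


# The three sample ticks from the slide's logical view.
ticks = [
    ("GAZP", datetime(2015, 11, 10, 13, 30), 556.45),
    ("GAZP", datetime(2015, 11, 10, 9, 30), 559.45),
    ("GAZP", datetime(2015, 11, 11, 18, 0), 556.65),
]
partitions = group_into_partitions(ticks)
```

The result has two partitions, `("GAZP", "2015-11-10")` with two rows and `("GAZP", "2015-11-11")` with one, which matches the two disk-view entries on the slide.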
  • 20. Summary – Cassandra Data Model
      • Skinny row
          Node n1: GAZP:2015-11-11 -> price 556.50
          Node n4: GAZP:2015-11-10 -> price 556.65
      • Wide row
          Node n1: GAZP
            2015-11-10 13:30:00:price 556.45
            2015-11-10 09:30:00:price 559.45
            2015-11-11 16:00:00:price 556.65
      • Wide row + row partition
          Node n1: GAZP:2015-11-10
            2015-11-10 13:30:00:price 556.45
            2015-11-10 09:30:00:price 559.45
          Node n6: GAZP:2015-11-11
            2015-11-11 18:00:00:price 556.65
      • Optimize with expiring columns, or split a day bucket into multiple rows
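The last point on slide 20, splitting a day bucket into multiple rows, can be sketched by adding a sub-day bucket number to the partition key. The helper `bucket_key` and the six-hour bucket width are assumptions for illustration; the slides do not say how the authors would split the day.

```python
from datetime import datetime


def bucket_key(stock_id, event_time, hours_per_bucket=6):
    """Partition key (stock_id, date, bucket) that splits each day into equal buckets."""
    if hours_per_bucket <= 0 or 24 % hours_per_bucket != 0:
        raise ValueError("hours_per_bucket must divide 24 evenly")
    bucket = event_time.hour // hours_per_bucket  # 0-based bucket within the day
    return (stock_id, event_time.strftime("%Y-%m-%d"), bucket)


# Two ticks from the same day now land in different partitions.
morning = bucket_key("GAZP", datetime(2015, 11, 10, 9, 30))
afternoon = bucket_key("GAZP", datetime(2015, 11, 10, 13, 30))
```

Smaller buckets keep each partition bounded for high-rate streams, at the cost of reading several partitions to cover a full day.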
  • 21. Node.js, Cassandra and D3.js
      • Web UI layer (browser): D3.js graph
      • Server layer: ExpressJS, cassandra-driver
      • Database layer: Cassandra DB
      • Browser to server: REST-based polling, get JSON data
      • Server to database: CQL select of time series data
  • 22. Data Aggregator
      • Node.js is the proxy for data aggregation
          • Exposes a REST endpoint for visualization
          • Retrieves data from Cassandra
          • Transforms data as per business need
      • ExpressJS: flexible web application framework
      • DataStax cassandra-driver: client library for Apache Cassandra
      • EJS: for quick, on-the-fly templating in the Node application
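The transformation step of the aggregator on slide 22 can be sketched as follows. The talk implements this layer in Node.js with ExpressJS; Python is used here only to illustrate the shape of the transformation. The function name `to_series_json`, the `{"x": epoch seconds, "y": price}` point format, and the treatment of timestamps as UTC are assumptions, chosen to resemble what time series chart libraries typically consume.

```python
import json
from datetime import datetime, timezone


def to_series_json(stock_id, rows):
    """Turn (event_time, price) rows into a JSON time series payload."""
    points = [
        {
            # Treat naive timestamps as UTC (an assumption) and emit epoch seconds.
            "x": int(event_time.replace(tzinfo=timezone.utc).timestamp()),
            "y": price,
        }
        for event_time, price in sorted(rows)  # oldest point first
    ]
    return json.dumps({"name": stock_id, "data": points})


# Rows as they might come back from a CQL select on one partition.
rows = [
    (datetime(2015, 11, 10, 13, 30), 556.45),
    (datetime(2015, 11, 10, 9, 30), 559.45),
]
payload = to_series_json("GAZP", rows)
```

A REST endpoint would return `payload` as the response body, and the browser would poll it to redraw the graph.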
  • 23. Visualization - Frameworks
      • D3 transforms time series data into visual information
          • Consumes the REST API
          • Generates customized, data-driven graphs and visualizations
      • Rickshaw is a JavaScript toolkit for creating interactive time series graphs
          • Built on D3.js
          • Generates time series graphs
  • 24. Visualization – Graphs
      [Graphs: price moving average, trade volume, stock price]
  • 25. Summary
      • Processing time series data with Apache Spark, Cassandra, Node.js and D3.js
  • 26. Questions
      Prajod S Vettiyattil, Architect, Wipro, @prajods, https://in.linkedin.com/in/prajod
      Nishant Sahay, Architect, Wipro, @nsahaytech, https://in.linkedin.com/in/nishantsahay