© Copyright 2015 Glassbeam Inc.
Ad Hoc Analytics on Internet of Complex Things with Spark and Cassandra
Mohammed Guller
September 2015
About Me
• Principal Architect at Glassbeam
• Founded two startups
• Passionate about building products, big data analytics, and machine learning
www.linkedin.com/in/mohammedguller
@MohammedGuller
Available on Amazon
Internet of Things (IoT)
Network of objects embedded with software for collecting and exchanging data over the Internet
Internet of Complex Things (IoCT)
• Data Center Devices
– Server, storage, controller
• Medical Devices
– X-Ray, MRI scan, CT scan
• Manufacturing Systems
• Cars
• Electric Vehicle Chargers
• Other Complex Devices
Glassbeam
Advanced and Predictive Analytics for Connected Product Companies
Glassbeam target market is focused on driving operational & business analytics value for connected product companies in Industrial IoT market
• IT & Networks
• Medical & Healthcare
• EV Chargers & Smart Grid
• Industrial & Mfg
• Transportation
Analytics on Operational Data
Operational Data to Powerful Insights
High-level Architecture
[Architecture diagram. Layers: Data Ingestion, Data Transformation, Data Stores, Middleware, Applications.]
• Input: Logs (streams/docs)
• Components: SPL Library, SCALAR, INFOSERVER, Event Processing & Rules Engine, Cloud Enablement & Automation
• Data Stores: Amazon S3 (raw logs), Cassandra (processed data), Solr Cloud (index)
• Analytics and Machine Learning: Spark SQL, Spark Streaming, MLlib
• Applications: LogVault, Explorer, Workbench, Standard Apps, Custom Apps, Rules & Alerts, DirectAccess, Glassbeam Studio
End-to-end cloud-based architecture built on modern technologies to handle any machine, any data, any cloud
* SPL (Semiotic Parsing Language) and SCALAR are patent pending technology inventions of Glassbeam
Key Properties of IoCT Data
• Volume: Terabytes of Data
• Variety: Multi-structured Data
• Velocity: Fast Paced Batch Data, Streaming Data
Why We Chose C*
• Volume: Economically Scale from Gigabytes to Terabytes of Data (Linear Scalability)
• Variety: Store Multi-structured Data (Dynamic Schema)
• Velocity: Fast Ingest of New Data, Quick Reload of Old Data (Fast Writes)
Modeling Data in C*
• Different from Modeling Data in RDBMS
• Queries Drive Table and Primary Key Definitions
– Primary Key Definition Limits the Kind of Queries You Can Run
– C* Does Not Support Joins
A Simple Table for Storing Event Data in C*
-- Partition key is (sys_id, dt); rows in a partition are sorted by ts, newest first
CREATE TABLE event (
  sys_id text,
  dt timestamp,
  ts timestamp,
  severity text,
  module text,
  message text,
  PRIMARY KEY ((sys_id, dt), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
Another Table to Filter Events by Severity
-- Same columns, clustered by severity first so a partition can be filtered by severity
CREATE TABLE event_by_severity (
  sys_id text,
  dt timestamp,
  ts timestamp,
  severity text,
  module text,
  message text,
  PRIMARY KEY ((sys_id, dt), severity, ts)
) WITH CLUSTERING ORDER BY (severity ASC, ts DESC);
Yet Another Table to Filter Events by Module
-- Same columns, clustered by module first so a partition can be filtered by module
CREATE TABLE event_by_module (
  sys_id text,
  dt timestamp,
  ts timestamp,
  severity text,
  module text,
  message text,
  PRIMARY KEY ((sys_id, dt), module, ts)
) WITH CLUSTERING ORDER BY (module ASC, ts DESC);
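The three tables above hold the same columns and differ only in their clustering columns, which is what decides the filters each one can serve. The following is a minimal Python sketch of that rule, not Cassandra's actual query planner. It assumes equality-only filters, no secondary indexes, and no ALLOW FILTERING; `Table`, `can_serve`, and `tables_for` are hypothetical names introduced for illustration. Under those assumptions, a query is servable when it restricts every partition key column plus a leading prefix of the clustering columns.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Table(NamedTuple):
    """Hypothetical description of a C* table's primary key."""
    name: str
    partition_key: Tuple[str, ...]
    clustering: Tuple[str, ...]


def can_serve(table: Table, filter_columns: Iterable[str]) -> bool:
    """Simplified rule: equality filters must cover the whole partition key
    and, beyond that, only a leading prefix of the clustering columns."""
    cols = set(filter_columns)
    pk = set(table.partition_key)
    if not pk <= cols:
        return False  # every partition key column must be restricted
    rest = cols - pk
    prefix = set(table.clustering[:len(rest)])
    return rest == prefix  # no skipped clustering column, no non-key column


# Primary keys copied from the three CREATE TABLE statements.
event = Table("event", ("sys_id", "dt"), ("ts",))
event_by_severity = Table("event_by_severity", ("sys_id", "dt"), ("severity", "ts"))
event_by_module = Table("event_by_module", ("sys_id", "dt"), ("module", "ts"))

tables = [event, event_by_severity, event_by_module]


def tables_for(filter_columns: Iterable[str]) -> List[str]:
    """Names of the tables that can answer a query on these columns."""
    cols = list(filter_columns)
    return [t.name for t in tables if can_serve(t, cols)]


print(tables_for(["sys_id", "dt", "severity"]))            # ['event_by_severity']
print(tables_for(["sys_id", "dt", "module"]))              # ['event_by_module']
print(tables_for(["sys_id", "dt", "severity", "module"]))  # []
```

The last call returns an empty list: filtering on both severity and module would need yet another table under this model.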
Ad Hoc Analytics with C*
• Oxymoron
• All Queries Must Be Known Upfront
Another Example
Sys_id Model Age OS City State Country
Intractable Number of Tables
Sys_id Model Age OS City State Country
• sys_by_model
• sys_by_os
• sys_by_age
• sys_by_state
• sys_by_state_age
• sys_by_age_state
• sys_by_model_age
• sys_by_age_model
• sys_by_age_model_state
• sys_by_model_state_age
• sys_by_model_state_os
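To put a number on "intractable", here is a small Python sketch. It assumes one table per ordered, non-empty choice of filter columns drawn from the six non-key columns; order is treated as significant because the list above contains both sys_by_state_age and sys_by_age_state. The result is therefore an upper bound, not a count of tables anyone would actually build, and `query_table_names` is a hypothetical helper.

```python
from itertools import permutations

# The six non-key columns from the example; sys_id is the key itself.
COLUMNS = ["model", "age", "os", "city", "state", "country"]


def query_table_names(columns):
    """Yield one hypothetical table name per ordered, non-empty column choice."""
    for k in range(1, len(columns) + 1):
        for combo in permutations(columns, k):
            yield "sys_by_" + "_".join(combo)


names = list(query_table_names(COLUMNS))
print(len(names))                        # 1956
print("sys_by_model_state_os" in names)  # True
```

Even with only six attributes, the bound is close to two thousand tables, each of which would have to be written on every update.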
Other Barriers to Ad Hoc Analytics
• No Aggregation
• No Group By
• No Joins
What Do I Do Now?
Spark
• Fast and General-purpose Cluster Computing Framework for Processing Large Datasets
• API in Scala, Java, Python, SQL, and R
Integrated Libraries for a Variety of Tasks
• Spark Core
– Spark SQL
– GraphX
– Spark Streaming
– MLlib & Spark ML
One Minor Problem!
• Spark Does Not Have Built-in Support for C*
• Built-in Support for HDFS, S3 and JDBC-compliant Databases
Spark Cassandra Connector
• Open Source Library for Integrating Spark with C*
• Enables a Spark Application to Process Data in C* Just Like Data from the Built-in Data Sources
Spark with C*
• Enables Ad Hoc Analytics
• CQL Limitations No Longer Apply
• Query Data Using SQL/HiveQL
– Filter on Any Column
– Aggregations
– Group By
– Join
Ad Hoc Analytics in Spark Shell
Launch the Spark Shell
/path/to/spark/bin/spark-shell \
  --master spark://host:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
Create a DataFrame
val events = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "event"))
  .load()
Fire Queries
// Cache the table in memory so repeated queries avoid re-reading C*
events.cache()
// Filter on any column, not just the primary key columns
events.select("ts", "module", "message").where($"severity" === "ERROR").show
events.select("ts", "severity", "message").where($"module" === "m1").show
events.select("ts", "message").where($"severity" === "ERROR" && $"module" === "m1").show
// Aggregate: number of events per severity
events.groupBy("severity").count().show
Spark SQL JDBC/ODBC Server
• Analyze Data in C* with Just SQL/HiveQL
• Command Line Shell
– Beeline
• Graphical SQL Client
– SQuirreL
• Data Visualization Applications
– Tableau
– Zoomdata
– QlikView
Ad Hoc Analytics with Spark SQL JDBC/ODBC Server
Start the Spark SQL JDBC Server
/path/to/spark/sbin/start-thriftserver.sh \
  --master spark://hostname:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
Launch Beeline From a Terminal
/path/to/spark/bin/beeline
Connect to the Spark SQL JDBC Server
beeline> !connect jdbc:hive2://localhost:10000
Create a Temporary Table
0: jdbc:hive2://localhost:10000> CREATE TEMPORARY TABLE event
. . . . . . . . . . . . . . . .> USING org.apache.spark.sql.cassandra
. . . . . . . . . . . . . . . .> OPTIONS (
. . . . . . . . . . . . . . . .> keyspace "test",
. . . . . . . . . . . . . . . .> table "event"
. . . . . . . . . . . . . . . .> );
Query Data with SQL/HiveQL
...> CACHE TABLE event;
...> SELECT severity, count(1) as total FROM event GROUP BY severity;
...> SELECT module, severity, count(1) FROM event GROUP BY module, severity;
Caveats
• Latency
• Spark Query May Require Expensive Table Scan
– Reads Every Row
– Disk I/O Is Slow
Reduce the Impact of Slow Disk I/O
• Cache Tables
• Replace HDD with SSD
• Add More Nodes
Recommendations
• Known Queries Requiring Sub-second Response Time
– Query C* Directly
– Create Query-specific Tables
– Pre-aggregate Data
• Ad Hoc Queries
– Spark
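The "Pre-aggregate Data" recommendation can be sketched in a few lines of Python. This is an in-memory stand-in, not Glassbeam's implementation or a Cassandra API: the `SeverityCounts` class is hypothetical, and in a real deployment the counts would presumably live in a query-specific C* table. The point is only that the aggregate is updated once per event at ingest time, so the known query becomes a key lookup instead of a scan.

```python
from collections import Counter
from datetime import date, datetime


class SeverityCounts:
    """Hypothetical pre-aggregated view: events per (sys_id, day, severity)."""

    def __init__(self) -> None:
        self._counts = Counter()

    def ingest(self, sys_id: str, ts: datetime, severity: str) -> None:
        # Update the aggregate as each event arrives.
        self._counts[(sys_id, ts.date(), severity)] += 1

    def count(self, sys_id: str, day: date, severity: str) -> int:
        # Known query answered by a single key lookup, no scan.
        return self._counts[(sys_id, day, severity)]


view = SeverityCounts()
view.ingest("sys-1", datetime(2015, 9, 1, 10, 0), "ERROR")
view.ingest("sys-1", datetime(2015, 9, 1, 11, 30), "ERROR")
view.ingest("sys-1", datetime(2015, 9, 1, 12, 0), "INFO")

print(view.count("sys-1", date(2015, 9, 1), "ERROR"))  # 2
```

A question that was not anticipated, such as counts per module, cannot be answered from this view, which is where the ad hoc path through Spark fits.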
40
© Copyright 2015 Glassbeam Inc. 41

Weitere ähnliche Inhalte

Was ist angesagt?

Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksDatabricks
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...Deepak Chandramouli
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Deploying Big Data Platforms
Deploying Big Data PlatformsDeploying Big Data Platforms
Deploying Big Data PlatformsChris Kernaghan
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data IntegrationJeffrey T. Pollock
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesSteven Feuerstein
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformCaserta
 
Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices MeshRed Hat Developers
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Kolja Manuel Rödel
 
Delta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkDelta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkGeorge Chow
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Databricks
 
Microservices Patterns with GoldenGate
Microservices Patterns with GoldenGateMicroservices Patterns with GoldenGate
Microservices Patterns with GoldenGateJeffrey T. Pollock
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 

Was ist angesagt? (20)

Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for Databricks
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Deploying Big Data Platforms
Deploying Big Data PlatformsDeploying Big Data Platforms
Deploying Big Data Platforms
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices Mesh
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
 
Delta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkDelta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache Spark
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
 
Microservices Patterns with GoldenGate
Microservices Patterns with GoldenGateMicroservices Patterns with GoldenGate
Microservices Patterns with GoldenGate
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 

Andere mochten auch

Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...DataStax Academy
 
Pre-Con Education: Advanced and Reporting and Dashboards With Xtraction
Pre-Con Education: Advanced and Reporting and Dashboards With XtractionPre-Con Education: Advanced and Reporting and Dashboards With Xtraction
Pre-Con Education: Advanced and Reporting and Dashboards With XtractionCA Technologies
 
Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...
Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...
Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...CA Technologies
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache sparkMohammed Guller
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Hands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM DashboardHands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM DashboardCA Technologies
 

Andere mochten auch (10)

Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...
 
Pre-Con Education: Advanced and Reporting and Dashboards With Xtraction
Pre-Con Education: Advanced and Reporting and Dashboards With XtractionPre-Con Education: Advanced and Reporting and Dashboards With Xtraction
Pre-Con Education: Advanced and Reporting and Dashboards With Xtraction
 
Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...
Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...
Hands-on Lab: Building Advanced Dashboards with Xtraction for CA Service Mana...
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache spark
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Hands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM DashboardHands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM Dashboard
 

Ähnlich wie Spark and Cassandra for Ad Hoc Analytics on IoT Data

IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services Torsten Steinbach
 
StampedeCon 2015 Keynote
StampedeCon 2015 KeynoteStampedeCon 2015 Keynote
StampedeCon 2015 KeynoteKen Owens
 
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015StampedeCon
 
Why and How to Monitor Application Performance in Azure
Why and How to Monitor Application Performance in AzureWhy and How to Monitor Application Performance in Azure
Why and How to Monitor Application Performance in AzureRiverbed Technology
 
Why and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in AzureWhy and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in AzureIan Downard
 
Snowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSnowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSenturus
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?MarketingArrowECS_CZ
 
Power apps - Cloud business applications platform
Power apps - Cloud business applications platformPower apps - Cloud business applications platform
Power apps - Cloud business applications platformVladimir Ljubibratic
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Romit Mehta
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDeepak Chandramouli
 
Webinar on MongoDB BI Connectors
Webinar on MongoDB BI ConnectorsWebinar on MongoDB BI Connectors
Webinar on MongoDB BI ConnectorsSumit Sarkar
 
Data centric SDLC for automated clinical data development
Data centric SDLC for automated clinical data developmentData centric SDLC for automated clinical data development
Data centric SDLC for automated clinical data developmentKevin Lee
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Amazon Web Services
 
Database@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenDatabase@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenTammy Bednar
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
 
Realise True Business Value .pdf
Realise True Business Value .pdfRealise True Business Value .pdf
Realise True Business Value .pdfThousandEyes
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeSoftware Guru
 

Ähnlich wie Spark and Cassandra for Ad Hoc Analytics on IoT Data (20)

IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
 
StampedeCon 2015 Keynote
StampedeCon 2015 KeynoteStampedeCon 2015 Keynote
StampedeCon 2015 Keynote
 
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
 
Why and How to Monitor Application Performance in Azure
Why and How to Monitor Application Performance in AzureWhy and How to Monitor Application Performance in Azure
Why and How to Monitor Application Performance in Azure
 
Why and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in AzureWhy and How to Monitor App Performance in Azure
Why and How to Monitor App Performance in Azure
 
Snowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSnowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern Analytics
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 
Power apps - Cloud business applications platform
Power apps - Cloud business applications platformPower apps - Cloud business applications platform
Power apps - Cloud business applications platform
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Spark and Cassandra for Ad Hoc Analytics on IoT Data

  • 1. Ad Hoc Analytics on Internet of Complex Things with Spark and Cassandra. Mohammed Guller, September 2015. © Copyright 2015 Glassbeam Inc.
  • 2. About Me: Principal Architect at Glassbeam; founded two startups; passionate about building products, big data analytics, and machine learning. www.linkedin.com/in/mohammedguller, @MohammedGuller. Available on Amazon.
  • 3. Internet of Things (IoT): Network of objects embedded with software for collecting and exchanging data over the Internet.
  • 4. Internet of Complex Things (IoCT): Data center devices (server, storage, controller); medical devices (X-ray, MRI scan, CT scan); manufacturing systems; cars; electric vehicle chargers; other complex devices.
  • 5. Glassbeam: Advanced and Predictive Analytics for Connected Product Companies. Glassbeam's target market is focused on driving operational & business analytics value for connected product companies in the Industrial IoT market: IT & Networks, Medical & Healthcare, EV Chargers & Smart Grid, Industrial & Mfg, Transportation.
  • 6. Analytics on Operational Data: Operational Data to Powerful Insights.
  • 7. High-level Architecture (diagram). Stages: Data Ingestion, Data Transformation, Data Stores, Middleware, Applications. Input: logs (streams/docs). Platform components: SPL Library, SCALAR, InfoServer. Data stores: Amazon S3 (raw logs), Cassandra (processed data), Solr Cloud (index). Analytics and machine learning: Spark SQL, Spark Streaming, MLlib. Event Processing & Rules Engine. Applications: LogVault, Explorer, Workbench, Standard Apps, Custom Apps, Rules & Alerts, DirectAccess, Glassbeam Studio. Cloud Enablement & Automation. End-to-end cloud-based architecture built on modern technologies to handle any machine, any data, any cloud. * SPL (Semiotic Parsing Language) and SCALAR are patent-pending technology inventions of Glassbeam.
  • 8. Key Properties of IoCT Data: Volume (terabytes of data); Variety (multi-structured data); Velocity (fast-paced batch data, streaming data).
  • 9. Why We Chose C*: Volume (economically scale from gigabytes to terabytes of data): linear scalability. Variety (store multi-structured data): dynamic schema. Velocity (fast ingest of new data, quick reload of old data): fast writes.
  • 10. Modeling Data in C*: Different from modeling data in an RDBMS. Queries drive table and primary key definitions: the primary key definition limits the kind of queries you can run, and C* does not support joins.
  • 11. A Simple Table for Storing Event Data in C*
        CREATE TABLE event (
          sys_id text,
          dt timestamp,
          ts timestamp,
          severity text,
          module text,
          message text,
          PRIMARY KEY ((sys_id, dt), ts)
        ) WITH CLUSTERING ORDER BY (ts DESC);
  • 12. Another Table to Filter Events by Severity
        CREATE TABLE event_by_severity (
          sys_id text,
          dt timestamp,
          ts timestamp,
          severity text,
          module text,
          message text,
          PRIMARY KEY ((sys_id, dt), severity, ts)
        ) WITH CLUSTERING ORDER BY (severity ASC, ts DESC);
  • 13. Yet Another Table to Filter Events by Module
        CREATE TABLE event_by_module (
          sys_id text,
          dt timestamp,
          ts timestamp,
          severity text,
          module text,
          message text,
          PRIMARY KEY ((sys_id, dt), module, ts)
        ) WITH CLUSTERING ORDER BY (module ASC, ts DESC);
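The three table definitions above differ only in their clustering columns, and that is what decides which filters each table can serve. The following Python sketch is a toy in-memory model of that rule, not Cassandra itself. The `ToyTable` class and its simplified check (full partition key plus equality on a prefix of the clustering columns) are assumptions made for illustration; real CQL has more nuances, such as range restrictions and `ALLOW FILTERING`.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a C* table (hypothetical, for illustration only)."""

    def __init__(self, partition_cols, clustering_cols):
        self.partition_cols = list(partition_cols)
        self.clustering_cols = list(clustering_cols)
        # partition key -> list of (clustering key, row), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, row):
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        rows = self.partitions[pk]
        rows.append((ck, row))
        # Ascending sort only; the DESC clustering order is ignored in this toy.
        rows.sort(key=lambda pair: pair[0])

    def select(self, **where):
        # Rule 1: every partition key column must be given.
        missing = [c for c in self.partition_cols if c not in where]
        if missing:
            raise ValueError(f"partition key columns missing: {missing}")
        # Rule 2: remaining filters must cover a prefix of the clustering columns.
        rest = [c for c in where if c not in self.partition_cols]
        if set(rest) != set(self.clustering_cols[:len(rest)]):
            raise ValueError(f"cannot filter on {rest} with this primary key")
        pk = tuple(where[c] for c in self.partition_cols)
        prefix = tuple(where[c] for c in self.clustering_cols[:len(rest)])
        return [row for ck, row in self.partitions.get(pk, [])
                if ck[:len(prefix)] == prefix]


rows = [
    {"sys_id": "s1", "dt": "2015-09-01", "ts": 1, "severity": "ERROR",
     "module": "m1", "message": "disk failure"},
    {"sys_id": "s1", "dt": "2015-09-01", "ts": 2, "severity": "INFO",
     "module": "m2", "message": "heartbeat"},
    {"sys_id": "s1", "dt": "2015-09-01", "ts": 3, "severity": "ERROR",
     "module": "m2", "message": "fan failure"},
]

event = ToyTable(["sys_id", "dt"], ["ts"])
event_by_severity = ToyTable(["sys_id", "dt"], ["severity", "ts"])
for r in rows:
    event.insert(r)
    event_by_severity.insert(r)

# Served by event_by_severity, because severity leads its clustering columns.
errors = event_by_severity.select(sys_id="s1", dt="2015-09-01", severity="ERROR")

# The same filter is rejected by the plain event table.
try:
    event.select(sys_id="s1", dt="2015-09-01", severity="ERROR")
    rejected = False
except ValueError:
    rejected = True
```

In this model, adding a new filter column means adding a new table with a different primary key, which is the pattern the slides follow with `event_by_severity` and `event_by_module`.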
  • 14. Ad Hoc Analytics with C*: An oxymoron. All queries must be known upfront.
  • 15. Another Example: columns Sys_id, Model, Age, OS, City, State, Country.
  • 16. Intractable Number of Tables (columns Sys_id, Model, Age, OS, City, State, Country): sys_by_model, sys_by_os, sys_by_age, sys_by_state, sys_by_state_age, sys_by_age_state, sys_by_model_age, sys_by_age_model, sys_by_age_model_state, sys_by_model_state_age, sys_by_model_state_os.
  • 17. Other Barriers to Ad Hoc Analytics: No aggregation; no GROUP BY; no joins.
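To get a feel for why the number of query-specific tables becomes intractable, the sketch below counts one candidate table per ordered selection of the six non-key columns from the example. This is an upper bound computed under an assumption of mine (every ordering of every column subset gets its own table); the slides do not give a count, and a real workload would need only a fraction of these.

```python
from itertools import permutations

# Non-key columns from the example; Sys_id is left out as the system identifier.
columns = ["model", "age", "os", "city", "state", "country"]


def candidate_tables(cols, max_cols):
    """Name one table per ordered selection of 1..max_cols columns."""
    names = []
    for k in range(1, max_cols + 1):
        for combo in permutations(cols, k):
            names.append("sys_by_" + "_".join(combo))
    return names


up_to_three = candidate_tables(columns, 3)
all_orderings = candidate_tables(columns, len(columns))

print(len(up_to_three))    # 156
print(len(all_orderings))  # 1956
```

Even when limited to three filter columns, the count is already in the hundreds, and each table would also have to be written on every ingest.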
  • 18. What Do I Do Now?
  • 20. Spark: Fast and general-purpose cluster computing framework for processing large datasets. API in Scala, Java, Python, SQL, and R.
  • 21. Integrated Libraries for a Variety of Tasks: Spark Core, Spark SQL, GraphX, Spark Streaming, MLlib & Spark ML.
  • 22. One Minor Problem! Spark does not have built-in support for C*. It has built-in support for HDFS, S3, and JDBC-compliant databases.
  • 23. Spark Cassandra Connector: Open source library for integrating Spark with C*. Enables a Spark application to process data in C* just like data from the built-in data sources.
  • 24. Spark with C*: Enables ad hoc analytics; CQL limitations no longer apply; query data using SQL/HiveQL (filter on any column, aggregations, group by, join).
  • 25. Ad Hoc Analytics in Spark Shell
  • 26. Launch the Spark Shell
        /path/to/spark/bin/spark-shell \
          --master spark://host:7077 \
          --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
  • 27. Create a DataFrame
        val events = sqlContext.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "test", "table" -> "event"))
          .load()
  • 28. Fire Queries
        events.cache()
        events.where($"severity" === "ERROR").select("ts", "module", "message").show
        events.where($"module" === "m1").select("ts", "severity", "message").show
        events.where($"severity" === "ERROR" && $"module" === "m1").select("ts", "message").show
        events.groupBy("severity").count().show
  • 29. Spark SQL JDBC/ODBC Server: Analyze data in C* with just SQL/HiveQL. Command-line shell: Beeline. Graphical SQL client: Squirrel. Data visualization applications: Tableau, ZoomData, QlikView.
  • 30. Ad Hoc Analytics with Spark SQL JDBC/ODBC Server
  • 31. Start the Spark SQL JDBC Server
        /path/to/spark/sbin/start-thriftserver.sh \
          --master spark://hostname:7077 \
          --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
  • 32. Launch Beeline From a Terminal
        /path/to/spark/bin/beeline
  • 33. Connect to the Spark SQL JDBC Server
        beeline> !connect jdbc:hive2://localhost:10000
  • 34. Create a Temporary Table
        0: jdbc:hive2://localhost:10000> CREATE TEMPORARY TABLE event
        . . . . . . . . . . . . . . . .> USING org.apache.spark.sql.cassandra
        . . . . . . . . . . . . . . . .> OPTIONS (
        . . . . . . . . . . . . . . . .> keyspace "test",
        . . . . . . . . . . . . . . . .> table "event"
        . . . . . . . . . . . . . . . .> );
  • 35. Query Data with SQL/HiveQL
        ...> CACHE TABLE event;
        ...> SELECT severity, count(1) as total FROM event GROUP BY severity;
        ...> SELECT module, severity, count(1) FROM event GROUP BY module, severity;
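The two GROUP BY queries above can be tried without a cluster. The sketch below runs them against an in-memory SQLite table through Python's standard `sqlite3` module. SQLite is only a local stand-in chosen for illustration, not the Spark SQL JDBC server; the sample rows are invented, and the `ORDER BY` clauses are my addition so that the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (sys_id TEXT, dt TEXT, ts INTEGER, "
    "severity TEXT, module TEXT, message TEXT)"
)

# Invented sample data with the same columns as the event table in the slides.
sample = [
    ("s1", "2015-09-01", 1, "ERROR", "m1", "disk failure"),
    ("s1", "2015-09-01", 2, "INFO", "m1", "heartbeat"),
    ("s1", "2015-09-01", 3, "ERROR", "m2", "fan failure"),
    ("s2", "2015-09-01", 4, "ERROR", "m1", "disk failure"),
]
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?, ?)", sample)

# Event count per severity.
by_severity = conn.execute(
    "SELECT severity, count(1) AS total FROM event "
    "GROUP BY severity ORDER BY severity"
).fetchall()

# Event count per module and severity.
by_module = conn.execute(
    "SELECT module, severity, count(1) FROM event "
    "GROUP BY module, severity ORDER BY module, severity"
).fetchall()

print(by_severity)  # [('ERROR', 3), ('INFO', 1)]
print(by_module)    # [('m1', 'ERROR', 2), ('m1', 'INFO', 1), ('m2', 'ERROR', 1)]
```

Neither query filters on the partition key, which is why plain CQL could not serve them against the `event` table and a query engine such as Spark SQL is layered on top.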
  • 36. Caveats: Latency. A Spark query may require an expensive table scan, which reads every row, and disk I/O is slow.
  • 37. Reduce the Impact of Slow Disk I/O: Cache tables; replace HDD with SSD; add more nodes.
  • 38. Recommendations: For known queries requiring sub-second response time, query C* directly, create query-specific tables, and pre-aggregate data. For ad hoc queries, use Spark.
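The recommendation to pre-aggregate data for known queries can be sketched as follows: counts are accumulated once at ingest time, so a later "how many errors did this system log on this day" query becomes a single key lookup instead of a scan. The `pre_aggregate` function, the key layout, and the sample events are hypothetical; in practice the counts would be written to a dedicated C* table rather than held in a Python dictionary.

```python
from collections import Counter


def pre_aggregate(events):
    """Count events per (sys_id, dt, severity) in one pass over the input."""
    counts = Counter()
    for e in events:
        counts[(e["sys_id"], e["dt"], e["severity"])] += 1
    return counts


# Invented sample events for illustration.
events = [
    {"sys_id": "s1", "dt": "2015-09-01", "severity": "ERROR"},
    {"sys_id": "s1", "dt": "2015-09-01", "severity": "ERROR"},
    {"sys_id": "s1", "dt": "2015-09-01", "severity": "INFO"},
    {"sys_id": "s2", "dt": "2015-09-01", "severity": "ERROR"},
]

severity_counts = pre_aggregate(events)

# A known query is now a direct lookup on the pre-computed key.
print(severity_counts[("s1", "2015-09-01", "ERROR")])  # 2
```

The trade-off is the one the slides describe: this only helps for queries known in advance, while anything else still goes through Spark.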