USING A FAST OPERATIONAL
DATABASE TO BUILD REAL-TIME
STREAMING AGGREGATIONS
© 2016 VoltDB PROPRIETARY
•  It’s a data-intensive world
•  Your business is only as fast, and as competitive, as your database
The Trillion Device World
UC Berkeley Professor Vincentelli,
Computerworld, September 2015
THE DATA-FICATION OF LIFE
Big Data
“Perishable insights can have exponentially more value than
after-the-fact traditional historical analytics.”
Mike Gualtieri, Principal Analyst, Forrester Research
Fast Data
DATA IS TRANSFORMING BUSINESS
VOLTDB: WE DON’T MAKE APPS, WE MAKE APPS…
Smarter
• Real-time intelligence and context for richer interactions
• Make different decisions on each individual event or person
• Analyze and act on streaming data
Faster
• 100X faster than traditional databases
• World record performance in the cloud (YCSB)
• Millisecond response time
• High-speed data ingestion
Simpler
• Simpler apps, easier to test and maintain
• Easier to program with SQL + Java
• Seamless ecosystem integration
• Data is always consistent and correct, never lost
10 Trillion Device World
100X Traditional DB
100% Consistent, Correct
FAST DATA APPLICATION REQUIREMENTS
Fast Data
-  Rapid Data Ingestion and Transformation
-  Streaming Analytics: Filtering, Windowing, Aggregation, Enrichment, Correlations
-  Operational Interaction/Transactions: Context-aware, Personal, Real-time
+
Big Data
-  Batch/Iterative Analytics: Statistical correlations, Multi-dimensional analysis, Predictive analytics
FAST DATA APPLICATION REQUIREMENTS
Fast Data = Ingest (1) + Analyze (2) + Decide (3) + Export (4)
[Diagram: Rapid Data Ingestion and Transformation, Streaming Analytics, and Operational Interaction/Transactions (Fast Data), plus Batch/Iterative Analytics (Big Data), annotated with steps 1 to 4]
BUILDING FAST DATA APPLICATIONS
1.  Ingest: Unbounded Streams of Data
•  Stream data into an operational store
•  VoltDB has in-process (in database) importers
2.  Analyze: Operational Store processes data
•  Compute real-time analytics
3.  Decide: Make Per-event Decisions
•  Transactions
4.  Export: To historical data store
•  VoltDB has in-process Export connectors
•  Push data downstream to a “data lake”
•  For Historical Analysis/Machine Learning
Biography
-  Technical:
-  Started programming in 1985
-  Developed kernel apps like printer drivers and high
performance networking tools in C
-  MS in Electrical Engineering from the Technical
University in Graz, Austria, in 1995
-  Filed for two patents for improving RDBMS
performance, in 2005 (Symantec Corp) and 2008
(Fox News)
-  Hobbies:
-  Running (Marathons)
-  Photography
-  RC Airplanes
-  Electronics
Agenda
-  Vision
-  Technical requirements
-  System Architecture
-  Why use VoltDB over HBase or Cassandra
-  VoltDB: things to consider when designing
solutions with VoltDB
-  Conclusion
-  Resources
Vision
-  Building a real-time analytics engine for:
    -  Real-time diagnosis of our Edge Servers
    -  MaxCDN-Predict
    -  Elastic provisioning
    -  Improving serving performance
-  Using this data to bill customers
Technical Requirements
-  The system should have the following features:
    -  Horizontally scalable
    -  Real-time: a 15-second SLA from the time content is served until it shows up
       in the aggregates
    -  Zero production support:
        -  Zero-touch crash recovery
        -  No data clean-up/recovery required
    -  Guaranteed no data loss
    -  SQL interface for mining and drill-down
    -  Ad-hoc queries on the non-aggregated raw data
MaxCDN’s Lambda Architecture
System Architecture
-  When Nginx serves content, it logs the transaction.
-  These logs are streamed into the aggregation farm from around the world. We
get ~32 TB of logs per day. This data is pushed into 4 RabbitMQ queues.
-  A farm of 4 machines cleans up and pre-aggregates this data. The machines create
batches of 70K raw-data rows along with the corresponding aggregates and push each
batch into a RabbitMQ queue.
-  The VoltDB cluster runs with:
    -  7 machines with k-factor=0
    -  Synchronous logging mode for “no data loss”
    -  48 SitesPerHost, for a total of 7*48 = 336 partitions
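The pre-aggregation step described above can be pictured with a small sketch. This is not MaxCDN's code: the field names (`zone`, `ts`, `bytes`), the 15-second bucket width, and the batch layout are assumptions chosen for illustration. Only the idea of shipping raw rows together with their aggregates comes from the slides.

```python
from collections import defaultdict

BUCKET_SECONDS = 15  # assumed bucket width; the slides do not state one


def pre_aggregate(raw_rows, bucket_seconds=BUCKET_SECONDS):
    """Group raw log rows by (zone, time bucket) and sum hits and bytes."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for row in raw_rows:
        # Align the timestamp to the start of its bucket.
        bucket_start = row["ts"] - row["ts"] % bucket_seconds
        key = (row["zone"], bucket_start)
        totals[key]["hits"] += 1
        totals[key]["bytes"] += row["bytes"]
    return dict(totals)


def build_batch(batch_id, raw_rows):
    """Bundle raw rows with their aggregates as one message for the queue."""
    return {
        "batch_id": batch_id,
        "raw": raw_rows,
        "aggregates": pre_aggregate(raw_rows),
    }


# Hypothetical log rows (timestamps in seconds).
rows = [
    {"zone": "a", "ts": 100, "bytes": 500},
    {"zone": "a", "ts": 104, "bytes": 700},
    {"zone": "b", "ts": 107, "bytes": 300},
    {"zone": "a", "ts": 121, "bytes": 50},
]
batch = build_batch(1, rows)
```

Carrying the batch id inside the message is what later lets the database side recognize a replayed batch.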
System Architecture
-  VoltDB clients read these batches from RabbitMQ and push the data into a VoltDB
cluster of 7 machines. They use VoltDB’s “hashinator” to group each batch so that an
array of data goes into only “one procedure call per Table per Partition”. These clients
guarantee batch-level atomic processing across 1680 (= 5*336) VoltDB stored
procedure calls.
-  Tables are maintained in a ring-buffer fashion.
    -  We can keep only the most recent ~30 minutes of raw data.
-  With respect to the “no data loss” guarantee, the system behaves like a
distributed transactional RDBMS.
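The "one procedure call per table per partition" grouping can be sketched as follows. This is a minimal illustration, not VoltDB's hashinator: `zlib.crc32` is a stand-in hash chosen only because it is deterministic, and the table names and the `zone_id` key are hypothetical. A real client would rely on VoltDB's own partitioning logic.

```python
import zlib
from collections import defaultdict

HOSTS = 7
SITES_PER_HOST = 48
PARTITIONS = HOSTS * SITES_PER_HOST  # 336 with k-factor = 0


def partition_for(key, partitions=PARTITIONS):
    """Stand-in for the hashinator: map a partitioning key to a partition id."""
    return zlib.crc32(str(key).encode("utf-8")) % partitions


def group_by_partition(rows, key_field, partitions=PARTITIONS):
    """Split rows into one list per target partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_for(row[key_field], partitions)].append(row)
    return dict(groups)


def plan_calls(tables, key_field):
    """Return one (table, partition, rows) call per table per partition hit."""
    calls = []
    for table, rows in tables.items():
        grouped = group_by_partition(rows, key_field)
        for pid, part_rows in sorted(grouped.items()):
            calls.append((table, pid, part_rows))
    return calls


# Hypothetical tables; the slides imply 5 tables (5 * 336 = 1680 calls at most).
tables = {
    name: [{"zone_id": i} for i in range(1000)]
    for name in ("rawlogs", "agg_a")
}
calls = plan_calls(tables, "zone_id")
```

Each planned call carries an array of rows for exactly one partition, so the number of calls per batch is bounded by tables times partitions, regardless of how many rows the batch holds.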
System Architecture
-  Zero-touch crash recovery:
    -  When VoltDB crashes:
        -  Clients go into pause mode
        -  Supervisord starts up the VoltDB cluster in recovery mode
    -  When VoltDB clients or other components crash:
        -  VoltDB clients and all the other critical components run under Supervisord, so they
           get restarted automatically
-  Completely transactional processing through:
    -  VoltDB’s atomic processing at the stored procedure level
    -  RabbitMQ’s replay guarantee
    -  Idempotency
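How atomic writes, replay, and idempotency combine can be shown with a toy model. The class below is an in-memory stand-in for a single partition, not VoltDB code. Its `tx` set plays the role of a table of applied batch ids, and in the real system the check and the writes would run inside one stored procedure so that they commit or roll back together.

```python
class PartitionStore:
    """In-memory stand-in for one partition: a raw-log table plus a TX table."""

    def __init__(self):
        self.rawlogs = []
        self.tx = set()  # batch ids that have already been applied

    def apply_batch(self, batch_id, rows):
        """Apply a batch once; a replayed batch id is a no-op."""
        if batch_id in self.tx:
            return False  # already applied, so the replay changes nothing
        self.rawlogs.extend(rows)
        self.tx.add(batch_id)
        return True


store = PartitionStore()
rows = [{"url": "/a"}, {"url": "/b"}]

first = store.apply_batch(42, rows)   # initial delivery
replay = store.apply_batch(42, rows)  # same message redelivered after a crash
```

Because a redelivered batch is detected by its id, the queue is free to replay messages after a failure without producing duplicate rows.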
Why use VoltDB over HBase or Cassandra
-  Simply because of “multi-row WRITE atomicity”.
-  Multi-row WRITE atomicity results in much less CPU and I/O load as well as an easier
implementation.
-  To make this clear, consider our use case of pushing our 70K batches of raw
logs into a storage system:
    -  VoltDB:
        -  With VoltDB, we get stored-procedure-level atomicity. The current implementation pushes 70,000
           rows into 336 partitions, so each stored-procedure call writes 70,000/336 = ~208 rows into the
           rawlogs table. For these 208 rows, we add one row to the TX table with the batch-id of this
           batch.
Advantage of Multi-Write Atomicity
Why use VoltDB over HBase or Cassandra
-  HBase:
    -  HBase offers only single-row atomicity. Assume we again have 336 partitions. With HBase,
       we have to include the batch-id in each row, so we write the batch-id 208 times
       instead of once. When we apply the batch, we have to go through 208 “IF statements”, one
       per row, and apply each row only if needed. This means considerably more CPU, I/O, and space
       requirements.
    -  If the batch size grows from 70K to 140K, these 208 WRITEs and “IF statements” also grow
       to about 416.
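The arithmetic behind this comparison can be checked with a short sketch. The function and dictionary keys are illustrative names, not part of any API. Floor division is used so the results match the rounded figures on the slides.

```python
def writes_per_partition(batch_rows, partitions):
    """Compare batch-id bookkeeping per partition for the two atomicity models."""
    rows = batch_rows // partitions  # rows handled by one call or partition
    return {
        "rows_per_call": rows,
        # Multi-row atomicity: one TX row records the batch id for all rows.
        "batch_id_writes_multi_row": 1,
        # Single-row atomicity: every row carries the batch id and needs a check.
        "batch_id_writes_single_row": rows,
        "apply_checks_single_row": rows,
    }


small = writes_per_partition(70_000, 336)
large = writes_per_partition(140_000, 336)
```

The multi-row figure stays at 1 no matter how large the batch becomes, while the single-row figures scale with the batch size.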
VoltDB, Things to Consider when Designing Solutions
-  Good things:
    -  SQL interface, unlike Trident or Spark Streaming
    -  Merges the good things of the old world, like SQL and transactions, with the
       good things of the new world, like ‘no-locks’, ‘k-factor’ HA, etc.
    -  Very simple and intuitive API and usage
    -  k-factor + logs + snapshots eliminate the need to back up the system
    -  Fast query performance
    -  Horizontal scalability
VoltDB, Things to Consider when Designing Solutions
-  Each partition has only one thread of execution for INSERT/UPDATE.
    -  Workarounds:
        -  Get faster CPUs
        -  Pre-process the data outside VoltDB
-  The maximum amount of data coming out of a partition is limited to 50 MB.
    -  Workarounds:
        -  Make sure no relevant query has a qualifying result set larger than 50 MB for any
           partition
        -  The more partitions, the better
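A rough way to reason about the 50 MB limit is to estimate how much data one query pulls from each partition. The sketch below is a back-of-the-envelope model, not a VoltDB API. It assumes matching rows spread evenly across partitions, takes 50 MB to mean 50 * 1024 * 1024 bytes, and uses hypothetical row counts and sizes.

```python
PARTITION_RESULT_LIMIT_BYTES = 50 * 1024 * 1024  # 50 MB; binary MB is an assumption


def per_partition_result_bytes(matching_rows, avg_row_bytes, partitions):
    """Estimate bytes returned per partition, assuming an even spread of rows."""
    rows_per_partition = -(-matching_rows // partitions)  # ceiling division
    return rows_per_partition * avg_row_bytes


def fits_limit(matching_rows, avg_row_bytes, partitions,
               limit=PARTITION_RESULT_LIMIT_BYTES):
    """True if the estimated per-partition result stays within the limit."""
    estimate = per_partition_result_bytes(matching_rows, avg_row_bytes, partitions)
    return estimate <= limit


# Hypothetical query: 100 million matching rows of about 200 bytes each.
with_336 = fits_limit(100_000_000, 200, 336)
with_672 = fits_limit(100_000_000, 200, 672)
```

In this example the query exceeds the limit on 336 partitions but fits on 672, which is the sense in which more partitions help.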
Conclusion
-  VoltDB merges the good things of the old world and the new world.
-  It provides an easy and scalable solution for real-time streaming aggregation.
-  Like any other tool, it has some limitations that need to be taken into account when
designing a solution.
Resources
-  VoltDB Docs: https://docs.voltdb.com/
-  Lambda Architecture:
https://voltdb.com/blog/simplifying-complex-lambda-architecture
-  Lambda Architecture: http://lambda-architecture.net/
-  Storm/Trident: http://storm.apache.org/documentation/Trident-tutorial.html
-  Spark Streaming: http://spark.apache.org/streaming/
I am available by email: bpirvali@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Why you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentWhy you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentVoltDB
 
The Expert Guide to Fast Data
The Expert Guide to Fast Data The Expert Guide to Fast Data
The Expert Guide to Fast Data VoltDB
 
Mike Stonebraker on Designing An Architecture For Real-time Event Processing
Mike Stonebraker on Designing An Architecture For Real-time Event ProcessingMike Stonebraker on Designing An Architecture For Real-time Event Processing
Mike Stonebraker on Designing An Architecture For Real-time Event ProcessingVoltDB
 
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...VoltDB
 
Acting on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsActing on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsVoltDB
 
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBReal-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBVoltDB
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersVoltDB
 
Memory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business InnovationMemory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business InnovationVoltDB
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB
 
How to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBHow to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBVoltDB
 
The State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleThe State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleVoltDB
 
Transforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case ExamplesTransforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case ExamplesVoltDB
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataRob Winters
 
Billions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right NowBillions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right NowRob Winters
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesRob Winters
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
 
Securing and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industrySecuring and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industryDataWorks Summit
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil GamesRob Winters
 

Was ist angesagt? (20)

Why you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentWhy you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise Environment
 
The Expert Guide to Fast Data
The Expert Guide to Fast Data The Expert Guide to Fast Data
The Expert Guide to Fast Data
 
Mike Stonebraker on Designing An Architecture For Real-time Event Processing
Mike Stonebraker on Designing An Architecture For Real-time Event ProcessingMike Stonebraker on Designing An Architecture For Real-time Event Processing
Mike Stonebraker on Designing An Architecture For Real-time Event Processing
 
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
 
Acting on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won TransactionsActing on Real-time Behavior: How Peak Games Won Transactions
Acting on Real-time Behavior: How Peak Games Won Transactions
 
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBReal-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top Contenders
 
Memory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business InnovationMemory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business Innovation
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
How to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBHow to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDB
 
The State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleThe State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and Scale
 
Instrumenting your Instruments
Instrumenting your Instruments Instrumenting your Instruments
Instrumenting your Instruments
 
Transforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case ExamplesTransforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case Examples
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big Data
 
Billions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right NowBillions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right Now
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
 
Securing and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industrySecuring and governing a multi-tenant data lake within the financial industry
Securing and governing a multi-tenant data lake within the financial industry
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
 

Andere mochten auch

How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersAkmal Chaudhri
 
VoltDB : A Technical Overview
VoltDB : A Technical OverviewVoltDB : A Technical Overview
VoltDB : A Technical OverviewTim Callaghan
 
Lessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for PersonalizationLessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for PersonalizationVoltDB
 
Moving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataMoving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataVoltDB
 
Understanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast DataUnderstanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast DataVoltDB
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
 
Understanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoTUnderstanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoTVoltDB
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 

Andere mochten auch (8)

How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contenders
 
VoltDB : A Technical Overview
VoltDB : A Technical OverviewVoltDB : A Technical Overview
VoltDB : A Technical Overview
 
Lessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for PersonalizationLessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for Personalization
 
Moving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataMoving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time Data
 
Understanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast DataUnderstanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast Data
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
Understanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoTUnderstanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoT
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 

Ähnlich wie Using a Fast Operational Database to Build Real-time Streaming Aggregations

AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with storesYoni Farin
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveTorsten Steinbach
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)Chinmay Kulkarni
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...DataStax Academy
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeongYousun Jeong
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike, Inc.
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsYousun Jeong
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
Sqream DB on OpenPOWER performance
Sqream DB on OpenPOWER performanceSqream DB on OpenPOWER performance
Sqream DB on OpenPOWER performanceGanesan Narayanasamy
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...Insight Technology, Inc.
 

Ähnlich wie Using a Fast Operational Database to Build Real-time Streaming Aggregations (20)

AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with stores
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory Architecture
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Sqream DB on OpenPOWER performance
Sqream DB on OpenPOWER performanceSqream DB on OpenPOWER performance
Sqream DB on OpenPOWER performance
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
 

Using a Fast Operational Database to Build Real-time Streaming Aggregations

  • 1. USING A FAST OPERATIONAL DATABASE TO BUILD REAL-TIME STREAMING AGGREGATIONS
  • 2. THE DATA-FICATION OF LIFE (© 2016 VoltDB PROPRIETARY)
      - The Trillion Device World (UC Berkeley Professor Vincentelli, Computerworld, September 2015)
      - It’s a data-intensive world
      - Your business is only as fast and as competitive as your database
  • 3. DATA IS TRANSFORMING BUSINESS
      - Big Data, Fast Data
      - “Perishable insights can have exponentially more value than after-the-fact traditional historical analytics.” Mike Gualtieri, Principal Analyst, Forrester Research
  • 4. VOLTDB: WE DON’T MAKE APPS, WE MAKE APPS…
      - Real-time intelligence and context for richer interactions
      - Make different decisions on each individual event or person
      - Analyze and act on streaming data
      - 100X faster than traditional databases
      - World record performance in the cloud (YCSB)
      - Millisecond response time
      - High-speed data ingestion
      - Simpler apps, easier to test and maintain
      - Easier to program with SQL + Java
      - Seamless ecosystem integration
      - Data is always consistent and correct, never lost
      - Smarter, Faster, Simpler
      - 10 Trillion Device World; 100X Traditional DB; 100% Consistent, Correct
  • 5. FAST DATA APPLICATION REQUIREMENTS
      - Fast Data:
        - Rapid Data Ingestion and Transformation
        - Streaming Analytics: Filtering, Windowing, Aggregation, Enrichment, Correlations
        - Operational Interaction/Transactions: Context-aware, Personal, Real-time
      - Big Data:
        - Batch/Iterative Analytics: Statistical correlations, Multi-dimensional analysis, Predictive analytics
  • 6. FAST DATA APPLICATION REQUIREMENTS
      - Fast Data = 1 (Ingest) + 2 (Analyze) + 3 (Decide), + 4 (Export) to Big Data
      - Rapid Data Ingestion and Transformation
      - Streaming Analytics: Filtering, Windowing, Aggregation, Enrichment, Correlations
      - Operational Interaction/Transactions: Context-aware, Personal, Real-time
      - Batch/Iterative Analytics: Statistical correlations, Multi-dimensional analysis, Predictive analytics
  • 7. BUILDING FAST DATA APPLICATIONS
      1. Ingest: Unbounded streams of data
         - Stream data into an operational store
         - VoltDB has in-process (in-database) importers
      2. Analyze: Operational store processes data
         - Compute real-time analytics
      3. Decide: Make per-event decisions
         - Transactions
      4. Export: To historical data store
         - VoltDB has in-process export connectors
         - Push data downstream to a “data lake”
         - For historical analysis/machine learning
  • 8. FAST DATA APPLICATION REQUIREMENTS
      - Rapid Data Ingestion and Transformation
      - Streaming Analytics: Filtering, Windowing, Aggregation, Enrichment, Correlations
      - Operational Interaction/Transactions: Context-aware, Personal, Real-time
      - Big Data: Batch/Iterative Analytics: Statistical correlations, Multi-dimensional analysis, Predictive analytics
  • 9. Biography
      - Technical:
        - Started programming in 1985
        - Developed kernel apps like printer drivers and high-performance networking tools in C
        - MS in Electrical Engineering from the Technical University in Graz, Austria, in 1995
        - Filed for two patents for improving RDBMS performance in 2005 (Symantec Corp) and 2008 (Fox News)
      - Hobbies:
        - Running (marathons)
        - Photography
        - RC airplanes
        - Electronics
  • 10. Agenda
      - Vision
      - Technical requirements
      - System architecture
      - Why use VoltDB over HBase or Cassandra
      - VoltDB: things to consider when designing solutions with VoltDB
      - Conclusion
      - Resources
  • 11. Vision
      - Building a real-time analytic engine for:
        - Real-time diagnosis of our Edge Servers
        - MaxCDN-Predict
        - Elastic provisioning
        - Improving serving performance
        - Using this data to bill customers
  • 12. Technical Requirements
      - The system should have the following features:
        - Horizontally scalable
        - Real-time (15-second SLA) from the time content is served until it shows up in the aggregates
        - Zero production support:
          - Zero-touch crash recovery
          - No data clean-up/recovery required
        - Guaranteed no data loss
        - SQL interface for mining and drill-down
        - Ad-hoc queries on the non-aggregated raw data
  • 14. System Architecture
      - When Nginx serves content, it logs the transaction.
      - These logs are streamed into the aggregation farm from around the world. We get ~32 TB of logs per day. This data gets pushed into 4 RabbitMQ queues.
      - A farm of 4 machines cleans up and pre-aggregates this data. The machines create batches of 70K raw-data rows along with the corresponding aggregates and push each batch into a RabbitMQ queue.
      - The VoltDB cluster runs with:
        - 7 machines in k-factor=0
        - Sync logging mode for “no data loss”
        - 48 SitesPerHost, so a total of 7*48 = 336 partitions
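
The sizing figures on this slide can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal terabytes (1 TB = 10**12 bytes) and an even load over the day, neither of which the slides state, and the rate applies to the raw logs entering the aggregation farm, not to the pre-aggregated batches that reach VoltDB.

```python
# Figures taken from the slide; the unit convention is an assumption.
HOSTS = 7
SITES_PER_HOST = 48
LOG_BYTES_PER_DAY = 32 * 10**12   # ~32 TB/day, assuming 1 TB = 10**12 bytes
SECONDS_PER_DAY = 86_400

# Total number of VoltDB partitions in the cluster.
partitions = HOSTS * SITES_PER_HOST

# Average raw-log rate into the aggregation farm, assuming an even load.
avg_mb_per_sec = LOG_BYTES_PER_DAY / SECONDS_PER_DAY / 10**6

print(partitions)             # 336
print(round(avg_mb_per_sec))  # 370
```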
  • 15. System Architecture
      - VoltDB clients read these batches from RabbitMQ and push the data into the 7-machine VoltDB cluster.
      - They use VoltDB’s “hashinator” to group each array of data so that there is only “one procedure call per Table per Partition”.
      - These clients guarantee batch-level atomic processing across 1680 (= 5*336) VoltDB stored procedure calls.
      - Tables are maintained in a ring-buffer fashion.
      - We can only keep ~30 min of the most recent raw data.
      - In terms of the “no data loss” guarantee, the system behaves exactly like a distributed transactional RDBMS.
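
The “one procedure call per Table per Partition” idea can be sketched as a client-side bucketing step. This is a minimal illustration, not the deck’s client code: `partition_for` is a hypothetical stand-in for VoltDB’s hashinator (a plain CRC32 modulo the partition count), and the `zone` column is made up for the example.

```python
from collections import defaultdict
import zlib

PARTITIONS = 336  # 7 hosts * 48 sites per host, from the slides


def partition_for(key: str, partitions: int = PARTITIONS) -> int:
    """Hypothetical stand-in for the hashinator: stable hash modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def group_by_partition(rows, key_of):
    """Bucket rows so that each partition receives one array-valued call."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[partition_for(key_of(row))].append(row)
    return buckets


# A 70K-row batch keyed by a made-up "zone" column.
batch = [{"zone": f"zone-{i}", "bytes": 100} for i in range(70_000)]
calls = group_by_partition(batch, key_of=lambda r: r["zone"])

# At most one call per partition for this table, covering every row once.
print(len(calls) <= PARTITIONS)
print(sum(len(rows) for rows in calls.values()))
```

With five such calls per partition (presumably one per table, which the slides imply but do not state), a batch fans out into at most 5*336 = 1680 stored procedure calls.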
  • 16. System Architecture
      - Zero-touch crash recovery:
        - When VoltDB crashes:
          - Clients go into pause mode
          - Supervisord starts up the VoltDB cluster in recovery mode
        - When VoltDB clients or other components crash:
          - VoltDB clients and all other critical components run under Supervisord, so they are restarted automatically
      - Completely transactional processing by utilizing:
        - VoltDB’s atomic processing at the stored procedure level
        - RabbitMQ replay guarantee
        - Idempotency
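
How replay plus idempotency adds up to exactly-once application can be sketched with an in-memory model. The slides do not show the actual mechanism; the sketch below assumes it is a batch-id check against a table of already-applied batches, and `PartitionStore` is a hypothetical class, not a VoltDB API.

```python
class PartitionStore:
    """In-memory stand-in for one partition: raw rows plus a set of applied batch ids."""

    def __init__(self):
        self.rawlogs = []
        self.applied_batches = set()

    def apply_batch(self, batch_id, rows):
        """Apply rows at most once per batch id; return True if rows were written."""
        if batch_id in self.applied_batches:
            return False  # replayed delivery: already applied, nothing to do
        # In VoltDB both writes would live in one stored procedure and commit
        # atomically; this model simply performs them together.
        self.rawlogs.extend(rows)
        self.applied_batches.add(batch_id)
        return True


store = PartitionStore()
first = store.apply_batch("batch-42", ["row-a", "row-b"])
replay = store.apply_batch("batch-42", ["row-a", "row-b"])  # queue redelivers after a crash

print(first, replay, len(store.rawlogs))  # True False 2
```

Because a redelivered batch is a no-op, the queue is free to replay anything that was not acknowledged before a crash.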
  • 17. Why use VoltDB over HBase or Cassandra
      - Simply because of “multi-row WRITE atomicity”.
      - Multi-row WRITE atomicity results in much less CPU and I/O load as well as an easier implementation.
      - To make this clear, consider our use case of pushing batches of 70K raw-log rows into a storage system:
        - VoltDB:
          - VoltDB gives us atomicity at the stored-procedure level. The current implementation pushes 70,000 rows into 336 partitions, so each stored-procedure call writes 70,000/336 = ~208 rows into the rawlogs table. For these 208 rows, we add one row to the TX table with the batch-id of this batch.
  • 19. Why use VoltDB over HBase or Cassandra
      - HBase:
        - HBase only offers single-row atomicity. Assume we again have 336 partitions. With HBase, we have to include the batch-id in each row, so the batch-id is written 208 times instead of once. When we apply the batch, we have to go through an “IF statement” for each of the 208 rows and apply the row only if needed. This means considerably more CPU, I/O, and space.
        - If the batch size grows from 70K to 140K, these 208 WRITEs and “IF statements” also grow to 416.
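
The per-partition bookkeeping cost in the two designs can be put side by side. This is a sketch of the slides’ own arithmetic only: the function and its field names are illustrative, and rows per partition are rounded down as on the slides.

```python
def bookkeeping_per_partition(batch_size: int, partitions: int = 336) -> dict:
    """Count batch-id writes and per-row checks for one partition of one batch."""
    rows = batch_size // partitions  # rounded down, matching the slides' ~208
    return {
        "rows": rows,
        "voltdb_batch_id_writes": 1,       # one TX-table row per stored-procedure call
        "hbase_batch_id_writes": rows,     # batch-id repeated on every row
        "hbase_conditional_checks": rows,  # one "IF" per row when applying the batch
    }


small = bookkeeping_per_partition(70_000)
large = bookkeeping_per_partition(140_000)

print(small["rows"], large["rows"])  # 208 416
```

Doubling the batch size doubles the HBase-side bookkeeping, while the VoltDB-side batch-id write stays at one per call.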
  • 20. VoltDB, Things to Consider when Designing Solutions
      - Good things:
        - SQL interface, unlike Trident or Spark Streaming
        - Merges the good things of the old world, like SQL and transactions, with the good things of the new world, like “no locks”, “k-factor” HA, etc.
        - Very simple and intuitive API and usage
        - k-factor + logs + snapshots eliminate the need to back up the system
        - Fast query performance
        - Horizontal scalability
  • 21. VoltDB, Things to Consider when Designing Solutions
      - Each partition has only one thread of execution for INSERT/UPDATE.
        - Workarounds:
          - Get faster CPUs
          - Pre-process the data outside VoltDB
      - The maximum data coming out of a partition is limited to 50 MB.
        - Workarounds:
          - Make sure no relevant query has a qualifying row set larger than 50 MB on any partition
          - The more partitions, the better
  • 22. Conclusion
      - VoltDB merges the good things of the old world and the new world.
      - It provides an easy and scalable solution for real-time streaming aggregation.
      - Like any other tool, it has some limitations that need to be taken into account when it is used as part of a solution.
  • 23. Resources
      - VoltDB Docs: https://docs.voltdb.com/
      - Lambda Architecture: https://voltdb.com/blog/simplifying-complex-lambda-architecture
      - Lambda Architecture: http://lambda-architecture.net/
      - Storm/Trident: http://storm.apache.org/documentation/Trident-tutorial.html
      - Spark Streaming: http://spark.apache.org/streaming/
      - I am available by email: bpirvali@gmail.com