HOW TO BUILD STREAMING DATA
APPLICATIONS: EVALUATING THE TOP
CONTENDERS
Akmal B. Chaudhri
about.me/akmalchaudhri
© 2015 VoltDB PROPRIETARY
INTRODUCTION
VOLTDB OVERVIEW
Founded in 2009 by database luminary Mike Stonebraker
FAST
World Record Cloud Benchmark:
YCSB (Yahoo Cloud Serving Benchmark) - 2.4 million tps (transactions per second)
Other Stonebraker Companies
Customers
Technology
•  In-Memory (but data is durable to disk)
•  Scale-Out shared-nothing architecture
•  Reliability and fault tolerance
•  SQL + Java with ACID
•  Hadoop and data warehouse integration
•  Open source and commercially licensed (24X7)
VOLTDB BENCHMARK ON AMAZON VIRTUAL AND
IBM SOFTLAYER BARE-METAL SERVERS
•  Yahoo Cloud Serving Benchmark (YCSB) is
a popular industry-standard benchmark for
cloud databases
•  AWS – virtualized servers
•  SoftLayer - bare-metal servers
•  Workload “B” - 95% reads with 5% updates.
•  Results: Best in class cloud performance
(run in the cloud)!
•  AWS - 285k tps for 3 nodes scaling linearly to
724k tps for a 12 node cluster
•  IBM SoftLayer - 1.02 million tps for 3 nodes
scaling linearly to 2.4 million tps for a 12 node
cluster
[Charts: SoftLayer and AWS results; SoftLayer update and read latency, Latency (ms) vs. Throughput (ops/sec)]
PREDICTION
All businesses will compete
on their ability to make
decisions “in the moment”
using Fast Data.
FAST DATA SOURCES AND DRIVERS
Mobile
IoT
Social
Sensors
Logs
Data is doubling every two years
•  26 billion connected devices by
2020 (Gartner 2014)
•  37% of most data will be
processed at the edge in
milliseconds (Cisco IoT Study 12/11/14)
Mobile
Billing and rights management, subscriber marketing, etc.
IoT, Energy, Sensor
Smart grid/meters, asset tracking & management
Personalized Targeting
Ad optimization, audience segmenting
Capital Markets
Risk, market data management, customer mgt
Infrastructure
Data pipeline, system performance, streaming ETL
EVERY COMPANY HAS FAST DATA PROBLEMS
UK Smart
Meter
VoltDB Customers
FAST DATA IS A COMPETITIVE ADVANTAGE TODAY!
Instant insight
Instant action
Instant awareness
* VoltDB customers
“Event triggered, real-time
recommendations based on
customer behavior have 10-15
times the response rates than
mass marketing”
“We get competitive advantage
by analyzing device and user
data to create an interactive
and personalized consumer
experience across all devices.”
“Real time contextual offers
increase offer uptake rates by
75% and data revenues by
15%.”
TRADITIONAL RDBMS
•  Heavy Overhead
•  1000s of concurrent versions
•  Contention for locked records
•  Contention for latching on lock table
•  Index bottlenecks
•  Disk I/O bottlenecks
•  Architecture limits scaling
ARCHITECTURE IS IMPORTANT
Fast data requires
a different
architecture.
BIG DATA + FAST DATA
Collect
Explore (Data Science)
Analyze
Act (Discoveries/Optimizations)
Big data
ecosystem has
several
components
DATA ARCHITECTURE FOR FAST + BIG DATA
Enterprise Apps
ETL
CRM ERP Etc.
Data Lake
(HDFS, etc.)
BIG DATA
SQL on
Hadoop
Map
Reduce
Exploratory
Analytics
BI
Reporting
Fast Operational
Database
FAST DATA
Export
Ingest /
Interactive
Real-time
Analytics
Fast Serve
Analytics
Decisioning
Calculations Serving of Results
Real Time, Per Event, Interactive
VOLTDB AND FAST DATA PIPELINE
IN THE BIG CORNER
Systems facilitating exploration and analytics of large collections.
Example Technologies
Columnar OLAP warehouses
Hadoop Ecosystem
•  MapReduce
•  Hive, Pig
•  SQL.next: Impala, Drill, Shark
Example Applications
•  User segmentation & pre-scoring
•  Seasonal trending
•  Recommendation matrices
•  Building search indexes
•  Data Science: statistical clustering,
machine learning
IN THE FAST CORNER
Systems facilitating real time ingest, analytics and decisions against
incoming streams of events.
Example Technologies
•  Streaming frameworks (e.g. Spark)
•  Fast OLAP (e.g. HANA)
•  Fast OLTP (e.g. VoltDB)
Example Applications
•  Micro-personalization
•  Recommendation serving
•  Alerting/alarming
•  Operational monitoring
•  Data enrichment (ETL elimination)
•  High throughput authorization
•  Ex: API quota enforcement
TYPICAL FAST DATA QUESTIONS
Hadoop
Volume
SQL / OLAP
Data Science
Fast
Velocity
•  Is the fast layer streaming?
•  It is often more like fast OLTP
•  How do the pieces communicate?
•  OLAP analytics from Big -> Fast
•  New events from Fast -> Big
•  Where do “analytics” belong?
•  Analytics per-event: with Fast
•  Analytics across history: with Big
•  Are streaming frameworks equivalent?
•  Traditional SQL CEP (Esper, Streambase)
•  Tuple DAGs (Storm)
•  Window processors on Hadoop (Spark)
HOW TO SOLVE IT*
* With admiring credit to G. Polya
Considering Data / Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
Data | Temporality | Examples
Incoming events | Event Stream | Click stream, tick stream, sensors, metrics
Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, time-series rollups
Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data
OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations
Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
SOURCES OF STATE
1.  Analytics outputs must be query-able.
2.  “Lookup tables” to create groupings for analytics
and to supply enrichment data.
3.  Session management: grouping, filtering and
aggregating create intermediate state.
DATA FLOWS
Real-time Analytics
•  Streaming summaries for operations
•  KPI measurement
•  Analytics for apps
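The streaming summaries and KPI measurements listed above can be illustrated with a small in-memory rollup. This is only a sketch under stated assumptions: the class name `MinuteRollup`, the one-minute bucket size, and the sample meter readings are hypothetical and do not come from the slides or from any VoltDB API.

```python
from collections import defaultdict


class MinuteRollup:
    """Keeps per-key event counts and value sums, bucketed by minute (hypothetical helper)."""

    def __init__(self):
        # (key, minute_start_epoch) -> [count, total]
        self._buckets = defaultdict(lambda: [0, 0.0])

    def ingest(self, key, epoch_seconds, value):
        # Truncate the timestamp to the start of its minute.
        minute = int(epoch_seconds) // 60 * 60
        bucket = self._buckets[(key, minute)]
        bucket[0] += 1
        bucket[1] += value

    def summary(self, key, minute):
        # Read without creating an empty bucket for unseen keys.
        count, total = self._buckets.get((key, minute), (0, 0.0))
        mean = total / count if count else None
        return {"count": count, "sum": total, "mean": mean}


rollup = MinuteRollup()
for ts, reading in [(0, 10.0), (30, 20.0), (61, 5.0)]:
    rollup.ingest("meter-1", ts, reading)

print(rollup.summary("meter-1", 0))
print(rollup.summary("meter-1", 60))
```

Each incoming event updates its bucket once, so the summary is always current and can be queried at any time without rescanning the raw events.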
DATA FLOWS
Fast Request/Response (and side effects)
•  Mobile Authorization
•  Campaign Evaluation
•  Quota Enforcement
•  Micro-Personalization
•  Recommendation Serving
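As one illustration of the request/response pattern, quota enforcement can be sketched as a fixed-window counter that answers each request synchronously. The names `QuotaEnforcer` and `authorize`, the fixed-window policy, and the limits used are assumptions made for this example, not something the slides specify. A production system would also need the check and the update to run atomically.

```python
class QuotaEnforcer:
    """Fixed-window API quota check (illustrative; names and policy are assumptions)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._usage = {}  # api_key -> (window_start, calls_in_window)

    def authorize(self, api_key, now):
        # Align the current time to the start of its window.
        window_start = int(now) // self.window_seconds * self.window_seconds
        start, used = self._usage.get(api_key, (window_start, 0))
        if start != window_start:
            used = 0  # a new window began, so the count resets
        if used >= self.limit:
            return False  # synchronous response: deny
        self._usage[api_key] = (window_start, used + 1)
        return True  # synchronous response: allow


enforcer = QuotaEnforcer(limit=2, window_seconds=60)
decisions = [enforcer.authorize("client-a", t) for t in (0, 1, 2, 61)]
print(decisions)  # [True, True, False, True]
```

The third call is denied because the window's quota is used up, and the call at second 61 is allowed again because it falls in the next window.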
Request/
Response
DATA FLOWS
Data Pipelines
•  Data enrichment
•  Sessionization and re-assembly of incoming events.
•  Correlation (by time, location, identity)
•  Filtering
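Sessionization from the list above can be sketched as grouping each user's events into sessions separated by a period of inactivity. This is a minimal batch sketch: the function name `sessionize`, the 30-minute default gap, and the sample events are assumptions for illustration. A streaming implementation would keep the last-seen timestamp per user as state instead of sorting.

```python
def sessionize(events, gap_seconds=1800):
    """Group (user_id, epoch_seconds) events into per-user sessions.

    A new session starts when the gap since the user's previous event
    exceeds gap_seconds. Returns {user_id: [[ts, ...], ...]}.
    """
    sessions = {}
    for user_id, ts in sorted(events, key=lambda e: (e[0], e[1])):
        user_sessions = sessions.setdefault(user_id, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap_seconds:
            user_sessions[-1].append(ts)  # continue the current session
        else:
            user_sessions.append([ts])  # start a new session
    return sessions


events = [("u1", 0), ("u2", 5), ("u1", 100), ("u1", 5000)]
result = sessionize(events)
print(result)
```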
Pipeline
Data Lake
Continuous Query
Transactional Event
Evaluation
Transformation
FAST DATA STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
•  Counters
•  Aggregations
•  Time series
•  Statistics
•  Store results
•  Query and
recombine
•  Fast serving
•  Per-event policy evaluations
•  Responses (synchronous):
authorization, personalization
•  Side-effects (asynchronous): alerts,
alarms
Export & Pipeline
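The stack above can be read as a per-event loop: ingest an event, update the analytics, make a decision, and queue the result for export. The sketch below assumes a hypothetical `FastDataPipeline` class, a simple per-device counter as the analytic, and a threshold rule as the policy. None of these names or rules come from the slides.

```python
from collections import Counter, deque


class FastDataPipeline:
    """Toy ingest -> analyze -> decide -> export loop (all names are hypothetical)."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.counts = Counter()      # Analyze: per-device counters
        self.alerts = deque()        # Decide: asynchronous side effects
        self.export_queue = deque()  # Export: rows bound for downstream systems

    def handle(self, event):
        device = event["device"]
        self.counts[device] += 1  # analyze: update the counter
        allowed = self.counts[device] <= self.alert_threshold  # decide: per-event policy
        if not allowed:
            self.alerts.append(f"{device} exceeded {self.alert_threshold} events")
        self.export_queue.append({**event, "allowed": allowed})  # export: enriched row
        return allowed  # synchronous response to the caller


pipeline = FastDataPipeline(alert_threshold=2)
responses = [pipeline.handle({"device": "d1"}) for _ in range(3)]
print(responses)  # [True, True, False]
```

The return value plays the role of the synchronous response, while the alert queue and export queue stand in for the asynchronous side effects and the downstream pipeline.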
APACHE-ISH TECHNOLOGY STACK
Kafka / RabbitMQ
Storm, Flume, Sqoop
Storm +
Serving Layer
Spark +
Serving Layer
Cassandra,
HBase
Hadoop, Message queues
VOLTDB TECHNOLOGY STACK
Kafka / RabbitMQ
VoltDB
SQL, Java for
Analytics
Transactions /
ACID
Hadoop, Message queues
OLTP
(Transactions First)
Streaming
Event Processors
OLAP
(Columnar Analytics)
STREAM TECHNOLOGY STACK
OLAP TECHNOLOGY STACK
Applications
&
Streams
Logs, Sensors,
Meter Readings,
IoT, Location
Real-Time
Applications
Message Queue
Ingest
Kafka Loader
CSV loaders
C++, C#, PHP, Python
Java (and others)
Export
CSV Data
Thrift Messages
JDBC
HTTP
Local File
Extensible Connectors
SQL
Views
Java
Analyze
ACID
Txns
State
Decide
Downstream
Pipeline
Hadoop
Data Warehouse
Message Queue
STREAMING DATA PIPELINE
FAST DATA PATTERNS
THREE FAST DATA APPLICATION PATTERNS
•  Real-Time Analytics
    •  Real-time analytics for operations
    •  Real-time KPI measurement
    •  Real-time analytics for apps
•  Data Pipelines
    •  Streaming data enrichment
    •  Sessionization / re-assembly
    •  Correlation (by time, by location, by id)
    •  Filtering
    •  Pre-aggregation
•  Fast Request/Response
    •  Mobile Authorization
    •  Campaign Authorization
    •  Fast API Quota Enforcement
    •  Micro-Personalization
    •  Recommendation Serving
VOLTDB: REAL-TIME ANALYTICS
VoltDB
Metadata
(Dimension table)
Session state
(Fact table)
•  Operational analytics and
monitoring
•  RT analytics enabling user-
facing applications
•  KPI for internal BI/Dashboards
•  In-memory MPP SQL over
ODBC/JDBC
•  Cheap + correct materialized
views for streaming
aggregations
SQL, Views
Ingest
VOLTDB: DATA PIPELINES WITH EXPORT
VoltDB
Metadata
(Dimension table)
Session state
(Fact table)
•  Filtering (ex: only RFID /
iBeacon readings that show
change from previous
location).
•  Sessionization
•  Common version re-writing
•  Data enrichment
•  MPP streaming Export
•  Row data, Thrift messages, CSV
•  OLAP, HDFS and message
queues
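The filtering bullet above (keep only RFID / iBeacon readings that show a change from the previous location) can be sketched as a stateful filter that remembers the last location per tag. The function name `changed_locations` and the sample tag and location values are hypothetical.

```python
def changed_locations(readings):
    """Yield only readings whose location differs from that tag's previous reading."""
    last_seen = {}  # tag_id -> last reported location (the filter's state)
    for tag_id, location in readings:
        if tag_id not in last_seen or last_seen[tag_id] != location:
            last_seen[tag_id] = location
            yield tag_id, location


readings = [
    ("t1", "dock"),
    ("t1", "dock"),      # unchanged, filtered out
    ("t1", "aisle-3"),
    ("t2", "dock"),
    ("t1", "aisle-3"),   # unchanged, filtered out
    ("t1", "dock"),
]
filtered = list(changed_locations(readings))
print(filtered)
```

The filter needs one piece of state per tag, which is why this kind of pipeline step depends on a store that can be read and updated per event.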
Export
VOLTDB: REQUEST/RESPONSE DECISIONS
•  Authorization
•  RT balance checks, quota
enforcement
•  Personalization and
Recommendation Serving
•  Combine pre-score with
immediate context
•  Fully ACID transaction model.
•  Thousands to Millions per
second
•  At less than 5ms latencies
Metadata
(Dimension table)
Session state
(Fact table)
ACID Transactions
VOLTDB V5.0
VOLTDB V5.0 – ACCELERATING FAST DATA
APPLICATION DEVELOPMENT
•  Hadoop/Big Data Ecosystem Integrations
•  Fast Data Pipeline Sample Applications
•  Ease of Database Development (traditional API)
•  VoltDB Management Center (VMC)
•  Updated Hortonworks HDP Certification
FAST DATA INTEGRATIONS - IMPORTERS
•  Kafka Loader
•  Subscribe to a Kafka topic and insert each message into a VoltDB
Table
•  JDBC Loader
•  Load a JDBC result set into a VoltDB Table
•  Vertica UDx
•  User-defined function to load Vertica result sets into a VoltDB
Table
•  Apache Hive and Apache Pig
•  Hadoop OutputFormat to load Hive and Pig result sets into VoltDB
FAST DATA INTEGRATIONS - EXPORTERS
•  HDFS Export
•  Hadoop export via WebHDFS and HttpFS
•  HTTP Export
•  Delivery and Alerting via HTTP post/get
•  Kafka Export, RabbitMQ Export
•  Message queue delivery
•  Export format configurable
•  Avro, CSV, TSV, more coming…
FAST DATA PIPELINE SAMPLE APPLICATION
•  Streaming Data, Real-time Analytics
•  Export to Hadoop
•  Export to OLAP (Vertica, others)
•  Place historical decision making intelligence into VoltDB
•  Closed Loop, via Hive, Pig OutputFormat or Vertica UDx
•  Download: https://github.com/VoltDB/app-fastdata
•  And see our blog posts:
http://voltdb.com/blog/fast-data-look-voltdb-sample-app
LAMBDA ARCHITECTURE SAMPLE APPLICATION
•  Type of application: Real-time analytics
•  Demonstrates how to simplify the “Speed
Layer”
•  Using VoltDB, developers can replace both the
streaming and the operational data store portions of
the speed layer.
•  Less code, greatly reduced complexity
•  Improving the Lambda Architecture
•  Perform real-time analytics AND react, per event, to
the incoming data stream
•  Try it yourself: http://voltdb.com/community/applications
HOW MANY UNIQUE
USERS INTERACTED WITH
MY APP TODAY?
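The question above can be answered exactly by keeping one set of user ids per day. The sketch below assumes a hypothetical `DailyUniqueUsers` class and UTC day boundaries. It is not taken from the sample application, whose implementation is not shown in the slides. Exact sets grow with the number of users, so approximate structures such as HyperLogLog are a common alternative when memory is tight.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


class DailyUniqueUsers:
    """Exact per-day distinct user counts (illustrative; UTC day boundaries assumed)."""

    def __init__(self):
        self._users_by_day = defaultdict(set)

    def record(self, user_id, epoch_seconds):
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
        self._users_by_day[day].add(user_id)  # sets ignore repeat visits

    def count(self, day):
        return len(self._users_by_day.get(day, ()))


tracker = DailyUniqueUsers()
tracker.record("alice", 0)
tracker.record("bob", 10)
tracker.record("alice", 20)      # same user, same day: not counted twice
tracker.record("alice", 86400)   # next UTC day
print(tracker.count(date(1970, 1, 1)))  # 2
print(tracker.count(date(1970, 1, 2)))  # 1
```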
VOLTDB MANAGEMENT CENTER (VMC)
A browser-based management tool for monitoring, examining, and querying a running VoltDB database
UPDATED HORTONWORKS CERTIFICATION
CUSTOMER CASE STUDIES
VOLTDB DELIVERS SUPERIOR CUSTOMER VALUE
Customers | Business Value
Smart Meter, Energy Management | 60 million meters under management, saving millions in efficiency, reduced waste
Internet Service Provider | Discover 100% of DoS attacks, and improved response time by 97%
Communications Service Provider | Improved infrastructure utilization by 150%
Online Game Analytics | Increased free-to-pay conversion rate by 30%
Mobile Network Management | Saves $0.5 million/customer installation; unlimited scale in the cloud
Mobile Ad Service Provider | OpEx – 93% reduction in servers (100 to 7); saved millions in ad budget overages
TRY V5.0 TODAY FOR FREE
•  VoltDB Enterprise Edition
•  Production-ready
•  Fully durable, highly available
•  Commercial license, fully supported
•  http://voltdb.com/download/software
•  Sample apps (in a Docker container)
•  http://voltdb.com/community/demo
•  VoltDB Community Edition – open source
•  http://github.com/voltdb
VoltDB runs over 6 BILLION transactions/day in production!
Comparing Fast Data Application Platforms: From Simple Streaming to Real-Time Interaction with Decision Making
Fast data applications have three unique requirements: rapid data ingestion, real-time analytics on streaming data, and per event real-time decisions
Ingestion -> Analytics w/o Queries -> Analytics with queries -> Data Enrichment -> Real time Decisions

Capability | Spark Streaming | Storm | TIBCO Streambase | IBM Streams | Google Dataflow | Amazon Kinesis | VoltDB
Focus | Micro Batching for Hadoop | Infrastructure for data capture | Complex Event Processing | Stream processing and analytics without queries | Next gen MapReduce in the cloud | Infrastructure for data capture | Stream processing, analytics with queries, and real-time decision making
Programming Model | Java, Scala | Clojure, Java, Ruby, Python | SQL | Proprietary - Stream Processing Language (SPL) | Java | Java | Java, Relational, SQL, ACID-compliant
Latency (milliseconds) | > 1,000 milliseconds | milliseconds | 1 millisecond | 1 millisecond | > 2,000 milliseconds | 35-100 milliseconds | 1 millisecond
Data Capture/Ingestion | Batch | Yes | Yes | Yes | Yes | Yes | Yes
Stateful Operation | No | No | No | No | No | No | Yes
Ad hoc queries, Interactive SQL | No | No | No | No | No | No | Yes
Analytics w/o Queries | Yes | with add on DDLs | Yes | Yes | Yes | Yes | Yes
Analytics with queries and per-event decision making | No | No | No | No | No | No | Yes
Real time Data Enrichment (using metadata to enrich, denormalize, etc., incoming event streams) | No | No | No | No | No | No | Yes
Apply OLAP results to real time data stream | No | No | No | Yes | No | No | Yes
Scale-out architecture | Yes | Yes | No | Yes | Yes | Yes | Yes
Reliability: ability to persist data | No | No | No | No | No | Yes
Fault Tolerant | Yes | Yes | Yes | Yes | Yes | Yes
Requires Zookeeper for HA
Reliability: ability to persist data | No | No | Yes | Yes | No | No | Yes
Cluster & Resource Management | Need to add-on Zookeeper | Need to add-on Zookeeper; supports YARN | Built-In | Built-In | Built-In | Built-In | Built-In
Support | Cloudera | Hortonworks | TIBCO | IBM | Google | Amazon | VoltDB
Output (OLAP Integration) | HDFS, Flume, Kafka, ZeroMQ | HDFS, Kafka, Redis, RDBMS | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | Google | Amazon | HDFS, Kafka, RabbitMQ, CSV, Netezza, HP Vertica, JDBC
Available as Open Source | Yes, Apache license | Yes, Apache license | No | No | No | No | Yes, AGPL License

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big Graph
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
 
Company report xinglian
Company report xinglianCompany report xinglian
Company report xinglian
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache spark
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
The Value of Explicit Schema for Graph Use Cases
The Value of Explicit Schema for Graph Use CasesThe Value of Explicit Schema for Graph Use Cases
The Value of Explicit Schema for Graph Use Cases
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
Database Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory WebcastDatabase Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory Webcast
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data Solution
 
Microsoft azure documentDB
Microsoft azure documentDBMicrosoft azure documentDB
Microsoft azure documentDB
 
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
 

Andere mochten auch

Andere mochten auch (15)

Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...
Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...
Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...
 
Memory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business InnovationMemory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business Innovation
 
Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...
Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...
Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...
 
VoltDB : A Technical Overview
VoltDB : A Technical OverviewVoltDB : A Technical Overview
VoltDB : A Technical Overview
 
Lessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for PersonalizationLessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for Personalization
 
Transforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case ExamplesTransforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case Examples
 
Arguments for a Unified IoT Architecture
Arguments for a Unified IoT ArchitectureArguments for a Unified IoT Architecture
Arguments for a Unified IoT Architecture
 
Moving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataMoving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time Data
 
Understanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast DataUnderstanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast Data
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big DataVoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
 
Understanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoTUnderstanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoT
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 

Ähnlich wie How to build streaming data applications - evaluating the top contenders

Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
NoSQLmatters
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive EnterpriseSmart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
DataWorks Summit
 

Ähnlich wie How to build streaming data applications - evaluating the top contenders (20)

Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
 
Fast Data – the New Big Data
Fast Data – the New Big DataFast Data – the New Big Data
Fast Data – the New Big Data
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDBReal-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive EnterpriseSmart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
 
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShare
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Key Database Criteria for Cloud Applications
Key Database Criteria for Cloud ApplicationsKey Database Criteria for Cloud Applications
Key Database Criteria for Cloud Applications
 
Financial impact of Cloud Computing
Financial impact of Cloud ComputingFinancial impact of Cloud Computing
Financial impact of Cloud Computing
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Maximize cloud and application performance with hundreds of operations bridge...
Maximize cloud and application performance with hundreds of operations bridge...Maximize cloud and application performance with hundreds of operations bridge...
Maximize cloud and application performance with hundreds of operations bridge...
 
The New Model
The New ModelThe New Model
The New Model
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overview
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 


How to build streaming data applications - evaluating the top contenders

  • 1. HOW TO BUILD STREAMING DATA APPLICATIONS: EVALUATING THE TOP CONTENDERS Akmal B. Chaudhri about.me/akmalchaudhri
  • 2. INTRODUCTION © 2015 VoltDB PROPRIETARY
  • 3. VOLTDB OVERVIEW Founded in 2009 by database luminary Mike Stonebraker FAST World Record Cloud Benchmark: YCSB (Yahoo Cloud Serving Benchmark) - 2.4 million tps (transactions per second) Other Stonebraker Companies Customers Technology •  In-Memory (but data is durable to disk) •  Scale-Out shared-nothing architecture •  Reliability and fault tolerance •  SQL + Java with ACID •  Hadoop and data warehouse integration •  Open source and commercially licensed (24X7)
  • 4. VOLTDB BENCHMARK ON AMAZON VIRTUAL AND IBM SOFTLAYER BARE-METAL SERVERS •  Yahoo Cloud Serving Benchmark (YCSB) is a popular industry-standard benchmark for cloud databases •  AWS - virtualized servers •  SoftLayer - bare-metal servers •  Workload “B” - 95% reads with 5% updates •  Results: Best-in-class cloud performance (run in the cloud)! •  AWS - 285k tps for 3 nodes scaling linearly to 724k tps for a 12-node cluster •  IBM SoftLayer - 1.02 million tps for 3 nodes scaling linearly to 2.4 million tps for a 12-node cluster [Charts: SoftLayer; AWS; SoftLayer: Update and Read Latency. Axes: Latency (ms), Throughput (ops/sec)]
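As a rough check on the slide 4 figures, the sketch below compares the quoted 3-node and 12-node throughput with strictly proportional scaling. The slide gives only these two endpoints per platform, so nothing is assumed about the points in between; the function and variable names are illustrative and are not part of YCSB or VoltDB.

```python
# Throughput figures quoted on slide 4 (transactions per second),
# keyed by cluster size in nodes.
RESULTS = {
    "AWS": {3: 285_000, 12: 724_000},
    "SoftLayer": {3: 1_020_000, 12: 2_400_000},
}


def scaling_summary(points):
    """Return (speedup, efficiency) between the smallest and largest cluster.

    speedup    = throughput(largest) / throughput(smallest)
    efficiency = speedup / (largest node count / smallest node count)
    """
    small, large = min(points), max(points)
    speedup = points[large] / points[small]
    ideal = large / small
    return speedup, speedup / ideal


for platform, points in RESULTS.items():
    speedup, efficiency = scaling_summary(points)
    print(f"{platform}: {speedup:.2f}x throughput on 4x the nodes "
          f"({efficiency:.0%} of proportional scaling)")
```

On these endpoints, quadrupling the node count raises throughput by roughly 2.5x on AWS and 2.4x on SoftLayer, so "scaling linearly" here is best read as steady growth with cluster size, not as constant throughput per node.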
  • 5. PREDICTION All businesses will compete on their ability to make decisions “in the moment” using Fast Data.
  • 6. FAST DATA SOURCES AND DRIVERS Sources: Mobile, IoT, Social, Sensors, Logs. Data is doubling every two years •  26 billion connected devices by 2020 (Gartner 2014) •  37% of most data will be processed at the edge in milliseconds (Cisco IoT Study 12/11/14)
  • 7. EVERY COMPANY HAS FAST DATA PROBLEMS •  Mobile: billing and rights management, subscriber marketing, etc. •  IoT, Energy, Sensor: smart grid/meters, asset tracking & management •  Personalized Targeting: ad optimization, audience segmenting •  Capital Markets: risk, market data management, customer management •  Infrastructure: data pipeline, system performance, streaming ETL UK Smart Meter VoltDB Customers
  • 8. FAST DATA IS A COMPETITIVE ADVANTAGE TODAY! Instant insight Instant action Instant awareness * VoltDB customers “Event triggered, real-time recommendations based on customer behavior have 10-15 times the response rates than mass marketing” “We get competitive advantage by analyzing device and user data to create an interactive and personalized consumer experience across all devices.” “Real time contextual offers increase offer uptake rates by 75% and data revenues by 15%.” * *
  • 9. TRADITIONAL RDBMS •  Heavy Overhead •  1000s of concurrent versions •  Contention for locked records •  Contention for latching on lock table •  Index bottlenecks •  Disk I/O bottlenecks •  Architecture limits scaling
  • 10. ARCHITECTURE IS IMPORTANT Fast data requires a different architecture.
  • 11. BIG DATA + FAST DATA
  • 13. DATA ARCHITECTURE FOR FAST + BIG DATA [Diagram labels: Enterprise Apps, ETL, CRM, ERP, Etc.; Data Lake (HDFS, etc.); BIG DATA: SQL on Hadoop, Map Reduce, Exploratory Analytics, BI Reporting; Fast Operational Database; FAST DATA: Export, Ingest / Interactive, Real-time Analytics, Fast Serve Analytics, Decisioning]
  • 14. VOLTDB AND FAST DATA PIPELINE Calculations; Serving of Results; Real Time, Per Event, Interactive
  • 15. IN THE BIG CORNER Systems facilitating exploration and analytics of large collections. Example Technologies •  Columnar OLAP warehouses •  Hadoop Ecosystem: MapReduce; Hive, Pig; SQL.next: Impala, Drill, Shark Example Applications •  User segmentation & pre-scoring •  Seasonal trending •  Recommendation matrices •  Building search indexes •  Data Science: statistical clustering, machine learning
  • 16. IN THE FAST CORNER Systems facilitating real time ingest, analytics and decisions against incoming streams of events. Example Technologies •  Streaming frameworks (e.g. Spark) •  Fast OLAP (e.g. HANA) •  Fast OLTP (e.g. VoltDB) Example Applications •  Micro-personalization •  Recommendation serving •  Alerting/alarming •  Operational monitoring •  Data enrichment (ETL elimination) •  High throughput authorization (ex: API quota enforcement)
  • 17. TYPICAL FAST DATA QUESTIONS [Diagram labels: Hadoop, Volume; SQL / OLAP, Data Science; Fast, Velocity] •  Is the fast layer streaming? It is often more like fast OLTP. •  How do the pieces communicate? OLAP analytics from Big -> Fast; new events from Fast -> Big. •  Where do “analytics” belong? Analytics per-event: with Fast; analytics across history: with Big. •  Are streaming frameworks equivalent? Traditional SQL CEP (Esper, Streambase); tuple DAGs (Storm); window processors on Hadoop (Spark).
  • 18. HOW TO SOLVE IT* (*With admiring credit to G. Polya) Considering Data: What are the types of data to be managed in fast data applications? Considering Processing: How does data flow through fast data applications? What are the calculations & analytics that are necessary?
  • 19. Data | Temporality | Examples •  Incoming events | Event Stream | Click stream, tick stream, sensors, metrics •  Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, time-series rollups •  Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data •  OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends •  Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations •  Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
  • 20. SOURCES OF STATE 1.  Analytics outputs must be queryable. 2.  “Lookup tables” to create groupings for analytics and to supply enrichment data. 3.  Session management: grouping, filtering and aggregating create intermediate state.
  • 22. DATA FLOWS Real-Time Analytics •  Streaming summaries for operations •  KPI measurement •  Analytics for apps
  • 23. DATA FLOWS Fast Request/Response (and side effects) •  Mobile Authorization •  Campaign Evaluation •  Quota Enforcement •  Micro-Personalization •  Recommendation Serving
  • 24. DATA FLOWS Data Pipelines •  Data enrichment •  Sessionization and re-assembly of incoming events •  Correlation (by time, location, identity) •  Filtering [Diagram labels: Pipeline, Data Lake]
  • 26. Continuous Query Transactional Event Evaluation Transformation
  • 27. FAST DATA STACK Applications, Message Queues, Data Sources Ingest Analyze Decide •  Counters •  Aggregations •  Time series •  Statistics •  Store results •  Query and recombine •  Fast serving •  Per-event policy evaluations •  Responses (synchronous): authorization, personalization •  Side-effects (asynchronous): alerts, alarms Export & Pipeline
  • 28. APACHE-ISH TECHNOLOGY STACK Applications, Message Queues, Data Sources Ingest Analyze Decide Counters Aggregations Time series Statistics Store results Query and recombine Fast serving Per-event policy evaluations Responses (synchronous) Side-effects (asynchronous) Export & Pipeline Technologies: Kafka / RabbitMQ; Storm, Flume, Sqoop; Storm + Serving Layer; Spark + Serving Layer; Cassandra, HBase; Hadoop, Message queues
  • 29. VOLTDB TECHNOLOGY STACK Applications, Message Queues, Data Sources Ingest Analyze Decide Counters Aggregations Time series Statistics Store results Query and recombine Fast serving Per-event policy evaluations Responses (synchronous) Side-effects (asynchronous) Export & Pipeline Technologies: Kafka / RabbitMQ; VoltDB; SQL, Java for Analytics; Transactions / ACID; Hadoop, Message queues
  • 30. OLTP (Transactions First) Streaming Event Processors OLAP (Columnar Analytics)
  • 31. STREAM TECHNOLOGY STACK Applications, Message Queues, Data Sources Ingest Analyze Decide Counters Aggregations Time series Statistics Store results Query and recombine Fast serving Per-event policy evaluations Responses (synchronous) Side-effects (asynchronous) Export & Pipeline
  • 32. OLAP TECHNOLOGY STACK Applications, Message Queues, Data Sources Ingest Analyze Decide Counters Aggregations Time series Statistics Store results Query and recombine Fast serving Per-event policy evaluations Responses (synchronous) Side-effects (asynchronous) Export & Pipeline
  • 33. STREAMING DATA PIPELINE [Diagram labels: Applications & Streams: Logs, Sensors, Meter Readings, IoT, Location; Real-Time Applications; Message Queue; Ingest: Kafka Loader, CSV loaders, C++, C#, PHP, Python, Java (and others); Export: CSV Data, Thrift Messages, JDBC, HTTP, Local File, Extensible Connectors; SQL, Views, Java; Analyze; ACID Txns; State; Decide; Downstream Pipeline: Hadoop, Data Warehouse, Message Queue]
  • 34. FAST DATA PATTERNS
  • 35. THREE FAST DATA APPLICATION PATTERNS •  Real-Time Analytics: real-time analytics for operations; real-time KPI measurement; real-time analytics for apps •  Data Pipelines: streaming data enrichment; sessionization / re-assembly; correlation (by time, by location, by id); filtering; pre-aggregation •  Fast Request/Response: mobile authorization; campaign authorization; fast API quota enforcement; micro-personalization; recommendation serving
  • 36. VOLTDB: REAL-TIME ANALYTICS •  Operational analytics and monitoring •  RT analytics enabling user-facing applications •  KPI for internal BI/Dashboards •  In-memory MPP SQL over ODBC/JDBC •  Cheap + correct materialized views for streaming aggregations [Diagram labels: Ingest; VoltDB: Metadata (Dimension table), Session state (Fact table); SQL, Views]
  • 37. VOLTDB: DATA PIPELINES WITH EXPORT •  Filtering (ex: only RFID / iBeacon readings that show change from previous location) •  Sessionization •  Common version re-writing •  Data enrichment •  MPP streaming Export •  Row data, Thrift messages, CSV •  OLAP, HDFS and message queues [Diagram labels: VoltDB: Metadata (Dimension table), Session state (Fact table); Export]
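The filtering example on slide 37 (pass on only the readings that show a change from the previous location) can be sketched in a few lines. This is a minimal in-memory illustration in plain Python, not VoltDB code: the `(tag_id, location)` tuple layout and the function name are assumptions made for the example, and the dictionary stands in for the per-tag session state that the slide keeps in a table.

```python
def changed_locations(readings):
    """Yield only readings whose location differs from that tag's previous one.

    `readings` is an iterable of (tag_id, location) tuples in arrival order.
    """
    last_seen = {}  # tag_id -> last location reported (per-tag session state)
    for tag_id, location in readings:
        if last_seen.get(tag_id) != location:
            last_seen[tag_id] = location
            yield tag_id, location


readings = [
    ("tag-1", "dock"),
    ("tag-1", "dock"),      # repeat, filtered out
    ("tag-2", "dock"),
    ("tag-1", "aisle-4"),   # tag-1 moved, passed through
    ("tag-2", "dock"),      # repeat, filtered out
]
filtered = list(changed_locations(readings))
print(filtered)
```

The first reading for each tag always passes, because there is no previous location to compare it with.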
  • 38. VOLTDB: REQUEST/RESPONSE DECISIONS •  Authorization •  RT balance checks, quota enforcement •  Personalization and Recommendation Serving •  Combine pre-score with immediate context •  Fully ACID transaction model •  Thousands to millions per second •  At less than 5 ms latencies [Diagram labels: Metadata (Dimension table), Session state (Fact table), ACID Transactions]
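The quota-enforcement case on slide 38 is a per-event decision: each request reads the caller's current usage, decides, and updates state in one step. The sketch below shows that shape in plain Python under loud assumptions: a fixed time window, a single process, and no durability. The class and method names are hypothetical, not a VoltDB API; in a database, the check and the increment would need to run inside one transaction to stay correct under concurrency.

```python
from collections import defaultdict


class QuotaEnforcer:
    """Fixed-window call quota per caller (illustrative, single-process)."""

    def __init__(self, limit_per_window, window_seconds):
        self.limit = limit_per_window
        self.window = window_seconds
        # (caller, window index) -> number of calls already authorized
        self.counts = defaultdict(int)

    def authorize(self, caller, timestamp):
        """Return True and count the call if the caller is under quota."""
        key = (caller, int(timestamp // self.window))
        if self.counts[key] >= self.limit:
            return False  # over quota: reject without counting
        self.counts[key] += 1
        return True


enforcer = QuotaEnforcer(limit_per_window=2, window_seconds=60)
decisions = [enforcer.authorize("app-1", t) for t in (0, 10, 59, 60)]
print(decisions)
```

Counters for expired windows are never evicted in this sketch; a real deployment would need to age them out.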
  • 39. VOLTDB V5.0
  • 40. VOLTDB V5.0 - ACCELERATING FAST DATA APPLICATION DEVELOPMENT •  Hadoop/Big Data Ecosystem Integrations •  Fast Data Pipeline Sample Applications •  Ease of Database Development (traditional API) •  VoltDB Management Center (VMC) •  Updated Hortonworks HDP Certification
  • 41. FAST DATA INTEGRATIONS - IMPORTERS •  Kafka Loader: subscribe to a Kafka topic and insert each message into a VoltDB table •  JDBC Loader: load a JDBC result set into a VoltDB table •  Vertica UDx: user-defined function to load Vertica result sets into a VoltDB table •  Apache Hive and Apache Pig: Hadoop OutputFormat to load Hive and Pig result sets into VoltDB
  • 42. FAST DATA INTEGRATIONS - EXPORTERS •  HDFS Export: Hadoop export via WebHDFS and HttpFS •  HTTP Export: delivery and alerting via HTTP post/get •  Kafka Export, RabbitMQ Export: message queue delivery •  Export format configurable: Avro, CSV, TSV, more coming…
  • 43. FAST DATA PIPELINE SAMPLE APPLICATION •  Streaming Data, Real-time Analytics •  Export to Hadoop •  Export to OLAP (Vertica, others) •  Place historical decision making intelligence into VoltDB •  Closed Loop, via Hive, Pig OutputFormat or Vertica UDx •  Download: https://github.com/VoltDB/app-fastdata •  And see our blog posts: http://voltdb.com/blog/fast-data-look-voltdb-sample-app
  • 44. LAMBDA ARCHITECTURE SAMPLE APPLICATION: HOW MANY UNIQUE USERS INTERACTED WITH MY APP TODAY? •  Type of application: Real-time analytics •  Demonstrates how to simplify the “Speed Layer” •  Using VoltDB, developers can replace both the streaming and the operational data store portions of the speed layer. •  Less code, greatly reduced complexity •  Improving the Lambda Architecture •  Perform real-time analytics AND react, per event, to the incoming data stream •  Try it yourself: http://voltdb.com/community/applications
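The question on slide 44 ("How many unique users interacted with my app today?") combines the two things the slide asks of a speed layer: keep a running aggregate and return an answer per event. A minimal sketch follows, assuming events carry a user id and a Unix timestamp and that "today" means the UTC calendar day. It keeps an exact set per day for simplicity; the slide does not say whether the sample application counts exactly or approximately, and none of these names come from it.

```python
from collections import defaultdict
from datetime import datetime, timezone


class DailyUniqueUsers:
    """Running count of distinct users per UTC day (illustrative only)."""

    def __init__(self):
        self.users_by_day = defaultdict(set)  # date -> set of user ids

    def record(self, user_id, epoch_seconds):
        """Record one event and return the day's unique count so far."""
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
        self.users_by_day[day].add(user_id)
        return len(self.users_by_day[day])

    def count(self, day):
        """Return the unique-user count for a given date (0 if unseen)."""
        return len(self.users_by_day.get(day, ()))


tracker = DailyUniqueUsers()
running = [tracker.record(user, ts) for user, ts in
           [("u1", 0), ("u2", 10), ("u1", 20), ("u1", 86_400)]]
print(running)
```

Memory grows with the number of distinct users per day, which is the usual reason to swap the set for an approximate cardinality estimator at large scale.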
  • 45. VOLTDB MANAGEMENT CENTER (VMC) A browser-based management tool for monitoring, examining, and querying a running VoltDB database
  • 46. UPDATED HORTONWORKS CERTIFICATION
  • 47. CUSTOMER CASE STUDIES
  • 48. VOLTDB DELIVERS SUPERIOR CUSTOMER VALUE Customers | Business Value •  Smart Meter, Energy Management | 60 million meters under management, saving millions in efficiency, reduced waste •  Internet Service Provider | Discover 100% of DoS attacks, and improved response time by 97% •  Communications Service Provider | Improved infrastructure utilization by 150% •  Online Game Analytics | Increased free-to-pay conversion rate by 30% •  Mobile Network Management | Saves $0.5 million/customer installation; unlimited scale in the cloud •  Mobile Ad Service Provider | OpEx: 93% reduction in servers (100 to 7); saved millions in ad budget overages
  • 50. TRY V5.0 TODAY FOR FREE •  VoltDB Enterprise Edition: production-ready; fully durable, highly available; commercial license, fully supported; http://voltdb.com/download/software •  Sample apps (in a Docker container): http://voltdb.com/community/demo •  VoltDB Community Edition - open source: http://github.com/voltdb VoltDB runs over 6 BILLION transactions/day in production!
  • 51. Comparing Fast Data Application Platforms: From Simple Streaming to Real-Time Interaction with Decision Making
      Ingestion -> Analytics w/o Queries -> Analytics with queries -> Data Enrichment -> Real time Decisions
      Fast data applications have three unique requirements: rapid data ingestion, real-time analytics on streaming data, and per event real-time decisions
      Capability | Spark Streaming | Storm | TIBCO Streambase | IBM Streams | Google Dataflow | Amazon Kinesis | VoltDB
      Focus | Micro Batching for Hadoop | Infrastructure for data capture | Complex Event Processing | Stream processing and analytics without queries | Next gen MapReduce in the cloud | Infrastructure for data capture | Stream processing, analytics with queries, and real-time decision making
      Programming Model | Java, Scala | Clojure, Java, Ruby, Python | SQL | Proprietary - Stream Processing Language (SPL) | Java | Java | Java, Relational, SQL, ACID-compliant
      Latency (milliseconds) | > 1,000 milliseconds | milliseconds | 1 millisecond | 1 millisecond | > 2,000 milliseconds | 35-100 milliseconds | 1 millisecond
      Data Capture/Ingestion | Batch | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
      Stateful Operation | X | X | X | X | X | X | ✓
      Ad hoc queries, Interactive SQL | X | X | X | X | X | X | ✓
      Analytics w/o Queries | ✓ | with add-on DDLs | ✓ | ✓ | ✓ | ✓ | ✓
      Analytics with queries and per-event decision making | X | X | X | X | X | X | ✓
      Real time Data Enrichment: using metadata to enrich, denormalize, etc., incoming event streams | X | X | X | X | X | X | ✓
      Apply OLAP results to real time data stream | X | X | X | ✓ | X | X | ✓
      Scale-out architecture | ✓ | ✓ | X | ✓ | ✓ | ✓ | ✓
      Reliability: ability to persist data [values as extracted; column alignment not recoverable]: X X X X X ✓
      Fault Tolerant [values as extracted; column alignment not recoverable]: ✓ ✓ ✓ ✓ ✓ ✓; Requires Zookeeper for HA
      Reliability: ability to persist data | X | X | ✓ | ✓ | X | X | ✓
      Cluster & Resource Management | Need to add-on Zookeeper | Need to add-on Zookeeper; supports YARN | Built-In | Built-In | Built-In | Built-In | Built-In
      Support | Cloudera | Hortonworks | TIBCO | IBM | Google | Amazon | VoltDB
      Output (OLAP Integration) | HDFS, Flume, Kafka, ZeroMQ | HDFS, Kafka, Redis, RDBMS | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | Google | Amazon | HDFS, Kafka, RabbitMQ, CSV, Netezza, HP Vertica, JDBC
      Available as Open Source | Yes, Apache license | Yes, Apache license | X | X | X | X | Yes, AGPL License