1. page
HOW TO BUILD STREAMING DATA
APPLICATIONS: EVALUATING THE TOP
CONTENDERS
Akmal B. Chaudhri
about.me/akmalchaudhri
3. page
VOLTDB OVERVIEW
Founded in 2009 by database luminary Mike Stonebraker
FAST
World Record Cloud Benchmark:
YCSB (Yahoo Cloud Serving Benchmark) - 2.4 million tps (transactions per second)
Other Stonebraker Companies
Customers
Technology
• In-Memory (but data is durable to disk)
• Scale-Out shared-nothing architecture
• Reliability and fault tolerance
• SQL + Java with ACID
• Hadoop and data warehouse integration
• Open source and commercially licensed (24X7)
© 2015 VoltDB PROPRIETARY
4. page
VOLTDB BENCHMARK ON AMAZON VIRTUAL AND
IBM SOFTLAYER BARE-METAL SERVERS
• Yahoo Cloud Serving Benchmark (YCSB) is a popular industry-standard benchmark for cloud databases
• AWS – virtualized servers
• SoftLayer – bare-metal servers
• Workload “B” – 95% reads with 5% updates.
• Results: Best-in-class cloud performance (run in the cloud)!
• AWS – 285k tps for 3 nodes scaling linearly to 724k tps for a 12 node cluster
• IBM SoftLayer – 1.02 million tps for 3 nodes scaling linearly to 2.4 million tps for a 12 node cluster
[Charts: SoftLayer; AWS; SoftLayer: Update and Read Latency. Axes: Latency (ms), Throughput (ops/sec)]
6. page
FAST DATA SOURCES AND DRIVERS
Mobile
IoT
Social
Sensors
Logs
Data is doubling every two years
• 26 billion connected devices by 2020 (Gartner 2014)
• 37% of most data will be processed at the edge in milliseconds (Cisco IoT Study 12/11/14)
7. page
EVERY COMPANY HAS FAST DATA PROBLEMS
• Mobile: billing and rights management, subscriber marketing, etc.
• IoT, Energy, Sensor: smart grid/meters, asset tracking & management
• Personalized Targeting: ad optimization, audience segmenting
• Capital Markets: risk, market data management, customer management
• Infrastructure: data pipeline, system performance, streaming ETL
[Logos: VoltDB customers, including UK Smart Meter]
8. page
FAST DATA IS A COMPETITIVE ADVANTAGE TODAY!
Instant insight
Instant action
Instant awareness
* VoltDB customers
“Event triggered, real-time recommendations based on customer behavior have 10-15 times the response rates than mass marketing”
“We get competitive advantage by analyzing device and user data to create an interactive and personalized consumer experience across all devices.”
“Real time contextual offers increase offer uptake rates by 75% and data revenues by 15%.”
9. page
TRADITIONAL RDBMS
• Heavy Overhead
• 1000s of concurrent versions
• Contention for locked records
• Contention for latching on lock table
• Index bottlenecks
• Disk I/O bottlenecks
• Architecture limits scaling
13. page
DATA ARCHITECTURE FOR FAST + BIG DATA
[Diagram with three groups of labels:
Sources: Enterprise Apps (CRM, ERP, etc.), ETL, Data Lake (HDFS, etc.)
BIG DATA: SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting
FAST DATA: Fast Operational Database, Ingest / Interactive, Real-time Analytics, Fast Serve Analytics, Decisioning, Export]
15. page
IN THE BIG CORNER
Systems facilitating exploration and analytics of large collections.
Example Technologies
Columnar OLAP warehouses
Hadoop Ecosystem
• MapReduce
• Hive, Pig
• SQL.next: Impala, Drill, Shark
Example Applications
• User segmentation & pre-scoring
• Seasonal trending
• Recommendation matrices
• Building search indexes
• Data Science: statistical clustering,
machine learning
16. page
IN THE FAST CORNER
Systems facilitating real time ingest, analytics and decisions against
incoming streams of events.
Example Technologies
• Streaming frameworks (e.g. Spark)
• Fast OLAP (e.g. HANA)
• Fast OLTP (e.g. VoltDB)
Example Applications
• Micro-personalization
• Recommendation serving
• Alerting/alarming
• Operational monitoring
• Data enrichment (ETL elimination)
• High throughput authorization
• Ex: API quota enforcement
17. page
TYPICAL FAST DATA QUESTIONS
[Diagram labels: Hadoop, Volume, SQL / OLAP, Data Science, Fast, Velocity]
• Is the fast layer streaming?
  • It is often more like fast OLTP
• How do the pieces communicate?
  • OLAP analytics from Big -> Fast
  • New events from Fast -> Big
• Where do “analytics” belong?
  • Analytics per-event: with Fast
  • Analytics across history: with Big
• Are streaming frameworks equivalent?
  • Traditional SQL CEP (Esper, Streambase)
  • Tuple DAGs (Storm)
  • Window processors on Hadoop (Spark)
18. page
HOW TO SOLVE IT*
* With admiring credit to G. Polya
Considering Data / Considering Processing
• What are the types of data to be managed in fast data applications?
• How does data flow through fast data applications?
• What are the calculations & analytics that are necessary?
19. page
Data | Temporality | Examples
Incoming events | Event Stream | Click stream, tick stream, sensors, metrics
Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, time-series rollups
Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data
OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations
Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
20. page
SOURCES OF STATE
1. Analytics outputs must be query-able.
2. “Lookup tables” to create groupings for analytics
and to supply enrichment data.
3. Session management: grouping, filtering and aggregating create intermediate state.
23. page
DATA FLOWS
Fast Request/Response (and side effects)
• Mobile Authorization
• Campaign Evaluation
• Quota Enforcement
• Micro-Personalization
• Recommendation Serving
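As an illustration of the request/response flow, here is a minimal sketch of per-event quota enforcement. It is not VoltDB code: the class name `QuotaEnforcer`, the fixed-window policy, and the in-memory dictionary are assumptions chosen for the example. In a real deployment the check and the increment would have to run as one transaction.

```python
from collections import defaultdict


class QuotaEnforcer:
    """Hypothetical fixed-window request quota, checked once per event."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window_index) -> requests counted in that window
        self.counts = defaultdict(int)

    def authorize(self, client_id, timestamp):
        """Return True and count the request if the client is under quota."""
        window = int(timestamp // self.window_seconds)
        key = (client_id, window)
        if self.counts[key] >= self.limit:
            return False  # over quota: reject without counting
        self.counts[key] += 1
        return True


enforcer = QuotaEnforcer(limit=2, window_seconds=60)
decisions = [enforcer.authorize("app-1", t) for t in (0, 10, 20, 61)]
print(decisions)  # [True, True, False, True]
```

The third request is rejected because it falls in the same 60-second window as the first two. The fourth lands in the next window and is allowed.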
24. page
DATA FLOWS
Data Pipelines
• Data enrichment
• Sessionization and re-assembly of incoming events.
• Correlation (by time, location, identity)
• Filtering
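Sessionization, listed above, can be sketched as grouping each user's events by inactivity gap. This is a minimal illustration under assumed names (`sessionize`, `gap_seconds`) and an assumed event shape of `(user_id, timestamp)`. The slides do not prescribe a particular algorithm.

```python
def sessionize(events, gap_seconds):
    """Group (user_id, timestamp) events into per-user sessions.

    A new session starts when a user's next event arrives more than
    gap_seconds after that user's previous event.
    """
    sessions = {}  # user_id -> list of sessions, each a list of timestamps
    # Re-assemble out-of-order input by sorting on timestamp first.
    for user_id, ts in sorted(events, key=lambda e: e[1]):
        user_sessions = sessions.setdefault(user_id, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap_seconds:
            user_sessions[-1].append(ts)  # continue the open session
        else:
            user_sessions.append([ts])    # start a new session
    return sessions


events = [("u1", 0), ("u2", 5), ("u1", 20), ("u1", 500), ("u2", 30)]
result = sessionize(events, gap_seconds=60)
print(result)  # {'u1': [[0, 20], [500]], 'u2': [[5, 30]]}
```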
[Diagram: Pipeline -> Data Lake]
27. page
FAST DATA STACK
Applications, Message Queues, Data Sources
Ingest
Analyze
• Counters
• Aggregations
• Time series
• Statistics
• Store results
• Query and recombine
• Fast serving
Decide
• Per-event policy evaluations
• Responses (synchronous): authorization, personalization
• Side-effects (asynchronous): alerts, alarms
Export & Pipeline
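The layers of the stack can be sketched as a single pass over incoming events. This is a toy illustration only: the function name `run_pipeline`, the event fields `device` and `value`, and the threshold policy are all assumptions. Side effects are collected in a list here, whereas a real system would deliver them asynchronously.

```python
from collections import Counter


def run_pipeline(events, alert_threshold):
    """Ingest -> Analyze -> Decide -> Export for a list of event dicts."""
    counts = Counter()  # Analyze: per-device event counters
    responses = []      # Decide: synchronous response per event
    alerts = []         # Decide: side effects (asynchronous in a real system)
    exported = []       # Export: enriched records for downstream systems

    for event in events:                                # Ingest
        device = event["device"]
        counts[device] += 1                             # Analyze
        allowed = event["value"] <= alert_threshold     # Decide (per-event policy)
        responses.append(allowed)
        if not allowed:
            alerts.append(f"{device}: value {event['value']} over threshold")
        # Export: the original event plus the analysis and the decision
        exported.append({**event, "seen": counts[device], "allowed": allowed})
    return counts, responses, alerts, exported


sample = [
    {"device": "m1", "value": 10},
    {"device": "m1", "value": 99},
    {"device": "m2", "value": 5},
]
counts, responses, alerts, exported = run_pipeline(sample, alert_threshold=50)
print(responses)  # [True, False, True]
print(alerts)     # ['m1: value 99 over threshold']
```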
28. page
APACHE-ISH TECHNOLOGY STACK
[Diagram: the fast data stack (Ingest, Analyze, Decide, Export & Pipeline) annotated with the following technologies]
Kafka / RabbitMQ
Storm, Flume, Sqoop
Storm + Serving Layer
Spark + Serving Layer
Cassandra, HBase
Hadoop, Message queues
29. page
VOLTDB TECHNOLOGY STACK
[Diagram: the fast data stack (Ingest, Analyze, Decide, Export & Pipeline) annotated with the following technologies]
Kafka / RabbitMQ
VoltDB
SQL, Java for Analytics
Transactions / ACID
Hadoop, Message queues
31. page
STREAM TECHNOLOGY STACK
[Diagram: the fast data stack (Ingest, Analyze, Decide, Export & Pipeline)]
32. page
OLAP TECHNOLOGY STACK
[Diagram: the fast data stack (Ingest, Analyze, Decide, Export & Pipeline)]
33. page
STREAMING DATA PIPELINE
[Diagram labels, grouped by stage:
Sources: Applications & Streams (Logs, Sensors, Meter Readings, IoT, Location), Real-Time Applications, Message Queue
Ingest: Kafka Loader; CSV loaders; C++, C#, PHP, Python, Java (and others)
Analyze: SQL, Views, Java
Decide: ACID Txns, State
Export: CSV Data, Thrift Messages, JDBC, HTTP, Local File, Extensible Connectors
Downstream Pipeline: Hadoop, Data Warehouse, Message Queue]
35. page
THREE FAST DATA APPLICATION PATTERNS
• Real-Time Analytics
  • Real-time analytics for operations
  • Real-time KPI measurement
  • Real-time analytics for apps
• Data Pipelines
  • Streaming data enrichment
  • Sessionization / re-assembly
  • Correlation (by time, by location, by id)
  • Filtering
  • Pre-aggregation
• Fast Request/Response
  • Mobile Authorization
  • Campaign Authorization
  • Fast API Quota Enforcement
  • Micro-Personalization
  • Recommendation Serving
36. page
VOLTDB: REAL-TIME ANALYTICS
[Diagram: VoltDB with Metadata (Dimension table) and Session state (Fact table); labels: Ingest; SQL, Views]
• Operational analytics and monitoring
• RT analytics enabling user-facing applications
• KPI for internal BI/Dashboards
• In-memory MPP SQL over ODBC/JDBC
• Cheap + correct materialized views for streaming aggregations
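The idea behind a materialized view for streaming aggregations is that the aggregate is updated on every insert, so a read never rescans the raw events. The sketch below illustrates that idea in plain Python. In a database the view would be declared in SQL, and the class `CountSumView` is a hypothetical stand-in, not VoltDB's implementation.

```python
class CountSumView:
    """Per-key COUNT and SUM kept up to date on every insert."""

    def __init__(self):
        self.rows = {}  # key -> [count, total]

    def insert(self, key, value):
        # Incremental maintenance: touch only the affected group.
        row = self.rows.setdefault(key, [0, 0])
        row[0] += 1
        row[1] += value

    def query(self, key):
        # Reads are a single lookup, independent of how many events arrived.
        count, total = self.rows.get(key, (0, 0))
        return {"count": count, "sum": total}


view = CountSumView()
for region, amount in [("eu", 3), ("eu", 4), ("us", 10)]:
    view.insert(region, amount)
print(view.query("eu"))    # {'count': 2, 'sum': 7}
print(view.query("asia"))  # {'count': 0, 'sum': 0}
```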
37. page
VOLTDB: DATA PIPELINES WITH EXPORT
[Diagram: VoltDB with Metadata (Dimension table) and Session state (Fact table); label: Export]
• Filtering (ex: only RFID / iBeacon readings that show change from previous location).
• Sessionization
• Common version re-writing
• Data enrichment
• MPP streaming Export
• Row data, Thrift messages, CSV
• OLAP, HDFS and message queues
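The filtering example above (keep only readings that show a change from the previous location) needs one piece of state per tag. A minimal sketch follows, assuming readings arrive as `(tag_id, location)` pairs. The function name `changed_locations` is hypothetical.

```python
def changed_locations(readings):
    """Yield only readings whose location differs from the tag's previous one."""
    last_seen = {}  # tag_id -> last known location
    for tag_id, location in readings:
        if last_seen.get(tag_id) != location:
            last_seen[tag_id] = location
            yield tag_id, location


readings = [
    ("t1", "dock"),
    ("t1", "dock"),     # unchanged, dropped
    ("t2", "dock"),
    ("t1", "aisle-3"),  # moved, kept
    ("t2", "dock"),     # unchanged, dropped
]
kept = list(changed_locations(readings))
print(kept)  # [('t1', 'dock'), ('t2', 'dock'), ('t1', 'aisle-3')]
```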
38. page
VOLTDB: REQUEST/RESPONSE DECISIONS
• Authorization
• RT balance checks, quota enforcement
• Personalization and Recommendation Serving
• Combine pre-score with immediate context
• Fully ACID transaction model.
• Thousands to Millions per second
• At less than 5ms latencies
[Diagram: Metadata (Dimension table), Session state (Fact table), ACID Transactions]
40. page
VOLTDB V5.0 – ACCELERATING FAST DATA
APPLICATION DEVELOPMENT
• Hadoop/Big Data Ecosystem Integrations
• Fast Data Pipeline Sample Applications
• Ease of Database Development (traditional API)
• VoltDB Management Center (VMC)
• Updated Hortonworks HDP Certification
41. page
FAST DATA INTEGRATIONS - IMPORTERS
• Kafka Loader
  • Subscribe to a Kafka topic and insert each message into a VoltDB table
• JDBC Loader
  • Load a JDBC result set into a VoltDB table
• Vertica UDx
  • User-defined function to load Vertica result sets into a VoltDB table
• Apache Hive and Apache Pig
  • Hadoop OutputFormat to load Hive and Pig result sets into VoltDB
42. page
FAST DATA INTEGRATIONS - EXPORTERS
• HDFS Export
• Hadoop export via WebHDFS and HttpFS
• HTTP Export
• Delivery and Alerting via HTTP post/get
• Kafka Export, RabbitMQ Export
• Message queue delivery
• Export format configurable
• Avro, CSV, TSV, more coming…
43. page
FAST DATA PIPELINE SAMPLE APPLICATION
• Streaming Data, Real-time Analytics
• Export to Hadoop
• Export to OLAP (Vertica, others)
• Place historical decision making intelligence into VoltDB
• Closed Loop, via Hive, Pig OutputFormat or Vertica UDx
• Download: https://github.com/VoltDB/app-fastdata
• And see our blog posts:
http://voltdb.com/blog/fast-data-look-voltdb-sample-app
44. page
LAMBDA ARCHITECTURE SAMPLE APPLICATION
• Type of application: Real-time analytics
• Demonstrates how to simplify the “Speed Layer”
  • Using VoltDB, developers can replace both the streaming and the operational data store portions of the speed layer.
  • Less code, greatly reduced complexity
• Improving the Lambda Architecture
  • Perform real-time analytics AND react, per event, to the incoming data stream
• Try it yourself: http://voltdb.com/community/applications
[Callout: “How many unique users interacted with my app today?”]
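The question on this slide, how many unique users interacted with an app today, can be answered in a speed layer by keeping one set of user IDs per day. The sketch below is an assumption-laden illustration, not the sample application's code: the class `DailyUniques` is hypothetical, days are cut in UTC, and the sets are exact, so memory grows with the number of distinct users.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


class DailyUniques:
    """Exact count of distinct users per UTC day."""

    def __init__(self):
        self.users_by_day = defaultdict(set)  # date -> set of user IDs

    def record(self, user_id, epoch_seconds):
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
        self.users_by_day[day].add(user_id)  # repeat visits are ignored by the set

    def count(self, day):
        return len(self.users_by_day.get(day, ()))


uniques = DailyUniques()
for user, ts in [("a", 0), ("b", 3600), ("a", 7200), ("a", 86400)]:
    uniques.record(user, ts)
print(uniques.count(date(1970, 1, 1)))  # 2
print(uniques.count(date(1970, 1, 2)))  # 1
```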
45. page
VOLTDB MANAGEMENT CENTER (VMC)
A browser-based management tool for monitoring, examining, and querying a running VoltDB database
48. page
VOLTDB DELIVERS SUPERIOR CUSTOMER VALUE
Customers | Business Value
Internet Service Provider | Discover 100% of DoS attacks, and improved response time by 97%
Communications Service Provider | Improved infrastructure utilization by 150%
Online Game Analytics | Increased free-to-pay conversion rate by 30%
Mobile Network Management | Saves $0.5 million/customer installation; unlimited scale in the cloud
Mobile Ad Service Provider | OpEx – 93% reduction in servers (100 to 7); saved millions in ad budget overages
Smart Meter, Energy Management | 60 million meters under management, saving millions in efficiency, reduced waste
50. page
TRY V5.0 TODAY FOR FREE
• VoltDB Enterprise Edition
• Production-ready
• Fully durable, highly available
• Commercial license, fully supported
• http://voltdb.com/download/software
• Sample apps (in a Docker container)
• http://voltdb.com/community/demo
• VoltDB Community Edition – open source
• http://github.com/voltdb
VoltDB runs over 6 BILLION transactions/day in production!
51. page
Comparing Fast Data Application Platforms: From Simple Streaming to Real-Time Interaction with Decision Making
Fast data applications have three unique requirements: rapid data ingestion, real-time analytics on streaming data, and per-event real-time decisions.
Ingestion -> Analytics w/o Queries -> Analytics with queries -> Data Enrichment -> Real-time Decisions
Capability | Spark Streaming | Storm | TIBCO Streambase | IBM Streams | Google Dataflow | Amazon Kinesis | VoltDB
Focus | Micro-batching for Hadoop | Infrastructure for data capture | Complex Event Processing | Stream processing and analytics without queries | Next-gen MapReduce in the cloud | Infrastructure for data capture | Stream processing, analytics with queries, and real-time decision making
Programming Model | Java, Scala | Clojure, Java, Ruby, Python | SQL | Proprietary - Stream Processing Language (SPL) | Java | Java | Java, Relational, SQL, ACID-compliant
Latency | > 1,000 milliseconds | milliseconds | 1 millisecond | 1 millisecond | > 2,000 milliseconds | 35-100 milliseconds | 1 millisecond
Data Capture/Ingestion | Batch | Yes | Yes | Yes | Yes | Yes | Yes
Stateful Operation | No | No | No | No | No | No | Yes
Ad hoc queries, Interactive SQL | No | No | No | No | No | No | Yes
Analytics w/o Queries | Yes | with add-on DDLs | Yes | Yes | Yes | Yes | Yes
Analytics with queries and per-event decision making | No | No | No | No | No | No | Yes
Real-time Data Enrichment (using metadata to enrich, denormalize, etc., incoming event streams) | No | No | No | No | No | No | Yes
Apply OLAP results to real-time data stream | No | No | No | Yes | No | No | Yes
Scale-out architecture | Yes | Yes | No | Yes | Yes | Yes | Yes
Fault Tolerant | Yes (six check marks in the source), plus one note: Requires Zookeeper for HA
Reliability: ability to persist data | No | No | Yes | Yes | No | No | Yes
Cluster & Resource Management | Need to add on Zookeeper | Need to add on Zookeeper; supports YARN | Built-in | Built-in | Built-in | Built-in | Built-in
Support | Cloudera | Hortonworks | TIBCO | IBM | Google | Amazon | VoltDB
Output (OLAP Integration) | HDFS, Flume, Kafka, ZeroMQ | HDFS, Kafka, Redis, RDBMS | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | Google | Amazon | HDFS, Kafka, RabbitMQ, CSV, Netezza, HP Vertica, JDBC
Available as Open Source | Yes, Apache license | Yes, Apache license | No | No | No | No | Yes, AGPL license