Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top Contenders - NoSQL matters Dublin 2015
1. page
HOW TO BUILD STREAMING DATA
APPLICATIONS: EVALUATING THE TOP
CONTENDERS
Akmal B. Chaudhri
about.me/akmalchaudhri
2. page
MY BACKGROUND
• ~25 years experience in IT
• Developer (Reuters)
• Academic (City University)
• Consultant (Logica)
• Technical Architect (CA)
• Senior Architect (Informix)
• Senior IT Specialist (IBM)
• TI (Hortonworks)
• SA (DataStax)
• Worked with various technologies
• Programming languages
• IDE
• Database Systems
• Client-facing roles
• Developers
• Senior executives
• Journalists
• Broad industry experience
• Community outreach
• University relations
• 10 books, many presentations
© 2015 VoltDB PROPRIETARY
5. page
VOLTDB OVERVIEW
Founded in 2009 by database luminary Mike Stonebraker
FAST
World Record Cloud Benchmark:
YCSB (Yahoo Cloud Serving Benchmark) - 2.4 million tps (transactions per second)
Other Stonebraker Companies
Customers
Technology
• In-Memory (but data is durable to disk)
• Scale-Out shared-nothing architecture
• Reliability and fault tolerance
• SQL + Java with ACID
• Hadoop and data warehouse integration
• Open source and commercially licensed (24X7)
6. page
VOLTDB BENCHMARK ON AMAZON VIRTUAL AND
IBM SOFTLAYER BARE-METAL SERVERS
• Yahoo Cloud Serving Benchmark (YCSB) is
a popular industry-standard benchmark for
cloud databases
• AWS – virtualized servers
• SoftLayer - bare-metal servers
• Workload “B” - 95% reads with 5% updates.
• Results: Best in class cloud performance
(run in the cloud)!
• AWS - 285k tps for 3 nodes scaling linearly to
724k tps for a 12 node cluster
• IBM SoftLayer - 1.02 million tps for 3 nodes
scaling linearly to 2.4 million tps for a 12 node
cluster
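The quoted endpoints can be restated as per-node throughput and a scale-up ratio. This is a minimal sketch that uses only the four figures on the slide; the helper names `per_node` and `scale_up` are illustrative, and no intermediate cluster sizes are assumed because the slide text does not give them.

```python
# Figures quoted on the slide: (nodes, transactions per second).
results = {
    "AWS": [(3, 285_000), (12, 724_000)],
    "SoftLayer": [(3, 1_020_000), (12, 2_400_000)],
}


def per_node(nodes, tps):
    """Average throughput contributed by each node."""
    return tps / nodes


def scale_up(small, large):
    """Throughput ratio between the larger and the smaller cluster."""
    return large[1] / small[1]


for name, (small, large) in results.items():
    print(
        name,
        f"per node at {small[0]} nodes: {per_node(*small):,.0f} tps,",
        f"per node at {large[0]} nodes: {per_node(*large):,.0f} tps,",
        f"scale-up for {large[0] // small[0]}x nodes: {scale_up(small, large):.2f}x",
    )
```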
[Charts: SoftLayer and AWS throughput; SoftLayer update and read latency. Axes: Latency (ms), Throughput (ops/sec)]
8. page
FAST DATA SOURCES AND DRIVERS
Mobile
IoT
Social
Sensors
Logs
Data is doubling every two years
• 26 billion connected devices by
2020 (Gartner 2014)
• 37% of most data will be
processed at the edge in
milliseconds (Cisco IoT Study 12/11/14)
9. page
EVERY COMPANY HAS FAST DATA PROBLEMS
• Mobile: Billing and rights management, subscriber marketing, etc.
• IoT, Energy, Sensor: Smart grid/meters, asset tracking & management
• Personalized Targeting: Ad optimization, audience segmenting
• Capital Markets: Risk, market data management, customer mgt
• Infrastructure: Data pipeline, system performance, streaming ETL
VoltDB Customers
UK Smart Meter
10. page
FAST DATA IS A COMPETITIVE ADVANTAGE TODAY!
Instant insight
Instant action
Instant awareness
* VoltDB customers
“Event triggered, real-time recommendations based on customer behavior have 10-15 times the response rates than mass marketing”
“We get competitive advantage by analyzing device and user data to create an interactive and personalized consumer experience across all devices.”
“Real time contextual offers increase offer uptake rates by 75% and data revenues by 15%.”
*
*
11. page
TRADITIONAL RDBMS
• Heavy Overhead
• 1000s of concurrent versions
• Contention for locked records
• Contention for latching on lock table
• Index bottlenecks
• Disk I/O bottlenecks
• Architecture limits scaling
14. page
Collect
Explore
(Data Science)
Analyze
Act
(Discoveries/Optimizations)
Big data ecosystem has several components
15. page
DATA ARCHITECTURE FOR FAST + BIG DATA
[Architecture diagram]
Enterprise Apps: CRM, ERP, etc.
ETL
BIG DATA: Data Lake (HDFS, etc.), SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting
FAST DATA: Fast Operational Database, Ingest / Interactive, Real-time Analytics, Fast Serve Analytics, Decisioning, Export
17. page
IN THE BIG CORNER
Systems facilitating exploration and analytics of large collections.
Example Technologies
• Columnar OLAP warehouses
• Hadoop Ecosystem
  • MapReduce
  • Hive, Pig
  • SQL.next: Impala, Drill, Shark
Example Applications
• User segmentation & pre-scoring
• Seasonal trending
• Recommendation matrices
• Building search indexes
• Data Science: statistical clustering,
machine learning
18. page
IN THE FAST CORNER
Systems facilitating real time ingest, analytics and decisions against
incoming streams of events.
Example Technologies
• Streaming frameworks (e.g. Spark)
• Fast OLAP (e.g. HANA)
• Fast OLTP (e.g. VoltDB)
Example Applications
• Micro-personalization
• Recommendation serving
• Alerting/alarming
• Operational monitoring
• Data enrichment (ETL elimination)
• High throughput authorization (e.g. API quota enforcement)
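API quota enforcement is a per-event decision: each request reads a counter, compares it with a limit, and updates it. The sketch below shows one simple way to do that with a fixed time window per API key. The class `QuotaEnforcer`, its parameters, and the fixed-window policy are assumptions made for illustration; they are not taken from the talk or from any product API.

```python
import time


class QuotaEnforcer:
    """Fixed-window request counter per API key (hypothetical helper)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit                    # max requests per key per window
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so tests can control time
        self.state = {}                       # api_key -> [window_index, count]

    def allow(self, api_key):
        """Return True and count the request if the key is under its quota."""
        window = int(self.clock() // self.window_seconds)
        entry = self.state.get(api_key)
        if entry is None or entry[0] != window:
            # First request in this window: start a fresh counter.
            entry = [window, 0]
            self.state[api_key] = entry
        if entry[1] >= self.limit:
            return False
        entry[1] += 1
        return True


if __name__ == "__main__":
    quota = QuotaEnforcer(limit=2, window_seconds=60)
    print([quota.allow("key-1") for _ in range(3)])
```

In a multi-threaded or distributed setting the read, compare, and update steps would need to run atomically, which this single-threaded sketch does not attempt.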
19. page
TYPICAL FAST DATA QUESTIONS
[Diagram labels: Hadoop, Volume, SQL / OLAP, Data Science, Fast, Velocity]
• Is the fast layer streaming?
  • It is often more like fast OLTP
• How do the pieces communicate?
  • OLAP analytics from Big -> Fast
  • New events from Fast -> Big
• Where do “analytics” belong?
  • Analytics per-event: with Fast
  • Analytics across history: with Big
• Are streaming frameworks equivalent?
  • Traditional SQL CEP (Esper, Streambase)
  • Tuple DAGs (Storm)
  • Window processors on Hadoop (Spark)
20. page
HOW TO SOLVE IT*
* With admiring credit to G. Polya
Considering Data
Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
21. page
Data | Temporality | Examples
Incoming events | Event Stream | Click stream, tick stream, sensors, metrics
Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, time-series rollups
Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data
OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations
Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
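As an illustration of how an event stream becomes a persistent, queryable result, the sketch below rolls incoming events up into per-window counters. It assumes events arrive as `(timestamp_seconds, key)` pairs and uses fixed (tumbling) windows; the function name `rollup` and the sample click data are hypothetical.

```python
from collections import Counter


def rollup(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts = Counter()
    for ts, key in events:
        # Align the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return counts


# Hypothetical click stream: (timestamp in seconds, page).
clicks = [(0.5, "home"), (3.2, "home"), (59.9, "cart"), (60.0, "home"), (61.0, "cart")]
per_minute = rollup(clicks, 60)

# The rollup is ordinary state, so it can be queried after (or while) events arrive.
for (window_start, page), n in sorted(per_minute.items()):
    print(window_start, page, n)
```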
22. page
SOURCES OF STATE
1. Analytics outputs must be query-able.
2. “Lookup tables” to create groupings for analytics
and to supply enrichment data.
3. Session management: grouping, filtering and
aggregating create intermediate state.
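The second source of state, lookup tables, can be sketched as follows: a lookup supplies both the enrichment data and the grouping dimension for an aggregate. The table `DEVICES`, the event fields, and the helper names are hypothetical and exist only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical lookup table: device id -> metadata used for enrichment and grouping.
DEVICES = {
    "d1": {"region": "EU", "version": "2.1"},
    "d2": {"region": "US", "version": "2.0"},
}


def enrich(event, lookup):
    """Return a copy of the event with the device's region attached."""
    meta = lookup.get(event["device"], {})
    return {**event, "region": meta.get("region", "unknown")}


def total_by_region(events, lookup):
    """Aggregate event values grouped by the looked-up region."""
    totals = defaultdict(float)
    for event in events:
        totals[enrich(event, lookup)["region"]] += event["value"]
    return dict(totals)


readings = [
    {"device": "d1", "value": 1.5},
    {"device": "d2", "value": 2.0},
    {"device": "d1", "value": 0.5},
    {"device": "d9", "value": 4.0},  # not in the lookup table
]
print(total_by_region(readings, DEVICES))
```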
23. page
Considering Data
Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
25. page
DATA FLOWS
Fast Request/Response (and side effects)
• Mobile Authorization
• Campaign Evaluation
• Quota Enforcement
• Micro-Personalization
• Recommendation Serving
26. page
DATA FLOWS
Data Pipelines
• Data enrichment
• Sessionization and re-assembly of incoming events.
• Correlation (by time, location, identity)
• Filtering
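Two of the pipeline steps above, filtering and sessionization, can be sketched as chained generators. This is a simplified illustration that assumes events arrive in timestamp order and that a session ends after a fixed period of inactivity; the field names `user` and `ts` and both function names are hypothetical.

```python
def drop_invalid(events):
    """Filtering step: discard events that carry no user identity."""
    for event in events:
        if event.get("user") is not None:
            yield event


def sessionize(events, gap_seconds):
    """Group time-ordered events into per-user sessions.

    A session is closed and emitted when the next event for the same user
    arrives more than gap_seconds after the previous one.
    """
    open_sessions = {}  # user -> list of events (intermediate state)
    for event in events:
        user = event["user"]
        session = open_sessions.get(user)
        if session and event["ts"] - session[-1]["ts"] > gap_seconds:
            yield user, session
            session = None
        if session is None:
            session = []
            open_sessions[user] = session
        session.append(event)
    # End of input: flush whatever is still open.
    for user, session in open_sessions.items():
        yield user, session


events = [
    {"user": "a", "ts": 0},
    {"user": None, "ts": 1},
    {"user": "a", "ts": 10},
    {"user": "b", "ts": 12},
    {"user": "a", "ts": 100},
]
sessions = list(sessionize(drop_invalid(events), gap_seconds=30))
for user, session in sessions:
    print(user, [e["ts"] for e in session])
```

The `open_sessions` dictionary is the kind of intermediate state the earlier "Sources of State" slide refers to: the pipeline cannot emit a session until it has remembered earlier events.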
Pipeline
Data Lake
27. page
Considering Data
Considering Processing
What are the types of data to be managed in fast data applications?
How does data flow through fast data applications?
What are the calculations & analytics that are necessary?
29. page
FAST DATA STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
• Counters
• Aggregations
• Time series
• Statistics
• Store results
• Query and recombine
• Fast serving
• Per-event policy evaluations
• Responses (synchronous): authorization, personalization
• Side-effects (asynchronous): alerts, alarms
Export & Pipeline
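The stack above can be sketched as a single per-event handler: update a counter (analyze), evaluate a policy (decide), return a synchronous response, and queue an asynchronous side effect for export. The class `FastDataApp`, the threshold policy, and the alert format are assumptions made for illustration only.

```python
from collections import Counter, deque


class FastDataApp:
    """Toy ingest -> analyze -> decide -> export loop (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold     # hypothetical policy: max events per user
        self.counts = Counter()        # analyze: running counters, queryable at any time
        self.side_effects = deque()    # asynchronous alerts waiting to be exported

    def handle(self, event):
        """Ingest one event and return the synchronous response."""
        user = event["user"]
        self.counts[user] += 1                              # analyze
        authorized = self.counts[user] <= self.threshold    # decide
        if not authorized:
            # Side effect: recorded now, delivered later by the export step.
            self.side_effects.append({"alert": "over_threshold", "user": user})
        return {"user": user, "authorized": authorized}

    def drain_side_effects(self):
        """Export step: hand queued alerts to a downstream consumer."""
        drained = list(self.side_effects)
        self.side_effects.clear()
        return drained


if __name__ == "__main__":
    app = FastDataApp(threshold=2)
    for _ in range(3):
        print(app.handle({"user": "u1"}))
    print(app.drain_side_effects())
```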
30. page
APACHE-ISH TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
Counters
Aggregations
Time series
Statistics
Store results
Query and recombine
Fast serving
Per-event policy evaluations
Responses (synchronous)
Side-effects (asynchronous)
Export & Pipeline
Kafka / RabbitMQ
Storm, Flume, Sqoop
Storm + Serving Layer
Spark + Serving Layer
Cassandra, HBase
Hadoop, Message queues
31. page
VOLTDB TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
Counters
Aggregations
Time series
Statistics
Store results
Query and recombine
Fast serving
Per-event policy evaluations
Responses (synchronous)
Side-effects (asynchronous)
Export & Pipeline
Kafka / RabbitMQ
VoltDB
SQL, Java for Analytics
Transactions / ACID
Hadoop, Message queues
33. page
STREAM TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
Counters
Aggregations
Time series
Statistics
Store results
Query and recombine
Fast serving
Per-event policy evaluations
Responses (synchronous)
Side-effects (asynchronous)
Export & Pipeline
34. page
OLAP TECHNOLOGY STACK
Applications, Message Queues, Data Sources
Ingest
Analyze Decide
Counters
Aggregations
Time series
Statistics
Store results
Query and recombine
Fast serving
Per-event policy evaluations
Responses (synchronous)
Side-effects (asynchronous)
Export & Pipeline
35. page
STREAMING DATA PIPELINE
Applications & Streams: Logs, Sensors, Meter Readings, IoT, Location
Real-Time Applications
Message Queue
Ingest: Kafka Loader; CSV loaders; C++, C#, PHP, Python, Java (and others)
Analyze: SQL, Views, Java
Decide: ACID Txns, State
Export: CSV Data, Thrift Messages, JDBC, HTTP, Local File, Extensible Connectors
Downstream Pipeline: Hadoop, Data Warehouse, Message Queue
37. page
VOLTDB DELIVERS SUPERIOR CUSTOMER VALUE
Customers | Business Value
Smart Meter, Energy Management | 60 Million meters under management, saving millions in efficiency, reduced waste
Internet Service Provider | Discover 100% of DoS attacks, and improved response time by 97%
Communications Service Provider | Improved infrastructure utilization by 150%
Online Game Analytics | Increased free-to-pay conversion rate by 30%
Mobile Network Management | Saves $0.5 million/customer installation; unlimited scale in the cloud
Mobile Ad Service Provider | OpEx: 93% reduction in servers (100 to 7); saved millions in ad budget overages
39. page
TRY V5.0 TODAY FOR FREE
• VoltDB Enterprise Edition
• Production-ready
• Fully durable, highly available
• Commercial license, fully supported
• http://voltdb.com/download/software
• Sample apps (in a Docker container)
• http://voltdb.com/community/demo
• VoltDB Community Edition – open source
• http://github.com/voltdb
VoltDB runs over 6 BILLION transactions/day in production!