Getting It Right Exactly Once:
Principles for Streaming Architectures
Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies
September 2016 | Strata+Hadoop World, NY
2
Getting Started
 I’m Darryl Smith
• Chief Data Platform Architect
and Distinguished Engineer
Dell Technologies
 Agenda
• Real-Time And The Need For Streaming
• Adding Real-Time And Streaming To The Data Lake
• Results, Plans, Lessons Learned
• Demonstration
3
Trickle, Flood, or Torrent…
Streaming is about
continuous data motion,
more than speed
or volume
4
The Conversation Around Streaming
Website and Mobile
Application Logs
Internet of Things
Sensors
The Enterprise Reality
5
Batch > Real-Time > Streaming
Enterprise Opportunities
Immediate Business Advantage
Website and Mobile
Application Logs
Internet of Things
Sensors
6
The Enterprise Streaming Play
Moving from batch to real-time streams
avoids surges, normalizes compute,
and drives value
7
Real time and the need for streaming
8
Drive DellEMC toward a
predictive enterprise in which
intelligent data drives agility and
increases revenue and
productivity, resulting in a
competitive advantage
Analytics Vision
9
 Need to use new data for
competitive advantage
• Volume, Variety and Velocity
 Leverage near real time and
streaming data sets to
optimize predictions
• Make faster, better decisions
 Cost-effectively scale to
improve query and load
performance
 Put the data in the hands of
the business
Becoming An Analytical Enterprise
DRIVE COMPETITIVE ADVANTAGE
COST-EFFECTIVELY SCALE
DATA ACCESS BY BUSINESS
NEAR REAL-TIME ANALYTICS
10
Problem Statement
Teams cannot get maintenance
renewal quotes in the timeframe
or at the level of quality
they need for Tech Refresh
and Renewal sales.
Desired Outcome
Implement a cost-effective,
real-time solution that
improves productivity
and gives confidence to
produce desired outcomes
efficiently.
Scoping The Business Objectives
11
Business Drivers
CURRENT REALITY -> VISION FOR THE FUTURE
• HIGH TOUCH TACTICAL EXECUTION -> LOW TOUCH SELF SERVICE
• DATE DRIVEN PROCESSES -> BUSINESS VALUE DRIVEN PROCESSES
• INEFFICIENCIES & LOST PRODUCTIVITY -> INCREASED PRODUCTIVITY
• SILOED DATA / LIMITED VIEWS -> SINGLE VIEW OF DATA / DATA SCORING
• VARIABLE DATA QUALITY -> DATA QUALITY & CONFIDENCE
TO REALIZE THIS VISION: IMPLEMENT CALM SOLUTION PHASES AND OPTIMIZE BUSINESS PROCESSES
12
The Need for “CALM”
Customer Asset Lifecycle Management
For enterprise sales
Who need accurate and timely customer information
CALM is a real-time application
Providing up-to-the-moment customer 360° dashboards
Install Base
Pricing
Device Config
Contacts
Contracts
Analytics Contracts
Component
Data
Offers
Scorecard
13
Data Lake Architecture
DATA PLATFORM
VMWARE VCLOUD SUITE
EXECUTION
PROCESS
GREENPLUM DB, SPRING XD, PIVOTAL HD
Gemfire
HADOOP
INGESTION
DATA GOVERNANCE
Cassandra, PostgreSQL, MemSQL
HDFS ON ISILON
HADOOP ON SCALEIO
VCE VBLOCK/VxRACK | XTREMIO | DATA DOMAIN
ANALYTICS TOOLBOX
Network, Web, Sensor, Supplier, Social Media, Market
STRUCTURED, UNSTRUCTURED
CRM, PLM, ERP
APPLICATIONS
Apache Ranger, Attivio, Collibra
Real-Time, Micro-Batch, Batch
14
Data Ingestion
• Small to Big Data (high-throughput)
• Structured and unstructured Data from any Source
• Streams and Batches
• Secure, multi-tenant, configurable Framework
Real-Time Analytics
• Tap into streams for in-memory Analytics
• Real Time Data insights and decisions
Services
• Data Ingestion to Data Lake
• Data Lake APIs
• Data Alerting
Business Data Lake Offerings
Unstructured
Structured
15
Adding Real Time and Streaming
to the Data Lake
16
Seeking A Fast Database
A complement to the business data lake
HammerDB Platform Benchmarks
HammerDB workload testing followed the standard practices of EMC’s Oracle and
SQL Server DBA teams.
 Definition of workload. Mix of 5 transactions as follows:
• New order: receive a new order from a customer: 45%
• Payment: update the customer balance to record a payment: 43%
• Delivery: deliver orders asynchronously: 4%
• Order status: retrieve the status of customer’s most recent order: 4%
• Stock level: return the status of the warehouse’s inventory: 4%
 Testing scenario:
• 100 warehouses 8 vUsers. Database creation and initial data loading.
• Timed testing. 20 minutes per testing session.
• Number of virtual users scaled from 1 to 44 across testing sessions.
 No changes were made to the system or database configuration while the tests
were running.
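The transaction mix above can be sketched as a weighted draw. This is a minimal illustration of what the percentages mean, not HammerDB's own driver logic; the names `WORKLOAD_MIX` and `sample_transactions` are hypothetical.

```python
import random
from collections import Counter

# Transaction mix from the workload definition above (percent of all transactions).
WORKLOAD_MIX = {
    "new_order": 45,
    "payment": 43,
    "delivery": 4,
    "order_status": 4,
    "stock_level": 4,
}


def sample_transactions(n, seed=0):
    """Draw n transaction types according to WORKLOAD_MIX (illustrative only)."""
    rng = random.Random(seed)
    names = list(WORKLOAD_MIX)
    weights = [WORKLOAD_MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


# Print the observed share of each transaction type in a large sample.
sample_size = 100_000
counts = Counter(sample_transactions(sample_size))
for name in WORKLOAD_MIX:
    print(f"{name:13s} {100 * counts[name] / sample_size:5.1f}%")
```

With a large enough sample the observed shares should sit close to the 45/43/4/4/4 split, which is a quick sanity check that the five percentages add up to 100.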
HammerDB Workload Testing
 Each test system had 16 vCPUs and 32 GB RAM
• Oracle 11g R2 on RedHat 6.4
• SQL Server 2012 Ent Ed. on Windows Core 2012 R2
• PostgreSQL 9.3.3 on RedHat 6.4
HammerDB Workload - Results
Results
Query              | PostgreSQL    | MemSQL
Opportunity (5K)   | 5 seconds     | 200 ms
Sales Order (170K) | 1-1.5 minutes | 6 seconds
Territory (60K)    | 60 seconds    | 5 seconds
PostgreSQL vs In-Memory DB
We picked 5 top queries run by different business functions.
Presented here are 3 queries that had response times that did not meet the SLA.
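As a worked example, the response times reported above translate into the following speedup factors. The figures are copied from the results; the dictionary layout and the `speedup_range` helper are illustrative names, and times are held in integer milliseconds to keep the arithmetic exact.

```python
# Response times from the results above, in milliseconds.
# Each PostgreSQL entry is a (low, high) range because the Sales Order
# query was reported as 1-1.5 minutes.
RESULTS_MS = {
    "Opportunity (5K)": ((5_000, 5_000), 200),
    "Sales Order (170K)": ((60_000, 90_000), 6_000),
    "Territory (60K)": ((60_000, 60_000), 5_000),
}


def speedup_range(postgres_ms, memsql_ms):
    """Return (low, high) speedup of MemSQL over PostgreSQL."""
    low, high = postgres_ms
    return low / memsql_ms, high / memsql_ms


for query, (postgres_ms, memsql_ms) in RESULTS_MS.items():
    low, high = speedup_range(postgres_ms, memsql_ms)
    if low == high:
        print(f"{query}: about {low:.0f}x faster")
    else:
        print(f"{query}: about {low:.0f}x to {high:.0f}x faster")
```

On these numbers the in-memory database is roughly 10x to 25x faster for the three queries shown.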
21
Business Data Lake – Ingestion to Fulfillment
Raw Data
Summary
Data
DATAGOVERNOR
Consumers
Predictive/
Prescriptive
Analytics
Processed
Data
Analytical Data
GREENPLUM DATABASE
HADOOP
RAW
Data
INGEST
MANAGER
SPRING XD
SPARK
SQOOP
Execution Tier
CASSANDRA GEMFIRE
MEMSQL POSTGRESQL
Real-Time
Tap
22
Here Are The Data Flows We Built
Low Velocity
Batch
Real-Time
23
Data Flow Patterns – Low Velocity
Analytical [BATCH]
Ingestion
Data
Service
JDBC
Application
Presentation [SPEED/SERVING]
GREENPLUM
DATABASE
PIVOTAL HD
POSTGRESQL
MEMSQL
Raw
Data
One-Time
CASSANDRA
GEMFIRE
Analytical [BATCH]
Ingestion
Data
Service
JDBC
Application
GREENPLUM
DATABASE
PIVOTAL HD
24
Data Flow Patterns – Batch
Batch
Presentation [SPEED/SERVING]
POSTGRESQL
MEMSQL CASSANDRA
GEMFIRE
25
Data Flow Patterns – Real Time
Real-time
Initial Load
Analytical [BATCH]
Ingestion
Data
Service
JDBC
Application
GREENPLUM
DATABASE
PIVOTAL HD
Presentation [SPEED/SERVING]
POSTGRESQL
MEMSQL CASSANDRA
GEMFIRE
26
Nothing Closer To Real Time Than Streaming
 Let’s look at the leading edge
 Apache Kafka
 Messaging Semantics
• At most once
• At least once
• Exactly once
27
At most once
28
At least once
29
Exactly Once
30
Understanding Streaming Semantics
At most once | At least once | Exactly once
Message pulled once | Message pulled one or more times; processed each time | Message pulled one or more times; processed once
May or may not be received | Receipt guaranteed | Receipt guaranteed
No duplicates | Likely duplicates | No duplicates
Possible missing data | No missing data | No missing data
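The three semantics can be compared with a small in-memory simulation. This is a sketch under simplifying assumptions: a single consumer, one crash at a chosen message, and de-duplication by message id standing in for whatever mechanism a real system uses. The function name `consume` and the mode strings are hypothetical.

```python
def consume(messages, mode, crash_on):
    """Replay messages through one consumer that crashes once at crash_on.

    Returns the list of messages that end up in the downstream store.
    """
    sink = []        # what the downstream store ends up holding
    seen = set()     # ids already applied (used by exactly-once only)
    offset = 0       # next position to read after a restart
    crashed = False
    while offset < len(messages):
        msg = messages[offset]
        crash_now = msg == crash_on and not crashed
        if mode == "at_most_once":
            offset += 1              # commit the offset first ...
            if crash_now:
                crashed = True
                continue             # ... so the crash loses this message
            sink.append(msg)
        elif mode == "at_least_once":
            sink.append(msg)         # process first ...
            if crash_now:
                crashed = True
                continue             # ... crash before commit: redelivered
            offset += 1
        elif mode == "exactly_once":
            if msg not in seen:      # a redelivered message is skipped
                sink.append(msg)
                seen.add(msg)
            if crash_now:
                crashed = True
                continue
            offset += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return sink


messages = [1, 2, 3, 4]
for mode in ("at_most_once", "at_least_once", "exactly_once"):
    print(mode, consume(messages, mode, crash_on=2))
```

With a crash at message 2, at-most-once drops it, at-least-once stores it twice, and exactly-once stores each message a single time, matching the three columns above.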
31
Rendering In Real Time
 Picking the right business intelligence layer
• Tableau
• Custom Application (CF, D3, Docker)
• Additional Third Party Solutions
32
Results, Plans, Lessons Learned
33
Business Benefits
DATA QUERYING
Down from 4 hours per quarter
to less than 1 minute per year
SIMPLIFIED
PROVISIONING
Reduced number of tables/report
required
DATA
GOVERNANCE
Provides one version of
the truth
TIME TO MARKET
Reduced number of tables/report
required
TOOL
AGNOSTIC
Business logic in the DB not
the tool provides increased
flexibility
34
Use Case: Customer Account Profile
 Streamlined analytics environment to gain a holistic customer view
Service Request
Contracts
Installed Base
Bookings
Billings
EMC DATA
LAKE
BDL
SERVICES
DATA
WORKSPACES
DATA INGESTION
Prof Services
23 BUSINESS MANAGED WORKSPACES
35
Customer Asset Lifecycle Management
Platform Roadmap
Phase 1: Foundational
Capabilities/Discovery
Phase 2: Scale Platform /
Automate
Future Phases: Global Standard tool
Integrations, advanced Analytics
BAaaS/Tableau
Scalable
Platform
Integrated
Platform
GBS
Renewals
Inside
Sales
Additional
Business groups
Aug 2015 | Oct 2015 | 2016 | TBD
BDL Platform
Enablement, Collaboration, Acceleration
In-Memory Capabilities
(POC)
We are here
36
Data Services Roadmap
Security
Planned integration into
custom BDL security API for
managing Role Based Access
Control (RBAC) to the
underlying data
Business Data Lake Plans
37
Lessons Learned – Key Takeaways
EDUCATE
• Educate the business
• Use examples of business impact
ASSESS
• Assess in-house big data skills
• Ensure a plan to support the organization for 3-5 years
INFRASTRUCTURE
• Choose the best possible infrastructure
• Make sure your Big Data technology platform can evolve
JOURNEY
• Remember it is a journey
• Look for small wins as well as big wins
38
Lessons Learned: Analytics and Data
Sourcing the right skills, working with a different philosophy,
and some new tools will help you meet your analytical goals
TRANSFORM YOUR PEOPLE
 Data science in the organization, IT or both?
 Helping business units take initiative
CHANGE YOUR PROCESSES
 New philosophy to running analytics projects
 How and when to share data
ADAPT YOUR TECHNOLOGY
 Steadily refine toolsets based on needed analysis
 Identify to infrastructure layers
39
Demonstration
40
Demo Agenda
Showcase exactly-once semantics from Kafka
1: Data set of 200,000 transactions summing to zero
2: CREATE TABLE AND CREATE PIPELINE
3: Push to Kafka and confirm exactly-once
4: Validate Resiliency and confirm exactly-once
Step 1: Data Source
 Start with a data set of 200,000 transactions representing
money/goods that sum to zero
 200,000 transactions
• Transaction number
• Increase / Decrease
• Amount
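A data set with this property could be generated as follows. This is a sketch, not the generator used in the demo: it assumes that every increase is paired with a decrease of the same size and that amounts are signed integers (for example cents), so the total is exactly zero. The function name `make_transactions` and the amount range are hypothetical.

```python
import random


def make_transactions(n=200_000, seed=42):
    """Build n rows of (transaction_number, direction, amount) summing to zero."""
    if n % 2:
        raise ValueError("n must be even so every increase has a matching decrease")
    rng = random.Random(seed)
    amounts = []
    for _ in range(n // 2):
        value = rng.randint(1, 1_000_000)   # integers avoid float rounding drift
        amounts.extend((value, -value))     # one increase, one matching decrease
    rng.shuffle(amounts)                    # interleave so pairs are not adjacent
    return [
        (number, "increase" if amount > 0 else "decrease", amount)
        for number, amount in enumerate(amounts, start=1)
    ]


rows = make_transactions()
print(len(rows), sum(amount for _, _, amount in rows))
```

Because the expected row count and the expected sum are both known in advance, any lost or duplicated message downstream shows up as a wrong count, a nonzero sum, or both.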
Step 2: CREATE TABLE AND CREATE PIPELINE
 Create a table and a pipeline in MemSQL that subscribes to
a Kafka topic
CREATE TABLE
CREATE PIPELINE
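The slide does not show the statement bodies, so the following is only an illustration of what such statements might look like, following the general `CREATE PIPELINE ... AS LOAD DATA KAFKA ... INTO TABLE ...` form of MemSQL pipelines. The table name, column names and types, pipeline name, broker address, topic, and field delimiter are all assumptions, not values from the demo. The statements are built as strings so they can be inspected before being sent through any SQL client.

```python
# Hypothetical names throughout: table, pipeline, broker address, and topic
# are placeholders, not values taken from the demo.
TABLE = "transactions"
PIPELINE = "transactions_pipeline"
KAFKA_SOURCE = "kafka-host:9092/transactions-topic"


def build_statements(table=TABLE, pipeline=PIPELINE, source=KAFKA_SOURCE):
    """Return the SQL statements, in execution order, as plain strings."""
    create_table = (
        f"CREATE TABLE {table} (\n"
        "  txn_id BIGINT PRIMARY KEY,\n"
        "  direction VARCHAR(8),\n"
        "  amount BIGINT\n"
        ")"
    )
    create_pipeline = (
        f"CREATE PIPELINE {pipeline} AS\n"
        f"  LOAD DATA KAFKA '{source}'\n"
        f"  INTO TABLE {table}\n"
        "  FIELDS TERMINATED BY ','"   # assumes comma-separated messages
    )
    start_pipeline = f"START PIPELINE {pipeline}"
    return [create_table, create_pipeline, start_pipeline]


for statement in build_statements():
    print(statement + ";\n")
```

The exact options available depend on the MemSQL version in use, so the product documentation should be checked before running anything like this.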
Step 3: Push to Kafka
 Push that data set to Kafka
 Validate exactly-once delivery by querying MemSQL
• show tables;
• show pipelines;
• select sum(amount) from transactions;
 Should be 0 in the demo
• select count(*) from transactions;
 Should be 200,000 in the demo
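The two validation queries can be wrapped in a single check. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MemSQL so that it runs anywhere; the `validate` helper, the two-column table layout, and the 1,000-row sample are assumptions for illustration.

```python
import sqlite3


def validate(conn, expected_rows):
    """Run the sum and count checks and report whether both pass."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions"
    ).fetchone()
    (count,) = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()
    return {
        "sum": total,
        "count": count,
        "exactly_once": total == 0 and count == expected_rows,
    }


# In-memory SQLite database standing in for the MemSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, amount INTEGER)")
sample = [(i, 10 if i % 2 else -10) for i in range(1, 1001)]
conn.executemany("INSERT INTO transactions VALUES (?, ?)", sample)
conn.commit()

print(validate(conn, expected_rows=1000))
```

Checking both values matters: a single lost or duplicated row changes the sum, while the count catches cases where errors happen to cancel out in the sum.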
46
Step 4: Resiliency
 Induce failures to show resiliency during exactly-once
workflows
a. randomly_fail_batches.py
b. restart Kafka and show error count
c. continue and validate exactly-once semantics
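One common way to stay exactly-once under batch failures is to commit the loaded rows and the consumed offset in the same database transaction, so a failed batch leaves no partial rows and is retried from the same position. The sketch below illustrates that idea only; it is not the demo's `randomly_fail_batches.py`, and it uses `sqlite3` as a stand-in for MemSQL and a Python list as a stand-in for the Kafka topic. All names and the failure rate are assumptions.

```python
import random
import sqlite3


def load_with_failures(messages, batch_size=100, fail_rate=0.3, seed=7):
    """Load messages in batches, injecting random failures; return (errors, count, total)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (txn_id INTEGER, amount INTEGER)")
    conn.execute("CREATE TABLE pipeline_offset (next_offset INTEGER)")
    conn.execute("INSERT INTO pipeline_offset VALUES (0)")
    conn.commit()

    errors = 0
    while True:
        (offset,) = conn.execute("SELECT next_offset FROM pipeline_offset").fetchone()
        if offset >= len(messages):
            break
        batch = messages[offset:offset + batch_size]
        try:
            # One transaction: rows and offset commit together or roll back together.
            with conn:
                conn.executemany("INSERT INTO transactions VALUES (?, ?)", batch)
                if rng.random() < fail_rate:
                    raise RuntimeError("injected batch failure")
                conn.execute(
                    "UPDATE pipeline_offset SET next_offset = ?",
                    (offset + len(batch),),
                )
        except RuntimeError:
            errors += 1   # batch was rolled back; retry from the same offset

    (count,) = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()
    (total,) = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM transactions").fetchone()
    conn.close()
    return errors, count, total


messages = [(i, 5 if i % 2 else -5) for i in range(1, 2001)]
errors, count, total = load_with_failures(messages)
print(f"errors={errors} count={count} sum={total}")
```

However many batches fail, the final count should equal the number of messages and the sum should remain zero, which is the same pair of checks used in Step 3.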
48
Errors
Total
Transactions
Sum
The mission is clear:
We’re moving
from batch to real-time
with streaming
Thank You
Darryl Smith
Chief Data Platform Architect and Distinguished Engineer
Dell Technologies

Slides: Success Stories for Data-to-Cloud
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Relevance of time series databases & druid.io
Relevance of time series databases & druid.ioRelevance of time series databases & druid.io
Relevance of time series databases & druid.io
 
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
What's New in Syncsort's Trillium Line of Data Quality Software - TSS Enterpr...
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 
Igniting Audience Measurement at Time Warner Cable
Igniting Audience Measurement at Time Warner CableIgniting Audience Measurement at Time Warner Cable
Igniting Audience Measurement at Time Warner Cable
 

Mehr von SingleStore

How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsSingleStore
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemSingleStore
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeSingleStore
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics SingleStore
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLSingleStore
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastSingleStore
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQLSingleStore
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsSingleStore
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureSingleStore
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored ProceduresSingleStore
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017SingleStore
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming DataSingleStore
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSingleStore
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondSingleStore
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementSingleStore
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AISingleStore
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudSingleStore
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataSingleStore
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSingleStore
 
Real-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleSingleStore
 

Mehr von SingleStore (20)

How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and Analytics
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQL
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
 
Real-Time Analytics at Uber Scale
Real-Time Analytics at Uber ScaleReal-Time Analytics at Uber Scale
Real-Time Analytics at Uber Scale
 

Kürzlich hochgeladen

Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsThinkInnovation
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxFinatron037
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxdhiyaneswaranv1
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 

Kürzlich hochgeladen (16)

Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in Logistics
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptx
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 

Getting It Right Exactly Once: Principles for Streaming Architectures

  • 1. Getting It Right Exactly Once: Principles for Streaming Architectures Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies September 2016 | Strata+Hadoop World, NY
  • 2. Getting Started. I’m Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies. Agenda: Real-Time and the Need for Streaming; Adding Real-Time and Streaming to the Data Lake; Results, Plans, Lessons Learned; Demonstration.
  • 3. Trickle, Flood, or Torrent… Streaming is about continuous data motion, more than speed or volume.
  • 4. The Conversation Around Streaming: Website and Mobile Application Logs; Internet of Things Sensors.
  • 5. The Enterprise Reality: Batch > Real-Time > Streaming. Enterprise Opportunities; Immediate Business Advantage; Website and Mobile Application Logs; Internet of Things Sensors.
  • 6. The Enterprise Streaming Play: Moving from batch to real-time streams avoids surges, normalizes compute, and drives value.
  • 7. Real Time and the Need for Streaming
  • 8. Analytics Vision: Drive DellEMC towards a Predictive Enterprise via intelligent data driving agility, increasing revenue and productivity, resulting in a competitive advantage.
  • 9. Becoming an Analytical Enterprise. Need to use new data for competitive advantage (Volume, Variety and Velocity). Leverage near-real-time and streaming data sets to optimize predictions (make faster, better decisions). Cost-effectively scale to improve query and load performance. Put the data in the hands of the business. Diagram labels: Drive Competitive Advantage; Cost-Effectively Scale; Data Access by Business; Near Real-Time Analytics.
  • 10. Scoping the Business Objectives. Problem Statement: Teams do not have access to maintenance renewal quotes in the timeframes or at the degree of quality they need for Tech Refresh and Renewal sales. Desired Outcome: Implement a cost-effective, real-time solution that improves productivity and gives confidence to produce desired outcomes efficiently.
  • 11. Business Drivers. Current reality versus vision for the future: high-touch tactical execution becomes low-touch self service; date-driven processes become business-value-driven processes; inefficiencies and lost productivity become increased productivity; siloed data / limited views become a single view of data / data scoring; variable data quality becomes data quality and confidence. To realize this vision: implement CALM solution phases and optimize business processes.
  • 12. The Need for “CALM” (Customer Asset Lifecycle Management). For enterprise sales who need accurate and timely customer information, CALM is a real-time application providing up-to-the-moment customer 360° dashboards. Diagram labels: Install Base; Pricing; Device Config; Contacts; Contracts; Analytics; Component Data; Offers; Scorecard.
  • 13. Data Lake Architecture (diagram labels): Data Platform; VMware vCloud Suite; Execution; Process; Greenplum DB; Spring XD; Pivotal HD; GemFire; Hadoop; Ingestion; Data Governance; Cassandra; PostgreSQL; MemSQL; HDFS on Isilon; Hadoop on ScaleIO; VCE Vblock/VxRack | XtremIO | Data Domain; Analytics Toolbox; Network; Web; Sensor; Supplier; Social Media; Market; Structured; Unstructured; CRM; PLM; ERP; Applications; Apache Ranger; Attivio; Collibra; Real-Time; Micro-Batch; Batch.
  • 14. Business Data Lake Offerings. Data Ingestion: small to big data (high-throughput); structured and unstructured data from any source; streams and batches; secure, multi-tenant, configurable framework. Real-Time Analytics: tap into streams for in-memory analytics; real-time data insights and decisions. Services: data ingestion to data lake; data lake APIs; data alerting. Diagram labels: Unstructured; Structured.
  • 15. Adding Real Time and Streaming to the Data Lake
  • 16. Seeking a Fast Database: a complement to the business data lake.
  • 17. HammerDB Platform Benchmarks. HammerDB workload testing followed the standard practices of EMC’s Oracle and SQL Server DBA teams. Workload definition, a mix of 5 transactions: New order (receive a new order from a customer): 45%; Payment (update the customer balance to record a payment): 43%; Delivery (deliver orders asynchronously): 4%; Order status (retrieve the status of the customer’s most recent order): 4%; Stock level (return the status of the warehouse’s inventory): 4%. Testing scenario: 100 warehouses, 8 vUsers (database creation and initial data loading); timed testing, 20 minutes per testing session; number of virtual users scaled from 1 to 44 across testing sessions. No changes were made to the system or database configuration while the tests ran.
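The transaction mix on slide 17 can be illustrated with a small weighted sampler. This is only a sketch of how such a mix is drawn, not HammerDB's own driver; the function name `sample_transactions` and the dictionary keys are made up for illustration.

```python
import random
from collections import Counter

# Transaction mix from slide 17, as percentages of all transactions.
MIX = {
    "new_order": 45,
    "payment": 43,
    "delivery": 4,
    "order_status": 4,
    "stock_level": 4,
}


def sample_transactions(n, seed=0):
    """Draw n transaction types according to MIX (hypothetical helper)."""
    rng = random.Random(seed)
    names = list(MIX)
    weights = [MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    total = 100_000
    counts = Counter(sample_transactions(total))
    for name in MIX:
        # Observed share should land close to the configured percentage.
        print(f"{name:>12}: {100 * counts[name] / total:.1f}%")
```

The five weights add up to 100, so each virtual user spends most of its time on new orders and payments, with the three remaining transaction types making up the last 12%.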
  • 18. HammerDB Workload Testing. Each test ran on 16 vCPU x 32 GB RAM. Platforms: RedHat 6.4 with Oracle 11g R2; Windows Core 2012 R2 with SQL Server 2012 Ent Ed.; RedHat 6.4 with PostgreSQL 9.3.3.
  • 19. HammerDB Workload - Results
  • 20. PostgreSQL vs In-Memory DB. We picked the top 5 queries run by different business functions. Presented here are 3 queries that had response times that did not meet the SLA. Query | PostgreSQL | MemSQL: Opportunity (5K) | 5 seconds | 200 ms; Sales Order (170K) | 1-1.5 minutes | 6 seconds; Territory (60K) | 60 seconds | 5 seconds.
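The response times on slide 20 can be turned into speedup factors with simple arithmetic. The sketch below only restates the slide's numbers; the `TIMINGS` structure and `speedup` helper are illustrative names, and the Sales Order range of 1-1.5 minutes is carried through as 60 to 90 seconds.

```python
# Response times from slide 20, in seconds.
# PostgreSQL values are (low, high) so the 1-1.5 minute range can be kept.
TIMINGS = {
    "Opportunity (5K)": {"postgresql": (5.0, 5.0), "memsql": 0.2},
    "Sales Order (170K)": {"postgresql": (60.0, 90.0), "memsql": 6.0},
    "Territory (60K)": {"postgresql": (60.0, 60.0), "memsql": 5.0},
}


def speedup(query):
    """Return the (low, high) ratio of PostgreSQL time to MemSQL time."""
    low, high = TIMINGS[query]["postgresql"]
    fast = TIMINGS[query]["memsql"]
    return (low / fast, high / fast)


if __name__ == "__main__":
    for query in TIMINGS:
        low, high = speedup(query)
        if low == high:
            print(f"{query}: about {low:.0f}x faster")
        else:
            print(f"{query}: about {low:.0f}x to {high:.0f}x faster")
```

On these figures the three queries improve by roughly 25x, 10x to 15x, and 12x respectively.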
  • 21. Business Data Lake – Ingestion to Fulfillment (diagram labels): Raw Data; Summary Data; Data Governor; Consumers; Predictive/Prescriptive Analytics; Processed Data; Analytical Data; Greenplum Database; Hadoop; Raw Data; Ingest Manager; Spring XD; Spark; Sqoop; Execution Tier; Cassandra; GemFire; MemSQL; PostgreSQL; Real-Time Tap.
  • 22. Here Are the Data Flows We Built: Low Velocity; Batch; Real-Time.
  • 23. Data Flow Patterns – Low Velocity (diagram labels): Analytical [Batch]; Ingestion; Data Service; JDBC; Application; Presentation [Speed/Serving]; Greenplum Database; Pivotal HD; PostgreSQL; MemSQL; Raw Data; One-Time; Cassandra; GemFire.
  • 24. Data Flow Patterns – Batch (diagram labels): Analytical [Batch]; Ingestion; Data Service; JDBC; Application; Greenplum Database; Pivotal HD; Batch; Presentation [Speed/Serving]; PostgreSQL; MemSQL; Cassandra; GemFire.
  • 25. Data Flow Patterns – Real Time (diagram labels): Real-time; Initial Load; Analytical [Batch]; Ingestion; Data Service; JDBC; Application; Greenplum Database; Pivotal HD; Presentation [Speed/Serving]; PostgreSQL; MemSQL; Cassandra; GemFire.
  • 26. Nothing Closer to Real Time Than Streaming. Let’s look at the leading edge: Apache Kafka. Messaging semantics: at most once; at least once; exactly once.
  • 28. At least once (diagram: messages 01, 02, 03, 04).
  • 30. Understanding Streaming Semantics. At most once: message pulled once; may or may not be received; no duplicates; possible missing data. At least once: message pulled one or more times and processed each time; receipt guaranteed; likely duplicates; no missing data. Exactly once: message pulled one or more times but processed once; receipt guaranteed; no duplicates; no missing data.
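The three semantics on slide 30 can be compared with a toy simulation. This is a sketch under simplifying assumptions, not Kafka's implementation: a "crash" is modelled as a random failure after a message has been pulled, and the function `deliver` and its parameters are invented for illustration.

```python
import random


def deliver(messages, mode, fail_rate=0.3, seed=1):
    """Simulate one consumer and return what reaches the sink.

    mode is "at_most_once", "at_least_once", or "exactly_once".
    """
    rng = random.Random(seed)
    sink = []      # what the downstream system ends up holding
    seen = set()   # ids already applied; only used for exactly-once
    for msg in messages:
        while True:
            crashed = rng.random() < fail_rate
            if mode == "at_most_once":
                # Offset committed before processing: a crash loses the message.
                if not crashed:
                    sink.append(msg)
                break
            if mode == "at_least_once":
                # Processed before the commit: a crash triggers redelivery
                # of a message that was already applied.
                sink.append(msg)
                if not crashed:
                    break
            elif mode == "exactly_once":
                # Redelivered like at-least-once, but applied only once.
                if msg not in seen:
                    seen.add(msg)
                    sink.append(msg)
                if not crashed:
                    break
            else:
                raise ValueError(f"unknown mode: {mode}")
    return sink


if __name__ == "__main__":
    msgs = list(range(1_000))
    for mode in ("at_most_once", "at_least_once", "exactly_once"):
        out = deliver(msgs, mode)
        print(f"{mode}: {len(out)} applied, {len(set(out))} distinct")
```

Here exactly-once is obtained by deduplicating on a message id; other designs reach the same outcome by committing the data and the consumer offset in one transaction.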
  • 31. Rendering in Real Time. Picking the right business intelligence layer: Tableau; Custom Application (CF, D3, Docker); Additional Third Party Solutions.
  • 33. Business Benefits. Data Querying: down from 4 hours per quarter to less than 1 minute per year. Simplified Provisioning: reduced number of tables/reports required. Data Governance: provides one version of the truth. Time to Market: reduced number of tables/reports required. Tool Agnostic: business logic in the DB, not the tool, provides increased flexibility.
  • 34. Use Case: Customer Account Profile. Streamlined analytics environment to gain a holistic customer view. Diagram labels: Service Request; Contracts; Installed Base; Bookings; Billings; Prof Services; EMC Data Lake; BDL Services; Data Workspaces; Data Ingestion; 23 Business Managed Workspaces.
  • 35. Customer Asset Lifecycle Management Platform Roadmap. Phase 1: Foundational Capabilities/Discovery. Phase 2: Scale Platform / Automate. Future Phases: Global Standard tool integrations, advanced analytics. Diagram labels: BAaaS/Tableau; Scalable Platform; Integrated Platform; GBS Renewals; Inside Sales; Additional Business Groups; Aug 2015; Oct 2015; 2016; TBD; BDL Platform Enablement; Collaboration; Acceleration; In-Memory Capabilities (POC); We are here.
  • 36. Business Data Lake Plans. Data Services Roadmap. Security: planned integration into the custom BDL security API for managing Role Based Access Control (RBAC) to the underlying data.
  • 37. Lessons Learned – Key Takeaways. Educate: educate the business; use examples of business impact. Assess: assess in-house big data skills; ensure there is a plan to support the organization for 3-5 years. Infrastructure: choose the best possible infrastructure; make sure your Big Data technology platform can evolve. Journey: remember it is a journey; look for small wins as well as big wins.
  • 38. Lessons Learned: Analytics and Data. Sourcing the right skills, working with a different philosophy, and some new tools will help you meet your analytical goals. Transform your people: data science in the organization, IT, or both? Helping business units take initiative. Change your processes: new philosophy to running analytics projects; how and when to share data. Adapt your technology: steadily refine toolsets based on needed analysis; Identify to infrastructure layers.
  • 40. Demo Agenda: showcase exactly-once semantics from Kafka. 1: Data set of 200,000 transactions summing to zero. 2: CREATE TABLE and CREATE PIPELINE. 3: Push to Kafka and confirm exactly-once. 4: Validate resiliency and confirm exactly-once.
  • 41. Step 1: Data Source. Start with a data set of 200,000 transactions representing money/goods that sum to zero.
  • 42. 200,000 transactions, each with: transaction number; increase / decrease; amount.
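A data set with the shape described on slides 41 and 42 can be generated as follows. The slides do not show how the demo data was produced, so this is only one possible construction: rows are created in matched increase/decrease pairs so that the signed total is zero. The names `make_transactions` and `signed_total`, the seed, and the amount range are all assumptions.

```python
import random


def make_transactions(n=200_000, seed=42, max_amount=1_000):
    """Return n rows of (transaction_number, direction, amount).

    Every increase is paired with a decrease of the same amount,
    so the signed amounts sum to zero.
    """
    if n % 2:
        raise ValueError("n must be even so increases and decreases pair up")
    rng = random.Random(seed)
    signed = []
    for _ in range(n // 2):
        amount = rng.randint(1, max_amount)
        signed.extend((amount, -amount))
    rng.shuffle(signed)  # interleave so pairs are not adjacent
    return [
        (number, "increase" if value > 0 else "decrease", abs(value))
        for number, value in enumerate(signed, start=1)
    ]


def signed_total(rows):
    """Sum amounts, counting decreases as negative."""
    return sum(a if d == "increase" else -a for _, d, a in rows)


if __name__ == "__main__":
    rows = make_transactions()
    print(len(rows), signed_total(rows))
```

A zero-sum data set is convenient for this kind of demo because any lost or duplicated message is likely to show up as a non-zero total or a wrong row count.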
  • 43. Step 2: CREATE TABLE and CREATE PIPELINE. Create a table and a pipeline in MemSQL that subscribes to the Kafka topic.
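The statements behind step 2 are not reproduced in the transcript. The sketch below renders what they might look like, modelled on MemSQL's `CREATE PIPELINE ... AS LOAD DATA KAFKA ... INTO TABLE` form. The table schema, pipeline name, broker address, and topic name are all placeholders, and the exact syntax should be checked against the MemSQL version in use.

```python
def pipeline_statements(table="transactions",
                        pipeline="transactions_pipeline",
                        broker="127.0.0.1:9092",
                        topic="transactions"):
    """Render SQL for a table and a Kafka pipeline feeding it.

    All names and the column layout are hypothetical; the slides do not
    show the schema used in the demo.
    """
    create_table = (
        f"CREATE TABLE {table} (\n"
        "  transaction_number BIGINT PRIMARY KEY,\n"
        "  direction VARCHAR(8),\n"
        "  amount BIGINT\n"
        ");"
    )
    create_pipeline = (
        f"CREATE PIPELINE {pipeline} AS\n"
        f"  LOAD DATA KAFKA '{broker}/{topic}'\n"
        f"  INTO TABLE {table};"
    )
    start_pipeline = f"START PIPELINE {pipeline};"
    return [create_table, create_pipeline, start_pipeline]


if __name__ == "__main__":
    for statement in pipeline_statements():
        print(statement)
        print()
```

The statements are only printed here; running them requires a MemSQL cluster and a reachable Kafka broker.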
  • 45. Step 3: Push to Kafka. Push that data set to Kafka, then validate exactly-once delivery by querying MemSQL: show tables; show pipelines; select sum(amount) from transactions; (should be 0 in the demo); select count(*) from transactions; (should be 200,000 in the demo).
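The two validation queries from step 3 can be tried without a MemSQL cluster. The sketch below uses Python's built-in `sqlite3` as a stand-in database and synthetic rows with signed amounts; the `validate` and `demo` helpers and the two-column schema are assumptions made for illustration.

```python
import sqlite3


def validate(conn, expected_rows):
    """Run the sum and count checks from the slide on `transactions`."""
    total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
    count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
    return {
        "sum": total,
        "count": count,
        "ok": total == 0 and count == expected_rows,
    }


def demo(n=200_000):
    """Build an in-memory table holding n rows whose amounts sum to zero."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "transaction_number INTEGER PRIMARY KEY, amount INTEGER)"
    )
    # Stand-in data: odd rows add 10 and even rows subtract 10.
    rows = ((i, 10 if i % 2 else -10) for i in range(1, n + 1))
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    print(validate(demo(), expected_rows=200_000))
```

The count catches any lost or duplicated row, and the sum gives a second, independent signal, which is why the demo checks both.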
  • 47. Step 4: Resiliency. Induce failures to show resiliency during exactly-once workflows: (a) randomly_fail_batches.py; (b) restart Kafka and show error count; (c) continue and validate exactly-once semantics.
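The contents of `randomly_fail_batches.py` are not shown in the slides, so the following is an independent sketch of the idea behind step 4 rather than that script. It assumes exactly-once loading is achieved by committing each batch of rows and the consumer offset in a single transaction, and it uses `sqlite3` as a stand-in for the target database. All names here are hypothetical.

```python
import random
import sqlite3


class BatchFailure(Exception):
    """Raised to imitate a loader crash in the middle of a batch."""


def load_exactly_once(messages, batch_size=1_000, fail_rate=0.3, seed=7):
    """Load messages in batches, randomly failing some, and retry."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (amount INTEGER)")
    conn.execute(
        "CREATE TABLE offsets ("
        "id INTEGER PRIMARY KEY CHECK (id = 0), next_offset INTEGER)"
    )
    conn.execute("INSERT INTO offsets VALUES (0, 0)")
    conn.commit()

    failures = 0
    while True:
        start = conn.execute("SELECT next_offset FROM offsets").fetchone()[0]
        if start >= len(messages):
            break
        batch = messages[start:start + batch_size]
        try:
            # One transaction: rows and offset commit or roll back together.
            with conn:
                conn.executemany(
                    "INSERT INTO transactions VALUES (?)",
                    [(m,) for m in batch],
                )
                if rng.random() < fail_rate:
                    raise BatchFailure("induced failure after writing rows")
                conn.execute(
                    "UPDATE offsets SET next_offset = ?",
                    (start + len(batch),),
                )
        except BatchFailure:
            # The partial batch was rolled back; retry from the same offset.
            failures += 1
    return conn, failures


if __name__ == "__main__":
    data = [10 if i % 2 else -10 for i in range(200_000)]
    conn, failures = load_exactly_once(data)
    total, count = conn.execute(
        "SELECT SUM(amount), COUNT(*) FROM transactions"
    ).fetchone()
    print(f"failures={failures} sum={total} count={count}")
```

Because a failed batch leaves neither rows nor an advanced offset behind, the retry cannot double-insert, and the final sum and count match the source regardless of how many failures were induced.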
  • 50. The mission is clear: We’re moving from batch to real-time with streaming
  • 51. Thank You Darryl Smith Chief Data Platform Architect and Distinguished Engineer Dell Technologies