7. About Confluent and Apache Kafka
• Founded by the creators of Apache Kafka
• Founded September 2014
• Technology developed while at LinkedIn
• 73% of active Kafka committers
Leadership
• Cheryl Dalrymple, CFO
• Jay Kreps, CEO
• Neha Narkhede, CTO, VP Engineering
• Luanne Dauber, CMO
• Todd Barnett, VP WW Sales
• Jabari Norton, VP Business Dev
8. What is a Stream Data Platform?
[Diagram: Kafka as the Stream Data Platform, connected to Apps, Search, NoSQL, RDBMS, Monitoring, Stream Processing, Real-time Analytics, Data Warehouse, and Hadoop]
• Synchronous Req/Response: 0 – 100s ms
• Near Real Time: > 100s ms
• Offline Batch: > 1 hour
Build streaming applications
Deploy streaming applications at scale
Monitor and manage streaming applications
Common Kafka Use Cases
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Monitoring
• Asynchronous applications
• Fraud and security
12. Architecting for Real-Time Analytics
[Diagram components]
- Data Producers (simulating sensor activity), connected through multiple gateways
- Message Queue
- Data Transformation
- Database
- Fast, Performant Data Storage
- User Interface
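A minimal sketch of the flow this slide names, assuming the components connect in the order producers, gateways, message queue, transformation, database. Python's `queue.Queue` and an in-memory `sqlite3` database are stand-ins chosen for illustration, not the Kafka and MemSQL components of the talk; the function names and the reading fields are hypothetical.

```python
import queue
import random
import sqlite3


def produce(gateway_id, n, rng):
    """Simulate n sensor readings arriving through one gateway."""
    for seq in range(n):
        yield {"gateway": gateway_id, "seq": seq, "value": rng.uniform(0.0, 100.0)}


def transform(reading):
    """Data transformation step: flatten a reading into a database row."""
    return (reading["gateway"], reading["seq"], round(reading["value"], 2))


def run_pipeline(num_gateways=3, readings_per_gateway=5, seed=0):
    rng = random.Random(seed)

    # Stand-in for the message queue (an assumption, not Kafka itself).
    mq = queue.Queue()
    for gateway_id in range(num_gateways):
        for reading in produce(gateway_id, readings_per_gateway, rng):
            mq.put(reading)

    # Stand-in for the database (an assumption, not MemSQL itself).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (gateway INTEGER, seq INTEGER, value REAL)")
    while not mq.empty():
        db.execute("INSERT INTO readings VALUES (?, ?, ?)", transform(mq.get()))
    db.commit()
    return db


db = run_pipeline()
# What a user interface might query: per-gateway counts and averages.
rows = db.execute(
    "SELECT gateway, COUNT(*), AVG(value) FROM readings "
    "GROUP BY gateway ORDER BY gateway"
).fetchall()
for gateway, count, avg in rows:
    print(gateway, count, round(avg, 2))
```

The queue decouples producers from the database writer, so either side can be scaled or replaced without changing the other.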
14. Designed for Modern Operational Workloads
Scalable SQL
▪ Multi-mode
  ▪ OLTP, OLAP, HTAP
▪ Multi-model
  ▪ ANSI SQL
  ▪ Document/JSON
  ▪ Geospatial
In-Memory and Solid-State (SSD)
▪ In-Memory rowstore
▪ Solid-state columnstore
▪ Stream directly to rowstore or columnstore
Distributed
▪ Distributed query optimizer and execution
▪ Scale-out on commodity hardware
Datacenter or Cloud
▪ Deploy on-premises
▪ Cloud agnostic
  ▪ Amazon
  ▪ Microsoft
  ▪ Google
  ▪ Digital Ocean
Simple, Real-Time, Low Cost, Flexible
15. Real-Time Processing Features
▪ Ecosystem Compatibility
• MySQL Wire Protocol
• Stream processing through Integrated Apache Spark
▪ In-Memory Performance
• Code Compilation for SQL queries
• Maximum Concurrency with Lock-free components
• Full Data Durability and High Availability
▪ Distributed System Processing
• Distributed Database Joins
• Distributed Query Optimizer
▪ Multi-mode and Multi-model data
• In-Memory Rowstore and Flash/SSD Columnstore
• SQL, JSON and Geospatial data
16. Real-Time Data Pipelines with Spark
▪ MemSQL Streamliner is an integrated MemSQL and Apache Spark solution
▪ Deploys Apache Spark with one click
▪ Creates real-time data pipelines through a graphical UI
▪ Open sourced on GitHub at memsql.github.io/spark-streamliner
[Diagram: Real-Time Inputs, STREAMLINER (Apache Spark: Extract, Transform, Load), Real-Time Application]
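The Extract, Transform, Load stages can be sketched as three small functions applied to micro-batches, which is how Spark Streaming processes input. This is an illustrative stand-in in plain Python, not Streamliner's API; `extract`, `transform`, `load`, the `sensor_id,value` line format, and the list used as a table are all assumptions.

```python
from itertools import islice


def extract(source, batch_size):
    """Extract: cut an input iterator into micro-batches of at most batch_size."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def transform(batch):
    """Transform: parse 'sensor_id,value' lines, dropping malformed ones."""
    rows = []
    for line in batch:
        parts = line.split(",")
        if len(parts) != 2:
            continue
        try:
            rows.append((parts[0].strip(), float(parts[1])))
        except ValueError:
            continue
    return rows


def load(rows, table):
    """Load: append rows to the target table (a plain list in this sketch)."""
    table.extend(rows)
    return len(rows)


raw = ["s1,20.5", "s2,19.0", "garbage", "s1,21.0", "s3,not-a-number"]
table = []
loaded = [load(transform(batch), table) for batch in extract(raw, batch_size=2)]
print(loaded, table)
```

Keeping each stage a separate function mirrors the pipeline idea on the slide: any one stage can be swapped without touching the other two.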
17. MemSQL Ecosystem and Architecture
[Diagram labels]
- Orchestration / Containers
- Cloud / On-Premises Platform
- Messaging, Inputs
- Real-Time Applications
- Business Intelligence Dashboards
- Relational, Key-Value, Document, Geospatial
- Rowstore, Columnstore
- Real-Time Data Pipelines
- Existing Data Stores: Hadoop, Amazon S3, MySQL
21. MemEx: IoT Showcase Application
- Combines MemSQL, Apache Kafka, and Spark for global supply chain management
- Enables enterprises to predict throughput of supply warehouses
- Processes 2 million data points, based on 2,000 sensors across 1,000 warehouses
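A quick check of the arithmetic, assuming the slide means 2,000 sensors in each of 1,000 warehouses with one data point per sensor (the slide does not spell this out). The names below are hypothetical and only illustrate how a simulator might enumerate sensor keys.

```python
from itertools import product

WAREHOUSES = 1_000
# Assumption: the slide's 2,000 sensors are per warehouse.
SENSORS_PER_WAREHOUSE = 2_000


def sensor_keys(warehouses=WAREHOUSES, sensors=SENSORS_PER_WAREHOUSE):
    """Lazily yield one (warehouse, sensor) key per simulated sensor."""
    return product(range(warehouses), range(sensors))


total = WAREHOUSES * SENSORS_PER_WAREHOUSE
print(total)  # 2000000
```

Under this reading, one round of readings from every sensor yields the 2 million data points quoted on the slide.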
28. Top Energy Firm
Uses real-time drilling sensor data to manage the high stakes of producing oil in a depressed market and to maximize productivity.
29. TECHNICAL BENEFITS
- Enabled machine learning scoring of streaming data for real-time Predictive Analytics
- Integrated SAS BI PMML for deep analytics
- Joined multiple data types and third-party sources, including geospatial and weather data
30. Spark MLlib Predictive Model
[Diagram: REAL-TIME INPUTS, Streamliner (Spark MLlib Predictive Model), BUSINESS LOGIC]
Raw Sensor 1 + Predictive Score 1 (S1, P1)
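The Raw Sensor + Predictive Score pairing can be sketched as a function that attaches a score to each reading as it streams by. The logistic model, its weights, and the field names here are hypothetical stand-ins; in the talk's setup a trained Spark MLlib model would do the scoring.

```python
import math

# Hypothetical weights; a trained model would supply these in practice.
WEIGHTS = {"temperature": 0.04, "vibration": 1.5}
BIAS = -5.0


def score(reading):
    """Logistic score in (0, 1) for one raw sensor reading."""
    z = BIAS + sum(WEIGHTS[name] * reading[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def enrich(stream):
    """Yield each raw reading together with its predictive score (S, P)."""
    for reading in stream:
        yield reading, score(reading)


readings = [
    {"sensor": 1, "temperature": 70.0, "vibration": 0.2},
    {"sensor": 2, "temperature": 110.0, "vibration": 2.0},
]
scored = list(enrich(readings))
for raw, p in scored:
    print(raw["sensor"], round(p, 3))
```

Downstream business logic can then act on the pair, for example by alerting when the score crosses a chosen threshold.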
31. Continued Rise of IoT
[Diagram labels: Sensor Array, PoS Systems, Connected Fleets, Mobile Apps, Security, Reporting Systems, Log Systems, Data Lake, Data Warehouse, Databases]
“By 2020, over 20 billion connected things will be in use across a range of industries; the IoT will touch every role across the enterprise.”
Source: Gartner
32. Amazon Invests in Drones for 30 Minute Post-Order Deliveries
“These are highly automated drones. They have what is called sense-and-avoid technology. That means, basically, seeing and then avoiding obstacles.”
Yahoo, January 2016: https://www.yahoo.com/tech/exclusive-amazon-reveals-details-about-1343951725436982.html
33. FedEx Breaks Record With 317 Million Packages Shipped Over Christmas 2015
“FedEx Ground continues to advance the industry’s most automated hub network with investments in package sortation systems that enable flexible and reliable operations and six-sided scanning tunnels that boost data and image capture.”
FedEx, October 2015: http://about.van.fedex.com/newsroom/global-english/fedex-forecasts-record-volume-this-holiday-season/
34. The Evolution of Data Analytics
Descriptive Analytics, Predictive Analytics, Real-Time Analytics