By 2020, 50% of all new software will process machine-generated data of some sort (Gartner). Historically, machine data use cases have required non-SQL data stores like Splunk, Elasticsearch, or InfluxDB.
Today, new SQL DB architectures rival the non-SQL solutions in ease of use, scalability, cost, and performance. Please join this webinar for a detailed comparison of machine data management approaches.
2. Logistics…
• Submit questions at any time via the questions panel
• Slides & recording will be shared via email after the event
3. Agenda
–
• Machine data - the next big wave?
• Machine data use cases
• Machine data management options - Splunk, ELK, Time Series,
• Reinventing SQL for machine data
• SQL examples
• Questions & answers
4. I like databases
25 years in DBMS & software development companies
IMHO… the coolest ways software is changing what’s
possible in life and business… are usually due to some
database changing what’s possible with software.
5. The next wave of big data
will come from machines
“Things Data”
6. The next wave…“Things Data”
–
By 2020, 50% of new
software systems are IoT
related
IoT
7. Putting Machine Data to Work
—
• Definitive record of all activity and behavior
- What happened, when, where, by whom
• Tells us how to optimize:
- Customer experience
- Safety
- Production
- Profitability
• Where things are going right vs. wrong
• Fingerprints of fraud
8. Customer:
–
“CrateDB’s real-time SQL
performance, simple scaling, and
high availability make it a key
element of our stack”
Sekhar Sarukkai
Co-founder
Use case: Cyber Security - Campbell, CA
• Leading Cloud Access Security Broker (CASB)
• SaaS system monitors internet traffic for security risks
- 700 customers, 40% of F500
Data Challenges
• Original MySQL-Elasticsearch platform grew too costly to run &
too hard to maintain
- Duplicate data storage, DB syncing code
CrateDB Results
• Replaced MySQL/Elasticsearch with CrateDB in 2015
• ~100TB data, billions of network messages per day
• Real-time queries for 1000s of concurrent users
• 20x faster, 75% lower AWS costs
9. Customer:
–
Use case: Industrial IoT - Atlanta, GA
• $4B producer of bottles for Coca-Cola, P&G, Unilever
• 2016 initiative: Use real-time IoT data to optimize overall equipment
effectiveness across 170 factories
Data Challenges
• Diversity - 900 different sensor types per production line
• MS SQL Server too slow and inflexible
- 900 tables (1 per sensor type)
- 3 - 5 minute query response times
CrateDB Results
• Easier development - 1 table vs. 900 in SQL Server
• Faster dashboards - 20ms vs. 4,000ms
• Central cloud + edge deployment = insight on factory floor and in
central “Mission Control”
• Lower labor costs and greater overall equipment effectiveness (OEE)
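A minimal sketch of the "1 table vs. 900" idea, assuming each reading carries a sensor type plus a free-form JSON payload. SQLite stands in for the database only so the example runs anywhere; the table name, column names, and sample readings are hypothetical and are not taken from the ALPLA deployment.

```python
import json
import sqlite3

# SQLite is only a stand-in so the sketch runs anywhere; it is not CrateDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    " line_id TEXT, sensor_type TEXT, ts INTEGER, payload TEXT)"
)

# Hypothetical readings: each sensor type carries different fields,
# yet all of them fit in the same table.
readings = [
    ("line-1", "temperature", 1000, {"celsius": 21.5}),
    ("line-1", "pressure", 1000, {"bar": 3.2, "valve": "A"}),
    ("line-2", "temperature", 1001, {"celsius": 24.0}),
]
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?, ?)",
    [(line, kind, ts, json.dumps(body)) for line, kind, ts, body in readings],
)


def latest_by_type(sensor_type):
    """Return (line_id, decoded payload) rows for one sensor type, newest first."""
    rows = conn.execute(
        "SELECT line_id, payload FROM sensor_readings"
        " WHERE sensor_type = ? ORDER BY ts DESC",
        (sensor_type,),
    ).fetchall()
    return [(line, json.loads(payload)) for line, payload in rows]


temperature_rows = latest_by_type("temperature")
```

Adding a new sensor type in this layout means inserting rows with a new `sensor_type` value rather than creating another table.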
“Thousands of sensors generate data
along our production lines, and CrateDB
allows us to analyze that firehose of data
24 hours a day to make real-time
improvements to factory efficiency.”
Philipp Lehner,
CEO Alpla, USA
10. Customer:
–
Use case: Smart Lighting - Los Angeles, CA
• $2B global leader in IoT-enabled industrial lighting
• Lighting Burj Khalifa, OfficeMax & Sainsbury’s chains
• Software to control & monitor complex network of lighting, plus
presence, energy, & WiFi sensors
Data Challenges
• MySQL could not scale to support new initiatives:
- Shift to SaaS - central cloud portal
- Real-time reporting
- Time series analysis of operational metrics
CrateDB Results
• Easy migration from MySQL, in weeks
• Simple scaling with CrateDB on Docker
• Real-time data - concurrent SaaS users and API for application
partners
• 40x better DBMS price-performance vs. MySQL
11. Customer:
–
“We need to process massive amounts of
data our customers’ vehicles generate, in
real time. CrateDB offered the best
performance, scalability, and ease-of-use of
any SQL or NoSQL DBMS we tried.”
Mark Sutheran,
Founder, Clickdrive
Use case: Vehicle Fleet Management - Singapore
• Internet-enabled vehicle fleet monitoring system
• Used by Singapore taxis, insurance vehicle fleets
• Real-time monitoring of vehicle location & health, improves fleet
utilization, safety, driver behavior, profitability
Data Challenges
• Real-time vehicle status & location, while ingesting 1,500 data points per
second per car, 24x7
• Data science - query 10s of terabytes of vehicle system data to develop
predictive maintenance algorithms
• MySQL can’t scale, Cassandra required too much tuning
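To put the ingest rate in perspective, here is a small worked calculation based on the 1,500 data points per second per car quoted above. The fleet size of 100 is a hypothetical value chosen only for illustration.

```python
POINTS_PER_SECOND_PER_CAR = 1_500  # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60     # 24x7 operation


def points_per_day(cars):
    """Data points ingested per day for a fleet of `cars` vehicles."""
    return POINTS_PER_SECOND_PER_CAR * SECONDS_PER_DAY * cars


per_car = points_per_day(1)         # 129,600,000 points per car per day
fleet_of_100 = points_per_day(100)  # hypothetical fleet size
```

At this rate a single car produces roughly 130 million data points per day.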
CrateDB Results
• Revealed hidden maintenance issues with 50% of vehicles
• Reduced repair costs 20% by predicting problems earlier
• Data processing speed enabling development of 3D accident recreation
within minutes
12. The Next Wave of Big Data
–
“IoT is creating unparalleled information
management and analytics challenges.”
- Jim Hare, Gartner
Every step. Every lightbulb. Every message. Every bottle.
• Firehose of data - millions of data points per second
• Complex data - joins, time series, geospatial, JSON, text search, AI, blobs
• Real-time - instantly actionable, current & large historic data sets
• Edge + Cloud - run anywhere: cloud, on-premises, containers. Small footprint or large clusters with 100+ nodes.
14. But More Likely …
-
First… Log search, analytics. Full stack - forwarders, indexers, search heads, visualization
Then… Open source log search, analytics. Full stack - Elasticsearch, Logstash, Kibana
Lately… Time series, IT metrics
15. Why Not SQL?
–
                                 Traditional SQL   Splunk, et al
Firehose of data                 ❌                ✅
Complex queries & dynamic data   ❌                ✅
Fast (real-time) queries         ❌                ✴
16. SQL Mainstream Must be Enabled to Achieve IoT Growth
–
45:1
Ratio of SQL to NoSQL
developers
(Source: LinkedIn)
By 2020, 50% of new
systems are IoT related
IoT
20. CrateDB - the key inventions
–
Distributed SQL with search, time series, geospatial, aggregations
Cloud-native architecture, easy scaling via containers
NoSQL storage & clustering for horizontal scaling & dynamic schema
Columnar caches for real-time, in-memory SQL query performance
shared-nothing architecture
21. If you know SQL, you know CrateDB
–
Simple install - zero-configuration, auto-join
Compatible - ANSI SQL via Postgres wire protocol, JDBC, REST
Real-time performance - distributed SQL query engine
Dynamic schema - all data (structured + JSON), time series, geospatial
Distributed SQL query versatility - aggregations, time series, search, geospatial…
Simpler scalability - shared nothing, horizontal scale out
Always on - high availability, replication, self-healing
Flexible - no lock-in, runs on any cloud and on-premises
22. New DBMS Required for “Things Data” Era?
–
                        CrateDB   Traditional SQL   NoSQL
Firehose of data        ✅        ✴                 ✅
Complex, dynamic data   ✅        ❌                ✅
Real-time queries       ✅        ❌                ✴
SQL                     ✅        ✅                ❌
23. Performance?
–
• CrateDB linear scalability
- Performance rises linearly with cluster size
• CrateDB vs. PostgreSQL
- Complex queries run 29x faster in CrateDB on 30% lower hardware cost
• CrateDB vs. InfluxDB (time series)
- 7x more query throughput under concurrent user load - better for multi-user time series apps (SaaS)
24. CrateDB Open Machine Data Stack - build your own with SQL
–
‣ Integrates easily
‣ Low learning curve
‣ Greatest flexibility
‣ No lock in
[Diagram layers: Input, DB, Apps (custom SQL apps)]
25. Built for the Open Machine Data Stack
—
A database rarely exists independently. Instead, it is usually part of an ecosystem of tools and
other products, with each covering a different need in a data pipeline.
1. Trackers 2. Collectors 3. Enrich 4. Storage 5. Data Modeling 6. Analytics
26. If You’re Doing Distributed…
–
[Diagram labels: Devices (servers, sensors, actuators, machines, wearables, cars, etc.); Gateway; Gateway & DB (edge); Applications & platforms (public/hybrid/private cloud); shared-nothing architecture]
CrateDB enables use cases at the “edge” and in the cloud, with SQL, horizontal scaling, high availability, and multi-model data
structures. With CrateDB, customers can extract value from real-time data, enabling applications & services not possible before.
27. MQTT Broker & Ingestion Framework
–
• Message queues were invented to compensate for
DBMS weaknesses
- Downtime
- Slow ingestion
• New databases like CrateDB don’t have those
pitfalls
• Embedding MQTT broker in CrateDB
- Define “Ingestion rules” in CrateDB
• MQTT topic -> target table for storage
- Stores messages in tables
- Eliminates the need for extra middleware
• Lowers hosting costs, complexity, development time
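The "ingestion rule" idea above (MQTT topic -> target table) can be sketched as a simple routing table. This is an illustration of the concept only, not CrateDB's actual rule syntax or broker API: the topic names, table names, and the `ingest` helper are all hypothetical, and SQLite stands in for the database so the sketch is runnable.

```python
import sqlite3

# Hypothetical rule set: MQTT topic -> target table. Names are illustrative.
INGEST_RULES = {
    "factory/temperature": "mqtt_temperature",
    "factory/pressure": "mqtt_pressure",
}

conn = sqlite3.connect(":memory:")  # stand-in store, not CrateDB
for table in INGEST_RULES.values():
    # Table names come from the fixed dict above, never from message content.
    conn.execute(f"CREATE TABLE {table} (client_id TEXT, ts INTEGER, payload TEXT)")


def ingest(topic, client_id, ts, payload):
    """Store one message in the table its topic maps to.

    Returns False when no rule matches the topic, True otherwise.
    """
    table = INGEST_RULES.get(topic)
    if table is None:
        return False
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (client_id, ts, payload))
    return True


stored = ingest("factory/temperature", "sensor-7", 1000, '{"celsius": 21.5}')
dropped = ingest("factory/unknown", "sensor-9", 1001, "{}")
count = conn.execute("SELECT COUNT(*) FROM mqtt_temperature").fetchone()[0]
```

With the broker and the rules living in the database, there is no separate consumer process to deploy between the devices and the tables.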
[Diagram: message queue approach (devices → MQTT messages → MQTT broker → MQTT consumer/writer → DBMS with slow ingest & DB downtime) versus embedded MQTT broker (devices → MQTT messages → CrateDB; fast ingestion, always-on architecture)]
28. CrateDB Output Plugin for Telegraf
–
• Telegraf is a plugin-driven server for
collecting metrics, usually connecting
to InfluxDB
• New Telegraf plugin writes to
CrateDB via the PostgreSQL protocol
• More turnkey integration with popular
time series data sources
• Makes it easy to migrate existing time
series data workloads to CrateDB
- For more complex data & queries
- SQL access
- Larger data / time windows
- More concurrent users
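Conceptually, an output plugin of this kind turns each collected metric into a SQL insert sent over the PostgreSQL protocol. The sketch below shows that mapping only. The table name, the column layout, and the `metric_to_insert` helper are assumptions for illustration and are not the plugin's real schema or code.

```python
import json


def metric_to_insert(metric, table="metrics"):
    """Build a parameterized INSERT for one Telegraf-style metric.

    The table name and column layout are assumptions for illustration.
    """
    sql = f"INSERT INTO {table} (name, ts, tags, fields) VALUES (%s, %s, %s, %s)"
    params = (
        metric["name"],
        metric["timestamp"],
        json.dumps(metric["tags"], sort_keys=True),
        json.dumps(metric["fields"], sort_keys=True),
    )
    return sql, params


# Hypothetical metric with a name, a timestamp, tags, and fields.
sample = {
    "name": "cpu",
    "timestamp": 1500000000,
    "tags": {"host": "web-1"},
    "fields": {"usage_idle": 97.5},
}
sql, params = metric_to_insert(sample)
```

The `%s` placeholders follow the parameter style used by common Python PostgreSQL drivers, so the pair could be handed to a driver's `execute` call.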
[Diagram: system stats, DBs, networks, message queues, and apps feed Telegraf, which writes to CrateDB (shared-nothing architecture); applications & platforms query it via SQL. Caption: Connect CrateDB to dozens of data sources.]
29. Prometheus Integration
–
• Prometheus is a standard time series store
for monitoring IT infrastructure
- Simple, standard systems monitoring data
endpoint e.g. Docker
• Prometheus Remote Adapter for CrateDB
- Developed by RobustPerception.io
- Standard way for Prometheus to pass read/write requests to other back-end databases
• Docker & other IT software can use CrateDB
for larger, more complex time series analysis
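Once monitoring samples sit in a SQL store, time series analysis becomes ordinary SQL. A minimal sketch, assuming a hypothetical `samples` table of (metric, timestamp in seconds, value) rows. SQLite is used only as a runnable stand-in, and the adapter's real table layout may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in store, not CrateDB
conn.execute("CREATE TABLE samples (metric TEXT, ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("cpu_usage", 0, 10.0),
        ("cpu_usage", 30, 20.0),
        ("cpu_usage", 60, 40.0),
        ("cpu_usage", 90, 60.0),
    ],
)


def per_minute_average(metric):
    """Average value per 60-second bucket, oldest bucket first."""
    return conn.execute(
        "SELECT (ts / 60) * 60 AS bucket, AVG(value)"
        " FROM samples WHERE metric = ?"
        " GROUP BY bucket ORDER BY bucket",
        (metric,),
    ).fetchall()


averages = per_minute_average("cpu_usage")
```

Integer division of the timestamp assigns each sample to its minute, so the four samples collapse into two buckets.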
[Diagram: IT software sends systems monitoring event data to Prometheus (local storage); the CrateDB adapter uses the remote read/write protocol to reach CrateDB (unlimited storage, unlimited data & query complexity).]
31. Customer - ALPLA
–
• 172 factories in 45 countries
• 18,000 employees
• Global manufacturer
- Innovation leader
- Cost leader
• Plastic packaging products
- Bottles, caps, …
• e.g. every Coca-Cola bottle in the USA
32. Use Case
–
• Through real-time monitoring:
- Increase equipment efficiency (OEE)
- Decrease resource utilization
- Simplify labor management
• Complexity:
- 1500 production lines
- 900 different sensor types
- 160M bottles/day to be measured
33. Data collection
–
Production machine data is collected at the edge (Docker, CrateDB)
JSON messages sent over internet to cloud
Central data storage for real-time dashboards, monitoring, alerting, prediction, machine learning
34. Solution
–
24x7 central Mission Control for all factories
• Scale to all production lines, connect all feeds, collect all raw data
• Aggregate, monitor, predict things from huge data volumes
• Take action from data immediately through tablets, Hololens, etc.
35. Docker in the cloud
–
• RabbitMQ receiving data
• CrateDB as storage for raw data
• Enrichment of data
• CrateDB as storage for enriched data
• API
• Realtime management system
• Dashboards
• API for Hololens
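The flow listed above (receive, store raw, enrich, store enriched, serve via API) can be sketched end to end in a few lines. Everything here is a stand-in: an in-process queue replaces RabbitMQ, Python lists replace the CrateDB tables, and the line-to-factory lookup and message fields are hypothetical.

```python
import json
import queue

incoming = queue.Queue()  # stand-in for RabbitMQ
raw_table = []            # stand-in for the raw-data table
enriched_table = []       # stand-in for the enriched-data table

# Hypothetical lookup used for enrichment: production line -> factory.
LINE_TO_FACTORY = {"line-1": "factory-a", "line-2": "factory-b"}


def enrich(record):
    """Return a copy of the record with the factory name added."""
    return {**record, "factory": LINE_TO_FACTORY.get(record["line"], "unknown")}


def drain():
    """Move every queued message through raw storage and enrichment."""
    while not incoming.empty():
        record = json.loads(incoming.get())
        raw_table.append(record)
        enriched_table.append(enrich(record))


def dashboard_counts():
    """API-style read: bottle count per factory from the enriched table."""
    totals = {}
    for row in enriched_table:
        totals[row["factory"]] = totals.get(row["factory"], 0) + row["bottles"]
    return totals


for message in (
    {"line": "line-1", "bottles": 120},
    {"line": "line-2", "bottles": 80},
    {"line": "line-1", "bottles": 30},
):
    incoming.put(json.dumps(message))
drain()
totals = dashboard_counts()
```

Keeping raw and enriched data in separate tables means the enrichment step can be rerun later without asking the factories to resend anything.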
[Diagram labels: RabbitMQ, CrateDB, Enrichment, API, Dashboards, Hololens, …]
36. In Summary…
-
• New machine data requirements
- Firehose
- Complex
- Real time
• SQL coming [back] to the rescue
- New DBMS architecture
- Same scale, performance, dynamic data as NoSQL
- Easier learning curve & integration (more choices)
- Better economics
• Splunk & ELK stack a good choice when
- You need turnkey Security Analytics / SIEM
37. Thank You!
-
• CrateDB
- https://crate.io
• Slides & recording of this webinar will be sent to you shortly, via email
• Ping me any time
- Andy Ellicott
- andy@crate.io