1. Simply Scalable SQL
Sense, Analyze, Predict and Control in Real-time
Giacomo Ceribelli Pervasive Systems 2018
2. What is CrateDB?
CrateDB is a distributed
SQL database built on top
of a NoSQL foundation.
It combines the familiarity of
SQL with the scalability and
data flexibility of NoSQL.
4. Top Features
❏ Simply Scalable
Automatic data rebalancing and a shared-
nothing architecture enable you to scale
simply.
❏ Transactional
CrateDB is eventually consistent, but offers
transactional semantics. CrateDB is consistent at
the row level, so each row is either fully written or
not.
5. Top Features
❏ Real-time data ingestion
CrateDB can deliver millisecond-speed query
performance, even when writes are in action.
❏ Time series analysis
CrateDB makes time series analysis fast and
easy with automatic table partitions
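The idea behind time-based partitions can be sketched in a few lines of Python. This is an in-memory illustration only, not CrateDB's implementation: the names `PartitionedTable`, `day_key`, and `query_day` are hypothetical, and one partition per UTC day is an assumed interval.

```python
from collections import defaultdict
from datetime import datetime, timezone


def day_key(ts: datetime) -> str:
    """Map a timezone-aware timestamp to its UTC day, used as the partition key."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


class PartitionedTable:
    """Toy table that stores rows in one bucket per day (hypothetical helper)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # day -> list of (timestamp, value)

    def insert(self, ts: datetime, value: float) -> None:
        # The partition is chosen automatically from the timestamp.
        self.partitions[day_key(ts)].append((ts, value))

    def query_day(self, day: str) -> list:
        # Only the matching partition is scanned; other days are never touched.
        return [value for _, value in self.partitions.get(day, [])]
```

A query for a single day reads one bucket instead of the whole table, which is why narrowing a time series query to a time range can be fast.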
7. Openness and
flexibility
➢ Run CrateDB anywhere, in your data
center or in the cloud
➢ Connect to CrateDB from almost any
language or SQL application
➢ Extend CrateDB functionality by writing
your own plugins
➢ Deploy CrateDB as a container on
Docker, Kubernetes, or others
➢ Use CrateDB for free, under the Apache
2.0 open source license.
8. Why is Crate
built for IoT?
● Millions of data points per second
● Real-time queries
● Built-in MQTT broker
● IoT Analytics
● At the Edge and the Cloud
9. How to Run
Crate
Just open the terminal and run
bash -c "$(curl -L try.crate.io)"
This downloads CrateDB and
starts it locally
10. Configuration
Edit the crate.yml file to
configure and run Crate with
your selected preferences,
e.g.
● cluster.name: pervasiveSystems
● node.name: Node 1
● discovery.zen.minimum_master_nodes: 2
● network.host: _site_
● bootstrap.memory_lock: False
11. CrateDB Admin UI
General Tools
● General Overview of the Cluster
● SQL Console for Live Querying
● Table Visualization
● Nodes Visualization
● Live Monitoring
● Users Logs and Privileges
12. How to scale
● minimum_master_nodes should be a
quorum (more than half) of the
master-eligible nodes.
● Nodes can be added at
any time
● If a node crashes, the load is
redistributed without
problems
[Diagram: a cluster with two master nodes and three slave nodes]
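The value chosen for `discovery.zen.minimum_master_nodes` can be worked out with a small calculation. This is a minimal sketch, assuming the usual majority rule of (N / 2) + 1 over the master-eligible nodes; the function name `minimum_master_nodes` is hypothetical and not part of CrateDB.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum for a given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    # Integer division rounds down, so adding 1 always gives a strict majority.
    return master_eligible // 2 + 1
```

With three master-eligible nodes the function returns 2, which matches the `discovery.zen.minimum_master_nodes: 2` setting in the configuration example. A strict majority means two halves of a split cluster cannot both elect a master.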
13. JOIN THE CLUSTER
An instance of CrateDB can run
on any device as a node:
just download it and join the
cluster.
14. RealTime Queries
Thanks to the SQL handler,
queries can be executed
as in any other SQL database,
even though CrateDB is built
on top of NoSQL.
SQL syntax is well known
and widely documented
on the web.
SQL query
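A query can also be sent from code through CrateDB's HTTP endpoint. The sketch below uses only the Python standard library and assumes a node listening on localhost with the default HTTP port 4200; the helper names `build_sql_request` and `run_sql` are hypothetical.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, url="http://localhost:4200/_sql"):
    """Build (but do not send) a POST request carrying one SQL statement."""
    payload = {"stmt": stmt}
    if args:
        # Positional parameters for '?' placeholders in the statement.
        payload["args"] = list(args)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, args=None, url="http://localhost:4200/_sql"):
    """Send the statement to a running node and return the decoded JSON reply."""
    with urllib.request.urlopen(build_sql_request(stmt, args, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a local node running, `run_sql("SELECT name FROM sys.cluster")` should return a JSON object containing the column names and rows of the result.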
15. Data visualization
Graphs are available for monitoring:
➢ Query speed
➢ Queries per second
➢ CPU (overall and CrateDB) usage
➢ Heap usage
➢ Disk usage
➢ Disk I/O
DISTRIBUTED SQL ON TOP OF NoSQL
COMBINE THE SIMPLICITY OF SQL with the flexibility and scalability of NoSQL
SCALABLE: Just add new machines to create and grow a CrateDB cluster. There’s no need to know how to redistribute data on the cluster because CrateDB does it for you.
TRANSACTIONAL: By offering read-after-write consistency we allow synchronous real-time access to single records, immediately after they are written.
Even though CrateDB does not support ACID transactions with rollbacks etc., it offers Optimistic Concurrency Control through internal versioning, which allows write conflicts to be detected and resolved.
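The general pattern of optimistic concurrency control with row versions can be sketched as follows. This is an in-memory illustration of the technique, not CrateDB's internals; `VersionedStore` and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version of the row."""


class VersionedStore:
    """Toy key-value store that keeps a version counter per row."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        """Return the (version, value) pair for an existing key."""
        return self._rows[key]

    def write(self, key, value, expected_version=None):
        """Write a value; if expected_version is given, it must match the stored one."""
        current = self._rows.get(key)
        current_version = current[0] if current else 0
        if expected_version is not None and expected_version != current_version:
            # Someone else changed the row since it was read: reject the write.
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, value)
        return new_version
```

A writer reads the row together with its version, then passes that version back on update. No lock is held in between; a conflicting write is detected at write time, and the caller can re-read and retry.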
REAL TIME: Analytic data is often loaded in batches, with transactional locks and other overhead. By contrast, CrateDB eliminates locking overhead to enable massive write performance (e.g. 40,000+ inserts per second per node on commodity hardware).
TIME SERIES: Time series data is important for identifying trends and anomalies. Partitioning data by time intervals delivers very fast time series query performance.
Before, it was difficult to create a distributed database; now it no longer is
run anywhere,
connect with sql
write own plugins
deploy as a container
free
IOT
millions of data points per second
real-time queries
built-in MQTT broker
IOT analytics
simplicity of running Crate
configuration
general tools of Admin GUI
Master and “slaves”
master quorum: more than half of the master-eligible nodes
a node can be added at any time
if a node crashes, no problem
SQL syntax is simple and well supported,
fast even though built on top of NoSQL
visualize information about query speed and queries per second
system information
great because anyone can easily run a distributed database across multiple systems
perfect for iot
MySQL and Crate connections are the same, so just migrate the DB (example)
great for fast scaling with a lot of data but not for small databases