GridDB is a highly scalable in-memory NoSQL database designed for IoT applications. It uses a key-container data model and provides high performance for periodic time-series data through its hybrid in-memory and disk storage architecture. GridDB demonstrated superior performance and stability compared to Cassandra in Yahoo! Cloud Serving Benchmark (YCSB) tests. It supports features important for IoT, such as time-series data types and operations.
Purpose-built NoSQL Database for IoT by Basavaraj Soppannavar
1. Purpose-built In-Memory NoSQL Database for Internet of Things
Basavaraj Soppannavar
Sr. Strategist, IoT
Toshiba America Research Inc.
5th Aug 2017
Los Angeles
2. Agenda
Internet of Things
IoT Data & its properties
GridDB
Real Use Cases
GridDB by Toshiba 2
4. Internet of Things Predictions
Number of Connected Devices
By 2020 the number of connected devices will be
• 50 Billion – Cisco
• 28.1 Billion* – IDC
• 20.8 Billion* – Gartner
*not including smartphones & computers
Most IoT smart devices aren’t in your home or phone—they are in factories,
businesses, and healthcare – Intel Infographics
• 40.2 % in Business and Manufacturing
• 30.3 % in Healthcare
IoT Economics
IoT Revenue projections
• $300 Billion – Gartner
• $470 Billion – Bain
5. Technology Stack of IoT
Layers and examples:
Applications: Mobile, Web, Business Apps
Analytics & AI: BI, Visualization, Data Mining, DPP* Analytics, Machine Learning
Data Storage and Retrieval: GridDB, HBase, Cassandra, MongoDB, MS-SQL, Hadoop
Data Aggregation / Processing: Storm, Kafka, Fluentd, RabbitMQ
Session / Communication: CoAP, MQTT, DDS, XMPP, AMQP, HTTP
Transport: IPV4, IPV6
Link: Ethernet, WiFi, Bluetooth, BLE, Zigbee, Zwave, RFiD, 2G, 3G, LTE
Connectivity: Wireless, USB, RJ45(Ethernet), DSL
Device: Sensors, Embedded chips, Cameras, Wearables
Cross-layer concerns: Device and Data Management; Security and Privacy
*DPP: Descriptive, Predictive, Prescriptive
8. Properties of IoT Data
• Periodic
• Large volume but small record size
• Structured
• Time stamped
Example records:
Timestamp             Voltage   Current   Temperature
2017/05/03 10:45:00   100       0.64      20.5
2017/05/03 10:45:30   101       0.63      20.4
2017/05/03 10:46:00   99        0.65      20.5
...
Single record (size less than 100 bytes)
Millions of records
9. Database Requirements of IoT
• Highly available & fault tolerant
• Great read and write performance for millions of records
• Time series data & operations support
• Fast search and range queries
• Spatial and geo-location support
• Real-time streaming support
• Support for ever-increasing data (scale out)
10. Evolution of Database Management Systems
Timeline: 90s, 2000s, Today
Earlier: RDBMS used both as operational / transactional database and as data warehouse for BI and analytics, later joined by OLAP / DW
Today:
• RDBMS: MySQL, Postgres
• NoSQL DBs
   • Key Value Store: Riak, Aerospike
   • Wide Column Store: Cassandra, HBase
   • Document Store: MongoDB, Couchbase
   • Graph Store: Neo4j
• Hadoop: Cloudera, Hortonworks
• OLAP / DW: Teradata, Vertica, GreenPlum
OLAP – Online Analytical Processing
DW – Data Warehouse
Inspired by Source: https://practicalanalytics.co/2015/06/02/the-maturing-nosql-ecoystem-a-c-level-guide/
15. NoSQL Data Models
• GridDB has a unique Key-Container data model
• Container can be visualized as a table of a Relational Database
• Fixed schema
16. Key Container Data Model
A container is a set of data with a schema
GridDB supports 2 types of containers
• Collection container – For generic records management
• Time-series container – For time series records management
Key Container model provides
• Data consistency within the container (ACID is guaranteed within the container)
• Faster data retrieval and search because of the schema
• TQL, an SQL-like query language for reading data from the containers
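The key-container idea can be pictured with plain JDK collections: a container name acts as the key, and each time-series container keeps fixed-schema rows ordered by timestamp so that range queries are cheap. The sketch below is only an illustration under that assumption; `KeyContainerSketch`, `Reading`, `put` and `range` are hypothetical names and are not part of the GridDB API.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: JDK collections standing in for a key-container store.
// None of these names come from the GridDB API.
final class KeyContainerSketch {

    // One fixed-schema row of a time-series container.
    static final class Reading {
        final int voltage;
        final double current;
        final double temperature;

        Reading(int voltage, double current, double temperature) {
            this.voltage = voltage;
            this.current = current;
            this.temperature = temperature;
        }
    }

    // Container name (the "key") -> rows sorted by timestamp (the row key).
    private final Map<String, NavigableMap<Instant, Reading>> containers = new HashMap<>();

    // Inserts or overwrites the row stored under the given timestamp.
    void put(String container, Instant timestamp, Reading row) {
        containers.computeIfAbsent(container, name -> new TreeMap<>()).put(timestamp, row);
    }

    // Returns the rows with from <= timestamp < to; empty if the container is unknown.
    NavigableMap<Instant, Reading> range(String container, Instant from, Instant to) {
        NavigableMap<Instant, Reading> rows =
                containers.getOrDefault(container, new TreeMap<>());
        return rows.subMap(from, true, to, false);
    }
}
```

Because rows are sorted by their row key, a time-range lookup touches only the rows inside the range, which is the kind of benefit the slide attributes to having a schema and a key.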
17. Key Container Data Model - Example
// Schema definition: the @RowKey column (timestamp) is the row key
static class SMData {
    @RowKey Date timestamp;
    int voltage;
    double current;
    int temp;
}
// Creating a TS container: "SM101" is the container name (the "key"),
// SMData.class supplies the schema
TimeSeries<SMData> ts = store.putTimeSeries("SM101", SMData.class);
18. High Performance
GridDB’s hybrid composition of In-Memory and Disk architecture is optimized for maximum performance
Figure: GridDB 4-node cluster (In-Memory + Disk Hybrid). Memory from multiple nodes is used together, each node/server has its own SSD/HDD, and new nodes can be added.
Excess data from memory is saved on to SSD/Disk
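The memory-plus-disk behaviour can be sketched with a bounded in-memory map that spills its overflow to a second store. The sketch assumes least-recently-used eviction, which the slides do not state; `HybridStoreSketch` and its methods are hypothetical names, and a `HashMap` stands in for the SSD/HDD. It is not how GridDB itself is implemented.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a capped in-memory map that spills overflow to a "disk" map.
// LRU eviction is an assumption, not a documented GridDB policy.
final class HybridStoreSketch {
    private final Map<String, String> disk = new HashMap<>(); // stand-in for SSD/HDD
    private final LinkedHashMap<String, String> memory;

    HybridStoreSketch(final int memoryCapacity) {
        // accessOrder = true keeps the least recently used entry first.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > memoryCapacity) {
                    disk.put(eldest.getKey(), eldest.getValue()); // spill before eviction
                    return true;
                }
                return false;
            }
        };
    }

    // Writes go to memory; any stale on-disk copy is dropped.
    void put(String key, String value) {
        disk.remove(key);
        memory.put(key, value);
    }

    // Reads hit memory first, then fall back to disk and promote the entry.
    String get(String key) {
        String value = memory.get(key);
        if (value == null && disk.containsKey(key)) {
            value = disk.remove(key);
            memory.put(key, value);
        }
        return value;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }
}
```

With a capacity of 2, writing three keys leaves the oldest one on "disk"; reading it back promotes it into memory and pushes out the next least recently used key.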
19. YCSB Performance Results
• Tests performed on identical hardware for both databases (MS Azure Standard_D2 instances: dual-core CPUs, 7GB RAM per node)
• 1 client per core; 128 threads per client
*Tests performed by Fixstars
Figures (bar charts, GridDB vs. Cassandra, YCSB workloads A, B, C, D, F):
• Throughput - 16 nodes (average throughput in '000 ops/sec)
• Throughput - 32 nodes (average throughput in '000 ops/sec)
• Read Latency – 16 nodes (latency in microseconds)
Yahoo! Cloud Serving Benchmark (YCSB) comparing GridDB and Cassandra shows that*
• Average throughput of GridDB is 4x-5x higher than that of Cassandra
• Average latency of GridDB is 3x-4x lower than that of Cassandra
20. Superior Stability
Figure: YCSB Workload A 24Hrs stability test, GridDB vs. Cassandra. Throughput (ops) plotted against elapsed time (seconds), with markers at 3hrs, 9hrs, 15hrs, 21hrs and 25hrs.
Tests performed by Fixstars
21. High Availability
Advanced Master-Slave Model - Hybrid Cluster Management
• No Single Point of Failure (SPOF) – Master node is selected automatically
• No Split Brain – Quorum Policy is applied
Autonomous Data Distribution
• Data distribution and failover are taken care of automatically
Figure: Hybrid cluster management and failover. Five nodes (Node 1 to Node 5), one of them the master, each holding original and replica data; data is replicated between nodes; clients work from a cached data distribution table; new nodes can be added.
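A quorum policy avoids split brain by letting only one side of a partitioned cluster keep operating. The slides do not give the exact rule GridDB applies, so the sketch below assumes a strict majority of the configured cluster size; `QuorumSketch` and `hasQuorum` are hypothetical names.

```java
// Illustrative majority-quorum check. "Strictly more than half of the
// configured nodes" is an assumption, not a rule quoted from GridDB.
final class QuorumSketch {

    // True when the reachable nodes form a strict majority of the cluster.
    static boolean hasQuorum(int reachableNodes, int clusterSize) {
        if (clusterSize <= 0 || reachableNodes < 0 || reachableNodes > clusterSize) {
            throw new IllegalArgumentException("invalid node counts");
        }
        return 2 * reachableNodes > clusterSize;
    }
}
```

If a 5-node cluster splits 3/2, only the 3-node side satisfies `hasQuorum`, so at most one partition can keep accepting work. An even split of a 4-node cluster leaves neither side with a majority.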
22. Time Series Features
• TDPA (Time Series Data Placement Algorithm)
   • Used for high-frequency data to maximize memory utilization
• Expiry Release Function
   • A data retention period can be set so that old data is released and storage is freed
• Aggregate Functions
   • MIN, MAX, AVG, VARIANCE, STDDEV
• Sampling and Interpolation Functions
   • TIME_INTERPOLATED, TIME_SAMPLING, TIME_NEXT, TIME_PREVIOUS
• Trigger functions
   • JMS and REST notifications
GridDB is optimized for Time-Series operations
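Interpolation functions estimate a value at a timestamp that falls between two stored rows. The sketch below shows that idea in plain Java, assuming linear interpolation between the nearest samples on either side; `InterpolationSketch` and `interpolate` are hypothetical names, and this is neither the TQL API nor GridDB's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: linear interpolation over a sorted time series.
final class InterpolationSketch {

    // Estimates the value at time t (epoch millis) from the nearest samples
    // at or before t and at or after t.
    static double interpolate(TreeMap<Long, Double> series, long t) {
        Map.Entry<Long, Double> before = series.floorEntry(t);
        Map.Entry<Long, Double> after = series.ceilingEntry(t);
        if (before == null || after == null) {
            throw new IllegalArgumentException("t is outside the sampled range");
        }
        if (before.getKey().equals(after.getKey())) {
            return before.getValue(); // a sample exists exactly at t
        }
        double fraction =
                (double) (t - before.getKey()) / (after.getKey() - before.getKey());
        return before.getValue() + fraction * (after.getValue() - before.getValue());
    }
}
```

For two voltage readings of 100 and 101 taken 30 seconds apart, the estimate halfway between them is 100.5.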
23. Real Use Cases
1. Building Energy Management Systems
2. Smart Meters – Electric Power Company
3. Smart City – Ishinomaki City
24. 1. Building Energy Management Systems
• 100+ buildings are managed by the BEMS company in Kawasaki, Japan
• The BEMS company manages over 1 petabyte (a million gigabytes) of sensor data each year
• On average 5MB of data per sensor per day, or approximately 2GB from each sensor per year
• 100-1000 sensors per building, depending on floor area, which puts the collected sensor data at about 1TB per building per year
GridDB was used for its easy scalability, simple data model and Time Series querying & functions
25. 2. Smart Meters – Electric Power Company
• One of Japan’s top electric power companies switched from a relational database to GridDB
• The company saw a 2,250-fold increase in throughput over the old system
• Overall processing time went down considerably
• Data center costs were reduced significantly
GridDB was used for its high performance, large data handling and reduced cost
26. 2. Smart Meters – Electric Power Company
• Has been running as a real system since April, 2016
• Data from 3 million smart meters is collected every 30 minutes and is stored for 3 months
• Data size is approximately 2.6 TB
• 13 billion records
• Record size of 200 bytes
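The figures on this slide are consistent with each other, which a short calculation shows. The sketch assumes "3 months" means 90 days and uses decimal terabytes (10^12 bytes); `SmartMeterVolume` and its methods are hypothetical names introduced for the illustration.

```java
// Back-of-the-envelope check of the record count and data size on the slide.
final class SmartMeterVolume {

    // Readings accumulated when every meter reports once per interval.
    static long recordCount(long meters, int intervalMinutes, int retentionDays) {
        long readingsPerDay = (24L * 60L) / intervalMinutes;
        return meters * readingsPerDay * retentionDays;
    }

    // Raw payload size in decimal terabytes (1 TB = 10^12 bytes).
    static double terabytes(long records, int bytesPerRecord) {
        return records * (double) bytesPerRecord / 1e12;
    }

    public static void main(String[] args) {
        // 90 days is an assumption for "3 months".
        long records = recordCount(3_000_000L, 30, 90);
        System.out.println(records + " records, "
                + terabytes(records, 200) + " TB");
    }
}
```

3 million meters at 48 readings per day over 90 days gives 12.96 billion records, which at 200 bytes each is about 2.59 TB, in line with the quoted 13 billion records and 2.6 TB.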
Figure: smart meter system architecture. Labeled components: 3 million smart meters (SM), MDMS, Data Input on an AppServer, GridDB clusters (3 node, 4 node and 5 node clusters, including an Active-Standby cluster), MapReduce jobs (Charge Cal., Imbalance Cal., 30 Min. Balancing), Read Value App, RDB, preliminary results, and usage data for power retailers.
SM – Smart Meter
MDMS – Meter Data Management System
RDB – Relational Database
27. 3. Smart City – Disaster-tolerant Ishinomaki City
GridDB was used for its high-speed processing of large data, long-term data retention and ability to maintain consistency
Post-2011 disaster recovery plan of Ishinomaki City
28. PoC of Consignment Charge Calculation System
• Data from 30 million smart meters is collected every 30 minutes and is stored for 1 month
• Data size is approximately 8.6TB
• 43 billion records
• Record size of 200 bytes
• 1 month charge calculation for 30 million meters' data was executed in 96 minutes
Figure: PoC data flow. 30 million smart meters (SM) send data through the MDMS into GridDB (8.6TB), with MapReduce used for processing. Stages shown: Data Input (30M data), Associating Contract Info. (30M data), Charge Calculation (43G records), Imbalance (43G records). Clusters shown: 5 node cluster, 6 node cluster.
Execution times shown: 1 min 47 secs, 9 mins, 30 mins, 55 mins
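The four execution times on this slide add up to the quoted 96 minutes. A small check with `java.time.Duration`; `StageTimeTotal` and `total` are hypothetical names, and which time belongs to which stage is not assumed here.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// Sums the per-stage execution times listed on the slide.
final class StageTimeTotal {

    static Duration total(List<Duration> stages) {
        Duration sum = Duration.ZERO;
        for (Duration stage : stages) {
            sum = sum.plus(stage);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Duration> stages = Arrays.asList(
                Duration.ofMinutes(1).plusSeconds(47),
                Duration.ofMinutes(9),
                Duration.ofMinutes(30),
                Duration.ofMinutes(55));
        Duration sum = total(stages);
        System.out.println(sum.toMinutes() + " min " + (sum.getSeconds() % 60) + " s");
    }
}
```

The sum is 5,747 seconds, that is 95 minutes 47 seconds, which rounds to the 96 minutes reported for the one-month charge calculation.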
32. Languages and Connectors
• GridDB Community Edition is open sourced and is available on GitHub
• https://github.com/griddb
• Currently supports Java, C/C++, REST, Python & Ruby interfaces
• Go, PHP, Perl and JavaScript drivers will be added in the coming months
• MapReduce connector is available on GitHub
• https://github.com/griddb/griddb_hadoop_mapreduce
• KairosDB connector is available on GitHub
• https://github.com/griddb/griddb_kairosdb
• Spark connector was recently released on GitHub
• https://github.com/griddb/griddb_spark
• Kafka-GridDB integration blog post is up on www.griddb.net website
33. GridDB feature set
Horizontal scaling is near-linear and works great on commodity hardware
• Tested on 100 nodes per cluster, can scale up to 1000 nodes
GridDB's advanced master-slave model eliminates SPOF and split brain
Autonomous data distribution prevents data loss
ACID transactions are guaranteed at the container level
TQL, an SQL-like language for fast querying and analytics
GridDB’s hybrid composition of In-Memory and Disk architecture is optimized for maximum performance
GridDB is custom designed for IoT and other use cases that involve Time Series operations
• TS data types, temporal based querying, geometry type and BLOB types are supported
• Vector sets data type support is in development
38. YCSB
Yahoo! Cloud Serving Benchmark (YCSB) is an open source benchmarking suite designed by Yahoo Labs for comparative performance evaluation of NoSQL Database Management Systems
• YCSB is used by DBMS vendors for ‘Benchmark Comparison’
• Traditional benchmarking tools such as TPC (Transaction Processing Performance Council) are used to compare RDBMS
• YCSB measures/compares various attributes of the DBMS such as Latency, Throughput, Durability, Scalability, Availability, Read/Write optimization, Sync/Async replication etc.
YCSB has 2 main parts
• YCSB Client – an extensible workload generator
   • The standard workloads generated by the client can be extended with user-defined workloads to run against the system (the DBMS)
• YCSB Core Workloads – a set of scenarios generated by the client to run on the system under test
   • Core workloads give a well-rounded picture of the performance of the system under test
39. YCSB Workloads
YCSB has 6 core workloads
• Workload A - Update heavy: This workload has a 50/50 mix of reads and writes. Application example: a session store recording recent actions
• Workload B - Read mostly: This workload has a 95/5 read/write mix. Application example: photo tagging; adding a tag is an update, but most operations are to read tags
• Workload C - Read only: This workload is 100% read. Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
• Workload D - Read latest: In this workload, new records are inserted, and the most recently inserted records are the most popular. Application example: user status updates; people want to read the latest
• Workload E - Short ranges: In this workload, short ranges of records are queried, instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)
• Workload F - Read-modify-write: In this workload, the client will read a record, modify it, and write back the changes. Application example: user database, where user records are read and modified by the user or to record user activity
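The read/write mixes quoted for workloads A, B and C amount to choosing each operation at random with a fixed read proportion. The sketch below illustrates that idea only; it is not the YCSB client code, and `WorkloadMixSketch`, `nextOp` and `countReads` are hypothetical names.

```java
import java.util.Random;

// Illustrative operation chooser for a read/update mix (not YCSB code).
final class WorkloadMixSketch {

    enum Op { READ, UPDATE }

    // Picks the next operation so that, over many calls, roughly
    // readProportion of them are reads and the rest are updates.
    static Op nextOp(Random rng, double readProportion) {
        return rng.nextDouble() < readProportion ? Op.READ : Op.UPDATE;
    }

    // Counts how many reads a seeded run of the given length produces.
    static long countReads(long seed, double readProportion, int operations) {
        Random rng = new Random(seed);
        long reads = 0;
        for (int i = 0; i < operations; i++) {
            if (nextOp(rng, readProportion) == Op.READ) {
                reads++;
            }
        }
        return reads;
    }
}
```

A read proportion of 0.5 corresponds to workload A, 0.95 to workload B and 1.0 to workload C, where every operation is a read.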