We share Scylla adoption practices for equipment sensor data management in an MES (Manufacturing Execution System): data modeling tips, data architecture using Scylla, configuration, and tuning.
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of MES (Manufacturing Execution System)
1. Scylla in Manufacturing
Kuyul Noh & Junghyun Park
Principal Engineers, Samsung SDS
2. Speakers
Kuyul Noh
• 25 years of experience in the ICT industry
• Principal Data Architect at Samsung SDS
• Planning & Leading ScyllaDB projects for Samsung
Junghyun Park
• 10 years of experience in the ICT industry
• Senior Data Architect at Samsung SDS
• Leading ScyllaDB adoption projects for Samsung
3. Agenda
• Samsung SDS?
• Use Case in Manufacturing
• Lessons Learned
• Scylla Managed Service
4. Samsung SDS?
5. Samsung SDS (1/2)
▪ As an “IT Solution & Service Provider”, Samsung SDS plays a pivotal role in improving IT competitiveness across the Samsung Group to become a top tier company in diverse industries
Business areas:
• IT Services: Consulting / SI (Systems Integration), Infrastructure Outsourcing, Application Outsourcing
• Business Solutions: Enterprise Applications, Enterprise Analytics, Enterprise Mobility
• Logistics BPO (Business Process Outsourcing): Supply Chain & Logistics
6. Samsung SDS (2/2)
Global Presence: 57 global offices in 31 countries
• Global HQ: Seoul, Korea
• SDS China: Beijing, China
• SDS Latin America: Sao Paulo, Brazil
• SDS Asia Pacific: Singapore
• SDS America: New Jersey, USA
• SDS India: New Delhi, India
• SDS Europe: Weybridge, UK
• SDS Middle East: Dubai, UAE
Global Footprint
• 4 SW Centers
• 29 Logistics Offices
• 7 Overseas Subsidiaries
• 11 Data Centers
7. Scylla in Samsung SDS
▪ In-depth technical validation of Scylla solution
▪ Signed a Global Partnership Agreement
▪ Deploying Scylla in Samsung
(Manufacturing, IoT Platform, Communication, Healthcare, etc.)
▪ Preparing Scylla Managed Service in Cloud
8. Use Case in Manufacturing
9. Use Case Overview
▪ FDC (Fault Detection & Classification) System
• Gathers sensor data from equipment in real time (e.g. temperature, pressure)
• Stops production lines if specification data exceeds the pre-defined threshold
FDC functions:
➔ Reference data setup for equipment & sensors
➔ Threshold setup for anomaly detection
➔ Dashboard
➔ Data / Trend Viewer
➔ Data Analysis
Storage:
• RDBMS: metadata
• ScyllaDB #1, #2, #3: sensor data
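The threshold rule described on this slide can be sketched in a few lines. This is only an illustration, not the FDC system's code: the `Threshold` type, the sensor names, and the limit values are all hypothetical, and the sketch assumes each sensor has a simple low/high range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Allowed range for one sensor (hypothetical reference data)."""
    low: float
    high: float


def out_of_spec(value: float, threshold: Threshold) -> bool:
    """Return True when a reading falls outside the allowed range."""
    return value < threshold.low or value > threshold.high


def sensors_to_stop(readings: dict[str, float],
                    thresholds: dict[str, Threshold]) -> list[str]:
    """List the sensors whose latest reading violates its threshold.

    Sensors without a configured threshold are skipped, not flagged.
    """
    return sorted(
        sensor_id
        for sensor_id, value in readings.items()
        if sensor_id in thresholds and out_of_spec(value, thresholds[sensor_id])
    )


# Hypothetical sensors and limits, for illustration only.
thresholds = {"temp-01": Threshold(20.0, 80.0), "press-01": Threshold(0.9, 1.1)}
readings = {"temp-01": 85.5, "press-01": 1.0, "flow-01": 3.2}
print(sensors_to_stop(readings, thresholds))
```

In this sketch the thresholds play the role of the reference data kept in the RDBMS, while the readings correspond to the sensor data stored in ScyllaDB.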
10. System Requirements
▪ High throughput (more than 200K Events per Second)
▪ Scalability for production facilities
▪ Lower cost than existing commercial RDBMS (e.g. Oracle Exadata)
▪ Easy deployment and maintenance (Auto Tuning, etc.)
▪ Easy to delete old data (Time To Live, etc.)
11. Performance Test (Cassandra vs. ScyllaDB)
▪ Scylla has 2.3 times higher throughput
▪ HW : 16 Cores / 48GB (3 Nodes)
▪ SW : Scylla 1.5 / Cassandra 3.9
▪ Client : Java Program, 110 Thread Max
[Chart: throughput (axis label "X 100 batch") over Cumulative Time (Seconds); labeled averages: 282,900, 159,400, 124,600; annotation: 2.3x]
12. Legacy Data Schema (Oracle)
▪ Data from each sensor is collected every second
▪ Sensor data occupies more than 80% of the disk
▪ About 19 additional columns (data types) are required
Column          Data Type
SensorId (PK)   NUMBER
Time (PK)       TIMESTAMP
Value           NUMBER
Col1            NUMBER
Col2            NUMBER
Step_cd         VARCHAR2
…               …
(Col1 through …: 19 columns)
13. 1st Design (Scylla)
▪ Added Partition Key (daily partitioning)
▪ Added 19 meta-data columns
▪ Default Configuration
Column         Data type   Key type
PartitionKey   text        PARTITION KEY
SensorId       bigint      PARTITION KEY
Time           timestamp   CLUSTERING KEY
Value          double
Col1           text
Col2           text
Step_cd        text
…              …
(Col1 through …: 19 columns)
14. Challenge #1
▪ Adding the 19 additional columns resulted in an enormous amount of data
→ Defined a UDT (User Defined Type) for each group of columns that are looked up together
CREATE TYPE UDT1 (
  step_cd text,
  …
);
Column         Data type
PartitionKey   text
SensorId       bigint
Time           timestamp
Value          double
Col1           text
Col2           text
Detail1        UDT1 (12 columns)
Detail2        UDT2 (5 columns)
Result: data size reduced by more than 50%
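One way to see why grouping helps is to count the cells stored per row. The sketch below uses only the column counts from the slides (19 metadata columns = Col1 + Col2 + 12 in UDT1 + 5 in UDT2) and assumes that each UDT value is frozen and therefore written as a single cell. The slides do not give a per-cell overhead, so none is modeled here.

```python
# Column groups taken from the slides: 19 metadata columns = Col1 + Col2
# + 12 columns folded into Detail1 (UDT1) + 5 folded into Detail2 (UDT2).
META_PLAIN = 2
META_IN_UDT1 = 12
META_IN_UDT2 = 5


def cells_per_row_flat() -> int:
    """1st design: Value plus every metadata column is its own cell."""
    return 1 + META_PLAIN + META_IN_UDT1 + META_IN_UDT2


def cells_per_row_udt() -> int:
    """UDT design: Value, Col1, Col2, plus one cell per UDT column.

    Assumes each UDT value is frozen and stored as one cell.
    """
    return 1 + META_PLAIN + 2


flat = cells_per_row_flat()
grouped = cells_per_row_udt()
print(flat, grouped)
```

Under that assumption, any per-cell metadata is paid for far fewer cells in every row, which is at least consistent with the reported reduction of more than 50%.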
15. Challenge #2
▪ Some expired data failed to be deleted under the “DateTieredCompaction” policy, and compaction kept looping
→ Resolved through Scylla’s technical support and an urgent patch (#2260); expired data was then deleted
<< Loop Compaction >>
14:31:11 server02 scylla: [shard 0] compaction - Compacted 1
14:31:11 server02 scylla: [shard 0] compaction - Compacting
14:31:12 server02 scylla: [shard 0] compaction - Compacted 1
14:31:12 server02 scylla: [shard 0] compaction - Compacting
14:31:12 server02 scylla: [shard 0] compaction - Compacted 1
14:31:12 server02 scylla: [shard 0] compaction - Compacting
…
<< No Loop Compaction & Expired Data Deletion >>
16. Challenge #3
▪ Large partition size caused slow response
→ Changed daily key partitioning to hourly (34MB ➔ 1.4 MB)
→ Used async queries to process multiple partitions simultaneously
Result: 2x faster read latency
[Diagram: 24 asynchronous queries against ScyllaDB for one day of data; sorted partitions]
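The hourly fan-out can be sketched as follows. This is a minimal illustration under stated assumptions: the `YYYYMMDDHH` key layout is a guess (the slides only say the partition key is text), `fetch_partition` is a hypothetical stand-in for one single-partition query, and a thread pool stands in for the driver's asynchronous API used in the real client.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def hourly_partition_keys(day: date) -> list[str]:
    """Return the 24 hourly keys for one day.

    The 'YYYYMMDDHH' layout is an assumption, not taken from the slides.
    """
    return [f"{day:%Y%m%d}{hour:02d}" for hour in range(24)]


def fetch_partition(sensor_id: int, partition_key: str) -> list[tuple[str, float]]:
    """Hypothetical stand-in for one single-partition SELECT.

    A real client would issue the query through its driver's async API.
    """
    return [(f"{partition_key}:00:00", float(sensor_id))]


def read_day(sensor_id: int, day: date) -> list[tuple[str, float]]:
    """Query all 24 partitions concurrently and merge them in time order."""
    keys = hourly_partition_keys(day)
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        # map() yields results in key order, so the merged rows stay sorted
        # as long as each partition comes back sorted by its clustering key.
        per_hour = pool.map(lambda key: fetch_partition(sensor_id, key), keys)
        return [row for rows in per_hour for row in rows]


rows = read_day(42, date(2017, 10, 24))
print(len(rows), rows[0][0], rows[-1][0])
# Average hourly size if a 34 MB daily partition splits evenly across 24 hours.
print(round(34 / 24, 2))
```

The last line is a sanity check on the slide's numbers: an even split of a 34 MB daily partition gives roughly the 1.4 MB hourly size reported.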
17. Challenge #4
▪ Increased memory usage due to the large size of the CompressionInfo file
→ Changed the chunk_length_kb value from 4 (4 KB chunks) to 64 (64 KB chunks)
Size test (total memory: 20 GB; size of Data.db file: 1.8 TB)
Chunk length   CompressionInfo size (GB)
4k             13
64k            0.8
Non-LSA memory usage decreased: 13GB ➔ 0.8GB
Scylla team's guide
Use case           Recommendation
small single key   smaller chunks
large single key   larger chunks
range scans        larger chunks
mostly writes      larger chunks
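The size test is consistent with simple arithmetic, assuming the compression metadata grows with the number of chunks (roughly one entry per chunk). The sketch below reads the 1.8 TB figure as decimal terabytes, which is an assumption; the ratio does not depend on that choice.

```python
def chunk_count(data_bytes: int, chunk_length_kb: int) -> int:
    """Number of compression chunks needed to cover a data file (ceiling)."""
    chunk_bytes = chunk_length_kb * 1024
    return -(-data_bytes // chunk_bytes)


DATA_DB_BYTES = 18 * 10**11  # 1.8 TB from the slide, read as decimal TB

chunks_4k = chunk_count(DATA_DB_BYTES, 4)
chunks_64k = chunk_count(DATA_DB_BYTES, 64)

predicted_ratio = chunks_4k / chunks_64k  # how much the chunk count shrinks
measured_ratio = 13 / 0.8                 # CompressionInfo sizes from the slide

print(chunks_4k, chunks_64k)
print(round(predicted_ratio, 2), round(measured_ratio, 2))
```

Going from 4 KB to 64 KB chunks cuts the chunk count by a factor of about 16, close to the measured drop from 13 GB to 0.8 GB. The trade-off noted in the guide still applies: larger chunks mean more data is read and decompressed for each small single-key read.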
18. Final Design
▪ Hourly Partitioning + Async Query
▪ UDT (User Defined Type) Columns
▪ chunk_length_kb = 64
Column         Data type          Key type
PartitionKey   text               PARTITION KEY
SensorId       bigint             PARTITION KEY
Time           timestamp          CLUSTERING KEY
Value          double
Col1           text
Col2           text
Detail1        UDT (12 columns)
Detail2        UDT (5 columns)
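As a rough sketch, the final design could be rendered as a CQL statement like the one built below. Several details are assumptions rather than facts from the slides: the keyspace and table names are hypothetical, the UDTs are written as `frozen<udt1>` and `frozen<udt2>` and must already exist (their fields are elided in the source), the compressor is assumed to be LZ4, and the option spelling `sstable_compression` / `chunk_length_kb` follows the older Cassandra 2.x-style syntax.

```python
def final_design_ddl(keyspace: str, table: str, chunk_length_kb: int = 64) -> str:
    """Render a CREATE TABLE statement for the final design.

    Keyspace and table names are hypothetical; column names and types come
    from the slide. udt1 and udt2 must be created beforehand.
    """
    columns = [
        "partitionkey text",      # hourly bucket
        "sensorid bigint",
        "time timestamp",
        "value double",
        "col1 text",
        "col2 text",
        "detail1 frozen<udt1>",   # 12 grouped columns
        "detail2 frozen<udt2>",   # 5 grouped columns
        "PRIMARY KEY ((partitionkey, sensorid), time)",
    ]
    compaction = "{'class': 'DateTieredCompactionStrategy'}"
    # LZ4Compressor is an assumption; the slides only give the chunk length.
    compression = (
        "{'sstable_compression': 'LZ4Compressor', "
        f"'chunk_length_kb': {chunk_length_kb}}}"
    )
    body = ",\n    ".join(columns)
    return (
        f"CREATE TABLE {keyspace}.{table} (\n    {body}\n)"
        f" WITH compaction = {compaction}\n"
        f"  AND compression = {compression};"
    )


ddl = final_design_ddl("fdc", "sensor_data")
print(ddl)
```

The composite partition key `(partitionkey, sensorid)` keeps each sensor's hour of data in its own partition, and `time` as the clustering key keeps rows sorted within it.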
19. Production (1/2)
▪ Read Request
▪ Write Request
✓ As of Now : 3,000 TPS
✓ Near Future : 10,000 TPS
✓ As of Now : 300 TPS
20. Production (2/2)
▪ Reactor
▪ Disk Usage
✓ Data past the retention period (31 days) was confirmed to have been physically deleted
✓ As of Now : 550 GB
✓ Near Future : 3 TB
21. Lessons Learned
22. Lessons Learned
▪ Use the “DateTiered” compaction policy for time series data
▪ A UDT is an alternative when a table has many columns
▪ The smaller the partition size, the better
▪ Consider the async API for faster range read latency
▪ Choose a chunk size that suits memory utilization
• Reference : http://www.scylladb.com/2017/08/01/compression-chunk-sizes-scylla/
23. Voices of the Customer
▪ Very satisfied with the simplicity of the architecture & high performance
▪ Still, some enhancements are required (➔ target Scylla version)
• New storage format (like Cassandra 3.0) ➔ 2.x
• Allow row cache to store incomplete partitions ➔ 2.0
• Hinted handoff ➔ 2.1
• Materialized View ➔ Experimental 2.0, Production 2.2
• Secondary Index ➔ 2.2
• Time Window Compaction Strategy ➔ 2.1
24. Next Step
▪ Through close collaboration with the ScyllaDB team,
we plan to deploy Scylla as a sensor data processing DBMS
across the customer’s overseas production plants in the near future
25. Scylla Managed Service
26. Managed DB Service: Conceptual Architecture
▪ Preparing for Scylla Managed Service in Cloud
[Diagram labels: Service Provider (Admin, BSS, OSS); Service User (Admin, Developer, Applications); Management Interface; Consumer Interface; Client Interface; DB Management Interface; Service Management; Scylla Service Management (Provisioning, DB Operation, Monitoring, Metering, etc.); Infrastructure Controller (IaaS); Infrastructure resource pool; DB image, configurations; DB instances]
27. Service Features
▪ Completed Managed Service features
Category                   Features
Managed Service            Cluster Provisioning, DB Operations, DB Monitoring, Configuration Management, Backup / Restore, Scale In / Out
Enterprise Functionality   Data Migration, Backup Scheduler
Optimization               Threshold Management / Alarm, Cluster Diagnosis
DevOps                     Schema Management, Query Execution
29. Kirke
▪ Joyent Site
✓ Trial service is now available at https://www.joyent.com/
30. Thank You
Any questions?
Contact
hanbada@samsung.com
infordb.park@samsung.com