The talk will focus on explaining why operational databases do not scale due to limitations in legacy transactional management.
https://www.bigdataspain.org/2017/talk/end-of-the-myth-ultra-scalable-transactional-management
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-Peris at Big Data Spain 2017
2. The End of a Myth: Ultra-Scalable Transactional Management
Presented by: Ricardo Jiménez-Peris, CEO & Co-founder @ LeanXcale
3. About the Speaker
Top researcher on scalable transactional management and distributed data management, with 100+ publications in top conferences and journals
Co-author of a book on database replication
Professor of distributed systems and data management for over 25 years
Co-inventor of two granted patents and 8 new patent applications
Invited speaker at top tech companies in Silicon Valley, such as Facebook, Twitter, Salesforce, Heroku, EMC-Pivotal (when it was EMC-Greenplum), HP, Microsoft
4. About LeanXcale
Vendor of a NewSQL ultra-scalable database: Full ACID, Full SQL
LeanXcale HTAP database: blending operational and analytical capabilities, delivering real-time data
LeanXcale leverages an ultra-efficient storage engine, which is a relational key-value data store
Product Team (figures shown: 45%, 30%, 15)
Awards (total number)
PhD holders
10-25 years of industry expertise
Engineers from top technical universities
5. The Myth
“Operational databases cannot scale”
WHY?
Nobody has managed to scale them in three decades.
Some say that is due to the CAP Theorem, typically vendors that do not provide ACID properties.
6. C - Consistency
A - Availability
P - Partition tolerance
The CAP theorem states something very well known in distributed systems: if you want to tolerate partitions, you must choose between
Availability at all nodes and no consistency
OR
Consistency and no availability at all nodes
The CAP Theorem
Q: Where is the S of Scalability?
A: Nowhere
7. Solved how to scale transactions to a large scale (i.e., 100 million update transactions per second) in a fully seamless way
Breakthrough result of 15+ years of research by a tenacious team
The End of the Myth: Ultra-Scalable Transactions
8. Evaluation without data manager/logging, to see how much throughput the transactional processing can attain
2.35 million transactions per second
Scalability
14. Separation of commit from the visibility of committed data
Proactive pre-assignment of commit timestamps to committing transactions
Transactions can commit in parallel because:
• They do not conflict
• They already have their commit timestamp assigned, which determines their serialization order
• Visibility is regulated separately to guarantee the reading of fully consistent states
Detection and resolution of conflicts before commit
Main Principles
16. Local Transaction Manager
Get start TS
Run on start TS snapshot
Conflict Manager
The transaction will read the state as of “start TS”.
Write-write conflicts are detected by conflict managers on the fly.
Transactional Life Cycle: Execution
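The on-the-fly write-write conflict check can be illustrated with a minimal sketch. It is not LeanXcale's implementation: the class `ConflictManager`, its methods, and the rule used here (reject a write if another active transaction already wrote the key, or if the key was committed after the writer's start TS) are assumptions chosen to match the usual snapshot isolation rule.

```python
class ConflictManager:
    """Hypothetical conflict manager: tracks writers per key (illustrative only)."""

    def __init__(self):
        self.pending = {}         # key -> id of the active transaction writing it
        self.last_commit_ts = {}  # key -> commit timestamp of the last committed write

    def try_write(self, txn_id, start_ts, key):
        """Return True if the write is allowed, False on a write-write conflict."""
        holder = self.pending.get(key)
        if holder is not None and holder != txn_id:
            return False  # a concurrent, still-active transaction wrote this key
        if self.last_commit_ts.get(key, 0) > start_ts:
            return False  # key was overwritten after this transaction's snapshot
        self.pending[key] = txn_id
        return True

    def finish(self, txn_id, commit_ts=None):
        """Release the keys held by txn_id; record commit_ts if it committed."""
        for key in [k for k, t in self.pending.items() if t == txn_id]:
            del self.pending[key]
            if commit_ts is not None:
                self.last_commit_ts[key] = commit_ts


cm = ConflictManager()
ok_t1 = cm.try_write("t1", start_ts=10, key="row-7")  # first writer wins
ok_t2 = cm.try_write("t2", start_ts=10, key="row-7")  # conflicts with active t1
cm.finish("t1", commit_ts=11)
ok_t3 = cm.try_write("t3", start_ts=10, key="row-7")  # row committed at 11 > 10
ok_t4 = cm.try_write("t4", start_ts=11, key="row-7")  # snapshot 11 already sees t1
```

Because the check happens at write time, a conflicting transaction is rejected before it reaches commit, which is what allows the remaining transactions to commit in parallel.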
17. Get start TS
Run on start TS snapshot
Commit
The local transaction manager orchestrates the commit.
Local Txn Manager
Transactional Life Cycle: Commit
19. Sequence of timestamps received by the Snapshot Server (in order of arrival over time): 11, 15, 12, 14, 13
Evolution of the current snapshot at the Snapshot Server: 11, 11, 12, 12, 15
Transactional Life Cycle: Commit
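The snapshot evolution on this slide is consistent with a simple rule: the current snapshot is the largest timestamp such that no earlier timestamp is still missing. The sketch below replays the arrival order 11, 15, 12, 14, 13 under that rule. The class name `SnapshotServer`, the method `report`, and the initial snapshot of 10 are assumptions made for illustration.

```python
class SnapshotServer:
    """Hypothetical snapshot server: advances over the gap-free prefix."""

    def __init__(self, initial_snapshot):
        self.snapshot = initial_snapshot  # every timestamp <= snapshot is visible
        self.received = set()             # timestamps received but not yet visible

    def report(self, commit_ts):
        """Record a commit timestamp and return the resulting current snapshot."""
        self.received.add(commit_ts)
        # Advance only while the next consecutive timestamp has arrived,
        # so readers never observe a state with a gap in it.
        while self.snapshot + 1 in self.received:
            self.snapshot += 1
            self.received.remove(self.snapshot)
        return self.snapshot


server = SnapshotServer(initial_snapshot=10)  # assumed starting point
evolution = [server.report(ts) for ts in [11, 15, 12, 14, 13]]
print(evolution)  # [11, 11, 12, 12, 15]
```

Timestamp 15 arrives second but only becomes visible once 12, 13 and 14 have all been reported, which is how commit is kept separate from visibility.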
20. The approach described so far is the original reactive approach.
It results in multiple messages per update transaction.
The adopted approach is proactive:
• The local transaction managers periodically report the number of committed update transactions per second
• The commit sequencer distributes batches of commit timestamps to the local transaction managers
• The snapshot server periodically receives batches of timestamps (both used and discarded) from local transaction managers
• The snapshot server periodically reports the most current consistent snapshot to local transaction managers
Increasing Efficiency
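A rough sketch of the proactive scheme, under simplifying assumptions: batch sizes are fixed rather than derived from the reported commit rates, reporting is triggered by hand instead of periodically, and all class and function names are hypothetical. It shows why discarded timestamps must be reported too: otherwise the unused part of a batch would leave a permanent gap and the snapshot could not advance.

```python
class CommitSequencer:
    """Hands out disjoint batches of consecutive commit timestamps."""

    def __init__(self, first_ts=1):
        self.next_ts = first_ts

    def allocate_batch(self, size):
        batch = list(range(self.next_ts, self.next_ts + size))
        self.next_ts += size
        return batch


class LocalTransactionManager:
    """Commits locally using pre-assigned timestamps, with no per-commit message."""

    def __init__(self):
        self.available = []  # pre-assigned timestamps not yet used
        self.used = []       # timestamps consumed by committed transactions

    def receive_batch(self, batch):
        self.available.extend(batch)

    def commit(self):
        ts = self.available.pop(0)  # raises IndexError if the batch is exhausted
        self.used.append(ts)
        return ts

    def report(self):
        """Return (used, discarded) timestamps and reset the local state."""
        used, discarded = self.used, self.available
        self.used, self.available = [], []
        return used, discarded


def advance_snapshot(snapshot, accounted):
    """Advance the snapshot over the gap-free prefix of accounted timestamps."""
    while snapshot + 1 in accounted:
        snapshot += 1
    return snapshot


seq = CommitSequencer(first_ts=1)
ltm_a, ltm_b = LocalTransactionManager(), LocalTransactionManager()
ltm_a.receive_batch(seq.allocate_batch(3))  # timestamps 1, 2, 3
ltm_b.receive_batch(seq.allocate_batch(3))  # timestamps 4, 5, 6
a1, b1, b2 = ltm_a.commit(), ltm_b.commit(), ltm_b.commit()

accounted = set()
accounted.update(*ltm_a.report())  # used [1], discarded [2, 3]
snapshot_after_a = advance_snapshot(0, accounted)
accounted.update(*ltm_b.report())  # used [4, 5], discarded [6]
snapshot_after_b = advance_snapshot(snapshot_after_a, accounted)
```

In this sketch each local transaction manager exchanges one message per batch and one per report, instead of several messages per update transaction.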
21. The transactional management provides ultra-scalability
Fully transparent:
• No sharding.
• No a priori knowledge required about the rows to be accessed.
• Syntactically: no changes required in the application.
• Semantically: equivalent behavior to a centralized system.
Provides Snapshot Isolation (the isolation level provided by Oracle when set to “Serializable” isolation).
Transactional Processing
22. SQL Engine: OLTP & OLAP Query Engine
Transaction Manager: Ultra-Scalable Transactions
Storage: KiVi Key-Value Data Store
Architecture
23. Cutting costs of business analytics by 80%
Real-time Analytical Queries
No more ETLs
Analytical Queries
on Operational Data
Operational Database
OLTP
Data Warehouse
OLAP
OLTP + OLAP
Blending OLTP & OLAP:
Making Decisions at the Right Time
25. LeanXcale is the first database technology that can replace the mainframe.
It can bear the operational workloads of a mainframe while providing real-time analytics over the operational data.
It can be deployed alongside the mainframe and loaded/updated in real time, so that applications can be offloaded from the mainframe one by one.
LeanXcale is partnering with Bull Atos to provide a database appliance that will serve as the substitute for the mainframe.
Offloading/Substituting Mainframe
26. Enables implementing Customer Experience Management (CEM) with half the number of nodes.
Leverages the computation of aggregates in real time as raw KPIs are inserted.
Analytical aggregation queries become simple single-row queries.
Elasticity makes it possible to substantially reduce the operations personnel cost during non-working hours with low loads.
Reducing Cost of Ownership at
Telcos
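The idea of computing aggregates as raw KPIs are inserted can be sketched as follows. This is an in-memory illustration only: the class `KpiStore`, the per-cell grouping, and the choice of an average as the aggregate are assumptions, and in a transactional database the raw insert and the aggregate update would presumably run in the same transaction.

```python
class KpiStore:
    """Hypothetical store that maintains one aggregate row per key on insert."""

    def __init__(self):
        self.raw = []         # raw KPI samples: (cell_id, value)
        self.aggregates = {}  # cell_id -> [count, total], one row per cell

    def insert(self, cell_id, value):
        """Store the raw KPI and update the matching aggregate row."""
        self.raw.append((cell_id, value))
        row = self.aggregates.setdefault(cell_id, [0, 0.0])
        row[0] += 1
        row[1] += value

    def average(self, cell_id):
        """Single-row lookup instead of scanning all raw KPIs."""
        row = self.aggregates.get(cell_id)
        if row is None:
            return None
        count, total = row
        return total / count


store = KpiStore()
store.insert("cell-1", 10.0)
store.insert("cell-1", 20.0)
store.insert("cell-2", 5.0)
avg_cell_1 = store.average("cell-1")
avg_missing = store.average("cell-9")
```

The cost of the aggregation moves from query time to insert time, so the analytical query touches one row regardless of how many raw KPIs have been ingested.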
27. Using the key-value interface for large-scale data ingestion in IoT applications, while the data remains accessible through SQL, reducing the infrastructure needed severalfold.
Real-time analytics.
Computation of aggregates in real time to reduce the cost of analytical aggregation queries, e.g., for the smart grid.
Elasticity makes it possible to adjust resource consumption to the load received.
Large IoT Applications
28. Using the key-value interface to reduce the footprint needed to get clicks
Real-time analytics for implementing availability checking
Elasticity makes it possible to adjust resource consumption to the load received
Full ACID guarantees to keep recorded sales consistent with actual availability
Disrupting Travel Tech