Omid: A Transactional Framework for HBase
Francisco Perez-Sorrosal
Ohad Shacham
Hadoop Summit SJ
June 29th, 2016
Outline
 Background
 Basic Concepts
 Use cases
 Architecture
 Transaction Management
 High Availability
 Performance
 Summary
Background
 New Big Data apps → new requirements:
● Low latency
● Incremental data processing
● e.g. Percolator
 Multiple clients updating the same data concurrently
● Problem: conflicts/inconsistencies may arise
● Solution: transactional access to data
Background
 Transaction → abstract unit of work (UoW) to manage data with certain guarantees
● ACID
● Relational databases
 Big Data → NoSQL datastores → transactions in NoSQL
● Relaxed guarantees:
○ e.g. Atomicity, Consistency
● Hard to scale:
○ Data partitioning
○ Data replication
Omid is a…
 Flexible
 Reliable
 Highly performant
 Scalable
…OLTP framework that allows Big Data apps to execute ACID transactions on top of HBase
[Omid + HBase = consistency in Big Data apps]
Why use Omid?
 Simplifies development of apps requiring consistency
● Multi-row/multi-table transactions on HBase
● Simple & well-known interface
 Good performance & reliability
 Lock-free
 Snapshot Isolation
 HBase is treated as a black box
● No HBase code modifications
● No changes to table schemas
 Used successfully at Yahoo
Snapshot Isolation
▪ Transaction T2 overlaps in time with T1 & T3, but spatially:
● T1 ∩ T2 = ∅
● T2 ∩ T3 = { R4 } → Transactions T2 and T3 conflict
▪ Transaction T4 does not have conflicts
[diagram: per-transaction timelines for T1–T4 showing their time overlap and their spatial overlap, i.e. their writesets over rows R1–R4; the conflict rule is sketched below]
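The conflict rule the diagram illustrates can be sketched in code. The following is a minimal, illustrative Java sketch (not Omid's implementation; the class and method names are hypothetical): under Snapshot Isolation, two transactions have a write-write conflict only if they overlap in time and their writesets intersect.

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative sketch (hypothetical names, not Omid's code): a write-write
    // conflict under Snapshot Isolation requires BOTH time overlap and writeset overlap.
    final class TxInfo {
        final long startTs;          // start timestamp
        final long commitTs;         // commit timestamp assigned at commit
        final Set<String> writeSet;  // rows written by the transaction

        TxInfo(long startTs, long commitTs, Set<String> writeSet) {
            this.startTs = startTs;
            this.commitTs = commitTs;
            this.writeSet = writeSet;
        }

        // True if the two transactions ran concurrently (time overlap).
        static boolean overlapInTime(TxInfo a, TxInfo b) {
            return a.startTs < b.commitTs && b.startTs < a.commitTs;
        }

        // True if the transactions wrote at least one common row (spatial overlap).
        static boolean writeSetsIntersect(TxInfo a, TxInfo b) {
            Set<String> common = new HashSet<>(a.writeSet);
            common.retainAll(b.writeSet);
            return !common.isEmpty();
        }

        static boolean conflict(TxInfo a, TxInfo b) {
            return overlapInTime(a, b) && writeSetsIntersect(a, b);
        }
    }

With the transactions on the slide, conflict(T2, T3) is true (they overlap in time and both wrote R4), while conflict(T1, T2) is false because their writesets are disjoint.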
Use Cases: Sieve @ Yahoo
[diagram: Sieve's transactional data flow on top of HBase with Omid. Content from the Internet passes through Crawler, Doc Proc, and Aggregation stages, with Notifications between stages, and a Feeder publishes the results to the Real-Time Index.]
Use Cases: Hive Metastore Thrift Server
[diagram: the Hive Metastore Thrift Server with two storage backends: the ObjectStore on a relational database, and an HBaseStore that uses Omid on top of HBase.]
Architectural Components
[architecture diagram]
● Transactional App + Omid Client: starts/commits TXs against the TSO and reads/writes data in the app tables
● Transaction Status Oracle (TSO) + Timestamp Oracle: provides start/commit timestamps, keeps track of and validates TXs, and persists commit data to the Commit Table
● Commit Table and Compactor (in HBase)
● App Tables + Shadow Cells (in HBase): hold the application data; shadow cells guarantee SI
Client APIs
▪ Transaction Manager → Create Transactional contexts
Transaction begin();
void commit(Transaction tx);
void rollback(Transaction tx);
▪ Transactional Tables (TTable) → Data access
Result get(Transaction tx, Get g);
void put(Transaction tx, Put p);
ResultScanner getScanner(Transaction tx, Scan s);
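A minimal usage sketch of this client API, based on the Apache Omid examples; the exact factory and constructor signatures (e.g. HBaseTransactionManager.newInstance() and new TTable("MY_TABLE")) may differ between Omid releases, so treat this as illustrative rather than definitive.

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.omid.transaction.HBaseTransactionManager;
    import org.apache.omid.transaction.TTable;
    import org.apache.omid.transaction.Transaction;
    import org.apache.omid.transaction.TransactionManager;

    public class OmidClientExample {
        public static void main(String[] args) throws Exception {
            // The transaction manager talks to the TSO; TTable wraps a regular HBase table.
            TransactionManager tm = HBaseTransactionManager.newInstance();
            try (TTable txTable = new TTable("MY_TABLE")) {
                Transaction tx = tm.begin();                      // obtains the start timestamp

                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
                txTable.put(tx, put);                             // write within the TX context

                txTable.get(tx, new Get(Bytes.toBytes("row2"))); // read from the TX's snapshot

                tm.commit(tx);                                    // conflict check + commit at the TSO
            }
        }
    }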
TX Management (Begin TX phase)
[sequence diagram: App, Omid Client, TSO, TO, Table/SC (data table and shadow cells), Commit Table]
● App → Omid Client: Begin TX; the client gets a start timestamp from the TSO (ST=1) and creates the TX context
● The App then issues R/W ops within the TX context against the table/shadow cells and gets back the R/W results for the TX with ST=1
● Read ops: get the right results for the TX's snapshot (see the visibility sketch below)
● Write ops: build the writeset for the TX
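The read-path rule above ("get the right results for the TX's snapshot") reduces to a per-version visibility check once the writer's commit timestamp is known (from a shadow cell or from the Commit Table). A simplified, illustrative sketch with hypothetical names:

    // Illustrative sketch of snapshot visibility (hypothetical; not Omid's actual code).
    final class SnapshotVisibility {
        // A cell version belongs to the reader's snapshot only if the transaction that
        // wrote it committed before the reader's snapshot was taken. A commit timestamp
        // of 0 means "not committed yet" (a convention used only in this sketch).
        static boolean visibleInSnapshot(long writerCommitTs, long readerStartTs) {
            return writerCommitTs != 0 && writerCommitTs < readerStartTs;
        }
    }

For instance, a version committed at CT=2 is visible to a reader whose snapshot started at ST=3.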
TX Management (Commit TX Phase)
[sequence diagram]
● App → Omid Client: Commit TX; the client sends the TX's writeset to the TSO
● TSO: checks conflicts of the TX writeset in its conflict map (sketched below) and, if none, gets a commit timestamp (CT=2)
● TSO persists the commit details (ST/CT) for the TX to the Commit Table and returns TX(CT=2) to the client
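The conflict check in the TSO can be pictured with a deliberately simplified, single-node sketch (illustrative only; the real TSO is far more involved, but the core idea is a map from written cells to the commit timestamp of their last writer):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Illustrative sketch of the TSO commit logic (hypothetical names, not Omid's code).
    final class SimpleTso {
        private long clock = 0;                                         // timestamp source (Timestamp Oracle role)
        private final Map<String, Long> conflictMap = new HashMap<>();  // cell -> last commit timestamp

        synchronized long begin() {
            return ++clock;                                             // start timestamp
        }

        // Returns the commit timestamp, or -1 if the transaction must abort.
        synchronized long commit(long startTs, Set<String> writeSet) {
            for (String cell : writeSet) {
                Long lastCommit = conflictMap.get(cell);
                if (lastCommit != null && lastCommit > startTs) {
                    return -1;                                          // a concurrent TX already committed this cell
                }
            }
            long commitTs = ++clock;
            for (String cell : writeSet) {
                conflictMap.put(cell, commitTs);                        // remember the new versions
            }
            // The real TSO would also persist (startTs, commitTs) to the Commit Table here.
            return commitTs;
        }
    }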
TX Management (Complete TX Phase)
[sequence diagram]
● The Omid Client updates the shadow cells for the TX (ST=1/CT=2) in the data table
● It then completes the commit by cleaning up the entry for the TX with ST=1 in the Commit Table (sketched below)
● The result is returned to the App
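The completion step can be sketched as follows (illustrative; DataStore and CommitTable below are hypothetical stand-ins for the HBase app table and the Commit Table):

    import java.util.Set;

    // Illustrative sketch of the post-commit "complete" phase (hypothetical interfaces).
    final class CommitCompleter {
        interface DataStore   { void putShadowCell(String cell, long startTs, long commitTs); }
        interface CommitTable { void deleteEntry(long startTs); }

        static void complete(long startTs, long commitTs, Set<String> writeSet,
                             DataStore dataStore, CommitTable commitTable) {
            // Write a shadow cell carrying the commit timestamp next to every cell the TX
            // wrote, so future readers can resolve visibility without the Commit Table...
            for (String cell : writeSet) {
                dataStore.putShadowCell(cell, startTs, commitTs);
            }
            // ...then clean up the TX's entry in the Commit Table.
            commitTable.deleteEntry(startTs);
        }
    }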
High Availability
[architecture diagram: same components as before (Transactional App + Omid Client, TSO + Timestamp Oracle, Commit Table, Compactor, App Tables + Shadow Cells on HBase)]
● Problem: the TSO is a single point of failure
High Availability
[architecture diagram: as before, now with primary/backup TSO instances (each with its Timestamp Oracle) and recovery state]
High Availability – Failing Scenario
[sequence diagram: App, Omid Client, TSO P (primary), TSO B (backup), TO, Table/SC, Commit Table]
● TX 1 begins: the client gets ST=1 from the primary TSO → TX(ST=1)
● TX 1 writes (k1, v1): the data store now holds (k1, v1, 1)
High Availability – Failing Scenario
● TX 1 writes (k2, v2): the data store holds (k1, v1, 1) and (k2, v2, 1)
● TX 1 commits with writeset {k1, k2}: the primary TSO gets CT=2 and persists the commit details for TX 1 to the Commit Table
High Availability – Failing Scenario
[sequence diagram: the backup TSO (TSO B) is now serving requests]
● TX 3 begins: the client gets ST=3 from TSO B → TX(ST=3)
● TX 3 reads k1 (ST=3); the data store holds (k1, v1, 1) and (k2, v2, 1)
High Availability – Failing Scenario
● Resolving the k1 read: (k1, v1, 1) has no shadow cell, and TX 1's CT does not yet exist in the Commit Table (! exist), so TX 1's version of k1 is not included in TX 3's snapshot
● TX 3 then reads k2 (ST=3): it finds (k2, v2, 1), and by now the Commit Table holds TX 1's entry (1, 2), so the lookup returns CT=2 and the read returns v2
● Result: TX 3 observes an inconsistent snapshot, missing TX 1's write to k1 but seeing its write to k2
High Availability
[architecture diagram: the same high-availability setup (Transactional App + Omid Client, primary/backup TSO + Timestamp Oracle, Commit Table, Compactor, App Tables + Shadow Cells, recovery state)]
High Availability – Solution
[sequence diagram: as in the failing scenario, but each TX now also carries the epoch of the TSO that issued its timestamps]
● TX 1 begins: the client gets ST=1 from the primary TSO with epoch E=1 → TX(ST=1, E=1)
● TX 1 writes (k1, v1): the data store holds (k1, v1, 1)
High Availability – Solution
● TX 1 writes (k2, v2): the data store holds (k1, v1, 1) and (k2, v2, 1)
● TX 1 commits with writeset {k1, k2}: the primary TSO gets CT=2 and persists the commit details for TX 1 to the Commit Table
High Availability – Solution
● TX 3 begins on the backup TSO: ST=3 with epoch E=3 → TX(ST=3, E=3)
● TX 3 reads k1 (ST=3); the data store holds (k1, v1, 1) and (k2, v2, 1)
High Availability – Solution
● Resolving the k1 read: (k1, v1, 1) has no shadow cell and no Commit Table entry for TX 1 (! exist)
● The client therefore tries to invalidate TX 1 in the Commit Table, writing (1, -, invalid); TX 1 is now marked Invalid
● TX 3 then reads k2 (ST=3) and finds (k2, v2, 1)
High Availability – Solution
● Resolving the k2 read: again no shadow cell and no valid Commit Table entry for TX 1 (! exist), so TX 1's version of k2 is also excluded from the snapshot
● Even if TX 1's commit timestamp is later persisted, the entry stays invalidated, (1, 2, invalid), so TX 1 remains invisible and TX 3 sees a consistent snapshot
High Availability
 No runtime overhead in the mainstream (failure-free) execution
• Minor overhead after failover
 TSO uses regular writes
 Leases for leader election
• Lease status checked before/after writing to the Commit Table (sketched below)
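The lease check around Commit Table writes can be sketched roughly as follows (illustrative only; the interfaces are hypothetical and this is not Omid's actual HA code):

    // Illustrative sketch (hypothetical names): the TSO only treats a commit as published
    // if it held the leader lease both before and after the Commit Table write, so a
    // deposed primary cannot silently make transactions visible after a failover.
    final class HaCommitPublisher {
        interface Lease       { boolean stillHeld(); }
        interface CommitTable { void persist(long startTs, long commitTs); }

        private final Lease lease;
        private final CommitTable commitTable;

        HaCommitPublisher(Lease lease, CommitTable commitTable) {
            this.lease = lease;
            this.commitTable = commitTable;
        }

        // Returns true if the commit was safely published, false if leadership was lost.
        boolean publishCommit(long startTs, long commitTs) {
            if (!lease.stillHeld()) {
                return false;                       // lease check before writing
            }
            commitTable.persist(startTs, commitTs); // regular write, no special fencing
            return lease.stillHeld();               // lease check after writing
        }
    }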
Perf. Improvements: Read-Only Txs
[sequence diagram: App, Omid Client, TSO/TO, Table/SC]
● The App begins a TX (ST=1) and issues only read ops within the TX context; reads return the results in the TX's snapshot
● At commit time the writeset is ∅, so there is no need to contact the TSO: the commit succeeds immediately (sketched below)
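The read-only fast path amounts to a simple guard at commit time; an illustrative sketch with hypothetical names:

    import java.util.Set;

    // Illustrative sketch of the read-only commit fast path (not Omid's actual code).
    final class CommitFastPath {
        interface TsoClient { long requestCommit(long startTs, Set<String> writeSet); }

        // Returns true if the transaction commits.
        static boolean commit(long startTs, Set<String> writeSet, TsoClient tso) {
            if (writeSet.isEmpty()) {
                // A read-only TX cannot conflict with anyone: skip the round trip to the TSO.
                return true;
            }
            long commitTs = tso.requestCommit(startTs, writeSet);
            return commitTs > 0;                     // > 0 means a commit timestamp was granted
        }
    }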
Perf. Improvements: Commit Table Writes
[two diagrams contrasting the path that commit data takes between the Omid Client, the TSO, and the Commit Table in HBase, before and after the improvement]
Omid Throughput with Improvements
[chart: throughput in Tps × 10³ (0–400) vs. number of Commit Table region servers (1, 2, 4, 6)]
Summary
 Transactions in NoSQL
• Use cases in incremental big data processing
• Snapshot Isolation: Scalable consistency model
 Omid
• Web-scale TPS for HBase
• Reliable and performant
• Battle-tested
• http://omid.incubator.apache.org/
Questions?