Scaling systems using change
propagation across data stores
Jagadeesh Huliyar
I’ll talk about
➔ Need for Tiered Data Stores in scaling systems and
Role of real time data change propagation systems.
◆ Example : Payments system
● Issues in old data tier architecture & motivation for
new one.
◆ Design Choices
➔ Aesop - Real Time Data Change Propagation System
◆ Aesop Scaling and High Availability
Payments - What does it do?
➔ Managing Transactions - Takes the customer through the life cycle of a Payment
Transaction.
➔ Reconciliation and Settlement - Reconcile with Bank and Settle to Merchants.
➔ Fraud Detection - Detect Payment Fraud
➔ Monitoring and Routing - Monitor the success rate of Transactions on various
dimensions and modify routing accordingly.
Data Needs
Use Case | Operation | Requirement | Data Retention
Transaction Flow | Write + Read | ACID + Normalised Structure + Low Latency | Transactions during Life Cycle + All Data related to a Transaction (Data for a Month)
Console | Read + Search | Denormalized | Some attributes of a Transaction
Fraud Detection, Financial Reports, Monitoring | Aggregation + Unique Values for a Transaction Dimension | Aggregation and Large Data access | 1 year
Archival (Regulation) | Reads, Reports | Horizontally scalable data store to store large amounts of data | All the Data
Payments - Old Data Tier Architecture
➔ MySql Master + Hot Standby + Slave.
➔ Application writes to Master. Transactional
and Real Time reads from Master.
➔ Historical Reads from Slave.
➔ Analytical and Aggregation queries onto
slave.
Can one store fit all these use cases?
Multiple Data Stores
➔ From these signs it was apparent that changes
were required in the data tier design.
➔ The one-data-store-fits-all approach had to
change.
➔ The data tier would have to scale horizontally,
and we needed more than one data store.
Multiple Data Stores - Issues?
➔ Data Consistency
➔ Real Time Data Availability across
stores
ETL?
The classic ETL approach has been around for decades and is well defined and well understood. However, it
was not an option for us because:
➔ Data from the secondary stores is used to feed more than just business decisions.
➔ At Payments this data is supposed to feed into REAL TIME use cases like Console, Fraud Detection and
Monitoring Systems.
Dual Writes?
The application itself writes to the destination data stores, synchronously or asynchronously. Alternatively, the
application writes to a Publisher-Subscriber system whose Subscribers are consumers that eventually write to the
Destination Data stores.
➔ Pros : Appears Easy : Application can publish the same event that is being inserted/updated in the
Primary Data Source.
➔ Cons : Difficult to maintain consistency
◆ Writes are not Atomic - Ordering Issues
◆ Updates with non-primary-key where clause.
◆ Application Failures and Crashes.
◆ Manual changes in Primary Data Store will be missed.
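The ordering problem listed above can be illustrated with a small deterministic simulation. This is only a sketch: the two in-memory stores, the transaction states and the chosen interleaving are hypothetical and are not taken from the Payments system.

```python
# Hypothetical illustration of a non-atomic dual write.
# Two writers update the same key; each writes the primary first and the
# secondary later, as two independent steps.
primary = {}
secondary = {}


def write_primary(key, value):
    primary[key] = value


def write_secondary(key, value):
    secondary[key] = value


# Nothing forces both stores to observe the writers in the same order.
interleaving = [
    (write_primary, "txn-1", "AUTHORIZED"),    # writer A, step 1
    (write_primary, "txn-1", "CAPTURED"),      # writer B, step 1
    (write_secondary, "txn-1", "CAPTURED"),    # writer B, step 2
    (write_secondary, "txn-1", "AUTHORIZED"),  # writer A, step 2 (delayed)
]

for step, key, value in interleaving:
    step(key, value)

# The stores now disagree about the final state of the same transaction.
stores_agree = primary["txn-1"] == secondary["txn-1"]
print(primary["txn-1"], secondary["txn-1"], stores_agree)
```

Both writers succeeded, yet the secondary store ends up with a stale state and nothing in the application detects it.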
Log Mining?
A separate application/service extracts changes from the Database commit logs and publishes them. This uses
the same approach that the database itself uses for replication.
➔ Pros : Consistency can be guaranteed as changes are being read from commit logs (bin log in case of
MySql).
➔ Cons
◆ Appears tough, but is definitely possible.
◆ Tightly coupled approach: tied to the mechanism used by the database for replication, to the
commit log format, etc.
Since Consistency across Datastores is of paramount importance to a financial system like Payments, we chose
the Log Mining approach.
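A minimal sketch of the log mining idea, assuming the commit log can be modelled as an ordered list of change events. The `ChangeEvent` type, the `scn` sequence number and the `mine` function are hypothetical names for illustration, not Aesop's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    scn: int    # hypothetical commit sequence number
    key: str
    value: str


def mine(commit_log, checkpoint):
    """Yield the events committed after `checkpoint`, in commit order."""
    for event in commit_log:
        if event.scn > checkpoint:
            yield event


def apply_event(store, event):
    store[event.key] = event.value


# The log holds every committed change exactly once, in commit order.
commit_log = [
    ChangeEvent(1, "txn-1", "INITIATED"),
    ChangeEvent(2, "txn-1", "AUTHORIZED"),
    ChangeEvent(3, "txn-2", "INITIATED"),
    ChangeEvent(4, "txn-1", "CAPTURED"),
]

secondary = {}
checkpoint = 0
for event in mine(commit_log, checkpoint):
    apply_event(secondary, event)
    checkpoint = event.scn  # remember how far we have read

print(secondary, checkpoint)
```

Because the secondary store replays the same ordered log the database wrote, it converges to the primary's state, and changes made outside the application are captured as well.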
Approaches to Log Mining
MySql Bin Log Parsing
➔ Pros : Familiar approach
◆ Open source software that parses MySQL
bin logs was available: Open
Replicator and Tungsten Replicator
➔ Cons
◆ If format of bin logs changes the parser
would have to change.
◆ Open Replicator supported MySQL
version 5.5. We would have to modify
it to support MySQL v5.6 and the
checksum feature introduced in
v5.6.
Custom Storage Engine
➔ Pros : Independent of binlog format. Layers
above Storage Engine take care of parsing.
➔ Cons : Unfamiliar approach. Unknown pitfalls.
We decided to go with the known pitfalls and picked the Bin Log
Parsing approach.
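To show why an event checksum affects a bin log parser, here is a hedged sketch. It assumes each event carries a 4-byte little-endian CRC32 trailer computed over the preceding event bytes. The event bytes below are made up, and `split_and_verify` is a hypothetical helper, not part of Open Replicator.

```python
import struct
import zlib

CHECKSUM_LEN = 4  # assumed size of the CRC32 trailer


def split_and_verify(raw_event: bytes) -> bytes:
    """Return the event payload if its trailing CRC32 matches, else raise."""
    payload = raw_event[:-CHECKSUM_LEN]
    trailer = raw_event[-CHECKSUM_LEN:]
    (expected,) = struct.unpack("<I", trailer)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if actual != expected:
        raise ValueError("checksum mismatch")
    return payload


# Made-up bytes standing in for a real event.
fake_payload = b"made-up event bytes"
raw = fake_payload + struct.pack("<I", zlib.crc32(fake_payload) & 0xFFFFFFFF)

print(split_and_verify(raw) == fake_payload)
```

A parser that is unaware of the trailer would treat the extra bytes as event data, which is why a parser written for an older format needs changes.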
Introducing Aesop - Putting it all together
Reliability and Data Consistency
High Availability, Load Balancing and Scaling - Client Cluster
High Availability, Load Balancing and Scaling - Relay HA
Multiple Relay Servers read from the Source Data Sources.
➔ The Clients connect to the Relay Servers via a load balancer (LB).
➔ Since the requests from clients are over HTTP, either one or
both of the Relay Servers can serve them, depending on
the configuration of the LB.
➔ When one Relay goes down, the other can still handle the
requests.
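The failover behaviour can be sketched with a toy round-robin load balancer. This is an assumption-laden illustration: the `LoadBalancer` class, the relay names and the manual `mark_down` call are hypothetical, and a real LB would detect failures through health checks.

```python
from itertools import cycle


class LoadBalancer:
    """Toy round-robin balancer that skips relays marked as down."""

    def __init__(self, relays):
        self.relays = list(relays)
        self.healthy = set(self.relays)
        self._ring = cycle(self.relays)

    def mark_down(self, relay):
        self.healthy.discard(relay)

    def pick(self):
        # Try each relay at most once per request.
        for _ in range(len(self.relays)):
            relay = next(self._ring)
            if relay in self.healthy:
                return relay
        raise RuntimeError("no healthy relay")


demo = LoadBalancer(["relay-1", "relay-2"])
print([demo.pick() for _ in range(4)])  # both relays serve requests
demo.mark_down("relay-1")
print([demo.pick() for _ in range(2)])  # only the surviving relay serves
```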
Event Transformation
➔ Transforms the event as per the
mapping of source and destination
schema. It maps the source entity to
destination entity. The source
attribute is mapped to destination
attribute within the entity.
➔ A source entity can be mapped to
more than one destination entity
type.
➔ Map-All - one to one
➔ Hierarchical mapping
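The entity and attribute mapping described above can be sketched as a small lookup table. The `MAPPINGS` structure, the entity names and the use of `None` to mean map-all are assumptions made for illustration. They do not reflect Aesop's actual configuration format, and hierarchical mapping is not covered.

```python
# Hypothetical mapping table: one source entity fans out to two destination
# entity types. `None` stands for map-all (every attribute copied one to one).
MAPPINGS = {
    "payment_txn": [
        {"entity": "console_txn", "attributes": {"id": "txn_id", "status": "state"}},
        {"entity": "txn_archive", "attributes": None},
    ],
}


def transform(source_entity, row):
    """Return a list of (destination_entity, destination_row) pairs."""
    events = []
    for mapping in MAPPINGS.get(source_entity, []):
        attrs = mapping["attributes"]
        if attrs is None:
            dest_row = dict(row)  # map-all: copy attributes unchanged
        else:
            dest_row = {dst: row[src] for src, dst in attrs.items()}
        events.append((mapping["entity"], dest_row))
    return events


source_row = {"id": 7, "status": "CAPTURED", "amount": 100}
for entity, dest_row in transform("payment_txn", source_row):
    print(entity, dest_row)
```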
Monitoring
➔ Dashboard
➔ JMX
Summary
➔ Performance
◆ Relay : 1 XL VM (8 core, 32GB)
◆ Consumers : 4XL VM, 200 partitions
◆ Throughput : 20K-30K Inserts per sec
(MySQL to HBase)
◆ Data size : 500 GB
➔ What is it?
◆ Supports multiple data stores
◆ Delivers updates reliably - at least once
◆ Maintains Ordering within every
Partition
◆ Supports varying consumer speeds
➔ What is it not?
◆ Not exactly-once delivery
◆ Not a storage system
◆ No global ordering
➔ Support For
◆ Source
● MySql
● HBase
◆ Destination
● MySql
● HBase
● Elasticsearch
● Kafka
● Mapped Event Stream
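The at-least-once delivery and per-partition ordering listed in the summary imply that consumers may see an event more than once. One common way to cope is sketched below, under assumptions: the CRC32-based `partition_for` function, the per-partition sequence numbers and the `IdempotentConsumer` class are hypothetical and are not Aesop's implementation. Only the partition count of 200 comes from the summary.

```python
import zlib

NUM_PARTITIONS = 200  # partition count quoted in the summary


def partition_for(key: str) -> int:
    """Route every event for a key to the same partition (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class IdempotentConsumer:
    """Skips redelivered events by tracking the last sequence per partition."""

    def __init__(self):
        self.store = {}
        self.last_seq = {}  # partition -> highest applied sequence number

    def handle(self, partition, seq, key, value):
        if seq <= self.last_seq.get(partition, 0):
            return False  # already applied: a redelivery
        self.store[key] = value
        self.last_seq[partition] = seq
        return True


consumer = IdempotentConsumer()
p = partition_for("txn-1")
consumer.handle(p, 1, "txn-1", "AUTHORIZED")
consumer.handle(p, 2, "txn-1", "CAPTURED")
consumer.handle(p, 1, "txn-1", "AUTHORIZED")  # redelivered, ignored
print(consumer.store)
```

Because all events for a key land in one partition and each partition is ordered, tracking one sequence number per partition is enough here; no global ordering is needed.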
More Details
➔ Project
◆ Open Source : https://github.com/Flipkart/aesop
◆ Support : aesop-users@googlegroups.com
◆ Multiple production deployments at Flipkart
➔ Related Work
◆ LinkedIn Databus
◆ Facebook Wormhole
➔ References
◆ Architecture of a Database System : http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
◆ Wormhole Paper: https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf
