©2019, Intechsystems SIA
Preparing for Multi-Cloud Operation
20.06.2019
By Konstantin Tjuterev & Oleg Andreyev
Why 2 speakers?
• Fewer slides to prepare for each of us
• We’d like to show our case from 2 perspectives
• Business and high-level architecture
• Nitty-gritty details of implementation
About Oleg Andreyev
• Senior Software Architect @ Intexsys
• 7+ years in Software Development
About Konstantin Tjuterev
• Founder and Chief Architect @ Intexsys
• 20+ years in Software Development
Agenda
• Why do we need Multi-cloud?
• Challenges – why it’s complicated
• Addressing the challenges
Initial state
• Top 500 Online Retailer in USA
• Existing Proprietary E-commerce Platform
• Multi-component stack (PHP/Symfony, MySQL, Elasticsearch,
Cassandra, RabbitMQ, HAProxy, Varnish, Node.js)
• 15 online stores
• 1 000 000+ items sold
• $300 million annual turnover
• Hosted on AWS since 2018
Goals
• Average of $820K daily sales
• Downtime cost is at least $500/minute (820K/24/60)
• In reality, it can go as high as $5000/minute during Black Friday
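The per-minute figure above can be reproduced with a short calculation. This is a minimal sketch that assumes sales are spread evenly across the day; the function name is hypothetical and only the $820K daily average comes from the slide.

```python
DAILY_SALES_USD = 820_000  # average daily sales quoted on the slide


def downtime_cost_per_minute(daily_sales: float) -> float:
    """Average revenue lost per minute, assuming sales are spread evenly over 24 hours."""
    return daily_sales / 24 / 60


per_minute = downtime_cost_per_minute(DAILY_SALES_USD)
per_hour = per_minute * 60

print(f"${per_minute:.2f} per minute")
print(f"${per_hour:,.0f} per hour")
```

The even-spread assumption gives a floor of roughly $569 per minute, which is why the slide rounds down to "at least $500/minute"; peak periods such as Black Friday sit well above this average.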
What IF?
• What if AWS goes down?
• Never happened?
• But it DID
• And multiple times
What AWS outage causes
The four-hour AWS outage caused S&P 500 companies to lose $150
million, Cyence, a startup that models the economic impact of cyber
risk, estimated, a Cyence spokeswoman said via email. US financial
services companies lost $160 million, the firm estimated.
That estimate doesn’t include countless other businesses that rely on
S3, on other AWS services that rely on S3, or on service providers that
built their services on Amazon’s cloud
https://www.datacenterknowledge.com/archives/2017/03/02/aws-outage-that-broke-the-internet-caused-by-mistyped-command
What happened?
What IF?
• What if we have a major problem in one of the (clustered) services?
• Elasticsearch cluster issue
• MySQL master issue
• What if we push the wrong button in some infrastructure/deployment automation tool?
Disaster Recovery Options
• Restoring from back-ups
• Snapshots of virtual machines/Database dumps
• Will have to spin up the whole infrastructure
• Cold stand-by
• A set of prepared but stopped virtual machines
• Database can be started, but dump must be restored
• Hot stand-by
• A set of running virtual machines not serving the traffic
• Running database replicas
Criteria
• Single/Shared points of failure
• Time to recovery / potential losses from outage ($5K/minute)
• Time to switching back after restoring operation of the primary
infrastructure / potential losses
• Cost of implementation / Complexity
• Cost of maintenance
• Additional benefits
Comparison
Option | Point of failure | Time to recovery/switching back | Complexity/Costs
Backup | Single, if backups are in the same cloud (AWS) or restoring would be to the same cloud | Very long: 24h at best (spinning up and reconfiguring the whole infrastructure) | Low / Very low (just storage)
Cold stand-by | Depends (can be put in a different cloud/data center) | Medium: 12+h (database restore), if the cold infrastructure is up to date | High / Medium
Hot stand-by | Depends (can be put in a different cloud/data center) | Low: less than 1h | Very High / High
What is Multi-Cloud Operation?
• Not disaster recovery: production traffic always runs from multiple independent clouds
• No single point of failure
• Almost instant recovery from a cloud outage: the surviving cloud serves all traffic
• No “failover/switching back”: when a cloud is restored after an outage, we simply start sending traffic to it again
• High complexity/cost, but much better reliability
• Continuously live-tested (monitoring, deployment, real customers)
Additional benefits
• Blue/green deployments at whole-infrastructure scale
• Running infrastructure-related experiments in an isolated but production environment
• Ability to benefit from cost differences between cloud providers
(given that we’re paying for disaster recovery anyway)
Why not just AWS Multi-AZ?
• Sometimes AWS fails in all Availability Zones
• Vendor lock-in
• Complexity of Multi-AZ setup is similar to Multi-Cloud, just shifted
• Single cloud setup becomes easier (just use 1 AZ)
• Cross-cloud setup becomes more complicated
• With the same overall complexity we can get better results
• Better protection – no single point of failure
Challenges
• Pushing source data to Multiple Clouds
• Data Synchronization between Clouds
• Deployment
• Dependencies
• Scheduled jobs
• Traffic balancing
• Monitoring/Alerting
Pushing data
• RabbitMQ in the office
• Clouds pulling messages and updating data in real-time
• Incoming traffic in Clouds is free
• Read-only database replication from the office
Data Synchronization between Clouds
• This is the most challenging part
• We need to replicate relational data (such as orders, users) between
multiple clouds
• We’re using MySQL and are not planning to change that
• So, how to replicate data between clouds?
MySQL Real Master-Master replication
• Master MySQL nodes running in different clouds
• Both write binary logs and execute each other's events
• With Multi-Cloud we need to support writes from both clouds
• Initially we were using Auto-increment primary key (as everyone
does)
• It won’t work with Master-Master
What will happen if…
• John Doe and Peter Doe both create an account/order at the same time
• Their requests are handled by different Clouds
Replication conflict
• Replication will stop
• Replication can be fixed manually by ignoring the error
• Multi-Cloud is out of sync
How to avoid such a situation?
• Set up MySQL Cluster
• or set up Percona XtraDB Cluster
• or set up MariaDB Galera Cluster
But...
• “all or nothing approach”
• Your application needs to handle COMMIT
• COMMIT slowness = slowest node in cluster
• Network round-trip time / Certification time / Local apply
• We are not building a cluster…
Other solutions
• Primary Key Auto Increment step for each server (even/odd)
• Primary Key that will not collide
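The even/odd approach can be illustrated with a small simulation. This is a sketch of the idea only, not the speakers' configuration: in MySQL the corresponding server settings are the auto_increment_increment and auto_increment_offset system variables, and the helper name below is hypothetical.

```python
from itertools import count, islice


def auto_increment_ids(offset: int, increment: int):
    """Yield the ids one server would assign: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# Both masters on default settings (offset 1, step 1): every id collides.
default_a = list(islice(auto_increment_ids(1, 1), 5))
default_b = list(islice(auto_increment_ids(1, 1), 5))
collisions = set(default_a) & set(default_b)

# Odd/even split: step 2 on both masters, different offsets.
cloud_a = list(islice(auto_increment_ids(1, 2), 5))
cloud_b = list(islice(auto_increment_ids(2, 2), 5))
overlap = set(cloud_a) & set(cloud_b)

print("collisions with defaults:", sorted(collisions))
print("cloud A ids:", cloud_a)
print("cloud B ids:", cloud_b)
print("overlap with odd/even split:", sorted(overlap))
```

The split removes primary-key collisions, but the step has to be chosen up front for the maximum number of writers, which is one reason to prefer a key that cannot collide by construction.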
Universally unique identifier - UUID
• It’s a 128-bit number
• It’s written as 32 hexadecimal digits (128/4)
• Also referred to as GUID
Versions of UUID
• Nil UUID – special case of UUID in which all 128 bits are zero
• UUID v1 – generated from a time and a node id (MAC address)
• UUID v2 – generated from an identifier, time, and a node id
• UUID v3 – name-based: generated by hashing a namespace and a name (MD5)
• UUID v4 – generated from a random or pseudo-random number
• UUID v5 – same as v3 but using SHA-1
• UUID v6 – optimized version of UUID v1 (unofficial at the time of this talk)
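To make the versions above concrete, here is a minimal sketch using Python's standard uuid module, which can generate versions 1, 3, 4, and 5. The name "example.com" is a placeholder chosen for illustration.

```python
import uuid

nil = uuid.UUID(int=0)                               # all 128 bits zero
v1 = uuid.uuid1()                                    # time + node id
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")   # MD5 of namespace + name
v4 = uuid.uuid4()                                    # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")   # SHA-1 of namespace + name

for label, value in [("nil", nil), ("v1", v1), ("v3", v3), ("v4", v4), ("v5", v5)]:
    # .version is None for the nil UUID because it does not use the RFC 4122 variant.
    print(f"{label}: {value} (version={value.version})")

# 128 bits are 16 raw bytes or 32 hexadecimal digits.
print(len(v4.bytes), "bytes,", len(v4.hex), "hex digits")
```

Versions 3 and 5 are deterministic for the same namespace and name, while versions 1 and 4 produce a new value on every call.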
UUID v1
• It is time-based (sorting will not suffer much)
• It can be stored in an optimized 16-byte form
• Maximum average generation rate: about 163 billion per second per node
• Can be traced back to the server that created it
• Optimized for B-Tree indexes
• Less storage is required for 16 bytes than for 32 characters
• To UUID or not to UUID?
• Storing UUID Values in MySQL
UUID v1 Structure
Optimized UUID v1
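The slide image is not available in this transcript, so the following is a sketch of one common rearrangement rather than the speakers' exact layout: moving the high timestamp bits to the front, which is what MySQL 8.0's UUID_TO_BIN(uuid, 1) does. The helper names and the sample timestamps are hypothetical.

```python
import uuid


def to_ordered_bytes(u: uuid.UUID) -> bytes:
    """Reorder a v1 UUID so the most significant timestamp bits come first."""
    b = u.bytes  # time_low(4) | time_mid(2) | time_hi_and_version(2) | clock_seq(2) | node(6)
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


def from_ordered_bytes(raw: bytes) -> uuid.UUID:
    """Inverse of to_ordered_bytes."""
    return uuid.UUID(bytes=raw[4:8] + raw[2:4] + raw[0:2] + raw[8:16])


def make_v1(timestamp: int, node: int = 0x001122334455, clock_seq: int = 0) -> uuid.UUID:
    """Build a v1 UUID from a 60-bit timestamp (demo helper, not a real generator)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Two consecutive timestamps where time_low wraps around.
earlier = make_v1(0x01E9A1B2FFFFFFFF)
later = make_v1(0x01E9A1B300000000)

print("standard order sorted by time:", earlier.bytes < later.bytes)
print("optimized order sorted by time:", to_ordered_bytes(earlier) < to_ordered_bytes(later))
```

In the standard layout the fastest-changing bits come first, so consecutive values scatter across a B-Tree index; with the reordered 16-byte form, newer rows land near the end of the index.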
But conflicts are still possible…
• Conflicts are still possible, but not on the primary key
• Conflicts can be caused by other unique keys
But it’s very unlikely to happen because
• Normal replication delay < 1s
• A customer cannot send requests with the same data to different clouds that fast
What data to replicate between Clouds?
• Each piece of data has its source – we need to clearly understand what it is
• Data which is generated by the end user (customer or server)
• Data which is pushed into Cloud by us
How to do Database Migrations
• Follow “zero” downtime migration practices
• Avoid table locking
• Use ALGORITHM=INPLACE, LOCK=NONE when possible
• Do not deploy code that writes into a new column before the column has been added
• Always think about backward compatibility; usually there is no revert
• Run DROP and RENAME only after you are fully satisfied
• It’s better to run ALTER manually - more predictable
• Always remember that you are running in Multi-Cloud/Hot-standby
Another safety-check for developers
• Create separate users with DDL permissions for two types of tables
• Tables that are populated by customers
• Tables that are populated by us
• Remove DDL permissions from the main user
• Group migrations by “category”
• Before deploying to another Cloud, make sure it has Seconds_Behind_Master (SBM) = 0
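The SBM = 0 gate above can be sketched as a small check over one row of SHOW SLAVE STATUS output. This is an illustration under assumptions: the function name is hypothetical, and the row is passed in as a plain dictionary so the example stays independent of any particular database driver.

```python
from typing import Mapping


def replica_is_caught_up(status: Mapping[str, object]) -> bool:
    """Return True only if both replication threads run and the lag is exactly 0 seconds."""
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    # MySQL reports NULL (None here) when the lag is unknown, e.g. replication is broken.
    return lag is not None and int(lag) == 0


healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0}
lagging = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 42}
broken = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No", "Seconds_Behind_Master": None}

for name, row in [("healthy", healthy), ("lagging", lagging), ("broken", broken)]:
    print(name, "->", "deploy" if replica_is_caught_up(row) else "wait")
```

Checking the thread states as well as the lag matters because a stopped replication thread reports no lag value at all, which a naive "lag == 0" comparison could misread.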
How to deploy to Multi-Cloud
• Make sure your application is Cloud agnostic
• Store config in the environment (The Twelve-Factor App)
• Do not deploy application to all Clouds simultaneously
• Backward compatibility
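A cloud-agnostic application can take everything cloud-specific from environment variables. The sketch below shows the idea; the variable names (CLOUD_NAME, DATABASE_URL, SEARCH_URL) and the example values are assumptions made for illustration and do not come from the talk.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_KEYS = ("CLOUD_NAME", "DATABASE_URL", "SEARCH_URL")  # hypothetical names


@dataclass(frozen=True)
class AppConfig:
    cloud_name: str
    database_url: str
    search_url: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the config from environment variables and fail fast if one is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return AppConfig(env["CLOUD_NAME"], env["DATABASE_URL"], env["SEARCH_URL"])


# The same code runs in either cloud; only the environment differs.
aws_env = {"CLOUD_NAME": "aws", "DATABASE_URL": "mysql://db.aws.internal/shop",
           "SEARCH_URL": "http://es.aws.internal:9200"}
azure_env = {"CLOUD_NAME": "azure", "DATABASE_URL": "mysql://db.azure.internal/shop",
             "SEARCH_URL": "http://es.azure.internal:9200"}

print(load_config(aws_env))
print(load_config(azure_env))
```

Failing at start-up when a variable is missing keeps a misconfigured cloud from serving traffic with defaults that silently point at the other cloud.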
How to deploy assets (JS/CSS)
• Figure out assets lifetime
• Make sure you support a few old versions of assets (cache)
• Make sure your assets are backward compatible
• If you have persisted data in customer space (cookies, local storage), make sure it is compatible between versions
• Monitor and log your assets
• Make sure that the asset hash is auto-generated
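One way to auto-generate the asset hash is to derive it from the file content, so a name changes only when the content does. This is a minimal sketch of that idea, not the speakers' build pipeline; the function name, the 8-character digest length, and the sample file contents are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_name(path: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short content hash before the file extension, e.g. js/app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_build = hashed_asset_name("js/app.js", b"console.log('v1');")
new_build = hashed_asset_name("js/app.js", b"console.log('v2');")

print(old_build)
print(new_build)
```

Because old and new builds get different names, both can be served side by side while cached pages from the previous release are still in use, which fits the requirement to support a few old asset versions.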
Asset Lifetime
Distributed CRON
• Do not directly configure CRON on servers
• Scheduling MUST be delegated to an independent system
• Determine your clients
• Handle VM “death” – you should be able to switch a job to another VM quickly
• https://mesos.github.io/chronos/
• https://dkron.io
Other facts
• We had to upgrade MySQL twice within 6 months, 5.5 -> 5.7 -> 8.0
• 5.7 – GTID, Replication channels
• 8.0 – Replication Filter per Channel
• Use GTID (Global Transaction ID) for consistency
• Use AUTO_POSITION for replication (only with GTID)
Brief Summary
• Use UUID to avoid conflicts with Primary Key
• Determine what data needs to be synced
• Monitor your replication with all possible tools
• Use distributed CRON
• Monitor and log your JS/CSS
• Remember about CAP theorem
Traffic balancing
• DNS – weight-based with health checks
• WAF/CDN + Rules Engine (on CDN Edges)
• Location stickiness
Routing with cloud stickiness
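• [Diagram] A request to www.site.com reaches the WAF / CDN edges, which check whether the sticky cookie is present
• No cookie: the request goes to balanced.site.com (weight-based DNS: 90% aws.site.com, 10% azure.site.com) and the response sets Set-Cookie: cloud=aws or Set-Cookie: cloud=azure
• Cookie present: the request goes to the cloud named in the cookie (aws.site.com or azure.site.com) if it is alive
• Health-based DNS on aws.site.com and azure.site.com handles an AWS outage or an Azure outage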
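The routing decision in the diagram can be sketched as a small function. This is an interpretation under assumptions, not the actual CDN rules: the names choose_cloud and WEIGHTS are hypothetical, and re-assigning the cookie when the sticky cloud is down is an assumed behavior that the diagram does not spell out.

```python
import random
from typing import Dict, Optional, Tuple

WEIGHTS = {"aws": 90, "azure": 10}  # traffic split shown in the diagram


def choose_cloud(cookie: Optional[str], healthy: Dict[str, bool],
                 rng: random.Random) -> Tuple[Optional[str], Optional[str]]:
    """Return (cloud that serves the request, new cookie value or None if unchanged)."""
    alive = [cloud for cloud in WEIGHTS if healthy.get(cloud, False)]
    if not alive:
        return None, None  # nothing can serve the request
    if cookie in alive:
        return cookie, None  # sticky: keep the customer on the same cloud
    # No cookie, or the sticky cloud is down: weighted choice among healthy clouds.
    cloud = rng.choices(alive, weights=[WEIGHTS[c] for c in alive], k=1)[0]
    return cloud, cloud


rng = random.Random(42)
all_up = {"aws": True, "azure": True}
aws_down = {"aws": False, "azure": True}

print(choose_cloud("azure", all_up, rng))   # sticky request stays on Azure
print(choose_cloud("aws", aws_down, rng))   # AWS outage: moved to Azure
print(choose_cloud(None, all_up, rng))      # new visitor: weighted choice
```

Stickiness keeps a customer on one cloud, which reduces the chance that replication delay between clouds becomes visible within a single session.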
Summary
• Not everyone needs Multi-cloud
• You need to have clear reasons to go Multi-cloud
• Disaster recovery
• Speed (geo-based)
• It’s challenging and costly
• But doable even with basic tools/stack (PHP/MySQL)
Q&A
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 

Preparing for Multi-Cloud

  • 1. ©2019, Intechsystems SIA Preparing for Multi-Cloud Operation 1 20.06.2019 By Konstantin Tjuterev & Oleg Andreyev
  • 2. Why 2 speakers? • Fewer slides to prepare for each of us • We’d like to show our case from 2 perspectives • Business and high-level architecture • Nitty-gritty details of implementation
  • 3. About Oleg Andreyev • Senior Software Architect @ Intexsys • 7+ years in Software Development
  • 4. About Konstantin Tjuterev • Founder and Chief Architect @ Intexsys • 20+ years in Software Development
  • 5. Agenda • Why do we need Multi-cloud? • Challenges – why it’s complicated • Addressing the challenges
  • 6. Initial state • Top 500 Online Retailer in the USA • Existing Proprietary E-commerce Platform • Multi-component stack (PHP/Symfony, MySQL, Elasticsearch, Cassandra, RabbitMQ, HAProxy, Varnish, Node.js) • 15 online stores • 1 000 000+ items sold • $300 million annual turnover • Hosted on AWS since 2018
  • 7. Goals • Average of $820K daily sales • Downtime cost is at least $500/minute (820K/24/60) • In reality, it can go as high as $5000/minute during Black Friday
  • 8. What IF? • What if AWS goes down? • Never happened? • But it DID • And multiple times
  • 9. What an AWS outage causes The four-hour AWS outage caused S&P 500 companies to lose $150 million, Cyence, a startup that models the economic impact of cyber risk, estimated, a Cyence spokeswoman said via email. US financial services companies lost $160 million, the firm estimated. That estimate doesn’t include countless other businesses that rely on S3, on other AWS services that rely on S3, or on service providers that built their services on Amazon’s cloud https://www.datacenterknowledge.com/archives/2017/03/02/aws-outage-that-broke-the-internet-caused-by-mistyped-command
  • 12. What IF? • What if we have a major problem in one of the (clustered) services? • Elasticsearch cluster issue • MySQL master issue • What if we push the wrong button in some infrastructure/deployment automation tool?
  • 13. Disaster Recovery Options • Restoring from back-ups • Snapshots of virtual machines/Database dumps • Will have to spin up the whole infrastructure • Cold stand-by • A set of prepared but stopped virtual machines • Database can be started, but dump must be restored • Hot stand-by • A set of running virtual machines not serving the traffic • Running database replicas
  • 14. Criteria • Single/Shared points of failure • Time to recovery / potential losses from outage ($5K/minute) • Time to switch back after restoring operation of the primary infrastructure / potential losses • Cost of implementation / Complexity • Cost of maintenance • Additional benefits
  • 15. Comparison
      Option | Point of failure | Time to recovery/switching back | Complexity/Costs
      Backup | Single, if backups are in the same cloud (AWS) or restoring would be to the same cloud | Very long – 24h at best (spinning up and reconfiguring the whole infrastructure) | Low / Very low (just storage)
      Cold stand-by | Depends (can be put in a different cloud/data-center) | Medium – 12+h (database restore) if the cold infrastructure is up to date | High / Medium
      Hot stand-by | Depends (can be put in a different cloud/data-center) | Low – less than 1h | Very High / High
  • 16. What is Multi-Cloud Operation? • Not Disaster Recovery – just always running production traffic from multiple independent clouds • No single point of failure • Almost instant recovery in case of a Cloud outage: all traffic is simply served by the surviving Cloud • No “failover/switching back” – when a Cloud is restored after an outage, we’ll just start sending traffic there • High complexity/cost, but much better reliability • Continuously live-tested (monitoring, deployment, real customers)
  • 17. Additional benefits • Blue/green deployments on the whole infrastructure scale • Running infrastructure-related experiments in an isolated but production environment • Ability to benefit from cost differences between cloud providers (given that we’re paying for disaster recovery anyway)
  • 18. Why not just AWS Multi-AZ? • Sometimes AWS fails in all Availability Zones • Vendor lock-in • Complexity of Multi-AZ setup is similar to Multi-Cloud, just shifted • Single cloud setup becomes easier (just use 1 AZ) • Cross-cloud setup becomes more complicated • With the same overall complexity we can get better results • Better protection – no single point of failure
  • 19. Challenges • Pushing source data to Multiple Clouds • Data Synchronization between Clouds • Deployment • Dependencies • Scheduled jobs • Traffic balancing • Monitoring/Alerting
  • 20. Pushing data • RabbitMQ in the office • Clouds pulling messages and updating data in real-time • Incoming traffic in Clouds is free • Read-only database replication from the office
  • 21. Data Synchronization between Clouds • This is the most challenging part • We need to replicate relational data (such as orders, users) between multiple clouds • We’re using MySQL and are not planning to change that • So, how to replicate data between clouds?
  • 22. MySQL Real Master-Master replication • Master MySQL nodes running in different clouds • Both write binary logs and execute each other’s • With Multi-Cloud we need to support writes from both clouds • Initially we were using an auto-increment primary key (as everyone does) • It won’t work with Master-Master
  • 23. What will happen if… • John Doe and Peter Doe both create an account/order • Their requests are handled by different Clouds
  • 25. Replication conflict • Replication will stop • Replication can be fixed manually by ignoring the error • Multi-Cloud is out of sync
  • 26. How to avoid such a situation? • Set up MySQL Cluster • or set up Percona XtraDB Cluster • or set up MariaDB Galera Cluster
  • 27. But... • “All or nothing” approach • Your application needs to handle COMMIT • COMMIT slowness = slowest node in cluster • Network round-trip time / Certification time / Local apply • We are not building a cluster…
  • 29. Other solutions • Primary Key Auto Increment step for each server (even/odd) • Primary Key that will not collide
  • 30. Universally unique identifier - UUID • It’s a 128-bit number • It’s written as 32 hexadecimal digits (128/4) • Also referred to as GUID
  • 31. Versions of UUID • Nil UUID – special case of UUID in which all 128 bits are zero • UUID v1 – generated from a time and a node id (MAC address) • UUID v2 – generated from an identifier, time, and a node id • UUID v3 – generated by hashing a namespace and a name (MD5) • UUID v4 – generated using a random or pseudo-random number • UUID v5 – same as v3 but using SHA-1 • UUID v6 – optimized version of UUID v1 (unofficial)
  • 32. UUID v1 • It is time based (sorting will not suffer much) • It can be stored compactly in 16 bytes • Maximal Average Rate 163 billion per second per node • Can be tracked back to the server that created it • Optimized B-Tree • Less storage required for 16 bytes than for 32 characters • To UUID or not to UUID ? • Storing UUID Values in MySQL
  • 35. But conflicts are still possible… • Conflicts are possible but not with PK • Conflicts can be caused by other unique keys
  • 37. But it’s very unlikely to happen because • Normal replication delay < 1s • A customer cannot send requests with the same data to different clouds that fast
  • 38. What data to replicate between Clouds? • Each piece of data has its source – need to clearly understand that • Data which is generated by end user (customer/or server) • Data which is pushed into Cloud by us
  • 39. How to do Database Migrations • Follow zero-downtime migration practices • Avoid table locking • Use ALGORITHM=INPLACE, LOCK=NONE when possible • Do not deploy code that writes into a column before the migration that adds it • Always think about backward compatibility, usually without the option to revert • Run DROP and RENAME after you are fully satisfied • It’s better to run ALTER manually - more predictable • Always remember that you are running in Multi-Cloud/Hot-standby
  • 40. Another safety-check for developers • Create separate users with DDL permissions for two types of tables • Tables that are populated by customers • Tables that are populated by us • Remove DDL permissions from the main user • Group migrations by “category” • Before deploying to another Cloud make sure it has SBM (Seconds Behind Master) = 0
  • 41. How to deploy to Multi-Cloud • Make sure your application is Cloud agnostic • Store config in the environment (The Twelve-Factor App) • Do not deploy the application to all Clouds simultaneously • Backward compatibility
  • 42. How to deploy assets (JS/CSS) • Figure out assets lifetime • Make sure you support a few old versions of assets (cache) • Make sure your assets are Backward Compatible • If you have some persisted data in Customer space (cookies, local storage) make sure it is compatible between versions • Monitor and log your assets • Make sure that the assets hash is auto generated
  • 44. Distributed CRON • Do not directly configure CRON on servers • Scheduling MUST be delegated to an independent system • Determine your clients • Handle VM “death” – you should be able to switch a job quickly • https://mesos.github.io/chronos/ • https://dkron.io
  • 45. Other facts • We had to upgrade MySQL twice within 6 months, 5.5 -> 5.7 -> 8.0 • 5.7 – GTID, Replication channels • 8.0 – Replication Filter per Channel • Use GTID (Global Transaction ID) for consistency • Use AUTO_POSITION for replication (only with GTID)
  • 46. Brief Summary • Use UUID to avoid conflicts with Primary Key • Determine what data needs to be synced • Monitor your replication with all possible tools • Use distributed CRON • Monitor and log your JS/CSS • Remember the CAP theorem
  • 47. Traffic balancing • DNS – weight-based with health checks • WAF/CDN + Rules Engine (on CDN Edges) • Location stickiness
  • 48. Routing with cloud stickiness [Diagram. Labels: www.site.com; WAF / CDN (CDN/Edges); Sticky Cookie Present? (Yes/No); Cookie = AWS? (Yes/No); Set-Cookie: cloud=aws; Set-Cookie: cloud=azure; balanced.site.com (weight-based DNS: 90% aws.site.com, 10% azure.site.com); health-based DNS per cloud (Alive / AWS Outage / Azure Outage); AWS; AZURE; Request; Response]
  • 49. Summary • Not everyone needs Multi-cloud • You need to have clear reasons to go Multi-cloud • Disaster recovery • Speed (geo-based) • It’s challenging and costly • But doable even with basic tools/stack (PHP/MySQL)
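
The downtime arithmetic on slide 7 (820K/24/60) can be checked with a few lines of Python. This is only a sketch: the function names are illustrative and not from the talk, and it assumes sales are spread evenly over the day, which is why the slide treats the result as a lower bound.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_cost_per_minute(daily_sales_usd: float) -> float:
    """Average revenue lost per minute of downtime, assuming evenly spread sales."""
    return daily_sales_usd / MINUTES_PER_DAY


def outage_cost(cost_per_minute_usd: float, outage_minutes: float) -> float:
    """Total revenue lost during an outage of the given length."""
    return cost_per_minute_usd * outage_minutes


average = downtime_cost_per_minute(820_000)
print(f"average: ${average:.2f}/minute")
# A four-hour outage at the average rate and at the Black Friday rate from the slide.
print(f"4h at average rate: ${outage_cost(average, 240):,.0f}")
print(f"4h at $5000/minute: ${outage_cost(5000, 240):,.0f}")
```

The average works out to roughly $569 per minute, consistent with the slide's "at least $500/minute".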
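
Slides 22 to 29 describe why a plain auto-increment primary key collides under Master-Master replication and mention an even/odd step as one workaround. The sketch below simulates that idea in plain Python, loosely modelled on MySQL's `auto_increment_increment` and `auto_increment_offset` settings. The generator and the cloud names are assumptions for illustration; it does not talk to a database and does not reproduce MySQL's exact allocation rules.

```python
from itertools import islice


def auto_increment_ids(offset: int, increment: int):
    """Yield ids offset, offset + increment, offset + 2 * increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


def first_ids(offset: int, increment: int, count: int) -> list:
    return list(islice(auto_increment_ids(offset, increment), count))


# Plain auto-increment on both masters: every id is handed out twice.
naive_aws = first_ids(offset=1, increment=1, count=5)
naive_azure = first_ids(offset=1, increment=1, count=5)
print("colliding ids:", sorted(set(naive_aws) & set(naive_azure)))

# Odd ids on one master, even ids on the other: no overlap.
aws_ids = first_ids(offset=1, increment=2, count=5)
azure_ids = first_ids(offset=2, increment=2, count=5)
print("aws:", aws_ids, "azure:", azure_ids)
```

The step approach needs the increment to be at least the number of writing masters, which is one reason the talk moves on to keys that cannot collide by construction.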
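
The routing decision on slide 48 can be sketched as a small function. The flow here is an interpretation of the diagram labels, not the speakers' rules-engine configuration: a request with a sticky cookie stays on the cloud it names while that cloud is healthy, and otherwise a cloud is picked by weight among the healthy ones and a new cookie is set. The 90/10 weights come from the slide; the function name, the health map, and the choice to re-pin the cookie on failover are assumptions.

```python
import random

# Weights from the slide: 90% aws.site.com, 10% azure.site.com.
WEIGHTS = {"aws": 90, "azure": 10}


def pick_cloud(cookie, healthy, rng=random):
    """Return (cloud, set_cookie_header_or_None) for one request.

    cookie  -- value of the sticky "cloud" cookie, or None if absent
    healthy -- dict mapping cloud name to the result of its health check
    """
    alive = [cloud for cloud in WEIGHTS if healthy.get(cloud, False)]
    if not alive:
        raise RuntimeError("no healthy cloud available")
    if cookie in alive:
        # Sticky cookie present and its cloud is alive: keep the customer there.
        return cookie, None
    # No cookie, or the pinned cloud is down: weighted choice among healthy clouds.
    chosen = rng.choices(alive, weights=[WEIGHTS[c] for c in alive], k=1)[0]
    return chosen, f"cloud={chosen}"


print(pick_cloud("azure", {"aws": True, "azure": True}))
print(pick_cloud("aws", {"aws": False, "azure": True}))
```

Keeping a customer on one cloud means their own writes are read back from the same database, so the sub-second replication delay mentioned on slide 37 is not visible to them.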

Editor's Notes

  1. The probability of finding a duplicate within 103 trillion version-4 UUIDs is one in a billion. When collisions are reported in practice, such incidents are considered software bugs.
  2. There are 6 versions of UUID. The probability of finding a duplicate within 103 trillion version-4 UUIDs is one in a billion. When collisions are reported in practice, such incidents are considered software bugs.
  3. Great research from Peter Zaitsev (founder of Percona) on using UUID with MySQL. Why are 16 bytes important? Here we need to understand how indices in MySQL/InnoDB work. In InnoDB your primary key (index) is copied into every secondary index. If you represent a UUID as a string, it will take 36 bytes.
  4. There could be replication spikes that show more than 1s delay in replication, but generally such spikes go away within a few seconds, even if servers are located far away. Light travels through empty space at ~300000 km per second. The speed of electricity in copper is 95.1% of the speed of light.
  5. There are plenty of articles about zero-downtime migrations, but briefly: When you are adding a column, you must add it with a default or NULL value; execute the migration first, then deploy code to the servers. Be aware that if you are changing a schema/table that is synced between Clouds, your ALTER will be replicated to the other Cloud. When you are removing a column, first make sure that the column is not used by anyone, and only then deploy the migration that removes it. When you need to update a column and it keeps the same datatype, no special actions are required. The most complicated situation is when you need to change a column type; if you are interested in how to do it, ask me afterwards. To avoid table locks, we upgraded MySQL from 5.5 to 5.7, which supports Online DDL operations.
  6. Make sure you do not hardcode any hostname into your application.
  7. Google has a good book about Site Reliability, which has a chapter about Distributed CRON.
  8. Sometimes you need to enable multi-threaded replication so that your slave catches up with the MASTER and Seconds Behind Master reaches 0. When we execute an ALTER on a customer-related table, it is replicated from the MASTER binlog into the SLAVE's relay log. That is fine when you run a single replication thread, but when you run multiple threads (workers), how do you make sure that another worker has not already executed a ROW/STATEMENT from the relay log? You need to enable GTID, which stands for Global Transaction Identifier. It is associated with each transaction you COMMIT, so when workers read from the relay log what to apply, they verify that this GTID has not already been executed by another worker.
  9. CAP theorem - Consistency / Availability / Partition tolerance (network partition). When all the above is done, we need to implement traffic balancing, and Konstantin will talk about it.
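
The "one in a billion within 103 trillion version-4 UUIDs" figure in notes 1 and 2 follows from the standard birthday approximation. The sketch below assumes a version-4 UUID carries 122 random bits (the remaining 6 of the 128 are fixed version and variant bits); the function name is illustrative.

```python
import math

RANDOM_BITS = 122  # assumption: 128 bits minus 6 fixed version/variant bits


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


p = collision_probability(103 * 10**12)
print(f"probability of at least one duplicate: {p:.3e}")
```

The result is very close to 1e-9, which matches the figure quoted in the notes.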
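
Note 3 and slide 32 argue for storing a version-1 UUID in 16 bytes rather than as a 36-character string, and for keeping it roughly time-ordered so the B-Tree stays compact. A minimal Python sketch of that idea follows: it moves the high timestamp bits to the front before storing, similar in spirit to the swap flag of MySQL 8.0's `UUID_TO_BIN`. The helper names are hypothetical, and the byte order shown is one possible layout rather than necessarily the one the speakers used.

```python
import uuid


def uuid_to_ordered_bytes(u: uuid.UUID) -> bytes:
    """Pack a v1 UUID into 16 bytes with the timestamp fields in sortable order.

    u.bytes layout: time_low (4) | time_mid (2) | time_hi_and_version (2) | rest (8)
    stored layout:  time_hi_and_version | time_mid | time_low | rest
    """
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


def ordered_bytes_to_uuid(raw: bytes) -> uuid.UUID:
    """Inverse of uuid_to_ordered_bytes."""
    return uuid.UUID(bytes=raw[4:8] + raw[2:4] + raw[0:2] + raw[8:16])


u = uuid.uuid1()
packed = uuid_to_ordered_bytes(u)
print(len(str(u)), "characters as text,", len(packed), "bytes packed")
print("round trip ok:", ordered_bytes_to_uuid(packed) == u)
```

Because InnoDB copies the primary key into every secondary index, as note 3 points out, the saving of 20 bytes per key is repeated in each index on the table.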