These are the slides of the second talk of the first Tech Talk@TransferWise Singapore, which happened on the 23rd of November 2017.
These slides describe how the TransferWise codebase is moving from a monolithic architecture to a microservices architecture.
3. What this talk is about
● How TransferWise started
● Challenges we faced while scaling both the engineering organisation and the customer base
● Our transition strategy and progress towards microservices
● How microservices have helped and the benefits we have seen
● Where we are today and which new technologies we adopted to make the best use of a μ-services architecture
● What we learned
5. Organization scaled...
● Build pipeline
○ Build stability
○ Build speed
○ Flaky tests
○ Merge queue
● Deployment bottleneck
● Code ownership clarity
● Difficult to have an overview of all application changes
6. …and more people started using TransferWise.
● Production stability and reliability
○ Batch jobs taking resources from consumer web
○ DDL changes and MySQL table metadata locking
○ Cascading failures
● Security
○ Isolating sensitive concepts: auth, PCI card data
● Performance & scalability
○ Database QPS and CPU utilization
○ RAM requirements of Grails, and the time it takes to compile and start the application
○ Reaching limits of commodity hardware
7. Moving to services.
Measuring progress:
● % of code in services vs monolith
● Database QPS: monolith database vs service databases
● API calls to services vs monolith
● Amount of data in service DBs vs monolith database
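As a minimal sketch of how the four progress measures above could be tracked, the snippet below computes the share of each metric that already lives in services. The function name `service_share`, the metric keys, and all numbers are hypothetical placeholders, not figures from the talk.

```python
def service_share(service_value: float, monolith_value: float) -> float:
    """Return the percentage of a metric that is handled by services."""
    total = service_value + monolith_value
    if total == 0:
        return 0.0  # nothing measured yet, so report no progress
    return 100.0 * service_value / total


# Hypothetical snapshot: (value in services, value in monolith) per metric.
snapshot = {
    "lines_of_code": (300_000, 700_000),
    "database_qps": (4_000, 6_000),
    "api_calls": (1_500, 500),
    "data_gb": (250, 750),
}

# Percentage of each metric that has moved out of the monolith.
progress = {name: service_share(s, m) for name, (s, m) in snapshot.items()}

for name, pct in progress.items():
    print(f"{name}: {pct:.1f}% in services")
```

Tracking several ratios side by side, rather than a single number, helps show whether code, traffic, and data are leaving the monolith at the same pace.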
8. μ-Services benefits.
● We have seen μ-services address many of our previous problems:
○ Cleaner architecture and interface definitions: RESTful APIs and Kafka messaging lead to functional isolation
○ Engineers can iterate and deploy independently -> no deployment bottleneck
○ Smaller codebases that are easier to change -> tests run fast and simple fixes can go live in minutes
● Using the latest technology and choosing the best stack for the job
○ Apache Spark for fraud analysis, ZooKeeper vs database locking
● Scaling of the business
○ Handling exponential growth in payment volumes and web traffic
○ Scaling database load and size; DDL locking problems are isolated to individual services
9. Individual service view
Average service size ~12,000 LOC
Config service
REST API
Eureka discovery
Logging and monitoring
● Log server (syslog)
● Rollbar
● New Relic
● Zabbix + VictorOps
● Grafana
Public gateway
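To illustrate the idea behind the discovery component on this slide, here is a minimal in-memory sketch of a service registry combined with client-side round-robin load balancing. It does not use the Eureka or Ribbon APIs; the `Registry` and `RoundRobinClient` classes, the service name, and the addresses are all hypothetical.

```python
import itertools
from collections import defaultdict


class Registry:
    """In-memory stand-in for a discovery server (hypothetical, not Eureka's API)."""

    def __init__(self):
        self._instances = defaultdict(list)

    def register(self, service, address):
        # Each service instance announces its own address on startup.
        self._instances[service].append(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._instances[service])


class RoundRobinClient:
    """Client-side balancer: fetches the instance list once and rotates through it."""

    def __init__(self, registry, service):
        instances = registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        self._cycle = itertools.cycle(instances)

    def next_instance(self):
        return next(self._cycle)


registry = Registry()
registry.register("config-service", "10.0.0.1:8080")  # hypothetical addresses
registry.register("config-service", "10.0.0.2:8080")

client = RoundRobinClient(registry, "config-service")
picks = [client.next_instance() for _ in range(3)]
print(picks)
```

A real client would also refresh the instance list periodically and skip unhealthy instances; this sketch leaves both out for brevity.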
10. μ-Services learnings
Microservices are not only a technical evolution but also a cultural and organizational mindset shift. We found that there are several prerequisites:
● Infrastructure automation
○ Distributed logging and tracing: syslog and ELK
○ Rapid application deployment: Octopus
○ Rapid provisioning: Ansible
○ Database backups, replication, recovery: ServiceDB setup
● Testing and developing “locally” -> dev-cloud
● API standardisation (for public API product) -> public gateway
● Monitoring: both business and technical monitoring are needed. Many services -> more things can break. Circuit breakers and dealing with downstream failure.
● Service discovery and client-side load balancing with Eureka and Ribbon; Envoy service mesh
● Business analytics with distributed databases: pg_ninja, Looker
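The circuit breaker mentioned in the monitoring point can be sketched as follows. This is a simplified illustration of the general pattern, not the implementation used at TransferWise; the class names, the `max_failures` and `reset_after` parameters, and the `flaky` function are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying downstream."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("downstream marked unhealthy")
            # Cool-down elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky():
    """Hypothetical downstream call that always fails."""
    raise ConnectionError("downstream timeout")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass  # two failures in a row open the circuit

try:
    breaker.call(flaky)
except CircuitOpenError as exc:
    print(f"rejected fast: {exc}")
```

Failing fast while the circuit is open keeps callers from piling up on a struggling dependency, which is one way to limit the cascading failures listed among the monolith's problems.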