Defining a well-organized cloud architecture is a crucial ingredient in the success of any start-up. This presentation shares lessons learned at AnyMind Group on the cloud, and how to architect microservices on AWS with Elastic Container Service and Docker.
3. What we want to do
Enable businesses and professionals across
industries to use AI to work smarter and more efficiently.
Who we are
Previously known as AdAsia Holdings, AnyMind Group was
formed in January 2018 and is the parent company of
AdAsia Holdings, TalentMind and CastingAsia. Along with a
proprietary AI-driven matching engine and end-to-end
SaaS solutions, AnyMind Group aims to provide industries,
businesses and professionals with a suite of solutions to
enable greater efficiency and scale.
Transform industries
through AI
4. Enable advertisers and publishers to
leverage intelligent tools and end-to-end
solutions to drive business outcomes.
Founded
April 2016
11 Offices
10 Markets
1 Tech Hub
294
Employees
3
Industries
Transform industries through AI
Enable recruitment and human resources
professionals with the ability to make smart,
data-backed decisions and action, enabling
efficiency and scalability through artificial
intelligence.
Equip marketers with the power to drive
influencer marketing campaigns through
artificial intelligence, enabling smart, efficient
and scalable influencer marketing.
5. AnyMind Group Products Overview
AI-driven marketing platform for
marketers, advertisers, and
publishers
AI-powered influencer marketing
platform that connects advertisers
and influencers
Enabling enterprises to streamline
and enhance the hiring process with
our AI solutions
6. AnyMind Group Product Challenges
Consistently fast delivery from
development to production
Continuously improve the
productivity of development and
operations in the product team
8. Starting point
We started our cloud experience on Google Cloud. The pricing of
virtual machines on Google Cloud was one of the factors that made us
decide to use it.
9. Old infra stack
1. PostgREST
Providing an API
2. PostgreSQL
3. pg-amqp-bridge
Forward messages from db
to RabbitMQ
4. RabbitMQ
5. RabbitMQ consumer
Consume from RabbitMQ
and forward to OpenWhisk
6. Apache OpenWhisk
FaaS offered by IBM
10. Main challenges of the infrastructure stack
The multi-cloud approach, with resources spread across IBM and Google,
made the stack very difficult to maintain. Most components of our system
required heavy maintenance and monitoring.
1. Hard to maintain.
2. Hard to replicate a new environment.
3. Log management.
4. Costs/latencies.
5. Not highly available.
11. Hard to maintain - FaaS
We were initially attracted by the simplicity of OpenWhisk: FaaS with Docker
containers is an appealing model.
Apache OpenWhisk:
1. Long cold starts
2. High error rate
3. Lack of log integration
4. Slow, clunky UI
5. Less mature
AWS Lambda:
1. Faster cold starts
2. Low error rate
3. CloudWatch integration
4. Good UI
5. The most mature FaaS
6. Online code editor
12. Hard to maintain - Pub/sub messaging
We used RabbitMQ as the pub/sub mechanism between PostgreSQL and IBM
OpenWhisk.
RabbitMQ:
1. Requires expertise
2. Custom consumers
3. Dashboard friction
4. Self-managed
AWS SNS:
1. Requires no special expertise
2. Can forward messages directly to Lambda
3. Integrated into the AWS console
4. No infrastructure to manage
13. Hard to maintain - Log system
Graylog worked well at first, but took a lot of our engineers' time to manage.
Graylog:
1. Self-hosted management
2. Manual integrations
3. Complex UI
4. More dashboards, users, and billing to control
AWS CloudWatch:
1. Zero setup
2. SNS and autoscaling integrations
3. Simple UI
4. Natively integrated with AWS
CloudWatch can become expensive, at $0.70 per GB. Setting billing alerts reduces
the risk of surprises at the end of the month.
14. Hard to maintain - Persistence
Our self-hosted PostgreSQL worked well at first, but took a lot of our engineers'
time to manage.
Amazon RDS:
1. Easy setup
2. Backups and point-in-time recovery
3. Integrated monitoring with CloudWatch
4. Multi-AZ deployments
5. One-click read replicas
Self-hosted PostgreSQL:
1. Self-hosted management
2. Needs expertise to manage at scale
3. Cheaper than RDS
4. Upgrades, logs, read replicas, etc. managed manually
15. Hard to replicate a new environment
Replicating a full environment took nearly a week and
involved the expertise of several engineers.
With AWS CloudFormation it went from one week to 5 minutes. CloudFormation is
AWS's infrastructure-as-code service.
16. Why we chose CloudFormation
We wanted to manage the state of our applications in a simple, declarative
way, without managing extra virtual machines or increasing
operational costs.
18. Benefits of using CloudFormation
1. Infrastructure managed in Git
Makes changes easier to control and review.
2. Better infrastructure visibility
Developers can read the full
infrastructure in YAML files.
3. State management and rollback
When a new version fails to deploy,
CloudFormation rolls back automatically.
4. Free
AWS does not charge for using
CloudFormation.
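As a taste of the declarative style, here is a minimal template sketch; the cluster name and output are illustrative, not our production setup.

```yaml
# Hypothetical minimal template: declares an ECS cluster as code.
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal sketch of an ECS cluster defined with CloudFormation
Parameters:
  ClusterName:
    Type: String
    Default: demo-cluster      # illustrative name
Resources:
  AppCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Ref ClusterName
Outputs:
  ClusterArn:
    Value: !GetAtt AppCluster.Arn
```

Checked into Git, a file like this is reviewed like any other code change, and `aws cloudformation deploy` recreates the same environment on demand.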
20. High Availability
Problem
Our infrastructure was not highly available.
Solution
Use an Application Load Balancer, Elastic Container Service, and EC2 Auto Scaling. Together,
these services provide a complete solution for deploying highly available software.
RDS also offers high availability options by default.
23. Architecting microservices in AWS
Microservices architecture is a service-oriented architecture that decouples
monolithic applications into smaller services.
1. Reduced deployment risk
You deploy only a small piece of the full
architecture.
2. Deployment agility
Teams owning different microservices
can deploy independently.
3. Easier to understand
Each service reduces the complexity of the
overall project.
4. Easier to scale
The most critical microservices of the
system can scale independently.
24. Why do we need container orchestration on AWS?
● Horizontal scalability.
● Grouping related containers.
● Automatic failure detection and recovery.
● Seamless updates.
26. ECS - EC2 Container Service
● Docker container orchestration service by AWS.
● Runs on top of EC2.
● Integrated private Docker registry (ECR).
27. Why use ECS?
● Security. Example: assign IAM roles to individual Docker containers.
● Native integration with ELB and Auto Scaling.
● Kubernetes is still not very mature on AWS.
● The service itself is free; you only pay for EC2 usage.
28. ECS Concepts
Cluster: a group of container-optimized virtual machines, called container instances.
Task: a set of one or more related containers. Containers within a task are
deployed on the same host.
Task definition: a template for a task, defining the Docker features
accessible from docker run. It can set CPU and memory limits and
specify an IAM role.
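To make the concept concrete, here is a sketch of what a task definition might look like; the family name, image, ARN, and limits are hypothetical placeholders, not our actual configuration.

```python
import json

# Hypothetical task definition, shaped like the JSON passed to ECS.
# All names, the image URI, and the role ARN are illustrative.
task_definition = {
    "family": "web-api",                                            # hypothetical service
    "taskRoleArn": "arn:aws:iam::123456789012:role/web-api-task",   # placeholder ARN
    "containerDefinitions": [
        {
            "name": "web-api",
            "image": "123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/web-api:v1",
            "cpu": 512,                # CPU units: 1024 units == 1 core
            "memoryReservation": 256,  # soft limit (MiB): reserved, may be exceeded
            "memory": 512,             # hard limit (MiB): container is killed above this
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
        }
    ],
}

print(json.dumps(task_definition, indent=2))
```

Every option here maps to a `docker run` flag (CPU shares, memory limits, port mappings), plus ECS-specific additions such as the IAM task role.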
29. ECS Concepts: Services
● An abstraction above tasks.
● Deploys multiple copies of a task definition.
● Maintains the desired number of running tasks.
● Integrates with the Application Load Balancer.
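The "maintain the desired number" behavior is a reconciliation loop. This toy sketch (not the AWS API) shows the decision the scheduler effectively makes:

```python
# Minimal sketch of the reconciliation an ECS service performs:
# compare the desired task count with the running count and act on the delta.
def reconcile(desired: int, running: int) -> str:
    """Return the action the scheduler would take (illustrative only)."""
    if running < desired:
        return f"start {desired - running} task(s)"
    if running > desired:
        return f"stop {running - desired} task(s)"
    return "steady state"

# E.g. a task crashed and only 1 of 3 desired copies is running:
print(reconcile(desired=3, running=1))  # start 2 task(s)
```

This is also what makes failure recovery automatic: a crashed task simply shows up as `running < desired` on the next pass.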
30. What is service discovery?
Service discovery is a mechanism that lets a client find the
location of a desired microservice.
Where is the microservice?
http://microservice.mysuperapp.com/
How does a client know where to send a request when a service runs on
multiple nodes?
31. Why do we need service discovery?
● Cloud environments change all the time.
● Autoscaling launches and terminates instances.
● Some instances may be under maintenance.
● IP addresses and ports are assigned dynamically.
34. Application Load Balancer (L7) in AWS
● Balances requests between different tasks of the same service.
● Performs health checks on services.
● DNS- and path-based rules for forwarding requests.
● Provides session stickiness.
● Natively integrates with ECS.
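Path-based routing is easiest to see as a prefix-to-target-group table. A toy sketch of the rule evaluation, with hypothetical service names:

```python
# Illustrative sketch of ALB path-based routing: each rule maps a URL
# path prefix to a target group (one per service). Names are hypothetical.
RULES = [
    ("/api/users", "users-service"),
    ("/api/orders", "orders-service"),
]
DEFAULT_TARGET = "web-frontend"  # the listener's default action

def route(path: str) -> str:
    """Return the target group the first matching rule forwards to."""
    for prefix, target in RULES:
        if path.startswith(prefix):
            return target
    return DEFAULT_TARGET

print(route("/api/orders/42"))  # orders-service
print(route("/index.html"))     # web-frontend
```

One load balancer can thus front many microservices, which keeps costs down compared to one ELB per service.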
35. Autoscaling in AWS
How can we automatically scale an ECS service based on load?
● Service Auto Scaling scales the number of running ECS tasks for a
given service.
● Cluster Auto Scaling scales the number of running container instances in
the cluster.
Both types rely on CloudWatch metrics as a trigger!
36. Autoscaling in AWS - Resource allocation
● Each container reserves a portion of CPU and memory on the container
instance it runs on.
● The remaining capacity is shared among the other containers.
● Resource allocation is defined in the task definition.
● Each container instance has 1024 CPU units per core. The units determine
the share of CPU cycles a container gets.
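The arithmetic behind the unit system, as a small sketch (the reservation values are illustrative):

```python
# Sketch of ECS CPU accounting: 1024 units per core; containers reserve
# units out of the instance's pool, and leftover capacity is shared.
UNITS_PER_CORE = 1024

def instance_capacity(cores: int) -> int:
    """Total CPU units available on a container instance."""
    return cores * UNITS_PER_CORE

def remaining_units(cores: int, reservations: list[int]) -> int:
    """CPU units still free after the listed container reservations."""
    return instance_capacity(cores) - sum(reservations)

# A 2-core instance running containers that reserve 512 and 256 units:
print(instance_capacity(2))            # 2048
print(remaining_units(2, [512, 256]))  # 1280
```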
37. Autoscaling in AWS - Resource allocation
● Memory allocation
○ Soft limit: the amount is reserved for the container, but may be exceeded if
capacity is available.
○ Hard limit: the container is killed when it tries to exceed the reserved amount.
38. Autoscaling in AWS - Service autoscaling
● Adds more containers to handle increasing load.
● Uses CPU and memory utilization to trigger scaling events.
● Do we have spare compute power? Add more containers.
39. Autoscaling in AWS - Cluster autoscaling
● Adds more container instances to accommodate an increasing number of
containers.
● Uses CPU and memory reservation to trigger scaling events.
● Do we have room for more containers? Add more container instances.
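The utilization-versus-reservation distinction can be sketched as two trigger functions; the thresholds here are illustrative, not recommended values:

```python
# Sketch of the two scaling signals: services scale on *utilization*
# (what tasks actually use), clusters scale on *reservation* (what
# task definitions claim). Thresholds are illustrative.
def scale_service(cpu_utilization_pct: float, threshold: float = 75.0) -> bool:
    """Service Auto Scaling: add tasks when measured usage is high."""
    return cpu_utilization_pct > threshold

def scale_cluster(reserved_units: int, total_units: int, threshold: float = 0.8) -> bool:
    """Cluster Auto Scaling: add container instances when reservation is high."""
    return reserved_units / total_units > threshold

print(scale_service(82.0))        # True  -> add tasks
print(scale_cluster(1792, 2048))  # True  -> add an instance (87.5% reserved)
```

In practice both signals come from CloudWatch alarms rather than code like this, but the decision logic is the same.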
40. Autoscaling in AWS - Summary
● Configure both Service and Cluster Auto Scaling.
● Scale services based on utilization.
● Scale clusters based on reservation.
● Leave some spare capacity on each host.
● Service Auto Scaling is much faster than Cluster Auto Scaling.
41. CI/CD with ECS
How do we deploy applications to ECS and update them without service
disruption?
● ECS allows rolling updates to running services.
● From CI/CD, use the AWS CLI or API to update a running service.
● The task definition specifies which images to deploy.
42. CI/CD workflow
● Check out from version control on the CI/CD system.
● Build a new Docker image.
● Push the image to ECR.
● Update the task definition & service.
● ECS updates the containers on the cluster.
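The steps above can be sketched as the commands a CI job might compose; the registry URL, repository, cluster, service name, and task definition revision are hypothetical placeholders.

```python
# Illustrative sketch of the workflow as CI commands. All names below
# are placeholders, not a real account or service.
REGISTRY = "123456789012.dkr.ecr.ap-southeast-1.amazonaws.com"
REPO, TAG = "web-api", "abc1234"          # e.g. tag taken from the git commit
CLUSTER, SERVICE = "production", "web-api"
REVISION = 42                             # from a prior register-task-definition call

def pipeline_commands() -> list[list[str]]:
    image = f"{REGISTRY}/{REPO}:{TAG}"
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        # Point the service at the new task definition revision; ECS then
        # performs a rolling update of the containers on the cluster.
        ["aws", "ecs", "update-service",
         "--cluster", CLUSTER, "--service", SERVICE,
         "--task-definition", f"{SERVICE}:{REVISION}"],
    ]

for cmd in pipeline_commands():
    print(" ".join(cmd))
```

Registering a new task definition revision per image tag (rather than reusing a mutable tag) keeps each deployment traceable and makes rollback a one-line service update.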