Learn about Sony's efforts to build a cloud-native authentication and profile management platform on AWS. Sony engineers demonstrate how they used AWS Elastic Beanstalk to deploy, manage, and scale their applications. They also describe how they use AWS CloudFormation for resource provisioning, Amazon DynamoDB as the main database, and AWS Lambda and Amazon Redshift for log handling and analysis. The discussion focuses on best practices, security considerations, tradeoffs, and the final architecture and implementation. By the end of the session, you will understand how to use Elastic Beanstalk as a platform to quickly build web applications at scale on AWS, and how to combine Elastic Beanstalk with other AWS services to build cloud-native applications.
2. What to expect from the session
You will learn how to use AWS Elastic Beanstalk:
• As a platform to easily build customized web applications at scale on AWS.
• To seamlessly build cloud-native applications with other AWS services.
10. Achievement - robustness
Item: Before → After
• Access surges impact: Unstable or down → No impact
• IaaS trouble impact: Service damage → No impact
• Emergency operation → Auto recover/healing
• Related service down: Affecting an entire system → Minimum impact
11. Achievement - efficiency
Item: Before → After
• Config management: Manual → Git (Infrastructure as Code)
• Infra for management: 7+ self-managed services → 0
• Scaling: Not flexible → Auto Scaling
15. System overview
Authentication and profile management system - 1
[Diagram: us-west2 region with AZ-1 and AZ-2, each with public and private subnets; NAT instances (HA) and API instances; Route53; Profile DB on DynamoDB; S3 and Data Pipeline; batch EC2 resources for config, log, and backup; API calls to DynamoDB/S3; service providers; third-party authentication services.]
16. System overview
Authentication and profile management system - 2
[Diagram: second build of the system overview, with the same components as slide 15.]
17. System overview – CloudFormation
Base layer
[Diagram: base layer in us-west2 provisioned by CloudFormation: public and private subnets in AZ-1 and AZ-2, NAT instances (HA), S3, and the Profile DB on DynamoDB.]
19. Continuous delivery system
Delivery system without self-managed infrastructure
[Diagram: pipeline from the code repository through Development, QA, and Production: push code, kick off, build, unit test, push image, integration test (QA), get image, provision & deploy, sanity test, result.]
20. Throttling and Circuit Breaker
Self-defense for robustness
[Diagram: throttling and circuit breakers protect the APIs and their calls to third-party authentication services.]
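As an illustration of the circuit-breaker half of this slide, here is a minimal Python sketch of the usual closed / open / half-open state machine. It is not gobreaker's API; the class name, the thresholds, and the injectable clock are assumptions chosen for the example, and the sketch is not thread-safe.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker (illustrative, not thread-safe)."""

    def __init__(self, max_failures: int = 5, open_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # consecutive failures that trip the breaker
        self.open_timeout = open_timeout  # seconds to stay open before a trial call
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half-open"  # timeout elapsed: allow a trial call
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "half-open" or self.failures >= self.max_failures:
            self.state = "open"
            self.opened_at = self.clock()
```

While the third-party service is failing, callers get an immediate CircuitOpenError instead of waiting on timeouts, which keeps the APIs responsive. Sony's own Go implementation of this pattern is gobreaker, listed in the Sony open source slides.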
28. Auto Scaling based on custom metric
• Custom metric via Data Pipeline
[Diagram: Data Pipeline reads ELB metrics from CloudWatch and publishes a custom metric (successful response rate per instance) back to CloudWatch; alarms on that metric scale the Auto Scaling group of app instances.]
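The slide does not define the custom metric precisely. A minimal sketch, assuming it means successful (2XX) backend responses per second divided by the number of healthy instances, computed from the Classic ELB metrics HTTPCode_Backend_2XX (Sum) and HealthyHostCount (Average) over one period. The metric name and namespace are hypothetical.

```python
from typing import Dict


def successful_response_rate_per_instance(backend_2xx_sum: float,
                                          period_seconds: float,
                                          healthy_host_avg: float) -> float:
    """Successful (2XX) responses per second handled by each healthy instance."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if healthy_host_avg <= 0:
        # No healthy instances: the metric is undefined, so the caller should skip publishing.
        raise ValueError("no healthy instances in this period")
    return backend_2xx_sum / period_seconds / healthy_host_avg


def metric_datum(rate: float) -> Dict[str, object]:
    """One MetricData entry for CloudWatch put_metric_data (hypothetical metric name)."""
    return {
        "MetricName": "SuccessfulResponseRatePerInstance",
        "Value": rate,
        "Unit": "Count/Second",
    }
```

With boto3, the datum would be published with something like `cloudwatch.put_metric_data(Namespace="Custom/AuthAPI", MetricData=[metric_datum(rate)])`, where the namespace is a placeholder. A per-instance throughput metric tracks actual load more directly than CPU utilization does.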
32. High availability for application
• Zero downtime deployment
• Auto healing based on deep health check
• Disk space shortage prevention
33. Zero downtime deployment
• Rolling deployments
• Update application instances one by one
[Diagram: Auto Scaling group split into three batches of one app instance each; all three are working.]
34. Zero downtime deployment
• Rolling deployments
• Update application instances one by one
[Diagram: the same three batches; two app instances keep working while the third is updating.]
35. Zero downtime deployment
• Rolling deployments via .ebextensions
option_settings:
  "aws:elasticbeanstalk:command":
    BatchSizeType: Fixed
    BatchSize: 1
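To make the effect of BatchSizeType and BatchSize concrete, here is a small Python sketch that splits a fleet into the batches a rolling deployment would update in turn and reports how many instances stay in service. The function names are hypothetical, and the round-up rule for the Percentage type is an assumption, not documented Elastic Beanstalk behavior.

```python
import math
from typing import List


def plan_batches(instance_ids: List[str], batch_size_type: str = "Fixed",
                 batch_size: int = 1) -> List[List[str]]:
    """Split instances into the batches a rolling deployment would update in turn."""
    n = len(instance_ids)
    if n == 0:
        return []
    if batch_size_type == "Fixed":
        size = batch_size
    elif batch_size_type == "Percentage":
        size = math.ceil(n * batch_size / 100)  # rounding up is an assumption
    else:
        raise ValueError(f"unknown BatchSizeType: {batch_size_type}")
    size = max(1, min(size, n))
    return [instance_ids[i:i + size] for i in range(0, n, size)]


def min_in_service(instance_ids: List[str], batches: List[List[str]]) -> int:
    """Fewest instances left serving traffic while any single batch is being updated."""
    return len(instance_ids) - max((len(b) for b in batches), default=0)
```

With three instances and a fixed batch size of 1, the deployment takes three steps and two instances always serve traffic; larger batches finish sooner but remove more capacity at once.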
37. Zero downtime deployment
• Rolling updates
• Dynamic batch size
[Diagram: Auto Scaling group (MinSize 2, MaxSize 10) that has scaled out to four working app instances in two batches; the instance count, and with it the batch size, increased by scaling out.]
38. Zero downtime deployment
• Rolling updates
• Keep the number of in-service instances
[Diagram: Auto Scaling group (MinSize 2, MaxSize 10) with four working app instances; two new instances are launching while all four old instances stay in service.]
39. Zero downtime deployment
• Rolling updates
• Keep the number of in-service instances
[Diagram: Auto Scaling group (MinSize 2, MaxSize 10). In one batch, two new instances are working and the two old app instances they replace are terminating; in the other batch, two new instances are launching while two old app instances keep working.]
40. Zero downtime deployment
• Rolling updates via .ebextensions
option_settings:
  "aws:autoscaling:updatepolicy:rollingupdate":
    RollingUpdateEnabled: true
    MaxBatchSize: 2            # <num of running instances> / 2, e.g. 2
    MinInstancesInService: 4   # <num of running instances>, e.g. 4
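Because the fleet size changes with scaling, these two values have to be recomputed from the current instance count before each update. A minimal Python sketch of that calculation; the function name is hypothetical, and the MaxSize headroom check rests on the assumption that new instances are launched before old ones terminate (as the diagrams show), so the group must be able to hold running + MaxBatchSize instances at the peak.

```python
def rolling_update_settings(running: int, asg_max_size: int) -> dict:
    """Derive rolling-update options from the current number of running instances."""
    if running < 2:
        raise ValueError("need at least two running instances")
    max_batch = running // 2       # <num of running instances> / 2
    min_in_service = running       # keep every current instance in service
    # Assumption: new instances launch before old ones terminate, so the group
    # needs room for running + max_batch instances at the peak.
    if running + max_batch > asg_max_size:
        raise ValueError("Auto Scaling group MaxSize too small for this batch size")
    return {
        "RollingUpdateEnabled": True,
        "MaxBatchSize": max_batch,
        "MinInstancesInService": min_in_service,
    }
```

For four running instances and MaxSize 10 this yields MaxBatchSize 2 and MinInstancesInService 4, the example values on the slide.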
42. Zero downtime deployment
Tradeoff
• Rolling deployments/updates
  Pro: definite app version switching
  Con: low tolerance to deployment failure (rolling deployments)
• CNAME swap
  Pro: high tolerance to deployment failure
  Con: depends on DNS propagation
44. Auto healing based on deep health check
• Deep health check
• Accuracy of system time
• Reachability of the main database (DynamoDB)
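A deep health check endpoint along these lines could combine both checks and return a non-200 status when either fails, since a Classic ELB HTTP health check treats only a 200 response as healthy. This is a sketch under assumptions: `reference_time` (for example an NTP query) and `db_probe` (for example a lightweight DynamoDB call such as boto3's `describe_table`) are hypothetical injection points, and the 1-second skew limit is illustrative.

```python
import time
from typing import Callable, Tuple


def deep_health_check(reference_time: Callable[[], float],
                      db_probe: Callable[[], None],
                      local_time: Callable[[], float] = time.time,
                      max_skew_seconds: float = 1.0) -> Tuple[int, str]:
    """Return (HTTP status, body) for a deep health check endpoint."""
    # 1. Accuracy of system time: compare the local clock with a trusted source.
    try:
        skew = abs(local_time() - reference_time())
    except Exception as exc:
        return 503, f"time check failed: {exc}"
    if skew > max_skew_seconds:
        return 503, f"clock skew {skew:.3f}s exceeds {max_skew_seconds}s"
    # 2. Reachability of the main database: any exception means unhealthy.
    try:
        db_probe()
    except Exception as exc:
        return 503, f"database unreachable: {exc}"
    return 200, "OK"
```

Both probes must finish well within the ELB health check Timeout (10 seconds in the configuration that follows), or the check fails for the wrong reason.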
45. Auto healing based on deep health check
• Deep health check configuration via .ebextensions
option_settings:
  "aws:elasticbeanstalk:application":
    "Application Healthcheck URL": /1/status
  "aws:elb:healthcheck":
    Interval: 15
    Timeout: 10
    HealthyThreshold: 3
    UnhealthyThreshold: 3
46. Auto healing based on deep health check
• Auto healing configuration via .ebextensions
Resources:
  AWSEBAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      HealthCheckType: ELB
48. Auto healing based on deep health check
Rolling deployments with auto healing configuration
Problem
• Unexpected instance termination caused by Elastic Beanstalk
Workaround
• Suspend HealthCheck process in AWSEBAutoScalingGroup
during rolling deployments
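The workaround can be wrapped around a deployment so the HealthCheck process is always resumed, even if the deployment fails. A minimal Python sketch, assuming a boto3 Auto Scaling client, whose `suspend_processes` and `resume_processes` calls take `AutoScalingGroupName` and `ScalingProcesses`; the context-manager name is hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def health_check_suspended(autoscaling: Any, group_name: str) -> Iterator[None]:
    """Suspend the Auto Scaling HealthCheck process for the duration of a deployment."""
    autoscaling.suspend_processes(AutoScalingGroupName=group_name,
                                  ScalingProcesses=["HealthCheck"])
    try:
        yield
    finally:
        # Always resume, even if the deployment fails, so auto healing comes back.
        autoscaling.resume_processes(AutoScalingGroupName=group_name,
                                     ScalingProcesses=["HealthCheck"])
```

In practice `autoscaling` would be `boto3.client("autoscaling")`, `group_name` the physical name of the environment's AWSEBAutoScalingGroup, and the body of the `with` block the deployment itself (for example, running `eb deploy`).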
49. Disk space shortage prevention
• Docker image local cache size
[Chart: disk usage (0-100%) over rolling deployments 1, 2, … n, split into system, Docker image local cache, and free space; each deployment pulls new layers, so the local image cache grows and free space shrinks.]
51. Disk space shortage prevention
• Docker container log size
• Container logs captured by Elastic Beanstalk
• /var/log/eb-docker/containers/eb-current-app/*-stdouterr.log
• Original container logs
• /var/lib/docker/containers/<cid>/<cid>-json.log
52. Disk space shortage prevention
• Docker container log size
• Container logs captured by Elastic Beanstalk: rotated
• Original container logs: keep growing in size
53. Disk space shortage prevention
• Docker container log truncation via .ebextensions
files:
  "/etc/cron.hourly/cron.logtruncate.docker.json.log.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      # truncate docker container logs here.
      # see appendix for the actual script implementation.
      ...
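The actual script is in the appendix and is not reproduced here. As an independent illustration of the idea only, a minimal Python sketch that truncates original container logs above a size limit; the 100 MiB threshold and the function name are assumptions, and the glob pattern follows the original log path shown earlier.

```python
import glob
import os
from typing import List


def truncate_large_logs(pattern: str = "/var/lib/docker/containers/*/*-json.log",
                        max_bytes: int = 100 * 1024 * 1024) -> List[str]:
    """Truncate Docker JSON container logs larger than max_bytes; return their paths."""
    truncated = []
    for path in sorted(glob.glob(pattern)):
        try:
            if os.path.getsize(path) > max_bytes:
                # Truncate in place instead of deleting: the Docker daemon keeps
                # the file open, and deleting an open file does not free its space.
                os.truncate(path, 0)
                truncated.append(path)
        except OSError:
            # The container (and its log) may disappear while we iterate.
            continue
    return truncated
```

Where the Docker version supports it, the json-file log driver's `max-size` option can cap these files without a cron job.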
54. High availability for NAT
• NAT instance in AutoScalingGroup
• Periodic route table monitoring
55. NAT instance in AutoScalingGroup
• Static resources created via CloudFormation
[Diagram: in an AWS Region with AZ-1 and AZ-2, each AZ has a public subnet and a private subnet for apps; the two NAT segments are tagged NetworkSegment NAT-A and NetworkSegment NAT-B, and each NAT Auto Scaling group has MinSize 1 and MaxSize 1.]
56. NAT instance in AutoScalingGroup
• Dynamic NAT instances
[Diagram: each Auto Scaling group launches a new NAT instance (Pending) with a public IP in its public subnet; the instance and its segment's resources carry the same NetworkSegment tag (NAT-A or NAT-B).]
57. NAT instance in AutoScalingGroup
• Dynamic NAT instance configuration via cloud-init
[Diagram: once running, each NAT instance uses cloud-init to disable its source/destination check, assign its segment's Elastic IP, etc.]
58. NAT instance in AutoScalingGroup
• Route table lookup
[Diagram: each new NAT instance looks up route tables based on its NetworkSegment tag (NAT-A or NAT-B).]
59. NAT instance in AutoScalingGroup
• Dynamic route configuration
[Diagram: each NAT instance configures the route table of its segment and tags it RoutingStatus OK.]
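The setup steps on these slides (disable the source/destination check, look up route tables by tag, configure the default route, mark it OK) can be sketched with boto3 EC2 client calls. This is an assumption: the deck shows only the AWS CLI, so the function names and the choice of an SDK are illustrative, and Elastic IP association is left out here.

```python
from typing import Any, Dict, List

DEFAULT_ROUTE = "0.0.0.0/0"


def has_default_route(route_table: Dict[str, Any]) -> bool:
    return any(r.get("DestinationCidrBlock") == DEFAULT_ROUTE
               for r in route_table.get("Routes", []))


def configure_nat(ec2: Any, instance_id: str, segment: str) -> List[str]:
    """Prepare a new NAT instance and point its segment's route tables at it."""
    # A NAT instance forwards traffic it neither sent nor is the final target of,
    # so the source/destination check must be disabled.
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  SourceDestCheck={"Value": False})
    # Look up route tables by the NetworkSegment tag (e.g. NAT-A or NAT-B).
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "tag:NetworkSegment", "Values": [segment]}])["RouteTables"]
    configured = []
    for rt in tables:
        rt_id = rt["RouteTableId"]
        route = dict(RouteTableId=rt_id, DestinationCidrBlock=DEFAULT_ROUTE,
                     InstanceId=instance_id)
        if has_default_route(rt):
            ec2.replace_route(**route)  # e.g. a black-holed or taken-over route
        else:
            ec2.create_route(**route)
        ec2.create_tags(Resources=[rt_id],
                        Tags=[{"Key": "RoutingStatus", "Value": "OK"}])
        configured.append(rt_id)
    return configured
```

Because lookup is by tag rather than by ID, a replacement instance launched by the Auto Scaling group can run the same code without knowing any route table IDs in advance.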
60. Periodic route table monitoring
• Running normally
[Diagram: both NAT instances are running; each private subnet's route table (NetworkSegment NAT-A or NAT-B, RoutingStatus OK) has an active 0.0.0.0/0 route. NAT instances periodically monitor the route tables located in other AZs.]
61. Periodic route table monitoring
• Black hole route detection
[Diagram: the NAT instance in one AZ has terminated, so the 0.0.0.0/0 route in its segment's route table is a black hole; the healthy NAT instance detects the black-hole internet route.]
62. Periodic route table monitoring
• Outbound traffic takeover
[Diagram: the healthy NAT instance takes over outbound traffic to the internet for the failed segment; that route table's 0.0.0.0/0 route is active again and is tagged RoutingStatus TakenOver.]
63. Periodic route table monitoring
• Outbound traffic takeover
[Diagram: while the failed segment is taken over (RoutingStatus TakenOver), its Auto Scaling group launches a new NAT instance (Pending).]
64. Periodic route table monitoring
• Route table lookup
[Diagram: the new NAT instance, now running, looks up route tables based on its NetworkSegment tag.]
65. Periodic route table monitoring
• Outbound traffic recovery
[Diagram: the new NAT instance recovers the internet route for its segment; both route tables again have active 0.0.0.0/0 routes and RoutingStatus OK.]
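One monitoring pass of the takeover logic can be sketched as follows, assuming route tables shaped like the boto3 `describe_route_tables` response, where each route carries a `State` of `active` or `blackhole`. Function names are hypothetical, and the polling interval and the recovery step performed by the replacement instance are left out.

```python
from typing import Any, Dict, List, Optional

DEFAULT_ROUTE = "0.0.0.0/0"


def tag_value(resource: Dict[str, Any], key: str) -> Optional[str]:
    """Return the value of tag `key` on an EC2 API resource dict, if present."""
    for tag in resource.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def blackholed_tables(route_tables: List[Dict[str, Any]], own_segment: str) -> List[str]:
    """IDs of other segments' route tables whose default route is a black hole."""
    result = []
    for rt in route_tables:
        segment = tag_value(rt, "NetworkSegment")
        if segment is None or segment == own_segment:
            continue
        for route in rt.get("Routes", []):
            if (route.get("DestinationCidrBlock") == DEFAULT_ROUTE
                    and route.get("State") == "blackhole"):
                result.append(rt["RouteTableId"])
    return result


def monitor_once(ec2: Any, own_instance_id: str, own_segment: str) -> List[str]:
    """Take over outbound traffic for any segment whose NAT instance has gone away."""
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "tag-key", "Values": ["NetworkSegment"]}])["RouteTables"]
    taken = []
    for rt_id in blackholed_tables(tables, own_segment):
        ec2.replace_route(RouteTableId=rt_id, DestinationCidrBlock=DEFAULT_ROUTE,
                          InstanceId=own_instance_id)
        ec2.create_tags(Resources=[rt_id],
                        Tags=[{"Key": "RoutingStatus", "Value": "TakenOver"}])
        taken.append(rt_id)
    return taken
```

Each NAT instance would run this periodically for the other segments. The TakenOver tag records that the route is borrowed, so a replacement instance can recognize the route as one it should reclaim.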
66. Periodic route table monitoring
Network capacity planning for NAT instances
• Size each NAT instance for the total outbound traffic from application instances in all Availability Zones: after a takeover, a single NAT instance carries the outbound traffic of every AZ.
71. Source IP address whitelisting
Fill in the required properties of the ELB security group via .ebextensions
Resources:
  AWSEBLoadBalancer:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      Tags:
        - { Key: IPWhitelistGroup, Value: Group1 }
  AWSEBLoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Load Balancer Security Group"
      VpcId: { "Fn::GetOptionSetting" : { "OptionName" : "VPCId" } }
      Tags:
        - { Key: IPWhitelistGroup, Value: DefaultGroup }
Specifying GroupDescription and VpcId is also required in order to modify the AWSEBLoadBalancerSecurityGroup resource via .ebextensions.
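The slide shows the IPWhitelistGroup tags but not how they are consumed. One possible approach, entirely an assumption here, is a job that maps each tag value to a list of allowed CIDRs and turns it into the `IpPermissions` structure accepted by the EC2 `authorize_security_group_ingress` call. The mapping, the documentation-range addresses, and HTTPS port 443 are placeholders.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical mapping from IPWhitelistGroup tag values to allowed source
# CIDRs (documentation address ranges, not real data).
WHITELISTS: Dict[str, List[str]] = {
    "DefaultGroup": ["203.0.113.0/24"],
    "Group1": ["198.51.100.10/32", "192.0.2.0/28"],
}


def ingress_permissions(group: str, port: int = 443,
                        whitelists: Optional[Dict[str, List[str]]] = None) -> List[dict]:
    """Build IpPermissions for authorize_security_group_ingress from a whitelist group."""
    table = WHITELISTS if whitelists is None else whitelists
    # IPv4Network rejects IPv6 input and CIDRs with host bits set, catching typos early.
    cidrs = sorted({str(ipaddress.IPv4Network(c)) for c in table.get(group, [])})
    if not cidrs:
        return []
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": c} for c in cidrs],
    }]
```

Keying the rules on a tag rather than on the generated security group ID keeps the whitelist independent of Elastic Beanstalk's resource naming.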
72. Connection/request throttling
• Throttling per client (source IP address)
[Diagram: requests from internal services and external services reach the APIs served by the app in a Docker container on Amazon Linux; clients over the limit are rejected. The app also calls third-party authentication services.]
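The slides do not say which throttling algorithm is used. A token bucket per source IP address is one common choice, sketched here in Python. The rate, burst size, and class name are assumptions, and a production version would also need to evict idle buckets and be safe under concurrency.

```python
import time
from typing import Callable, Dict, List


class PerClientThrottle:
    """Token bucket per client IP: `rate` requests/second with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # client IP -> [available tokens, time of last refill]
        self.buckets: Dict[str, List[float]] = {}

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client_ip, [float(self.burst), now])
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[client_ip] = [tokens, now]
        return allowed
```

A request for which `allow` returns False is the "over limit" case in the diagram; one noisy client exhausts only its own bucket, so other clients are unaffected.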
91. Trouble-free eight months in production with Elastic Beanstalk
• Flexibility: satisfies customization needs
• Reliability: no major problems
• Simplicity: simplified DevOps
96. Sony open source software
• gobreaker
• Go implementation of circuit breaker
• Available on GitHub
• https://github.com/sony/gobreaker
• Feel free to submit pull requests and raise issues on the
GitHub project
97. Sony open source software
• Sonyflake
• Go implementation of distributed unique ID generator
• Available on GitHub
• https://github.com/sony/sonyflake
• Small utility for AWS (VPC) included
• Example running on EB provided
• Feel free to submit pull requests and raise issues on the
GitHub project
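For readers who want the ID layout without reading the Go source, here is a Python sketch of the bit layout described in the Sonyflake README: 39 bits of elapsed time in 10 ms units since a configurable start time, 8 bits of sequence number, and 16 bits of machine ID, whose default is the lower 16 bits of the private IP address. Since a VPC is at most a /16, that default is presumably why a VPC utility is useful. The function names are hypothetical, and this is not the library's API.

```python
import ipaddress
from typing import Tuple

BITS_TIME = 39       # elapsed time in units of 10 ms
BITS_SEQUENCE = 8    # sequence number within one 10 ms tick
BITS_MACHINE = 16    # machine ID


def machine_id_from_private_ip(ip: str) -> int:
    """Lower 16 bits of a private IPv4 address (Sonyflake's default machine ID)."""
    return int(ipaddress.IPv4Address(ip)) & ((1 << BITS_MACHINE) - 1)


def compose(elapsed_10ms: int, sequence: int, machine_id: int) -> int:
    """Pack the three fields into one 63-bit ID: time | sequence | machine."""
    if not 0 <= elapsed_10ms < (1 << BITS_TIME):
        raise ValueError("elapsed time out of range")
    if not 0 <= sequence < (1 << BITS_SEQUENCE):
        raise ValueError("sequence out of range")
    if not 0 <= machine_id < (1 << BITS_MACHINE):
        raise ValueError("machine ID out of range")
    return (elapsed_10ms << (BITS_SEQUENCE + BITS_MACHINE)
            | sequence << BITS_MACHINE
            | machine_id)


def decompose(flake_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (elapsed_10ms, sequence, machine_id)."""
    machine_id = flake_id & ((1 << BITS_MACHINE) - 1)
    sequence = (flake_id >> BITS_MACHINE) & ((1 << BITS_SEQUENCE) - 1)
    elapsed_10ms = flake_id >> (BITS_SEQUENCE + BITS_MACHINE)
    return elapsed_10ms, sequence, machine_id
```

The real generator also waits for the next 10 ms tick when the 8-bit sequence overflows, which caps each machine at 256 IDs per tick.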
100. Auto Scaling design
Scale out timing chart
[Timing chart, in the case of HealthCheckType: ELB. Rows: EC2 state (Pending, Running), ELB instance state (Out of Service, In Service), and Auto Scaling timers (Health Check Grace Period, Cooldown Period of the scale-out policy). Events: execute policy, register instance, deployment and app startup, ELB determination, in-service deadline, resume Auto Scaling.]
101. Auto Scaling design
Scale out timing parameters (seconds)
• App startup: 300 (avg.)
• ELB determination: 45 (HealthCheck Interval 15 × HealthyThreshold 3)
• Margin: 300
• Health Check Grace Period: 600
• Margin for balancing & metric: 300
• Cooldown Period (scale-out policy): 900
* in the case of HealthCheckType: ELB
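The parameters can be checked as a small timing budget. The relationships encoded below are one reading of the chart, and therefore an assumption: grace period = app startup + margin, which must also cover ELB determination, and cooldown = grace period + margin for balancing & metric. The slide's numbers are consistent with that reading. The Timeout < Interval check reflects the Classic ELB rule that the health check timeout must be less than the interval.

```python
def scale_out_timing(app_startup: int = 300, interval: int = 15,
                     healthy_threshold: int = 3, timeout: int = 10,
                     margin: int = 300, metric_margin: int = 300) -> dict:
    """Derive Auto Scaling timers (seconds) from app and health check timings."""
    if timeout >= interval:
        raise ValueError("ELB health check Timeout must be less than Interval")
    # Time for ELB to mark a new instance InService after the app is up.
    elb_determination = interval * healthy_threshold
    grace = app_startup + margin
    if app_startup + elb_determination > grace:
        raise ValueError("grace period ends before ELB can mark the instance healthy")
    # Wait for load to rebalance and the custom metric to reflect the new instance.
    cooldown = grace + metric_margin
    return {
        "elb_determination": elb_determination,
        "health_check_grace_period": grace,
        "cooldown": cooldown,
    }
```

With the defaults this gives 45, 600, and 900 seconds. A grace period shorter than startup plus ELB determination would let the ELB health check terminate instances that are still booting.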
102. Examples
• Elastic IP association via cloud-init
#!/bin/bash
# Usage: <script> REGION EIP_ALLOCATION_ID
REGION=$1
EIP_ALLOCATION_ID=$2
# Instance ID from the EC2 instance metadata service.
INSTANCE_ID=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
# Retry until the instance reports "running"; associate-address fails while it is pending.
while true; do
  INSTANCE_STATUS=$(aws --region "${REGION}" --output text \
    ec2 describe-instance-status \
    --instance-ids "${INSTANCE_ID}" \
    --filters Name=instance-state-name,Values=running)
  if [[ $? = 0 && "${INSTANCE_STATUS}" != "" ]]; then
    aws --region "${REGION}" --output text \
      ec2 associate-address --instance-id "${INSTANCE_ID}" \
      --allocation-id "${EIP_ALLOCATION_ID}" && break
  fi
  sleep 5s
done
103. Examples
• Elastic IP association via cloud-init
• The associate-address command fails if the instance is still in the pending state
• Wait for the instance to reach the running state before executing the associate-address command