5. High Availability Focus
Keep apps and services running in a performant,
reliable and recoverable manner with timely error
detection
1. Application Instances
2. Platform Processes
3. Platform VMs
4. Availability Zones
Keep Cloud Foundry running in a performant, reliable
and recoverable manner with timely error detection
6. HA Deployments
Data Center Data Center
vs
Single Foundation
Deployment
Dual Foundation
Deployment
Data Center
AZ AZ
RDS
7. WHAT IF I TOLD YOU
IT’S POSSIBLE TO SANELY
STRETCH LAYER 2
8. User Targets
myapp.mycf.com
DNS
Resolution
NSX Boundary NSX Boundary
VIP VIP
SSL Termination
SSL Termination
DNS Global Traffic Management (GTM)
HA Proxy HA Proxy
LTM Appliance LTM Appliance
HA Proxy HA Proxy
LTM Appliance LTM Appliance
14. HA Deployments
Data Center Data Center
vs
Single Foundation
Deployment
Dual Foundation
Deployment
Data Center
AZ AZ
RDS
15. Customer Requirements
• AWS with One VPC
• Specific IP Ranges
• Using their internal corporate DNS
• no ELBs or Route 53 due to security setup
• Multiple Deployments of Cloud Foundry
• Availability Requirements:
• App uptime
• Failure matrix for downtime situations
16.
HA Proxy HA Proxy
Bind DNS
CF Router CF Router
HA Proxy HA Proxy
SSL Termination
17. Who does the deployment need to
be highly available for?
• Users
• Developers
• Operations
18. Any non-critical jobs?
• clock_global
• used to clean up cc jobs.
• Rely on Resurrector?
• Redeploy to a different AZ by changing
the resource_pool
21. Caveats with this design
• Single points of failure?
• DNS
• Bosh
• Jumpbox
• Human interaction required in outage
• Bind DNS does not do health monitoring.
Monitoring scripts were outside the scope
of the engagement.
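Because Bind DNS does no health monitoring, an outage needs a person (or a script) to decide which HA Proxy addresses should stay in DNS. The following is a minimal sketch of that decision only, assuming a successful TCP connection on port 443 is an adequate health signal. The function names and the port are assumptions, not part of the engagement, and updating the Bind zone is deliberately left out.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_addresses(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> List[str]:
    """Keep only the addresses whose probe succeeds, preserving their order."""
    return [addr for addr in addresses if probe(addr)]
```

Called with the list of HA Proxy IPs, `healthy_addresses` returns the subset that could be published as A records. Passing a different `probe` (for example an HTTP check) changes the health signal without touching the selection logic.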
22.
[Diagram: Cloud Foundry deployment across AZ 1 and AZ 2 in a single VPC]
VPC: 10.202.64.0/19
Subnets: AZ 1 Private Subnet, AZ 2 Private Subnet, Bosh Subnet, RDS Subnet
Security groups: CF SG, Bosh SG, RDS SG
External: Customer Managed Interstate Data Center, Direct Connect, Customer Managed NAT
Access and DNS: jumpbox, bastion, bind dns, ha proxy (x4)
Bosh: bosh
Databases: boshdb, uaadb, ccdb
CF jobs shown twice: login, uaa, dea, cc, nats, health, etcd, doppler, cc worker, loggregator traffic controller
Routers: router (x4)
CF jobs shown once: clock, apps manager
23. How We Deployed Services
• Proxy is a single point of failure
• No load balancer available to use
• Accepted by the customer in the failure matrix
Proxy Server
Server
App
Proxy
Proxy
24. Best Practices for Services
• By default, the service binding uses
only the first proxy address
Proxy
Proxy Server
Server
Server
App
Load
Balancer
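When no load balancer sits in front of the proxies, one way to avoid depending on the first proxy address is for the app to try each address in turn. The sketch below illustrates that idea under stated assumptions: `connect_with_failover` and the `connect` callback are hypothetical names, and the addresses would come from wherever the app reads its service credentials.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    addresses: Sequence[str],
    connect: Callable[[str], T],
) -> T:
    """Try each proxy address in order and return the first successful connection."""
    last_error = None
    for address in addresses:
        try:
            return connect(address)
        except OSError as exc:
            # Remember the failure and move on to the next proxy.
            last_error = exc
    raise ConnectionError(f"all proxies failed: {list(addresses)}") from last_error
```

This only covers connection setup. A connection that drops later still has to be re-established by the app, which is part of why a load balancer in front of the proxies is the preferred layout.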
25. Which Deployment
Data Center Data Center
Dual Foundation
Deployment
Single Foundation
Dual AZs
Data Center
Single Foundation
Single DC
Data Center
AZ AZ
RDS
30. Restrict Containers
• Cloud Foundry
• Application Security Groups
• dea network properties
• (allow_networks, deny_networks)
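An Application Security Group is defined by a JSON list of egress rules, each with a protocol, a destination, and ports. The sketch below builds such a list, assuming destinations are given as CIDR blocks. The helper names, the subnet, and the port are hypothetical examples, not values from this deployment.

```python
import ipaddress
import json
from typing import Dict, List


def asg_rule(protocol: str, destination: str, ports: str) -> Dict[str, str]:
    """Build one egress rule after checking that destination is a valid CIDR."""
    ipaddress.ip_network(destination)  # raises ValueError on a malformed CIDR
    return {"protocol": protocol, "destination": destination, "ports": ports}


def asg_rules_json(rules: List[Dict[str, str]]) -> str:
    """Serialize the rules as the JSON list a security group file contains."""
    return json.dumps(rules, indent=2)


if __name__ == "__main__":
    # Hypothetical rule: let app containers reach MySQL on one subnet only.
    print(asg_rules_json([asg_rule("tcp", "10.202.90.0/24", "3306")]))
```

The resulting JSON is the kind of file passed to `cf create-security-group`. Anything not matched by a bound group's rules is blocked for the container.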
31. Pivotal Cloud Foundry for AWS 1.4
[Diagram: Pivotal Cloud Foundry 1.4 layout on AWS]
VPC: 10.0.0.0/16
Subnets: Public Subnet, Private Subnet, RDS Subnet
Security groups: Elastic Runtime SG, ELB SG, NAT SG, Ops Manager SG, RDS SG
Allow rules shown: vpc all (x3); restricted ip 80, 443, 22*; 80?, 443
Edge: Internet Gateway, ELB, NAT
Management: Ops Manager, micro
Jobs: login, uaa, router, dea, cc, nats, health, etcd, doppler, cc worker, loggregator traffic controller, clock, apps manager, autoscaling
Databases: boshdb, uaadb, ccdb, db
Legend: common traffic flow; sg allow rules
Note on slide: was it just DEAs that used NAT?
32. Limit Scope if Compromised
• Different user/pass for each component
• Strong passwords (and usernames)
• 20 Characters Long
• RANDOM
• Both Cases
• Best to avoid special characters
• e.g. YxLIodYrUBQJrvMRYSQL
• Avoid cloud cow
http://vanmethod.deviantart.com/art/Purple-Cow-on-a-Cloud-146265642
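The password rules above (20 characters, random, both cases, no special characters) can be sketched with Python's `secrets` module. This is one possible generator under those rules, not the tool used in the engagement. `generate_credential` is a hypothetical name.

```python
import secrets
import string

# Upper- and lower-case letters only, so no special characters to escape in a manifest.
ALPHABET = string.ascii_letters


def generate_credential(length: int = 20) -> str:
    """Return a random mixed-case credential of the given length."""
    if length < 2:
        raise ValueError("length must be at least 2 to contain both cases")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry in the rare case where only one case was drawn.
        if any(c.islower() for c in candidate) and any(c.isupper() for c in candidate):
            return candidate
```

Calling it once per component gives each job its own username and password, so one leaked credential does not open the others.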
33. Limit Scope if Compromised
Runner
UAA
Login
uaadb
mySql App
Data
34. Post Breach Security Measures
• Roll
• AWS Credentials
• Username and password (Manifest)
• PEMs
• Investigate:
• VM logs (stored in Splunk / CloudWatch Logs)
• Bosh and login audit trail
• Isolate the VM for investigation
• Resurrector will resurrect a non-compromised VM
• Feedback:
• Incident reports and management support
35. Paranoid Level Security for AWS
• CloudTrail
• Alerts
• Audit Logs
• Rollback
• Remove ability to delete
• S3 buckets
• subnets / VPC
• backups
• Everything else can be recovered from a backup…
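Removing the ability to delete can be expressed as an IAM policy with an explicit Deny statement. The sketch below builds such a policy document. The action list is an assumption: it covers S3 buckets, subnets and VPCs, and treats "backups" as EBS and RDS snapshots, which may not match how backups are actually stored.

```python
import json

# Assumed mapping of the slide's bullets to IAM actions; adjust to the real backup mechanism.
DENIED_ACTIONS = [
    "s3:DeleteBucket",
    "ec2:DeleteSubnet",
    "ec2:DeleteVpc",
    "ec2:DeleteSnapshot",
    "rds:DeleteDBSnapshot",
]


def deny_delete_policy(actions=DENIED_ACTIONS) -> dict:
    """Return an IAM policy document that explicitly denies the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Deny", "Action": list(actions), "Resource": "*"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_delete_policy(), indent=2))
```

An explicit Deny wins over any Allow, so attaching a policy like this to the operators' users or roles blocks the listed deletions even for otherwise privileged accounts.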
43. Restoring Bosh With PCF
Export Configuration
Import Configuration
:/var/tempest/workspaces/default/deployments/micro
BOSH
Director
+ bosh.yml
44. Restoring Bosh Manually
BOSH
BOSH DB
bosh.yml
pg_dump /var/vcap/store
/dev/xvda
/dev/sdb
/dev/sdf
Volume:
BOSH DB
External MySQL
Blobstore
45. Critical Databases
Back Up Cloud Controller DB Encryption Credentials
Locate Database Info From Deployment Manifest
bosh download manifest cf-c700aee17d9f801eb152 cfmanifest.yml
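Once the manifest is downloaded, the database settings and the Cloud Controller encryption key can be pulled out programmatically. The sketch below works on a manifest already parsed into a dictionary (parsing cfmanifest.yml needs a YAML library, not shown). It assumes database blocks live under `properties` with names ending in `db` (such as `ccdb` and `uaadb`) and that the key is at `properties.cc.db_encryption_key`. Check both against your own manifest.

```python
from typing import Any, Dict, Optional


def database_sections(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return the property blocks whose names end in 'db' (assumed naming)."""
    properties = manifest.get("properties", {})
    return {name: block for name, block in properties.items() if name.endswith("db")}


def cc_db_encryption_key(manifest: Dict[str, Any]) -> Optional[str]:
    """Return the Cloud Controller DB encryption key if present (assumed path)."""
    return manifest.get("properties", {}).get("cc", {}).get("db_encryption_key")
```

Both the database blocks and the encryption key need to be backed up together: a restored ccdb is of little use without the key that encrypted its contents.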