When you run a complex AWS environment with thousands of Amazon EC2 instances and more than half a petabyte of object storage, and you support the largest daily newspapers in the UK, you need a world-class cloud management strategy. For companies like News Corp, implementing policies that automate infrastructure schedules, right-size workloads, and manage and modify reservations is critical. As you scale your cloud infrastructure, defining centralized governance rules while enabling decentralized management is key to running an optimized cloud.
This session is designed for advanced operations, infrastructure, and engineering teams looking to deploy and improve optimization strategies. It covers five cloud management best practices, including automating Reserved Instance modifications, setting policies to ensure proper tagging, and scheduling lights-on/lights-off policies. Session sponsored by CloudHealth Technologies.
2. Presenters
Joe Kinsella, CTO & Founder
CloudHealth Technologies
@joekinsella
Iain Caldwell, Head of Infrastructure
News UK & News Corp EMEA
@caldi100
3. What to expect from this session
• Overview of News Corp’s use of AWS
• Why governance is critical to cloud success
• How to drive a governance strategy
• 5 best practices
4. News Corp strategy
• CTO set an objective to reduce the data centre footprint and associated costs
• Host 75% of the estate in the public cloud within the next three years
• News UK is currently at 69%, aiming to reach 75% by July 2017
• In 2011, before we started, we built our AWS Cloud data centre
• Ran a global application assessment for cloud readiness across all BUs
• Digital estate was the main candidate for cloud: web-based applications, mobile applications, test, and dev
• Migrated our enterprise systems to the cloud over the past two years
• Traditional newspaper, finance, and monitoring applications, etc.
7. What is cloud governance?
• Process to ensure secure, effective, & efficient use of IT resources
• Includes compliance with policies & best practices
• Covers cost, security, availability, performance, & usage
8. Why governance matters: A balancing act
Governance needs…
• Brand protection
• Cost control
• Management of business risk
• Compliance with policies & standards
Agility drives…
• Quick time to market
• Innovation
• Flexibility
9. The challenge of cloud governance
• Rapid pace of change
• Powerful cloud services/features
• Consumption-based pricing
• IT often influencer/auditor, not owner
• Decentralized management
• Disparate management tools
• Requires integration of multiple products & sources of data
10. Common cloud governance issues – News Corp
• No tagging
• Reluctance to invest in Reserved Instances
• Reserved Instances underutilised
• No rightsizing
• ELBs left unused
• EBS volumes left unattached
• RDS instances with no active connections
• Exponential growth in S3 storage
• PoC and dev environments created and then abandoned
• Dev environments not shut down at night
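The last issue, dev environments left running overnight, is what a lights-on/lights-off schedule addresses. A minimal sketch of the decision logic, assuming a hypothetical weekday window of 07:00 to 19:00 and that the timestamp passed in is already in the office's local time zone. It only decides whether to start or stop; the EC2 calls that would act on the decision are left out.

```python
from datetime import datetime, time

# Hypothetical lights-on window for non-production environments (an
# assumption for illustration, not a News Corp setting).
LIGHTS_ON = time(7, 0)
LIGHTS_OFF = time(19, 0)
WORKDAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday=0 ... Friday=4


def should_be_running(now: datetime, environment: str) -> bool:
    """Desired state for an instance; `now` is assumed to be local office time."""
    if environment == "production":
        return True  # production is never scheduled off
    return now.weekday() in WORKDAYS and LIGHTS_ON <= now.time() < LIGHTS_OFF


def schedule_action(now: datetime, environment: str, state: str) -> str:
    """Compare desired and actual state and return "start", "stop", or "none"."""
    desired = should_be_running(now, environment)
    if desired and state == "stopped":
        return "start"
    if not desired and state == "running":
        return "stop"
    return "none"


# Monday 6 March 2017, 22:00: a running dev instance should be stopped.
print(schedule_action(datetime(2017, 3, 6, 22, 0), "dev", "running"))  # stop
```

Keeping the decision separate from the start/stop calls makes the schedule easy to test and lets the same function drive a report before it drives any automation.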
11. The unique challenge to the enterprise
• Ownership increasingly distributed to lines of business that:
• Control the infrastructure supporting their businesses
• Go “rogue” to get around IT and achieve business agility
• Do not take into account the importance of governance, compliance, and risk management
• IT increasingly an influencer/auditor instead of an owner
12. Where to start
• Establish a strategy & obtain stakeholder buy-in
• Evaluate & implement tool strategy
• Identify deliverables by stakeholder
• Implement, rinse, & repeat
13. Establish strategy
• Implications of competing priorities
• Digital teams require agility: speed of products to market, embracing innovation
• Enterprise teams need to control costs, preserve security, adhere to governance, and attract and retain good people
• What’s needed from a people perspective
• Acquiring and maintaining talent
• A focus on cloud consumption & usage
• Develop best practices
• Cloud steward
14. Team
Roles: team lead, operations, finance, engineering, LOBs
Cloud steward: Responsible for ongoing cloud optimization & governance
• Business group definition & implementation
• Tagging, naming conventions, metadata, etc.
• Data integrations
• Cost, budget, assets, configuration, performance, security
• Report definitions and delivery
• Policy definition and implementation
• Analysis, recommendations, & optimization actions
• Capacity planning, modeling, & forecasting
• Service-level reporting
16. Identify deliverables by stakeholder
Stakeholders → Perspectives → Subscriptions → Policies → Best practices
• Stakeholders: CEO; Global CIO (Eng, DevOps, IT Ops, Cloud Ops); CFO (FP&A, Fin Analyst); LOB A and LOB B (each with Eng, DevOps, IT Ops, Cloud Ops)
• Perspectives: Product & Function (Production, Development, QA, Staging; Web, App, DB, Storage); P&L & Department; OPEX/COGS; Business Unit, Product, Function, Customers
• Subscriptions: Cost Pulse, Health Check Pulse, RI Utilization Pulse, Cost by Group, Usage by Reservation Type, Reservation Modifications, Usage by Instance Type, Instance Rightsizing, Volume Rightsizing (finance subscribes to the first five only)
• Policies: Over Budget, Purchase Reservations, Modify Reservations, Underutilized Instances, Unattached Volumes, Snapshot Aging, Untagged Assets, Start/Stop Instances (finance: Over Budget, Modify Reservations, Purchase RIs, Cost per Group)
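Several of the policies on this slide (Over Budget, Unattached Volumes, Snapshot Aging) are simple rules over inventory and billing data. A sketch in Python, assuming hypothetical `Volume` and `Snapshot` record types and a hypothetical 90-day snapshot age limit; the only AWS detail relied on is that EBS reports a volume that is not attached as being in the `available` state.

```python
from dataclasses import dataclass
from datetime import date

SNAPSHOT_MAX_AGE_DAYS = 90  # hypothetical retention threshold


@dataclass
class Volume:
    volume_id: str
    state: str  # EBS volume state; "available" means not attached


@dataclass
class Snapshot:
    snapshot_id: str
    start_date: date


def unattached_volumes(volumes):
    """Unattached Volumes policy: volumes not attached to any instance."""
    return [v.volume_id for v in volumes if v.state == "available"]


def aged_snapshots(snapshots, today, max_age_days=SNAPSHOT_MAX_AGE_DAYS):
    """Snapshot Aging policy: snapshots older than the retention threshold."""
    return [s.snapshot_id for s in snapshots
            if (today - s.start_date).days > max_age_days]


def over_budget(spend_by_group, budget_by_group):
    """Over Budget policy: groups whose spend exceeds their budget.

    Groups without a budget are never flagged.
    """
    return sorted(group for group, spend in spend_by_group.items()
                  if spend > budget_by_group.get(group, float("inf")))
```

Each rule returns identifiers rather than acting on them, so the same output can feed a stakeholder's report or a notification before anything is deleted.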
17. Rinse & repeat: Continued improvements
• Enforced tagging on EC2, RDS, ELB, EBS & Auto Scaling groups: new instances not tagged within 15 minutes are deleted
• Daily cleanup:
• Delete EC2 instances that have been shut down for 5 days or more
• Delete ELBs with no traffic for 5 days or more
• Delete EC2 instances with no traffic for 5 days or more
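Both the tagging rule and the daily cleanup can be expressed as one pass over an inventory. The sketch below assumes a hypothetical `Resource` record, hypothetical required tag keys, and a hypothetical `last_traffic` timestamp (for example derived from network or request metrics). It only reports candidates and deletes nothing, so it could be run as a dry run before any automation is switched on.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical required tag keys; substitute your own tagging standard.
REQUIRED_TAGS = {"Owner", "Environment", "CostCentre"}
TAG_GRACE = timedelta(minutes=15)  # time allowed to tag a new resource
IDLE_LIMIT = timedelta(days=5)     # daily-cleanup threshold


@dataclass
class Resource:
    resource_id: str
    kind: str                                 # e.g. "ec2", "elb", "rds", "ebs"
    launched: datetime
    tags: dict = field(default_factory=dict)
    stopped_since: Optional[datetime] = None  # EC2 only: when it was shut down
    last_traffic: Optional[datetime] = None   # hypothetical traffic timestamp


def deletion_candidates(resources, now):
    """Return (resource_id, reason) pairs; this reports only, it deletes nothing."""
    candidates = []
    for r in resources:
        missing_tags = REQUIRED_TAGS.difference(r.tags)
        if missing_tags and now - r.launched >= TAG_GRACE:
            candidates.append((r.resource_id, "not tagged within 15 minutes"))
        elif r.kind == "ec2" and r.stopped_since and now - r.stopped_since >= IDLE_LIMIT:
            candidates.append((r.resource_id, "stopped for 5+ days"))
        elif r.kind in ("ec2", "elb") and r.last_traffic and now - r.last_traffic >= IDLE_LIMIT:
            candidates.append((r.resource_id, "no traffic for 5+ days"))
    return candidates
```

The tagging check runs first, so an untagged resource is reported once for the tagging violation rather than also for idleness.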
18. Governing cost management: The total picture
• Right-sized our current estate
• Invested in Reserved Instances
• Decommissioned what we didn’t need
• Implemented automation where possible (CloudFormation & Chef/Puppet in our case)
• Implementing good governance: tagging and service transition, including change control (in progress)
• Used the AWS Trusted Advisor service
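Investing in Reserved Instances pairs with modifying them as the estate is right-sized. When changing instance size within a family, AWS requires the reservation's normalized footprint (normalization factor × instance count) to stay the same. A sketch of that one check, using a subset of AWS's published normalization factors; it ignores the other modification constraints (platform, tenancy, region, term), and the function names are hypothetical.

```python
# Subset of the AWS normalization factors for instance sizes; see the EC2
# documentation for the full, current table.
NORMALIZATION = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64, "16xlarge": 128,
}


def family_and_units(reservations):
    """Return (family, normalized units) for [(instance_type, count), ...]."""
    families = set()
    units = 0.0
    for instance_type, count in reservations:
        family, size = instance_type.split(".", 1)
        families.add(family)
        units += NORMALIZATION[size] * count
    if len(families) != 1:
        raise ValueError("reservations must share one instance family")
    return families.pop(), units


def footprint_preserved(current, proposed):
    """True if `proposed` keeps the family and normalized footprint of `current`."""
    try:
        cur_family, cur_units = family_and_units(current)
        new_family, new_units = family_and_units(proposed)
    except ValueError:
        return False
    return cur_family == new_family and cur_units == new_units
```

For example, two m4.xlarge reservations (16 units) could be split into four m4.large (also 16 units) to match what is actually running, while converting them to two m4.large would not be accepted.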
19. Governing security management: Key requirements
• Security groups and NACLs: reviewed and updated to allow only specific access
• IAM roles: groups created and applied to instances; functions and actions restricted
• Networking: all ports closed; open only what is required
• Users no longer active at News are removed
• Antivirus installed automatically on EC2 Windows instances
• IAM users audited and user access modified
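The networking rule (all ports closed, open only what is required) can be audited from security group definitions. A sketch that reads rules in the shape returned by the EC2 `DescribeSecurityGroups` API (`GroupId`, `IpPermissions`, `IpRanges`, `Ipv6Ranges`) and flags anything open to the whole internet beyond a hypothetical allow-list of ports 80 and 443.

```python
# Hypothetical allow-list: ports that may be reachable from anywhere.
ALLOWED_PUBLIC_PORTS = {80, 443}
WORLD = {"0.0.0.0/0", "::/0"}


def is_open_to_world(permission):
    """True if an ingress rule allows any IPv4 or IPv6 source address."""
    ipv4 = {r.get("CidrIp") for r in permission.get("IpRanges", [])}
    ipv6 = {r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])}
    return bool((ipv4 | ipv6) & WORLD)


def world_open_violations(security_groups):
    """Return (group_id, from_port, to_port) for world-open rules outside the allow-list."""
    violations = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            if not is_open_to_world(perm):
                continue
            if perm.get("IpProtocol") == "-1":
                from_port, to_port = 0, 65535  # "-1" means all protocols and ports
            else:
                from_port, to_port = perm["FromPort"], perm["ToPort"]
            ports = range(from_port, to_port + 1)
            if not all(p in ALLOWED_PUBLIC_PORTS for p in ports):
                violations.append((sg["GroupId"], from_port, to_port))
    return violations
```

Rules restricted to internal ranges (for example 10.0.0.0/8) are not reported; the check only targets access from anywhere.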
20. Success criteria: The key metrics
• Architectural – adherence to standards/controls
• Cost – efficiency & lifecycle management, TCO, ROI
• Asset – adherence to configuration standard
• Security – compliance to best-practice configuration
• Adoption – rate of adoption
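Several of these metrics reduce to simple ratios. The sketch below computes two of them as percentages: tagging compliance (one measure of adherence to standards) and Reserved Instance utilization (one measure of cost efficiency). The inputs, a list of tag dictionaries and reserved hours used versus purchased, are illustrative assumptions rather than the presenters' actual measures.

```python
def percentage(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal (0 if whole is 0)."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def tag_compliance(assets, required_tags):
    """Share of assets (each a dict of tags) carrying every required tag key."""
    compliant = sum(1 for tags in assets if set(required_tags) <= set(tags))
    return percentage(compliant, len(assets))


def ri_utilization(reserved_hours_used, reserved_hours_purchased):
    """Share of purchased Reserved Instance hours that matched running usage."""
    return percentage(reserved_hours_used, reserved_hours_purchased)
```

Tracking these figures over time shows whether each round of "implement, rinse, & repeat" is actually moving the metrics.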
21. What’s next for governance
We need the equivalent of DevOps for cloud management
• Processes
• Set of roles
• Tooling
• Shared standards
22. 5 best practices
• Empower a centralized owner that delivers real value to stakeholders
• Don’t give up on agility
• Create partnerships with strategic vendors
• Establish high-value policies
• Automate, automate, automate