Learn how to take your organization from manually tweaking and deploying servers and applications to automating the process, all the way from infrastructure to application code. In this session, we discuss how to structure teams to use DevOps, Service-Oriented Architecture, and Microservices. We evaluate the skill sets that are required for this and ways to attain or train employees to be sure that they have these skill sets. Customers who have gone through a transition to DevOps will discuss what the journey was like and lessons learned along the way. https://aws.amazon.com/government-education/
4. Background on SoundExchange
History
• Formed in 2000 as a result of U.S. copyright legislation in the 1990s
• Became an independent organization in 2003
• Created by the industry for the industry; we are at the center of today’s digital music
industry
• 170 full-time employees headquartered in Washington, DC
Perform critical role in digital music world
• Sole U.S. entity to collect and distribute sound recording performance royalties for 3,000+
non-interactive internet radio, satellite radio, and cable television services
• In 2016, distributed approximately $884 million to recording artists and record labels
• To date, distributed more than $4.5 billion in royalties
At the forefront of music industry transformation to digital streaming
• We create and deploy innovative solutions to power the modern global music community
in order to pay creators transparently, accurately and efficiently
5. Our Technology Platform Transformation
Circa 2011
• Monolithic core system
• Disjointed, siloed apps
• Traditional IT delivery
• Highly manual processes
• J2EE/Oracle DB stack
• On-premises infrastructure
Circa 2016
• Federated architecture and systems
• Service-oriented app integration
• Agile and DevOps-based delivery
• Highly automated processes
• Open source stack
• AWS cloud infrastructure
6. SoundExchange Engineering in 2011
• Separate Tech Ops Team from Engineers
• Minimal build automation, manual deploy procedures
• Hand-rolled environments w/ environment “drift”
• Increasingly slow performance
• Difficult to triage stability issues
• Frequent fire-fighting
We were limiting business progress
7. In the past 6 years…
• Grown Technology group from 5 to 32 persons
• Adopted Agile, Open-Source, DevOps
• Adopted use of AWS Public Cloud
• Hired an incredible group of Engineers
• Rebuilt our Royalty Processing Platform
• Built several new systems on top of the Platform
We are a strong enabler of business progress
8. Principles and Practices
• Small Teams (“1-2 Pizzas”)
• Agile (Scrum & Kanban)
• Loose-coupling via APIs
• Lightweight architectures
• Continuously build and release
• Leverage existing services when possible
• Automation of tests (functional, performance, load)
• Resilient to outages with graceful degradation
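The "graceful degradation" principle above can be illustrated with a minimal sketch. It assumes a caller that depends on some downstream service and can tolerate a degraded answer; the names `with_fallback`, `fetch_from_service`, and `fetch_ok` are hypothetical and do not come from the presentation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: T) -> T:
    """Return primary() if it succeeds, otherwise the degraded fallback value."""
    try:
        return primary()
    except Exception:
        # The dependency is down or misbehaving: degrade instead of failing.
        return fallback


def fetch_from_service() -> List[str]:
    # Hypothetical downstream call that is currently unavailable.
    raise ConnectionError("downstream service unreachable")


def fetch_ok() -> List[str]:
    # Hypothetical downstream call that succeeds.
    return ["item-1", "item-2"]


degraded = with_fallback(fetch_from_service, [])  # outage: empty list, no crash
healthy = with_fallback(fetch_ok, [])             # normal path: real data
print(degraded, healthy)
```

In a real system the fallback would more likely be a cached or default response, and the broad `except` would be narrowed to the failures the caller actually expects.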
9. How we define our “DevOps” Culture
• Engineers develop software and support it in Production
• DevOps Team develops capabilities to enable DevOps
• Results
• High system stability and quality
• Created culture that removes barriers and facilitates quality
• Enabled end-to-end problem thinking
• No opportunity to “throw it over the fence”
• Enabled experimentation, leading to better architectures
• Very efficient teams with low headcount needs
8 Dev Teams running 400+ servers with no O&M team
10. DevOps Team: Enabler of DevOps
• DevOps Team
• Licensee Team
• Matching Team
• Repertoire Team
• Rights Team
• Distribution Team
• SXDirect Team
11. DevOps Team: Creating DevOps Capabilities
Most capabilities should be extendable by Dev Teams
Capability: Tools and Approaches
• Change Management: Git, Jenkins, Ansible, CloudFormation
• Cloud Standards: AWS Docs, AWS Training/Support, “Experience”
• Platform Reliability: Auto Scaling, Multi-AZ, SQS, Zone Evacuation
• Monitoring of Components: CloudWatch, New Relic, Pingdom
• Security of the Platform: Custom Scripts, CloudTrail, Trusted Advisor
• Cloud Management: CloudCheckr, Trusted Advisor, Custom Reports
12. DevOps Themes by Year
2012
• Stabilized Legacy system
• Launched first system in AWS Public Cloud
2013
• Developed initial Cloud Standards
• Selected Tools and created first Build Pipelines
2014
• Adapting capabilities to fit with new dev projects
• Refactoring and paying down “tech debt”
2015
• Training of Dev Teams on DevOps capabilities
• Enhanced Resiliency and Monitoring capabilities
2016
• Most Dev Teams owning their “DevOps” capabilities
• Increased efficiencies (costs, environment build times)
Present
• More granular security controls and protections
• Leveraging serverless and more AWS managed services
13. Moving at the Speed of DevOps
In the last year, we’ve performed:
27,471 Continuous Integration Build/Deploys
5,495 Internal Testing Deployments
2,747 User Acceptance Test Deployments
686 Production Deployments
Compared to just 50 builds/deploys in 2011
14. Moving at the Speed of DevOps
Before DevOps
• Uptime: varied, sub-90%
• Unplanned issues and outages
• Provisioning: Weeks/months
• Releases: Monthly
• Scalability: Low
• Focus: On “Tech Ops”
After DevOps
• Uptime (Avg): 99.97%
• Dependable deployments
• Provisioning: Hours
• Releases: Daily/Weekly
• Scalability: High
• Focus: On “The Business”
DevOps enables us to deliver what the business needs quickly,
efficiently, and with high quality and dependability
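To put the two uptime figures in perspective, the short calculation below converts an average uptime percentage into expected downtime per year. It is a sketch that assumes a 365-day year and treats 90% as the best case for the "sub-90%" figure; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an average uptime percentage into downtime hours per year."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


before = downtime_hours_per_year(90.0)   # best case for "sub-90%"
after = downtime_hours_per_year(99.97)   # average uptime after DevOps

print(f"90% uptime    -> {before:.0f} hours of downtime per year")
print(f"99.97% uptime -> {after:.2f} hours of downtime per year")
```

Under these assumptions, 90% uptime corresponds to 876 hours (more than 36 days) of downtime per year, while 99.97% corresponds to roughly 2.6 hours.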
15. Decisions We Had To Make for DevOps
Decision: Considerations
• Fostering Collaboration: What interest level and skills do we have on Dev teams to enhance DevOps capabilities? How do we enable collaboration?
• Effective Coordination: How do we integrate our capabilities into Dev team roadmaps? How do we roll out changes w/o breaking things?
• Picking Tools: Which tools for Version Control, CI/CD, Server CM, “Scripting”, Monitoring, Security? How well do they fit with our tech stack?
• Delivering Reliability: What level of reliability does the platform need? Do we / when do we need multi-region? How do we get there?
• Service Monitoring: What can we get “for free” with AWS? What else do we need? How do we make it easy for Dev teams to add monitors?
• Appropriate Security: How much autonomy do we need right now vs controls? How can we automate our security (pro-active vs reactive)?
18. DevOps - Scope and Goals
• Repeatable, reliable deployments and testing
• Lower labor costs by eliminating manual touch points
• Feedback from Operations
• Increase collaboration
• Uptime > 99.9%
19. R&D Organizational & Culture Transformation
Strategy: Transform to skills and culture of a Cloud company in 2015
Legacy Enterprise Software
• Old technology skills
• Agile’ish
• Content driven releases
• Geared to software delivery
• Manual centric QA
• Heavy manual deployments
• Low feature velocity
• Long Tenure
Cloud
• Cloud and current web skills
• DevOps function
• Agile & metrics
• Stand-alone sprint teams
• Time boxed releases
• Smaller accelerated feature delivery
• CI/automated testing
• Load-performance testing
• Automated deployment/CD/DevOps
Timeline: Present to Q1/15
• Skills Review
• Hire
• Re-Train / Study Groups
• Consulting
• Evangelize
• Process/Org Change
21. Building DevOps Culture
• Hire DevOps Engineers (Parallel)
• Single DevOps team in R&D builds master templates & process
• Breakout & embed DevOps with product groups to implement
• Engage DevOps Consultants
• Utilize In-House Experience
• Hand-off to in-house DevOps in each product group
• Cross-team DevOps Scrum to keep standards and unblock issues
• Automated deployment: Infrastructure-as-Code for all products
• Exit DevOps Consultants
22. DevOps On-boarding Process
• Conduct high level overview of gap analysis for products
• Define the scope for the project
• Conduct demo of the pipeline to the product teams
• Develop a scope statement for each product
• Develop testing plan and acceptable testing standards
• Define criteria for “Done”
23. Full Pipeline Automation into production
Automated Delivery pipeline (Application Code and Infrastructure as Code; full stack under test). Each test gate either fails fast or passes to Auto Deployment / Continuous Deployment into the next environment:
• Developers: Continuous Build & Integration, Automated Unit Test
• QA System: Automated Unit & Functional Test
• Staging: Automated Security & Performance Test
• Production: Continuous Monitoring, 7x24 NOC
Legacy Systems:
• Infrastructure manually deployed and maintained
• High labor, open to human error
• Security checks in production
• Downtime during upgrades
Ellucian Systems:
• Infrastructure code automatically deployed and maintained
• Fully tested with App code
• Repeatable and low labor
• Security scans BEFORE production
• Limited to no downtime during upgrades (blue/green deployments)
Phase 1: Build, Deploy, Test
Phase 2: Operationalize
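The fail-fast behavior of the pipeline on this slide can be sketched as a list of gates run in order, where the first failing gate stops everything after it. This is a minimal illustration, not Ellucian's implementation; the stage names paraphrase the slide and the pass/fail lambdas are hypothetical stand-ins for real test suites and deployment steps.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (succeeded, names of the stages that were executed).
    """
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            return False, executed  # later stages never run
    return True, executed


# Hypothetical gates; each lambda stands in for a real test or deploy step.
stages: List[Stage] = [
    ("build and unit test", lambda: True),
    ("deploy to QA, unit and functional test", lambda: True),
    ("deploy to staging, security and performance test", lambda: False),
    ("deploy to production", lambda: True),
]

ok, executed = run_pipeline(stages)
print(ok, executed)
```

Because the staging gate fails in this example, the production deployment is never attempted, which is the point of placing security and performance tests before production.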
24. DevOps Maturity Assessment
An Aggregate Assessment
• LEVEL 0 (BASE): less than 20%
• LEVEL 1 (BEGINNER): between 20% and 39%
• LEVEL 2 (INTERMEDIATE): between 40% and 59%
• LEVEL 3 (ADVANCED): between 60% and 79%
• LEVEL 4 (LEADER): 80% or greater
Cycle: Update, Evaluate, Measure
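The score bands on this slide translate directly into a lookup. The sketch below assumes the aggregate assessment is expressed as a percentage from 0 to 100 and that fractional scores fall into the lower band (for example, 39.5% is still Level 1); how the aggregate percentage itself is computed is not stated in the slides, so that step is left out.

```python
from typing import Tuple

# (minimum score in percent, level number, level name), highest band first
LEVELS = [
    (80.0, 4, "LEADER"),
    (60.0, 3, "ADVANCED"),
    (40.0, 2, "INTERMEDIATE"),
    (20.0, 1, "BEGINNER"),
    (0.0, 0, "BASE"),
]


def maturity_level(score_percent: float) -> Tuple[int, str]:
    """Map an aggregate assessment score (0-100) to its maturity level."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    for minimum, level, name in LEVELS:
        if score_percent >= minimum:
            return level, name
    # Unreachable: the last band starts at 0, which every valid score meets.
    raise AssertionError("no band matched")


for score in (12, 39, 40, 79, 80):
    level, name = maturity_level(score)
    print(f"{score}% -> LEVEL {level} ({name})")
```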
25. DevOps Maturity Framework
Categories: CM, Build, Test, Deploy, Operations
LEVEL 0
• CM: Source control used but some items may not be properly versioned. No traceability from source to binaries.
• Build: Manual builds. Manual dependency mgmt. Some items not even fully source controlled.
• Test: Manual testing after development.
• Deploy: Manual processes for deploying hardware and software.
• Operations: Disparate logging and reporting. Issues discovered by customers.
LEVEL 1
• CM: All items, including build/deploy scripts, in source control, ensuring repeatable builds.
• Build: Automated, repeatable builds. All items are under source control.
• Test: Able to support automated testing during the build/deploy sequence. Clearly tracked metrics showing incremental improvements in testing maturity.
• Deploy: Some automation for provisioning/deployment but varies by environment. Deployed assets are tagged for tracking.
• Operations: Centralized logging permitting operational analytics.
LEVEL 2
• CM: Separate repositories in use for infrastructure, application, etc. artifacts. Artifacts, in binary repository, tagged and fully traceable to source.
• Build: Automated builds include integrated unit tests and code coverage.
• Test: Clear acceptance criteria for each story with automated tests validating acceptance. Increased level of functional, non-functional, and unit tests.
• Deploy: Deployment/provisioning uses "Infrastructure as code" and uses the approved VPC architecture.
• Operations: Adequate training, feedback, monitoring and preparation has been completed to enable Cloud Ops to appropriately support the application and meet SLAs.
LEVEL 3
• CM: Formal branching strategies, using best practices, in use to support release life-cycles.
• Build: Continuous builds (CI) with managed dependencies. Metrics tracked.
• Test: High level of functional, non-functional, and unit test coverage, including integration testing for related applications.
• Deploy: Consistent, automated tools for deploying/provisioning all environments. Supports smooth upgrades across application versions. Migration path to production planned.
• Operations: Reporting and billing mgmt centralized. Routine activities by CloudOps engineers are automated. Disaster Recovery plans in place.
LEVEL 4
• Change management procedures are actively followed in production, ensuring that DevOps infrastructure definitions are updated.
• Automated fail-over, disaster recovery for production environments in place.
26. DevOps Maturity Framework
Categories: Security, Performance, Database, Environments
LEVEL 0
• Security: Decentralized security with weak security policies and procedures in place.
• Performance: No formal performance monitoring.
• Database: Manual database schema and data management. Manual database server deployment.
• Environments: No deployment via DevOps pipeline.
LEVEL 1
• Security: Centralized security monitoring and escalations.
• Performance: Performance monitoring generates notification of issues.
• Database: Automated db schema management from source control. Manual db server deployments.
• Environments: Pipeline deploys to staging / testing.
LEVEL 2
• Security: Data privacy issues are tracked and mitigated.
• Performance: APM tools are used to monitor and adjust application performance.
• Database: Automated db server deployments (e.g. AMI or RDS instances).
• Environments: Pipeline facilitates automated creation of development environments.
LEVEL 3
• Security: Penetration tests are utilized across production environments.
• Performance: Application scales across multi-AZ/regions for performance characteristics. Performance monitoring triggers automatic scaling and issue remediation.
• Database: Automated DB schema and basic data updates performed during deployment using source control artifacts.
LEVEL 4
• Environments: Production/preview environments created from assets that are promoted from staging.
28. The Well-Architected Framework
Security:
The ability to protect information, systems, and assets while delivering business value
through risk assessments and mitigation strategies.
Reliability:
The ability of a system to recover from infrastructure or service failures, dynamically
acquire computing resources to meet demand, and mitigate disruptions such as
misconfigurations or transient network issues.
Performance Efficiency:
The ability to use computing resources efficiently to meet system requirements, and to
maintain that efficiency as demand changes and technologies evolve.
Cost Optimization:
The ability to avoid or eliminate unneeded cost or suboptimal resources.
Operational Excellence:
The ability to run and monitor systems to deliver business value and to continually
improve supporting processes and procedures.
29. Ellucian’s Culture Maturity
Before
• Mostly Lift-and-Shift into AWS
• Very Little Test Coverage
• Security Scans Ad-hoc
• Sparse CI, No Real CD Processes
• New Node Deployments: Man-weeks, Manual
After
• Refactoring Into Cloud-Native Apps
• Improved Automated Test Coverage
• Security Scans in DevOps Pipeline
• 7000+ Jenkins Jobs Running Daily
• New Node Deployments: ~4 Hours, Automated