Ellucian has been migrating its entire organization from a myriad of software delivery mechanisms, many of them manual, to a highly automated and advanced suite of DevOps tools. Using tools such as Jenkins, Terraform, and Ansible along with native AWS tooling, we have built a highly customized DevOps pipeline on top of the AWS platform. In this talk, we go over some of the challenges we have faced and also discuss our thoughts on the evolution of DevOps and the emerging patterns of managing AWS-based environments.
9. Comparison
Prior to DevOps:
• Mostly lift-and-shift into AWS
• Very little test coverage
• Ad-hoc security scans
• Sparse CI, no real CD processes
• New node deployments took man-weeks
Current state:
• Refactoring into cloud-native apps
• Improved test coverage
• Security scans in the DevOps pipeline
• 1500+ Jenkins jobs running daily
• New node deployments take ~4 hours, fully automated
12. Jenkins – orchestration layer
• The Amazon EC2 Plugin allows Jenkins to spin up slaves dynamically as needed.
• One folder per product team; each team is restricted to its folder based on AD group.
• The CloudBees Folder Plus Plugin allows us to constrain projects/folders to specific slave pools, with separate slave pools for different instance profiles for assume-role access.
13. Packer to create immutable AMIs
• Immutable AMIs ensure all tools and components are included in the development lifecycle.
• AMIs can be spun up in a different account to audit software and licensing without direct access to the product environment.
• New AMIs are rolled out by updating the launch configuration in Terraform.
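A minimal sketch of driving such a build from a pipeline step and capturing the new AMI ID. The template variable and the "manifest" post-processor (which writes manifest.json) are assumptions about the Packer template, not our exact setup:

import json
import subprocess

def build_ami(template="ami.pkr.json", base_ami="ami-0123456789abcdef0"):
    # Bake the immutable AMI; source_ami is a hypothetical template variable.
    subprocess.run(
        ["packer", "build", "-var", f"source_ami={base_ami}", template],
        check=True,
    )
    # The manifest post-processor records artifacts as "region:ami-id".
    with open("manifest.json") as f:
        manifest = json.load(f)
    return manifest["builds"][-1]["artifact_id"].split(":")[1]

if __name__ == "__main__":
    # The returned AMI ID is what gets written into the launch configuration.
    print("New AMI:", build_ami())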
14. Terraform for infrastructure as code
• A remote Amazon S3 state file allows sharing of resource values across modules and teams.
• A count/split/element design pattern to scale resources.
• Jenkins serves as a middleware wrapper to handle dynamic variables and configuration across AWS accounts.
• Allows us to remain “cloud agnostic”.
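A minimal sketch of what that middleware wrapper might look like. The account map, bucket, and variable names are hypothetical, and the Terraform configs are assumed to declare an S3 backend:

import subprocess

# Hypothetical per-account configuration Jenkins would inject.
ACCOUNTS = {
    "product-dev":  {"account_id": "111111111111", "env": "dev"},
    "product-prod": {"account_id": "222222222222", "env": "prod"},
}

def terraform_apply(account, region="us-east-1"):
    cfg = ACCOUNTS[account]
    # Point every account at the shared remote S3 state file so resource
    # values can be shared across modules and teams.
    subprocess.run(
        ["terraform", "init",
         "-backend-config", "bucket=example-terraform-state",
         "-backend-config", f"key={account}/{region}/terraform.tfstate"],
        check=True,
    )
    subprocess.run(
        ["terraform", "apply", "-auto-approve",
         "-var", f"account_id={cfg['account_id']}",
         "-var", f"environment={cfg['env']}",
         "-var", f"region={region}"],
        check=True,
    )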
15. Ansible for configuration management layer
• Ansible serves as a standard format to write and share server-level automation.
• Playbooks are pushed to S3 from Jenkins, then downloaded from S3 and executed in local mode in user_data to provision the instance.
• PowerShell Desired State Configuration on Windows platforms adheres to the same Ansible principles (push, pull, local).
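A minimal sketch of the push side and the user_data it renders. The bucket and playbook names are hypothetical:

import boto3

BUCKET = "example-ansible-playbooks"

def push_playbook(archive="webserver-playbook.tar.gz"):
    # Push: Jenkins uploads the playbook bundle from its workspace to S3.
    boto3.client("s3").upload_file(archive, BUCKET, archive)

def build_user_data(archive="webserver-playbook.tar.gz"):
    # Pull/local: this script is rendered into the launch configuration's
    # user_data; the instance pulls the playbook and runs it locally.
    return f"""#!/bin/bash
aws s3 cp s3://{BUCKET}/{archive} /tmp/playbook.tar.gz
mkdir -p /tmp/playbook && tar -xzf /tmp/playbook.tar.gz -C /tmp/playbook
ansible-playbook -i localhost, --connection=local /tmp/playbook/site.yml
"""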
16. Automated tests and scans
• Unit, smoke, and functional tests ensure the environment is operating as expected.
• ServerSpec tests validate that infrastructure is configured properly.
• Results are proxied through a bastion server back to Jenkins for reporting and tracking.
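ServerSpec tests themselves are written in Ruby; purely as an illustration of the kind of checks involved, equivalent smoke and infrastructure tests in Python might look like this (endpoint and port values are hypothetical):

import socket
import urllib.request

def test_health_endpoint():
    # Smoke test: the application responds on its health-check URL.
    with urllib.request.urlopen("http://localhost:8080/health", timeout=5) as r:
        assert r.getcode() == 200

def test_ssh_listening():
    # Infrastructure test: sshd is listening where we expect it.
    with socket.create_connection(("localhost", 22), timeout=5):
        pass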
19. AWS resource tag management
• Lots and lots of teams.
• One central “billing” account.
• Defined “required” tags – but how do you enforce this?
20. Resource tag “flow down”
Some child AWS resources do not support a tag “flow down” from their parents.
Requirement: Automate a way to flow resource tags down from parents to children.
Answer: Leverage AWS Lambda to regularly scan the environment and copy tags from parent resources to the appropriate child resources.
21. AWS resource soft limits
Each resource class in each region of all of our accounts has a soft limit, but:
• How can we monitor our soft limits?
• How can we automatically request an increase?
• How can we ensure production isn’t affected?
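One way to answer the monitoring question is the Trusted Advisor “Service Limits” check via the AWS Support API. A minimal sketch, assuming a Business or Enterprise support plan (the 80% threshold is arbitrary):

import boto3

# The Support API only lives in us-east-1.
support = boto3.client("support", region_name="us-east-1")

def service_limit_warnings(threshold=0.8):
    # Find the Trusted Advisor "Service Limits" check by name.
    checks = support.describe_trusted_advisor_checks(language="en")["checks"]
    check_id = next(c["id"] for c in checks if c["name"] == "Service Limits")
    result = support.describe_trusted_advisor_check_result(checkId=check_id)
    for res in result["result"]["flaggedResources"]:
        # Metadata layout: region, service, limit name, limit, current usage.
        region, service, limit_name, limit, usage = res["metadata"][:5]
        if usage and usage.isdigit() and limit.isdigit() \
                and int(usage) >= threshold * int(limit):
            yield f"{region} {service} {limit_name}: {usage}/{limit}"

for warning in service_limit_warnings():
    print(warning)  # e.g. hand off to paging / ticket automation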
25. Self-healing CI/CD environment
• Deployment and configuration of the Jenkins pipeline is fully automated.
• Can seamlessly deploy to new regions or recreate an existing environment; an EBS volume snapshot is taken and reattached after recreating.
• Self-healing – Jenkins will recreate itself based on certain Amazon CloudWatch alarms and events.
We are helping higher education by offering the software and services needed for a modern, connected campus.
Our cloud-enabled, integrated, mobile-first solutions help to enhance nearly every aspect of higher education.
1. We devote ourselves to four key value drivers that matter most to our customers:
Student success
Constituent experience
Operational efficiency
Institutional growth
2. We offer a portfolio with breadth and depth across the customer value drivers:
Sole focus is to help students achieve success [Recruiter, Degree Works, Banner Student]
Mobile-first intuitive user experience [Mobile, Faculty Advisor, Workflow, Portal]
Make your operations efficient and effective. [Banner HR, Finance, SIS, Document Mgmt., Colleague, Talent Mgmt.]
Reach the students who are the best fit for your institution, including newer audiences. [Recruiter, CBE Brainstorm]
Ellucian Ethos platform supports the most complete portfolio of enterprise applications for higher education in the market today.
There are multiple ways to look at DevOps. We try to define it somewhat simply: a cultural aspiration where development and operations personnel, security architects, network engineers, etc. work together with minimal barriers, advocating improved communication and collaboration to achieve increased agility, automation, security, and an improved customer experience.
Or to put it more simply – people working together, with a common set of tools and goals to achieve the best possible customer experience.
DevOps is not a tool, and while it has quickly become a marketing buzzword, it is truly a cultural aspiration for organizations to undertake. In conjunction with setting up the tooling that provides many of the benefits of DevOps, organizations need to embark on the cultural journey as well.
In transforming your teams, many teams become cross-functional and span the traditional life cycles of software development. Often the proverbial ‘Development throws work over the wall to Operations’ is the starting point for refactoring the team makeup. But that is just the start: as the tooling and teams mature and shift more of the processes left in an automated fashion, team responsibilities become increasingly blurred, moving from well-defined silos of development, testing, operations, etc. into a mission that resonates with delighting customers with superior-quality products.
To make DevOps work at scale in the enterprise, you cannot have a central team “doing DevOps”; rather, every team must be “doing DevOps”. It has to be a goal of the technology department that team members break out of their silos and become cross-functional, gaining experience across the multi-faceted gamut of the technology stack. In fact, much of the industry is also moving towards the notion of a ‘full stack developer’. In essence, that is just a continuation of the DevOps culture branching out to redefine and disrupt the traditional models of application development. The same disruption is happening not only in the development arena but also in the operations and IT spaces.
At Ellucian, our mantra with regard to DevOps is to automate everything. Why? The cycles we spend upfront automating our processes and functions pay off hugely over time. Think of it this way: imagine that every week you spend ten minutes doing a repeated task. That sounds like a small burden, right? Compare that with an hour of time to automate the task: in six weeks you will have reclaimed that time, and more importantly you won’t have to waste mental cycles doing a repeatable task manually. With that said, there needs to be a balance. While we want to strive to automate everything, we have to do it by practical means when scaling for the enterprise. To return to the previous example, if a team spends ten minutes on a repeated task but only on an irregular basis, it perhaps makes less sense to expend the calories needed to get that particular task automated.
We do a folder per product team; each product team is restricted to its folder based on AD group. Utilizing the CloudBees Folder Plus Plugin with the controlled-slave feature, we restrict where projects/folders can run. Different slave pools for dev/test and production allow us to restrict developers to dev and test environments only. Dynamic slave pools utilize the Amazon EC2 plugin. These are used to do CI and CD into our environments using cross-account role assumption into over 100 AWS accounts.
Talk about how we make this HA later in the talk
Jesse to talk about benefits of agent-less
At Ellucian, we have dozens and dozens of accounts, and the number grows every day.
Requirement: Have a central cross-account management account and provide a CI/CD deployment pipeline that deploys to multiple accounts from a single account.
Answer: Leverage cross-account deployment via Jenkins, with the various management tasks running in a single AWS account.
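A minimal sketch of the role-assumption step such a job might perform; the role name and account ID here are hypothetical:

import boto3

def session_for_account(account_id, role_name="JenkinsDeployRole"):
    # Assume a role in the target account from the central account.
    creds = boto3.client("sts").assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="jenkins-deploy",
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# Example: list EC2 instances in one of the 100+ target accounts.
ec2 = session_for_account("123456789012").client("ec2")
print(ec2.describe_instances()["Reservations"])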
The answer is Janitor Monkey.
The global tag flow-down script copies the CostCenter and Group tags from parent to child resources when the tags are empty:
EC2 instances -> EBS volumes
EBS volumes -> snapshots
AMIs -> Snapshots
We had tens of thousands of these untagged resources that were ‘fixed’ when we launched this script.
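A trimmed sketch of that Lambda, showing only the EC2 instance -> EBS volume leg:

import boto3

ec2 = boto3.client("ec2")
FLOW_TAGS = ("CostCenter", "Group")

def handler(event, context):
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                parent = {t["Key"]: t["Value"]
                          for t in instance.get("Tags", [])}
                wanted = {k: parent[k] for k in FLOW_TAGS if k in parent}
                if not wanted:
                    continue
                for mapping in instance.get("BlockDeviceMappings", []):
                    vol_id = mapping["Ebs"]["VolumeId"]
                    vol = ec2.describe_volumes(VolumeIds=[vol_id])["Volumes"][0]
                    child = {t["Key"] for t in vol.get("Tags", [])}
                    # Only fill tags that are empty on the child.
                    missing = {k: v for k, v in wanted.items()
                               if k not in child}
                    if missing:
                        ec2.create_tags(
                            Resources=[vol_id],
                            Tags=[{"Key": k, "Value": v}
                                  for k, v in missing.items()],
                        )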
The answer is to page out and create a support ticket.
We use a few different methods to schedule our AWS resources. Focusing on our EC2 instances: in our development and non-production environments we define autoscale groups that automatically scale up and down based on our teams’ office hours, which has created a huge amount of cost savings for our teams. We also leverage Jenkins orchestrations to spin entire environments up and down on a scheduled basis depending on individual application needs, and we have begun exploring Lambda functions to accomplish this as well.
The inability to schedule other AWS resources, specifically RDS, is a current pain point; solving it would help us realize additional cost optimizations for several of our environments.
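For the autoscale-group approach, a pair of scheduled scaling actions per group is enough. A minimal sketch; the group name, sizes, and cron schedules (UTC) are hypothetical:

import boto3

asg = boto3.client("autoscaling")
GROUP = "dev-webapp-asg"

# Scale up for office hours (weekdays 8am) and down in the evening (8pm).
asg.put_scheduled_update_group_action(
    AutoScalingGroupName=GROUP,
    ScheduledActionName="office-hours-up",
    Recurrence="0 8 * * 1-5",
    MinSize=2, MaxSize=6, DesiredCapacity=2,
)
asg.put_scheduled_update_group_action(
    AutoScalingGroupName=GROUP,
    ScheduledActionName="office-hours-down",
    Recurrence="0 20 * * *",
    MinSize=0, MaxSize=0, DesiredCapacity=0,
)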
Enables self-healing practices
Zero-downtime deployments
We dogfood our own DevOps tools – meaning we are self-healing our DevOps toolchain and leading by example.
Deployment and configuration of the Jenkins pipeline is fully automated.
Can seamlessly deploy to new regions or recreate an existing environment; an EBS volume snapshot is taken and reattached after recreating.
Self-healing: Jenkins will recreate itself based on certain Amazon CloudWatch alarms.
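One concrete way to wire that trigger, offered as a sketch rather than our exact setup: a CloudWatch alarm on the Jenkins master that terminates the instance on sustained status-check failures, letting its autoscale group relaunch it (user_data then restores the EBS snapshot). The instance ID and thresholds are hypothetical:

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="jenkins-master-self-heal",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=1.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # Built-in EC2 action: terminate so the autoscale group replaces it.
    AlarmActions=["arn:aws:automate:us-east-1:ec2:terminate"],
)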
The Ellucian DevSecOps pipeline has been designed to verify that we are building secure applications. We use a layered approach of application security best practices to identify as many vulnerabilities as possible through automated mechanisms. We have validated the design of our DevSecOps pipeline through an internal vetting process as well as a review by outside consultants from Cigital.
Dynamic application security testing, or DAST, performs automated black-box penetration testing against running applications in a manner similar to the way a hacker might attempt to penetrate them. We use two tools, the open source Arachni and the commercial HPE WebInspect, to perform DAST within the DevSecOps pipeline. Using two tools gives us better coverage by allowing us to verify the security of our applications through two different lenses. Arachni is meant to be used on every build, while WebInspect is used once prior to each release or at the end of a sprint.
The OWASP Dependency-Check is used to examine open source libraries and dependencies and compare them against a database of known vulnerabilities. This helps us identify dependencies with known vulnerabilities at every build and take action to mitigate them early in the development cycle.
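As an illustration, a pipeline step gating the build on Dependency-Check might look like the sketch below. The project name and CVSS threshold are hypothetical; the --failOnCVSS flag makes the CLI exit non-zero on findings at or above that score:

import subprocess
import sys

result = subprocess.run([
    "dependency-check.sh",
    "--project", "example-app",   # hypothetical project name
    "--scan", "./",               # scan the checked-out workspace
    "--format", "HTML",
    "--failOnCVSS", "7",          # fail the build on high severity
])
sys.exit(result.returncode)  # a non-zero exit fails the Jenkins build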
We are currently evaluating static application security testing (SAST) tools and expect to complete procurement of a tool in Q2. The tool selected will complement our DAST tools by providing a white-box approach to testing via direct examination of application source code. This provides yet another layer, ensuring we examine the entirety of our code base for vulnerabilities and identify them even earlier in the development cycle.
Later this year, we will further enhance our capabilities through the implementation of tools such as an infrastructure vulnerability scanner. We are also investigating ways to further enhance our security posture through new agent-based technologies such as Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), as well as AWS tools like Inspector.
The data from all of the tools in the pipeline will be fed into the ThreadFix vulnerability management system. ThreadFix provides us with a single integrated view into the findings from all of our tools and allows for integration of the DevSecOps pipeline directly into the product backlogs for each team.