Amazon uses Java heavily, and in 2016 we realized that we could not rely on a binary distribution. We have thousands of Java applications running in production. Quarterly releases of the JDK often introduced subtle breaking changes with potentially widespread impact. Critical bug fixes took a few months to arrive through support. At the same time, there was urgency to put these releases into production as soon as they came out to address security issues. But a binary distribution precluded making fixes in the JVM or the libraries. So we decided to build Java from source to address the constraints and requirements of our use.
The Corretto team was charged with building from source. But one of our challenges was getting these thousands of teams to adopt it. These teams were using Oracle JDK, so our first tenet was to make Corretto compatible with Oracle JDK. We built compensating changes that made it drop-in compatible. That means a team adopting it had to do nothing.
Corretto was certified using the Java Technology Compatibility Kit (TCK) to ensure it meets the Java SE standard. This was a critical factor in making it drop-in compatible as well.
We have a wide variety of operating systems running at Amazon. There are Linux and Windows for deployment, macOS for development, and Docker images for containerized development. All of these platforms had to be supported.
And lastly, these internal customers wanted long-term support so that they could continue running on JDK 8 and other versions without worry. So we offered LTS to them.
Amazon has 14 leadership principles. The most important of all is Customer Obsession. 90-95% of our roadmap is driven by customer needs. And we treat internal and external customers about the same.
Our external customers mostly want the same thing that we use internally. Our focus for engineering teams is to eliminate undifferentiated heavy lifting for all of our customers, both inside and outside. So, when multiple customers started asking us about the options for their Java platform, it became quite clear to us that Corretto needed to be made available externally. And it turns out that everything we were already doing internally was fully applicable to our external customers as well.
Internally, we had customers using Oracle JDK and Red Hat JDK. And externally, the JDK is used on a large variety of platforms. It was critical that Corretto work as a drop-in replacement. So we went through the rigor of testing extensively: we ran all the jtreg tests and all the TCK tests, which again required setting up the infrastructure, extensive fine-tuning, and manual testing.
Upstream compatibility is a core tenet for OpenJDK. From the get-go, we decided to contribute everything upstream.
In general, forking code is a bad idea.
Many open source contributions from Amazon bubble up from within our teams, because contributing helps them reduce technical debt and maintenance burden. And that’s the general open source philosophy at Amazon as well.
Forking also leads to compatibility problems over time. This matters to customers because it makes them less agile. So the thought within the team is always to keep the trunk up to date.
Let’s talk about how we contribute.
Starting in January 2019, leadership of the update projects was handed off from Oracle to Red Hat. Amazon is collaborating with Red Hat by backporting bug fixes and contributing security patches.
As you can see, we’ve been contributing patches to 8u212 and 11.0.3. The next quarterly release is coming out soon and you can find the issues that we fixed using JBS.
Let’s talk about the process of contribution.
These are exclusive patches contributed by Corretto. Some of them may have been upstreamed by now.
Some of these bugs have been fixed in tip, but the fixes are not yet available in the OpenJDK update releases, so they need to be backported and tested. Similarly, we have security patches.
We also have our own improvements. It is a very collaborative process.
Now, let’s talk about who can get the fixes in.
If you are doing modern open source development, then you’re familiar with the workflow that has been enabled by GitHub. Find an issue, send a PR, tests are run using a CI in the cloud, the PR is reviewed, you get an LGTM, and it’s merged. The OpenJDK contribution process is different.
Contributor: Has contributed one or more OpenJDK patches, sponsored/pushed by a Committer/Reviewer. See http://openjdk.java.net/contribute/ for how to contribute if you’re not yet an Author, and http://openjdk.java.net/sponsor/ for how to sponsor a patch.
Author: Has an OpenJDK username, write access to the JBS (JDK Bug System) database (so can file bugs), and write access to the OpenJDK code review server cr.openjdk.java.net (so can post formatted patches and other materials there). Has made at least two contributions. Status granted by project leads, contributions must still be sponsored/pushed by a Committer/Reviewer.
Committer: Has patch commit rights to the project repo(s), but can push only with Reviewer approval. Has made at least 8 “significant” (in opinion of existing Committers) contributions. Voted in by existing Committers, usually takes a year or two to achieve.
Reviewer: Has patch approval authority for the project repo(s). Every commit must be approved by at least one Reviewer. Has made at least 32 “significant” contributions. Voted in by existing Reviewers, usually takes 3 to 5 years to achieve.
There are several hundred Authors, Committers, and Reviewers.
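The role ladder above can be sketched as a small data model. This is only an illustration: the class and method names are hypothetical, the contribution counts are the minimums quoted in these notes, and the sketch ignores the "significant" qualifier as well as the grant or vote each role needs.

```java
// Hypothetical model of the OpenJDK role ladder; all names are illustrative only.
final class OpenJdkRoles {

    // Contribution counts are the minimums quoted in the notes above.
    enum Role {
        CONTRIBUTOR(1), AUTHOR(2), COMMITTER(8), REVIEWER(32);

        final int minContributions;

        Role(int minContributions) {
            this.minContributions = minContributions;
        }
    }

    // Highest role whose contribution minimum is met, or null if none is.
    // Meeting the count is necessary, not sufficient: Author status is granted
    // by project leads, and Committers and Reviewers are voted in.
    static Role highestRoleByCount(int contributions) {
        Role best = null;
        for (Role role : Role.values()) {
            if (contributions >= role.minContributions) {
                best = role;
            }
        }
        return best;
    }

    // Contributors and Authors need a Committer or Reviewer to push for them.
    static boolean needsSponsor(Role role) {
        return role == Role.CONTRIBUTOR || role == Role.AUTHOR;
    }

    // Only Reviewers can approve a commit.
    static boolean canApprove(Role role) {
        return role == Role.REVIEWER;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 8, 31, 32}) {
            System.out.println(n + " contributions -> " + highestRoleByCount(n));
        }
    }
}
```

The point of the model is that a Committer can push but still cannot approve, so every commit passes through at least one Reviewer.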
We have Corretto team members in some of these roles.
Luckily, we hired Paul Hohensee, who is a Reviewer for the jdk, jdk update, and hotspot jvm projects. In addition, we have one jdk8u project Author, Xin Liu.
TODO: Include Project Lead
A customer reports a bug. Sometimes the fix already exists in a later JDK, so we backport it and contribute it to the older version.
New bug: we fix it at the tip and then backport.
New bug: sometimes we cannot wait, so we fix it in our internal code base and contribute it upstream later. OCA requirement: must publish patch to an OpenJDK forum (JBS issue, webrev, email to openjdk mailing list) before publishing source code or a binary derived from it anywhere else.
When we make significant improvements internally that are useful in general, we package them and make them available to the community. For example, ACCP (Amazon Corretto Crypto Provider).
OpenJDK convention
Testing is concentrated in tip, and we want to get experience with the fix before we backport.
Testing is more extensive in tip, so we want to make sure all cases are covered there.
Amazon operates some of the world’s largest Java services. Over the years, we observed that cryptographic operations in Java caused significant CPU usage, throughput bottlenecks, and elevated operational cost. We developed ACCP and have been running, debugging, and tuning it in our own most critical production environments since 2017. You can now enjoy this performance optimization in your own products as well: ACCP is another supported distribution, ready for production use with JDK 8 and 11.
In our tests, ACCP improved the performance of AES-GCM encryption for JDK 8 by a factor of 28.
AWS Snowball uses ACCP to run cryptographic functions 95% faster, doubling its data transfer speed.
After implementing ACCP, one service reported a drop in peak CPU usage from ~66% to ~55% (see graph below). Another reported a 40% reduction in fleet cost. Yet another increased its capacity by 32%.
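For context, ACCP plugs in as a Java Cryptography Architecture (JCA) provider, so code written against the standard javax.crypto API should not need to change to benefit from it. The sketch below is a plain AES-GCM round trip of the kind measured above. It does not load ACCP: it uses whichever provider the JDK selects by default, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; uses only standard JCA calls and the default provider.
final class AesGcmSketch {
    static final int IV_BYTES = 12;   // 96-bit IV, the usual choice for GCM
    static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh 128-bit AES key.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a random IV; an IV must never be reused with the same key.
    static byte[] newIv() {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    // Encrypts or decrypts depending on mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] iv = newIv();
        byte[] plaintext = "hello, corretto".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, key, iv, plaintext);
        byte[] decrypted = crypt(Cipher.DECRYPT_MODE, key, iv, ciphertext);
        System.out.println("round trip ok: " + Arrays.equals(plaintext, decrypted));
    }
}
```

Because the provider is chosen through JCA configuration rather than in this code, the same calls would run unchanged whichever provider ends up serving AES-GCM.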
There is a big OSS community at Amazon, and customers pick up our projects for various reasons. We do internal code reviews that almost everybody can see. Even so, it took a push to get the team to move their proposals outside without worrying too much.
When we shifted from our internal repo to GitHub, we had to reassure the team: it’s OK, don’t be afraid. The worry was what people would think of Amazon if the quality was not good enough.
If we stayed too long internally, then we
Now we go out as early as possible to get early feedback, and we are a lot more open to feedback and criticism.
These are still the early days of Corretto. Customers are deploying Corretto in unexpected locations where IT is not under control.