The document discusses Joyent's early adoption of OS-level virtualization through containers and SmartOS, beginning around 2005. It explores the benefits of OS-level virtualization for performance, elasticity, and security compared to hardware virtualization. It also discusses Joyent's work developing platforms like no.de and Manta that combined OS containers with technologies like node.js and ZFS. A key challenge was gaining developer adoption, since SmartOS is different from mainstream Linux. Docker later helped popularize the container model. Joyent contributed to container technologies through projects like porting KVM to SmartOS and reviving LX branded zones (Linux binary emulation) in illumos.
1. The Peril and Promise of Early Adoption:
Arriving 10 Years Early to Containers
Bryan Cantrill
CTO, Joyent
bryan@joyent.com
@bcantrill
2. Who is Joyent?
• In an interview with ACM Queue in 2008, Joyent’s
mission was described concisely — if ambitiously:
3. Virtualization as cloud catalyst
• This vision — dating back to 2005 — was an example of
early cloud computing, but was itself not a new vision...
• In the 1960s — shortly after the dawn of computing! —
pundits foresaw a multi-tenant compute utility
• The vision was four decades too early: it took the
internet + commodity computing + virtualization to yield
cloud computing
• Virtualization is the essential ingredient for multi-tenant
operation — but where in the stack to virtualize?
• Choices around virtualization capture tensions between
elasticity, tenancy, and performance
• tl;dr: Virtualization choices drive economic tradeoffs
4. • The historical answer — since the 1960s — has been to
virtualize at the level of the hardware:
• A virtual machine is presented upon which each
tenant runs an operating system of their choosing
• There are as many operating systems as tenants
• The singular advantage of hardware virtualization: it can
run entire legacy stacks unmodified
• However, hardware virtualization exacts a heavy price:
operating systems are not designed to share resources
like DRAM, CPU, I/O devices or the network
• Hardware virtualization limits tenancy, elasticity and
performance
Hardware-level virtualization?
5. • Virtualizing at the application platform layer addresses
the tenancy challenges of hardware virtualization
• Added advantage of a much more nimble (& developer-
friendly!) abstraction…
• ...but at the cost of dictating abstraction to the developer
• This creates the “Google App Engine problem”:
developers are in a straitjacket where toy programs
are easy — but sophisticated apps are impossible
• Virtualizing at the application platform layer poses many
other challenges with respect to security, containment
and scalability
Platform-level virtualization?
6. • Virtualizing at the OS level hits the sweet spot:
• Single OS (i.e., single kernel) allows for efficient use of
hardware resources, maximizing tenancy and
performance
• Disjoint instances are securely compartmentalized by
the operating system
• Gives users what appears to be a virtual machine (albeit
a very fast one) on which to run higher-level software
• The ease of a PaaS with the generality of IaaS
• Model was pioneered by FreeBSD jails and taken to
its logical extreme by Solaris zones — and then aped
by Linux containers
OS-level virtualization!
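As a sketch of what this looks like in practice, the classic Solaris/illumos zone lifecycle is a handful of commands (the zone name and path below are ours, purely for illustration; SmartOS wraps this in its own higher-level tooling):

```shell
# Create, install and boot a zone -- an OS-level "virtual machine"
# (zone name and zonepath are illustrative):
zonecfg -z web 'create; set zonepath=/zones/web; commit'
zoneadm -z web install   # populate the zone's root filesystem
zoneadm -z web boot      # boots in seconds: there is no guest kernel
zlogin web               # to the user, a machine of their own
```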
7. OS-level virtualization in the cloud
• Joyent runs OS containers in the cloud via SmartOS
(our illumos derivative) — and we have run containers in
multi-tenant production since ~2005
• Core SmartOS facilities are container-aware and
optimized: Zones, ZFS, DTrace, Crossbow, SMF, etc.
• SmartOS also supports hardware-level virtualization —
but we have long advocated OS-level virtualization for
new build out
• We emphasized their operational characteristics
(performance, elasticity, tenancy)...
8. And it worked!
• Our vision captured developers seeking to scale apps —
and by 2007, a rapidly growing Twitter ran on Joyent
Accelerators:
9. But there were challenges...
• OS-based virtualization was a tremendous strength —
but SmartOS being (seemingly) spuriously different
made it difficult to capture developer mind-share
• Differences are more idiosyncratic than meaningful, but
they became an obstacle to adoption…
• Adopters had to be highly technical and really care
about performance/scale
• Differentiating on performance alone is challenging,
especially when the platform is different: too tempting to
blame the differences instead of using the differentiators
10. Could we go upstack?
• To recapture the developer, we needed to get upstack
• First attempt was SmartPlatform (ca. 2009?), a
JavaScript (SpiderMonkey!) + Perl Frankenstein of a PaaS
• SmartPlatform had all of the problems of SpiderMonkey,
Perl and a PaaS — but showed the value of server-side
JavaScript
• When node.js first appeared in late 2009, we were
among the first to see its promise, and we lunged...
11. node.js + OS-based virtualization?
• In 2010, the challenge became to tie node.js to our most
fundamental differentiator, OS-based virtualization
• The first experiment was a high-tenancy container-based
PaaS, no.de, launched for Node Knockout in Fall 2010
• We ran high tenancy (400+ machines in 48GB), high
performance — and developed DTrace-based graphical
observability
• Early results were promising...
13. no.de: Challenges of a PaaS
• We went on to develop full cloud analytics for no.de:
• But the PaaS business is more than performance
management — and it was clear that we were very early
in what was going to be a tough business...
14. node.js: Wins and frustrations
• The SmartOS + node.js efforts were successful
inasmuch as new developer converts to SmartOS were
(and are!) often coming from node.js
• The debugging we built into node.js on SmartOS is
(frankly) jawdropping — and essential for serious use...
• ...but our differentiators are production-oriented —
developers still have to be highly technical, and still
have to be willing to endure transitional pain
• Exacerbated by the fact that applications aren’t built in
node.js — they are connected with node.js
• We ended up back with familiar problems...
15. Hardware virtualization?
• In late 2010, it was clear that — despite the (obvious!)
technical superiority of OS-based virtualization — we
also needed hardware-based virtualization
• Could OS-based virtualization help us differentiate
a hardware virtualization implementation?
• If we could port KVM to SmartOS, we could offer
advantages over other hypervisors: shared filesystem
cache, double-hulled security, global observability
• The problem is that KVM isn’t, in fact, portable — and
had never been ported to a different system
16. KVM + SmartOS: Supergroup or stopgap?
• In 2011, we managed to successfully port KVM to
SmartOS, making it the first (and only) system to
offer HW virtualization within OS virtualization
• Over the course of 2011, we built SmartDataCenter, a
container-based orchestration and cloud-management
system around SmartOS
• Deployed SmartDataCenter into production in the
Joyent Public Cloud in late 2011
• Over the course of 2012, our entire cloud moved to SDC
• This was essential: most of our VMs today run inside
KVM, and many customers are hybrid
17. The limits of hardware virtualization
• Ironically, our time on KVM helped to reinforce our most
fundamental beliefs in OS-based virtualization...
• We spent significant time making KVM on SmartOS
perform — but there are physical limits
• There are certain performance and resource problems
around HW-based virtualization that are simply
intractable
• While it is indisputably the right abstraction for running
legacy software, it is the wrong abstraction for future
elastic infrastructure!
18. Aside: Cloud storage
• In 2011, the gaping hole in the Joyent Public Cloud was
storage — but we were reluctant to build an also-ran S3
• In thinking about this problem, it was tempting to fixate
on ZFS, one of our most fundamental differentiators
• ZFS rivals OS-based virtualization as our earliest
differentiator: we were the first large, public deployment
of ZFS (ca. 2006) — and a long-time proponent
• While ZFS was part of the answer, it should have been
no surprise that OS-based virtualization...
20. Manta: ZFS + OS-based virtualization!
• Over 2012 and early 2013, we built Manta, a ZFS- and
container-based internet-facing object storage system
offering in situ compute
• OS-based virtualization allows the description of
compute to be brought to where objects reside instead
of having to backhaul objects to transient compute
• The abstractions made available for computation are
anything that can run on the OS...
• ...and as a reminder, the OS — Unix — was built around
the notion of ad hoc unstructured data processing, and
allows for remarkably terse expressions of computation
21. Aside: Unix
• When Unix appeared in the early 1970s, it was not just a
new system, but a new way of thinking about systems
• Instead of a sealed monolith, the operating system was
a collection of small, easily understood programs
• First Edition Unix (1971) contained many programs that
we still use today (ls, rm, cat, mv)
• Its very name conveyed this minimalist aesthetic: Unix is
a homophone of “eunuchs” — a castrated Multics
We were a bit oppressed by the big system mentality. Ken
wanted to do something simple. — Dennis Ritchie
22. Unix: Let there be light
• In 1969, Doug McIlroy had the idea of connecting
different components:
At the same time that Thompson and Ritchie were sketching
out a file system, I was sketching out how to do data
processing on the blackboard by connecting together
cascades of processes
• This was the primordial pipe, but it took three years to
persuade Thompson to adopt it:
And one day I came up with a syntax for the shell that went
along with the piping, and Ken said, “I’m going to do it!”
23. Unix: ...and there was light
And the next morning we had this
orgy of one-liners. — Doug McIlroy
24. The Unix philosophy
• The pipe — coupled with the small-system aesthetic —
gave rise to the Unix philosophy, as articulated by Doug
McIlroy:
• Write programs that do one thing and do it well
• Write programs to work together
• Write programs that handle text streams, because
that is a universal interface
• Four decades later, this philosophy remains the single
most important revolution in software systems thinking!
25. • In 1986, Jon Bentley posed the challenge that became
the Epic Rap Battle of computer science history:
Read a file of text, determine the n most frequently used
words, and print out a sorted list of those words along with
their frequencies.
• Don Knuth’s solution: an elaborate program in WEB, a
Pascal-like literate programming system of his own
invention, using a purpose-built algorithm
• Doug McIlroy’s solution shows the power of the Unix
philosophy:
tr -cs A-Za-z '\n' | tr A-Z a-z |
sort | uniq -c | sort -rn | sed ${1}q
Doug McIlroy v. Don Knuth: FIGHT!
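The pipeline is easy to try on any Unix system; here is a concrete run on a made-up sample sentence (the input text and the choice of the top 2 words are ours, for illustration):

```shell
# McIlroy's pipeline on a small sample, asking for the 2 most frequent words.
echo "the quick brown fox jumps over the lazy dog the fox" |
  tr -cs A-Za-z '\n' | tr A-Z a-z |
  sort | uniq -c | sort -rn | sed 2q
# prints "3 the" then "2 fox" (modulo uniq -c's count padding)
```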
26. Big Data: History repeats itself?
• The original Google MapReduce paper (Dean &
Ghemawat, OSDI ’04) poses a problem disturbingly
similar to Bentley’s challenge nearly two decades prior:
Count of URL Access Frequency: The function processes
logs of web page requests and outputs ⟨URL, 1⟩. The
reduce function adds together all values for the same URL
and emits a ⟨URL, total count⟩ pair
• But the solutions do not adhere to the Unix philosophy...
• ...nor do they make use of the substantial Unix
foundation for data processing
• e.g., Appendix A of the OSDI ’04 paper has a 71-line
word count in C++ — with nary a wc in sight
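For comparison, the paper's URL access frequency example reduces to exactly McIlroy's shape under the Unix philosophy; the toy request log below is ours, for illustration:

```shell
# "Count of URL Access Frequency" as a pipeline: each input line is a
# requested URL; output is <count, URL>, most-requested first.
printf '%s\n' /index /about /index /index /contact |
  sort | uniq -c | sort -rn
# the top line is "3 /index"
```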
27. • Manta allows for an arbitrarily scalable variant of
McIlroy’s solution to Bentley’s challenge:
mfind -t o /bcantrill/public/v7/usr/man |
mjob create -o -m "tr -cs A-Za-z '\n' |
tr A-Z a-z | sort | uniq -c" -r
"awk '{ x[$2] += $1 }
END { for (w in x) { print x[w] " " w } }' |
sort -rn | sed ${1}q"
• This description is not only terse, it is high performing:
data is left at rest — with the “map” phase doing heavy
reduction of the data stream
• As such, Manta — like Unix — is not merely syntactic
sugar; it converges compute and data in a new way
Manta: Unix for Big Data
28. Manta revolution
• Our experiences with Manta — like those with KVM —
have served to strengthen our core belief in OS-based
virtualization
• Compute/data convergence is clearly the future of big
data: stores of record must support computation as a
first-class, in situ operation
• Unix is a natural way of expressing this computation —
and the OS is clearly the right level at which to virtualize
to support this securely
• Manta will surely not be the only system to represent the
confluence of these; the rest of the world will (ultimately)
figure out the power of OS-based virtualization
29. Manta mental challenges
• Our biggest challenge with Manta has been that the key
underlying technology — OS-based virtualization — is
not well understood
• We underestimated the degree to which this would be
an impediment: Manta felt “easy” to us
• When technology requires a shift in mental model, its
transformative power must be that much greater to
compensate for its increased burden!
• Would the world ever really figure out containers?!
30. Containers as PaaS foundation?
• Some saw the power of OS containers to facilitate up-
stack platform-as-a-service abstractions
• For example, dotCloud — a platform-as-a-service
provider — built their PaaS on OS containers
• Hearing that many were interested in their container
orchestration layer (but not their PaaS), dotCloud open
sourced their container-based orchestration layer...
32. Docker revolution
• Docker has used the rapid provisioning + shared
underlying filesystem of containers to allow developers
to think operationally
• Developers can encode dependencies and deployment
practices into an image
• Images can be layered, allowing for swift development
• Images can be quickly deployed — and re-deployed
• As such, Docker is a perfect fit for microservices
• Docker will do to apt what apt did to tar
33. Docker’s challenges
• The Docker model is the future of containers
• Docker’s challenges are largely around production
deployment: security, network virtualization, persistence
• Security concerns are real enough that for multi-tenancy,
OS containers are currently running in hardware VMs (!!)
• In SmartOS, we have spent a decade addressing these
concerns — and we have proven it in production…
• Could we combine the best of both worlds?
• Could we somehow deploy Docker containers as
SmartOS zones?
34. Docker + SmartOS: Linux binaries?
• First (obvious) problem: while it has been designed to
be cross-platform, Docker is Linux-centric
• While Docker could be ported, the encyclopedia of
Docker images will likely forever remain Linux binaries
• SmartOS is Unix — but it isn’t Linux…
• Could we somehow natively emulate Linux — and run
Linux binaries directly on the SmartOS kernel?
35. OS emulation: An old idea
• Operating systems have long employed system call
emulation to allow binaries from one operating system
to run on another on the same instruction set architecture
• Combines the binary footprint of the emulated system
with the operational advantages of the emulating system
• Done as early as 1969 with DEC’s PA1050 (TOPS-10
on TOPS-20); Sun did this (for similar reasons) ca. 1993
with SunOS 4.x binaries running on Solaris 2.x
• In mid-2000s, Sun developed zone-based OS emulation
for Solaris: branded zones
• Several brands were developed — notably including an
LX brand that allowed for Linux emulation
36. LX-branded zones: Life and death
• The LX-branded zone worked for RHEL 3 (!): glibc 2.3.2
+ Linux 2.4
• Remarkable amount of work was done to handle device
pathing, signal handling, /proc — and arcana like TTY
ioctls, ptrace, etc.
• Worked for a surprising number of binaries!
• But support was only for 2.4 kernels and only for 32-bit;
2.6 + 64-bit appeared daunting…
• Support was ripped out of the system on June 11, 2010
• Fortunately, this was after the system was open sourced
in June 2005 — and the source was out there...
37. LX-branded zones: Resurrection!
• In January 2014, David Mackay, an illumos community
member, announced that he was able to resurrect the
LX brand — and that it appeared to work!
Linked below is a webrev which restores LX branded zones
support to Illumos:
http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
I have been running OpenIndiana, using it daily on my
workstation for over a month with the above webrev applied to
the illumos-gate and built by myself.
It would definitely raise interest in Illumos. Indeed, I have
seen many people who are extremely interested in LX zones.
The LX zones code is minimally invasive on Illumos itself, and
is mostly segregated out.
I hope you find this of interest.
38. LX-branded zones: Revival
• Encouraged that the LX-branded work was salvageable,
Joyent engineer Jerry Jelinek reintegrated the LX brand
into SmartOS on March 20, 2014...
• ...and started the (substantial) work to modernize it
• Guiding principles for LX-branded zone work:
• Do it all in the open
• Do it all on SmartOS master (illumos-joyent)
• Add base illumos facilities wherever possible
• Aim to upstream to illumos when we’re done
39. LX-branded zones: Progress
• Working assiduously over the course of 2014, progress
was difficult but steady:
• Ubuntu 10.04 booted in April
• Ubuntu 12.04 booted in May
• Ubuntu 14.04 booted in July
• 64-bit Ubuntu 14.04 booted in October (!)
• Going into 2015, it was becoming increasingly difficult to
find Linux software that didn’t work...
42. Docker + SmartOS: Provisioning?
• With the binary problem being tackled, focus turned to
the mechanics of integrating Docker with the SmartOS
facilities for provisioning
• Provisioning a SmartOS zone operates via the global
zone that represents the control plane of the machine
• docker is a single binary that functions as both client
and server — with far too much surface area to run in
the global zone, especially for a public cloud
• docker also embeds Go-isms and Linux-isms that
we did not want in the global zone; we needed to find a
different approach...
43. Docker Remote API
• While docker is a single binary that can act as the
client or the server, it does not act as both at once…
• docker (the client) communicates with docker (the
server) via the Docker Remote API
• The Docker Remote API is expressive, modern and
robust (i.e. versioned), allowing for docker to
communicate with Docker backends that aren’t docker
• The clear approach was therefore to implement a
Docker Remote API endpoint for SmartDataCenter
44. Aside: SmartDataCenter
• Orchestration software for SmartOS-based clouds
• Unlike other cloud stacks, not designed to run arbitrary
hypervisors, sell legacy hardware or get 160 companies
to agree on something
• SmartDataCenter is designed to leverage the SmartOS
differentiators: ZFS, DTrace and (esp.) zones
• Runs both the Joyent Public Cloud and business-critical
on-premises clouds at well-known brands
• Born proprietary — but made entirely open source on
November 6, 2014: http://github.com/joyent/sdc
47. SmartDataCenter + Docker
• Implementing an SDC-wide endpoint for the Docker
remote API allows us to build in terms of our established
core services: UFDS, CNAPI, VMAPI, Image API, etc.
• Has the welcome side-effect of virtualizing the notion of
the Docker host machine: Docker containers can be placed
anywhere within the data center
• From a developer perspective, one less thing to manage
• From an operations perspective, allows for a flexible
layer of management and control: Docker API endpoints
become a potential administrative nexus
• As such, virtualizing the Docker host is somewhat
analogous to the way ZFS virtualized the filesystem...
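Because the endpoint speaks the Docker Remote API, the stock docker client needs only to be pointed at it; the hostname and port below are hypothetical, for illustration:

```shell
# Point an unmodified docker client at a datacenter-wide API endpoint
# (hostname and port are made up for this sketch):
export DOCKER_HOST=tcp://docker.us-east.example.com:2376
docker ps                     # lists containers across the whole datacenter
docker run -it ubuntu bash    # placement is the datacenter's concern
```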
48. SmartDataCenter + Docker: Challenges
• Some Docker constructs have (implicitly) encoded co-
locality of Docker containers on a physical machine
• Some of these constructs (e.g., --volumes-from) we
will discourage but accommodate by co-scheduling
• Others (e.g., host directory-based volumes) we are
implementing via NFS backed by Manta, our (open
source!) distributed object storage service
• Moving forward, we are working with Docker to help
assure that the Docker Remote API doesn’t create new
implicit dependencies on physical locality
49. SmartDataCenter + Docker: Networking
• Parallel to our SmartOS and Docker work, we have
been working on next-generation software-defined
networking for SmartOS and SmartDataCenter
• Goal was to use standard encapsulation/decapsulation
protocols (i.e., VXLAN) for overlay networks
• We have taken a kernel-based (and ARP-inspired)
approach to assure scale
• Complements SDC’s existing in-kernel, API-managed
firewall facilities
• All done in the open: in SmartOS (illumos-joyent)
and as sdc-portolan
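VXLAN itself is a standard UDP encapsulation (RFC 7348); as a rough illustration of the construct — shown here with Linux iproute2 since that tooling is widely known, with interface names, VNI and addresses made up — an overlay segment is simply a virtual link keyed by a 24-bit network identifier:

```shell
# A VXLAN overlay segment as a virtual link (Linux iproute2 shown purely
# for illustration; names, VNI and addresses are ours):
ip link add vx0 type vxlan id 42 dstport 4789 dev eth0
ip addr add 192.168.10.1/24 dev vx0
ip link set vx0 up
```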
50. Putting it all together: sdc-docker
• Our Docker engine for SDC, sdc-docker, implements
the end points for the Docker Remote API
• Work is young (started in earnest in early fall 2014), but
because it takes advantage of a proven orchestration
substrate, progress has been very quick…
• We are deploying it into early access production in the
Joyent Public Cloud in Q1CY15 (yes: T-12 days!)
• It’s open source: http://github.com/joyent/sdc-docker;
you can install SDC (either on hardware or on VMware)
and check it out for yourself!
51. Containers: reflecting back
• For nearly a decade, we at Joyent have believed that
OS-virtualized containers are the future of computing
• While the efficiency gains are tremendous, they have
not alone been enough to propel containers into the
mainstream
• Containers are being propelled by Docker and its
embodiment of an entirely different advantage of OS
containers: developer agility
• With Docker, the moment for the technology seems to
have arrived: it is in the right place at the right time
• Reflecting back on our adventure as an early adopter...
52. Early adoption: The peril
• When working on a revolutionary technology, it’s easy to
dismiss the inconveniences as casualties of the future
• Some conveniences are actually constraints — but it
can be very difficult to discern which!
• When adopters must endure painful differences to enjoy
the differentiators, the economic advantages of a
technological revolution are undermined
• And even when the thinking does shift, it can take a long
time; as Keynes famously observed, “the market can
stay irrational longer than you can stay solvent”!
53. Early adoption: The promise
• When the payoffs do come, they can be tremendously
outsized with respect to the risk
• Placing gutsy technological bets attracts like-minded
technologists — which can create uniquely fertile
environments for innovation
• If and where early adoption is based on open source,
the community of like-minded technologists is not
confined to be within a company’s walls
• Open source innovation allows for new customers and/
or new employees: for early adopters, open source is
the farm system!
54. Early adoption: The peril and the promise
• While early adoption isn’t for everyone, every
organization should probably be doing some early
adoption somewhere — and probably in the open
• When an early adopter of a technology, don’t innovate in
too many directions at once: know the differentiators
and focus on ease of use/adoption for everything else
• Stay flexible and adaptable! You may very well be right
on trajectory, but wrong on specifics
• Don’t give up! Technological revolutions happen much
slower than you think they should — and then much
more quickly than anyone would think possible
• “God bless the early adopters!”
55. Thank you!
• Jerry Jelinek, @jmclulow, @pfmooney and @jperkin for
their work on LX branded zones
• @joshwilsdon, @trentmick, @cachafla and @orlandov
for their work on sdc-docker
• @rmustacc, @wayfaringrob, @fredfkuo and @notmatt
for their work on SDC overlay networking
• @dapsays for his work on Manta and node.js debugging
• @tjfontaine for his work on node.js
• The countless engineers who have worked on or with us
because they believed in OS-based virtualization!