SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
Bringing the Unix
Philosophy to Big Data
Bryan Cantrill
SVP, Engineering
bryan@joyent.com
@bcantrill
Unix

•

When Unix appeared in the early 1970s, it was not just a
new system, but a new way of thinking about systems

•

Instead of a sealed monolith, the operating system was
a collection of small, easily understood programs

•

First Edition Unix (1971) contained many programs that
we still use today (ls, rm, cat, mv)

•

Its very name conveyed this minimalist aesthetic: Unix is
a homophone of “eunuchs” — a castrated Multics
We were a bit oppressed by the big system mentality. Ken
wanted to do something simple. — Dennis Ritchie
Unix: Let there be light

•

In 1969, Doug McIlroy had the idea of connecting
different components:
At the same time that Thompson and Ritchie were sketching
out a file system, I was sketching out how to do data
processing on the blackboard by connecting together
cascades of processes

•

This was the primordial pipe, but it took three years to
persuade Thompson to adopt it:
And one day I came up with a syntax for the shell that went
along with the piping, and Ken said, “I’m going to do it!”
Unix: ...and there was light

And the next morning we had this
orgy of one-liners. — Doug McIlroy
The Unix philosophy

•

The pipe — coupled with the small-system aesthetic —
gave rise to the Unix philosophy, as articulated by Doug
McIlroy:

•
•

Write programs to work together

•
•

Write programs that do one thing and do it well

Write programs that handle text streams, because
that is a universal interface

Four decades later, this philosophy remains the single
most important revolution in software systems thinking!
Doug McIlroy v. Don Knuth: FIGHT!

•

In 1986, Jon Bentley posed the challenge that became
the Epic Rap Battle of computer science history:
Read a file of text, determine the n most frequently used
words, and print out a sorted list of those words along with
their frequencies.

•

Don Knuth’s solution: an elaborate program in WEB, a
Pascal-like literate programming system of his own
invention, using a purpose-built algorithm

•

Doug McIlroy’s solution shows the power of the Unix
philosophy:
tr -cs A-Za-z 'n' | tr A-Z a-z | 
sort | uniq -c | sort -rn | sed ${1}q
Big Data: History repeats itself?

•

The original Google MapReduce paper (Dean et al.,
OSDI ’04) poses a problem disturbingly similar to
Bentley’s challenge nearly two decades prior:
Count of URL Access Frequency: The function processes
logs of web page requests and outputs ⟨URL, 1⟩. The
reduce function adds together all values for the same URL
and emits a ⟨URL, total count⟩ pair

•
•

But the solutions do not adhere to the Unix philosophy...

•

e.g., Appendix A of the OSDI ’04 paper has a 71 line
word count in C++ — with nary a wc in sight

...and nor do they make use of the substantial Unix
foundation for data processing
Big Data: Challenges

•

Must be able to scale storage to allow for “big data” —
quantities of data that dwarf a single machine

•
•
•

Must allow for massively parallel execution
Must allow for multi-tenancy
To make use of both the Unix philosophy and its toolset,
must be able to virtualize the operating system
Scaling storage

•

There are essentially three protocols for scalable
storage: block, file and object

•

Block (i.e., a SAN) is far too low an abstraction — and
notoriously expensive to scale

•

File (i.e., NAS) is too permissive an abstraction — it
implies a coherent store for arbitrary (partial) writes,
trying (and failing) to be both C and A in CAP

•

Object (e.g., S3) is similar “enough” to a file-based
abstraction, but by not allowing partial writes, allows for
proper CAP tradeoffs
Object storage

•
•

Object storage systems do not allow for partial updates

•

A different approach is to have a highly reliable local file
system that erasure encodes across local spindles —
with entire objects duplicated across nodes for
availability

•

ZFS pioneered both reliability and efficiency of this
model with RAID-Z — and has refined it over the past
decade of production use

•

ZFS is one of the four foundational technologies in
Joyent’s open source SmartOS

For both durability and availability, objects are generally
erasure encoded across spindles on different nodes
Virtualizing the operating system?

•

Historically — since the 1960s — systems have been
virtualized at the level of hardware

•

Hardware virtualization has its advantages, but it’s
heavyweight: operating systems are not designed to
share resources like DRAM, CPU, I/O devices, etc.

•

One can instead virtualize at the level of the operating
system: a single OS kernel that creates lightweight
containers — on the metal, but securely partitioned

•

Pioneered by BSD’s jails; taken to a logical extreme by
zones found in Joyent’s SmartOS
Idea: ZFS + Zones?

•

Can we combine the efficiency and reliability of ZFS
with the abstraction provided by zones to develop an
object store that has compute as a first-class citizen?

•

ZFS rollback allows for zones to be trashed — simply
rollback the zone after compute completes on an object

•

Add a job scheduling system that allows for both map
and reduce phases of distributed work

•

Would allow for the Unix toolset to be used on arbitrary
large amounts of data — unlocking big data one-liners

•

If it perhaps seems obvious now, it wasn’t at the time...
Idea: ZFS + Zones?
Manta: ZFS + Zones!

•

Building a sophisticated distributed system on top of
ZFS and zones, we have built Manta, an internet-facing
object storage system offering in situ compute

•

That is, the description of compute can be brought to
where objects reside instead of having to backhaul
objects to transient compute

•

The abstractions made available for computation are
anything that can run on the OS...

•

...and as a reminder, the OS — Unix — was built around
the notion of ad hoc unstructured data processing, and
allows for remarkably terse expressions of computation
Manta: Unix for Big Data

•

Manta allows for an arbitrarily scalable variant of
McIlroy’s solution to Bentley’s challenge:
mfind -t o /bcantrill/public/v7/usr/man | 
mjob create -o -m "tr -cs A-Za-z 'n' | 
tr A-Z a-z | sort | uniq -c" -r 
"awk '{ x[$2] += $1 }
END { for (w in x) { print x[w] " " w } }' | 
sort -rn | sed ${1}q"

•

This description not only terse, it is high performing: data
is left at rest — with the “map” phase doing heavy
reduction of the data stream

•

As such, Manta — like Unix — is not merely syntactic
sugar; it converges compute and data in a new way
Manta: CAP tradeoffs

•

Eventual consistency represents the wrong CAP
tradeoffs for most; we prefer consistency over
availability for writes (but still availability for reads)

•

Many more details:
http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/

•

Celebrity endorsement:
Manta: Other design principles

•

Hierarchical storage is an excellent idea (ht: Multics);
Manta implements proper directories, delimited with a
forward slash

•

Manta implements a snapshot/link hybrid dubbed a
snaplink; can be used to effect versioning

•
•

Manta has full support for CORS headers

•
•

Manta SDKs exist for node.js, Java, Ruby, Python

Manta uses SSH-based HTTP auth for client-side
tooling (IETF draft-cavage-http-signatures-00)

“npm install manta” for command line interface
Manta and the future of big data

•

We believe compute/data convergence to be the future
of big data: stores of record must support computation
as a first-class, in situ operation

•

We believe that Unix is a natural way of expressing this
computation — and that the OS is the right level at
which to virtualize to support this securely

•

We believe that ZFS is the only sane storage substrate
underpinning for such a system

•

Manta will surely not be the only system to represent the
confluence of these — but it is the first

•

We are actively retooling our software stack in terms of
Manta — Manta is changing the way we develop
software!
Manta: More information

•

Product page:
http://joyent.com/products/manta

•

node.js module:
https://github.com/joyent/node-manta

•

Manta documentation:
http://apidocs.joyent.com/manta/

•

IRC, e-mail, Twitter, etc.:
#manta on freenode, manta@joyent.com, @mcavage,
@dapsays, @yunongx, @joyent

•

Here’s to the orgy of big data one-liners!

Weitere ähnliche Inhalte

Was ist angesagt?

Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014Davidlohr Bueso
 
Zebra SRv6 CLI on Linux Dataplane (ENOG#49)
Zebra SRv6 CLI on Linux Dataplane (ENOG#49)Zebra SRv6 CLI on Linux Dataplane (ENOG#49)
Zebra SRv6 CLI on Linux Dataplane (ENOG#49)Kentaro Ebisawa
 
Serf / Consul 入門 ~仕事を楽しくしよう~
Serf / Consul 入門 ~仕事を楽しくしよう~Serf / Consul 入門 ~仕事を楽しくしよう~
Serf / Consul 入門 ~仕事を楽しくしよう~Masahito Zembutsu
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInDatabricks
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsHisaki Ohara
 
TRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch HaimTRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch Haimharryvanhaaren
 
Ethernetの受信処理
Ethernetの受信処理Ethernetの受信処理
Ethernetの受信処理Takuya ASADA
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOSAkihiro Suda
 
Open vSwitchソースコードの全体像
Open vSwitchソースコードの全体像 Open vSwitchソースコードの全体像
Open vSwitchソースコードの全体像 Sho Shimizu
 
冬のLock free祭り safe
冬のLock free祭り safe冬のLock free祭り safe
冬のLock free祭り safeKumazaki Hiroki
 
カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15
カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15
カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15Takaya Saeki
 
TC Flower Offload
TC Flower OffloadTC Flower Offload
TC Flower OffloadNetronome
 
Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)
Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)
Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)Takashi Takizawa
 
Xbyakの紹介とその周辺
Xbyakの紹介とその周辺Xbyakの紹介とその周辺
Xbyakの紹介とその周辺MITSUNARI Shigeo
 
ARM LinuxのMMUはわかりにくい
ARM LinuxのMMUはわかりにくいARM LinuxのMMUはわかりにくい
ARM LinuxのMMUはわかりにくいwata2ki
 
仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディングTakuya ASADA
 
TRex Realistic Traffic Generator - Stateless support
TRex  Realistic Traffic Generator  - Stateless support TRex  Realistic Traffic Generator  - Stateless support
TRex Realistic Traffic Generator - Stateless support Hanoch Haim
 

Was ist angesagt? (20)

Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014
 
Zebra SRv6 CLI on Linux Dataplane (ENOG#49)
Zebra SRv6 CLI on Linux Dataplane (ENOG#49)Zebra SRv6 CLI on Linux Dataplane (ENOG#49)
Zebra SRv6 CLI on Linux Dataplane (ENOG#49)
 
Serf / Consul 入門 ~仕事を楽しくしよう~
Serf / Consul 入門 ~仕事を楽しくしよう~Serf / Consul 入門 ~仕事を楽しくしよう~
Serf / Consul 入門 ~仕事を楽しくしよう~
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructions
 
TRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch HaimTRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch Haim
 
Ethernetの受信処理
Ethernetの受信処理Ethernetの受信処理
Ethernetの受信処理
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS
 
TripleO Deep Dive
TripleO Deep DiveTripleO Deep Dive
TripleO Deep Dive
 
Linux Namespaces
Linux NamespacesLinux Namespaces
Linux Namespaces
 
Open vSwitchソースコードの全体像
Open vSwitchソースコードの全体像 Open vSwitchソースコードの全体像
Open vSwitchソースコードの全体像
 
冬のLock free祭り safe
冬のLock free祭り safe冬のLock free祭り safe
冬のLock free祭り safe
 
カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15
カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15
カーネル空間ですべてのプロセスを動かすには -TAL, SFI, Wasmとか - カーネル/VM探検隊15
 
TC Flower Offload
TC Flower OffloadTC Flower Offload
TC Flower Offload
 
Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)
Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)
Unbound/NSD最新情報(OSC 2013 Tokyo/Spring)
 
Xbyakの紹介とその周辺
Xbyakの紹介とその周辺Xbyakの紹介とその周辺
Xbyakの紹介とその周辺
 
ARM LinuxのMMUはわかりにくい
ARM LinuxのMMUはわかりにくいARM LinuxのMMUはわかりにくい
ARM LinuxのMMUはわかりにくい
 
仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング
 
TRex Realistic Traffic Generator - Stateless support
TRex  Realistic Traffic Generator  - Stateless support TRex  Realistic Traffic Generator  - Stateless support
TRex Realistic Traffic Generator - Stateless support
 

Andere mochten auch

Corporate Open Source Anti-patterns
Corporate Open Source Anti-patternsCorporate Open Source Anti-patterns
Corporate Open Source Anti-patternsbcantrill
 
Linux principles and philosophy
Linux principles and philosophyLinux principles and philosophy
Linux principles and philosophyaliabintouq
 
Unix philosophy and principles
Unix philosophy and principlesUnix philosophy and principles
Unix philosophy and principlesmaryamalmarrii
 
Linux principles and philosophy
Linux principles and philosophyLinux principles and philosophy
Linux principles and philosophyFa6ma_
 
Mobile Knife Fighting at JSConf US
Mobile Knife Fighting at JSConf US Mobile Knife Fighting at JSConf US
Mobile Knife Fighting at JSConf US Brian LeRoux
 
Eduards Sizovs - Micro Service Architecture
Eduards Sizovs - Micro Service Architecture Eduards Sizovs - Micro Service Architecture
Eduards Sizovs - Micro Service Architecture DevConFu
 
Mocking Test - QA Ninja Conf 2016
Mocking Test - QA Ninja Conf 2016Mocking Test - QA Ninja Conf 2016
Mocking Test - QA Ninja Conf 2016Renato Groff
 
Writing Well-Behaved Unix Utilities
Writing Well-Behaved Unix UtilitiesWriting Well-Behaved Unix Utilities
Writing Well-Behaved Unix UtilitiesRob Miller
 
Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...
Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...
Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...Micha Kops
 
To Microservices and Beyond
To Microservices and BeyondTo Microservices and Beyond
To Microservices and BeyondMatt Stine
 
Mockito vs JMockit, battle of the mocking frameworks
Mockito vs JMockit, battle of the mocking frameworksMockito vs JMockit, battle of the mocking frameworks
Mockito vs JMockit, battle of the mocking frameworksEndranNL
 
Using Hystrix to Build Resilient Distributed Systems
Using Hystrix to Build Resilient Distributed SystemsUsing Hystrix to Build Resilient Distributed Systems
Using Hystrix to Build Resilient Distributed SystemsMatt Jacobs
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocatorbcantrill
 
Oral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsOral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsbcantrill
 
The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionbcantrill
 
(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...
(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...
(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...Amazon Web Services
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHenning Spjelkavik
 
Microservice Architecture
Microservice ArchitectureMicroservice Architecture
Microservice Architecturetyrantbrian
 

Andere mochten auch (20)

Corporate Open Source Anti-patterns
Corporate Open Source Anti-patternsCorporate Open Source Anti-patterns
Corporate Open Source Anti-patterns
 
Linux principles and philosophy
Linux principles and philosophyLinux principles and philosophy
Linux principles and philosophy
 
Unix philosophy and principles
Unix philosophy and principlesUnix philosophy and principles
Unix philosophy and principles
 
Linux principles and philosophy
Linux principles and philosophyLinux principles and philosophy
Linux principles and philosophy
 
Mobile Knife Fighting at JSConf US
Mobile Knife Fighting at JSConf US Mobile Knife Fighting at JSConf US
Mobile Knife Fighting at JSConf US
 
Eduards Sizovs - Micro Service Architecture
Eduards Sizovs - Micro Service Architecture Eduards Sizovs - Micro Service Architecture
Eduards Sizovs - Micro Service Architecture
 
Mocking Test - QA Ninja Conf 2016
Mocking Test - QA Ninja Conf 2016Mocking Test - QA Ninja Conf 2016
Mocking Test - QA Ninja Conf 2016
 
Unix Philosophy
Unix PhilosophyUnix Philosophy
Unix Philosophy
 
Writing Well-Behaved Unix Utilities
Writing Well-Behaved Unix UtilitiesWriting Well-Behaved Unix Utilities
Writing Well-Behaved Unix Utilities
 
Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...
Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...
Circuit breakers for Java: Failsafe, Javaslang-Circuitbreaker, Hystrix and Ve...
 
To Microservices and Beyond
To Microservices and BeyondTo Microservices and Beyond
To Microservices and Beyond
 
Mockito vs JMockit, battle of the mocking frameworks
Mockito vs JMockit, battle of the mocking frameworksMockito vs JMockit, battle of the mocking frameworks
Mockito vs JMockit, battle of the mocking frameworks
 
Using Hystrix to Build Resilient Distributed Systems
Using Hystrix to Build Resilient Distributed SystemsUsing Hystrix to Build Resilient Distributed Systems
Using Hystrix to Build Resilient Distributed Systems
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocator
 
Oral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsOral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generations
 
Mocking
MockingMocking
Mocking
 
The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destruction
 
(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...
(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...
(ARC317) Maintaining a Resilient Front Door at Massive Scale | AWS re:Invent ...
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.no
 
Microservice Architecture
Microservice ArchitectureMicroservice Architecture
Microservice Architecture
 

Ähnlich wie Bringing the Unix Philosophy to Big Data

Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Hakka Labs
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of databcantrill
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadebcantrill
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
Another history of the Web from its architecture
Another history of the Web from its architectureAnother history of the Web from its architecture
Another history of the Web from its architectureAlexandre Monnin
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing dataPhil Cryer
 
Web Information Systems Lecture 1: Introduction
Web Information Systems Lecture 1: IntroductionWeb Information Systems Lecture 1: Introduction
Web Information Systems Lecture 1: IntroductionKatrien Verbert
 
Introducing Plan9 from Bell Labs
Introducing Plan9 from Bell LabsIntroducing Plan9 from Bell Labs
Introducing Plan9 from Bell LabsAnant Narayanan
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systemsSri Prasanna
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardDocker, Inc.
 
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to ContainersThe Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containersbcantrill
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 

Ähnlich wie Bringing the Unix Philosophy to Big Data (20)

Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of data
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decade
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
Plan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIXPlan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIX
 
E3 chap-04
E3 chap-04E3 chap-04
E3 chap-04
 
E3 chap-04
E3 chap-04E3 chap-04
E3 chap-04
 
Another history of the Web from its architecture
Another history of the Web from its architectureAnother history of the Web from its architecture
Another history of the Web from its architecture
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing data
 
Web Information Systems Lecture 1: Introduction
Web Information Systems Lecture 1: IntroductionWeb Information Systems Lecture 1: Introduction
Web Information Systems Lecture 1: Introduction
 
Komputasi Awan
Komputasi AwanKomputasi Awan
Komputasi Awan
 
Introducing Plan9 from Bell Labs
Introducing Plan9 from Bell LabsIntroducing Plan9 from Bell Labs
Introducing Plan9 from Bell Labs
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
Microkernel Evolution
Microkernel EvolutionMicrokernel Evolution
Microkernel Evolution
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
 
PyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc AltedPyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc Alted
 
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to ContainersThe Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Computer cluster
Computer clusterComputer cluster
Computer cluster
 

Mehr von bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Presentbcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmakingbcantrill
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...bcantrill
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsbcantrill
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systemsbcantrill
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolutionbcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Agebcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesbcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Lawbcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineeringbcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarebcantrill
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the unionbcantrill
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsbcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after darkbcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadershipbcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathbcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindbcantrill
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in productionbcantrill
 

Mehr von bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in production
 

Kürzlich hochgeladen

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Bringing the Unix Philosophy to Big Data

  • 1. Bringing the Unix Philosophy to Big Data Bryan Cantrill SVP, Engineering bryan@joyent.com @bcantrill
  • 2. Unix • When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems • Instead of a sealed monolith, the operating system was a collection of small, easily understood programs • First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv) • Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs” — a castrated Multics We were a bit oppressed by the big system mentality. Ken wanted to do something simple. — Dennis Ritchie
  • 3. Unix: Let there be light • In 1969, Doug McIlroy had the idea of connecting different components: At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes • This was the primordial pipe, but it took three years to persuade Thompson to adopt it: And one day I came up with a syntax for the shell that went along with the piping, and Ken said, “I’m going to do it!”
  • 4. Unix: ...and there was light And the next morning we had this orgy of one-liners. — Doug McIlroy
  • 5. The Unix philosophy • The pipe — coupled with the small-system aesthetic — gave rise to the Unix philosophy, as articulated by Doug McIlroy: • • Write programs to work together • • Write programs that do one thing and do it well Write programs that handle text streams, because that is a universal interface Four decades later, this philosophy remains the single most important revolution in software systems thinking!
  • 6. Doug McIlroy v. Don Knuth: FIGHT! • In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history: Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. • Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm • Doug McIlroy’s solution shows the power of the Unix philosophy: tr -cs A-Za-z 'n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
  • 7. Big Data: History repeats itself? • The original Google MapReduce paper (Dean et al., OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior: Count of URL Access Frequency: The function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair • • But the solutions do not adhere to the Unix philosophy... • e.g., Appendix A of the OSDI ’04 paper has a 71 line word count in C++ — with nary a wc in sight ...and nor do they make use of the substantial Unix foundation for data processing
  • 8. Big Data: Challenges • Must be able to scale storage to allow for “big data” — quantities of data that dwarf a single machine • • • Must allow for massively parallel execution Must allow for multi-tenancy To make use of both the Unix philosophy and its toolset, must be able to virtualize the operating system
  • 9. Scaling storage • There are essentially three protocols for scalable storage: block, file and object • Block (i.e., a SAN) is far too low an abstraction — and notoriously expensive to scale • File (i.e., NAS) is too permissive an abstraction — it implies a coherent store for arbitrary (partial) writes, trying (and failing) to be both C and A in CAP • Object (e.g., S3) is similar “enough” to a file-based abstraction, but by not allowing partial writes, allows for proper CAP tradeoffs
  • 10. Object storage • • Object storage systems do not allow for partial updates • A different approach is to have a highly reliable local file system that erasure encodes across local spindles — with entire objects duplicated across nodes for availability • ZFS pioneered both reliability and efficiency of this model with RAID-Z — and has refined it over the past decade of production use • ZFS is one of the four foundational technologies in Joyent’s open source SmartOS For both durability and availability, objects are generally erasure encoded across spindles on different nodes
  • 11. Virtualizing the operating system? • Historically — since the 1960s — systems have been virtualized at the level of hardware • Hardware virtualization has its advantages, but it’s heavyweight: operating systems are not designed to share resources like DRAM, CPU, I/O devices, etc. • One can instead virtualize at the level of the operating system: a single OS kernel that creates lightweight containers — on the metal, but securely partitioned • Pioneered by BSD’s jails; taken to a logical extreme by zones found in Joyent’s SmartOS
  • 12. Idea: ZFS + Zones? • Can we combine the efficiency and reliability of ZFS with the abstraction provided by zones to develop an object store that has compute as a first-class citizen? • ZFS rollback allows for zones to be trashed — simply rollback the zone after compute completes on an object • Add a job scheduling system that allows for both map and reduce phases of distributed work • Would allow for the Unix toolset to be used on arbitrary large amounts of data — unlocking big data one-liners • If it perhaps seems obvious now, it wasn’t at the time...
  • 13. Idea: ZFS + Zones?
  • 14. Manta: ZFS + Zones! • Building a sophisticated distributed system on top of ZFS and zones, we have built Manta, an internet-facing object storage system offering in situ compute • That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute • The abstractions made available for computation are anything that can run on the OS... • ...and as a reminder, the OS — Unix — was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
  • 15. Manta: Unix for Big Data • Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge: mfind -t o /bcantrill/public/v7/usr/man | mjob create -o -m "tr -cs A-Za-z 'n' | tr A-Z a-z | sort | uniq -c" -r "awk '{ x[$2] += $1 } END { for (w in x) { print x[w] " " w } }' | sort -rn | sed ${1}q" • This description not only terse, it is high performing: data is left at rest — with the “map” phase doing heavy reduction of the data stream • As such, Manta — like Unix — is not merely syntactic sugar; it converges compute and data in a new way
  • 16. Manta: CAP tradeoffs • Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads) • Many more details: http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/ • Celebrity endorsement:
  • 17. Manta: Other design principles • Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash • Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning • • Manta has full support for CORS headers • • Manta SDKs exist for node.js, Java, Ruby, Python Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00) “npm install manta” for command line interface
  • 18. Manta and the future of big data • We believe compute/data convergence to be the future of big data: stores of record must support computation as a first-class, in situ operation • We believe that Unix is a natural way of expressing this computation — and that the OS is the right level at which to virtualize to support this securely • We believe that ZFS is the only sane storage substrate underpinning for such a system • Manta will surely not be the only system to represent the confluence of these — but it is the first • We are actively retooling our software stack in terms of Manta — Manta is changing the way we develop software!
  • 19. Manta: More information • Product page: http://joyent.com/products/manta • node.js module: https://github.com/joyent/node-manta • Manta documentation: http://apidocs.joyent.com/manta/ • IRC, e-mail, Twitter, etc.: #manta on freenode, manta@joyent.com, @mcavage, @dapsays, @yunongx, @joyent • Here’s to the orgy of big data one-liners!