SlideShare verwendet Cookies, um die Funktionalität und Leistungsfähigkeit der Webseite zu verbessern und Ihnen relevante Werbung bereitzustellen. Wenn Sie diese Webseite weiter besuchen, erklären Sie sich mit der Verwendung von Cookies auf dieser Seite einverstanden. Lesen Sie bitte unsere Nutzervereinbarung und die Datenschutzrichtlinie.
SlideShare verwendet Cookies, um die Funktionalität und Leistungsfähigkeit der Webseite zu verbessern und Ihnen relevante Werbung bereitzustellen. Wenn Sie diese Webseite weiter besuchen, erklären Sie sich mit der Verwendung von Cookies auf dieser Seite einverstanden. Lesen Sie bitte unsere unsere Datenschutzrichtlinie und die Nutzervereinbarung.
Kick off the presentation by playing the Doug Cutting “Hadoop 10” video. https://www.youtube.com/watch?v=XHz_R33QnsI
In the beginning, the word “Hadoop” referred to just two components. Fast forward a decade, and that word now refers to that “kernel” (aka Core Hadoop) as well as to a growing ecosystem of related projects. In that sense, Hadoop now has much in common with Linux, which is also both a kernel and an ecosystem.
These are the very high-level historical facts about Hadoop. The timeline to follow contains much more detail.
As a very basic explanation, Hadoop was originally an open source implementation of internal systems built by Google in the early ‘00s to deal with the extraordinarily resource-intensive problem of indexing the Internet every night. Those systems were first described in these papers, and Cutting and Cafarella, who faced similar problems with Nutch, took notice of them quickly. (Later, Google also published its “Bigtable” paper, which led other developers to create HBase.) As Cutting puts it, periodically, “Google sends us messages from the future.”
Cutting & Cafarella’s initial implementation of these systems consisted of just 2 components: MapReduce and HDFS.
This timeline is abridged for brevity, but it contains some major milestones.
The rapid expansion of the Hadoop ecosystem is further evidence of its meteoric adoption.
With the expansion of that ecosystem, “Hadoop” has grown much, much bigger than its original “core.”
Even with this history, one has to ask: Why did Hadoop succeed?
What does the future hold for Hadoop? There are many possible permutations, but these are just a couple of the obvious influences going forward.
Regardless of what Hadoop looks like in 10 or 20 years, it’s indisputable that the use cases for it will grow stronger as data volume, variety, and velocity expand. There will be no clearer driver of progress than the ability to translate raw data into actionable insight.
A Decade of Hadoop History on One Slide
Ten years ago, “Hadoop” referred to a scalable, fault-tolerant
filesystem (HDFS) and programming framework (MapReduce)
for distributed computing.
Today, it refers to both a kernel containing the aforementioned
pieces, as well as a constantly evolving ecosystem of 25+ data
stores, execution engines, programming and data access
frameworks, and other componentry.
Recognize this guy?
Fast Historical Facts
• The code that eventually became Hadoop was written by
Doug Cutting and Mike Cafarella, open source developers
working in the search tech community, as part of the
• The word “hadoop” originated with Cutting’s young son,
who owned a plush toy elephant he gave that name.
• Yahoo! was the first user of Hadoop in large-scale
production, and Cutting did early work on Hadoop there.
• Eventually, Cutting joined Cloudera as its chief architect
and remains there to this day.
The Original Inspirations for Hadoop
Hadoop’s Original Architecture
(Data Processing and Resource Management)
Doug Cutting and Mike
Cafarella create Nutch, an
open source web crawler
Google publishes its
“Google File System” paper
(October) Cutting & Cafarella
implement Nutch features
that will become HDFS
Google publishes its
Timeline (Abridged): The Invention Years
Cafarella spearheads an
MapReduce in Nutch
Cutting joins Yahoo!;
starts Hadoop subproject by
carving code from Nutch
Yahoo! creates its first Hadoop
cluster for R&D
Google publishes “Bigtable”
paper, which eventually will
inspire creation of HBase
First Hadoop User Group
meeting (in Palo Alto, CA)
Community contributions begin
to rise steeply
First Apache release of Hadoop
Timeline (Abridged): The Incubation Years
Hadoop becomes a Top Level
(January) Initial publication of Hadoop:
The Definitive Guide, by Tom
Cutting joins Cloudera as its
Inaugural Hadoop World
convenes in New York
Yahoo! launches world’s
largest Hadoop application
Hive, Hadoop’s first SQL
framework, becomes a
Cloudera, first company to
commercialize Hadoop, is
Initial Apache release of Pig
Timeline (Abridged): The Coming-Out Years
The extended Hadoop
community busily builds out
a plethora of new
components (Crunch, Sqoop,
Flume, Oozie, etc) that
extend Hadoop use cases and
HDFS NameNode HA, a
significant new feature for
enterprise adoption, merges
into Hadoop trunk
YARN, another important
advance for adoption,
becomes a Hadoop subproject
Impala, the first native MPP
query engine for Hadoop data,
joins the ecosystem
Spark, the emerging default
execution engine for Hadoop,
becomes a Top Level ASF
Kudu, the first native storage
option for Hadoop since
HBase, joins the ASF Incubator
(as does Impala)
Timeline (Abridged): The Rapid Adoption Years
Why Did Hadoop Succeed?
1. Open source community and license
A large and diverse community of developers has historically made, and continues to
make, the Hadoop ecosystem among the most active and engaged in history, while
the Apache License lowers the barrier to entry for users.
With the possible exception of Linux, no other complex platform has evolved on so
many levels, and so quickly, to meet user requirements over time.
3. A strong focus on systems
The roots of Hadoop are in making distributed computing infrastructure more
accessible by application developers. That continuing focus continues to bear fruit in
areas like resource management and security.
Hadoop’s Next 10 Years
Interest in public-cloud
deployments are driving
native support for them
into the platform.
advances are forcing the
community to re-think
Data sources are more
and diverse (IoT), and
Hadoop will adapt.
The Use Case Only Gets Stronger
Much of the progress we will make in this century
will come from increased understanding of the data
- Doug Cutting