
Apache Hadoop at 10

A look back at Apache Hadoop's first 10 years, and a peek at things to come #Hadoop10

Published in: Software

  1. Apache Hadoop at 10 (+ The Next 10 Years)
  2. [Slide without text]
  3. A Decade of Hadoop History on One Slide
     Ten years ago, “Hadoop” referred to a scalable, fault-tolerant filesystem (HDFS) and a programming framework (MapReduce) for distributed computing. Today, it refers both to a kernel containing those pieces and to a constantly evolving ecosystem of 25+ data stores, execution engines, programming and data access frameworks, and other components.
     Recognize this guy?
  4. Fast Historical Facts
     • The code that eventually became Hadoop was written by Doug Cutting and Mike Cafarella, open source developers working in the search tech community, as part of the Nutch project.
     • The word “hadoop” originated with Cutting’s young son, who owned a plush toy elephant he gave that name.
     • Yahoo! was the first user of Hadoop in large-scale production, and Cutting did early work on Hadoop there.
     • Eventually, Cutting joined Cloudera as its chief architect and remains there to this day.
  5. The Original Inspirations for Hadoop (2003, 2004)
  6. Hadoop’s Original Architecture
     • MapReduce (Data Processing and Resource Management)
     • HDFS (Filesystem/Storage)
  7. Timeline (Abridged): The Invention Years (2002-2004)
     • Doug Cutting and Mike Cafarella create Nutch, an open source web crawler (October)
     • Google publishes its “Google File System” paper (October)
     • Cutting & Cafarella implement Nutch features that will become HDFS (June)
     • Google publishes its “MapReduce” paper (October)
  8. Timeline (Abridged): The Incubation Years (2005-2007)
     • Cafarella spearheads an implementation of MapReduce in Nutch (February)
     • Cutting joins Yahoo!; starts Hadoop subproject by carving code from Nutch (January)
     • Yahoo! creates its first Hadoop cluster for R&D (March)
     • Google publishes “Bigtable” paper, which eventually will inspire creation of HBase (November)
     • First Hadoop User Group meeting (in Palo Alto, CA) (October)
     • Community contributions begin to rise steeply
     • First Apache release of Hadoop (April)
  9. Timeline (Abridged): The Coming-Out Years (2008-2009)
     • Hadoop becomes a Top Level ASF project (January)
     • Initial publication of Hadoop: The Definitive Guide, by Tom White (June)
     • Cutting joins Cloudera as its chief architect (August)
     • Inaugural Hadoop World conference convenes in New York (October)
     • Yahoo! launches world’s largest Hadoop application (February)
     • Hive, Hadoop’s first SQL framework, becomes a Hadoop sub-project (June)
     • Cloudera, first company to commercialize Hadoop, is founded (August)
     • Initial Apache release of Pig (November)
  10. Timeline (Abridged): The Rapid Adoption Years (2010-11, 2012, 2013-14, 2015)
     • The extended Hadoop community busily builds out a plethora of new components (Crunch, Sqoop, Flume, Oozie, etc.) that extend Hadoop use cases and usability
     • HDFS NameNode HA, a significant new feature for enterprise adoption, merges into Hadoop trunk (March)
     • YARN, another important advance for adoption, becomes a Hadoop subproject (August)
     • Impala, the first native MPP query engine for Hadoop data, joins the ecosystem (October)
     • Spark, the emerging default execution engine for Hadoop, becomes a Top Level ASF project (February)
     • Kudu, the first native storage option for Hadoop since HBase, joins the ASF Incubator (as does Impala) (December)
  11. Evolution of the Hadoop Platform
     The stack is continually evolving and growing! Components added in each year (each year's stack also contains everything from earlier years):
     • 2006: Core Hadoop (HDFS, MapReduce)
     • 2007: Solr, Pig
     • 2008: HBase, ZooKeeper
     • 2009: Hive, Mahout
     • 2010: Sqoop, Avro
     • 2011: Flume, Bigtop, Oozie, HCatalog, Hue, YARN
     • 2012: Spark, Tez, Impala, Kafka, Drill
     • 2013: Parquet, Sentry
     • 2014: Knox, Flink
     • 2015: Kudu, RecordService, Ibis, Falcon
  12. Hadoop Architecture Today
  13. Why Did Hadoop Succeed?
     1. Open source community and license: A large and diverse community of developers has historically made, and continues to make, the Hadoop ecosystem among the most active and engaged in history, while the Apache License lowers the barrier to entry for users.
     2. Extensibility/adaptability: With the possible exception of Linux, no other complex platform has evolved on so many levels, and so quickly, to meet user requirements over time.
     3. A strong focus on systems: The roots of Hadoop are in making distributed computing infrastructure more accessible to application developers. That focus continues to bear fruit in areas like resource management and security.
  14. Hadoop’s Next 10 Years
     • Interest in public-cloud deployments is driving native support for them into the platform.
     • Rapid hardware advances are forcing the community to rethink Hadoop’s foundations.
     • Data sources are becoming more numerous, distributed, and diverse (IoT), and Hadoop will adapt.
  15. The Use Case Only Gets Stronger
     “Much of the progress we will make in this century will come from increased understanding of the data we generate.” - Doug Cutting
  16. Learn More: cloudera.com/hadoop10
