Hadoop Introduction
   Background && Installation && Hello world && related
Outline

•   Background
•   Hello world
•   Installation
•   Related




12/20/12           2
Background
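• Why Hadoop?
   • Accessible: runs on clusters of commodity machines or on cloud services such as AWS
   • Robust: designed to gracefully handle most hardware failures
   • Scalable: scales linearly by adding nodes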
   • Simple: the same program runs on 1 machine or on 10,000 ("1w") machines
• Key Points:
   • Scale-out
   • Moving code to data

Background: History
• Apache top-level project, created by Doug Cutting
• Lucene -> Nutch -> Hadoop (2004)
   • Yahoo (1w, i.e. 10,000)
   • Facebook (Hive, HBase, …)
   • Hulu (HBase)
   • Baidu (3000 TB in one week)
   • Twitter (tweet data)


Background
• Comparing a SQL database and Hadoop
   • Structure:
      • SQL: structured data with a fixed schema
      • Hadoop: key-value pairs, suited to text, pictures, and other unstructured data
   • Scale-out instead of scale-up
   • Key-value pairs instead of relational tables
   • Functional programming instead of declarative queries
   • Offline batch processing (write once, read many times) instead of online
     transactions
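The "functional programming instead of declarative queries" contrast can be made concrete with a small Python sketch that counts the same words twice: once declaratively, with a SQL `GROUP BY` run in the standard-library `sqlite3` in-memory database (a stand-in for a real SQL database), and once functionally, as explicit map, group-by-key, and reduce steps over key-value pairs. The function names and the sample sentence are hypothetical.

```python
import sqlite3
from functools import reduce
from itertools import groupby


def count_with_sql(words):
    """Declarative: describe the result and let the database plan the work."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
    rows = conn.execute(
        "SELECT word, COUNT(*) FROM words GROUP BY word ORDER BY word"
    ).fetchall()
    conn.close()
    return dict(rows)


def count_with_map_reduce(words):
    """Functional: map to (key, value) pairs, group by key, then reduce."""
    pairs = map(lambda w: (w, 1), words)                    # map: word -> (word, 1)
    grouped = groupby(sorted(pairs), key=lambda kv: kv[0])  # sort so equal keys are adjacent
    return {
        key: reduce(lambda acc, kv: acc + kv[1], group, 0)  # reduce: sum the 1s per key
        for key, group in grouped
    }


words = "the quick fox jumps over the lazy dog the end".split()
print(count_with_sql(words) == count_with_map_reduce(words))  # True
```

Both return the same counts; the SQL version leaves the execution plan to the database, while the key-value version spells out each step, which is what allows those steps to be spread across machines.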
Background – Understanding
• Word Count
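     • As the files grow, an in-memory hash table runs out of memory
     • A disk-based hash table works, but is more complex
     • Distributed:
         • Phase 1: each machine processes its own part of the input
         • Phase 2: merge the partial results
            • Shuffle each partition to the appropriate machine (e.g. by alphabetical range of the word)

     • At this point we have built a minimal Hadoop.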
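The two-phase idea can be simulated on one machine. In this sketch, each "machine" is just one text chunk, and words are routed to two hypothetical reducers by alphabetical range (a to m, then everything else), which is one reading of the slide's "AlphaBeta" partitioning. Phase 1 counts words per chunk, the shuffle sends each word to the reducer that owns its range, and phase 2 merges the partial counts. All function names are illustrative.

```python
from collections import Counter


def phase1_part_processing(chunk):
    """Phase 1: each machine counts the words in its own part of the input."""
    return Counter(chunk.lower().split())


def partition(word):
    """Hypothetical alphabetical partitioner: words up to 'm' -> 0, the rest -> 1."""
    return 0 if word[0] <= "m" else 1


def shuffle(partial_counts):
    """Send every (word, count) pair to the reducer that owns its range."""
    buckets = [[], []]
    for counts in partial_counts:
        for word, n in counts.items():
            buckets[partition(word)].append((word, n))
    return buckets


def phase2_merge(bucket):
    """Phase 2: each reducer merges the partial counts it received."""
    merged = Counter()
    for word, n in bucket:
        merged[word] += n
    return merged


chunks = ["Hello Hadoop hello world", "world of map and reduce", "hello again"]
partials = [phase1_part_processing(c) for c in chunks]
results = [phase2_merge(b) for b in shuffle(partials)]
print(results[0]["hello"])  # 3
```

Because every occurrence of a word lands on the same reducer, no reducer needs to see the whole dataset, which is the property that avoids the single-machine memory limit described above.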



Hello World: Word Count
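• Two phases:
     • Mapping: takes the input data and loads it into the mappers
     • Reducing: processes all output from the mappers and produces the final result

•   1.1    map input:     list(filename, file content)
•   1.2    map output:    list(word, 1)
•   2.1    reduce input:  list(word, list(1))
•   2.2    reduce output: list(word, count)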
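Steps 1.1 to 2.2 can be traced with plain Python. This is an in-process sketch of the data flow, not Hadoop's API: `map_fn`, `shuffle`, and `reduce_fn` are hypothetical names, and each intermediate variable has the shape listed in the corresponding step.

```python
from collections import defaultdict


def map_fn(filename, content):
    """1.1 -> 1.2: (filename, file content) becomes a list of (word, 1)."""
    return [(word, 1) for word in content.split()]


def shuffle(pairs):
    """1.2 -> 2.1: collect the values of every key into (word, [1, 1, ...])."""
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return sorted(groups.items())


def reduce_fn(word, ones):
    """2.1 -> 2.2: (word, [1, 1, ...]) becomes (word, count)."""
    return (word, sum(ones))


files = [("a.txt", "hello world"), ("b.txt", "hello hadoop")]             # 1.1
mapped = [pair for name, text in files for pair in map_fn(name, text)]  # 1.2
grouped = shuffle(mapped)                                               # 2.1
counts = [reduce_fn(word, ones) for word, ones in grouped]              # 2.2
print(counts)  # [('hadoop', 1), ('hello', 2), ('world', 1)]
```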



Hello World
• mapper.py
• reducer.py
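The code of the original mapper.py and reducer.py is not preserved here. The sketch below shows what a Hadoop Streaming word count of this kind typically looks like, with both roles in one hypothetical file, `wordcount.py`; the two functions can equally be split into separate mapper.py and reducer.py scripts. It assumes Streaming's default text protocol: one `key<TAB>value` record per line, with reducer input already sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """mapper.py role: emit 'word<TAB>1' for every word read from stdin."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """reducer.py role: input is sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical dispatch: `python wordcount.py map` or `python wordcount.py reduce`.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, the Streaming data flow can be mimicked with a pipe, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` plays the role of Hadoop's shuffle.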




Installation
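• Mode:
   • Standalone (local) mode (default)
   • Pseudo-distributed mode: recommended for development and debugging
   • Fully distributed mode
• Configuration:
   • Basic configuration
   • SSH configuration
   • Ubuntu configuration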
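For pseudo-distributed mode, Hadoop 1.x (the JobTracker/TaskTracker generation described in this deck) is commonly configured through three XML files in its conf/ directory. This sketch generates minimal versions of them with the standard library. The property names `fs.default.name`, `dfs.replication`, and `mapred.job.tracker` are the Hadoop 1.x ones; the localhost ports 9000 and 9001 and the target directory are assumptions to adapt. Passwordless SSH to localhost is still required for the start scripts.

```python
import os
import xml.etree.ElementTree as ET

# Minimal pseudo-distributed settings (Hadoop 1.x property names).
# Host and port values are assumptions; adjust them for your machine.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # only one DataNode, so one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def build_configuration(properties):
    """Return a <configuration> element with one <property> per name/value pair."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def write_conf(conf_dir):
    """Write the three site files into conf_dir (e.g. $HADOOP_HOME/conf)."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, properties in PSEUDO_DISTRIBUTED.items():
        tree = ET.ElementTree(build_configuration(properties))
        tree.write(os.path.join(conf_dir, filename), encoding="utf-8", xml_declaration=True)
```

After writing the files, HDFS is formatted once with `bin/hadoop namenode -format` and the daemons are started with `bin/start-all.sh`.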

Hadoop Framework
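• HDFS:
   • NameNode: tracks, directs, and records (file-system metadata and block locations)
   • DataNode: low-level I/O operations (reading and writing HDFS blocks)
   • Secondary NameNode
• MapReduce:
   • JobTracker
   • TaskTracker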
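The division of labor between the NameNode and the DataNodes can be illustrated with a toy model: the NameNode stores only metadata (which blocks make up a file and which DataNodes hold each replica), while the DataNodes would hold the actual bytes. The 64 MB block size and replication factor 3 are the classic Hadoop 1.x defaults; the round-robin placement is a deliberate simplification (real HDFS placement is rack-aware), and the class and method names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # classic HDFS default block size (Hadoop 1.x)
REPLICATION = 3                # default replication factor


@dataclass
class ToyNameNode:
    """Keeps metadata only: file -> blocks, block -> DataNodes holding a replica."""
    datanodes: list
    file_blocks: dict = field(default_factory=dict)
    block_locations: dict = field(default_factory=dict)
    _next_id: itertools.count = field(default_factory=itertools.count)

    def add_file(self, path, size_bytes):
        num_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division
        ring = itertools.cycle(self.datanodes)
        replicas = min(REPLICATION, len(self.datanodes))
        blocks = []
        for _ in range(num_blocks):
            block_id = f"blk_{next(self._next_id)}"
            # Simplified round-robin placement; consecutive nodes are distinct.
            self.block_locations[block_id] = [next(ring) for _ in range(replicas)]
            blocks.append(block_id)
        self.file_blocks[path] = blocks
        return blocks

    def locate(self, path):
        """What a client asks the NameNode for before reading from DataNodes."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]
```

For example, a 200 MB file on four DataNodes is split into 4 blocks, each recorded with 3 distinct DataNodes; a client would then read the bytes directly from those DataNodes.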


Related
• Programming:
   • Java
   • Python
      • Jython (compiles Python to Java bytecode)
      • Hadoop Streaming (mapper and reducer communicate via stdin/stdout)
      • Dumbo
      • Happy
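With Hadoop Streaming, the mapper and reducer are ordinary executables that read stdin and write stdout, and the job is submitted through the streaming jar. The sketch below only assembles the command line. The jar location is an assumption (Hadoop 1.x releases ship it under contrib/streaming/ with a version-specific file name), as are the HDFS paths; `-input`, `-output`, `-mapper`, `-reducer`, and `-file` are standard Streaming options. Scripts passed as `-mapper mapper.py` need a `#!/usr/bin/env python` line and execute permission.

```python
import shlex
import subprocess


def build_streaming_cmd(streaming_jar, input_path, output_path,
                        mapper="mapper.py", reducer="reducer.py"):
    """Assemble a `hadoop jar` command for a Streaming job (nothing runs yet)."""
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_path,    # HDFS input file or directory
        "-output", output_path,  # must not exist yet, or the job fails
        "-mapper", mapper,
        "-reducer", reducer,
        "-file", mapper,         # ship the scripts to the task nodes
        "-file", reducer,
    ]


def run_streaming_job(cmd):
    """Submit the job; requires a working `hadoop` command on PATH."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```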


Related
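•   Pig: high-level data-flow language
•   Hive: data warehouse with a SQL interface
•   HBase: column-oriented database modeled on Google BigTable
•   ZooKeeper: coordination service for shared state
•   Chukwa: data collection system
•   Mahout: data mining and machine learning
•   Hama: matrix computation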


Resource
• Book:
   • Hadoop in Action
   • Hadoop 实战, 2nd edition (Chinese)
• Video && Google Course
• URL:
   • Bookmarked resources




thanks



