SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Presented By
 Framework for running applications on large clusters of commodity
hardware
 Scale: petabytes of data on thousands of nodes
 Include
 Storage: HDFS
 Processing: MapReduce
 Support the Map/Reduce programming model
 Requirements
 Economy: use cluster of comodity computers
 Easy to use
 Users: no need to deal with the complexity of distributed computing
 Reliable: can handle node failures automatically
http://www.kellytechno.com
 Implemented in Java
 Apache Top Level Project
 http://hadoop.apache.org/core/
 Core (15 Committers)
 HDFS
 MapReduce
 Community of contributors is growing
 Though mostly Yahoo for HDFS and MapReduce
 You can contribute too!
http://www.kellytechno.com
 Commodity HW
 Add inexpensive servers
 Storage servers and their disks are not assumed to be highly reliable and
available
 Use replication across servers to deal with unreliable storage/servers
 Metadata-data separation - simple design
 Namenode maintains metadata
 Datanodes manage storage
 Slightly Restricted file semantics
 Focus is mostly sequential access
 Single writers
 No file locking features
 Support for moving computation close to data
 Servers have 2 purposes: data storage and computation
 Single ‘storage + compute’ cluster vs. Separate clusters
http://www.kellytechno.com
Data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Results
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Hadoop Cluster
DFS Block 1
DFS Block 1
DFS Block 2
DFS Block 2
DFS Block 2
DFS Block 1
DFS Block 3
DFS Block 3
DFS Block 3
MAP
MAP
MAP
Reduce
http://www.kellytechno.com
 Data is organized into files and directories
 Files are divided into uniform sized blocks and
distributed across cluster nodes
 Blocks are replicated to handle hardware
failure
 Filesystem keeps checksums of data for
corruption detection and recovery
 HDFS exposes block placement so that
computation can be migrated to data
http://www.kellytechno.com
NameNode(Filename, replicationFactor, block-ids, …)
/users/user1/data/part-0, r:2, {1,3}, …
/users/user1/data/part-1, r:3, {2,4,5}, …
Datanodes
1 1
3
3
2
2
2
4
4
4
55
5
http://www.kellytechno.com
 Master-Slave architecture
 DFS Master “Namenode”
 Manages the filesystem namespace
 Maintain file name to list blocks + location mapping
 Manages block allocation/replication
 Checkpoints namespace and journals namespace changes
for reliability
 Control access to namespace
 DFS Slaves “Datanodes” handle block storage
 Stores blocks using the underlying OS’s files
 Clients access the blocks directly from datanodes
 Periodically sends block reports to Namenode
 Periodically check block integrity
http://www.kellytechno.com
Read
Metadata ops
Client
Metadata (Name, #replicas, …):
/users/foo/data, 3, …
Namenode
Client
Datanodes
Rack 1 Rack 2
Replication
Block ops
Datanodes
Write
Blocks
http://www.kellytechno.com
 A file’s replication factor can be set per file (default 3)
 Block placement is rack aware
 Guarantee placement on two racks
 1st replica is on local node, 2rd/3rd replicas are on remote rack
 Avoid hot spots: balance I/O traffic
 Writes are pipelined to block replicas
 Minimize bandwidth usage
 Overlapping disk writes and network writes
 Reads are from the nearest replica
 Block under-replication & over-replication is detected by Namenode
 Balancer application rebalances blocks to balance DN utilization
http://www.kellytechno.com
 Scale cluster size
 Scale number of clients
 Scale namespace size (total number of files,
amount of data)
 Possible solutions
 Multiple namenodes
 Read-only secondary namenode
 Separate cluster management and namespace management
 Dynamic Partition namespace
 Mounting
http://www.kellytechno.com
 Map/Reduce is a programming model for efficient
distributed computing
 It works like a Unix pipeline:
 cat input | grep | sort
| uniq -c | cat > output
 Input | Map | Shuffle & Sort | Reduce |
Output
 A simple model but good for a lot of applications
 Log processing
 Web index building
http://www.kellytechno.com
http://www.kellytechno.com
 Mapper
 Input: value: lines of text of input
 Output: key: word, value: 1
 Reducer
 Input: key: word, value: set of counts
 Output: key: word, value: sum
 Launching program
 Defines the job
 Submits job to cluster
http://www.kellytechno.com
 Fine grained Map and Reduce tasks
 Improved load balancing
 Faster recovery from failed tasks
 Automatic re-execution on failure
 In a large cluster, some nodes are always slow or flaky
 Framework re-executes failed tasks
 Locality optimizations
 With large data, bandwidth to data is a problem
 Map-Reduce + HDFS is a very effective solution
 Map-Reduce queries HDFS for locations of input data
 Map tasks are scheduled close to the inputs when
possible
http://www.kellytechno.com
• Hadoop Wiki
– Introduction
• http://hadoop.apache.org/core/
– Getting Started
• http://wiki.apache.org/hadoop/GettingStartedWithHadoop
– Map/Reduce Overview
• http://wiki.apache.org/hadoop/HadoopMapReduce
– DFS
• http://hadoop.apache.org/core/docs/current/hdfs_design.html
• Javadoc
– http://hadoop.apache.org/core/docs/current/api/index.html
http://www.kellytechno.com
Thank you!
http://www.kellytechno.com
Presented
By

Weitere ähnliche Inhalte

Was ist angesagt? (18)

Unit 1
Unit 1Unit 1
Unit 1
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Snapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File SystemSnapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File System
 
Cppt
CpptCppt
Cppt
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
Sector Vs Hadoop
Sector Vs HadoopSector Vs Hadoop
Sector Vs Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 

Andere mochten auch

Guía ahorro energetico
Guía ahorro energeticoGuía ahorro energetico
Guía ahorro energeticoAMPAGoya
 
Importancia de la internet en la educacion
Importancia de la internet en la educacionImportancia de la internet en la educacion
Importancia de la internet en la educacionFernando Marmolejos Mesa
 
Thesis Presentation01
Thesis Presentation01Thesis Presentation01
Thesis Presentation01Oylum Boran
 
Trabajo grupal de fisica
Trabajo grupal de fisicaTrabajo grupal de fisica
Trabajo grupal de fisicamishel villegas
 
Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017AMPAGoya
 
Generation of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dtsGeneration of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dtsKamal Kaushik
 
Contes bojos presentacio power
Contes bojos presentacio powerContes bojos presentacio power
Contes bojos presentacio powerorcatac
 
Research Method for Business chapter 6
Research Method for Business chapter  6Research Method for Business chapter  6
Research Method for Business chapter 6Mazhar Poohlah
 
Virtual Reality & Marketing Part 3 of 3
Virtual Reality &  Marketing Part 3 of 3Virtual Reality &  Marketing Part 3 of 3
Virtual Reality & Marketing Part 3 of 3Jennifer Irene Guevara
 
Research Method for Business chapter 3
Research Method for Business chapter 3Research Method for Business chapter 3
Research Method for Business chapter 3Mazhar Poohlah
 
Overview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-caOverview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-caGlobalCompliancePanel
 

Andere mochten auch (20)

RewardTrax Overview
RewardTrax OverviewRewardTrax Overview
RewardTrax Overview
 
Bab1
Bab1Bab1
Bab1
 
CartaBAT
CartaBATCartaBAT
CartaBAT
 
Guía ahorro energetico
Guía ahorro energeticoGuía ahorro energetico
Guía ahorro energetico
 
Debere prezi
Debere preziDebere prezi
Debere prezi
 
Spain E Book
Spain E BookSpain E Book
Spain E Book
 
Importancia de la internet en la educacion
Importancia de la internet en la educacionImportancia de la internet en la educacion
Importancia de la internet en la educacion
 
Furacões
FuracõesFuracões
Furacões
 
Thesis Presentation01
Thesis Presentation01Thesis Presentation01
Thesis Presentation01
 
Trabajo grupal de fisica
Trabajo grupal de fisicaTrabajo grupal de fisica
Trabajo grupal de fisica
 
Protocolos
ProtocolosProtocolos
Protocolos
 
Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017
 
Yoga Meditation in 9 steps
Yoga Meditation in 9 stepsYoga Meditation in 9 steps
Yoga Meditation in 9 steps
 
Generation of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dtsGeneration of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dts
 
Contes bojos presentacio power
Contes bojos presentacio powerContes bojos presentacio power
Contes bojos presentacio power
 
Research Method for Business chapter 6
Research Method for Business chapter  6Research Method for Business chapter  6
Research Method for Business chapter 6
 
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
 
Virtual Reality & Marketing Part 3 of 3
Virtual Reality &  Marketing Part 3 of 3Virtual Reality &  Marketing Part 3 of 3
Virtual Reality & Marketing Part 3 of 3
 
Research Method for Business chapter 3
Research Method for Business chapter 3Research Method for Business chapter 3
Research Method for Business chapter 3
 
Overview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-caOverview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-ca
 

Ähnlich wie Hadoop training in bangalore-kellytechnologies

Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overviewharithakannan
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiUnmesh Baile
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiUnmesh Baile
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
assignment3
assignment3assignment3
assignment3Kirti J
 

Ähnlich wie Hadoop training in bangalore-kellytechnologies (20)

Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
HADOOP
HADOOPHADOOP
HADOOP
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
 
HadoopDB
HadoopDBHadoopDB
HadoopDB
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
assignment3
assignment3assignment3
assignment3
 

Kürzlich hochgeladen

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 

Kürzlich hochgeladen (20)

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 

Hadoop training in bangalore-kellytechnologies

  • 2.  Framework for running applications on large clusters of commodity hardware  Scale: petabytes of data on thousands of nodes  Include  Storage: HDFS  Processing: MapReduce  Support the Map/Reduce programming model  Requirements  Economy: use cluster of comodity computers  Easy to use  Users: no need to deal with the complexity of distributed computing  Reliable: can handle node failures automatically http://www.kellytechno.com
  • 3.  Implemented in Java  Apache Top Level Project  http://hadoop.apache.org/core/  Core (15 Committers)  HDFS  MapReduce  Community of contributors is growing  Though mostly Yahoo for HDFS and MapReduce  You can contribute too! http://www.kellytechno.com
  • 4.  Commodity HW  Add inexpensive servers  Storage servers and their disks are not assumed to be highly reliable and available  Use replication across servers to deal with unreliable storage/servers  Metadata-data separation - simple design  Namenode maintains metadata  Datanodes manage storage  Slightly Restricted file semantics  Focus is mostly sequential access  Single writers  No file locking features  Support for moving computation close to data  Servers have 2 purposes: data storage and computation  Single ‘storage + compute’ cluster vs. Separate clusters http://www.kellytechno.com
  • 5. Data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Results Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Hadoop Cluster DFS Block 1 DFS Block 1 DFS Block 2 DFS Block 2 DFS Block 2 DFS Block 1 DFS Block 3 DFS Block 3 DFS Block 3 MAP MAP MAP Reduce http://www.kellytechno.com
  • 6.  Data is organized into files and directories  Files are divided into uniform sized blocks and distributed across cluster nodes  Blocks are replicated to handle hardware failure  Filesystem keeps checksums of data for corruption detection and recovery  HDFS exposes block placement so that computation can be migrated to data http://www.kellytechno.com
  • 7. NameNode(Filename, replicationFactor, block-ids, …) /users/user1/data/part-0, r:2, {1,3}, … /users/user1/data/part-1, r:3, {2,4,5}, … Datanodes 1 1 3 3 2 2 2 4 4 4 55 5 http://www.kellytechno.com
  • 8.  Master-Slave architecture  DFS Master “Namenode”  Manages the filesystem namespace  Maintain file name to list blocks + location mapping  Manages block allocation/replication  Checkpoints namespace and journals namespace changes for reliability  Control access to namespace  DFS Slaves “Datanodes” handle block storage  Stores blocks using the underlying OS’s files  Clients access the blocks directly from datanodes  Periodically sends block reports to Namenode  Periodically check block integrity http://www.kellytechno.com
  • 9. Read Metadata ops Client Metadata (Name, #replicas, …): /users/foo/data, 3, … Namenode Client Datanodes Rack 1 Rack 2 Replication Block ops Datanodes Write Blocks http://www.kellytechno.com
  • 10.  A file’s replication factor can be set per file (default 3)  Block placement is rack aware  Guarantee placement on two racks  1st replica is on local node, 2rd/3rd replicas are on remote rack  Avoid hot spots: balance I/O traffic  Writes are pipelined to block replicas  Minimize bandwidth usage  Overlapping disk writes and network writes  Reads are from the nearest replica  Block under-replication & over-replication is detected by Namenode  Balancer application rebalances blocks to balance DN utilization http://www.kellytechno.com
  • 11.  Scale cluster size  Scale number of clients  Scale namespace size (total number of files, amount of data)  Possible solutions  Multiple namenodes  Read-only secondary namenode  Separate cluster management and namespace management  Dynamic Partition namespace  Mounting http://www.kellytechno.com
  • 12.  Map/Reduce is a programming model for efficient distributed computing  It works like a Unix pipeline:  cat input | grep | sort | uniq -c | cat > output  Input | Map | Shuffle & Sort | Reduce | Output  A simple model but good for a lot of applications  Log processing  Web index building http://www.kellytechno.com
  • 14.  Mapper  Input: value: lines of text of input  Output: key: word, value: 1  Reducer  Input: key: word, value: set of counts  Output: key: word, value: sum  Launching program  Defines the job  Submits job to cluster http://www.kellytechno.com
  • 15.  Fine grained Map and Reduce tasks  Improved load balancing  Faster recovery from failed tasks  Automatic re-execution on failure  In a large cluster, some nodes are always slow or flaky  Framework re-executes failed tasks  Locality optimizations  With large data, bandwidth to data is a problem  Map-Reduce + HDFS is a very effective solution  Map-Reduce queries HDFS for locations of input data  Map tasks are scheduled close to the inputs when possible http://www.kellytechno.com
  • 16. • Hadoop Wiki – Introduction • http://hadoop.apache.org/core/ – Getting Started • http://wiki.apache.org/hadoop/GettingStartedWithHadoop – Map/Reduce Overview • http://wiki.apache.org/hadoop/HadoopMapReduce – DFS • http://hadoop.apache.org/core/docs/current/hdfs_design.html • Javadoc – http://hadoop.apache.org/core/docs/current/api/index.html http://www.kellytechno.com

Hinweis der Redaktion

  1. 60M objects on 16G machine (e.g. 20M files with 2 blocks each)