SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Big Data and Hadoop

Presenter
Rajkumar Singh
http://rajkrrsingh.blogspot.com/
http://in.linkedin.com/in/rajkrrsingh
Big Data and Hadoop Introduction
Volume

Variety

Velocity

Facebook
Google Plus
Twitter
LinkedIn
Stock Exchange
Healthcare
Telecom

Structured,SemiStructured,unstructured

Facebook
Stock Exchange
Healthcare
Telecom
Mobile Devices
GPS
Security Infrastructure
The Problem

e.g. Stock Market
The Solution (Hadoop Evolution)
Traditional Approach
GB->TB->PB--ZB
so the processing with RDBMS is Impossible
Challenges In Big data
• Storage -- PB
• Processing – In a timely manner
• Variety of data -- S/SS/US
• Cost
To overcome Big Data Challenges
Hadoop evolves
• Cost Effective – Commodity HW
• Big Cluster – (1000 Nodes) --- Provides Storage n Processing
• Parallel Processing – Map reduce
• Big Storage – Memory per node * no of Nodes / RF
• Fail over mechanism – Automatic Failover
• Data Distribution
• Map Reduce Framework
• Moving Code to data
• Heterogeneous Hardware System (IBM,HP,AIX,Oracle Machine of
any memory and CPU configuration)
• Scalable
Typical Hadoop Infrastructure
What is Hadoop
•

Java Framework to Process erroneous amount of data

Hadoop Core
• HDFS
• Programming Construct (Map Reduce)
HDFS
Processing Framework (Mapreduce)
Hadoop Ecosystem
Hadoop Sub-Projects
• Hadoop Common: The common utilities that support the other Hadoop subprojects.
• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to
application data.
• Hadoop MapReduce: A software framework for distributed processing of large data sets on compute
clusters.
Other Hadoop-related projects at Apache include:
• Avro™: A data serialization system.
• Cassandra™: A scalable multi-master database with no single points of failure.
• Chukwa™: A data collection system for managing large distributed systems.
• HBase™: A scalable, distributed database that supports structured data storage for large tables.
• Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
• Mahout™: A Scalable machine learning and data mining library.
• Pig™: A high-level data-flow language and execution framework for parallel computation.
• ZooKeeper™: A high-performance coordination service for distributed applications. 
HDFS

250 GB

DFS

250 GB

1 TB File

250 GB

Based on GFS
250 GB
HDFS : Use Cases

• Very large file.
• Reading/Streaming Data Access.
Read data in large volume
Write once and Read frequent

• Expensive Hardware.
• Low latency Access.
• Lots of small files
• Parallel write/ Arbitrary Read
HDFS Building Blocks
Default Block Size
64MB
128MB

1GB file = 1024 MB/128 MB = 8 Blocks

For Small File Size
100 MB File < Block Size (128 MB) : Optimize for storage = 1 Block of
HDFS of size 100 MB
HDFS Daemon Services
• Name Node
• Secondary Name Node
• Data Node

GFS (Master/Slave Architecture)
HDFS Write
File 1: D1,D2,D4
File 2: D1,D2,D3

128 MB
RF = 3

D1

D1,D2,D4

D2

D3

D4
HDFS File System Commands
HDFS Federation
High Availability
Copying Data from one Cluster to another
Cluster

UAT Cluster

Prod Cluster

Parallel copying using distcp

hadoop distcp hdfs://uat:54311/user/rajkrrsingh/input hdfs://prod:54311/user/rajkrrsingh/input

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Uwe Printz
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentationArvind Kumar
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce introGeoff Hendrey
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Column Stores and Google BigQuery
Column Stores and Google BigQueryColumn Stores and Google BigQuery
Column Stores and Google BigQueryCsaba Toth
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 

Was ist angesagt? (20)

Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Hadoop
Hadoop Hadoop
Hadoop
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
 
Apache Hadoop at 10
Apache Hadoop at 10Apache Hadoop at 10
Apache Hadoop at 10
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Column Stores and Google BigQuery
Column Stores and Google BigQueryColumn Stores and Google BigQuery
Column Stores and Google BigQuery
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
Nextag talk
Nextag talkNextag talk
Nextag talk
 
Hadoop Ecosystem Overview
Hadoop Ecosystem OverviewHadoop Ecosystem Overview
Hadoop Ecosystem Overview
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 

Andere mochten auch

Managing Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom DomainManaging Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom DomainAM Publications
 
Harnessing Big Data in Real-Time
Harnessing Big Data in Real-TimeHarnessing Big Data in Real-Time
Harnessing Big Data in Real-TimeDataWorks Summit
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
Hadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryHadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryDataWorks Summit
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystemDataiku
 
The Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersThe Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersZohar Elkayam
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceNeev Technologies
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Cloudera, Inc.
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystemtfmailru
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersCleverence Kombe
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Uwe Printz
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Senthil Kumar
 
Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計Wei-Yu Chen
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-ReduceBrendan Tierney
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Edureka!
 

Andere mochten auch (20)

Managing Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom DomainManaging Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom Domain
 
Harnessing Big Data in Real-Time
Harnessing Big Data in Real-TimeHarnessing Big Data in Real-Time
Harnessing Big Data in Real-Time
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
Hadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryHadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom Industry
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
 
The Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersThe Hadoop Ecosystem for Developers
The Hadoop Ecosystem for Developers
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clusters
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
 
Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-Reduce
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
 

Ähnlich wie Big Data and Hadoop Introduction

Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoopyaevents
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvewKunal Khanna
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015 clairvoyantllc
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 

Ähnlich wie Big Data and Hadoop Introduction (20)

Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברג
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Bigdata
BigdataBigdata
Bigdata
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Big Data and Hadoop Training in Chandigarh
Big Data and Hadoop Training in ChandigarhBig Data and Hadoop Training in Chandigarh
Big Data and Hadoop Training in Chandigarh
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 

Kürzlich hochgeladen

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 

Kürzlich hochgeladen (20)

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 

Big Data and Hadoop Introduction

  • 1. Big Data and Hadoop Presenter Rajkumar Singh http://rajkrrsingh.blogspot.com/ http://in.linkedin.com/in/rajkrrsingh
  • 2. Big Data and Hadoop Introduction Volume Variety Velocity Facebook Google Plus Twitter LinkedIn Stock Exchange Healthcare Telecom Structured,SemiStructured,unstructured Facebook Stock Exchange Healthcare Telecom Mobile Devices GPS Security Infrastructure
  • 4. The Solution (Hadoop Evolution) Traditional Approach
  • 5. GB->TB->PB--ZB so the processing with RDBMS is Impossible
  • 6. Challenges In Big data • Storage -- PB • Processing – In a timely manner • Variety of data -- S/SS/US • Cost
  • 7. To overcome Big Data Challenges Hadoop evolves • Cost Effective – Commodity HW • Big Cluster – (1000 Nodes) --- Provides Storage n Processing • Parallel Processing – Map reduce • Big Storage – Memory per node * no of Nodes / RF • Fail over mechanism – Automatic Failover • Data Distribution • Map Reduce Framework • Moving Code to data • Heterogeneous Hardware System (IBM,HP,AIX,Oracle Machine of any memory and CPU configuration) • Scalable
  • 9. What is Hadoop • Java Framework to Process erroneous amount of data Hadoop Core • HDFS • Programming Construct (Map Reduce)
  • 10. HDFS
  • 13. Hadoop Sub-Projects • Hadoop Common: The common utilities that support the other Hadoop subprojects. • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. • Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters. Other Hadoop-related projects at Apache include: • Avro™: A data serialization system. • Cassandra™: A scalable multi-master database with no single points of failure. • Chukwa™: A data collection system for managing large distributed systems. • HBase™: A scalable, distributed database that supports structured data storage for large tables. • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. • Mahout™: A Scalable machine learning and data mining library. • Pig™: A high-level data-flow language and execution framework for parallel computation. • ZooKeeper™: A high-performance coordination service for distributed applications. 
  • 14. HDFS 250 GB DFS 250 GB 1 TB File 250 GB Based on GFS 250 GB
  • 15. HDFS : Use Cases • Very large file. • Reading/Streaming Data Access. Read data in large volume Write once and Read frequent • Expensive Hardware. • Low latency Access. • Lots of small files • Parallel write/ Arbitrary Read
  • 16. HDFS Building Blocks Default Block Size 64MB 128MB 1GB file = 1024 MB/128 MB = 8 Blocks For Small File Size 100 MB File < Block Size (128 MB) : Optimize for storage = 1 Block of HDFS of size 100 MB
  • 17. HDFS Daemon Services • Name Node • Secondary Name Node • Data Node GFS (Master/Slave Architecture)
  • 18. HDFS Write File 1: D1,D2,D4 File 2: D1,D2,D3 128 MB RF = 3 D1 D1,D2,D4 D2 D3 D4
  • 19.
  • 20.
  • 21.
  • 22. HDFS File System Commands
  • 23.
  • 24.
  • 27. Copying Data from one Cluster to another Cluster UAT Cluster Prod Cluster Parallel copying using distcp hadoop distcp hdfs://uat:54311/user/rajkrrsingh/input hdfs://prod:54311/user/rajkrrsingh/input