SlideShare ist ein Scribd-Unternehmen logo
1 von 19
I wonder what BigData, Hadoop and
MapReduce is all about 
How Big is BigBigBigData?
Facebook handles 180 PB of data in a year
Twitter handles 1.2 million tweets per second
2.3 Trillion GB everyday ; 40 Trillion GB by 2020
Volume} 
What’s wrong with data?
Log files  INFO: 192.12.135.34 /products/raspberry
JSON , XML , TXT , Videos , MP3s and your guess is as good
as mine.
The point is there is simply too much beauty in this world.
Or
Variety} 
in
Data
We will need Time machines!
 Data is getting generated at a faster pace than what we
can process.
 Think of a SQL query with complex join operations across
multiple tables each having millions of records.
 Elegant Solution : build a time machine ; Go back in
time and do data processing.
Velocity} 
Why can’t we just chuck this data away?
Francis
Bacon
said
something
wise.
DATA = ADVANTAGE
Think of Flipkart/Amazon/Snapdeal
 Competitors
 The company which understands
the need of user better will
eventually dominate.
 The more analysis they perform on
their data logs , better their
understanding on user.
 ML, IoT and all new technologies of
future would need BigData support.
Hadoop Hadoop Hadoop!
 Top level project under Apache Software Foundation (Open Source)
 Created based on Google research papers on how Google processed
their large volume of data.
Two Questions
Q1 ) Where does all our Big Big data goes and gets
stored ?
Ans . Hadoop Distributed File System (HDFS)
Q2) How do we process all this Big Big data ?
Ans. MapReduce
Q3) What is the key idea behind big data
processing?
Ans. Next Slide
Key Ideas
• Step 1: HDFS  Our large data gets stored as small chunks of data in many
machines (nodes) in a cluster.
• Step 2 : Data Local Computing  We process this small chunk of data
simultaneously at the nodes in which data is present.
• Step 3 : Aggregates the results from Step2 across various nodes and save
the money for building time machines.
HDFS
Replication Factor = 3
Cluster size = 5
Map Reduce
Is a programming paradigm (FP)
Mapper : The part of your program that
a) reads the file from HDFS ,
b) performs filtering , splitting and cleansing of data
c) Emits the data as key, value pair to reducer
Reducer : The part of your program that
a) performs aggregation or summary operation
b) the place where our program logic usually resides
c) Writes the output to HDFS
“Output from the Mapper is the input to Reducer”
A little more on MR
Mapping
• Mapper
• Each node applies the mapper function to the small chunk of
data that it holds simultaneously and produces key, value pairs.
Shuffle and
Sort
• Hadoop internal
• Nodes redistribute data based on the output from Mapper such
that all data belonging to one key is located on the same node.
Reducing
• Reducer
• Nodes now process each group of output data, per key, in
parallel
Word Count
Where does Python come in ?
Hadoop is written mainly in Java.
We can run mapreduce jobs via Hadoop Streaming (HS).
Using python for HS is quite easy.
We can do almost all the things with HS that normal Hadoop execution
provides.
There are going to be 2 parts to our mapreduce program.
Any guesses ?
Mapper.py
Reducer.py
How to get Started ?
Udacity : Intro to Hadoop and Map Reduce
MapR Free training lessons
Yahoo Hadoop Tutorials
BigData University
Hortonworks Webinars
Book : The Definitive Guide to Hadoop
Into to HDFS ; Youtube video
Tip : use the Cloudera VM provided by Udacity. It is light weight.
Hadoop EcoSystem
Thank You 

Weitere ähnliche Inhalte

Was ist angesagt?

Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampSpotle.ai
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detikk4ndar
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystemDataiku
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Bigdata & Hadoop
Bigdata & HadoopBigdata & Hadoop
Bigdata & HadoopPinto Das
 
Introduction to Big Data and hadoop
Introduction to Big Data and hadoopIntroduction to Big Data and hadoop
Introduction to Big Data and hadoopSandeep Patil
 
Introduction to Hadoop and Big-Data
Introduction to Hadoop and Big-DataIntroduction to Hadoop and Big-Data
Introduction to Hadoop and Big-DataRamsay Key
 
New Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile AgentNew Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile AgentMohammed Adam
 
ESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceDan Pilone
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
WisdomEye Technologies
WisdomEye TechnologiesWisdomEye Technologies
WisdomEye TechnologiesAshish Jha
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPTAnand Pandey
 

Was ist angesagt? (20)

Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Bigdata & Hadoop
Bigdata & HadoopBigdata & Hadoop
Bigdata & Hadoop
 
Introduction to Big Data and hadoop
Introduction to Big Data and hadoopIntroduction to Big Data and hadoop
Introduction to Big Data and hadoop
 
Introduction to Hadoop and Big-Data
Introduction to Hadoop and Big-DataIntroduction to Hadoop and Big-Data
Introduction to Hadoop and Big-Data
 
New Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile AgentNew Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile Agent
 
ESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of Convenience
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Big data
Big dataBig data
Big data
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
WisdomEye Technologies
WisdomEye TechnologiesWisdomEye Technologies
WisdomEye Technologies
 
Hadoop_Presentation
Hadoop_PresentationHadoop_Presentation
Hadoop_Presentation
 
Big Data
Big DataBig Data
Big Data
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
 
Hadoop
HadoopHadoop
Hadoop
 

Andere mochten auch

Hadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGHadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGPradeep MG
 
Pyshark in Network Packet analysis
Pyshark in Network Packet analysisPyshark in Network Packet analysis
Pyshark in Network Packet analysisRengaraj D
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
An Introduction to MapReduce
An Introduction to MapReduceAn Introduction to MapReduce
An Introduction to MapReduceFrane Bandov
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsSkillspeed
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 

Andere mochten auch (14)

Hadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGHadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MG
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
Pyshark in Network Packet analysis
Pyshark in Network Packet analysisPyshark in Network Packet analysis
Pyshark in Network Packet analysis
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
An Introduction to MapReduce
An Introduction to MapReduceAn Introduction to MapReduce
An Introduction to MapReduce
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Ähnlich wie Intro to BigData , Hadoop and Mapreduce

Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopVictoria López
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopStefano Paluello
 
Python in big data world
Python in big data worldPython in big data world
Python in big data worldRohit
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniquesijsrd.com
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET Journal
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
A brief history of "big data"
A brief history of "big data"A brief history of "big data"
A brief history of "big data"Nicola Ferraro
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigDataThanusha154
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoopdbpublications
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceCsaba Toth
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerDenny Lee
 

Ähnlich wie Intro to BigData , Hadoop and Mapreduce (20)

Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoop
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
 
Python in big data world
Python in big data worldPython in big data world
Python in big data world
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Big Data and Hadoop - An Introduction
Big Data and Hadoop - An IntroductionBig Data and Hadoop - An Introduction
Big Data and Hadoop - An Introduction
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
A brief history of "big data"
A brief history of "big data"A brief history of "big data"
A brief history of "big data"
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop Primer
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 

Mehr von Krishna Sangeeth KS (17)

Bringing back Vangogh
Bringing back VangoghBringing back Vangogh
Bringing back Vangogh
 
All things py
All things pyAll things py
All things py
 
Up and running with pyspark
Up and running with pysparkUp and running with pyspark
Up and running with pyspark
 
Solving graph problems using networkX
Solving graph problems using networkXSolving graph problems using networkX
Solving graph problems using networkX
 
Lvc
LvcLvc
Lvc
 
Round 1 gnosis
Round 1 gnosisRound 1 gnosis
Round 1 gnosis
 
Lvc
LvcLvc
Lvc
 
Anti Counterfeit System using QR codes and Various other applications
Anti Counterfeit System using QR codes and Various other applicationsAnti Counterfeit System using QR codes and Various other applications
Anti Counterfeit System using QR codes and Various other applications
 
Automatic web monitoring & retrieval system
Automatic web monitoring & retrieval systemAutomatic web monitoring & retrieval system
Automatic web monitoring & retrieval system
 
Written round
Written roundWritten round
Written round
 
Visual connect
Visual connectVisual connect
Visual connect
 
Supertheme
SuperthemeSupertheme
Supertheme
 
Gnosis quiz 2k10 dry1
Gnosis quiz 2k10 dry1Gnosis quiz 2k10 dry1
Gnosis quiz 2k10 dry1
 
Dry 2
Dry 2Dry 2
Dry 2
 
Choice round
Choice roundChoice round
Choice round
 
Gnosis quiz 2k10 dry1
Gnosis quiz 2k10 dry1Gnosis quiz 2k10 dry1
Gnosis quiz 2k10 dry1
 
Gnosis Quiz 2k10 Prelims
Gnosis Quiz  2k10 PrelimsGnosis Quiz  2k10 Prelims
Gnosis Quiz 2k10 Prelims
 

Kürzlich hochgeladen

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Kürzlich hochgeladen (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Intro to BigData , Hadoop and Mapreduce

  • 1. I wonder what BigData, Hadoop and MapReduce is all about 
  • 2. How Big is BigBigBigData? Facebook handles 180 PB of data in a year Twitter handles 1.2 million tweets per second 2.3 Trillion GB everyday ; 40 Trillion GB by 2020 Volume} 
  • 3. What’s wrong with data? Log files  INFO: 192.12.135.34 /products/raspberry JSON , XML , TXT , Videos , MP3s and your guess is as good as mine. The point is there is simply too much beauty in this world. Or Variety}  in Data
  • 4. We will need Time machines!  Data is getting generated at a faster pace than what we can process.  Think of a SQL query with complex join operations across multiple tables each having millions of records.  Elegant Solution : build a time machine ; Go back in time and do data processing. Velocity} 
  • 5. Why can’t we just chuck this data away? Francis Bacon said something wise. DATA = ADVANTAGE
  • 6. Think of Flipkart/Amazon/Snapdeal  Competitors  The company which understands the need of user better will eventually dominate.  The more analysis they perform on their data logs , better their understanding on user.  ML, IoT and all new technologies of future would need BigData support.
  • 7. Hadoop Hadoop Hadoop!  Top level project under Apache Software Foundation (Open Source)  Created based on Google research papers on how Google processed their large volume of data.
  • 8. Two Questions Q1 ) Where does all our Big Big data goes and gets stored ? Ans . Hadoop Distributed File System (HDFS) Q2) How do we process all this Big Big data ? Ans. MapReduce Q3) What is the key idea behind big data processing? Ans. Next Slide
  • 9. Key Ideas • Step 1: HDFS  Our large data gets stored as small chunks of data in many machines (nodes) in a cluster. • Step 2 : Data Local Computing  We process this small chunk of data simultaneously at the nodes in which data is present. • Step 3 : Aggregates the results from Step2 across various nodes and save the money for building time machines.
  • 10. HDFS Replication Factor = 3 Cluster size = 5
  • 11. Map Reduce Is a programming paradigm (FP) Mapper : The part of your program that a) reads the file from HDFS , b) performs filtering , splitting and cleansing of data c) Emits the data as key, value pair to reducer Reducer : The part of your program that a) performs aggregation or summary operation b) the place where our program logic usually resides c) Writes the output to HDFS “Output from the Mapper is the input to Reducer”
  • 12. A little more on MR Mapping • Mapper • Each node applies the mapper function to the small chunk of data that it holds simultaneously and produces key, value pairs. Shuffle and Sort • Hadoop internal • Nodes redistribute data based on the output from Mapper such that all data belonging to one key is located on the same node. Reducing • Reducer • Nodes now process each group of output data, per key, in parallel
  • 14. Where does Python come in ? Hadoop is written mainly in Java. We can run mapreduce jobs via Hadoop Streaming (HS). Using python for HS is quite easy. We can do almost all the things with HS that normal Hadoop execution provides. There are going to be 2 parts to our mapreduce program. Any guesses ?
  • 17. How to get Started ? Udacity : Intro to Hadoop and Map Reduce MapR Free training lessons Yahoo Hadoop Tutorials BigData University Hortonworks Webinars Book : The Definitive Guide to Hadoop Into to HDFS ; Youtube video Tip : use the Cloudera VM provided by Udacity. It is light weight.