SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Why do I need Hadoop?
ď‚—

Business analytics focuses on developing
new insights and understanding of
business performance based on data and
statistical methods.

Business analytics
Problem : Too much data
Big Data!!
Velocity
ď‚— How fast data is being produced and how fast the data must be processed to meet
demand.
ď‚— Have a look through analytics lens!
Variability
ď‚— highly inconsistent with periodic peaks
ď‚— Is something big trending in the social media?
ď‚— Difference in Variety and Variability
Megabytes,Gigabytes…
ď‚—

Terabyte : To put it in some perspective, a
Terabyte could hold about 300 hours of good
quality video. A Terabyte could hold 1,000 copies
of the Encyclopedia Britannica.

ď‚—

Petabyte : It could hold 500 billion pages of
standard printed text.

ď‚—

Exabyte: It has been said that 5 Exabytes would
be equal to all of the words ever spoken by
mankind.
Human Generated Data and Machine
Generated
Data
Sheer size of Big Data
ď‚— Big Data is unstructured or semi
structured.
ď‚— No point in just storing big data, if we
can't process it.
ď‚—

Challenges of Big Data
Hadoop enables a computing
solution that is:

Scalable– New nodes can be added as needed, and added
without needing to change data formats, how data is loaded,
how jobs are written, or the applications on top.
 Cost effective– Hadoop brings massively parallel computing
to commodity servers.
 Flexible– Hadoop is schema-less, and can absorb any type of
data, structured or not, from any number of sources.
 Fault tolerant– When you lose a node, the system redirects
work to another location of the data and continues processing
without missing a beat.
ď‚—
Power of Map Reduce
ď‚—

Introduction
Hadoop: Basic Concepts
What is Hadoop?
The Hadoop Distributed File System
Hadoop Map Reduce Works
Anatomy of a Hadoop Cluster

ď‚—

Hadoop daemons
Master Daemons
Name node
Job Tracker
Secondary name node
Slave Daemons
Job tracker
Task tracker

Course Content
ď‚—

HDFS ( Hadoop Distributed File System )

ď‚—

Blocks and Splits
Input Splits
HDFS SplitsData Replication
Hadoop Rack Aware
Data high availability
Data Integrity
Cluster architecture and block placement
Accessing HDFS
JAVA Approach
CLI ApproachProgramming Practices
Developing MapReduce Programs in
Local Mode
Running without HDFS and Mapreduce
Pseudo-distributed Mode
Running all daemons in a single node
Fully distributed mode
Running daemons on dedicated nodes
ď‚—

Writing a MapReduce Program
Examining a Sample MapReduce Program
With several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API

ď‚—

Common MapReduce Algorithms
Sorting and Searching
Indexing
Classification/Machine Learning
Term Frequency - Inverse Document Frequency
Word Co-Occurrence
Hands-On Exercise: Creating an Inverted Index
Identity Mapper
Identity Reducer
Exploring well known problems using MapReduce applications
ď‚—

Debugging MapReduce Programs
Testing with MRUnit
Logging
Other Debugging Strategies.

ď‚—

Advanced MapReduce Programming
A Recap of the MapReduce Flow
The Secondary Sort
Customized Input Formats and Output Formats
ď‚—

HBase
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi node cluster
Create database, Develop and run sample applications
Access data stored in HBase using clients like Java, Python and Pearl
HBase and Hive Integration
HBase admin tasks
Defining Schema and basic operation

Hadoop Ecosystem
ď‚—

Hive
Hive concepts
Hive architecture
Install and configure hive on cluster
Create database, access it from java client
Buckets
PartitionsJoins in hive
Inner joins
Outer Joins
Hive UDF
Hive UDAF
Hive UDTF
Develop and run sample applications in Java/Python to access hive
ď‚—

PIG
Pig basics
Install and configure PIG on a cluster
PIG Vs MapReduce and SQL
Pig Vs Hive
Write sample Pig Latin scripts
Modes of running PIG
Running in Grunt shell
Programming in Eclipse
Running as Java program
PIG UDFs
Pig Macros

ď‚—

Flume
Flume concepts
Install and configure flume on cluster
Create a sample application to capture logs from Apache using
flume
ď‚—

Sqoop
Getting Sqoop
A Sample Import
Database Imports
Controlling the import
Imports and consistency
Direct-mode imports
Performing an Export
Contact Us
Address
MindScripts Technologies,
2nd Floor, Siddharth Hall,
Near Ranka Jewellers,
Behind HP Petrol Pump,
Karve Rd,
Pune 411004
Call
9595957557
8805674210
9764560238
9767427924
9881371828

Address
MindScripts Technologies,
C8, 2nd Floor, Sant Tukaram Complex ,
Pradhikaran, Above Savali Hotel,
Opp Nigdi Bus Stand,
Nigdi,
Pune - 411044

www.mindscripts.com
info@mindscripts.com

Weitere ähnliche Inhalte

Was ist angesagt?

Big data course
Big data  courseBig data  course
Big data coursekiruthikab6
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training courseKamal A
 
Big Data Technologies - Hadoop
Big Data Technologies - HadoopBig Data Technologies - Hadoop
Big Data Technologies - HadoopTalentica Software
 
Hadoop online training
Hadoop online trainingHadoop online training
Hadoop online trainingSmartittrainings
 
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionEdureka!
 
Introduction to Hive for Hadoop
Introduction to Hive for HadoopIntroduction to Hive for Hadoop
Introduction to Hive for Hadoopryanlecompte
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Beauty and Big Data
Beauty and Big DataBeauty and Big Data
Beauty and Big DataSri Ambati
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21Hadoop User Group
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoopdhruv_gairola
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopSri Kanth
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystemDataiku
 
Introduction to Big Data and hadoop
Introduction to Big Data and hadoopIntroduction to Big Data and hadoop
Introduction to Big Data and hadoopSandeep Patil
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopMark Ginnebaugh
 
Intro to big data and hadoop ubc cs lecture series - g fawkes
Intro to big data and hadoop   ubc cs lecture series - g fawkesIntro to big data and hadoop   ubc cs lecture series - g fawkes
Intro to big data and hadoop ubc cs lecture series - g fawkesgfawkesnew2
 

Was ist angesagt? (20)

Big data course
Big data  courseBig data  course
Big data course
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training course
 
Big Data Technologies - Hadoop
Big Data Technologies - HadoopBig Data Technologies - Hadoop
Big Data Technologies - Hadoop
 
Hadoop online training
Hadoop online trainingHadoop online training
Hadoop online training
 
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
 
Introduction to Hive for Hadoop
Introduction to Hive for HadoopIntroduction to Hive for Hadoop
Introduction to Hive for Hadoop
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Beauty and Big Data
Beauty and Big DataBeauty and Big Data
Beauty and Big Data
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
 
Introduction to Big Data and hadoop
Introduction to Big Data and hadoopIntroduction to Big Data and hadoop
Introduction to Big Data and hadoop
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
 
Intro to big data and hadoop ubc cs lecture series - g fawkes
Intro to big data and hadoop   ubc cs lecture series - g fawkesIntro to big data and hadoop   ubc cs lecture series - g fawkes
Intro to big data and hadoop ubc cs lecture series - g fawkes
 
Hadoop Summit 2010 Keynote
Hadoop Summit 2010 KeynoteHadoop Summit 2010 Keynote
Hadoop Summit 2010 Keynote
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 

Andere mochten auch

Hadoop
HadoopHadoop
Hadoop774474
 
Why do we need Hadoop?
Why do we need Hadoop?Why do we need Hadoop?
Why do we need Hadoop?Jatin Dabas
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemAnshul Bhatnagar
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduceNewvewm
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHanborq Inc.
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBernard Marr
 

Andere mochten auch (11)

Big data
Big dataBig data
Big data
 
Hadoop
HadoopHadoop
Hadoop
 
Why do we need Hadoop?
Why do we need Hadoop?Why do we need Hadoop?
Why do we need Hadoop?
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 

Ă„hnlich wie Big-Data Hadoop Tutorials - MindScripts Technologies, Pune

Hadoop in action
Hadoop in actionHadoop in action
Hadoop in actionMahmoud Yassin
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Imam Raza
 
Hadoop and Big Data: Revealed
Hadoop and Big Data: RevealedHadoop and Big Data: Revealed
Hadoop and Big Data: RevealedSachin Holla
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoopAsis Mohanty
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_pptjerrin joseph
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
Lecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptxLecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptxAnonymous9etQKwW
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basicsLaxmi Rauth
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a NutshellAnthony Thomas
 

Ă„hnlich wie Big-Data Hadoop Tutorials - MindScripts Technologies, Pune (20)

Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Hadoop content
Hadoop contentHadoop content
Hadoop content
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
Hadoop and Big Data: Revealed
Hadoop and Big Data: RevealedHadoop and Big Data: Revealed
Hadoop and Big Data: Revealed
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
Lecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptxLecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptx
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 

KĂĽrzlich hochgeladen

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 

KĂĽrzlich hochgeladen (20)

TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

Big-Data Hadoop Tutorials - MindScripts Technologies, Pune

  • 1. Why do I need Hadoop?
  • 2.
  • 3.
  • 4.
  • 5. ď‚— Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. Business analytics
  • 6. Problem : Too much data
  • 8. Velocity ď‚— How fast data is being produced and how fast the data must be processed to meet demand. ď‚— Have a look through analytics lens!
  • 9. Variability ď‚— highly inconsistent with periodic peaks ď‚— Is something big trending in the social media? ď‚— Difference in Variety and Variability
  • 10. Megabytes,Gigabytes… ď‚— Terabyte : To put it in some perspective, a Terabyte could hold about 300 hours of good quality video. A Terabyte could hold 1,000 copies of the Encyclopedia Britannica. ď‚— Petabyte : It could hold 500 billion pages of standard printed text. ď‚— Exabyte: It has been said that 5 Exabytes would be equal to all of the words ever spoken by mankind.
  • 11. Human Generated Data and Machine Generated Data
  • 12. Sheer size of Big Data ď‚— Big Data is unstructured or semi structured. ď‚— No point in just storing big data, if we can't process it. ď‚— Challenges of Big Data
  • 13.
  • 14. Hadoop enables a computing solution that is: Scalable– New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top. ď‚— Cost effective– Hadoop brings massively parallel computing to commodity servers. ď‚— Flexible– Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. ď‚— Fault tolerant– When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat. ď‚—
  • 15. Power of Map Reduce
  • 16.
  • 17. ď‚— Introduction Hadoop: Basic Concepts What is Hadoop? The Hadoop Distributed File System Hadoop Map Reduce Works Anatomy of a Hadoop Cluster ď‚— Hadoop daemons Master Daemons Name node Job Tracker Secondary name node Slave Daemons Job tracker Task tracker Course Content
  • 18. ď‚— HDFS ( Hadoop Distributed File System ) ď‚— Blocks and Splits Input Splits HDFS SplitsData Replication Hadoop Rack Aware Data high availability Data Integrity Cluster architecture and block placement Accessing HDFS JAVA Approach CLI ApproachProgramming Practices Developing MapReduce Programs in Local Mode Running without HDFS and Mapreduce Pseudo-distributed Mode Running all daemons in a single node Fully distributed mode Running daemons on dedicated nodes
  • 19. ď‚— Writing a MapReduce Program Examining a Sample MapReduce Program With several examples Basic API Concepts The Driver Code The Mapper The Reducer Hadoop's Streaming API ď‚— Common MapReduce Algorithms Sorting and Searching Indexing Classification/Machine Learning Term Frequency - Inverse Document Frequency Word Co-Occurrence Hands-On Exercise: Creating an Inverted Index Identity Mapper Identity Reducer Exploring well known problems using MapReduce applications
  • 20. ď‚— Debugging MapReduce Programs Testing with MRUnit Logging Other Debugging Strategies. ď‚— Advanced MapReduce Programming A Recap of the MapReduce Flow The Secondary Sort Customized Input Formats and Output Formats
  • 21. ď‚— HBase HBase concepts HBase architecture Region server architecture File storage architecture HBase basics Column access Scans HBase use cases Install and configure HBase on a multi node cluster Create database, Develop and run sample applications Access data stored in HBase using clients like Java, Python and Pearl HBase and Hive Integration HBase admin tasks Defining Schema and basic operation Hadoop Ecosystem
  • 22. ď‚— Hive Hive concepts Hive architecture Install and configure hive on cluster Create database, access it from java client Buckets PartitionsJoins in hive Inner joins Outer Joins Hive UDF Hive UDAF Hive UDTF Develop and run sample applications in Java/Python to access hive
  • 23. ď‚— PIG Pig basics Install and configure PIG on a cluster PIG Vs MapReduce and SQL Pig Vs Hive Write sample Pig Latin scripts Modes of running PIG Running in Grunt shell Programming in Eclipse Running as Java program PIG UDFs Pig Macros ď‚— Flume Flume concepts Install and configure flume on cluster Create a sample application to capture logs from Apache using flume
  • 24. ď‚— Sqoop Getting Sqoop A Sample Import Database Imports Controlling the import Imports and consistency Direct-mode imports Performing an Export
  • 25. Contact Us Address MindScripts Technologies, 2nd Floor, Siddharth Hall, Near Ranka Jewellers, Behind HP Petrol Pump, Karve Rd, Pune 411004 Call 9595957557 8805674210 9764560238 9767427924 9881371828 Address MindScripts Technologies, C8, 2nd Floor, Sant Tukaram Complex , Pradhikaran, Above Savali Hotel, Opp Nigdi Bus Stand, Nigdi, Pune - 411044 www.mindscripts.com info@mindscripts.com