SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Msquare Systems Inc.,
What is Hadoop?

Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain
insight from massive amounts of structured and unstructured data quickly and without significant investment.

Hadoop is designed to run on commodity hardware and can scale up or down without system interruption. It consists
of three main functions: storage, processing and resource management.
Core services on Hadoop

MapReduce: MapReduce is a framework for writing applications that process large amounts of structured and unstructured data in
parallel across a cluster of several machines in a reliable and fault-tolerant.

HDFS: Hadoop Distributed File System is a java-based file system that provides scalable and reliable data storage for large group of
clusters.

Hadoop Yarn: Yarn is a next generation framework for Hadoop Data processing extending MapReduce capabilities by supporting nonMapReduce workloads associated with other programming models.

Apache Tez: Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic
graph) of tasks for near real-time big data processing
Hadoop Data Services

Apache Pig: Its platform for processing and analyzing large data sets.

Apache Hbase: A column-oriented No SQL data storage system that provides random real-time read/write access to big data for user
applications.

Apache Hive: Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and add-hoc queries via
SQL-like interface for large datasets stored in HDFS.

Apache Flume: Allows efficiently aggregating and moving large amounts of log data from many different sources to Hadoop.

Apache Mahout: Apache Mahout scalable machine learning algorithms for hadoop, which aids with data science for clustering, classification
and batch based collaborative filtering
Hadoop Data Services

Apache Accumulo : Accumulo is a high performance data storage and retrieval system with cell-level access control. It is a scalable
implementation of Google’s Big Table design that works on top of Apache Hadoop and Apache ZooKeeper.

Apache Storm : Storm is a distributed real-time computation system for processing fast, large streams of data adding reliable real-time data
processing capabilities to Apache Hadoop 2.x.
Apache Catalog : A table and metadata management service that provides a centralized way for data processing systems to understand the
structure and location of the data stored within Apache Hadoop

Apache Sqoop : Sqoop is a tool that speeds and eases movement of data in and out of Hadoop. It provides a reliable parallel load for
various, popular enterprise data sources.
Hadoop Operational Services

Apache Zookeeper: A highly available system for coordinating distributing processes.

Apache Falcon: Falcon is a data management framework for simplifying data lifecycle management and processing pipelines on Apache
hadoop.

Apache Ambari: Open source installation lifecycle management, administration, and monitoring system for Apache Hadoop Clusters.

Apache knox: “Knox” gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster.

Apache Oozie: Oozie Java web application used to schedule Apache Hadoop Jobs. Oozie combines multiple jobs sequentially into one logical
unit of work.
What Hadoop can, and can't do

What Hadoop can't do
You can't use Hadoop for
 Structured data
 Transactional data

What Hadoop can do
You can use Hadoop for
 Big Data
Support & Partner
Getting Started or Support –

Muthu Natarajan

muthu.n@msquaresystems.com

www.msquaresystems.com

Phone: 212-941-6000

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Big Data and Hadoop - An Introduction
Big Data and Hadoop - An IntroductionBig Data and Hadoop - An Introduction
Big Data and Hadoop - An Introduction
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
HDFS
HDFSHDFS
HDFS
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 

Andere mochten auch

Andere mochten auch (7)

Processing cassandra datasets with hadoop streaming based approaches
Processing cassandra datasets with hadoop streaming based approachesProcessing cassandra datasets with hadoop streaming based approaches
Processing cassandra datasets with hadoop streaming based approaches
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
 
Silicon Halton - Meetup 72 - Co-Accelerate Your Business
Silicon Halton - Meetup 72 - Co-Accelerate Your BusinessSilicon Halton - Meetup 72 - Co-Accelerate Your Business
Silicon Halton - Meetup 72 - Co-Accelerate Your Business
 
Silicon Halton Meetup 75: Augmented Reality, A Primer
Silicon Halton Meetup 75: Augmented Reality, A PrimerSilicon Halton Meetup 75: Augmented Reality, A Primer
Silicon Halton Meetup 75: Augmented Reality, A Primer
 
Silicon Halton Meetup 77 - Work Unscripted
Silicon Halton Meetup 77 - Work UnscriptedSilicon Halton Meetup 77 - Work Unscripted
Silicon Halton Meetup 77 - Work Unscripted
 
Silicon Halton Meetup 83 - Sr. HR Panel
Silicon Halton Meetup 83 - Sr. HR PanelSilicon Halton Meetup 83 - Sr. HR Panel
Silicon Halton Meetup 83 - Sr. HR Panel
 
Silicon Halton Meetup 79 - Chart of Accounts
Silicon Halton Meetup 79 - Chart of AccountsSilicon Halton Meetup 79 - Chart of Accounts
Silicon Halton Meetup 79 - Chart of Accounts
 

Ähnlich wie Hadoop white papers

Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
BOSC 2010
 

Ähnlich wie Hadoop white papers (20)

Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
 
In15orlesss hadoop
In15orlesss hadoopIn15orlesss hadoop
In15orlesss hadoop
 
Hadoop vs Apache Spark
Hadoop vs Apache SparkHadoop vs Apache Spark
Hadoop vs Apache Spark
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Case study on big data
Case study on big dataCase study on big data
Case study on big data
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 

Mehr von Muthu Natarajan

Mehr von Muthu Natarajan (8)

Understanding about relational database m-square systems inc
Understanding about relational database m-square systems incUnderstanding about relational database m-square systems inc
Understanding about relational database m-square systems inc
 
Agile methodologiesvswaterfall
Agile methodologiesvswaterfallAgile methodologiesvswaterfall
Agile methodologiesvswaterfall
 
Business intelligence data analytics-visualization
Business intelligence data analytics-visualizationBusiness intelligence data analytics-visualization
Business intelligence data analytics-visualization
 
Business intelligence, Data Analytics & Data Visualization
Business intelligence, Data Analytics & Data VisualizationBusiness intelligence, Data Analytics & Data Visualization
Business intelligence, Data Analytics & Data Visualization
 
Social Media Strategies and Social Marketing
Social Media Strategies and Social MarketingSocial Media Strategies and Social Marketing
Social Media Strategies and Social Marketing
 
Protect your website
Protect your websiteProtect your website
Protect your website
 
Hr presentation
Hr presentationHr presentation
Hr presentation
 
Cloud Computing & Benefits
Cloud Computing & BenefitsCloud Computing & Benefits
Cloud Computing & Benefits
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Hadoop white papers

  • 2. What is Hadoop? Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment. Hadoop is designed to run on commodity hardware and can scale up or down without system interruption. It consists of three main functions: storage, processing and resource management.
  • 3. Core services on Hadoop MapReduce: MapReduce is a framework for writing applications that process large amounts of structured and unstructured data in parallel across a cluster of several machines in a reliable and fault-tolerant. HDFS: Hadoop Distributed File System is a java-based file system that provides scalable and reliable data storage for large group of clusters. Hadoop Yarn: Yarn is a next generation framework for Hadoop Data processing extending MapReduce capabilities by supporting nonMapReduce workloads associated with other programming models. Apache Tez: Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks for near real-time big data processing
  • 4. Hadoop Data Services Apache Pig: Its platform for processing and analyzing large data sets. Apache Hbase: A column-oriented No SQL data storage system that provides random real-time read/write access to big data for user applications. Apache Hive: Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and add-hoc queries via SQL-like interface for large datasets stored in HDFS. Apache Flume: Allows efficiently aggregating and moving large amounts of log data from many different sources to Hadoop. Apache Mahout: Apache Mahout scalable machine learning algorithms for hadoop, which aids with data science for clustering, classification and batch based collaborative filtering
  • 5. Hadoop Data Services Apache Accumulo : Accumulo is a high performance data storage and retrieval system with cell-level access control. It is a scalable implementation of Google’s Big Table design that works on top of Apache Hadoop and Apache ZooKeeper. Apache Storm : Storm is a distributed real-time computation system for processing fast, large streams of data adding reliable real-time data processing capabilities to Apache Hadoop 2.x. Apache Catalog : A table and metadata management service that provides a centralized way for data processing systems to understand the structure and location of the data stored within Apache Hadoop Apache Sqoop : Sqoop is a tool that speeds and eases movement of data in and out of Hadoop. It provides a reliable parallel load for various, popular enterprise data sources.
  • 6. Hadoop Operational Services Apache Zookeeper: A highly available system for coordinating distributing processes. Apache Falcon: Falcon is a data management framework for simplifying data lifecycle management and processing pipelines on Apache hadoop. Apache Ambari: Open source installation lifecycle management, administration, and monitoring system for Apache Hadoop Clusters. Apache knox: “Knox” gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. Apache Oozie: Oozie Java web application used to schedule Apache Hadoop Jobs. Oozie combines multiple jobs sequentially into one logical unit of work.
  • 7. What Hadoop can, and can't do What Hadoop can't do You can't use Hadoop for  Structured data  Transactional data What Hadoop can do You can use Hadoop for  Big Data
  • 8. Support & Partner Getting Started or Support – Muthu Natarajan muthu.n@msquaresystems.com www.msquaresystems.com Phone: 212-941-6000