Hadoop Ecosystem
Lior Sidi
Sep 2016
Hello!
I am Lior Sidi
Big data V’s
Volume
Velocity
Variety
What is Hadoop?
• Hadoop – Open source implementation of MapReduce (MR)
• Runs MR jobs quickly and efficiently
Goal
Generating value from large datasets that cannot be analyzed using traditional technologies
Hadoop Concepts
Requirements
• Linear horizontal scalability
• Jobs run in isolation
• Simple programming model
Challenges and solution
• Ch1: Data access bottleneck
• Sol: Store and process data on same node
• Ch2: Distributed programming is difficult
• Sol: Use high-level language APIs
Hadoop Timeline
2003 Oct: Google File System paper released
2004 Dec: "MapReduce: Simplified Data Processing on Large Clusters" paper released
2006 Oct: Hadoop released (an early 0.x version; the release numbered 1.0 followed in Dec 2011)
2007 Oct: Yahoo Labs creates Pig
2008 Oct: Cloudera, a Hadoop distributor, is founded
2010 Sep: Hive and Pig graduate to top-level Apache projects
2011 Jan: ZooKeeper graduates
2013 Mar: YARN deployed at Yahoo
2014 Feb: Apache Spark becomes a top-level Apache project
Hadoop Ecosystem
Layers: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow, Search, Visualization, Cluster Management
Storage / HDFS
• “Hadoop Distributed File System”
• Design:
• Write-once, read-many-times pattern
• Cheap (commodity) hardware
• High-throughput streaming data access (not designed for low-latency access)
• Concepts:
• Block – a file is split into 128 MB blocks (default size), each replicated 3 times (default)
• NameNode (Master) – one per cluster – holds the file system namespace and block mapping (single point of failure)
• DataNode (Worker) – one per node – stores and retrieves blocks
• Functions:
• High availability – run a second (standby) NameNode
• Block caching – by default a block is cached in the memory of only one DataNode
• Locality – rack-aware block placement based on the network topology
• File permissions – POSIX-like – r w x – owner/group/mode for files and directories
• Interfaces – HTTP (proxy/direct), Java API
• Cluster balance – spread the blocks evenly across the cluster
Figure: how a file is distributed and accessed on HDFS. The data is split into Block 1, Block 2 and Block 3; each block is replicated on DataNodes spread over Rack 1 and Rack 2; the NameNode tracks the blocks, and a client accesses the file through an HDFS proxy.
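The block and replication figures above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch, assuming the 128 MB block size and replication factor of 3 from the slide; `hdfs_footprint` is a hypothetical helper name, not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128  # block size stated on the slide (assumed to be the cluster setting)
REPLICATION = 3      # redundancy stated on the slide

def hdfs_footprint(file_size_mb):
    """Estimate how many blocks and replicas one file needs (toy model)."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        "blocks": blocks,
        "block_replicas": blocks * REPLICATION,
        # The last block only occupies the bytes it actually holds,
        # so raw usage is the file size times the replication factor.
        "raw_storage_mb": file_size_mb * REPLICATION,
    }

print(hdfs_footprint(300))
```

A 300 MB file needs three blocks, which matches the three-block file drawn in the figure.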
Resource Management / YARN
• “Yet Another Resource Negotiator”
• Manages and schedules cluster resources
• Daemons:
• Resource Manager – one per cluster – manages resources across the cluster
• Node Manager – one per node – launches and monitors containers
• Container – executes an application process
• Resource requests for containers:
• Amount of compute resources (CPU and memory)
• Locality (node/rack)
• Lifespan: one application per user job, or long-running applications shared by users
• Scheduling:
• Allocates resources by policy: FIFO, Capacity (per organisation), Fair
Figure: job scheduling on top of a Hadoop cluster. A client on the client node submits a YARN application to the ResourceManager on the resource manager node; NodeManagers on the other nodes launch containers (one master, the rest workers) and send heartbeats.
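To make the difference between the scheduling policies concrete, here is a toy sketch comparing FIFO with a simple fair share. It is an illustration only: the application names and demands are made up, and real YARN schedulers work with queues, memory and CPU rather than a single container count.

```python
def fifo_allocate(total, demands):
    """Serve applications in submission order; later ones get the leftovers."""
    alloc = {}
    for app, want in demands.items():
        alloc[app] = min(want, total)
        total -= alloc[app]
    return alloc

def fair_allocate(total, demands):
    """Hand out containers one at a time, round-robin, until none are left."""
    alloc = {app: 0 for app in demands}
    while total > 0:
        hungry = [app for app in demands if alloc[app] < demands[app]]
        if not hungry:
            break  # every demand is satisfied
        for app in hungry:
            if total == 0:
                break
            alloc[app] += 1
            total -= 1
    return alloc

# Hypothetical demands, in submission order, on a cluster with 10 containers.
demands = {"etl": 8, "adhoc": 2, "report": 4}
print(fifo_allocate(10, demands))
print(fair_allocate(10, demands))
```

Under FIFO the last application waits for resources, while the fair policy lets small jobs finish without starving behind a large one.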
Processing / MapReduce
• Simplifies the development of large-scale, automatic, fault-tolerant data processing
• Origin – Google paper, 2004
• Batch processing
• Hadoop MR (classic MapReduce 1 daemons):
• JobTracker – 1 per cluster – master process, schedules tasks on workers, monitors progress
• TaskTracker – 1 per worker – executes map/reduce tasks locally and reports progress
Processing / MapReduce
Data (three input splits):
Lior Ron Lior
Ron Ron Andrey
Lior Andrey Lior
Map (Name, Count):
(Lior, 1) (Ron, 1) (Lior, 1)
(Andrey, 1) (Ron, 1) (Ron, 1)
(Lior, 1) (Andrey, 1) (Lior, 1)
Shuffle & Sort, then Reduce (Name, Count):
(Lior, 4)
(Ron, 3)
(Andrey, 2)
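The name-count example can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, shuffle-and-sort and reduce phases, not Hadoop code; the function names are illustrative.

```python
from collections import defaultdict

# The three input splits from the example.
splits = ["Lior Ron Lior", "Ron Ron Andrey", "Lior Andrey Lior"]

def map_phase(split):
    """Emit a (name, 1) pair for every name in one split."""
    return [(name, 1) for name in split.split()]

def shuffle_and_sort(pairs):
    """Group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Sum the grouped counts for each name."""
    return {key: sum(values) for key, values in groups.items()}

mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle_and_sort(mapped))
print(counts)  # {'Andrey': 2, 'Lior': 4, 'Ron': 3}
```

On a cluster each split would be mapped on a different node, and the shuffle would move the pairs across the network to the reducers.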
Figure: MR job scheduling on top of a Hadoop cluster. The client's MR program is submitted to the ResourceManager as a YARN application; NodeManagers launch the containers that host the JobTracker and the TaskTrackers, and heartbeats report status.
Storage / HBase
• Distributed column-oriented database on top of HDFS
• Real-time random read/write access to large datasets
• Region – a table is split by row into regions
• Phoenix – SQL on HBase
HBase data model: each RowKey maps to column families (Column Family 1, Column Family 2); a family holds columns (Col 1.1, Col 1.2, Col 1.3), and each column stores versioned data.
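The data model can be pictured as nested maps: row key, then column, then version. The sketch below is a toy in-memory model under that assumption; `ToyTable` and its methods are hypothetical and are not the HBase client API.

```python
from collections import defaultdict

class ToyTable:
    """In-memory imitation of the row key -> column -> version -> value layout."""

    def __init__(self):
        # row key -> "family:qualifier" -> {version: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, version):
        self.rows[row][column][version] = value

    def get(self, row, column, version=None):
        versions = self.rows[row][column]
        if not versions:
            return None
        if version is None:
            version = max(versions)  # newest version wins by default
        return versions.get(version)

table = ToyTable()
table.put("row1", "cf1:col1", "old", version=1)
table.put("row1", "cf1:col1", "new", version=2)
print(table.get("row1", "cf1:col1"))             # new
print(table.get("row1", "cf1:col1", version=1))  # old
```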
Coordination / ZooKeeper
• Hadoop’s distributed coordination service
• Coordinates read/write actions on data
• Highly available, filesystem-like namespace
• Implementation:
• Data model:
• Tree built from znodes (up to 1 MB of data each)
• Znode – holds data, change information and an ACL (access control list)
• Leader – performs writes and broadcasts the update
• Follower – forwards write requests to the leader
• Lock service
• User groups
• Replicated mode
Coordination / ZooKeeper
Figure: ZooKeeper coordination example. In the Hadoop cluster, the ZooKeeper service (one leader, two followers) keeps a replicated znode tree (/, /HBase, /HDFS) with locks; HBase (HMaster, Regions), HDFS (NameNode, DataNodes) and other clients use it for coordination.
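The lock service can be illustrated with the idea behind ZooKeeper's well-known lock recipe: every client takes a sequence number, and the lowest number holds the lock. This is a minimal in-memory sketch under that assumption; `ToyLock` is hypothetical and does not talk to a ZooKeeper server.

```python
class ToyLock:
    """Queue-based lock: the waiting client with the lowest sequence number holds it."""

    def __init__(self):
        self._next_seq = 0
        self._waiting = {}  # sequence number -> client name

    def request(self, client):
        """Register a client and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._waiting[seq] = client
        return seq

    def holder(self):
        """Return the client that currently holds the lock, if any."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        """Remove a client; the next lowest number becomes the holder."""
        self._waiting.pop(seq, None)

lock = ToyLock()
first = lock.request("HMaster")
second = lock.request("NameNode")
print(lock.holder())  # HMaster
lock.release(first)
print(lock.holder())  # NameNode
```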
Row Based / Avro
• Language-neutral data serialization system
• Shares data between many programming languages
• Splittable and sortable – works well with MapReduce
• Rich schema resolution – flexible schema
• Other row-based formats
• SequenceFile – logfile format
• MapFile – sorted SequenceFile
Row Based / Avro
File structure:
Header: identifier, metadata (schema & codec), sync marker
Block 1 ... Block N: object count, size of objects, serialized objects, sync marker
Schema:
{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "firstName",
      "type": "string",
      "order": "descending"
    },
    {
      "name": "age",
      "type": "int"
    },
    ...
  ]
}
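Because an Avro schema is plain JSON, it can be loaded and used to check records with the standard library alone. The sketch below is an assumption-laden illustration: it uses only the two fields visible on the slide (the elided ones are left out), and `conforms` is a hypothetical helper, not a function of the Avro library.

```python
import json

# Only the two fields visible on the slide; the elided fields are not guessed.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "firstName", "type": "string", "order": "descending"},
    {"name": "age", "type": "int"}
  ]
}
""")

# Minimal mapping from Avro primitive type names to Python types.
PY_TYPES = {"string": str, "int": int}

def conforms(record, schema=SCHEMA):
    """Check that a dict has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(
        isinstance(record[field["name"]], PY_TYPES[field["type"]])
        for field in fields
    )

print(conforms({"firstName": "Lior", "age": 30}))    # True
print(conforms({"firstName": "Lior", "age": "30"}))  # False
```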
Parquet
• Columnar storage format
• Skip unneeded columns
• Fast queries & small size
• Efficient nested data store
File structure:
Header: magic number
Block 1 ... Block N: each block holds column chunks, and each column chunk holds pages
Footer: file metadata
Schema:
message Person {
  required binary name (UTF8);
  required int32 age;
  required group hobbies (LIST) {
    repeated binary array (UTF8);
  }
}
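Why a columnar layout lets a query skip unneeded columns can be shown with plain lists. This is a conceptual sketch only: the records are made-up sample data, and nothing here reads or writes real Parquet files.

```python
# Hypothetical sample records in a row-based layout.
rows = [
    {"name": "Lior", "age": 30, "hobbies": ["chess"]},
    {"name": "Ron", "age": 25, "hobbies": ["running", "music"]},
    {"name": "Andrey", "age": 41, "hobbies": []},
]

def to_columns(records):
    """Pivot row-based records into one list per column."""
    return {key: [record[key] for record in records] for key in records[0]}

columns = to_columns(rows)

# A query on "age" now touches a single list and never reads
# the "name" or "hobbies" values.
average_age = sum(columns["age"]) / len(columns["age"])
print(columns["age"])  # [30, 25, 41]
print(average_age)     # 32.0
```

Storing values of one type together is also what makes columnar files compress well.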
Data Integration / Sqoop
• Import/export of structured data
• Sqoop connector:
• Import from/export to a database
• Sqoop 1 – command line
• Sqoop 2 – service
• Connectors – connect to relational databases
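Figure: Sqoop import and export. The Sqoop client reads the table metadata from the database and launches MapReduce jobs on the Hadoop cluster: the import job's map tasks copy the table into HDFS, and the export job's map tasks write HDFS data back to the table.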
Data Integration / Flume
• Event-based data ingestion into Hadoop
• Flume agent components:
• Sources – spoolingDir (creates events from files), Avro (RPC), HTTP (requests)
• Channel
• Sinks – Avro, HDFS, HBase, Solr (near real time)
• Reliability – separate transactions for source-to-channel and channel-to-sink delivery
• Fan out – one source, many sinks
• Scale – agent tiers that aggregate multiple sources
• Sink groups – failover and load balancing
Figure: Flume topologies. An agent (source, channel, sink) reads data from a file system. Fan out: one agent delivers to both HDFS and HBase. Scale: agents are arranged in tiers 1 to 3 to aggregate data on its way into Hadoop. Sink grouping: several sinks are grouped behind one agent.
Data Integration / Kafka
• Distributed publish-subscribe messaging system
• Fast, scalable, durable
• Components:
• Topics – categories (feeds) of messages
• Producers – processes that publish messages to a topic
• Consumers – processes that subscribe to a topic
• Brokers – Kafka servers in the cluster
• Distribution
• Leader – handles reads and writes
• Follower – replicates the leader
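The publish-subscribe model can be sketched as an append-only log per topic plus one read position per consumer. This is a single-process toy under that assumption; `ToyBroker` and the topic and consumer names are hypothetical, and it is not the Kafka client API.

```python
from collections import defaultdict

class ToyBroker:
    """One append-only log per topic, one offset per (consumer, topic)."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic -> list of messages
        self.offsets = defaultdict(int)  # (consumer, topic) -> next offset to read

    def publish(self, topic, message):
        """Producer side: append a message to the topic's log."""
        self.topics[topic].append(message)

    def poll(self, consumer, topic):
        """Consumer side: return unread messages and advance the offset."""
        log = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(log)
        return log[start:]

broker = ToyBroker()
broker.publish("clicks", "page=/home")
broker.publish("clicks", "page=/about")
print(broker.poll("analytics", "clicks"))  # both messages
broker.publish("clicks", "page=/pricing")
print(broker.poll("analytics", "clicks"))  # only the new message
print(broker.poll("archiver", "clicks"))   # all three, independent offset
```

Because each consumer tracks its own offset, many subscribers can read the same topic without affecting one another.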
Data Integration / Streaming
• Stream processing
• Kafka Streams – process and analyze data in Kafka
• Storm – real-time computation
• Spark Streaming – processes live data and can apply Spark MLlib and GraphX
Figure: streaming pipeline connecting the data source, Flume Agent 1 and Flume Agent 2, Kafka topics A and B, Spark Streaming, Storm and HDFS.
Computation / Spark
• Cluster computing framework
• In-memory processing
• Languages: Scala, Java and Python
• RDD – Resilient Distributed Dataset
• Read-only collection spread across the cluster
• Transformations are lazy: they are computed only when an action is called
• DAG engine – schedules many transformations into one optimized job
• Spark context
• Parallel jobs
• Caching
• Broadcast variables (data/functions)
• Cluster managers for executors:
• Local, Standalone, Mesos, YARN
Computation / Spark
Figure: Spark on Hadoop. The driver runs the Spark program and holds the SparkContext, DAG scheduler, task scheduler and scheduler backend; each job is broken into stages and tasks, and the tasks run on executors.
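Lazy evaluation is easier to see in code: transformations only record what to do, and the work happens when an action runs. The class below is a pure-Python toy that imitates this behaviour; `ToyRDD` is hypothetical and is not the PySpark API.

```python
class ToyRDD:
    """Records transformations and runs them only when collect() is called."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: returns a new object, computes nothing yet.
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded.
        return ToyRDD(self._data, self._ops + (("filter", predicate),))

    def collect(self):
        # Action: apply all recorded steps to each item in a single pass.
        out = []
        for item in self._data:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

calls = []

def square(x):
    calls.append(x)  # remember that the function really ran
    return x * x

rdd = ToyRDD([1, 2, 3, 4]).map(square).filter(lambda x: x > 4)
print(calls)          # [] because nothing has been computed yet
print(rdd.collect())  # [9, 16]
```

Chaining both steps into one pass over the data is a small-scale version of what the DAG engine does when it merges transformations into one job.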
Scripting / Pig
• Data-flow programming language – an abstraction over MapReduce
• Supports: user-defined functions (UDFs), streaming, nested data
• Does not support: random reads/writes
• Pig Latin – scripting language
• Load, Store, Filter, Group, Join, Sort, Union, Split, UDF, Cogroup
• Modes
• Local – small datasets
• MR mode – runs on the cluster
• Execution – script, Grunt (shell), embedded (Java)
• Parameter substitution – run a script with different parameters
• Similar tools
• Crunch – MR pipelines in Java (no UDF)
Query / Hive
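• Components
• Metastore – table descriptions
• HiveQL – SQL dialect (based on SQL:2003)
• Table management
• Managed tables – stored in the warehouse directory
• External tables – data stays outside the warehouse directory
• Functionality
• Bucketing and partitioning by column
• Supports UDFs and UDAFs (aggregate functions)
• Insert, update, delete:
• Changes are saved in delta files
• Background MR jobs merge the delta files
• Available only in a transaction context
• Table locks (for example, to prevent a drop while the table is being read)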
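Partitioning and bucketing both decide where a row is stored: the partition column picks a directory, and a hash of the bucket column picks a file. The sketch below illustrates the idea under stated assumptions: CRC32 stands in for Hive's own hash function, the table and column names are made up, and the path follows the usual warehouse layout.

```python
import zlib

def bucket_for(value, num_buckets):
    """Pick a bucket by hashing the column value (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets

def storage_path(table, partition_col, partition_val, bucket):
    """Build a warehouse-style path: one directory per partition, one file per bucket."""
    return f"/user/hive/warehouse/{table}/{partition_col}={partition_val}/{bucket:06d}_0"

# Hypothetical table "logs", partitioned by "dt" and bucketed by user id into 4 buckets.
bucket = bucket_for("user_42", 4)
print(storage_path("logs", "dt", "2016-09-01", bucket))
```

A query that filters on the partition column only has to read one directory, which is why partition choice matters for performance.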
Query / Comparison
               | Hive          | Impala                          | SparkSQL (Shark)
Usage          | Batch         | BI & SQL analytics              | Procedural development
Speed          | Bad           | Best                            | OK
Implementation | MapReduce     | Dedicated daemons on DataNodes  | Memory
Similar tools  | Hive on Spark | Presto, Drill (SQL:2011)        |
Workflow / Oozie
• Schedule Hadoop jobs
• Job types:
• Workflows – sequences of jobs defined as directed acyclic graphs (DAGs)
• Coordinator – triggers jobs by time or data availability
Figure: example workflow built from control-flow nodes (start, fork, join, end) and action nodes (Sqoop, Pig, MR, sub-workflow, FS (HDFS), email).
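A workflow with a fork and a join is a small DAG, and a valid execution order is a topological order of it. The sketch below uses Python's standard `graphlib` (Python 3.9+) to show this; the wiring between the nodes is an assumption made for illustration, and real Oozie workflows are written in XML rather than Python.

```python
from graphlib import TopologicalSorter

# node -> set of nodes that must finish first (hypothetical wiring)
workflow = {
    "sqoop": {"start"},
    "fork": {"sqoop"},
    "pig": {"fork"},        # the two branches after the fork
    "mr": {"fork"},         # can run in parallel
    "join": {"pig", "mr"},  # the join waits for both branches
    "end": {"join"},
}

order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Any order the sorter returns starts with `start`, ends with `end`, and places `join` after both parallel branches.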
Search / Solr
• Full-text search over Hadoop
• Near real time indexing
• REST API
• Based on Apache Lucene java search library
Visualization / Hue
• Open source web interface for analyzing data with any Hadoop distribution
• Applications:
• File Browser: HDFS, HBase
• Scheduling of jobs and workflows : Oozie
• Job Browser: YARN
• SQL : Hive, Impala
• Data analysis: Pig, UDF
• Dynamic Search: Solr
• Notebooks: Spark
• Data Transfer: Sqoop 2
Cluster Management / Cloudera
• 100% open source
• The most complete and tested distribution of Hadoop
• Integrates all Hadoop projects
• Express – free, end to end administration
• Enterprise – Extra features and support
Cluster Management / Comparison
https://talendexpert.com/cloudera-vs-honworks-vs-mapr
Figure: basic cluster configuration. The master nodes and other servers run the NameNode, Secondary NameNode, ResourceManager, standby ResourceManager, ZooKeeper, Cloudera Manager, Impala State, and the Hive, Sqoop and Spark gateways (GW); each worker node runs a NodeManager, a DataNode and an Impala daemon.
Thanks!
Any questions?


Editor's notes

  1. HDFS manages the file system across a network of machines. It is designed to store big files and follows a master-worker pattern. The NameNode maintains the directory tree; it does not keep block locations persistently but reconstructs them on reboot. The NameNode is the most important component in the cluster: when it is lost, all access to the cluster is lost. It is therefore possible to set up high availability, where we
  2. Designed to support MapReduce, but also used for other operations