Shubhendu Tripathi
PSE – Red Hat
GlusterFS and Hadoop
06/22/15 2
Agenda
● What is BigData
● Hadoop and its Evolution
● Hadoop Architecture and Components
● Hadoop and GlusterFS (glusterfs-hadoop plugin)
● Advantages of using GlusterFS with Hadoop
● References
What is BigData
● Software solutions mostly capture, maintain and manage data
● Storing data
● Processing data
● Growing data size in current world – big data generators
● Sensors
● CC Cam
● Social networks
● Online shopping portals
● Airlines
● Hospitality
What is BigData
● About 90% of the data that exists today was generated in the last 2 years
● 1990
● HDD: 1-20 GB, RAM: 14-128 MB, Speed: 10 kbps
● 2014
● HDD: 0.5-1 TB, RAM: 1-16 GB, Speed: 100 Mbps
● 3 factors that define BigData
● Volume
● Velocity
● Variety (unstructured and semi-structured data)
What is BigData
● SAN – Storage Area Network
● One option: store the data in data centers, fetch it when needed, and run the computation on it locally
● Computation is processor-bound, so a single machine limits how much can be done
● As the data grows, more and more computation is needed, and it is no longer possible to perform it on a local machine
● Solution: send the computation to the storage node and get back the processed data (the computation is small compared to the data)
Hadoop Evolution
● Started with Google – white papers
● GFS (Google File System) 2003 - Storage
● MapReduce 2004 – Computation
● Yahoo
● HDFS (Hadoop Distributed File System) – 2006-07
● MapReduce (computation mechanism) – 2007-08
● Created by Doug Cutting and Mike Cafarella; Cutting joined Yahoo in 2006
● Logo: an elephant
● Apache foundation (2005 Yahoo donated)
Hadoop Architecture /
Components
● A framework of tools, not a single application
● Supports running applications on BigData
● Open-source set of tools distributed under the Apache license
● Traditional Approach for handling huge data
● Powerful computer with big storage and computation capacity
● Limited by processing power of the computer with growing data
● Hadoop approach
● Break up the data into smaller pieces and distribute them to multiple computers
● Break up the computation into smaller pieces as well and distribute them
● Combine the results and return them
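The Hadoop approach above (split the data, run the same small computation on every piece, combine the results) can be sketched in a few lines. This is only an illustration under stated assumptions: threads stand in for separate computers, and the function names `split_into_pieces`, `map_piece`, `combine` and `word_count` are hypothetical, not part of Hadoop.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_pieces(lines, n_pieces):
    """Break the data into roughly equal pieces, one per worker."""
    return [lines[i::n_pieces] for i in range(n_pieces)]


def map_piece(piece):
    """The 'map' step: count words in one piece, independently of the others."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def combine(left, right):
    """The 'reduce' step: merge two partial results."""
    return left + right


def word_count(lines, n_workers=3):
    pieces = split_into_pieces(lines, n_workers)
    # Threads stand in for separate computers in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_piece, pieces))
    return reduce(combine, partials, Counter())


data = ["big data big storage", "data moves slowly", "computation moves to data"]
totals = word_count(data)
print(totals["data"], totals["big"])
```

Each piece is processed without knowledge of the others, which is what lets the work be spread over many machines; only the small partial counts travel back to be combined.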
Hadoop Architecture /
Components
● Map Reduce
● Job Tracker
● Task Tracker
● HDFS
● Name Node
● Data Node
● Applications contact the master node; a job is formed and submitted to the Job Tracker
● Job Tracker maintains a queue of tasks and gets them processed using the Task Trackers and Data Nodes
● Consolidates the results and sends them back to the application
Hadoop Architecture /
Components
● Hadoop works on a distributed model
● Numerous low cost computers – commodity hardware
● Hadoop components
● Slaves
– Task Tracker – process smaller piece of task assigned
– Data Node – manage the piece of data distributed to this node
● Master
– Job Tracker – tracks the overall task
– Name Node – maintains the index of the data blocks stored on
different nodes
– Task Tracker
– Data Node
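The split of roles above can be made concrete with a toy model. The sketch below assumes a hypothetical block index of the kind a Name Node maintains (block id mapped to the nodes holding a copy) and a simplified scheduling rule of the kind a Job Tracker might apply: run each task on a node that already stores the block. The node names, block ids and the `assign_tasks` function are illustrative assumptions, not Hadoop APIs.

```python
# Hypothetical block index, as a Name Node might keep it:
# block id -> nodes holding a replica of that block.
block_index = {
    "blk_1": ["node-a", "node-b"],
    "blk_2": ["node-b", "node-c"],
    "blk_3": ["node-c", "node-a"],
}


def assign_tasks(block_index, busy_nodes=()):
    """Pick, for every block, a node that already stores it (data locality).

    Prefers the least-loaded replica holder that is not busy; falls back to
    any holder if all of them are busy. Returns block id -> chosen node.
    """
    load = {}
    assignment = {}
    for block, holders in sorted(block_index.items()):
        candidates = [n for n in holders if n not in busy_nodes] or holders
        chosen = min(candidates, key=lambda n: (load.get(n, 0), n))
        assignment[block] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


plan = assign_tasks(block_index)
for block, node in plan.items():
    print(block, "->", node)
```

Because every task lands on a node that already holds its block, no block has to cross the network before processing starts.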
Hadoop Architecture /
Components
[Diagram, repeated across three slides: Applications submit work to a Queue on the Master, which runs the Job Tracker and Name Node along with its own Task Tracker and Data Node; each of the Slaves runs a Task Tracker and a Data Node.]
Hadoop and GlusterFS
● GlusterFS is a general-purpose, scale-out distributed file system supporting thousands of clients
● Aggregates storage exports over network interconnect to provide a single unified namespace
● File system completely in userspace, runs on commodity hardware
● Layered on disk file systems that support extended attributes
Hadoop and GlusterFS
● Hadoop contains a set of daemons running in the system
● Name Node – centralized metadata node
● Job Tracker – overall task distribution across data nodes
● Task Tracker – runs on data nodes to manage assigned tasks
● Data Node – stores data
● Hadoop = MapReduce framework + HDFS
● GlusterFS can be a replacement for HDFS
● glusterfs-hadoop-plugin
● Java module which implements the Hadoop file system interface
● Simply a JAR file which can be placed in the Hadoop libraries
● Replaces HDFS with GlusterFS
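The idea of swapping the storage layer behind a fixed file system interface can be illustrated with a small sketch. It is written in Python rather than the plugin's Java, and the `FileSystem` and `MountedVolumeFileSystem` classes are hypothetical stand-ins, not the actual Hadoop or glusterfs-hadoop classes. A temporary directory plays the role of a FUSE-mounted volume.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical stand-in for the interface a compute framework codes against."""

    @abstractmethod
    def open(self, path, mode="r"): ...

    @abstractmethod
    def list_status(self, path): ...


class MountedVolumeFileSystem(FileSystem):
    """Backend that maps framework paths onto a locally mounted volume."""

    def __init__(self, mount_point):
        self.mount_point = mount_point

    def _resolve(self, path):
        return os.path.join(self.mount_point, path.lstrip("/"))

    def open(self, path, mode="r"):
        return open(self._resolve(path), mode)

    def list_status(self, path):
        return sorted(os.listdir(self._resolve(path)))


def count_lines(fs, path):
    """Job code only sees the FileSystem interface, never the backend."""
    with fs.open(path) as handle:
        return sum(1 for _ in handle)


# A temporary directory stands in for a FUSE mount point in this sketch.
mount = tempfile.mkdtemp()
fs = MountedVolumeFileSystem(mount)
with fs.open("/input.txt", "w") as out:
    out.write("a\nb\nc\n")
print(count_lines(fs, "/input.txt"), fs.list_status("/"))
```

The job code (`count_lines`) does not change when the backend does, which is the property that lets a plugin stand in for HDFS without rewriting applications.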
Hadoop and GlusterFS
● Data locality is ensured by the Job Tracker
● glusterfs-hadoop-plugin preserves data locality by mounting the gluster volumes as FUSE mounts
● Effectively no name node involved
● Only clients where the map-reduce job runs
● And data nodes to store data
● glusterfs-hadoop-plugin talks to glusterfs through FUSE mounts
● In the absence of a name node, the plugin uses the xattrs (extended attributes) mechanism to get file details from the volume and consolidates the data using the same
● Reads the data directly from the bricks, bypassing the volume, for improved performance
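As a rough sketch of the extended-attribute lookup described above, the code below asks a mount which bricks hold a file and extracts the host names. It is in Python, not the plugin's Java, and rests on assumptions: the virtual attribute name `trusted.glusterfs.pathinfo` and the layout of the sample value should be verified against your GlusterFS version, and the server names and brick paths are made up. Only the parsing step is exercised here, since `os.getxattr` needs Linux and a real Gluster mount.

```python
import os
import re

# Assumption: attribute name and value layout as commonly seen on GlusterFS
# FUSE mounts; check your version before relying on either.
PATHINFO_XATTR = "trusted.glusterfs.pathinfo"
_BRICK = re.compile(r"<POSIX\([^)]*\):([^:>]+):([^>]+)>")


def parse_pathinfo(value):
    """Extract (host, path_on_brick) pairs from a pathinfo string."""
    return _BRICK.findall(value)


def brick_locations(path):
    """Ask the mount which bricks hold `path` (Linux only, needs a Gluster mount)."""
    raw = os.getxattr(path, PATHINFO_XATTR)
    return parse_pathinfo(raw.decode("utf-8", errors="replace"))


# Made-up sample value for a file replicated on two servers.
sample = (
    "(<REPLICATE:vol-replicate-0> "
    "<POSIX(/bricks/b1):server1:/bricks/b1/data/part-0> "
    "<POSIX(/bricks/b1):server2:/bricks/b1/data/part-0>)"
)
hosts = [host for host, _ in parse_pathinfo(sample)]
print(hosts)
```

A scheduler can use the returned host names the same way it would use a name node's block locations: to place the task on a machine that already holds the data.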
Hadoop and GlusterFS
● As simple as starting the map-reduce daemons and then submitting the Hadoop job to use glusterfs as storage
● Analytics use case: with HDFS, files have to be moved around the nodes, whereas with glusterfs the volume only needs to be FUSE-mounted and no files are moved
Advantages
● Elimination of centralized metadata server (name node)
● Compatibility with MapReduce and Hadoop based applications
● Elimination of code rewrites for Hadoop enablement of glusterfs
● Fault tolerant file system
● Allows co-location of compute and data nodes and ability to run Hadoop jobs
across multiple namespaces using multiple glusterfs volumes
● Data access through several different mechanisms / protocols (FUSE, NFS, SMB and Swift ... and of course Hadoop)
References
● https://github.com/gluster/glusterfs-hadoop
● https://forge.gluster.org/hadoop/pages/Home
● Contact: shubhendu in #gluster on freenode
Deployment Scenario
THANK YOU!

Weitere ähnliche Inhalte

Was ist angesagt?

Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Vijay Bellur
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvSahina Bose
 
Storage as a Service with Gluster
Storage as a Service with GlusterStorage as a Service with Gluster
Storage as a Service with GlusterVijay Bellur
 
Efficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using DatabasesEfficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using DatabasesJoseph Elwin Fernandes
 
Gluster Storage
Gluster StorageGluster Storage
Gluster StorageRaz Tamir
 
Red Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFSRed Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFSbipin kunal
 
Performance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fsPerformance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fsNeependra Khare
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS
 
Gluster.next feb-2016
Gluster.next feb-2016Gluster.next feb-2016
Gluster.next feb-2016Vijay Bellur
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterRed_Hat_Storage
 
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...Tommy Lee
 
Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3GlusterFS
 
Gluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster.org
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed_Hat_Storage
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Divejoshdurgin
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelonaGluster.org
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed_Hat_Storage
 

Was ist angesagt? (20)

Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlv
 
Storage as a Service with Gluster
Storage as a Service with GlusterStorage as a Service with Gluster
Storage as a Service with Gluster
 
Gluster Data Tiering
Gluster Data TieringGluster Data Tiering
Gluster Data Tiering
 
Efficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using DatabasesEfficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using Databases
 
Gluster Storage
Gluster StorageGluster Storage
Gluster Storage
 
Red Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFSRed Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFS
 
Performance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fsPerformance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fs
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 Meetup
 
Gluster.next feb-2016
Gluster.next feb-2016Gluster.next feb-2016
Gluster.next feb-2016
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on gluster
 
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
 
Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3
 
Gluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephant
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS Plans
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelona
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Gluster d2
Gluster d2Gluster d2
Gluster d2
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep Dive
 

Andere mochten auch

Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksMarian Marinov
 
BMC BSM - Automate Service Management System
BMC BSM - Automate Service Management SystemBMC BSM - Automate Service Management System
BMC BSM - Automate Service Management SystemVyom Labs
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerDataWorks Summit
 
Hack Into Drupal Sites (or, How to Secure Your Drupal Site)
Hack Into Drupal Sites (or, How to Secure Your Drupal Site)Hack Into Drupal Sites (or, How to Secure Your Drupal Site)
Hack Into Drupal Sites (or, How to Secure Your Drupal Site)nyccamp
 
Fibre Channel 基礎講座
Fibre Channel 基礎講座Fibre Channel 基礎講座
Fibre Channel 基礎講座Brocade
 
Software Quality Plan
Software Quality PlanSoftware Quality Plan
Software Quality Planguy_davis
 
AWS를 활용한 미디어 스트리밍 서비스
AWS를 활용한 미디어 스트리밍 서비스AWS를 활용한 미디어 스트리밍 서비스
AWS를 활용한 미디어 스트리밍 서비스Amazon Web Services Korea
 
Fast+plants+essay
Fast+plants+essayFast+plants+essay
Fast+plants+essayjespinal5
 
Hematology learning guide
Hematology learning guide Hematology learning guide
Hematology learning guide Fidaa Jaafrah
 
Furan Testing of Transformers Oil
Furan Testing of Transformers OilFuran Testing of Transformers Oil
Furan Testing of Transformers OilNitish Kumar
 
2015 Largest Healthcare Staffing Firms in the US
2015 Largest Healthcare Staffing Firms in the US2015 Largest Healthcare Staffing Firms in the US
2015 Largest Healthcare Staffing Firms in the USBrian Snyder
 
Cách làm Email marketing thành công!
Cách làm Email marketing thành công!Cách làm Email marketing thành công!
Cách làm Email marketing thành công!missbik
 
Cowboy tools and attire
Cowboy tools and attireCowboy tools and attire
Cowboy tools and attireChristianN2T
 

Andere mochten auch (18)

Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
BMC BSM - Automate Service Management System
BMC BSM - Automate Service Management SystemBMC BSM - Automate Service Management System
BMC BSM - Automate Service Management System
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A Primer
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hack Into Drupal Sites (or, How to Secure Your Drupal Site)
Hack Into Drupal Sites (or, How to Secure Your Drupal Site)Hack Into Drupal Sites (or, How to Secure Your Drupal Site)
Hack Into Drupal Sites (or, How to Secure Your Drupal Site)
 
Gourmet Company Presentation
Gourmet Company PresentationGourmet Company Presentation
Gourmet Company Presentation
 
Fibre Channel 基礎講座
Fibre Channel 基礎講座Fibre Channel 基礎講座
Fibre Channel 基礎講座
 
Medical Graphs
Medical GraphsMedical Graphs
Medical Graphs
 
Software Quality Plan
Software Quality PlanSoftware Quality Plan
Software Quality Plan
 
AWS를 활용한 미디어 스트리밍 서비스
AWS를 활용한 미디어 스트리밍 서비스AWS를 활용한 미디어 스트리밍 서비스
AWS를 활용한 미디어 스트리밍 서비스
 
Fast+plants+essay
Fast+plants+essayFast+plants+essay
Fast+plants+essay
 
Hematology learning guide
Hematology learning guide Hematology learning guide
Hematology learning guide
 
Furan Testing of Transformers Oil
Furan Testing of Transformers OilFuran Testing of Transformers Oil
Furan Testing of Transformers Oil
 
2015 Largest Healthcare Staffing Firms in the US
2015 Largest Healthcare Staffing Firms in the US2015 Largest Healthcare Staffing Firms in the US
2015 Largest Healthcare Staffing Firms in the US
 
Cách làm Email marketing thành công!
Cách làm Email marketing thành công!Cách làm Email marketing thành công!
Cách làm Email marketing thành công!
 
Cowboy tools and attire
Cowboy tools and attireCowboy tools and attire
Cowboy tools and attire
 
Selenium at Salesforce Scale
Selenium at Salesforce ScaleSelenium at Salesforce Scale
Selenium at Salesforce Scale
 

Ähnlich wie Glusterfs and Hadoop

Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache HadoopSufi Nawaz
 
Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshersrajkamaltibacademy
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFSMySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFSMats Kindahl
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingSam Ng
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Dockerharidasnss
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to HadoopAnandMHadoop
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
Finding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache HadoopFinding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache HadoopNushrat
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar ReportAtul Kushwaha
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415SANTOSH WAYAL
 

Ähnlich wie Glusterfs and Hadoop (20)

Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFSMySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data Processing
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Training
TrainingTraining
Training
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Introduce to spark
Introduce to sparkIntroduce to spark
Introduce to spark
 
Finding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache HadoopFinding URL pattern with MapReduce and Apache Hadoop
Finding URL pattern with MapReduce and Apache Hadoop
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 

Kürzlich hochgeladen

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024

GlusterFS and Hadoop

  • 1. GlusterFS and Hadoop (Shubhendu Tripathi, PSE – Red Hat)
  • 2. Agenda (06/22/15)
      ● What is BigData
      ● Hadoop and its Evolution
      ● Hadoop Architecture and Components
      ● Hadoop and GlusterFS (glusterfs-hadoop plugin)
      ● Advantages of using GlusterFS with Hadoop
      ● References
  • 3. What is BigData
      ● Software solutions mostly capture, maintain and manage data
      ● Storing data
      ● Processing data
      ● Growing data size in the current world – big data generators
      ● Sensors
      ● CCTV cameras
      ● Social networks
      ● Online shopping portals
      ● Airlines
      ● Hospitality
  • 4. Agenda
      ● What is BigData
      ● Hadoop and its Evolution
      ● Hadoop Architecture and Components
      ● Hadoop and GlusterFS (glusterfs-hadoop plugin)
      ● Advantages of using GlusterFS with Hadoop
      ● References
  • 5. What is BigData
      ● 90% of the data we have today was generated in the last 2 years
      ● 1990
      ● HDD: 1-20 GB, RAM: 14-128 MB, Speed: 10 kbps
      ● 2014
      ● HDD: 0.5-1 TB, RAM: 1-16 GB, Speed: 100 Mbps
      ● 3 factors which define BigData
      ● Volume
      ● Velocity
      ● Variety (unstructured and semi-structured data)
  • 6. What is BigData
      ● SAN – Storage Area Network
      ● One option: store the data in data centers, fetch it as needed, and perform the computation on it locally
      ● Computation is processor-bound, so there is a limit on how much can be done
      ● As the data grows, more and more computation is needed, and it is not possible to perform all of it on a local machine
      ● Solution: sending the computation to the storage node and getting back the processed data is the better option (the computation is small compared to the data)
  • 7. Hadoop Evolution
      ● Started with Google – white papers
      ● GFS (Google File System) 2003 – Storage
      ● MapReduce 2004 – Computation
      ● Yahoo
      ● HDFS (Hadoop Distributed File System) – 2006-07
      ● MapReduce (computation mechanism) – 2007-08
      ● Doug Cutting and Mike Cafarella from Yahoo
      ● Logo: elephant
      ● Apache foundation (2005 Yahoo donated)
  • 8. Hadoop Architecture / Components
      ● Framework of tools – not a single application
      ● Used to support running applications on BigData
      ● Open-source set of tools distributed under the Apache license
      ● Traditional approach for handling huge data
      ● Powerful computer with big storage and computation capacity
      ● Limited by the processing power of the computer as the data grows
      ● Hadoop approach
      ● Breaks up the data into smaller pieces and distributes them to multiple computers
      ● Breaks up the computation as well into smaller pieces and distributes them
      ● Combined results are returned
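The "Hadoop approach" above (split the data, run a small computation on each piece, combine the results) can be illustrated with a minimal single-machine sketch. This is not Hadoop code: the function names are hypothetical, and a thread pool stands in for the separate computers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Break the input into roughly equal pieces, one per worker."""
    if not lines:
        return []
    n_chunks = max(1, min(n_chunks, len(lines)))
    size = -(-len(lines) // n_chunks)  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """The small computation each worker runs on its own piece of data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Combine the partial results into one final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, n_workers=3):
    """Split, process the pieces in parallel, then combine."""
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)
```

Calling `word_count(["a b a", "b c", "a"], n_workers=2)` counts each word across both pieces; in a real cluster only the small per-piece counters would travel over the network, not the data itself.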
  • 9. Hadoop Architecture / Components
      ● Map Reduce
      ● Job Tracker
      ● Task Tracker
      ● HDFS
      ● Name Node
      ● Data Node
      ● Applications contact the master node; a task is formed and submitted to the Job Tracker
      ● Job Tracker maintains a queue of the tasks and gets them processed using the Task Trackers and Data Nodes
      ● Consolidates the results and sends them back to the application
  • 10. Hadoop Architecture / Components
      ● Hadoop works on a distributed model
      ● Numerous low cost computers – commodity hardware
      ● Hadoop components
      ● Slaves
          – Task Tracker – processes the smaller piece of the task assigned to it
          – Data Node – manages the piece of data distributed to this node
      ● Master
          – Job Tracker – tracks the overall task
          – Name Node – maintains the index of the data blocks stored on different nodes
          – Task Tracker
          – Data Node
  • 11-13. Hadoop Architecture / Components
      [Diagram, shown in three builds. Labels: Applications; Master (Job Tracker, Name Node, Task Tracker, Data Node); Slaves (four nodes, each with a Task Tracker and a Data Node); Queue (first build only)]
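The division of labour between the Name Node (index of where blocks live) and the Job Tracker (handing tasks to nodes) can be sketched as a toy model. The class and method names below are illustrative assumptions, not Hadoop's actual classes, and the scheduling rule (prefer a node that already holds the block, break ties by load) is a simplification.

```python
class NameNode:
    """Toy index: which nodes hold a copy of each data block."""

    def __init__(self):
        self._locations = {}

    def register(self, block_id, node):
        self._locations.setdefault(block_id, []).append(node)

    def locate(self, block_id):
        return list(self._locations.get(block_id, []))


class JobTracker:
    """Toy scheduler: sends each task to a node that already has the data."""

    def __init__(self, name_node, nodes):
        self.name_node = name_node
        self.load = {node: 0 for node in nodes}  # tasks assigned per node

    def assign(self, block_ids):
        plan = {}
        for block_id in block_ids:
            holders = self.name_node.locate(block_id)
            # Prefer nodes holding the block; otherwise fall back to any node.
            candidates = [n for n in holders if n in self.load] or list(self.load)
            node = min(candidates, key=lambda n: self.load[n])
            self.load[node] += 1
            plan[block_id] = node
        return plan
```

With blocks registered on specific nodes, `JobTracker.assign` returns a block-to-node plan in which every task lands on a node that stores its block whenever such a node exists.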
  • 14. Hadoop and GlusterFS
      ● GlusterFS is a general purpose scale-out distributed file system supporting thousands of clients
      ● Aggregates storage exports over a network interconnect to provide a single unified namespace
      ● File system completely in userspace, runs on commodity hardware
      ● Layered on disk file systems that support extended attributes
  • 15. Hadoop and GlusterFS
      ● Hadoop consists of a set of daemons running in the system
      ● Name Node – centralized metadata node
      ● Job Tracker – overall task distribution across data nodes
      ● Task Tracker – runs on data nodes to maintain tasks
      ● Data Node – stores data
      ● Hadoop = Map Reduce framework + HDFS
      ● GlusterFS can be a replacement for HDFS
      ● glusterfs-hadoop-plugin
      ● Java module which implements the Hadoop file system interface
      ● Simply a JAR file which can be kept in the Hadoop libraries
      ● Replaces HDFS with GlusterFS
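Why a plugin that "implements the file system interface" can replace the storage layer without touching job code can be shown with a small sketch. It is written in Python rather than the plugin's Java, and every class, scheme and function name here is hypothetical; it only mirrors the general idea of choosing a file system implementation by URI scheme.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """The one interface the job code is written against."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class InMemoryFileSystem(FileSystem):
    """Hypothetical stand-in for the default storage layer."""

    def __init__(self, files):
        self.files = dict(files)

    def read(self, path):
        return self.files[path]


class MountedVolumeFileSystem(FileSystem):
    """Hypothetical stand-in for the plugin: serves files from a mount point."""

    def __init__(self, mount_point):
        self.mount_point = mount_point

    def read(self, path):
        with open(os.path.join(self.mount_point, path.lstrip("/")), "rb") as f:
            return f.read()


REGISTRY = {}  # URI scheme -> file system instance


def read_uri(uri):
    """Pick the implementation from the URI scheme, then read the path."""
    parsed = urlparse(uri)
    return REGISTRY[parsed.scheme].read(parsed.path)


def count_words(uri):
    """Job code: unaware of which backend serves the URI."""
    return len(read_uri(uri).split())


def demo():
    """Run the same job against both backends."""
    with tempfile.TemporaryDirectory() as mount:
        with open(os.path.join(mount, "input.txt"), "wb") as f:
            f.write(b"stored on the mounted volume")
        REGISTRY["mem"] = InMemoryFileSystem({"/input.txt": b"stored in memory"})
        REGISTRY["mount"] = MountedVolumeFileSystem(mount)
        return count_words("mem:///input.txt"), count_words("mount:///input.txt")
```

`count_words` never changes; swapping the backend is only a matter of which implementation is registered for the scheme, which is the property the slide attributes to dropping the plugin JAR into the Hadoop libraries.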
  • 16. Hadoop and GlusterFS
      ● Data locality is ensured by the Job Tracker
      ● glusterfs-hadoop-plugin ensures data locality by having the gluster volumes mounted as FUSE mounts
      ● Effectively no name node involved
      ● Only clients where the map-reduce job runs
      ● And data nodes to store data
      ● glusterfs-hadoop-plugin talks to glusterfs through FUSE mounts
      ● In the absence of a name node, the plugin uses the xattrs (extended attributes) mechanism to get the details from the volume and consolidates the data using the same
      ● Reads the data directly from the bricks, bypassing the volume, for improved performance
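How extended attributes can stand in for a name node lookup is sketched below. The slides do not say which attribute the plugin reads; this sketch assumes the `trusted.glusterfs.pathinfo` virtual attribute that GlusterFS exposes on FUSE mounts, and the value format in the comments is an illustrative assumption. `os.getxattr` is available on Linux only, and the function names are hypothetical.

```python
import os
import re

# Assumed attribute name; reading it usually requires sufficient privileges.
PATHINFO_XATTR = "trusted.glusterfs.pathinfo"

# Assumed entry shape: <POSIX(/brick/root):hostname:/brick/root/path/to/file>
_BRICK_ENTRY = re.compile(
    r"<POSIX\((?P<brick>[^)]*)\):(?P<host>[^:>]+):(?P<path>[^>]+)>"
)


def parse_pathinfo(value):
    """Extract (host, path-on-brick) pairs from a pathinfo string."""
    return [(m["host"], m["path"]) for m in _BRICK_ENTRY.finditer(value)]


def brick_locations(path_on_mount):
    """Ask the mounted volume which bricks hold a file (Linux only)."""
    raw = os.getxattr(path_on_mount, PATHINFO_XATTR)
    return parse_pathinfo(raw.decode())


# Illustrative value for a file replicated on two servers (not real output).
SAMPLE = (
    "(<DISTRIBUTE:demo-dht> (<REPLICATE:demo-replicate-0> "
    "<POSIX(/bricks/b1):server1:/bricks/b1/data/part-0> "
    "<POSIX(/bricks/b2):server2:/bricks/b2/data/part-0>))"
)
```

`parse_pathinfo(SAMPLE)` yields the host and on-brick path for each replica, which is the kind of information a scheduler needs to place a task next to its data or to read the file from the brick directly.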
  • 17. Hadoop and GlusterFS
      ● As simple as starting the map reduce daemons and then submitting the Hadoop task with glusterfs as storage
      ● Analytics use: with HDFS, files have to be moved around the nodes, whereas with glusterfs the volume only needs to be FUSE-mounted and no files are moved around
  • 18. Advantages
      ● Elimination of the centralized metadata server (name node)
      ● Compatibility with MapReduce and Hadoop-based applications
      ● Elimination of code rewrites for Hadoop enablement of glusterfs
      ● Fault-tolerant file system
      ● Allows co-location of compute and data nodes, and the ability to run Hadoop jobs across multiple namespaces using multiple glusterfs volumes
      ● Data access through several different mechanisms / protocols (FUSE, NFS, SMB and Swift … and of course Hadoop)
  • 19. References
      ● https://github.com/gluster/glusterfs-hadoop
      ● https://forge.gluster.org/hadoop/pages/Home
      ● shubhendu @ #gluster on freenode