GlusterFS For Hadoop – Overview
Vijay Bellur
GlusterFS co-maintainer
Lalatendu Mohanty
GlusterFS Community
12/05/15
Agenda
● What is GlusterFS?
● Overview
● Use Cases
● Hadoop on GlusterFS
● Q&A
What is GlusterFS?
● A general-purpose, scale-out distributed file system.
● Aggregates storage exports over a network interconnect to provide a single unified namespace.
● The filesystem is stackable and runs completely in userspace.
● Layered on disk file systems that support extended attributes.
Typical GlusterFS Deployment
● Global namespace
● Scale-out storage building blocks
● Supports thousands of clients
● Access using GlusterFS native, NFS, SMB and HTTP protocols
● Linear performance scaling
GlusterFS Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
● Deployment agnostic
● Unified access
● Largely POSIX compliant
Concepts & Algorithms
GlusterFS concepts – Trusted Storage Pool
● A Trusted Storage Pool (cluster) is a collection of storage servers.
● A Trusted Storage Pool is formed by invitation: an existing member “probes” a new member from within the cluster, not vice versa.
● Membership information is used to determine quorum.
● Members can be dynamically added to and removed from the pool.
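The invitation rule and the use of membership for quorum can be pictured with a toy model. This is only a sketch: the class TrustedPool, its method names, and the simple-majority quorum rule are assumptions made for illustration, not Gluster's implementation.

```python
class TrustedPool:
    """Toy model of a trusted storage pool (all names are hypothetical)."""

    def __init__(self, first_member):
        self.members = {first_member}

    def probe(self, inviter, newcomer):
        # Only an existing member may invite a new server, not vice versa.
        if inviter not in self.members:
            raise PermissionError(f"{inviter} is not in the pool and cannot probe")
        self.members.add(newcomer)

    def detach(self, member):
        # Members can be removed dynamically.
        self.members.discard(member)

    def has_quorum(self, reachable):
        # Assumed rule: strictly more than half of the members are reachable.
        alive = self.members & set(reachable)
        return len(alive) * 2 > len(self.members)


pool = TrustedPool("server1")
pool.probe("server1", "server2")
pool.probe("server2", "server3")
print(pool.has_quorum({"server1", "server2"}))  # 2 of 3 members reachable
print(pool.has_quorum({"server3"}))             # 1 of 3 members reachable
```

A server outside the pool cannot add itself; in this sketch such an attempt raises PermissionError.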

GlusterFS concepts - Bricks
● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Ideally, each brick in a cluster should be of the same size
[Figure: three storage nodes exporting 3, 5 and 3 bricks respectively, with export directories named /export1 through /export5]
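The hostname:/dir notation for a brick can be handled with a few lines of code. A minimal sketch, assuming plain host names (no IPv6 literals); the Brick type and the parse_brick helper are hypothetical names, not part of any Gluster tooling.

```python
from collections import Counter
from typing import NamedTuple


class Brick(NamedTuple):
    host: str
    path: str


def parse_brick(spec: str) -> Brick:
    """Split 'hostname:/dir' into its node and export directory."""
    host, sep, path = spec.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"expected hostname:/dir, got {spec!r}")
    return Brick(host, path)


specs = ["node1:/export1", "node1:/export2", "node2:/export1"]
bricks = [parse_brick(s) for s in specs]

# A node may contribute any number of bricks.
per_node = Counter(b.host for b in bricks)
print(per_node)
```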
GlusterFS concepts - Volumes
● A volume is a logical collection of bricks.
● A volume is identified by an administrator-provided name.
● A volume is a mountable entity; the volume name is provided at the time of mounting.
– mount -t glusterfs server1:/<volname> /my/mnt/point
● Bricks from the same node can be part of different volumes.
GlusterFS concepts - Volumes
[Figure: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; the bricks are grouped into two volumes, "music" and "Videos"]
Volume Types
➢ The type of a volume is specified at the time of volume creation
➢ The volume type determines how and where data is placed
➢ The following volume types are supported in GlusterFS:
a) Distribute
b) Stripe
c) Replicate
d) Distributed Replicate
e) Striped Replicate
f) Distributed Striped Replicate
Distributed Volume
➢ Distributes files across the bricks of the volume.
➢ Directories are present on all bricks of the volume.
➢ A single brick failure results in loss of availability for the data on that brick.
➢ Removes the need for an external metadata server.
How does a distributed volume work?
➢ Uses the Davies-Meyer hash algorithm.
➢ A 32-bit hash space is divided into N ranges for N bricks.
➢ At the time of directory creation, a hash range is assigned to the directory on each brick.
➢ During file creation or retrieval, a hash is computed on the file name. This hash value is used to locate or place the file.
➢ Different directories on the same brick end up with different hash ranges.
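The placement steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the GlusterFS code: zlib.crc32 stands in for the Davies-Meyer hash, the ranges are of equal size, and the per-directory rotation of bricks is an invented device to show different directories getting different ranges. The names layout_for and locate are hypothetical.

```python
import zlib

HASH_SPACE = 2**32


def layout_for(directory, bricks):
    """Split the 32-bit hash space into one contiguous range per brick.

    The starting brick is rotated per directory (an assumption made here so
    that different directories get different range-to-brick assignments).
    """
    n = len(bricks)
    shift = zlib.crc32(directory.encode()) % n
    step = HASH_SPACE // n
    layout = []
    for i in range(n):
        start = i * step
        end = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        layout.append((start, end, bricks[(i + shift) % n]))
    return layout


def locate(directory, filename, bricks):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode())  # stand-in for the Davies-Meyer hash
    for start, end, brick in layout_for(directory, bricks):
        if start <= h <= end:
            return brick
    raise AssertionError("ranges cover the whole hash space")


bricks = ["node1:/export1", "node2:/export1", "node3:/export1"]
print(layout_for("/music", bricks))
print(locate("/music", "song.mp3", bricks))
```

Because the brick is derived from the file name alone, any client can find a file without asking a metadata server.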
Replicated Volume
● Synchronous replication of all directory and file updates.
● Provides high availability of data when node failures occur.
● Transaction driven for ensuring consistency.
● Changelogs maintained for reconciliation.
● Any number of replicas can be configured.
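The interplay of synchronous writes, changelogs and reconciliation can be pictured with a toy model. This is a simplified sketch, not how GlusterFS implements replication: the changelog is kept in one central dictionary here, and the classes Replica and ReplicatedVolume and their methods are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}


class ReplicatedVolume:
    """Toy replica set: every write goes to all reachable replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = {}  # replica name -> paths it missed while down

    def write(self, path, data):
        if not any(r.up for r in self.replicas):
            raise IOError("no replica available")
        for r in self.replicas:
            if r.up:
                r.files[path] = data
                self.pending.get(r.name, set()).discard(path)
            else:
                # Changelog entry: this replica needs the update later.
                self.pending.setdefault(r.name, set()).add(path)

    def _is_current(self, replica, path):
        return replica.up and path not in self.pending.get(replica.name, set())

    def read(self, path):
        # Serve from any replica that is up and not stale for this path.
        for r in self.replicas:
            if self._is_current(r, path) and path in r.files:
                return r.files[path]
        raise FileNotFoundError(path)

    def heal(self, replica):
        # Reconcile: copy every missed path from a current replica.
        for path in sorted(self.pending.pop(replica.name, set())):
            source = next(r for r in self.replicas
                          if r is not replica and self._is_current(r, path))
            replica.files[path] = source.files[path]


a, b = Replica("brick-a"), Replica("brick-b")
vol = ReplicatedVolume([a, b])
vol.write("/f", b"v1")
b.up = False
vol.write("/f", b"v2")   # brick-b misses this update
b.up = True
print(vol.read("/f"))    # served from brick-a, since brick-b is stale
vol.heal(b)
print(b.files["/f"])
```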
How does a replicated volume work?
Distributed Replicated Volume
● Distribute files across replicated bricks
● Number of bricks must be a multiple of the replica count
● Ordering of bricks in volume definition matters
● Scaling and high availability
● Reads get load balanced.
● Most preferred model of deployment currently.
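Why the ordering of bricks matters can be shown with a short sketch. It assumes that consecutive bricks in the volume definition form a replica set, which is how the ordering rule is usually described; the helper names replica_sets and same_host_sets are hypothetical.

```python
def replica_sets(bricks, replica_count):
    """Group bricks into replica sets in the order they were listed."""
    if len(bricks) % replica_count != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


def same_host_sets(sets):
    """Return replica sets whose bricks share a host.

    Such a set gives no protection against the loss of that node.
    """
    bad = []
    for s in sets:
        hosts = [b.split(":", 1)[0] for b in s]
        if len(set(hosts)) < len(hosts):
            bad.append(s)
    return bad


good = ["node1:/b1", "node2:/b1", "node1:/b2", "node2:/b2"]
poor = ["node1:/b1", "node1:/b2", "node2:/b1", "node2:/b2"]
print(replica_sets(good, 2))
print(same_host_sets(replica_sets(poor, 2)))
```

Both lists contain the same four bricks; only the second ordering places both copies of a file on one node.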
Distributed Replicated Volume
Striped Volume
● Files are striped into chunks that are placed on different bricks.
● Recommended only for very large files that exceed the size of the disks.
● A brick failure can result in data loss. Redundancy with replication is highly recommended (striped replicated volumes).
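How chunks of a striped file map to bricks can be sketched as round-robin placement. This is an illustration only: the 128 KiB chunk size and the function names are assumptions, not values taken from the slides.

```python
STRIPE_SIZE = 128 * 1024  # assumed chunk size in bytes, for illustration only


def brick_for_offset(offset, brick_count, stripe_size=STRIPE_SIZE):
    """Return the index of the brick holding the byte at this file offset."""
    return (offset // stripe_size) % brick_count


def chunks_per_brick(file_size, brick_count, stripe_size=STRIPE_SIZE):
    """Count how many chunks of a file land on each brick (round-robin)."""
    total_chunks = -(-file_size // stripe_size)  # ceiling division
    counts = [0] * brick_count
    for chunk in range(total_chunks):
        counts[chunk % brick_count] += 1
    return counts


print(brick_for_offset(300 * 1024, brick_count=4))
print(chunks_per_brick(1024 * 1024, brick_count=4))
```

Since every brick holds part of every large file, losing one brick damages all of those files, which is why the slide recommends pairing striping with replication.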
Elastic Volume Management
Application-transparent operations that can be performed in the storage layer:
● Add bricks to a volume
● Remove bricks from a volume
● Rebalance the data spread within a volume
● Replace a brick in a volume
● Performance / functionality tuning
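What a rebalance has to do after a brick is added can be estimated with a small model. This is a sketch under assumptions: an even split of the hash space, zlib.crc32 as a stand-in hash, and hypothetical helper names; the real rebalance logic is more involved.

```python
import zlib

HASH_SPACE = 2**32


def owner(filename, bricks):
    """Brick owning a file when the hash space is split evenly across bricks."""
    h = zlib.crc32(filename.encode())  # stand-in hash, not Davies-Meyer
    index = min(h * len(bricks) // HASH_SPACE, len(bricks) - 1)
    return bricks[index]


def rebalance_plan(filenames, old_bricks, new_bricks):
    """List (file, from_brick, to_brick) for files whose owner changes."""
    plan = []
    for name in filenames:
        src, dst = owner(name, old_bricks), owner(name, new_bricks)
        if src != dst:
            plan.append((name, src, dst))
    return plan


files = [f"file{i}.dat" for i in range(1000)]
old = ["node1:/b1", "node2:/b1"]
new = old + ["node3:/b1"]
plan = rebalance_plan(files, old, new)
print(f"{len(plan)} of {len(files)} files would move")
```

Applications keep using the same file names throughout; only the brick that serves each file changes.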
Access Mechanisms
Gluster volumes can be accessed via the following
mechanisms:
– FUSE based Native protocol
– NFSv3 and v4
– SMB
– libgfapi
– REST/HTTP
– HDFS
Implementation
Translators in GlusterFS
● Building blocks for a GlusterFS process.
● Based on Translators in GNU HURD.
● Each translator is a functional unit.
● Translators can be stacked together for
achieving desired functionality.
● Translators are deployment agnostic – can be
loaded in either the client or server stacks.
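The stacking idea can be illustrated with a few small classes. This is a conceptual sketch in Python, not the GlusterFS translator API (which is written in C); the class names Translator, Storage, ReadCache and Trace are hypothetical.

```python
class Translator:
    """Base unit: receives an operation and passes it to the next translator."""

    def __init__(self, child=None):
        self.child = child

    def write(self, path, data):
        return self.child.write(path, data)

    def read(self, path):
        return self.child.read(path)


class Storage(Translator):
    """Bottom of the stack: keeps data in a dict instead of on disk."""

    def __init__(self):
        super().__init__()
        self.data = {}

    def write(self, path, data):
        self.data[path] = data
        return len(data)

    def read(self, path):
        return self.data[path]


class ReadCache(Translator):
    """Serves repeated reads from memory and invalidates on write."""

    def __init__(self, child):
        super().__init__(child)
        self.cache = {}
        self.hits = 0

    def write(self, path, data):
        self.cache.pop(path, None)
        return self.child.write(path, data)

    def read(self, path):
        if path in self.cache:
            self.hits += 1
            return self.cache[path]
        self.cache[path] = self.child.read(path)
        return self.cache[path]


class Trace(Translator):
    """Records each call before passing it down."""

    def __init__(self, child):
        super().__init__(child)
        self.log = []

    def write(self, path, data):
        self.log.append(("write", path))
        return self.child.write(path, data)

    def read(self, path):
        self.log.append(("read", path))
        return self.child.read(path)


# Each layer adds one function; the order of stacking defines the behaviour.
stack = Trace(ReadCache(Storage()))
stack.write("/a", b"hello")
stack.read("/a")
stack.read("/a")
print(stack.log, stack.child.hits)
```

Because every layer exposes the same interface, a layer such as the cache can be added, removed or moved without changing the others.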
Customizable Translator Stack
Ecosystem Integration
● Currently integrated with various ecosystems:
● OpenStack
● Samba
● Ganesha
● oVirt
● qemu
● Hadoop
● pcp
● Proxmox
● uWSGI
Use Cases - current
● Unstructured data storage
● Archival
● Disaster Recovery
● Virtual Machine Image Store
● Cloud Storage for Service Providers
● Content Cloud
● Big Data
● Semi-structured & Structured data
Hadoop And GlusterFS
● GlusterFS can be used for Hadoop
● GlusterFS Hadoop plugin replaces HDFS with GlusterFS
● MapReduce jobs can be run on GlusterFS volumes.
● https://github.com/gluster/glusterfs-hadoop
Advantage Of Using GlusterFS
● Advantages of a POSIX-compliant filesystem.
● The same volume/storage can be used for MapReduce and for storing application data.
● E.g. log files, unstructured data.
● No need to copy data from storage to HDFS for running MapReduce.
● No need for a “NameNode”, i.e. a metadata server.
● Advantages of GlusterFS features (e.g. Geo-replication, Erasure Coding).
● Geo-replication is a distributed, continuous, asynchronous, and incremental replication service for disaster recovery.
● It can replicate data from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
Advantage Of Using GlusterFS
● Erasure Coding provides the fundamental technology for storage systems to add redundancy and tolerate failures.
● On GlusterFS, MapReduce jobs use “data locality optimization”.
● That means Hadoop tries its best to run map tasks on nodes where the data is present locally, to reduce network traffic and inter-node communication latency.
● GlusterFS works with the Apache Spark and Apache Ambari projects.
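The data locality optimization mentioned on this slide can be pictured as a scheduling preference. The sketch below is a simplified greedy model, not Hadoop's scheduler; the function assign_map_tasks and its input shapes are assumptions for illustration.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign each input block to a node, preferring nodes that hold it.

    block_locations: {block_id: [nodes holding the block]}
    free_slots: {node: number of map tasks it can still accept}
    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No holder has capacity: fall back to any node with a free slot.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free map slots")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


blocks = {"part-0": ["node1"], "part-1": ["node1"], "part-2": ["node2"]}
result = assign_map_tasks(blocks, {"node1": 1, "node2": 2})
print(result)
```

Only the task that cannot be placed locally has to read its input over the network.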
Apache Spark Project
● Apache Spark is an open-source data analytics cluster
computing framework
● Spark fits into the Hadoop open-source community, building on
top of the Hadoop Distributed File System (HDFS)
● Spark is not tied to the two-stage MapReduce paradigm and
promises performance up to 100 times faster than Hadoop
MapReduce, for certain applications.
● Spark provides primitives for in-memory cluster computing.
● https://spark.apache.org/docs/0.8.1/cluster-overview.html
Apache Ambari Project
● The Apache Ambari project is for provisioning, managing, and
monitoring Apache Hadoop clusters.
● It provides an intuitive, easy-to-use Hadoop management web
UI backed by its RESTful APIs.
● Apache Ambari project supports the automated deployment and
configuration of Hadoop on top of GlusterFS.
● http://www.gluster.org/2013/10/automated-hadoop-deployment-on-glusterfs-with-apache-ambari/
● http://ambari.apache.org/
Hadoop access
Resources
Mailing lists:
gluster-users@gluster.org
gluster-devel@nongnu.org
IRC:
#gluster and #gluster-dev on freenode
Links:
http://www.gluster.org
http://hekafs.org
http://forge.gluster.org
http://www.gluster.org/community/documentation/index.php/Arch
http://hadoopecosystemtable.github.io/
Thank you!
Lalatendu Mohanty
lmohanty@redhat.com
Twitter: @lalatenduM

 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 

GlusterFS And Big Data

  • 1. GlusterFS For Hadoop - Overview
    Vijay Bellur, GlusterFS co-maintainer
    Lalatendu Mohanty, GlusterFS Community
  • 2. 12/05/15 Agenda
    ● What is GlusterFS?
    ● Overview
    ● Use Cases
    ● Hadoop on GlusterFS
    ● Q&A
  • 3. What is GlusterFS?
    ● A general-purpose, scale-out distributed file system.
    ● Aggregates storage exports over a network interconnect to provide a single unified namespace.
    ● The filesystem is stackable and runs completely in userspace.
    ● Layered on disk file systems that support extended attributes.
  • 4. Typical GlusterFS Deployment
    ● Global namespace
    ● Scale-out storage building blocks
    ● Supports thousands of clients
    ● Access using GlusterFS native, NFS, SMB and HTTP protocols
    ● Linear performance scaling
  • 5. GlusterFS Architecture - Foundations
    ● Software only, runs on commodity hardware
    ● No external metadata servers
    ● Scale-out with elasticity
    ● Extensible and modular
    ● Deployment agnostic
    ● Unified access
    ● Largely POSIX compliant
  • 7. GlusterFS concepts - Trusted Storage Pool
    ● A Trusted Storage Pool (cluster) is a collection of storage servers.
    ● A Trusted Storage Pool is formed by invitation: “probe” a new member from the cluster, not vice versa.
    ● Membership information is used to determine quorum.
    ● Members can be dynamically added to and removed from the pool.
  • 8. GlusterFS concepts - Bricks
    ● A brick is the combination of a node and an export directory, e.g. hostname:/dir
    ● Each brick inherits the limits of the underlying filesystem
    ● No limit on the number of bricks per node
    ● Ideally, each brick in a cluster should be of the same size
    [Figure: three storage nodes exporting 3, 5, and 3 bricks respectively (/export1 through /export5)]
  • 9. GlusterFS concepts - Volumes
    ● A volume is a logical collection of bricks.
    ● A volume is identified by an administrator-provided name.
    ● A volume is a mountable entity, and the volume name is provided at the time of mounting:
      mount -t glusterfs server1:/<volname> /my/mnt/point
    ● Bricks from the same node can be part of different volumes.
  • 10. GlusterFS concepts - Volumes
    [Figure: volumes "music" and "Videos" built from /export/brick1 and /export/brick2 on Node1, Node2, and Node3]
  • 11. Volume Types
    ➢ The type of a volume is specified at the time of volume creation.
    ➢ The volume type determines how and where data is placed.
    ➢ The following volume types are supported in GlusterFS:
      a) Distribute
      b) Stripe
      c) Replication
      d) Distributed Replicate
      e) Striped Replicate
      f) Distributed Striped Replicate
  • 12. Distributed Volume
    ➢ Distributes files across the bricks of the volume.
    ➢ Directories are present on all bricks of the volume.
    ➢ A single brick failure results in loss of availability for the data on that brick.
    ➢ Removes the need for an external metadata server.
  • 13. How does a distributed volume work?
    ➢ Uses the Davies-Meyer hash algorithm.
    ➢ A 32-bit hash space is divided into N ranges for N bricks.
    ➢ At the time of directory creation, a hash range is assigned to each brick for that directory.
    ➢ During file creation or retrieval, a hash is computed on the file name. This hash value is used to place or locate the file.
    ➢ Different directories on the same brick end up with different hash ranges.
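
The hash-range placement described on this slide can be sketched roughly as follows. This is a minimal illustration, not GlusterFS code: the function names (`assign_ranges`, `name_hash`, `locate`) and the brick names are hypothetical, the ranges are split evenly as an assumption, and CRC32 is used as a stand-in for the Davies-Meyer hash the slide mentions.

```python
import zlib

HASH_SPACE = 2 ** 32  # 32-bit hash space, as stated on the slide


def assign_ranges(bricks):
    """Split the hash space into one contiguous range per brick (even split assumed)."""
    n = len(bricks)
    step = HASH_SPACE // n
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last range absorbs the remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        ranges.append((start, end, brick))
    return ranges


def name_hash(filename):
    """Stand-in 32-bit hash (CRC32); NOT the Davies-Meyer hash GlusterFS uses."""
    return zlib.crc32(filename.encode("utf-8")) & 0xFFFFFFFF


def locate(filename, ranges):
    """Return the brick whose hash range contains the hash of the file name."""
    h = name_hash(filename)
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise ValueError("hash value outside all ranges")


# Hypothetical bricks for one directory's layout.
bricks = ["server1:/export/brick1", "server2:/export/brick1", "server3:/export/brick1"]
layout = assign_ranges(bricks)
print(locate("song.mp3", layout))
```

Because the brick is derived from the file name alone, any client can find a file without consulting a metadata server, which is the point the previous slide makes.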
  • 14. Replicated Volume
    ● Synchronous replication of all directory and file updates.
    ● Provides high availability of data when node failures occur.
    ● Transaction driven to ensure consistency.
    ● Changelogs are maintained for reconciliation.
    ● Any number of replicas can be configured.
  • 15. How does a replicated volume work?
  • 16. Distributed Replicated Volume
    ● Distributes files across replicated bricks.
    ● The number of bricks must be a multiple of the replica count.
    ● The ordering of bricks in the volume definition matters.
    ● Provides both scaling and high availability.
    ● Reads are load balanced.
    ● Currently the most preferred deployment model.
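
Why brick ordering matters can be illustrated with a small sketch. It assumes that consecutive bricks in the volume definition are grouped into one replica set; the helper name `replica_sets` and the brick names are hypothetical.

```python
def replica_sets(bricks, replica):
    """Group bricks, in the order given, into replica sets of size `replica`."""
    if replica < 1:
        raise ValueError("replica count must be at least 1")
    if len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Hypothetical four-brick volume with a replica count of 2.
bricks = [
    "node1:/export/brick1",
    "node2:/export/brick1",
    "node3:/export/brick1",
    "node4:/export/brick1",
]
sets = replica_sets(bricks, 2)
for s in sets:
    print(s)
```

Under this assumption, listing two bricks from the same node next to each other would place both copies of a file on one machine, so a single node failure would take out every replica in that set.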
  • 18. Striped Volume
    ● Files are striped into chunks and placed in various bricks.
    ● Recommended only when very large files, greater than the size of the disks, are present.
    ● A brick failure can result in data loss. Redundancy with replication is highly recommended (striped replicated volumes).
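
A rough sketch of how striping spreads one file over several bricks. It assumes simple round-robin placement of fixed-size chunks; the function name `stripe_layout`, the chunk size, and the brick names are illustrative choices, not values taken from the slides.

```python
def stripe_layout(file_size, chunk_size, bricks):
    """Return (offset, length, brick) for each chunk, placed round-robin."""
    if chunk_size <= 0 or not bricks:
        raise ValueError("need a positive chunk size and at least one brick")
    layout = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        layout.append((offset, length, bricks[index % len(bricks)]))
        offset += length
        index += 1
    return layout


# Hypothetical 300-byte file, 128-byte chunks, two bricks.
chunks = stripe_layout(300, 128, ["b1", "b2"])
for chunk in chunks:
    print(chunk)
```

Every brick holds part of every large file, which is why losing one brick without replication can make the whole file unreadable.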
  • 19. Elastic Volume Management
    Application-transparent operations that can be performed in the storage layer:
    ● Add bricks to a volume
    ● Remove a brick from a volume
    ● Rebalance the data spread within a volume
    ● Replace a brick in a volume
    ● Performance / functionality tuning
  • 20. Access Mechanisms
    Gluster volumes can be accessed via the following mechanisms:
    ● FUSE-based native protocol
    ● NFSv3 and v4
    ● SMB
    ● libgfapi
    ● REST/HTTP
    ● HDFS
  • 22. Translators in GlusterFS
    ● Building blocks for a GlusterFS process.
    ● Based on translators in GNU Hurd.
    ● Each translator is a functional unit.
    ● Translators can be stacked together to achieve the desired functionality.
    ● Translators are deployment agnostic: they can be loaded in either the client or the server stack.
  • 24. Ecosystem Integration
    Currently integrated with various ecosystems:
    ● OpenStack
    ● Samba
    ● Ganesha
    ● oVirt
    ● qemu
    ● Hadoop
    ● pcp
    ● Proxmox
    ● uWSGI
  • 26. Use Cases - current
    ● Unstructured data storage
    ● Archival
    ● Disaster recovery
    ● Virtual machine image store
    ● Cloud storage for service providers
    ● Content cloud
    ● Big Data
    ● Semi-structured and structured data
  • 27. Hadoop And GlusterFS
    ● GlusterFS can be used as the storage layer for Hadoop.
    ● The GlusterFS Hadoop plugin replaces HDFS with GlusterFS.
    ● MapReduce jobs can be run on GlusterFS volumes.
    ● https://github.com/gluster/glusterfs-hadoop
  • 28. Advantage Of Using GlusterFS
    ● Advantages of a POSIX-compliant filesystem:
      ● The same volume/storage can be used for MapReduce and for storing application data, e.g. log files and unstructured data.
      ● No need to copy data from storage to HDFS before running MapReduce.
    ● No need for a “NameNode”, i.e. a metadata server.
    ● Access to GlusterFS features (e.g. Geo-replication, Erasure Coding):
      ● Geo-replication is a distributed, continuous, asynchronous, and incremental replication service for disaster recovery.
      ● It can replicate data from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
  • 29. Advantage Of Using GlusterFS
    ● Erasure Coding provides the fundamental technology for storage systems to add redundancy and tolerate failures.
    ● On GlusterFS, MapReduce jobs use “data locality optimization”: Hadoop tries its best to run map tasks on nodes where the data is present locally, to reduce network traffic and inter-node communication latency.
    ● GlusterFS works with the Apache Spark project and the Apache Ambari project.
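
The idea behind erasure coding, adding redundancy so that lost data can be rebuilt, can be shown with the simplest possible code: a single XOR parity fragment that tolerates the loss of any one fragment. This is only an illustration of the principle and is not the coding scheme GlusterFS itself uses; the function names and sample fragments are hypothetical, and all fragments are assumed to have equal length.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments):
    """Compute one parity fragment as the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def recover(fragments, parity, lost_index):
    """Rebuild the fragment at lost_index from the survivors and the parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


# Hypothetical equal-length data fragments, one per brick.
fragments = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(fragments)
rebuilt = recover(fragments, parity, lost_index=1)
print(rebuilt == fragments[1])
```

Compared with keeping full replicas, a parity-based scheme stores less extra data for the same number of tolerated failures, at the cost of computation on write and on recovery.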
  • 30. Apache Spark Project
    ● Apache Spark is an open-source cluster computing framework for data analytics.
    ● Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).
    ● Spark is not tied to the two-stage MapReduce paradigm and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
    ● Spark provides primitives for in-memory cluster computing.
    ● https://spark.apache.org/docs/0.8.1/cluster-overview.html
  • 31. Apache Ambari Project
    ● The Apache Ambari project is for provisioning, managing, and monitoring Apache Hadoop clusters.
    ● It provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.
    ● Apache Ambari supports the automated deployment and configuration of Hadoop on top of GlusterFS.
    ● http://www.gluster.org/2013/10/automated-hadoop-deployment-on-glusterfs-with-apache-ambari/
    ● http://ambari.apache.org/
  • 33. Resources
    Mailing lists: gluster-users@gluster.org, gluster-devel@nongnu.org
    IRC: #gluster and #gluster-dev on freenode
    Links:
      http://www.gluster.org
      http://hekafs.org
      http://forge.gluster.org
      http://www.gluster.org/community/documentation/index.php/Arch
      http://hadoopecosystemtable.github.io/