Making Hadoop highly available: using an alternative file system, HP IBRIX
Johannes Kirschnick, Steve Loughran
June 2010
Something about me
I work at HP Labs, Bristol, UK
Degree in computer science, TU Munich
Automated Infrastructure Lab: automated, secure, dynamic instantiation and management of cloud computing infrastructure and services
Personal interests: cloud services, automated service deployment, storage services
What I want to talk about
Motivate high availability and introduce the context
Overview of Hadoop
Highlight how Hadoop operates under failure
Introduce HP IBRIX
Performance results
Summary
Context of this talk
High availability: continued availability in times of failure
Hadoop: the service itself and the data it operates on
Fault-tolerant operation: what happens if a node dies, and how to reduce the time to restart
Hadoop in a nutshell
Example: word count across a number of documents
[Figure: word count data flow. Input documents (poetry excerpts, including passages from Goethe's Faust and Hamlet's "To be, or not to be") are processed by MapTasks, which emit intermediate pairs such as (and,1) and (the,1). The pairs are copied and sorted by key, then ReduceTasks sum the counts per word and write the job output.]

```
map(name, document) {
  for each word w in document:
    emitIntermediate(w, 1)
}

reduce(key, values) {
  count = 0
  for each value v in values:
    count += v
  emit(key, count)
}
```
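The map and reduce pseudocode on this slide can be simulated in a single process. This is a minimal sketch, not Hadoop's API: the function names are hypothetical, words are split on whitespace only, and the shuffle is an in-memory dictionary standing in for the distributed copy/sort step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(name: str, document: str) -> Iterator[Tuple[str, int]]:
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in document.split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, standing in for Hadoop's copy/sort step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reducers receive their keys in sorted order.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return (key, sum(values))


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    intermediate = (
        pair for name, doc in documents.items() for pair in map_phase(name, doc)
    )
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    docs = {"a.txt": "to be or not to be", "b.txt": "the rest is silence"}
    print(word_count(docs))
```

Running it prints the per-word counts in key order, mirroring how each reducer sees its keys sorted.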
Hadoop components
MapReduce layer: provides the map and reduce programming framework, breaks jobs up into tasks, and keeps track of execution status
File system layer: a pluggable file system with support for location-aware file systems, accessed through an API layer. The default is HDFS (Hadoop Distributed File System).
HDFS: provides fault tolerance and high availability by replicating individual files. It consists of a central metadata server, the NameNode, and a number of DataNodes, which store copies of files (or parts of them).
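To make the "pluggable file system" idea concrete, here is a hedged sketch of how a path's URI scheme can select a file system implementation. Hadoop does this through `fs.<scheme>.impl` configuration properties, with `fs.default.name` naming the default file system in releases of this era. The configuration table below is hypothetical, and the `ibrix` scheme and `example.IbrixFileSystem` class are invented for illustration; they are not the name of the actual plug-in.

```python
from urllib.parse import urlparse

# Hypothetical configuration mirroring Hadoop's "fs.<scheme>.impl" keys.
# The "ibrix" entry is an illustrative assumption, not a real class.
CONF = {
    "fs.default.name": "hdfs://namenode:8020/",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem",
    "fs.ibrix.impl": "example.IbrixFileSystem",
}


def resolve_filesystem(path: str, conf: dict = CONF) -> str:
    """Return the name of the implementation class responsible for a path."""
    scheme = urlparse(path).scheme
    if not scheme:
        # Scheme-less paths fall back to the default file system.
        scheme = urlparse(conf["fs.default.name"]).scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise ValueError(f"No FileSystem for scheme: {scheme}")
    return conf[key]
```

Hadoop itself instantiates the configured Java class; this sketch only shows the lookup that lets a different file system be swapped in without touching the MapReduce layer.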
Hadoop operation (with HDFS)
[Figure: the master node runs the JobTracker (MapReduce layer) and the NameNode (file system layer); each slave node runs a TaskTracker and a DataNode on top of its local disks. A job is submitted to the JobTracker's scheduler, which obtains location information from the NameNode and splits the job into tasks that are assigned to TaskTrackers.]
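The use of location information can be illustrated with a simplified, batch-style assignment. This is a sketch under assumptions: the real JobTracker hands out tasks as TaskTrackers heartbeat in, and `assign_tasks`, the host names and the two-slot limit are hypothetical. It only shows the preference for running a task on a node that already stores its input.

```python
from typing import Dict, List, Set


def assign_tasks(
    splits: Dict[str, Set[str]], trackers: List[str], slots_per_tracker: int = 2
) -> Dict[str, str]:
    """Greedy locality-aware assignment of input splits to TaskTracker hosts.

    `splits` maps each input split to the hosts storing its data, i.e. the
    location information reported by the file system layer.
    """
    load = {t: 0 for t in trackers}
    assignment: Dict[str, str] = {}
    for split, hosts in sorted(splits.items()):
        free = [t for t in trackers if load[t] < slots_per_tracker]
        if not free:
            raise RuntimeError("no free task slots")
        local = [t for t in free if t in hosts]
        # Prefer a data-local tracker; otherwise fall back to any free one.
        candidates = local or free
        chosen = min(candidates, key=lambda t: (load[t], t))
        assignment[split] = chosen
        load[chosen] += 1
    return assignment
```

A split whose data lives only on a node without a TaskTracker is still scheduled, just on the least-loaded free tracker, so its input has to be read over the network.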
Failure scenarios and responses
Failures in MapReduce components:
TaskTracker: sends heartbeats to the JobTracker. If it is unresponsive for x seconds, the JobTracker marks the TaskTracker as dead and stops assigning work to it; the scheduler reschedules the tasks that were running on that TaskTracker.
JobTracker: no built-in heartbeat mechanism. It checkpoints to the file system, so it can be restarted and resume operation.
Individual tasks: the TaskTracker monitors their progress and can restart failed tasks. More complex failure handling is possible, e.g. skipping parts of the input data that cause failures.
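A minimal model of the TaskTracker heartbeat rule above, assuming a single-threaded monitor and caller-supplied timestamps. The class and method names are hypothetical, and `timeout_s` stands in for the slide's "x seconds".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class JobTrackerSketch:
    """Hypothetical model of heartbeat-based TaskTracker failure detection."""

    timeout_s: float  # the slide's "x seconds"; the value is an assumption
    last_seen: Dict[str, float] = field(default_factory=dict)
    running: Dict[str, Set[str]] = field(default_factory=dict)  # tracker -> task ids
    dead: Set[str] = field(default_factory=set)
    pending: List[str] = field(default_factory=list)  # tasks awaiting rescheduling

    def heartbeat(self, tracker: str, now: float) -> None:
        if tracker in self.dead:
            return  # a tracker marked dead gets no new work in this sketch
        self.last_seen[tracker] = now
        self.running.setdefault(tracker, set())

    def start_task(self, tracker: str, task: str) -> None:
        self.running[tracker].add(task)

    def check(self, now: float) -> List[str]:
        """Mark silent trackers dead and queue their tasks for rescheduling."""
        newly_dead = []
        for tracker, seen in list(self.last_seen.items()):
            if tracker not in self.dead and now - seen > self.timeout_s:
                self.dead.add(tracker)
                self.pending.extend(sorted(self.running.pop(tracker, set())))
                newly_dead.append(tracker)
        return newly_dead
```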
Failure scenarios and responses (2)
Failure of data storage: the pluggable file system implementation needs to detect and remedy error scenarios.
HDFS, failure of a DataNode: the NameNode keeps track of the replication count for files (parts of files) and can re-replicate missing pieces. It tries to place copies of individual pieces physically apart from each other (same rack vs. different racks).
HDFS, failure of the NameNode: operations are written to logs, which makes a restart possible. During restart the file system is in read-only mode. A secondary NameNode can periodically read these logs to speed up the time to become available. But if the secondary NameNode takes over, the whole cluster needs to be restarted, since the assigned hostnames have changed.
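The DataNode recovery path can be sketched as a planning function: find the blocks that lost a replica on the dead node and pick new targets, preferring racks that do not yet hold a copy. This placement rule is a simplification chosen for illustration, not HDFS's actual block placement policy; the function name and rack layout are hypothetical, and only the replication factor of 3 matches the HDFS default.

```python
from typing import Dict, List, Set


def plan_rereplication(
    block_locations: Dict[str, Set[str]],
    rack_of: Dict[str, str],
    dead_node: str,
    replication: int = 3,
) -> Dict[str, List[str]]:
    """Return, for each under-replicated block, the nodes that should get a new copy."""
    live_nodes = sorted(n for n in rack_of if n != dead_node)
    plan: Dict[str, List[str]] = {}
    for block, holders in sorted(block_locations.items()):
        survivors = holders - {dead_node}
        missing = replication - len(survivors)
        if missing <= 0:
            continue
        used_racks = {rack_of[n] for n in survivors}
        candidates = [n for n in live_nodes if n not in survivors]
        targets: List[str] = []
        for _ in range(missing):
            if not candidates:
                break
            # Prefer a node on a rack that does not yet hold this block.
            best = min(candidates, key=lambda n: (rack_of[n] in used_racks, n))
            targets.append(best)
            used_racks.add(rack_of[best])
            candidates.remove(best)
        plan[block] = targets
    return plan
```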
Availability takeaway
MapReduce layer: checkpoints to the persistent file system to resume work. Both the TaskTracker and the JobTracker can be restarted.
HDFS: the single point of failure is the NameNode. Restarts can take a long time, depending on the amount of data stored and the number of operations in the log itself, in the region of hours.
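Why NameNode restart time grows with the log can be seen from a toy model: recovery loads the last checkpointed image and replays every logged operation, so a checkpoint that folds the log into a new image shortens the next replay. In HDFS these roles are played by the fsimage and edits files; the record format and the functions below are hypothetical simplifications.

```python
from typing import List, Set, Tuple

# Hypothetical, simplified edit-log record: (operation, path).
Op = Tuple[str, str]


def apply_op(namespace: Set[str], op: Op) -> None:
    """Apply one logged operation to the in-memory namespace."""
    action, path = op
    if action == "create":
        namespace.add(path)
    elif action == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation {action!r}")


def restart(image: Set[str], edit_log: List[Op]) -> Tuple[Set[str], int]:
    """Load the checkpointed image, replay the log, and report how many edits were replayed."""
    namespace = set(image)
    for op in edit_log:
        apply_op(namespace, op)
    return namespace, len(edit_log)


def checkpoint(image: Set[str], edit_log: List[Op]) -> Tuple[Set[str], List[Op]]:
    """Fold the log into a new image and start an empty log."""
    namespace, _ = restart(image, edit_log)
    return namespace, []
```

The replay cost is linear in the log length, which is why frequent checkpoints matter on a busy cluster.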
A different file system: HP IBRIX
A software solution that runs on top of storage configurations
Fault-tolerant, highly available file system
Segmented file system: disks (LUNs) are treated as segments, and segments are managed by segment servers
Segments are aggregated into global file system(s), each providing a single namespace
Each file system supports up to 16 petabytes
iBrix in a nutshell
[Figure: clients access the file system via NFS, CIFS or the native client. A Fusion Manager and a set of segment servers, each attached to disks, make up the installation. Adding segment servers increases performance; adding disks increases capacity. There is no single metadata server: the file system is segmented.]
What does it look like?
Fusion Manager web console, based on the command line interface
Global management view of the installation
[Screenshot: here segments correspond to disks attached to servers.]
What does it look like? (2)
A client simply mounts the file system via NFS, CIFS/Samba or the native client.
Each segment server is automatically a client.
Mount points and exports need to be created first on the Fusion Manager.
Clients access the file system via “normal” file system calls.
Fault tolerant
Supports failover
Supports different hardware topology configurations
Couplet configuration: best suited for a hybrid of performance and capacity
[Figure: pairs of servers, each pair sharing a RAID array, together presenting a single namespace.]
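Couplet failover can be sketched as a remapping of segment ownership: when a server fails, its buddy, which shares the same RAID storage, takes over its segments. The segment and server names and the `fail_over` function are hypothetical; this is not IBRIX's actual interface.

```python
from typing import Dict, Set


def fail_over(
    owner: Dict[str, str], buddy: Dict[str, str], failed: str, up: Set[str]
) -> Dict[str, str]:
    """Return a new segment -> server map after `failed` goes down.

    Each segment owned by the failed server moves to its configured buddy,
    which can reach the same RAID storage in a couplet configuration.
    """
    partner = buddy[failed]
    if partner not in up:
        raise RuntimeError(f"buddy {partner} of {failed} is also down")
    return {seg: (partner if srv == failed else srv) for seg, srv in owner.items()}
```

Keeping the original ownership map makes failback simple: when the failed server returns, its segments can be handed back by restoring that map.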
Location-aware Hadoop on IBRIX
Hadoop internals with IBRIX
[Figure: the same layout as with HDFS, but the file system layer consists of the IBRIX client on the master node and segment servers on the slave nodes, in place of the NameNode and DataNodes. The scheduler obtains location information, splits the job into tasks and assigns them to TaskTrackers.]
Performance test
1 GB of randomly generated data, spread across 10 input files (generated with RandomWriter)
Use Hadoop Sort to sort the records and measure the time spent sorting, including map, sort and reduce time
Vary the number of slave nodes
This is a file access test: the actual computation on each TaskTracker is low, so the governing factors for execution time are the time to read and write files and the time to distribute data to the reducers.
Performance results
[Chart: execution time (sec) of the sort test.]
Performance results
Comparable performance to a native HDFS system
For smaller workloads, even better performance, because there is no replication
Can take advantage of location information
Performance depends on the distribution and type of input data across the segment servers
Prefers many smaller files, since they can be distributed better
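A worked example of why many smaller files distribute better when whole files are placed on segment servers: split the same total volume into 10 or 100 files and place them on four servers. Round-robin placement and the count of four servers are assumptions standing in for IBRIX's real placement policy.

```python
from typing import List


def round_robin_loads(file_sizes_mb: List[float], servers: int) -> List[float]:
    """MB stored per segment server when whole files are placed round-robin."""
    loads = [0.0] * servers
    for i, size in enumerate(file_sizes_mb):
        loads[i % servers] += size
    return loads


def imbalance(loads: List[float]) -> float:
    """Ratio of the busiest server's load to the mean; 1.0 is perfect balance."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean
```

Ten 100 MB files (roughly the 1 GB test input) give loads of 300, 300, 200 and 200 MB, an imbalance of 1.2; a hundred 10 MB files give 250 MB on every server, an imbalance of 1.0.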
Further feature comparison

| Feature | HDFS | IBRIX |
|---|---|---|
| Single point of failure | Yes, the NameNode | No |
| Needs RAID | No, replicates | Yes |
| Can expose location information | Yes | Yes |
| Individual file replication | Yes | No, only complete file systems |
| Response to node failure | Re-replication, mark as dead | Failover, mark as dead, can fall back |
| Homogeneous file system | Yes | No, can define tiers |
| Split files across nodes | Yes; files are split into chunks that are distributed individually | Only if a segment is full |
Summary
Summary
Hadoop provides a number of failure handling methods, which depend on the persistent file system.
IBRIX as an alternative file system:
Not specifically built for Hadoop; a lightweight file system plug-in for Hadoop
Location-aware design enables computation close to the data
Comparable performance while gaining fault tolerance
Fault-tolerant persistence with no single point of failure
Reduced storage requirement
Storage not exclusive to Hadoop
Future work:
Making the JobTracker resilient to failures
Moving into a virtual environment
Short-lived Hadoop clusters
Q&A
backup
IBRIX details
IBRIX uses inodes as the backend store and extends them with a file-based globally unique identifier.
Each segment server is responsible for a fixed number of inodes, determined by the block size within that segment and the segment's overall size.
Example: a 4 GB segment with a 4 KB block size gives 4 GB / 4 KB = 1,048,576 inodes (1M).
Simplified calculation: where is inode 1,800,000? 1,800,000 / 1,048,576 ≈ 1.72, so it lives on segment server 1 (counting servers from 0).
Inodes do not store the data but hold a reference to the actual data.
The backend storage for IBRIX is an ext3 file system.
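The slide's simplified inode arithmetic as code, assuming one inode per block and segment servers numbered from 0 (which is what places inode 1,800,000 on server 1). The function names are hypothetical.

```python
def inodes_per_segment(segment_bytes: int, block_bytes: int) -> int:
    """Number of inodes a segment holds, assuming one inode per block."""
    return segment_bytes // block_bytes


def segment_for_inode(inode: int, per_segment: int) -> int:
    """Index (counting from 0) of the segment server that owns an inode number."""
    return inode // per_segment


if __name__ == "__main__":
    per = inodes_per_segment(4 * 2**30, 4 * 2**10)  # 4 GB segment, 4 KB blocks
    print(per, segment_for_inode(1_800_000, per))
```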
More details
Based on distributed inodes
[Figure: a segment is a disk with a local file system, holding inodes 1st through Nth.]
Security
The file system provides a POSIX-like interface: files belong to a user/group and have read/write/execute flags.
Native client: needs to be bound to a Fusion Manager. Export control can be enforced; mounting is only possible from the Fusion Manager console.
CIFS/Samba: requires Active Directory to translate Windows IDs to Linux IDs. Can export only a sub-path of the file system (e.g. /filesystem/sambadirectory).
NFS: create exports on the segment server, limit clients by IP mask, and export only a sub-path of the file system (e.g. /filesystem/nfsdirectory). Normal NFS properties apply (read/write, root squash).
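Limiting NFS clients by IP mask amounts to a network membership test. A sketch using Python's standard `ipaddress` module; the export table and networks are hypothetical, and a real NFS server expresses this in its export configuration rather than in application code.

```python
import ipaddress
from typing import Dict, List

# Hypothetical export table: exported sub-path -> client networks allowed to mount it.
EXPORTS: Dict[str, List[str]] = {
    "/filesystem/nfsdirectory": ["10.0.1.0/24", "192.168.5.0/28"],
}


def may_mount(
    client_ip: str, export_path: str, exports: Dict[str, List[str]] = EXPORTS
) -> bool:
    """True if the client address falls inside one of the export's allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(net) for net in exports.get(export_path, [])
    )
```

Paths that are not exported at all have no allowed networks, so every client is refused.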
Features
Multiple logical file systems, each built on a selected set of segments
Task manager / policies:
Rebalancing between different segment servers
Tiering of data: some segments may be better or worse than others, and data is moved to or from them based on a policy/rule
Replication of complete logical file systems to a remote cluster
Failover: a buddy system of two (or more) segment servers (active/active standby); native clients fail over automatically
Growth: segment servers register with the Fusion Manager; new segments (disks) need to be discovered programmatically and can then be added to logical file systems
Location aware by nature of the design: for each file, the segment server(s) where it is stored can be determined
Features (2)
De-duplication
Caching on the segment server owning a particular file
Distributed metadata: no single point of failure
Snapshots of whole file systems, which create a new virtual file system
Policy for storing new files: distribute them randomly across segment servers, or assign them to the “local” segment server
Separate data network: the network interface used for storage communication can be configured
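The two placement policies for new files can be sketched as follows. The function, server names and policy strings are hypothetical; "local" assumes the writing client runs on a segment server, which holds for every segment server since each is automatically a client.

```python
import random
from typing import List, Optional


def choose_segment_server(
    servers: List[str],
    policy: str,
    local: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the segment server that will own a newly created file."""
    if policy == "local":
        if local not in servers:
            raise ValueError("local policy needs the writer to be a segment server")
        return local
    if policy == "random":
        return (rng or random.Random()).choice(servers)
    raise ValueError(f"unknown policy {policy!r}")
```

The local policy keeps data next to the writer, which suits location-aware Hadoop tasks; the random policy spreads files for balance at the cost of more remote reads.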
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

High Availability Hadoop

  • 1. Making Hadoop highly available: using an alternative file system, HP IBRIX. Johannes Kirschnick, Steve Loughran, June 2010
  • 2. Something about me: I work at HP Labs in Bristol, UK, in the Automated Infrastructure Lab (automated, secure, dynamic instantiation and management of cloud computing infrastructure and services). Degree in computer science from TU Munich. Personal interests: cloud services, automated service deployment, storage services.
  • 3. What I want to talk about: motivate high availability and introduce the context; an overview of Hadoop; Hadoop's failure modes and how it operates under them; an introduction to HP IBRIX; performance results; summary.
  • 4. Context of this talk: high availability, i.e. continued availability in times of failure, both of the Hadoop service and of the data it operates on. Fault-tolerant operation: what happens if a node dies? Reduce the time to restart.
  • 5. Hadoop in a nutshell. Example: word count across a number of documents. [Figure: the input documents (verse excerpts) are processed by MapTasks, which emit pairs such as (and, 1) and (the, 1); the pairs are copied to the ReduceTasks and sorted by key, and the job output lists each word with its count.]
      map(name, document) {
        for each word w in document:
          emitIntermediate(w, 1)
      }
      reduce(key, values) {
        count = 0
        for each value v in values:
          count += v
        emit(key, count)
      }
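The word count pseudocode can be run as a small single-process Python sketch. It only mimics the data flow (map, then copy/sort by key, then reduce); the function names `map_words`, `shuffle`, `reduce_counts` and `word_count` are illustrative and not part of Hadoop's API. As in the output shown on the slide, words are split on whitespace only, so punctuation stays attached to a token.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(name: str, document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit an intermediate (word, 1) pair for every word."""
    for word in document.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Copy/sort step: group intermediate values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    count = 0
    for v in values:
        count += v
    return key, count


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over documents given as {name: text}."""
    intermediate = (
        pair for name, text in documents.items() for pair in map_words(name, text)
    )
    return dict(reduce_counts(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    docs = {"a.txt": "to be or not to be", "b.txt": "and to and fro"}
    print(word_count(docs))
    # prints {'and': 2, 'be': 2, 'fro': 1, 'not': 1, 'or': 1, 'to': 3}
```

In a real cluster the map calls run in parallel MapTasks and the grouped keys are partitioned across several ReduceTasks; here everything runs in one process.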
  • 6. Hadoop components. MapReduce layer: provides the map and reduce programming framework, breaks jobs up into tasks and keeps track of their execution status. File system layer: a pluggable file system, accessed through an API layer, with support for location-aware file systems; the default is HDFS (Hadoop Distributed File System). HDFS provides fault tolerance and high availability by replicating individual files; it consists of a central metadata server, the NameNode, and a number of DataNodes, which store copies of files (or parts of them).
  • 7. Hadoop operation (with HDFS). [Diagram: the master node runs the JobTracker (MapReduce layer) and the NameNode (file system layer); each slave node runs a TaskTracker and a DataNode with its local disk.]
  • 8. Hadoop operation (with HDFS). [Diagram as before; a job is submitted to the scheduler in the JobTracker.]
  • 9. Hadoop operation (with HDFS). [Diagram as before; the JobTracker obtains location information for the job's input from the NameNode.]
  • 10. Hadoop operation (with HDFS). [Diagram as before; the scheduler splits the job into tasks and hands them to the TaskTrackers.]
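The scheduling step in slides 9 and 10 can be illustrated with a greedy sketch: each task goes to a TaskTracker that already stores its input split if one has a free slot, and otherwise to any TaskTracker with a free slot. This is a simplification under assumed inputs (`task_hosts` from the file system's location information, `free_slots` per TaskTracker, and the hypothetical function `assign_tasks`); Hadoop's actual scheduler is more elaborate, e.g. it also considers rack locality.

```python
from typing import Dict, List, Optional


def assign_tasks(
    task_hosts: Dict[str, List[str]],
    free_slots: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Greedily assign each task to a TaskTracker, preferring data-local ones.

    task_hosts: task id -> hosts that store the task's input split
    free_slots: TaskTracker host -> number of free task slots
    Returns task id -> chosen host, or None if no slot was left.
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is not changed
    assignment: Dict[str, Optional[str]] = {}
    for task, hosts in task_hosts.items():
        # First choice: a TaskTracker that holds the input data locally.
        local = [h for h in hosts if slots.get(h, 0) > 0]
        # Fallback: any TaskTracker with a free slot (input is read remotely).
        remote = [h for h, n in slots.items() if n > 0]
        chosen = local[0] if local else (remote[0] if remote else None)
        if chosen is not None:
            slots[chosen] -= 1
        assignment[task] = chosen
    return assignment


if __name__ == "__main__":
    print(assign_tasks(
        {"t1": ["node1", "node2"], "t2": ["node1"], "t3": ["node3"]},
        {"node1": 1, "node2": 1, "node3": 0},
    ))
    # prints {'t1': 'node1', 't2': 'node2', 't3': None}
```

The location information only helps if the file system can report it, which is why the pluggable file system layer supports location-aware implementations.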
  • 11. Failure scenarios and responses. Failure in MapReduce components: TaskTracker: sends heartbeats to the JobTracker; if a TaskTracker is unresponsive for x seconds, the JobTracker marks it as dead and stops assigning work to it, and the scheduler reschedules the tasks that were running on it. JobTracker: no built-in heartbeat mechanism; checkpoints to the file system, so it can be restarted and resume operation. Individual tasks: the TaskTracker monitors their progress and can restart failed tasks; more complex failure handling is possible, e.g. skipping parts of the input data that cause failures.
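The TaskTracker timeout rule can be sketched as follows. `HeartbeatMonitor` is a hypothetical class, not Hadoop code: it records the time of each TaskTracker's last heartbeat, and `check` marks every tracker that has been silent for longer than `timeout_s` (the slide's x seconds) as dead and returns its tasks for rescheduling. Times are passed in explicitly so the logic is easy to test.

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Hypothetical JobTracker-side liveness tracking (a sketch, not Hadoop code)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s              # the "x seconds" from the slide
        self.last_seen: Dict[str, float] = {}   # TaskTracker -> last heartbeat time
        self.running: Dict[str, Set[str]] = {}  # TaskTracker -> ids of running tasks
        self.dead: Set[str] = set()

    def heartbeat(self, tracker: str, now: float) -> None:
        """Record a heartbeat; trackers already marked dead are ignored."""
        if tracker not in self.dead:
            self.last_seen[tracker] = now
            self.running.setdefault(tracker, set())

    def assign(self, tracker: str, task: str) -> None:
        """Start a task on a tracker; no work is assigned to dead trackers."""
        if tracker in self.dead:
            raise ValueError(f"{tracker} is marked as dead")
        self.running.setdefault(tracker, set()).add(task)

    def check(self, now: float) -> List[str]:
        """Mark silent trackers as dead and return their tasks for rescheduling."""
        to_reschedule: List[str] = []
        for tracker, seen in self.last_seen.items():
            if tracker not in self.dead and now - seen > self.timeout_s:
                self.dead.add(tracker)
                to_reschedule.extend(sorted(self.running.pop(tracker, set())))
        return to_reschedule


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=10.0)
    monitor.heartbeat("tt1", now=0.0)
    monitor.heartbeat("tt2", now=0.0)
    monitor.assign("tt1", "map_0001")
    monitor.assign("tt2", "map_0002")
    monitor.heartbeat("tt2", now=8.0)       # tt1 stays silent
    print(monitor.check(now=12.0))          # prints ['map_0001']
```

The returned task ids would then go back to the scheduler; the JobTracker itself has no such watchdog, which is the gap the slide points out.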
  • 12. Failure scenarios and responses (2). Failure of data storage: the pluggable file system implementation needs to detect and remedy error scenarios. HDFS, failure of a DataNode: HDFS keeps track of the replication count of files (or parts of files) and can re-replicate missing pieces; it tries to place the copies of each piece physically apart from each other (same rack vs. different racks). HDFS, failure of the NameNode: operations are written to logs, which makes a restart possible; during the restart the file system is in read-only mode. A secondary NameNode can periodically read these logs to speed up the time until the file system becomes available again. BUT if the secondary NameNode takes over, the whole cluster has to be restarted, since the assigned hostnames have changed.
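How a DataNode failure turns into re-replication work can be sketched as below. The function `plan_rereplication` is hypothetical: it drops the dead node from each block's replica set and, for every block that falls below `target_replication` but still has a live copy, picks new target nodes, preferring racks that hold no copy yet. HDFS's real placement policy differs in detail; this only illustrates the "keep copies physically apart" idea.

```python
from typing import Dict, List, Set


def plan_rereplication(
    block_replicas: Dict[str, Set[str]],
    node_rack: Dict[str, str],
    dead_node: str,
    target_replication: int = 3,
) -> Dict[str, List[str]]:
    """Return block id -> new nodes to copy it to, after dead_node has failed.

    block_replicas: block id -> nodes holding a copy
    node_rack: node -> rack, for every node in the cluster
    """
    live_nodes = sorted(n for n in node_rack if n != dead_node)
    plan: Dict[str, List[str]] = {}
    for block, replicas in sorted(block_replicas.items()):
        holders = {n for n in replicas if n != dead_node}
        if not holders:
            continue  # no surviving copy to re-replicate from
        used_racks = {node_rack[n] for n in holders}
        targets: List[str] = []
        while len(holders) + len(targets) < target_replication:
            pool = [n for n in live_nodes if n not in holders and n not in targets]
            if not pool:
                break  # not enough live nodes left
            # Prefer a rack without a copy yet (False sorts first), then node name.
            best = min(pool, key=lambda n: (node_rack[n] in used_racks, n))
            targets.append(best)
            used_racks.add(node_rack[best])
        if targets:
            plan[block] = targets
    return plan


if __name__ == "__main__":
    racks = {"d1": "r1", "d2": "r1", "d3": "r2", "d4": "r2", "d5": "r3"}
    blocks = {"b1": {"d1", "d3", "d5"}, "b2": {"d1", "d2", "d3"}}
    print(plan_rereplication(blocks, racks, dead_node="d1"))
    # prints {'b1': ['d2'], 'b2': ['d5']}
```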
  • 13. Availability takeaway. MapReduce layer: checkpoints to the persistent file system so that work can resume; both the TaskTracker and the JobTracker can be restarted. HDFS: the single point of failure is the NameNode. Restarts can take a long time, in the region of hours, depending on the amount of data stored and the number of operations in the log itself.
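Why NameNode restart time grows with the log can be shown with a toy model: the namespace is rebuilt by replaying every logged operation on top of the last checkpoint, so replay cost is proportional to the log length, and folding the log into a new checkpoint (the job of the secondary NameNode) shortens the next restart. The operation format and the functions `apply_op`, `restart` and `new_checkpoint` are assumptions for illustration, and the model ignores other restart costs, such as waiting for DataNodes to report their blocks.

```python
from typing import List, Set, Tuple

Op = Tuple[str, str]  # (operation, path), e.g. ("create", "/data/a")


def apply_op(namespace: Set[str], op: Op) -> None:
    """Apply one logged namespace operation (only create/delete in this toy model)."""
    kind, path = op
    if kind == "create":
        namespace.add(path)
    elif kind == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation {kind!r}")


def restart(checkpoint: Set[str], edit_log: List[Op]) -> Tuple[Set[str], int]:
    """Rebuild the namespace: load the checkpoint, then replay the whole log.

    Returns the namespace and the number of replayed operations, i.e. the part
    of the restart cost that grows with the log.
    """
    namespace = set(checkpoint)
    for op in edit_log:
        apply_op(namespace, op)
    return namespace, len(edit_log)


def new_checkpoint(checkpoint: Set[str], edit_log: List[Op]) -> Set[str]:
    """Fold the log into a new checkpoint, so the next restart replays nothing old."""
    namespace, _ = restart(checkpoint, edit_log)
    return namespace


if __name__ == "__main__":
    log = [("create", "/a"), ("create", "/b"), ("delete", "/a")]
    print(restart(set(), log))              # prints ({'/b'}, 3)
    print(restart(new_checkpoint(set(), log), []))  # prints ({'/b'}, 0)
```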
  • 14. A different file system: HP IBRIX. A software solution that runs on top of the underlying storage configuration. A fault-tolerant, highly available file system. Segmented file system: disks (LUNs) are treated as segments, segments are managed by segment servers, and segments are aggregated into one or more global file systems. Each file system provides a single namespace and supports up to 16 petabytes.
  • 15. IBRIX in a nutshell. [Diagram: clients connect via NFS, CIFS or the native client to a layer of segment servers, managed by the Fusion Manager, each with its own disks; adding segment servers increases performance, adding disks increases capacity.] No single metadata server: a segmented file system.
  • 16. What does it look like? The Fusion Manager web console is based on the command-line interface and gives a global management view of the installation. Here, segments correspond to disks attached to servers.
  • 17. What does it look like? (2) A client simply mounts the file system via NFS, CIFS/Samba or the native client. Each segment server is automatically a client. Mount points and exports first need to be created on the Fusion Manager. Clients then access the file system via “normal” file system calls.
  • 18. Fault tolerance. Supports failover with different hardware topology configurations. Couplet configuration, best suited for a hybrid of performance and capacity: [Diagram: pairs of servers attached to shared RAID arrays, all exposed as a single namespace.]
  • 20. Hadoop internals with IBRIX. [Diagram: as in the HDFS case, the scheduler in the JobTracker uses location information to split a job into tasks for the TaskTrackers, but the file system layer now consists of an IBRIX client on the master and segment servers with their disks on the slave nodes, in place of the NameNode and DataNodes.]
  • 21. Performance test. 1 GB of randomly generated data, spread across 10 input files (RandomWriter). Hadoop Sort is used to sort the records, measuring the time spent sorting, which includes mapping, sorting and reducing time. The number of slave nodes is varied. This is a file access test: the actual computation on each TaskTracker is low, so the governing factors for execution time are the time to read and write files and the time to distribute data to the reducers.
  • 23. Performance results. Performance comparable to native HDFS; for smaller workloads even better performance, due to the absence of replication. IBRIX can take advantage of location information. Results depend on the distribution and type of input data across the segment servers: many smaller files are preferred, since they can be distributed better.
  • 24. Further feature comparison (HDFS vs. IBRIX):
      Single point of failure: HDFS yes (the NameNode); IBRIX no.
      Needs RAID: HDFS no (it replicates); IBRIX yes.
      Can expose location information: HDFS yes; IBRIX yes.
      Individual file replication: HDFS yes; IBRIX no (only complete file systems).
      Response to node failure: HDFS re-replicates and marks the node as dead; IBRIX fails over, marks the node as dead and can fall back.
      Homogeneous file system: HDFS yes; IBRIX no (tiers can be defined).
      Splits files across nodes: HDFS yes (files are split into chunks that are distributed individually); IBRIX only if a segment is full.
  • 26. Summary. Hadoop provides a number of failure-handling methods, but depends on the persistent file system. IBRIX as an alternative file system: not specifically built for Hadoop; a lightweight file system plug-in for Hadoop; its location-aware design enables computation close to the data. Comparable performance while gaining fault tolerance: fault-tolerant persistence with no single point of failure. Reduced storage requirement. Storage is not exclusive to Hadoop. Future work: making the JobTracker failure-independent; moving into a virtual environment; short-lived Hadoop clusters.
  • 27. Q&A
  • 29. IBRIX details. IBRIX uses iNodes as its backend store and extends them with a file-based globally unique identifier. Each segment server is responsible for a fixed number of iNodes, determined by the block size within that segment and the segment's overall size. Example: a 4 GB segment with a 4 KB block size holds 1,048,576 iNodes (1M). Simplified calculation: where is iNode 1,800,000? 1,800,000 / 1,048,576 ≈ 1.72, so it lives on segment server 1 (counting from 0). iNodes do not store the data but hold a reference to the actual data. The backend storage for IBRIX is the ext3 file system.
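The slide's simplified calculation can be written down directly. This sketch assumes, as the slide implies, one iNode per block and contiguous iNode ranges per segment server, numbered from 0; `inodes_per_segment` and `segment_for_inode` are illustrative names, not IBRIX functions. Binary units (GiB, KiB) are what give exactly 1,048,576.

```python
def inodes_per_segment(segment_bytes: int, block_bytes: int) -> int:
    """Assumed: one iNode per block, so the count is segment size / block size."""
    return segment_bytes // block_bytes


def segment_for_inode(inode: int, per_segment: int) -> int:
    """Index (from 0) of the segment server whose contiguous iNode range contains inode."""
    if inode < 0 or per_segment <= 0:
        raise ValueError("inode must be >= 0 and per_segment > 0")
    return inode // per_segment


if __name__ == "__main__":
    per = inodes_per_segment(4 * 2**30, 4 * 2**10)  # 4 GB segment, 4 KB blocks
    print(per)                                     # prints 1048576
    print(round(1_800_000 / per, 2))               # prints 1.72
    print(segment_for_inode(1_800_000, per))       # prints 1
```

Integer division replaces the "divide, then drop the fraction" step of the slide, so no floating-point rounding is involved in finding the server.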
  • 30. More details. Based on distributed iNodes. [Diagram: each segment (disk) holds a local file system containing its range of iNodes, from the 1st to the Nth.]
  • 31. Security. The file system follows a POSIX-like interface: files belong to a user and group and have read/write/execute flags. Native client: needs to be bound to a Fusion Manager; export control can be enforced, and mounting is only possible from the Fusion Manager console. CIFS/Samba: requires Active Directory to translate Windows IDs to Linux IDs; export only a sub-path of the file system (e.g. /filesystem/sambadirectory). NFS: create exports on the segment servers; limit clients by IP mask; export only a sub-path of the file system (e.g. /filesystem/nfsdirectory); the usual NFS properties apply (read/write, root squash).
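Limiting NFS clients by IP mask means checking each client address against the allowed networks. The NFS export configuration does this for you; the following sketch, using Python's standard `ipaddress` module and a hypothetical `client_allowed` helper, only illustrates the matching rule.

```python
import ipaddress
from typing import Iterable


def client_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """True if client_ip lies in any allowed network, e.g. "192.168.10.0/24"."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


if __name__ == "__main__":
    allowed = ["192.168.10.0/24"]
    print(client_allowed("192.168.10.42", allowed))  # prints True
    print(client_allowed("192.168.11.42", allowed))  # prints False
```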
  • 32. Features. Multiple logical file systems, each built on a selected set of segments. Task manager / policies: rebalancing between segment servers; tiering of data (some segments may be better or worse than others, and data is moved to or from them based on a policy or rule); replication of complete logical file systems to a remote cluster. Failover: a buddy system of two (or more) segment servers (active/active standby); native clients fail over automatically. Growing: segment servers register with the Fusion Manager; new segments (disks) need to be discovered programmatically and can then be added to logical file systems. Location aware by the nature of the design: for each file, the segment server(s) where it is stored can be determined.
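The buddy failover rule can be sketched as a mapping from segments to the server currently serving them: a segment is served by its owning segment server, or by that server's buddy if the owner has failed. The data structures and the function `serving_servers` are hypothetical and ignore how IBRIX actually detects failures and moves segments.

```python
from typing import Dict, Optional, Set


def serving_servers(
    segment_owner: Dict[str, str],
    buddy: Dict[str, str],
    failed: Set[str],
) -> Dict[str, Optional[str]]:
    """Map each segment to the server currently serving it (None if unavailable).

    segment_owner: segment -> owning segment server
    buddy: segment server -> its failover partner
    failed: segment servers that are down
    """
    result: Dict[str, Optional[str]] = {}
    for segment, owner in segment_owner.items():
        partner = buddy.get(owner)
        if owner not in failed:
            result[segment] = owner            # normal case: owner serves its segment
        elif partner is not None and partner not in failed:
            result[segment] = partner          # failover to the buddy
        else:
            result[segment] = None             # owner and buddy are both down
    return result


if __name__ == "__main__":
    owners = {"seg1": "s1", "seg2": "s2", "seg3": "s3", "seg4": "s4"}
    buddies = {"s1": "s2", "s2": "s1", "s3": "s4", "s4": "s3"}
    print(serving_servers(owners, buddies, failed={"s1"}))
    # prints {'seg1': 's2', 'seg2': 's2', 'seg3': 's3', 'seg4': 's4'}
```

With two-server buddy pairs, the data stays available as long as at least one server of each pair is up.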
  • 33. Features (2). De-duplication. Caching on the segment server that owns a particular file. Distributed metadata, so there is no single point of failure. Supports snapshots of whole file systems; a snapshot creates a new virtual file system. Policy for storing new files: distribute them randomly across segment servers, or assign them to the “local” segment server. Separate data network: the network interface used for storage communication can be configured.
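The two placement policies for new files can be contrasted in a small sketch. `choose_segment_server` is a hypothetical helper, not an IBRIX API: the "local" policy places a new file on the writer's own segment server (possible because each segment server is also a client), while the "random" policy spreads files across all segment servers. Passing a seeded `random.Random` makes the random choice reproducible.

```python
import random
from typing import List, Optional


def choose_segment_server(
    policy: str,
    servers: List[str],
    local: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the segment server for a new file under the "local" or "random" policy."""
    if policy == "local":
        # The writer is itself a segment server (every segment server is a client).
        if local is None or local not in servers:
            raise ValueError("the local policy needs the writer's own segment server")
        return local
    if policy == "random":
        return (rng or random.Random()).choice(servers)
    raise ValueError(f"unknown policy {policy!r}")


if __name__ == "__main__":
    servers = ["s1", "s2", "s3"]
    print(choose_segment_server("local", servers, local="s2"))  # prints s2
    rng = random.Random(0)
    print([choose_segment_server("random", servers, rng=rng) for _ in range(5)])
```

For Hadoop, local placement keeps a task's output next to the node that wrote it, while random placement spreads load, which fits the observation that many smaller files can be distributed better.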