Hadoop at ContextWeb

Alex Dorman, VP Engineering
Paul George, Sr. Systems Architect

October 2009
ContextWeb: Traffic

ADSDAQ – Online Advertisement Exchange
Traffic:
  Up to 6,000 ad requests per second
  7 billion ad requests per month
  5,000+ active publisher and advertiser accounts
Account reports are updated every 15 minutes
About 50 internal reports for business users are updated nightly
ContextWeb Architecture highlights

Pre-Hadoop aggregation framework:
   Logs are generated on each server and aggregated in memory into 15-minute chunks
   Logs from different servers are aggregated into one log
   Loaded into the database
   Multi-stage aggregation in the database
   About 20 different jobs end to end
   Could take 2 hours to process through all stages
   200 million records was the limit
Hadoop Data Set

Up to 120 GB of raw log files per day (60 GB compressed)
60 different aggregated data sets; 25 TB total (compressed) to cover one year
50 different reports for business and end users
Major data sets are updated every 15 minutes
Hadoop Cluster

40 nodes / 320 cores (Dell 2950)
100 TB total raw capacity
CentOS 5.3 x86_64
Hadoop 0.18.3-CH-0.2.1 (Cloudera), migrating to 0.20.x
NameNode high availability using DRBD replication
Log collection using custom scripts and Scribe
Hadoop Cluster

In-house developed Java framework on top of hadoop.mapred.* (a minimal job sketch follows below)
PIG and Perl streaming for ad-hoc reports
OpsWise scheduler
~2,000 MapReduce job executions per day
Exposing data to Windows: WebDAV server with WebDrive clients
Reporting application: QlikView
Cloudera support for Hadoop
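For context, a minimal job against the old hadoop.mapred.* API, of the kind such a framework wraps, might look like the sketch below. This is an illustrative example, not ContextWeb's framework; the input layout (tab-separated, date in column 0, account id in column 1, impression count in column 5) is an assumption.

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;

  public class DailyAggregateJob {

      // Emits (date + accountId) -> impressions for each log line.
      public static class Map extends MapReduceBase
              implements Mapper<LongWritable, Text, Text, LongWritable> {
          public void map(LongWritable offset, Text line,
                          OutputCollector<Text, LongWritable> out, Reporter reporter)
                  throws IOException {
              String[] f = line.toString().split("\t");
              // Assumed layout: date, accountId, ..., impressions in column 5.
              out.collect(new Text(f[0] + "\t" + f[1]),
                          new LongWritable(Long.parseLong(f[5])));
          }
      }

      // Sums impressions per (date, account) key.
      public static class Reduce extends MapReduceBase
              implements Reducer<Text, LongWritable, Text, LongWritable> {
          public void reduce(Text key, Iterator<LongWritable> values,
                             OutputCollector<Text, LongWritable> out, Reporter reporter)
                  throws IOException {
              long sum = 0;
              while (values.hasNext()) sum += values.next().get();
              out.collect(key, new LongWritable(sum));
          }
      }

      public static void main(String[] args) throws IOException {
          JobConf conf = new JobConf(DailyAggregateJob.class);
          conf.setJobName("daily-aggregate");
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(LongWritable.class);
          conf.setMapperClass(Map.class);
          conf.setReducerClass(Reduce.class);
          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));
          JobClient.runJob(conf);
      }
  }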
Architectural Challenges

How to organize the data set to keep aggregated data sets fresh
   Logs are constantly appended to the main data set; reports and aggregated data sets should be refreshed every 15 minutes
Mix of .NET and Java applications (70%+ .NET, 30% Java)
   How to make .NET applications write logs to Hadoop?
Some 3rd-party applications consume the results of MapReduce jobs (e.g. the reporting application)
   How to make 3rd-party or internal legacy applications read data from Hadoop?
Backward and forward compatibility of our data sets
   Every month we add 3-5 new data points to our logs
The Data Flow
Partitioned Data Set: Date/Time

Date/time is the main dimension for partitioning
Results of MapReduce jobs are segregated into monthly, daily, or hourly directories
Use MultipleOutputFormat to segregate output into different files (see the sketch below)
Reprocess only what has changed – check the date/time in the filename to determine what is affected; a data set is regenerated only if the input to the MR job contains data for that month/day/hour
Use PathFilter to specify which files to process
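As a concrete illustration of the MultipleOutputFormat approach, here is a minimal sketch, assuming output keys start with a yyyyMMdd day stamp; the class name and key layout are illustrative, not the production code.

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

  // Routes reducer output into per-day directories (old mapred API).
  public class DailyPartitionedOutputFormat extends MultipleTextOutputFormat<Text, Text> {
      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String name) {
          // Key is assumed to be "yyyyMMdd<tab>..."; the first 8 characters select the day.
          String day = key.toString().substring(0, 8);
          // e.g. 20091015/part-00000 – downstream jobs pick up only the affected days.
          return day + "/" + name;
      }
  }

The job would register it with conf.setOutputFormat(DailyPartitionedOutputFormat.class); downstream jobs then reprocess only the day directories that received new data.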
Partitioned Data Set: Revisions

Overlapping jobs are needed:
   12:00-12:10  Job 1.1  A -> B
   12:10-12:20  Job 1.2  B -> C      12:15-12:25  Job 2.1  A -> B    !!! Job 1.2 is still reading set B !!!
   12:20-12:30  Job 1.3  C -> D      12:25-12:35  Job 2.2  B -> C
                                     12:35-12:45  Job 2.3  C -> D
Use revisions:
   12:00-12:10  Job 1.1  A.r1 -> B.r1
   12:10-12:20  Job 1.2  B.r1 -> C.r1    12:15-12:25  Job 2.1  A.r2 -> B.r2
   12:20-12:30  Job 1.3  C.r1 -> D.r1    12:25-12:35  Job 2.2  B.r2 -> C.r2
                                         12:35-12:45  Job 2.3  C.r2 -> D.r2
Assign a revision (timestamp) when generating output
   Use MultipleOutputFormat to segregate output into different files
Use the highest available revision number when selecting input (see the sketch below)
   Use PathFilter to specify which revisions to process
Clean up "old" revisions after a grace period
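A minimal sketch of revision selection with a PathFilter, assuming directory names follow the RawLogD<MMdd>_r<N> pattern shown on the next slide; the helper name and naming convention are illustrative, not the actual framework API.

  import java.io.IOException;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.fs.PathFilter;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;

  public class RevisionSelector {
      private static final Pattern REV = Pattern.compile("RawLogD(\\d{4})_r(\\d+)");

      // Adds only the highest _rN directory for the given day to the job input.
      public static void addLatestRevision(JobConf conf, Path parent, final String day)
              throws IOException {
          FileSystem fs = parent.getFileSystem(conf);
          PathFilter dayFilter = new PathFilter() {
              public boolean accept(Path p) {
                  Matcher m = REV.matcher(p.getName());
                  return m.matches() && m.group(1).equals(day);
              }
          };
          FileStatus[] candidates = fs.listStatus(parent, dayFilter);
          if (candidates == null) return;           // parent does not exist yet
          Path best = null;
          int bestRev = -1;
          for (FileStatus status : candidates) {
              Matcher m = REV.matcher(status.getPath().getName());
              if (m.matches()) {
                  int rev = Integer.parseInt(m.group(2));
                  if (rev > bestRev) { bestRev = rev; best = status.getPath(); }
              }
          }
          if (best != null) {
              FileInputFormat.addInputPath(conf, best);
          }
      }
  }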
Partitioned Data Set: processing flow

[Diagram: 15-minute logs (LogRpt15 yyyy0215_hhmm) arrive in HDFS from the ad-serving platform. The IncomingMR map/reduce job folds them into historic daily data sets (RawLogD 0214_r4, 0215_r4, 0216_r4, producing new revisions such as RawLogD 0214_r5). The AdvMR map/reduce job then builds aggregated daily data for advertisers (AdvD 0214_r3/0214_r4, 0215_r4, 0216_r4), which feeds reporting and predictions.]
Workflow

OpsWise scheduler
Logical Schemas and Headers

Metadata repository defines the list of columns in every data set
Each file carries a header as its first line
Job configuration files define source and target schemas
Columns are mapped dynamically based on the schema file and header information (see the sketch below)
Each data set can contain individual files of different formats
No need to modify source code if a new column is added or the order of columns changes
Support for default values if a column is missing in an older file
Easy to export to external applications (databases, reporting apps)
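A minimal sketch of the header-driven mapping, assuming tab-separated files whose first line is the header; the class and method names are hypothetical, not ContextWeb's framework API.

  import java.util.Arrays;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  public class SchemaMapper {
      private static final String DEFAULT = "";          // default for missing columns
      private final Map<String, Integer> sourceIndex = new HashMap<String, Integer>();

      // Builds a column-name -> position index from the file's header line.
      public SchemaMapper(String headerLine) {
          String[] cols = headerLine.split("\t");
          for (int i = 0; i < cols.length; i++) {
              sourceIndex.put(cols[i], i);
          }
      }

      // Re-orders a source record into the target column order, filling defaults.
      public String[] remap(String record, List<String> targetColumns) {
          String[] fields = record.split("\t", -1);
          String[] out = new String[targetColumns.size()];
          for (int i = 0; i < out.length; i++) {
              Integer idx = sourceIndex.get(targetColumns.get(i));
              out[i] = (idx != null && idx < fields.length) ? fields[idx] : DEFAULT;
          }
          return out;
      }

      public static void main(String[] args) {
          SchemaMapper m = new SchemaMapper("Date\tPublisherId\tImpressions");
          String[] row = m.remap("20091015\t42\t1000",
                  Arrays.asList("Date", "Impressions", "Revenue"));  // Revenue missing
          System.out.println(Arrays.toString(row));                  // [20091015, 1000, ]
      }
  }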
Getting Data in and out

Mix of .NET and Java applications (70%+ .NET, 30% Java)
   How to make .NET applications write logs to Hadoop?
Some 3rd-party applications consume the results of MapReduce jobs (e.g. the reporting application)
   How to make 3rd-party or internal legacy applications read data from Hadoop?
Getting Data in and out: WebDAV driver

The WebDAV server is part of the Hadoop source code tree
   Needed some minor cleanup; it was co-developed with IponWeb and is available at http://www.hadoop.iponweb.net/Home/hdfs-over-webdav
There are multiple commercial Windows WebDAV clients you can use (we use WebDrive, http://www.webdrive.com/); a plain-HTTP read example follows below
Linux
   Mount modules are available from http://dav.sourceforge.net/
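Because the WebDAV server exposes HDFS paths as ordinary HTTP URLs, any HTTP client can read a file. The following is a minimal, hypothetical Java sketch; the host, port, and file path are placeholders, not the actual deployment values.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class WebDavRead {
      public static void main(String[] args) throws Exception {
          // Placeholder host/port/path for the WebDAV-exposed HDFS file.
          URL url = new URL("http://webdav-host:9800/reports/AdvD/0215_r4/part-00000");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestMethod("GET");
          BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
          String line;
          while ((line = in.readLine()) != null) {
              System.out.println(line);   // first line is the data set header
          }
          in.close();
          conn.disconnect();
      }
  }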
Getting Data in and out: WebDav

[Diagram: Windows/Linux clients run a WebDAV client; data consumers talk to the WebDAV server, which uses the HDFS API to issue list/getProperties calls against the master (NameNode) and to stream data to and from the data nodes.]
QlikView Reporting Application

In-memory database
AJAX support for integration into web portals
Text files are supported
Understands headers
WebDAV allows data to be loaded directly from Hadoop
Coming soon: generation of QlikView files as output of Hadoop MR jobs
High Availability for NameNode/JobTracker
Goals

Availability! (But not stateful)
   Failed jobs are resubmitted by the workflow scheduler
   Target: < 5 minutes of downtime per incident
Automatic failover with no human action required
   No phone calls, no experts required
   Alert that it happened, not that it needs to be fixed
Allow for maintenance windows
Avoid failover at all cost
   Whenever possible, use redundancy inside the box
   Disks (RAID 1), network bonding, dual power supplies
Redundant Network Architecture
 • Use Linux bonding
    • See bonding.txt in the Linux kernel docs
    • Throughput advantage
       – Observed at 1.76 Gb/s
    • We use LACP, aka 802.3ad, aka mode=4
       – http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol
       – Must be supported by your switches
    • On the data nodes, too – great for rebalancing

 • Keep nodes on different switches
    • Use a dedicated crossover connection, too
Software Packages We Use for HA

Linux-HA Project's Heartbeat
    (http://www.linux-ha.org)
    Default resource manager, haresources
    Manages multiple resources:
        Virtual IP address
        DRBD disk and file system
        Hadoop init scripts (from Cloudera's distribution)

DRBD by LINBIT
   (http://www.drbd.org)
   "DRBD can be understood as network based raid-1."
Replication of NameNode Metadata

DRBD replication
   Block-level replication, file system agnostic
   The file system is active on only one node at a time
   We use synchronous replication
   Move only the data that you need! (the metadata, not the whole system)
   2.6 million files, 33k dirs, 60 TB = 1.3 GB of metadata (not a lot to move)
   Still consider running your secondary namenode on another machine and/or an NFS dir!
   LVM snapshots
   /getimage?getimage=1
   /getimage?getedit=1   (a fetch sketch follows below)
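The DRBD setup above is the primary protection; as an extra, hedged illustration (not part of the original deck), here is a minimal Java sketch that pulls the current fsimage and edits files over HTTP from the NameNode's getimage servlet, the same URLs listed above. The host name, the default NameNode HTTP port 50070, and the local backup paths are assumptions.

  import java.io.FileOutputStream;
  import java.io.InputStream;
  import java.io.OutputStream;
  import java.net.URL;

  public class FsImageBackup {
      public static void main(String[] args) throws Exception {
          // Placeholder host and local paths; 50070 is the default NameNode HTTP port.
          download("http://namenode:50070/getimage?getimage=1", "/backup/fsimage");
          download("http://namenode:50070/getimage?getedit=1", "/backup/edits");
      }

      // Streams one servlet response to a local file.
      private static void download(String url, String localFile) throws Exception {
          InputStream in = new URL(url).openStream();
          OutputStream out = new FileOutputStream(localFile);
          byte[] buf = new byte[64 * 1024];
          int n;
          while ((n = in.read(buf)) > 0) {
              out.write(buf, 0, n);
          }
          out.close();
          in.close();
      }
  }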
In the Unlikely Event of a Water Landing
  Order of events, the magic of Heartbeat
  •   Detect the failure ("deadtime" from ha.cf)
  •   Virtual IP fails over
  •   DRBD switches the primary node (/proc/drbd status)
  •   File system is fsck'd and mounted at /hadoop
  •   Hadoop processes are started via the Cloudera init scripts
  •   Optionally, the original master is rebooted (if it is still alive)
  •   End-to-end failover time is approximately 15 seconds

  Does it work?
  • Yes! Six failovers in the past 18 months
  • (only 3 were planned)
Other Options to Consider

 (or: How I Learned to Stop Worrying and Start Over From the Beginning)
  Explore additional resource management systems
      • e.g., OpenAIS + Pacemaker: N+1, N-to-N
      • Be resource aware, not just machine aware
  Consider additional file system replication methods
      • e.g., GlusterFS, Red Hat GFS
      • SAN/iSCSI backed
  Virtualized solutions?
  Other things I don't know about yet
      • Solutions to the problem exist
      • Work with something you're comfortable with
  http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/
