Grid Operations



          Hadoop Operations at LinkedIn
          Allen Wittenauer
          Grid Computing Architect


          ©2013 LinkedIn Corporation. All Rights Reserved.


Thursday, March 28, 2013
“Hadoop is not a developer problem; it’s an operations problem.”
          -- Hadoop vendor ex-employee




§ August 2009
               – 20 Nodes in 1 grid
               – Apache Hadoop 0.20.0
               – No configuration management
               – No monitoring
               – No security
               – Free for all, including random mafia hits on running jobs
               – FIFO Scheduling
               – ~20 users
               – 20 tasks per node
               – Solaris

               – No operational support




How We Fixed This (In Chronological Order)




Year One




§ Dropped task count
               – 10 mappers => 7 mappers
               – 10 reducers => 5 reducers


            § Reworked ETL
               – hourlies => dailies
               – Re-ordered to take advantage of compression
                  § 10x storage improvement
               – Sample impact on one job (not workflow!):
                  § 80,000 map tasks => 2,000 map tasks
                  § Run time cut in half


            § Optimize work flows/culture shift
                  § More task time, fewer tasks
                  § Production review to reinforce good behavio(u)r
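The slot reduction above (10 + 10 = 20 task slots per node down to 7 + 5 = 12) could be expressed as a TaskTracker configuration fragment. The sketch below is illustrative only: it assumes the limits were set through the standard Hadoop 1.x properties `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, which the slides do not state, and the helper name `slot_config` is hypothetical. It also works out the map-task reduction quoted for the sample ETL job.

```python
import xml.etree.ElementTree as ET


def slot_config(map_slots, reduce_slots):
    """Build a mapred-site.xml style fragment with per-node slot limits.

    Assumption: the per-node task counts map onto these two Hadoop 1.x
    TaskTracker properties; the slides only give the numbers.
    """
    conf = ET.Element("configuration")
    for name, value in (
        ("mapred.tasktracker.map.tasks.maximum", map_slots),
        ("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


# Numbers taken from the slide: 7 mappers and 5 reducers per node.
xml_text = slot_config(7, 5)
slots_per_node = 7 + 5

# Sample ETL job from the slide: 80,000 map tasks reduced to 2,000.
task_reduction = 80_000 / 2_000

print(xml_text)
print(slots_per_node, task_reduction)
```

The factor of 40 fewer map tasks is consistent with the "more task time, fewer tasks" goal: each remaining task does more work, so less time goes to per-task startup overhead.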



            § Switched to Capacity Scheduler
               – FIFO is terrible
               – Fair Share only viable for small tasks
               – Enforced SLAs via custom patch

            § Submitted Jar Size Limit
               – Encourage distributed cache usage
               – Enforced limit via custom patch

            § Queue layout
               – 5% ETL Tasks
               – 15% Fast Queue:
                  - Task Time < 15 Minutes
                  - Job Time < 1 Hour
                  - Slot stealing from "Slow" Queue
               – 80% Slow Queue:
                  - Job Time < 24 Hours
                  - Up to 80% of slots
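The queue split and time limits above can be written down as data and checked. This is a minimal sketch, not the custom patch the slides mention: the `Queue` class, the queue names, and the `pick_queue` helper are hypothetical, and only the percentages and time limits come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Queue:
    name: str
    capacity_pct: int
    max_task_minutes: Optional[float]  # None means no limit on the slide
    max_job_hours: Optional[float]     # None means no limit on the slide


# Percentages and limits as listed on the slide; names are illustrative.
ETL = Queue("etl", 5, None, None)
FAST = Queue("fast", 15, 15, 1)
SLOW = Queue("slow", 80, None, 24)
QUEUES = [ETL, FAST, SLOW]

# The three queues together account for the whole cluster.
total_capacity = sum(q.capacity_pct for q in QUEUES)


def pick_queue(task_minutes, job_hours):
    """Return the queue whose limits a non-ETL job fits, or None."""
    if task_minutes < FAST.max_task_minutes and job_hours < FAST.max_job_hours:
        return FAST.name
    if job_hours < SLOW.max_job_hours:
        return SLOW.name
    return None  # would violate every SLA listed on the slide


print(total_capacity)
print(pick_queue(5, 0.5), pick_queue(30, 0.5), pick_queue(5, 30))
```

A job with 5-minute tasks that finishes in half an hour fits the fast queue; longer tasks fall back to the slow queue, and anything over 24 hours fits neither.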




§ Benchmarking
              – Use production code not TeraSort!

              Old Node:            New Node:
              - 2 Rack Units       - 1 Rack Unit
              - 2 CPUs             - 2 CPUs
              - 16 GB              - 24 or 32 GB
              - 8 x 1 TB SATA      - 6 x 2 TB SATA
              - 1 x 2 gb NIC       - 1 x 1 gb NIC



           § Cut cost per unit in half
           § 2x nodes per rack
           § Extra RAM
              – buffering
              – bus speed


Year Two




§ DataNode disk partitioning
               – Separate file systems for different purposes

               [Diagram: per-disk partition layout]
                  20 GB: /, ...    |  200 GB: MR  |  HDFS
                  ...
                  5 GB: Swap       |  200 GB: MR  |  HDFS


               – Mount options: noatime, commit=30, data=writeback


            § NN, JT, etc
               – No “special hardware” == use SW RAID
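The mount options listed for the DataNode file systems could appear in `/etc/fstab` roughly as generated below. This is a sketch under stated assumptions: `commit=` and `data=writeback` are ext3/ext4 mount options, so an ext file system is assumed (the slides do not name one), and the device names, mount points, and the `fstab_line` helper are hypothetical.

```python
# Mount options exactly as listed on the slide.
MOUNT_OPTIONS = ("noatime", "commit=30", "data=writeback")


def fstab_line(device, mount_point, fs_type="ext4"):
    """Format one fstab entry for a data partition.

    Assumption: ext4 is only a placeholder default; the slides do not
    say which file system was used.
    """
    options = ",".join(MOUNT_OPTIONS)
    # Trailing "0 0": no dump, no fsck ordering for data partitions.
    return f"{device} {mount_point} {fs_type} {options} 0 0"


# Hypothetical devices and mount points, for illustration only.
lines = [
    fstab_line("/dev/sdb2", "/grid/1/mapred"),
    fstab_line("/dev/sdb3", "/grid/1/hdfs"),
]
print("\n".join(lines))
```

`noatime` avoids a metadata write on every read, while `commit=30` and `data=writeback` relax journaling in exchange for throughput. That trade-off is usually acceptable for HDFS block storage because blocks are replicated across nodes.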




[Diagram: LDAP/KDC layout, two sides joined by multi-master replication]
   LDAP Master + KDC Master   <= Multi-Master Replication =>   LDAP Master + KDC
   LDAP/KDC Slaves                                             LDAP/KDC Slaves
   username, uid                                               username, uid
   group name, gid                                             group name, gid
   netgroup, sudoers                                           netgroup, sudoers
   nscd                                                        nscd
   Client Node                                                 Client Node


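[Diagram: bcfg2 configuration management]
   Host (bcfg2 client)  <->  bcfg2 Server
   Exchanged between them: Group1, Group2, ... and Svc1 + Svc2 + Svc3 Content
   Group-to-service mapping on the bcfg2 Server:
      Group1 -> Svc1, Svc2, ...
      Group2 -> Svc1, Svc3, ...
      Group3 -> Svc4, Svc5, ...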




            § Service Bundle
               – RPMs, config files, etc
               – Conflict resolution




§ Different RPM names + different install locations = pre-deploy-ability:



                   Object                     RPM Name                             File Path
                   Hadoop 1.0.4-p3 Binaries   hadoop-1043-bin-1.0.4-3              /dir/hadoop-1.0.4-p3
                   Grid Config for 1.0.4-p3   gridname-1043-hadoopconf-1.0.4.3-1   /dir/grid-conf-1.0.4-p3
                   Hadoop 1.1.2-p1 Binaries   hadoop-1121-bin-1.1.2.1-1            /dir/hadoop-1.1.2-p1
                   Grid Config for 1.1.2-p1   gridname-1043-hadoopconf-1.0.4.3-1   /dir/grid-conf-1.1.2-p1
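The point of the table is that two releases never share a package name or an install path, so a new release can be installed next to the running one before any switch happens. The sketch below illustrates that idea only. The helpers `version_tag` and `install_paths` are hypothetical, the tag scheme is inferred from the `1043` and `1121` fragments in the table, and `/dir` is the placeholder base path used on the slide.

```python
import re


def version_tag(version):
    """Collapse a version such as '1.0.4-p3' to its digits, e.g. '1043'."""
    return "".join(re.findall(r"\d", version))


def install_paths(version, base="/dir"):
    """Return the per-version install locations shown in the table."""
    return {
        "binaries": f"{base}/hadoop-{version}",
        "config": f"{base}/grid-conf-{version}",
    }


current = install_paths("1.0.4-p3")
upcoming = install_paths("1.1.2-p1")

# No path is shared, so both releases can sit on disk at the same time.
shared_paths = set(current.values()) & set(upcoming.values())

print(version_tag("1.0.4-p3"), version_tag("1.1.2-p1"))
print(current, upcoming, shared_paths)
```

Because the paths are disjoint, installing the new release does not touch the running one. How the active version is then selected is not described on the slide.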




Year Three+




[Diagram: Kerberos cross-realm setup]
   Corp IT:            Active Directory @CORP, Password
   Between realms:     krbtgt/GRID@CORP
   Grid Realm @GRID:   krbtgt/host@GRID, krbtgt/service@GRID
   Hadoop Services:    krbtgt/user@CORP, krbtgt/GRID@CORP




Many months moving to secure Apache Hadoop...




§ March 2013
               – 5000 Nodes in ~10 grids
               – Apache Hadoop 1.0.4 + custom patches
               – Full configuration management
               – Full monitoring
               – Security
               – Capacity scheduler with SLA
               – ~700 users
               – 12 tasks per node
               – Linux

               – Five dedicated operations staff members




Future Work




Is ‘pure Hadoop’ the right tool for all of our workloads?




[Diagram labels]
   YARN   PBS
   HDFS
   CEPH




§ More on LinkedIn Hadoop Performance:
               – http://www.slideshare.net/allenwittenauer/2012-lihadoopperf


            § LinkedIn Data Analytics:
               – http://data.linkedin.com/




          ©2013 LinkedIn Corporation. All Rights Reserved.                     GRID OPERATIONS

Thursday, March 28, 2013

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaCloudera, Inc.
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureCloudera, Inc.
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsDataWorks Summit
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineDataWorks Summit
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloudSteve Loughran
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsJignesh Shah
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 

Was ist angesagt? (20)

Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
YARN
YARNYARN
YARN
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing Hadoop
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Hadoop on VMware
Hadoop on VMwareHadoop on VMware
Hadoop on VMware
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 

Andere mochten auch

Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hortonworks
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingCloudera, Inc.
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerIMC Institute
 

Andere mochten auch (8)

Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainer
 

Ähnlich wie Hadoop Operations at LinkedIn

Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedInAllen Wittenauer
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedInAllen Wittenauer
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)DataWorks Summit
 
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014Dave Stokes
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
Get to know the browser better and write faster web apps
Get to know the browser better   and write faster web appsGet to know the browser better   and write faster web apps
Get to know the browser better and write faster web appsLior Bar-On
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Clusterpercona2013
 
SaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertoolsSaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertoolsThomas Jackson
 
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power ToolsSaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power ToolsSaltStack
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data DeduplicationRedWireServices
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRVijay Rayapati
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Community
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Community
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccsrisatish ambati
 
Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Valerii Kravchuk
 
Java Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real worldJava Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real worldКирилл Толкачёв
 

Ähnlich wie Hadoop Operations at LinkedIn (20)

Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedIn
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
 
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
Get to know the browser better and write faster web apps
Get to know the browser better   and write faster web appsGet to know the browser better   and write faster web apps
Get to know the browser better and write faster web apps
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Cluster
 
Spark 101
Spark 101Spark 101
Spark 101
 
SaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertoolsSaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertools
 
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power ToolsSaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)
 
Java Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real worldJava Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real world
 

Hadoop Operations at LinkedIn

  • 1. Grid Operations Hadoop Operations at LinkedIn Allen Wittenauer Grid Computing Architect ©2013 LinkedIn Corporation. All Rights Reserved. Thursday, March 28, 2013
  • 2. “Hadoop is not a developer problem; it’s an operations problem.” -- Hadoop vendor ex-employee
  • 4. § August 2009
        – 20 Nodes in 1 grid
        – Apache Hadoop 0.20.0
        – No configuration management
        – No monitoring
        – No security
        – Free for all, including random mafia hits on running jobs
        – FIFO Scheduling
        – ~20 users
        – 20 tasks per node
        – Solaris
        – No operational support
  • 6. How We Fixed This (In Chronological Order)
  • 7. Year One
  • 8. § Dropped task count
        – 10 mappers => 7 mappers
        – 10 reducers => 5 reducers
      § Reworked ETL
        – hourlies => dailies
        – Re-ordered to take advantage of compression
          § 10x storage improvement
        – Sample impact on one job (not workflow!):
          § 80,000 map tasks => 2,000 map tasks
          § Run time cut in half
      § Optimize work flows/culture shift
          § More task time, fewer tasks
          § Production review to reinforce good behavio(u)r
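The figures on slide 8 can be checked with a few lines of arithmetic. This is only a sketch: it assumes the mapper and reducer counts are per-node task slots, which is consistent with the "20 tasks per node" and "12 tasks per node" figures elsewhere in the deck, and the helper name `percent_reduction` is made up for illustration.

```python
# Worked arithmetic for the slide 8 figures.
# Assumption: mapper/reducer counts are per-node task slots.

def percent_reduction(before: int, after: int) -> float:
    """Return the percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before

slots_before = 10 + 10   # mappers + reducers before the change
slots_after = 7 + 5      # mappers + reducers after the change

map_tasks_before = 80_000
map_tasks_after = 2_000

print(f"slots per node: {slots_before} -> {slots_after}")
print(f"slot reduction: {percent_reduction(slots_before, slots_after):.0f}%")
print(f"map task reduction: {percent_reduction(map_tasks_before, map_tasks_after):.1f}%")
print(f"map tasks shrink by a factor of {map_tasks_before // map_tasks_after}")
```

Under that assumption, the node drops from 20 to 12 concurrent tasks, and the sample job runs 40 times fewer map tasks.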
  • 9. § Switched to Capacity Scheduler
        – FIFO is terrible
        – Fair Share only viable for small tasks
        – Enforced SLAs via custom patch
      § Submitted Jar Size Limit
        – Encourage distributed cache usage
        – Enforced limit via custom patch
      Queues:
        5% ETL Tasks
        15% Fast Queue:
          - Task Time < 15 Minutes
          - Job Time < 1 Hour
          - Slot stealing from "Slow" Queue
        80% Slow Queue:
          - Job Time < 24 Hours
          - Up to 80% of slots
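The queue split on slide 9 (5% ETL, 15% fast, 80% slow) can be sketched as a small model. The percentages and job-time limits come from the slide; everything else here is hypothetical, including the names `Queue`, `guaranteed_slots` and `pick_queue`. None of it is a Hadoop API, and the custom SLA patch and slot stealing are not modelled.

```python
# Hypothetical model of the slide 9 queue layout; not Hadoop code.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Queue:
    name: str
    share_percent: int              # share of cluster slots from the slide
    max_job_hours: Optional[float]  # None where the slide gives no job limit


QUEUES = [
    Queue("etl", 5, None),
    Queue("fast", 15, 1.0),    # Job Time < 1 Hour
    Queue("slow", 80, 24.0),   # Job Time < 24 Hours
]


def guaranteed_slots(total_slots: int) -> Dict[str, int]:
    """Split total_slots across the queues by their percentage share."""
    return {q.name: total_slots * q.share_percent // 100 for q in QUEUES}


def pick_queue(expected_job_hours: float) -> str:
    """Pick the tightest time-limited queue whose job limit still fits."""
    candidates = [
        q for q in QUEUES
        if q.max_job_hours is not None and expected_job_hours < q.max_job_hours
    ]
    if not candidates:
        raise ValueError("job exceeds every queue's job-time limit")
    return min(candidates, key=lambda q: q.max_job_hours).name


print(guaranteed_slots(1000))
print(pick_queue(0.5), pick_queue(6))
```

On a hypothetical 1,000-slot cluster this gives 50, 150 and 800 slots; a half-hour job lands in the fast queue and a six-hour job in the slow queue.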
  • 10. § Benchmarking
        – Use production code not TeraSort!

           Old Node:            New Node:
           - 2 Rack Units       - 1 Rack Unit
           - 2 CPUs             - 2 CPUs
           - 16 GB              - 24 or 32 GB
           - 8 x 1 TB SATA      - 6 x 2 TB SATA
           - 1 x 2 gb NIC       - 1 x 1 gb NIC

      § Cut cost per unit in half
      § 2x nodes per rack
      § Extra RAM
        – buffering
        – bus speed
  • 12. Year Two
  • 14. § DataNode disk partitioning
        – Separate file systems for different purposes
          [Diagram: disk layouts: 20 GB (/, ...) | 200 GB (MR) | HDFS; ...; 5 GB (Swap) | 200 GB (MR) | HDFS]
        – Mount options: noatime, commit=30, data=writeback
      § NN, JT, etc
        – No “special hardware” == use SW RAID
  • 15. [Diagram: LDAP Master + KDC servers with multi-master replication; LDAP/KDC slaves; username/uid, group name/gid, netgroup, and sudoers data; nscd on each client node]
  • 17. [Diagram: a host running the bcfg2 client (Group1, Group2, ...) and the bcfg2 server, which maps groups to services and supplies Svc1 + Svc2 + Svc3 content]
        Group1 -> Svc1, Svc2, ...
        Group2 -> Svc1, Svc3, ...
        Group3 -> Svc4, Svc5, ...
      § Service Bundle
        – RPMs, config files, etc
        – Conflict resolution
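The group-to-service mapping on slide 17 can be sketched as a lookup that merges the services of every group a host belongs to. This is a hypothetical model, not bcfg2's API or file format. The names `GROUP_SERVICES` and `services_for_host` are made up, and only the services spelled out on the slide are included (its "..." entries are left out).

```python
# Hypothetical model of the slide 17 mapping; not bcfg2's API or file format.
from typing import Dict, List

GROUP_SERVICES: Dict[str, List[str]] = {
    "Group1": ["Svc1", "Svc2"],
    "Group2": ["Svc1", "Svc3"],
    "Group3": ["Svc4", "Svc5"],
}


def services_for_host(groups: List[str]) -> List[str]:
    """Return the de-duplicated services for a host, in first-seen order."""
    seen: List[str] = []
    for group in groups:
        for svc in GROUP_SERVICES[group]:
            if svc not in seen:
                seen.append(svc)
    return seen


# A host in Group1 and Group2 gets Svc1 once, plus Svc2 and Svc3.
print(services_for_host(["Group1", "Group2"]))  # ['Svc1', 'Svc2', 'Svc3']
```

A service shared by two groups (here Svc1) is delivered once, which matches the "Svc1 + Svc2 + Svc3" content shown for a host in Group1 and Group2.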
  • 18. § Different RPM names + different install locations = pre-deploy-ability:

           Object                     RPM Name                             File Path
           Hadoop 1.0.4-p3 Binaries   hadoop-1043-bin-1.0.4-3              /dir/hadoop-1.0.4-p3
           Grid Config for 1.0.4-p3   gridname-1043-hadoopconf-1.0.4.3-1   /dir/grid-conf-1.0.4-p3
           Hadoop 1.1.2-p1 Binaries   hadoop-1121-bin-1.1.2.1-1            /dir/hadoop-1.1.2-p1
           Grid Config for 1.1.2-p1   gridname-1043-hadoopconf-1.0.4.3-1   /dir/grid-conf-1.1.2-p1
  • 19. Year Three+
  • 20. [Diagram: Corp IT realm (@CORP, Active Directory, Password) and Grid Realm (@GRID, Hadoop Services), linked by krbtgt/GRID@CORP; principals shown: krbtgt/host@GRID, krbtgt/service@GRID, krbtgt/user@CORP]
  • 21. Many months moving to secure Apache Hadoop...
  • 24. § March 2013
        – 5000 Nodes in ~10 grids
        – Apache Hadoop 1.0.4 + custom patches
        – Full configuration management
        – Full monitoring
        – Security
        – Capacity scheduler with SLA
        – ~700 users
        – 12 tasks per node
        – Linux
        – Five dedicated operations staff members
  • 26. Future Work
  • 27. Is ‘pure Hadoop’ the right tool for all of our workloads?
  • 28. [Diagram: YARN, PBS, HDFS, CEPH]
  • 30. § More on LinkedIn Hadoop Performance:
        – http://www.slideshare.net/allenwittenauer/2012-lihadoopperf
      § LinkedIn Data Analytics:
        – http://data.linkedin.com/