SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Status of Hadoop 0.23
Operations at Yahoo!
From Inception to Customer Validation

Charles Wimmer, Staff Site Reliability
        Engineer at LinkedIn
Summary of This Talk
● Includes
  ○ Operational changes required to support 0.23
● Does not include
  ○ Specifics about customer testing
  ○ Deployment into Research or Production clusters
Scope of This Change at Yahoo!
●   42,000+ Hadoop servers
●   20+ clusters
●   Three tiers: Sandbox, Research, Production
●   0.20.205.x
Overview of the Process
● Provide customers a 0.23 Sandbox cluster
● Provide customers enough data to test their
  applications
● Provide developer support to address
  application issues quickly
● Upgrade Research and Production clusters
  as applications are certified to work with 0.23
Test Cluster
●   420 Nodes
●   2 x Westmere 4 core processors
●   24G RAM
●   12 x 2T Disks
●   No Federation
Configuration
● Hierarchical Queues
● Memory Configuration
● Kerberos
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>BIZUNIT-A,BIZUNIT-U,BIZUNIT-C,unfunded</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>100</value>
</property>
Hierarchical Queues
<property>
 <name>yarn.scheduler.capacity.root.BIZUNIT-A.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-C.capacity</name>
  <value>15</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
  <value>5</value>
</property>
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-a.
capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-b.
capacity</name>
  <value>50</value>
</property>
Hierarchical Queues
Memory Configuration
  <property> <name>yarn.
nodemanager.resource.memory-
mb</name>
  <value>21504</value>
  </property>
Kerberos Configuration
  <property>
   <name>yarn.resourcemanager.principal</name>
    <value>mapred/clustername-jt1.domain.name.com@REALM.
NAME.COM</value>
  </property>



  <property>
   <name>yarn.nodemanager.principal</name>
   <value>tt/_HOST@REALM.NAME.COM</value>
  </property>
Init Scripts
●   DataNode/NameNode
●   SecondaryNameNode
●   HistoryServer
●   NodeManager
●   ResourceManager
DataNode/NameNode
start_20(){
 . . .
}
start_next(){
 . . .
}
if [ -x /home/gs/hadoop/current/bin/hdfs ] ; then
   start_next $@
else
   start_20 $@
fi
SecondaryNameNode
function clean_checkpoint_dir {
 CHECKPOINT_DIR=/grid/0/tmp/hadoop-
hdfs/dfs/namesecondary/current
 if [ -d "$CHECKPOINT_DIR" ] ; then
   DELETE_DIR=`mktemp -p /grid/0/tmp -d delete-XXXXXX`
    if [ $? -eq 0 ] ; then
    echo "moving $CHECKPOINT_DIR to ${DELETE_DIR}/ "
    mv $CHECKPOINT_DIR ${DELETE_DIR}/
    cat<<EOF | at now+1min 2>/dev/null
if [ -d $DELETE_DIR ] ; then
    rm -rf --preserve-root $DELETE_DIR
fi
EOF
    fi
 fi
}
HistoryServer
case "$1" in
 start)
   su $HADOOP_USER -s /bin/sh -c 
"$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh
--config $HADOOP_CONF_DIR start
historyserver"
   RET=$?
   ;;
ResourceManager/NodeManager
case "$1" in
 start)
  su $HADOOP_USER -s /bin/sh -c 
"$HADOOP_PREFIX/sbin/yarn-daemon.sh --config
$HADOOP_CONF_DIR start $PROC"
  RET=$?
  ;;
Questions?
Charles Wimmer

@cwimmer

charles@wimmer.net

cwimmer@linkedin.com

Weitere ähnliche Inhalte

Was ist angesagt?

My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backupAndrejs Vorobjovs
 
MySQL Backup and Security Best Practices
MySQL Backup and Security Best PracticesMySQL Backup and Security Best Practices
MySQL Backup and Security Best PracticesLenz Grimmer
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymenthyeongchae lee
 
MySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningMySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningLenz Grimmer
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
MySQL Performance Tuning Variables
MySQL Performance Tuning VariablesMySQL Performance Tuning Variables
MySQL Performance Tuning VariablesFromDual GmbH
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016PostgreSQL-Consulting
 
[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩NHN FORWARD
 
Instroduce Hazelcast
Instroduce HazelcastInstroduce Hazelcast
Instroduce Hazelcastlunfu zhong
 
Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?Zaar Hai
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryMongoDB
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained Mydbops
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRWoonsan Ko
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesSperasoft
 
Postgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStackPostgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStackKamesh Pemmaraju
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Denish Patel
 

Was ist angesagt? (20)

My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backup
 
MySQL Backup and Security Best Practices
MySQL Backup and Security Best PracticesMySQL Backup and Security Best Practices
MySQL Backup and Security Best Practices
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
 
Broker otw.pptx
Broker otw.pptxBroker otw.pptx
Broker otw.pptx
 
MySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningMySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery Planning
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
MySQL Performance Tuning Variables
MySQL Performance Tuning VariablesMySQL Performance Tuning Variables
MySQL Performance Tuning Variables
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
 
SQL Server vs Postgres
SQL Server vs PostgresSQL Server vs Postgres
SQL Server vs Postgres
 
[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩
 
Instroduce Hazelcast
Instroduce HazelcastInstroduce Hazelcast
Instroduce Hazelcast
 
Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCR
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
 
Postgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStackPostgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStack
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
 

Ähnlich wie Status of Hadoop 0.23 Operations at Yahoo

Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastJorge Lopez-Malla
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Matt Fuller
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQLMydbops
 
Trouble shooting apachecloudstack
Trouble shooting apachecloudstackTrouble shooting apachecloudstack
Trouble shooting apachecloudstackSailaja Sunil
 
Oracle database 12.2 new features
Oracle database 12.2 new featuresOracle database 12.2 new features
Oracle database 12.2 new featuresAlfredo Krieg
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensionsChiou-Nan Chen
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDoKC
 
REST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using RESTREST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using RESTChristian Gohmann
 
SQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud FeaturesSQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud FeaturesGuillermo Caicedo
 
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...Timofey Turenko
 
Craft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurationsCraft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurationsNate Iler
 
Oracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best PracticesOracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best PracticesBobby Curtis
 
Moving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway MeetupMoving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway MeetupGiulio Vian
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheusBob Cotton
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppet
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackRadhika Puthiyetath
 
Monitoring your technology stack with New Relic
Monitoring your technology stack with New RelicMonitoring your technology stack with New Relic
Monitoring your technology stack with New RelicRonald Bradford
 
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...European Collaboration Summit
 

Ähnlich wie Status of Hadoop 0.23 Operations at Yahoo (20)

Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit east
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQL
 
Trouble shooting apachecloudstack
Trouble shooting apachecloudstackTrouble shooting apachecloudstack
Trouble shooting apachecloudstack
 
Oracle database 12.2 new features
Oracle database 12.2 new featuresOracle database 12.2 new features
Oracle database 12.2 new features
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensions
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on Kubernetes
 
REST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using RESTREST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using REST
 
Ci for all
Ci for allCi for all
Ci for all
 
SQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud FeaturesSQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud Features
 
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
 
Craft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurationsCraft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurations
 
Oracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best PracticesOracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best Practices
 
Moving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway MeetupMoving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway Meetup
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache Cloudstack
 
Monitoring your technology stack with New Relic
Monitoring your technology stack with New RelicMonitoring your technology stack with New Relic
Monitoring your technology stack with New Relic
 
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Status of Hadoop 0.23 Operations at Yahoo

  • 1. Status of Hadoop 0.23 Operations at Yahoo! From Inception to Customer Validation Charles Wimmer, Staff Site Reliability Engineer at LinkedIn
  • 2. Summary of This Talk ● Includes ○ Operational changes required to support 0.23 ● Does not include ○ Specifics about customer testing ○ Deployment into Research or Production clusters
  • 3. Scope of This Change at Yahoo! ● 42,000+ Hadoop servers ● 20+ clusters ● Three tiers: Sandbox, Research, Production ● 0.20.205.x
  • 4. Overview of the Process ● Provide customers a 0.23 Sandbox cluster ● Provide customers enough data to test their applications ● Provide developer support to address application issues quickly ● Upgrade Research and Production clusters as applications are certified to work with 0.23
  • 5. Test Cluster ● 420 Nodes ● 2 x Westmere 4 core processors ● 24G RAM ● 12 x 2T Disks ● No Federation
  • 6. Configuration ● Hierarchical Queues ● Memory Configuration ● Kerberos
  • 7. Hierarchical Queues <property> <name>yarn.scheduler.capacity.root.queues</name> <value>BIZUNIT-A,BIZUNIT-U,BIZUNIT-C,unfunded</value> </property> <property> <name>yarn.scheduler.capacity.root.capacity</name> <value>100</value> </property>
  • 8. Hierarchical Queues <property> <name>yarn.scheduler.capacity.root.BIZUNIT-A.capacity</name> <value>50</value> </property> <property> <name>yarn.scheduler.capacity.root.BIZUNIT-U.capacity</name> <value>30</value> </property> <property> <name>yarn.scheduler.capacity.root.BIZUNIT-C.capacity</name> <value>15</value> </property> <property> <name>yarn.scheduler.capacity.root.unfunded.capacity</name> <value>5</value> </property>
  • 9. Hierarchical Queues <property> <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-a. capacity</name> <value>50</value> </property> <property> <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-b. capacity</name> <value>50</value> </property>
  • 11. Memory Configuration <property> <name>yarn. nodemanager.resource.memory- mb</name> <value>21504</value> </property>
  • 12. Kerberos Configuration <property> <name>yarn.resourcemanager.principal</name> <value>mapred/clustername-jt1.domain.name.com@REALM. NAME.COM</value> </property> <property> <name>yarn.nodemanager.principal</name> <value>tt/_HOST@REALM.NAME.COM</value> </property>
  • 13. Init Scripts ● DataNode/NameNode ● SecondaryNameNode ● HistoryServer ● NodeManager ● ResourceManager
  • 14. DataNode/NameNode start_20(){ . . . } start_next(){ . . . } if [ -x /home/gs/hadoop/current/bin/hdfs ] ; then start_next $@ else start_20 $@ fi
  • 15. SecondaryNameNode function clean_checkpoint_dir { CHECKPOINT_DIR=/grid/0/tmp/hadoop- hdfs/dfs/namesecondary/current if [ -d "$CHECKPOINT_DIR" ] ; then DELETE_DIR=`mktemp -p /grid/0/tmp -d delete-XXXXXX` if [ $? -eq 0 ] ; then echo "moving $CHECKPOINT_DIR to ${DELETE_DIR}/ " mv $CHECKPOINT_DIR ${DELETE_DIR}/ cat<<EOF | at now+1min 2>/dev/null if [ -d $DELETE_DIR ] ; then rm -rf --preserve-root $DELETE_DIR fi EOF fi fi }
  • 16. HistoryServer case "$1" in start) su $HADOOP_USER -s /bin/sh -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver" RET=$? ;;
  • 17. ResourceManager/NodeManager case "$1" in start) su $HADOOP_USER -s /bin/sh -c "$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start $PROC" RET=$? ;;