Hadoop at Rakuten.

          Rakuten Inc. Architect Group
Hamba Mitsuharu & Nakagawa Gen 2011/07/06(Wed)   1
Hadoop at Rakuten.



Today’s Agenda.
1. Our Profile.
2. What is Hadoop?
3. Our Current Hadoop System Overview.
4. Our Hadoop Usage.
5. Our Challenge.
6. Our Future Plan.
                                         2
Hadoop at Rakuten.




Our Profile.


                      3
Our Profile.



From ACT Group
Nakagawa Gen
Hamba Mitsuharu


                       4
Our Profile.



Our Mission
Enhancing Hadoop at Rakuten.




                               5
Our Profile.



Our Latest Tasks.
Done.
1.Implementing Ganglia.
2.Implementing HA.
                          6
Our Profile.



Our Latest Tasks.
Now Handing Over.
1. Keeping Up Our Hadoop Cluster.
2. Modifying Our Hadoop Configurations.
3. Implementing Scripts for Daily Chores.

                                            7
Our Profile.



Our Latest Tasks.
Now Concentrating On.
1. Evaluating The Related Products.



                                      8
Hadoop at Rakuten.




What is Hadoop?


                         9
What is Hadoop?



One of The Most Powerful
Distributed Processing Frameworks
for Large Data Sets.


                          10
What is Hadoop?



Distributions.




                           11
What is Hadoop?



Ecosystem.




                ETC...
                          12
What is Hadoop?



HDFS & MapReduce
Constitute Hadoop.
HDFS :
Hadoop Distributed File System.
MapReduce :
Map & Reduce (Includes Shuffle & Sort) .
                                           13
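The Map, Shuffle & Sort, and Reduce phases can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not the Hadoop API: the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) and the sample input are assumptions made for this sketch, and a real job would read its input from HDFS and write its output back to HDFS.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return [(key, sum(values)) for key, values in grouped]


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a file stored on HDFS.
    sample = ["Hadoop at Rakuten", "hadoop stores data in HDFS"]
    for word, count in word_count(sample):
        print(word, count)
```

In Hadoop itself the map and reduce functions run in parallel on many machines, and the framework performs the shuffle and sort between them.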
What is Hadoop?


Input from HDFS.
         Process by MapReduce.
                       Output to HDFS.




                     Source : http://horicky.blogspot.com/2008_11_01_archive.html
                                                                               14
What is Hadoop?


Simple Example.




                     Source : http://techblog.yahoo.co.jp/cat207/cat209/hadoop/
                                                                            15
What is Hadoop?


In The Common Case,
Several Simple Jobs Are Combined.




                   Source : http://horicky.blogspot.com/2008_11_01_archive.html
                                                                             16
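Combining simple jobs means that the output of one job becomes the input of the next. The sketch below chains two toy jobs in plain Python: the first counts words, the second counts how many distinct words share each frequency. The helper run_job and the mapper and reducer names are hypothetical and exist only for this illustration; on a real cluster each job would write to HDFS and the next job would read from there.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory MapReduce-style job over (key, value) records."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Sorting the keys stands in for the framework's sort step.
    return [(k, reducer(k, vs)) for k, vs in sorted(groups.items())]


def split_words(_, line):
    """Job 1 mapper: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield (word, 1)


def by_frequency(_, count):
    """Job 2 mapper: re-key each word by how often it appeared."""
    yield (count, 1)


def total(_, values):
    """Reducer shared by both jobs: sum the values for a key."""
    return sum(values)


def frequency_histogram(lines):
    """Chain the jobs: the output of job 1 is the input of job 2."""
    counts = run_job(enumerate(lines), split_words, total)
    return run_job(counts, by_frequency, total)


if __name__ == "__main__":
    print(frequency_histogram(["a b a", "b c"]))
```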
What is Hadoop?


NameNode & DataNode
Constitute
HDFS.



                 Source : http://horicky.blogspot.com/2008_11_01_archive.html
                                                                           17
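The division of labor between the NameNode and the DataNodes can be sketched as a toy model: the NameNode keeps only metadata (which blocks make up a file and where each replica lives), while the DataNodes hold the block contents. Everything here is an assumption made for illustration: the class and function names, the tiny 4-byte block size, the replication factor of 2, and the round-robin placement. Real HDFS uses far larger blocks, rack-aware placement, and a write pipeline between DataNodes.

```python
class DataNode:
    """Stores block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Holds metadata only: the blocks of each file and their locations."""

    def __init__(self, datanode_names, replication=2):
        self.datanode_names = list(datanode_names)
        self.replication = replication
        self.files = {}  # path -> list of (block_id, [datanode names])
        self._next_block_id = 0

    def allocate_block(self, path):
        """Register a new block for path and choose DataNodes for it."""
        block_id = self._next_block_id
        self._next_block_id += 1
        n = len(self.datanode_names)
        # Round-robin placement: a simplification, not the HDFS policy.
        targets = [self.datanode_names[(block_id + i) % n]
                   for i in range(self.replication)]
        self.files.setdefault(path, []).append((block_id, targets))
        return block_id, targets

    def locate(self, path):
        """Return the block list and replica locations for path."""
        return list(self.files[path])


def write_file(namenode, datanodes, path, data, block_size=4):
    """Client write: ask the NameNode where to put each block, then store it."""
    for start in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        for name in targets:
            datanodes[name].blocks[block_id] = data[start:start + block_size]


def read_file(namenode, datanodes, path):
    """Client read: get block locations, then fetch from the first replica."""
    return b"".join(datanodes[targets[0]].blocks[block_id]
                    for block_id, targets in namenode.locate(path))


if __name__ == "__main__":
    nodes = {name: DataNode(name) for name in ["dn1", "dn2", "dn3"]}
    nn = NameNode(nodes)
    write_file(nn, nodes, "/logs/sample", b"hello hadoop")
    print(read_file(nn, nodes, "/logs/sample"))
```

Because the file data never passes through the NameNode, it stays small, but every read and write depends on it. That dependency is why it is a single point of failure.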
What is Hadoop?


Read & Write on HDFS.




     Source : http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes
                                                                                                18
What is Hadoop?


JobTracker & TaskTracker
Constitute
MapReduce.




Source : http://horicky.blogspot.com/2008_11_01_archive.html
                                                               19
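The JobTracker decides where tasks run, and the TaskTrackers execute them, preferably on a machine that already holds the input data. The sketch below shows that preference as a simple two-pass assignment. It is a hypothetical simplification: the function name assign_tasks and its input shapes are assumptions, and the real JobTracker hands out tasks in response to TaskTracker heartbeats rather than in one batch.

```python
def assign_tasks(splits, trackers):
    """Assign input splits to TaskTrackers, preferring data-local hosts.

    splits:   {split_id: set of hosts that hold the split's block}
    trackers: {host: number of free task slots}
    Returns   {split_id: host}; splits that find no free slot are left out.
    """
    free = dict(trackers)
    assignment = {}
    pending = []

    # First pass: place each split on a host that already stores its data.
    for split_id, hosts in splits.items():
        local = next((h for h in sorted(hosts) if free.get(h, 0) > 0), None)
        if local is None:
            pending.append(split_id)
        else:
            assignment[split_id] = local
            free[local] -= 1

    # Second pass: fall back to any host with a free slot (remote read).
    for split_id in pending:
        host = next((h for h in sorted(free) if free[h] > 0), None)
        if host is None:
            break  # no capacity left; the remaining splits must wait
        assignment[split_id] = host
        free[host] -= 1

    return assignment


if __name__ == "__main__":
    # Hypothetical splits and trackers for illustration.
    splits = {"s1": {"tt1"}, "s2": {"tt1"}, "s3": {"tt2"}}
    trackers = {"tt1": 1, "tt2": 1, "tt3": 1}
    print(assign_tasks(splits, trackers))
```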
What is Hadoop?


Good & Bad Points of Hadoop.
 Good!
Easy to Scale Out The System.
Easy to Implement Distributed Processing.
 Bad…
The NameNode Is a SPoF (Single Point of Failure).

                                            20
Hadoop at Rakuten.




Our Current Hadoop
 System Overview.

                          21
Our Current Hadoop System Overview.


The Cluster Infrastructure. #1
For Instance.




                       Source : http://www.ibm.com/developerworks/linux/library/l-hadoop/
                                                                                       22
Our Current Hadoop System Overview.


The Cluster Infrastructure. #2
In Our Case.
3 Masters & 69 Slaves.
[Diagram: The Client connects at 1Gbps to a top-level Switch, which links at 1Gbps to three Switches. Each of the three Switches serves a row of Racks holding one master (NN&JT Active, SNN, or NN&JT Standby), DN&TT groups labeled x3, x10, and x10, and an "Others" group labeled x18.]
                                                                                                                        23
Our Current Hadoop System Overview.


The Monitoring System.
Using Ganglia (& MRTG).
We Can Easily Check
The Resource Usage at Any Time,
Not Only per Machine
But Also for The Whole Cluster.


                                             24
Our Current Hadoop System Overview.


High Availability.
Using DRBD & Heartbeat.
NN : NameNode
JT : JobTracker
[Diagram: The Client reaches the masters through v-host.rakuten.co.jp. The Active and Standby machines each run NN and JT, each have eth1 and eth0 interfaces, and each hold /foo/drbd0 and /foo/drbd1. DRBD Syncs The Changes between them.]
Source : Gen
                                                                                                             25
Hadoop at Rakuten.




Our Hadoop Usage.


                          26
Our Hadoop Usage.


Who Is Using Our Hadoop.
1.   Generating Recommendation Engine Index.
2.   Analyzing Redirect Log.
3.   Calculating AD Targeting Index.
4.   Measuring AD Effects.
5.   Analyzing Ichiba Merchandise & Order Info.
6.   Calculating Ichiba Product Ranking.
7.   Analyzing Search Log.

8. Analyzing Rakuten Travel’s Access Log. (Coming Soon...)
9. Analyzing Search Word N-gram.            (Coming Soon...)
                                                               27
Our Hadoop Usage.


The Issues of The Previous System.
1. Keeping Up The RDBMS Is Costly.
2. Need More & More Storage Space.
3. System Cannot Handle So Many Job Requests
   Due to Low Performance.
[Diagram: Previous System. Sources (Purchase, Shop, Category, ITEM) are Unloaded to Files and Loaded into Intermediate databases. The Batch Server Manipulates the data and produces Files for the destinations (Marketing, Utility, NFS, Mail).]
                                                                                          28
Our Hadoop Usage.


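The Effect of The New System.
1. Get a Scalable System at Very Low Cost. (Storage Cost 80% Off.)
2. Transaction Time Is Dramatically Improved. (50-75% Off.)
[Diagram: New System! 1st Step. The sources (Purchase, Shop, Category, ITEM), the Unload / Load / Manipulate steps, the Files, and the destinations (Marketing, Utility, NFS, Mail) are the same as in the previous system. The Batch Server now runs "with" a product shown as a logo, and a single Intermediate store remains.]
                                                                                          29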
Our Hadoop Usage.


The Remaining Issues of
The New System.
1. Still Halfway to Our DWH Goal.
2. Negative Effects of The Migration
   from a Dedicated Environment to a Shared
   Environment.
   1. Security.
   2. Sharing Cluster Resources.
                                                 30
Hadoop at Rakuten.




Our Challenge.


                        31
Our Challenge.


The Issues with Our Hadoop.
1. Likely to Use Up The HDFS Space.
2. Need Much Electric Power.
3. Need to Share The Cluster Resources Efficiently.
4. Need More Network Bandwidth.



                                             32
Hadoop at Rakuten.




Our Future Plan.


                         33
Our Future Plan.


Considering a New Slave Machine.
Now Looking for a Machine Which Has…
Low Electric Power Consumption,
About 6-Core CPU x2,
About 10TB HDD,
About 96GB Memory,
& Naturally Compatible With Our Data Center.

                                   ?           34
Our Future Plan.


Upgrade from Apache to CDH3.
      Mr. Eric Sammer (Solution Architect at Cloudera)
      Described the Advantages of Getting Hadoop from Cloudera on Quora.
1.    A version of Hadoop that has frequent releases (quarterly) that include bug fixes
      and back ported features (append for HBase, Kerberos security from Y!, etc.).
2.    Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and
      work as a cohesive system.
3.    Simplified installation via Yum / Apt repositories.
4.    Tighter integration with the OS (init scripts for daemons, installation of things in
      common paths, logs in their proper location.).
5.    A fixed release schedule.
6.    Support available from Cloudera with SLAs.
     Source : http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation
                                                                                                                                           35
Our Future Plan.


Evaluating HBase Using AWS.
Constructing HBase Cluster on Amazon EC2.
Doing Evaluation & Verification This Summer!




                                               36
Hadoop at Rakuten.



Need Your Help!
We Need Many More Hadoopers!
Come With Us!



                              37
Hadoop at Rakuten.




Thank You.



                      38

Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Kürzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Hadoop at Rakuten, 2011/07/06

  • 1. Hadoop at Rakuten. Rakuten Inc. Architect Group, Hamba Mitsuharu & Nakagawa Gen, 2011/07/06 (Wed)
  • 2. Hadoop at Rakuten. Today’s Agenda. 1. Our Profile. 2. What is Hadoop? 3. Our Current Hadoop System Overview. 4. Our Hadoop Usage. 5. Our Challenge. 6. Our Future Plan.
  • 4. Our Profile. From ACT Group: Nakagawa Gen, Hamba Mitsuharu.
  • 5. Our Profile. Our Mission: Enhancing Hadoop at Rakuten.
  • 6. Our Profile. Our Latest Tasks. Done: 1. Implementing Ganglia. 2. Implementing HA.
  • 7. Our Profile. Our Latest Tasks. Now Handing Over: 1. Keeping Up Our Hadoop Cluster. 2. Modifying Our Hadoop Configurations. 3. Implementing Scripts for Daily Chores.
  • 8. Our Profile. Our Latest Tasks. Now Concentrating On: 1. Evaluating The Related Products.
  • 9. Hadoop at Rakuten. What is Hadoop?
  • 10. What is Hadoop? One of The Most Powerful Distributed Processing Frameworks for Large Data Sets.
  • 13. What is Hadoop? HDFS & MapReduce Constitute Hadoop. HDFS: Hadoop Distributed File System. MapReduce: Map & Reduce (Includes Shuffle & Sort).
  • 14. What is Hadoop? Input from HDFS. Process by MapReduce. Output to HDFS. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  • 15. What is Hadoop? Simple Example. Source: http://techblog.yahoo.co.jp/cat207/cat209/hadoop/
  • 16. What is Hadoop? In The Common Case, Several Simple Jobs Are Combined. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  • 17. What is Hadoop? NameNode & DataNode Constitute HDFS. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  • 18. What is Hadoop? Read & Write on HDFS. Source: http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes
  • 19. What is Hadoop? JobTracker & TaskTracker Constitute MapReduce. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  • 20. What is Hadoop? Good & Bad Points of Hadoop. Good! Easy to Scale Out The System. Easy to Implement Distributed Processing. Bad… The NameNode Is a SPoF (Single Point of Failure).
  • 21. Hadoop at Rakuten. Our Current Hadoop System Overview.
  • 22. Our Current Hadoop System Overview. The Cluster Infrastructure. #1 For Instance. Source: http://www.ibm.com/developerworks/linux/library/l-hadoop/
  • 23. Our Current Hadoop System Overview. The Cluster Infrastructure. #2 In Our Case: 3 Masters & 69 Slaves. [Diagram: a client and 1Gbps switches connected to racks holding NN&JT (Active), SNN, NN&JT (Standby), DN&TT, and other machines, with node-count labels x3, x10, and x18.]
  • 24. Our Current Hadoop System Overview. The Monitoring System. Using Ganglia (& MRTG). We Can Easily Check Resource Usage at Any Time, Not Only for Each Machine but Also for The Cluster as a Whole.
  • 25. Our Current Hadoop System Overview. High Availability. Using DRBD & HeartBeat. NN: NameNode. JT: JobTracker. [Diagram: the client connects to v-host.rakuten.co.jp; an Active node and a Standby node, each with eth0 and eth1, each run NN and JT on /foo/drbd0 and /foo/drbd1; DRBD syncs the changes. Source: Gen]
  • 26. Hadoop at Rakuten. Our Hadoop Usage.
  • 27. Our Hadoop Usage. Who Is Using Our Hadoop. 1. Generating Recommendation Engine Index. 2. Analyzing Redirect Log. 3. Calculating AD Targeting Index. 4. Measuring AD Effects. 5. Analyzing Ichiba Merchandise & Order Info. 6. Calculating Ichiba Product Ranking. 7. Analyzing Search Log. 8. Analyzing Rakuten Travel’s Access Log. (Coming Soon...) 9. Analyzing Search Word N-gram. (Coming Soon...)
  • 28. Our Hadoop Usage. The Issues of The Previous System. 1. Keeping Up The RDBMS Is Costly. 2. More & More Storage Space Is Needed. 3. The System Cannot Handle So Many Job Requests Due to Low Performance. [Diagram: Previous System, with labels Batch Server, Purchase, Marketing, Manipulate, Shop, Utility, Unload, Load, File, Category, Intermediate, NFS, ITEM, Mail.]
  • 29. Our Hadoop Usage. The Effect of The New System. 1. Got a Scalable System at Very Low Cost. (80% Off for Storage.) 2. Transaction Time Is Dramatically Improved. (50-75% Off.) [Diagram: New System, 1st Step, with labels Batch Server, Purchase, Marketing, Manipulate, Shop, Utility, Unload, Load, File, Category, NFS, Intermediate, ITEM, Mail.]
  • 30. Our Hadoop Usage. The Remaining Subjects of The New System. 1. Still Only Halfway to The DWH We Are Aiming For. 2. Negative Effects of Migrating from a Dedicated Environment to a Shared Environment: (1) Security. (2) Sharing Cluster Resources.
  • 31. Hadoop at Rakuten. Our Challenge.
  • 32. Our Challenge. The Issues with Our Hadoop. 1. Likely to Use Up The HDFS Space. 2. Needs a Lot of Electric Power. 3. Need to Share The Cluster Resources Efficiently. 4. Needs More Network Bandwidth.
  • 33. Hadoop at Rakuten. Our Future Plan.
  • 34. Our Future Plan. Considering a New Slave Machine. Now Looking for a Machine Which Has: Low Electric Power Consumption, About 6-Core CPU x2, About 10TB HDD, About 96GB Memory, &, Naturally, Compatibility With Our Data Center.
  • 35. Our Future Plan. Upgrade from Apache to CDH3. Mr. Eric Sammer (Solution Architect at Cloudera) Described the Advantages of Hadoop from Cloudera on Quora. 1. A version of Hadoop that has frequent releases (quarterly) that include bug fixes and back ported features (append for HBase, Kerberos security from Y!, etc.). 2. Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and work as a cohesive system. 3. Simplified installation via Yum / Apt repositories. 4. Tighter integration with the OS (init scripts for daemons, installation of things in common paths, logs in their proper location.). 5. A fixed release schedule. 6. Support available from Cloudera with SLAs. Source: http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation
  • 36. Our Future Plan. Evaluating HBase Using AWS. Constructing an HBase Cluster on Amazon EC2. Doing Evaluation & Verification This Summer!
  • 37. Hadoop at Rakuten. Need Your Help! We Need Many More Hadoopers! Come With Us!
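
The Map, Shuffle & Sort, and Reduce phases named on slide 13, and the idea from slide 16 of combining several simple jobs, can be illustrated with a minimal in-memory sketch. This is not Hadoop code and not taken from the slides: it uses no HDFS and no cluster, and `run_job`, the mapper and reducer functions, and the sample input `LINES` are all hypothetical names chosen for illustration. The first job counts words; the second job takes the first job's output as its input and groups words by their count.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one simplified MapReduce job in memory (assumption: no HDFS, no cluster)."""
    # Map: each input record yields zero or more (key, value) pairs.
    mapped = [pair for record in records for pair in mapper(record)]

    # Shuffle & sort: collect the values of each key, then visit keys in sorted order.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: each key with its list of values yields output records.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: word count.
def count_mapper(line):
    for word in line.split():
        yield word, 1


def count_reducer(word, ones):
    yield word, sum(ones)


# Job 2: group words by how often they occurred.
def invert_mapper(pair):
    word, count = pair
    yield count, word


def invert_reducer(count, words):
    yield count, sorted(words)


# Hypothetical sample input standing in for lines of a file.
LINES = [
    "hadoop at rakuten",
    "hadoop hdfs",
    "hdfs mapreduce hadoop",
]

# Combining jobs: the output of job 1 is the input of job 2.
word_counts = run_job(LINES, count_mapper, count_reducer)
words_by_count = run_job(word_counts, invert_mapper, invert_reducer)

if __name__ == "__main__":
    print(word_counts)
    print(words_by_count)
```

In a real cluster the same mapper and reducer logic would run in parallel on many machines, with input read from and output written to HDFS; the sketch only shows how data moves between the phases.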