Data Analytics with
Hadoop/Hive on
Multiple Data Centers.

               Hirotaka Niisato
               GMO Internet, Inc.
About Myself
● Hirotaka Niisato (@hirotakaster)
● Programmer
● GMO Internet, SIProp Project
● Work
    Robotics, Kinect, Android, Networking, MAKE:, Solr, Volunteer ...
Data Analytics System
● KPI reporting system for Cloud System
● GMO Apps Cloud
● Over 500 Titles
    mobage, gree, mixi, Hangame, facebook, nikoniko, etc.
● Data Center
    Japan, US (west coast)
Analytics Specification
● Social Game Data KPI
    DAU/PV, Play Time, Sales
    A/B Testing, Conversion, etc.
● Hourly, Daily, Weekly, Monthly
● Since 2010/06
System Architecture
[Diagram. Labels: SNS User, SNS Platform, Game Master; Cloud System, Management System, Monitoring System; Cloud Server (Game Server), Logging Server, Scheduler, Hadoop/Hive, MySQL (for Hive); Data Center A ... Data Center N]
Specification, Statistics
● Multiple NameNodes per Data Center
● Hardware Specification
    CPU: 8~16 CPUs (HT)
    MEM: 12~64 GB
    HD:  RAID 1, 5, 1+0
● Statistics
    6,000,000 blocks / 44,000 jobs/day
    Over 1,000 AP servers logging
Data Flow
[Diagram. Labels: Cloud Server (Game Server), Logging Server, Management System, Hadoop/Hive, Scheduler, HiveDriver]

load data local inpath 'hogehoge-access_log.*.log.gz'
overwrite into table original_logs
partition (log_date='2012-07-26', log_number=13);

host        string   from deserializer
identity    string   from deserializer
user        string   from deserializer
time        string   from deserializer
method      string   from deserializer
request     string   from deserializer
status      string   from deserializer
size        string   from deserializer
referer     string   from deserializer
agent       string   from deserializer
log_date    string
log_number  tinyint

host        string
time        string
method      string
request     string
userid      string
log_date    string
log_number  tinyint

Filter → Hourly, Daily, Weekly, Monthly Report
(AB Testing, Conversion, DAU, etc.)
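A minimal sketch of how a scheduler might render the load statement above for a given hour. It assumes that log_number is the hour of day (0-23), which the slides do not state, and the function name load_statement is hypothetical.

```python
from datetime import datetime


def load_statement(ts: datetime,
                   path_glob: str = "hogehoge-access_log.*.log.gz") -> str:
    """Render the LOAD DATA statement for the partition containing ts."""
    log_date = ts.strftime("%Y-%m-%d")
    # Assumption: log_number is the hour of day, so it fits in a tinyint.
    log_number = ts.hour
    return (
        f"load data local inpath '{path_glob}' "
        f"overwrite into table original_logs "
        f"partition (log_date='{log_date}', log_number={log_number});"
    )


print(load_statement(datetime(2012, 7, 26, 13, 5)))
```

Because the statement uses "overwrite", re-running it for the same hour replaces that partition instead of appending to it, which keeps a retried load from duplicating rows.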
Conversion Count HQL
INSERT OVERWRITE TABLE conversion_click
PARTITION (log_date = :logDate, log_number = :logNumber)
  SELECT regexp_extract(request, 'convid=([a-zA-Z0-9%]+)', 1),
         regexp_extract(request, 'convflg=(A|B)', 1),
         count(1),
         :logMonth,
         :logWeek
    FROM parsed_log
   WHERE request RLIKE 'convid=[a-zA-Z0-9%]+'
     AND request RLIKE 'convflg=(A|B)'
     AND log_date = :logDate
     AND log_number = :logNumber
GROUP BY regexp_extract(request, 'convid=([a-zA-Z0-9%]+)', 1),
         regexp_extract(request, 'convflg=(A|B)', 1)
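To make the grouping concrete, here is a small Python sketch that mimics the query on a handful of made-up request strings. The sample URLs and the function name conversion_counts are hypothetical. The convid pattern here uses a "+" quantifier so that the whole identifier is captured, which is an assumption about the intent of the query.

```python
import re
from collections import Counter

# Same character classes as the HQL; "+" captures the full identifier (assumption).
CONV_ID = re.compile(r"convid=([a-zA-Z0-9%]+)")
CONV_FLG = re.compile(r"convflg=(A|B)")


def conversion_counts(requests):
    """Count requests per (convid, convflg), like the GROUP BY in the query."""
    counts = Counter()
    for request in requests:
        conv_id = CONV_ID.search(request)
        conv_flg = CONV_FLG.search(request)
        # Mirrors the WHERE clause: both patterns must match the request.
        if conv_id and conv_flg:
            counts[(conv_id.group(1), conv_flg.group(1))] += 1
    return dict(counts)


# Hypothetical request strings, for illustration only.
sample = [
    "/click?convid=abc1&convflg=A",
    "/click?convid=abc1&convflg=A",
    "/click?convid=abc1&convflg=B",
    "/top?page=1",
]
print(conversion_counts(sample))
```

Each key corresponds to one output row of the query, so comparing the A and B counts for the same convid gives the A/B test result for that conversion point.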
Monitoring/Management(Zabbix)
Memory Management
● NameNode Memory
    File, Block, Directory
● Hadoop Archive
● Server Memory
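A rough sketch of why files, blocks, and directories matter for NameNode memory, and why archiving small files helps. It uses the commonly quoted rule of thumb of about 150 bytes of heap per namespace object, which is an approximation and not a figure from the slides. All object counts below are hypothetical.

```python
# Rule-of-thumb assumption: ~150 bytes of NameNode heap per namespace object.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(files, blocks, directories,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate the heap used by namespace objects (files, blocks, directories)."""
    return (files + blocks + directories) * bytes_per_object


# Hypothetical: 6,000,000 small files of one block each in 10,000 directories.
before = namenode_heap_bytes(files=6_000_000, blocks=6_000_000, directories=10_000)

# Hypothetical: the same data packed into far fewer, larger files.
after = namenode_heap_bytes(files=3_000, blocks=50_000, directories=1_000)

print(f"before: {before / 1e9:.2f} GB, after: {after / 1e6:.1f} MB")
```

The estimate depends only on the number of objects, not on the data volume, which is why many small log files put more pressure on the NameNode than a few large ones.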
Trouble
● Re-Analytics
● Backup and Recovery
● NameNode HA
● Hive vs MapReduce
Thank you
