Data Analytics with Hadoop/Hive on Multiple Data Centers


  1. Data Analytics with Hadoop/Hive on Multiple Data Centers. Hirotaka Niisato, GMO Internet, Inc.
  2. About myself
     ● Hirotaka Niisato (@hirotakaster)
     ● Programmer
     ● GMO Internet, SIProp Project
     ● Work: Robotics, Kinect, Android, Networking, MAKE:, Solr, Volunteer ...
  3. Data Analytics System
     ● KPI reporting system for Cloud System
     ● GMO Apps Cloud
     ● Over 500 titles: mobage, gree, mixi, Hangame, facebook, nikoniko … etc
     ● Data centers: Japan, US (west coast)
  4. Analytics Specification
     ● Social game data: KPI, DAU/PV, Play Time, Sales, A/B Testing, Conversion … etc
     ● Hourly, Daily, Weekly, Monthly
     ● Since 2010/06 ~
  5. System Architecture
     [Diagram. Labels: SNS Game User; SNS Platform; Cloud System; Master Management System; Monitoring System; Scheduler; Cloud Server (Game Server); Logging Server; Hadoop/Hive; MySQL (for Hive); Data Center A ... Data Center N]
  6. Specification, Statistics
     ● Multiple NameNodes per data center
     ● Hardware specification
       CPU: 8~16 CPUs (HT)
       MEM: 12~64 GB
       HD: RAID 1, 5, 1+0
     ● Statistics
       6,000,000 blocks, 44,000 jobs/day
       Over 1,000 AP servers logging
  7. Data Flow
     ● Load statement:
       load data local inpath 'hogehoge-access_log.*.log.gz'
       overwrite into table original_logs
       partition (log_date='2012-07-26', log_number=13);
     ● original_logs columns: host, identity, user, time, method, request, status, size, referer, agent (all string, from deserializer); log_date string, log_number tinyint
     ● Second table columns: host string, time string, method string, request string, userid string, log_date string, log_number tinyint
     ● Diagram labels: Cloud Server (Game Server), Logging Server, Management System, Scheduler, Hadoop/Hive, HiveDriver
     ● Filter → Hourly, Daily, Weekly, Monthly Report (A/B Testing, Conversion, DAU, etc.)
  8. Conversion Count HQL
       INSERT OVERWRITE TABLE conversion_click
       PARTITION (log_date=:logDate, log_number=:logNumber)
       SELECT regexp_extract(request, 'convid=([a-zA-Z0-9%])', 1),
              regexp_extract(request, 'convflg=(A|B){1}', 1),
              count(1), :logMonth, :logWeek
       FROM parsed_log
       WHERE request RLIKE 'convid=[a-zA-Z0-9%]'
         AND request RLIKE 'convflg=(A|B){1}'
         AND log_date = :logDate
         AND log_number = :logNumber
       GROUP BY regexp_extract(request, 'convid=([a-zA-Z0-9%])', 1),
                regexp_extract(request, 'convflg=(A|B){1}', 1)
  9. Monitoring/Management (Zabbix)
  10. Memory Management
     ● NameNode memory: File, Block, Directory
     ● Hadoop Archive
     ● Server memory
  11. Trouble
     ● Re-Analytics
     ● Backup and Recovery
     ● NameNode HA
     ● Hive vs MapReduce
  12. Thank you
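
The conversion-count query on slide 8 can be mimicked outside Hive for a quick sanity check of the grouping logic. The following is a minimal Python sketch, not the presenter's implementation. It assumes the `convid` character class is followed by `+` so that the whole id is captured (the pattern on the slide has no quantifier). The function name `count_conversions` and the sample request strings are hypothetical and were invented for illustration.

```python
import re
from collections import Counter

# Assumption: "+" is added after the slide's character class so the full id is captured.
CONVID_RE = re.compile(r"convid=([a-zA-Z0-9%]+)")
CONVFLG_RE = re.compile(r"convflg=(A|B)")


def count_conversions(requests):
    """Count requests per (convid, convflg) pair, skipping rows that lack either key."""
    counts = Counter()
    for request in requests:
        convid = CONVID_RE.search(request)
        convflg = CONVFLG_RE.search(request)
        # Mirrors the two RLIKE filters in the WHERE clause of the query.
        if convid and convflg:
            # Mirrors GROUP BY on the two extracted values with count(1).
            counts[(convid.group(1), convflg.group(1))] += 1
    return counts


# Hypothetical request strings, for illustration only.
sample_requests = [
    "/landing?convid=abc123&convflg=A",
    "/landing?convid=abc123&convflg=A",
    "/landing?convid=abc123&convflg=B",
    "/landing?convid=xyz&convflg=B",
    "/top?page=1",
]

conversion_counts = count_conversions(sample_requests)
for (convid, convflg), n in sorted(conversion_counts.items()):
    print(convid, convflg, n)
```

In the Hive version the same grouping runs once per `log_date` and `log_number` partition, so each hourly load only rewrites its own partition of `conversion_click`.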
