The Data Platform Administration Handling the 100 PB.pdf

  1. The Data Platform Administration Handling the 100 PB. May 19th, 2022. Yongduck Lee, Cloud Platform Department, Rakuten Group, Inc.
  2. About me. Lecture history: colloquium lecturer at KAIST. Program committee: BigComp 2017/2019, EDB 2016. Certifications: Certified Scrum Master (CSM), Certified Project Management Professional (PMP #1255421) … etc. Lee Yongduck (Daniel): Vice Section Manager and Senior Architect at the Data Storage and Processing Section in Rakuten Group, Inc. He started as a recommendation engine developer and now focuses on researching and verifying new big data technologies and on supporting users who want to use big data systems. B.Sc. from Korea University in 2001. He has spent 21 years in Japan and has worked for many organizations and companies, such as NHK, NTTD and Rakuten Group, Inc.
  3. Contents: (1) Global Internet & Data Explosion; (2) Data in Rakuten; (3) Data Platform & Big Data Administrator in Rakuten; (4) Advantages as an Engineer at Rakuten.
  4. Internet & Globalization. The Internet is the global system of interconnected computer networks that use the Internet protocol suite (TCP/IP) to link devices worldwide. It is a network of networks that consists of private, public, academic, business, and government networks of local to global scope, linked by a broad array of electronic, wireless, and optical networking technologies. The origins of the Internet date back to research commissioned by the federal government of the United States in the 1960s to build robust, fault-tolerant communication with computer networks. (https://en.wikipedia.org/wiki/Internet#World_Wide_Web)
      [Figure: Globalization; Information Structure: unstructured 80%, structured 20%; Volume: vast, 35.2 ZB in 2020. Source: IDC white paper & EMC]
  5. Internet Users. Internet users are defined as persons who accessed the Internet in the last 12 months from any device, including mobile phones. (https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14)
  6. Internet Users. In Japan, 92.3% of the population was using the Internet as of 2018 (population 127,202,192; Internet users 117,400,000). (https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14)
  9. The Big Data in Rakuten. Diversity and Synergy: the diversity of services and users, not only in Japan but globally, offers huge potential value and possibilities. This makes it an interesting and ideal environment for data scientists and data analysts. Combining data from various services increases the synergy effect on personalization, clustering, segmentation, etc. Scale: services and users generate a large volume of data every day, every month, and every year. For system infrastructure engineers and data engineers, it is a big challenge to store this data and make it easy for data users to utilize.
  10. Rakuten Hadoop and Kafka. Kafka supports near-real-time and streaming processing in each region. In total it handles around 1.3 million messages/sec (10 GB/sec IN/OUT) around peak time on a normal day. During the 2021 Super Sale, we handled more than 2.5 times the normal messages and traffic. Hadoop supports the data lake, data marts, and data analysis for Rakuten services in each region. Data scientists mine a great deal of value from this big data and contribute it to Rakuten services. Kafka: 800 cores, 20 TB memory, 4,728 topics. Hadoop: 80K cores, 600 TB memory, 160K TB disk.
  11. The Challenges of Administration
  12. The Big Data in Rakuten. [Diagram of the roles involved: Users, Project/Product Manager, Big Data Platform Administrator, Platform/Middleware Administrator, Infra/Server Administrator, Network Administrator, Software/System Architect, Software Developer]
  13. Administration Use Case (HBase). A user reported performance issues on HBase, but there were no issues or reports from other users of other Hadoop components.
      Confirm:
      • How the user gets/puts data on HBase
      • HBase: configuration, architecture, work/dataflow, application/GC logs
      • Dependency component (*HDFS): read/write performance logs, application/GC logs
      • Disk/memory/CPU load
      • Kernel log
      • Network connections
      • Date & time matching across logs
      Improvements:
      • Data hot spotting; data or configuration caching
      • HDFS: JVM config change, increasing handlers, increasing scanner interval
      • HW improvement: master node replacement, reduced RegionServers, moving from HDD to NVMe, dedicated RegionServers
      • OS configuration: increasing root nproc and nofile limits on dedicated RegionServers
      • HBase: TCPNoDelay, parallel seeking, master table locality, separate WRITE/short-READ/long-READ queues
      • DEADLINE I/O scheduler, hedged reads, short-circuit READ
  14. Advantages of Being a Data Engineer at Rakuten. You can work through all the necessary domains of a big data platform and gain rich experience as a big data platform administrator. Rakuten has experts with rich knowledge and experience in each technical and management domain.
  15. Advantages of Being a Data Engineer at Rakuten. From the point of view of data utilization, you can also work with various stakeholders from various service domains: DB, Services, Event, INFRA, …
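
Slide 10 gives enough figures for a quick back-of-envelope check. The sketch below is a rough estimate under stated assumptions, not a measurement. It assumes 1 GB = 10^9 bytes and treats "10 GB/sec IN/OUT" as one combined figure, because the slide does not say how inbound and outbound traffic split. The Super Sale values are lower bounds, since the slide says "more than 2.5 times". The constant names are illustrative.

```python
# Back-of-envelope figures derived from the slide 10 numbers.
# Assumptions (not stated on the slide): 1 GB = 10**9 bytes, and
# "10 GB/sec IN/OUT" is treated as a single combined throughput figure.

NORMAL_PEAK_MSGS_PER_SEC = 1_300_000    # "around 1.3 Million Message/sec"
NORMAL_PEAK_BYTES_PER_SEC = 10 * 10**9  # "10 GB/sec IN/OUT"
SUPER_SALE_FACTOR = 2.5                 # "more than 2.5 times"
KAFKA_CORES = 800                       # "Kafka: 800 Core"


def avg_message_size_bytes(msgs_per_sec: float, bytes_per_sec: float) -> float:
    """Average bytes per message if the traffic is spread evenly over all messages."""
    return bytes_per_sec / msgs_per_sec


def scaled(value: float, factor: float) -> float:
    """Lower bound for a peak described as 'more than `factor` times' the baseline."""
    return value * factor


if __name__ == "__main__":
    size = avg_message_size_bytes(NORMAL_PEAK_MSGS_PER_SEC, NORMAL_PEAK_BYTES_PER_SEC)
    print(f"average message size ~ {size:,.0f} bytes")
    print(f"Super Sale lower bound: "
          f"{scaled(NORMAL_PEAK_MSGS_PER_SEC, SUPER_SALE_FACTOR):,.0f} msg/s, "
          f"{scaled(NORMAL_PEAK_BYTES_PER_SEC, SUPER_SALE_FACTOR) / 10**9:.0f} GB/s")
    # Cluster-wide average only; real load is never spread evenly over cores.
    print(f"messages per core at normal peak ~ {NORMAL_PEAK_MSGS_PER_SEC / KAFKA_CORES:,.0f}")
```

Under these assumptions, the average message is on the order of 7-8 KB. Peak Super Sale load is at least about 3.25 million messages/sec and 25 GB/sec.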
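
Several of the slide 13 tuning items correspond to documented HBase and HDFS client properties: RPC handlers, read/write and short/long read call queues, parallel seeking, TCP no-delay, short-circuit reads and hedged reads. The sketch below renders such properties in the Hadoop `<configuration>` XML format. It is an illustration only. Every value is a hypothetical example, not a setting used at Rakuten. The socket path is a common example location, and the HDFS client properties must be visible to the HDFS client inside each RegionServer (short-circuit reads also need matching DataNode settings). Check the HBase and HDFS documentation for where each property belongs in your version. Requires Python 3.9+.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL example values; not the settings used in the slide's case.
HBASE_SITE = {
    # RPC handler threads per RegionServer ("Increasing Handler")
    "hbase.regionserver.handler.count": "100",
    # Split call queues into read/write queues, then read queues into
    # short-read (get) and long-read (scan) queues
    "hbase.ipc.server.callqueue.read.ratio": "0.5",
    "hbase.ipc.server.callqueue.scan.ratio": "0.5",
    # Seek store files in parallel ("Parallel Seeking")
    "hbase.storescanner.parallel.seek.enable": "true",
    # Disable Nagle's algorithm on RPC client sockets ("TCPNoDelay")
    "hbase.ipc.client.tcpnodelay": "true",
}

HDFS_CLIENT = {
    # Short-circuit local reads through a UNIX domain socket
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": "/var/lib/hadoop-hdfs/dn_socket",
    # Hedged reads: a thread pool size > 0 enables them
    "dfs.client.hedged.read.threadpool.size": "20",
    "dfs.client.hedged.read.threshold.millis": "10",
}


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml({**HBASE_SITE, **HDFS_CLIENT}))
```

Generating the file from a dictionary keeps each tuning decision in one reviewable place. It also makes it easy to diff settings between dedicated and shared RegionServers.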
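
The "WRITE/Short-READ/Long-READ Queue" item on slide 13 refers to splitting a RegionServer's RPC call queues. With separate queues, slow scans cannot starve gets and puts. The function below is a simplified model of the split described in the HBase reference guide for `hbase.ipc.server.callqueue.read.ratio` and `hbase.ipc.server.callqueue.scan.ratio`. It is not HBase's actual code, and its rounding may differ from the real implementation in edge cases. The function names are hypothetical.

```python
import math


def total_call_queues(handler_count: int, handler_factor: float) -> int:
    """Simplified: factor 0 -> one shared queue, factor 1 -> one queue per handler."""
    return max(1, round(handler_count * handler_factor))


def split_call_queues(total: int, read_ratio: float, scan_ratio: float) -> dict[str, int]:
    """Simplified model of the read/write split and the short/long read split."""
    if read_ratio <= 0 or total < 2:
        # No split: every queue accepts both reads and writes.
        return {"mixed": total}
    # Read queues: round(total * read_ratio), keeping at least one queue of each kind.
    read = min(total - 1, max(1, round(total * read_ratio)))
    write = total - read
    # Long-read (scan) queues: floor(read * scan_ratio) of the read queues.
    scan = math.floor(read * scan_ratio)
    if scan == 0 or scan >= read:
        # A scan ratio of 0 or 1: read queues serve both gets and scans.
        return {"write": write, "read": read}
    return {"write": write, "short_read": read - scan, "long_read": scan}


if __name__ == "__main__":
    # 10 queues, 80% reads, 30% of the read queues reserved for scans.
    print(split_call_queues(10, 0.8, 0.3))
```

The model reproduces the reference guide's examples. With 10 queues and a read ratio of 0.3, 3 queues are read-only and 7 are write-only. With 8 read queues and a scan ratio of 0.3, 2 serve long reads and 6 serve short reads.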
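
"Hedged Reads" on slide 13 refers to an HDFS client feature. If a block read from one DataNode has not returned within a threshold, the client starts a second read against another replica and uses whichever answers first. The sketch below shows that general technique with Python threads. It is not the HDFS client implementation. `hedged_read` and `simulated_read` are hypothetical names, and the replica names and latencies are made up.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def hedged_read(replicas: Sequence[str], read_fn: Callable[[str], T],
                threshold_s: float, pool: cf.Executor) -> T:
    """Return the result of whichever staggered replica read finishes first.

    A read starts on the first replica. Each time `threshold_s` passes with no
    read finished, another read starts on the next replica. If the first read
    to finish raised, its exception is re-raised (a real client would fall back
    to the remaining replicas). Slower reads are left to finish in the background.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    pending: set = set()
    for replica in replicas:
        pending.add(pool.submit(read_fn, replica))
        done, pending = cf.wait(pending, timeout=threshold_s,
                                return_when=cf.FIRST_COMPLETED)
        if done:
            return next(iter(done)).result()
    # Every replica has a read in flight; wait for whichever finishes first.
    done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()


def simulated_read(replica: str) -> str:
    """Hypothetical stand-in for a DataNode read: replicas named 'slow*' stall."""
    time.sleep(0.5 if replica.startswith("slow") else 0.01)
    return replica


if __name__ == "__main__":
    with cf.ThreadPoolExecutor(max_workers=4) as pool:
        print(hedged_read(["slow-dn1", "fast-dn2"], simulated_read, 0.05, pool))
```

The design trades extra read load for lower tail latency. The threshold should therefore sit above typical read latency, so that hedging only triggers for genuinely slow replicas.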
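
"Data Hot Spotting" on slide 13 describes a common HBase problem. When row keys increase monotonically (for example, timestamps), all writes land on one region and one RegionServer. One widely used mitigation is salting: the key gets a stable hash-bucket prefix so writes spread over pre-split regions. The slide does not say which fix was applied, so the sketch below shows only this generic technique. `salted_row_key`, `scan_prefixes` and the bucket count are hypothetical.

```python
import hashlib

# Hypothetical bucket count; typically matched to the number of pre-split regions.
# The two-digit prefix format below assumes fewer than 100 buckets.
NUM_BUCKETS = 16


def salted_row_key(natural_key: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a stable hash bucket so sequential keys spread out."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}|{natural_key}".encode("utf-8")


def scan_prefixes(buckets: int = NUM_BUCKETS) -> list[bytes]:
    """A range read over natural keys must now fan out over every bucket prefix."""
    return [f"{b:02d}|".encode("utf-8") for b in range(buckets)]


if __name__ == "__main__":
    for i in range(5):
        print(salted_row_key(f"2022-05-19T00:00:{i:02d}"))
```

Salting trades cheap range scans for even write distribution. A time-range query has to issue one scan per prefix and merge the results, so it suits write-heavy tables more than scan-heavy ones.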
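
Two OS-level items on slide 13 can be checked programmatically on a Linux host:

- **Process and file-descriptor limits** ("Root noprocs, nofiles increasing"), which correspond to `nproc`/`nofile` in `limits.conf` and `ulimit -u`/`ulimit -n`.
- **The block I/O scheduler** ("DEADLINE Scheduler"). Linux exposes it in `/sys/block/<device>/queue/scheduler`, with the active scheduler in brackets. On newer multi-queue kernels it is called `mq-deadline`.

The sketch below only inspects these settings. Changing them needs root, for example by writing a scheduler name into that sysfs file. The minimum values are hypothetical, the device name is an assumption, and the `resource` module is Unix-only.

```python
import re
import resource
from typing import Optional

# HYPOTHETICAL minimums for a dedicated RegionServer host.
MINIMUM_LIMITS = {"nofile": 32768, "nproc": 32000}


def active_io_scheduler(sysfs_text: str) -> Optional[str]:
    """Parse /sys/block/<dev>/queue/scheduler text; the active entry is bracketed."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device: str) -> Optional[str]:
    """Active scheduler for a block device such as 'sda', or None if unreadable."""
    try:
        with open(f"/sys/block/{device}/queue/scheduler") as f:
            return active_io_scheduler(f.read())
    except OSError:
        return None


def current_limits() -> dict[str, tuple[int, int]]:
    """Soft/hard limits of this process (what `ulimit -n` / `ulimit -u` report)."""
    limits = {"nofile": resource.getrlimit(resource.RLIMIT_NOFILE)}
    if hasattr(resource, "RLIMIT_NPROC"):  # not available on every platform
        limits["nproc"] = resource.getrlimit(resource.RLIMIT_NPROC)
    return limits


def below_minimum(limits: dict[str, tuple[int, int]],
                  minimum: dict[str, int]) -> list[str]:
    """Names whose soft limit is below the minimum; RLIM_INFINITY means unlimited."""
    return [name for name, (soft, _hard) in limits.items()
            if soft != resource.RLIM_INFINITY and soft < minimum.get(name, 0)]


if __name__ == "__main__":
    print("scheduler(sda):", read_scheduler("sda"))
    print("limits below minimum:", below_minimum(current_limits(), MINIMUM_LIMITS))
```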
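
"Date & Time Matching" on slide 13 is the investigation step of lining up evidence from many sources around the moment a user saw slowness. Those sources include application and GC logs, HDFS logs, the kernel log and load metrics. The sketch below merges several time-sorted log streams and keeps only events inside a window around the incident. It assumes that each stream is already sorted, that timestamps are normalized to one time zone, and that host clocks are in sync (clock skew would need correcting first). `LogEvent`, `events_near` and the source labels are hypothetical.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class LogEvent(NamedTuple):
    ts: datetime
    source: str   # hypothetical labels, e.g. "regionserver", "datanode", "kernel"
    message: str


def events_near(incident: datetime, window: timedelta,
                *streams: Iterable[LogEvent]) -> list[LogEvent]:
    """Merge time-sorted streams and keep events within +/- window of the incident."""
    lo, hi = incident - window, incident + window
    matched = []
    for event in heapq.merge(*streams, key=lambda e: e.ts):
        if event.ts > hi:
            break  # streams are sorted, so nothing later can match
        if event.ts >= lo:
            matched.append(event)
    return matched


if __name__ == "__main__":
    t0 = datetime(2022, 5, 19, 12, 0, 0)
    regionserver = [LogEvent(t0 - timedelta(minutes=10), "regionserver", "ok"),
                    LogEvent(t0 + timedelta(seconds=5), "regionserver", "slow get")]
    datanode = [LogEvent(t0 - timedelta(seconds=3), "datanode", "slow read")]
    for e in events_near(t0, timedelta(seconds=30), regionserver, datanode):
        print(e.ts.isoformat(), e.source, e.message)
```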
