
IEEE International Conference on Data Engineering 2015

193 views

Published on

Hadoop DW SK Telecom Usecase

Published in: Engineering


  1. SKT Hadoop DW. SK telecom Corporate R&D Center, Yousun Jeong
  2. Table of Contents. Copyright@ 2015 by SK Telecom. All rights reserved.
     • 1. Big Data in SKT
     • 2. What is Hadoop DW?
     • 3. SQL on Hadoop: Tajo
     • 4. Hadoop DW Commercialization Cases
  3. Big Data in SKT (section 1)
     • Before: High TCO for Data Management. 250 TB/day (91.25 PB/year); 4 Hadoop clusters with various commercial MPP databases for analytics. Platform: Hadoop+Hive and MPP DBMS. Too much data is loaded into the MPP DBMS.
     • After: One Unified Solution. 30 PB+ (compressed) on 1000+ nodes; 10+ Hadoop clusters with Tajo & Spark for all purposes. Platform: Hadoop+Tajo+Spark. Affordable & Faster (unified framework for Big Data).
     • Diagram (same in both panels): layers Operational Systems, Integration Layer, Data Warehouse, Marts; elements Marketing, Sales, ERP, SCM, ODS, Staging Area, Staging Area, Mart A, Mart B, Mart C, Mart D.
  4. Big Data in SKT (section 1): SKT Hadoop Clusters
     • Optimized configuration of a large-scale cluster
     • Operational know-how from managing 1000+ nodes
     • Fault-tolerant and effective resource management system
     • Diagram labels, in slide order: Data Collector (Data Collect & pre-processing); Main Cluster; Analysis R&D Cluster; ~250 TB/day; (700+ node); Service Logic Repository; (200+ node); (100+ node); Service Cluster (150+ node); App. 1 … App. N; T-Hadoop; Data Feeding; Data Feeding; Commercialize; Develop.
  5. What is Hadoop DW? (section 2)
     Solution: SKT Hadoop DW. The Hadoop DW provides a data warehouse based on the Hadoop architecture for enterprise environments, so that users can accommodate massive, growing data volumes at low cost.
     • 【 Hadoop DW Infrastructure 】 "Hadoop S/W and Commodity H/W Based Cost-effective IT Infrastructure System"
       Data: Structured/Un-structured Data, Scale-out Structure (Petabyte, Exabyte)
       Cost: Low price ($200 ~ $1,000 / TB)
       H/W: Commodity H/W (x86 Server)
       S/W: Hadoop Architecture, SQL on Hadoop, Hadoop File System
     • 【 Legacy IT Infrastructure 】 "High-price, High-performance Proprietary IT Infrastructure System"
       Data: Structured Data, Scale-up Structure (Terabyte)
       Cost: High price ($5,000 ~ $50,000 / TB)
       H/W: High Performance H/W (MPP, Fabric Switch, etc.)
       S/W: Proprietary S/W (RDBMS, etc.), Transaction/Batch Processing (SQL)
     ※ MPP: Massively Parallel Processing; SAN: Storage Area Network; NAS: Network Attached Storage; RDBMS: Relational DB Management System; SQL: Structured Query Language
  6. SQL on Hadoop - TAJO (section 3)
     High-speed SQL-on-Hadoop processing engine:
     • 3~5x faster processing than Hive under the TPC-H procedure
     • 80~100% of Impala's response speed, without a data size limit
     • Full ANSI-SQL support for easy RDBMS migration
     [ Legacy Approach (MR) ] Hive, MapReduce: partially distributed, sequential process; HDFS; Hadoop Cluster.
     [ Tajo Approach ] Tajo: fully distributed, vector process; HDFS; Hadoop Cluster + Tajo.
     • Processing Speed: process more data on the same clusters with improved processing speed.
     • Response Speed (up to 10x): query response of min on a Hadoop cluster vs. few sec~min on a Hadoop cluster + Tajo; try more queries for analysis with improved response speed.
  7. SQL on Hadoop - TAJO (section 3)
     Panels: [ Tajo Features ] [ Performance Comparison ] [ Apache Top-Level Project ]
     • SQL Support: ANSI SQL support, Partition Type, Meta Store
     • Service Stability: High Availability, Resource Manager, Fair Scheduler
     • Performance: High-speed processing, Shuffling, Dynamic Query Optimizer, Query Rewriting
     • System Integration: BI Connector, Proxy Support, Tajo-R
     • Function Support: Analytic Function, Hive Function
  8. Tajo Architecture (section 3.1), diagram labels
     • Clients: JDBC Driver, ODBC Driver, Tajo CLI
     • Tajo Master: Client Service Handler, SQL Parser, Query Rewriter, Logical Planner, Logical Optimizer, Resource Manager, Query Manager, Tajo Catalog, HCatalog
     • Persistent Storage: Derby Store, MySQL Store, PostgreSQL Store
     • Worker: Client Service Handler; 1. Query Master (Global Planner); 2. TaskRunner; Local Query Engine; Storage Manager
     • Storage: Local, HDFS/HBase, S3/Swift
  9. Technical Characteristic - Logical Flow of Data Processing (section 3.1)
     • Tajo Master
       SQL Parser: query parsing
       Logical/Global Planner: decomposition of the query into work units
       Resource Manager: delivery of work units to the servers
     • Tajo Workers (five shown)
       Physical Planner: decomposition of a work unit into task operation units
       Query Engine: execution of the unit operations
       Storage Manager: disk data I/O control
  10. Technical Characteristic - JIT Query Engine (section 3.1)
     Behavior depends on the operand characteristic:
     • 1 + 2 = 3
     • "a" + "b" = "ab"
     • {1,2} + {3,4} = {4,6}
     • 1 + {1,2} = {2,3}
     <Existing methods> The binary is implemented to handle every possible case, which degrades performance (call, if, switch below 50%). Example: switch(operand): case numeric: add numeric; case string: add string.
     <JIT methods> Code is generated at run time based on the operand type, and a combined operation can be handled by compiler optimization. Four functions in a single operation (+ 2, - 1, * 1): Result = A x (1 - B) + (1 + C).
     [Figure: expression tree for the Result formula]
  11. Technical Characteristic - Vectorized Query Engine (section 3.1)
     • <Tuple at a time> DB; 1 operation/record
     • <Vectorized engine> Vectorized data; 1 operation/vector
     A[] = {a1, a2, a3, a4, a5, a6}
     B[] = {b1, b2, b3, b4, b5, b6}
     C[] = A[] + B[]
     [Figure: six separate element-wise additions (tuple at a time) vs. one addition over the whole vectors (vectorized engine)]
  12. Technical Characteristic - Storage Manager (section 3.1), diagram labels
     • Tajo Worker, Tajo Worker, Tajo Worker (scan): file request
     • Storage Manager: Scan Scheduler; three Request queues; three Disk Scanners; Pre-fetching Buffer
     • Bulk Read; Fine granularity
  13. Hadoop DW Commercialization Cases (section 4): Telco [ SK Telecom ]
     Business Challenge
     • Explosion of log data with LTE service
     • Increase in the types of data to be analyzed
     • Insufficient DW capacity due to high cost
     How SKT Hadoop DW Helped
     • 3x storage expansion at the same price, or 80% reduction in unit price
     • Enabled daily ad-hoc analysis of unstructured text data sets
     • Cut content-based analysis time from a few hours to 20 minutes at most
     Category          MPP DBMS             Hadoop DW
     Raw Data Size     0.5 TB/day           4 TB/day
     Total ETL Time    Average of 3 hours   Average of 6 hours
     DW Creation       30 minutes           40 minutes
     Mart Creation     1 hour               1 hour 40 minutes
     Report Creation   1 hour 30 minutes    2 hours 4 minutes
  14. Hadoop DW Commercialization Cases (section 4): Manufacturer [ Global Top-5 Semiconductor Player ]
     Business Challenge
     • Immense amounts of unstructured measurement data are collected during manufacturing
     • RDBMS & BI cannot handle this type of data
     • Data loading alone can take up to 20 min
     How SKT Hadoop DW Helped
     • Support for unstructured data through a variable column schema
     • 100x increase in data processing capacity
     • Data loading time decreased by 10x (2 min)
     • Minimized user action for pivot/unpivot
  15. Thank you.
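
The contrast on slide 10 between the existing method (branch on every operation for every row) and the JIT method (generate code once, then run it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tajo's implementation: the tuple-based expression tree, the names `interpret`, `to_source` and `generate`, and the sample rows are all hypothetical, and the sketch shows only the fusing of the four operations in Result = A x (1 - B) + (1 + C) into one generated function, not specialization by operand type.

```python
from typing import Any, Callable, Dict, Tuple

Row = Dict[str, Any]
Node = Tuple[Any, ...]

# Result = A x (1 - B) + (1 + C), written as a hypothetical tuple-based tree.
EXPR: Node = (
    "+",
    ("*", ("col", "A"), ("-", ("const", 1), ("col", "B"))),
    ("+", ("const", 1), ("col", "C")),
)


def interpret(node: Node, row: Row) -> Any:
    """Existing method: walk the tree and branch on the node kind for every row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    left = interpret(node[1], row)
    right = interpret(node[2], row)
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown node kind: {kind!r}")


def to_source(node: Node) -> str:
    """Translate the tree into the source text of a single Python expression."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "const":
        return repr(node[1])
    if kind in ("+", "-", "*"):
        return f"({to_source(node[1])} {kind} {to_source(node[2])})"
    raise ValueError(f"unknown node kind: {kind!r}")


def generate(node: Node) -> Callable[[Row], Any]:
    """JIT-style method: build one function for the whole expression, once per query."""
    source = f"lambda row: {to_source(node)}"
    # eval is acceptable here only because the source is produced by to_source above.
    return eval(compile(source, "<generated>", "eval"))


rows = [{"A": 3, "B": 4, "C": 5}, {"A": 2, "B": 0, "C": 1}]
compiled = generate(EXPR)
interpreted_results = [interpret(EXPR, r) for r in rows]
generated_results = [compiled(r) for r in rows]
```

Both paths produce the same values; the difference is that the generated function performs no per-row dispatch on node kinds, which is the cost the slide attributes to the existing method.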
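
The "1 operation/record" versus "1 operation/vector" distinction on slide 11 can also be sketched in Python. This is a rough illustration, not Tajo's engine: the function names and the `vector_size` parameter are hypothetical, and plain Python lists stand in for the engine's column vectors. The sketch counts operator invocations to show how processing C[] = A[] + B[] in vectors amortizes the per-call overhead.

```python
from typing import Callable, List, Sequence, Tuple


def add_record(x: int, y: int) -> int:
    """Operator that handles a single record."""
    return x + y


def add_vector(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Operator that handles a whole vector in one tight loop."""
    return [x + y for x, y in zip(xs, ys)]


def run_tuple_at_a_time(a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], int]:
    """Invoke the operator once per record and count the invocations."""
    out: List[int] = []
    calls = 0
    for x, y in zip(a, b):
        out.append(add_record(x, y))
        calls += 1
    return out, calls


def run_vectorized(
    a: Sequence[int], b: Sequence[int], vector_size: int
) -> Tuple[List[int], int]:
    """Invoke the operator once per vector of up to vector_size records."""
    out: List[int] = []
    calls = 0
    for start in range(0, len(a), vector_size):
        out.extend(add_vector(a[start:start + vector_size], b[start:start + vector_size]))
        calls += 1
    return out, calls


A = [1, 2, 3, 4, 5, 6]
B = [10, 20, 30, 40, 50, 60]
tuple_result, tuple_calls = run_tuple_at_a_time(A, B)
vector_result, vector_calls = run_vectorized(A, B, vector_size=6)
```

With six records and a vector size of six, the tuple-at-a-time path makes six operator calls and the vectorized path makes one, while both return the same C[].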
