SKT Hadoop DW
SK telecom
Corporate R&D Center

Yousun Jeong
Copyright@ 2015 by SK Telecom All rights reserved.
Table of Contents
1. Big Data in SKT
2. What is Hadoop DW?
3. SQL on Hadoop - TAJO
4. Hadoop DW Commercialization Cases
1. Big Data in SKT

High TCO for Data Management (too much data is loaded into MPP DBMS)
- 250 TB/day (91.25 PB/year)
- 4 Hadoop clusters with various commercial MPP databases for analytics
- Platform: Hadoop+Hive, MPP DBMS

One Unified Solution: Affordable & Faster (unified framework for Big Data)
- 30 PB+ (compressed) on 1000+ nodes
- 10+ Hadoop clusters with Tajo & Spark for all purposes
- Platform: Hadoop+Tajo+Spark

[Figure, shown for both cases. Layers: Operational Systems, Integration Layer, Data Warehouse, Marts. Components: Marketing, Sales, ERP, SCM, ODS, Staging Area (x2), Mart A, Mart B, Mart C, Mart D]
1. Big Data in SKT
SKT Hadoop Clusters
✓ Optimized configuration of a large-scale cluster
✓ Operation know-how of managing 1000+ nodes
✓ Fault tolerant and effective resource management system
[Figure: T-Hadoop cluster layout. Labels: Data Collector (Data Collect & pre-processing); ~250 TB/day; Main Cluster (Analysis); R&D Cluster; Service Logic Repository; Service Cluster (150+ node) with App. 1 … App. N; further node counts shown: 700+, 200+, 100+; flows: Data Feeding (x2), Commercialize, Develop.]
2. What is Hadoop DW?

【 Legacy IT Infrastructure 】
“High-price, High-performance Proprietary IT Infrastructure System”
- Data: Structured data; scale-up structure (terabyte)
- Cost: High price ($5,000~$50,000 / TB)
- H/W: High-performance H/W (MPP, Fabric Switch, etc.)
- S/W: Proprietary S/W (RDBMS, etc.); Transaction/Batch Processing (SQL)

【 Hadoop DW Infrastructure 】
“Hadoop S/W and Commodity H/W Based Cost-effective IT Infrastructure System”
- Data: Structured/unstructured data; scale-out structure (petabyte, exabyte)
- Cost: Low price ($200 ~ $1,000 / TB)
- H/W: Commodity H/W (x86 server)
- S/W: Hadoop Architecture; SQL on Hadoop; Hadoop File System

※ MPP: Massively Parallel Processing, SAN: Storage Area Network, NAS: Network Attached Storage, RDBMS: Relational DB Management System, SQL: Structured Query Language

Solution: SKT Hadoop DW
SKT Hadoop DW provides a data warehouse built on the Hadoop architecture for enterprise environments, so users can accommodate massive and growing data volumes at low cost.
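As a rough illustration of the per-terabyte price ranges quoted on this slide, the sketch below estimates storage cost for a given capacity. Only the $/TB ranges come from the slide; the function name `storage_cost_range` and the 1,000 TB example capacity are assumptions made for illustration.

```python
# Price ranges per TB (USD) as quoted on the slide.
LEGACY_USD_PER_TB = (5_000, 50_000)
HADOOP_DW_USD_PER_TB = (200, 1_000)


def storage_cost_range(capacity_tb, usd_per_tb):
    """Return the (low, high) cost in USD for the given capacity."""
    low, high = usd_per_tb
    return capacity_tb * low, capacity_tb * high


if __name__ == "__main__":
    capacity_tb = 1_000  # hypothetical 1 PB warehouse, not a figure from the slide
    for name, prices in [("Legacy", LEGACY_USD_PER_TB),
                         ("Hadoop DW", HADOOP_DW_USD_PER_TB)]:
        low, high = storage_cost_range(capacity_tb, prices)
        print(f"{name}: ${low:,} to ${high:,}")
```

The comparison covers list price per terabyte only; operations, licensing, and staffing are outside what the slide quantifies.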
3. SQL on Hadoop - TAJO

High-speed SQL-on-Hadoop processing engine
• 3~5x improvement in processing speed over Hive on TPC-H
• 80~100% of Impala's response speed, without a data size limit
• Full ANSI-SQL support for easy RDBMS migration

[ Legacy Approach (MR) ]
Hadoop Cluster: Hive, MapReduce, HDFS
- Partially Distributed
- Sequential process

[ Tajo Approach ]
Hadoop Cluster + Tajo: Tajo, HDFS
- Fully Distributed
- Vector process

Processing Speed: Process more data on same clusters with improved processing speed
Response Speed: Try more queries for analysis with improved response speed
[Figure: query response time. Hadoop Cluster: up to 10x min; Hadoop Cluster + Tajo: few sec~min]
3. SQL on Hadoop - TAJO
[ Tajo Features ]
SQL Support
▪ ANSI SQL support
▪ Partition Type
▪ Meta Store
Service Stability
▪ High Availability
▪ Resource Manager
▪ Fair Scheduler
Performance
▪ High-speed processing
▪ Shuffling
▪ Dynamic Query Optimizer
▪ Query Rewriting
System Integration
▪ BI Connector
▪ Proxy Support
▪ Tajo-R
Function Support
▪ Analytic Function
▪ Hive Function
[Figures: Performance Comparison; Apache Top-Level Project]
3.1 Tajo Architecture
[Figure: Tajo architecture]
- Clients: Tajo CLI, JDBC Driver, ODBC Driver
- Tajo Master: Client Service Handler, SQL Parser, Query Rewriter, Logical Planner, Logical Optimizer, Query Manager, Resource Manager
- Catalog: Tajo Catalog, HCatalog
- Persistent Storage: Derby Store, MySQL Store, PostgreSQL Store
- Worker: 1. Query Master (Client Service Handler, Global Planner); 2. TaskRunner (Local Query Engine, Storage Manager)
- Storage: Local, HDFS/HBase, S3 / Swift
3.1 Technical Characteristic - Logical Flow Data Processing

Tajo Master
- SQL Parser: Query parsing
- Logical/Global Planner: Decomposition of a work unit
- Resource Manager: Work units delivered to the server

Tajo Worker (four workers shown)
- Physical Planner: Decomposing the task operation unit
- Query Engine: Unit operation
- Storage Manager: Disk data I/O control
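The master/worker flow on this slide can be sketched as a toy pipeline: the master splits one scan-and-filter query into work units, hands them to workers, and sums the partial results. This is an illustrative model only, not Tajo's API; `WorkUnit`, `plan_work_units`, `assign_round_robin`, `run_unit`, and the round-robin assignment policy are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    unit_id: int
    rows: list       # the slice of input this unit scans
    threshold: int   # predicate parameter taken from the "query"


def plan_work_units(rows, threshold, unit_size):
    """Master side: decompose one query into fixed-size work units."""
    return [
        WorkUnit(i // unit_size, rows[i:i + unit_size], threshold)
        for i in range(0, len(rows), unit_size)
    ]


def assign_round_robin(units, worker_count):
    """Master side: deliver work units to workers (round-robin stand-in)."""
    assignment = {worker: [] for worker in range(worker_count)}
    for unit in units:
        assignment[unit.unit_id % worker_count].append(unit)
    return assignment


def run_unit(unit):
    """Worker side: run the unit operation (count rows above the threshold)."""
    return sum(1 for value in unit.rows if value > unit.threshold)


def run_query(rows, threshold, unit_size=3, worker_count=2):
    """Plan, assign, execute each unit, and merge the partial counts."""
    units = plan_work_units(rows, threshold, unit_size)
    assignment = assign_round_robin(units, worker_count)
    partials = [run_unit(u) for assigned in assignment.values() for u in assigned]
    return sum(partials)


if __name__ == "__main__":
    print(run_query([1, 5, 8, 2, 9, 7, 3, 10], threshold=4))
```

Here the workers run sequentially in one process; a real engine would execute them in parallel on separate nodes and schedule units by resource availability and data locality.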
3.1 Technical Characteristic - JIT Query Engine

<Existing methods>
Behavior depends on the operand characteristic:
- 1 + 2 = 3
- “a” + “b” = “ab”
- {1,2} + {3,4} = {4,6}
- 1 + {1,2} = {2,3}

switch(operand)
case numeric : add numeric
case string : add string

The engine is built as a binary that must account for every possible case -> performance degradation (call, if, switch below 50%)

<JIT methods>
Real-time code generation based on operand type; combined operations can be processed by the compiler optimization
Four operations handled as a single operation (+ x2, - x1, * x1)
Result = A x (1-B) + (1+C)
[Figure: expression tree for the formula above]
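To make the contrast concrete, the sketch below evaluates the slide's formula A x (1-B) + (1+C) two ways: an interpreter that branches on node kind for every row, and a generated function compiled once for the whole expression. Python's built-in `compile`/`eval` stands in for runtime code generation here; the slide does not describe Tajo's actual JIT mechanism, and `EXPR`, `interpret`, `to_source`, and `generate` are names invented for this example.

```python
import operator

# Expression tree for: A * (1 - B) + (1 + C)
EXPR = ("+", ("*", "A", ("-", 1, "B")), ("+", 1, "C"))
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(node, row):
    """Interpretive evaluation: checks the node kind on every call."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, row), interpret(right, row))
    if isinstance(node, str):
        return row[node]   # column reference
    return node            # literal constant


def to_source(node):
    """Render the tree as one Python expression string."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_source(left)} {op} {to_source(right)})"
    return str(node)


def generate(node):
    """Generate and compile a single function for the whole expression."""
    source = f"lambda A, B, C: {to_source(node)}"
    return eval(compile(source, "<generated>", "eval"))


if __name__ == "__main__":
    row = {"A": 10, "B": 3, "C": 4}
    fused = generate(EXPR)
    print(interpret(EXPR, row), fused(**row))
```

The generated function contains no per-node dispatch, which is the property the slide attributes to the JIT approach: the four arithmetic operations collapse into one compiled expression.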
3.1 Technical Characteristic - Vectorized Query Engine

<Tuple at a time>
- DB
- 1 operation/record
[Figure: a1+b1, a2+b2, a3+b3, a4+b4, a5+b5, a6+b6 computed as six separate additions]

<Vectorized engine>
- Vectorized data
- 1 operation/vector
A[] = {a1, a2, a3, a4, a5, a6}
B[] = {b1, b2, b3, b4, b5, b6}
C[] = A[] + B[]
[Figure: vectors {a1..a6} and {b1..b6} combined by a single addition]
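The difference in operator invocations can be sketched in plain Python by counting calls: one call per record in the tuple-at-a-time model versus one call per batch in the vectorized model. This only models call counts; the function names and `batch_size` parameter are assumptions, and a real vectorized engine would also benefit from cache locality and SIMD, which this sketch does not show.

```python
def add_record(x, y):
    """Operator invoked once per record."""
    return x + y


def add_vector(xs, ys):
    """Operator invoked once per batch of records."""
    return [x + y for x, y in zip(xs, ys)]


def add_tuple_at_a_time(a, b):
    """Return (result, number of operator calls) for per-record execution."""
    out, calls = [], 0
    for x, y in zip(a, b):
        out.append(add_record(x, y))
        calls += 1
    return out, calls


def add_vectorized(a, b, batch_size):
    """Return (result, number of operator calls) for batched execution."""
    out, calls = [], 0
    for start in range(0, len(a), batch_size):
        end = start + batch_size
        out.extend(add_vector(a[start:end], b[start:end]))
        calls += 1
    return out, calls


if __name__ == "__main__":
    A = [1, 2, 3, 4, 5, 6]
    B = [10, 20, 30, 40, 50, 60]
    print(add_tuple_at_a_time(A, B))
    print(add_vectorized(A, B, batch_size=6))
```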
3.1 Technical Characteristic - Storage Manager
[Figure: Tajo Workers (scan) send file requests to the Storage Manager. Components shown inside the Storage Manager: Scan Scheduler, Request queue (x3), Disk Scanner (x3), Pre-fetching Buffer. Labels: Fine granularity, Bulk Read.]
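One plausible reading of the figure is that fine-grained file requests are queued per disk and then served as larger bulk reads. The sketch below models that reading only; the slide does not specify the scheduling policy, so the `(disk, offset, length)` request format, `schedule_scans`, and the range-merging rule in `coalesce` are assumptions.

```python
from collections import defaultdict


def schedule_scans(requests):
    """Route fine-grained (disk, offset, length) requests to per-disk queues."""
    queues = defaultdict(list)
    for disk, offset, length in requests:
        queues[disk].append((offset, length))
    return queues


def coalesce(queue):
    """Merge contiguous or overlapping ranges on one disk into bulk reads."""
    bulk = []
    for offset, length in sorted(queue):
        if bulk and offset <= bulk[-1][0] + bulk[-1][1]:
            start, prev_len = bulk[-1]
            bulk[-1] = (start, max(prev_len, offset + length - start))
        else:
            bulk.append((offset, length))
    return bulk


if __name__ == "__main__":
    requests = [("d0", 0, 4), ("d1", 0, 8), ("d0", 4, 4), ("d0", 16, 4)]
    for disk, queue in schedule_scans(requests).items():
        print(disk, coalesce(queue))
```

Keeping one queue per disk lets each scanner read sequentially without competing with scans aimed at other disks, which is the usual motivation for this kind of layout.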
4. Hadoop DW Commercialization Cases - Telco
[ SK Telecom ]
Business Challenge
• Explosion of log data with LTE service
• Increase in types of data to be analyzed
• Insufficient DW capacity due to high cost
How SKT Hadoop DW Helped
✓ 3x storage expansion at the same price, or 80% reduction in unit price
✓ Enabled ad-hoc analysis of unstructured text data sets for daily
✓ Hadoop DW decreased content-based analysis time from a few hours to 20 minutes max.

Category          MPP DBMS             Hadoop DW
Raw Data Size     0.5 TB/Day           4 TB/Day
Total ETL Time    Average of 3 hours   Average of 6 hours
DW Creation       30 minutes           40 minutes
Mart Creation     1 hour               1 hour 40 minutes
Report Creation   1 hour 30 minutes    2 hours 4 minutes
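The table shows longer ETL times for Hadoop DW, but on eight times the daily data. A small calculation using only the table's "Raw Data Size" and "Total ETL Time" rows makes the per-hour throughput explicit; the function name `throughput_tb_per_hour` is an assumption for this sketch.

```python
def throughput_tb_per_hour(raw_tb_per_day, etl_hours):
    """Raw data processed per hour of ETL time."""
    return raw_tb_per_day / etl_hours


# Values from the table: raw data size per day and average total ETL time.
MPP_DBMS = throughput_tb_per_hour(0.5, 3)
HADOOP_DW = throughput_tb_per_hour(4, 6)

if __name__ == "__main__":
    print(f"MPP DBMS:  {MPP_DBMS:.3f} TB/hour")
    print(f"Hadoop DW: {HADOOP_DW:.3f} TB/hour")
    print(f"Ratio:     {HADOOP_DW / MPP_DBMS:.1f}x")
```

On these figures Hadoop DW processes roughly four times as much raw data per ETL hour, even though each run takes longer in absolute terms.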
4. Hadoop DW Commercialization Cases - Manufacturer
[ Global Top-5 Semiconductor Player ]
Business Challenge
• Collects an immense amount of unstructured measurement data during manufacturing
• RDBMS & BI cannot handle this data type
• Even data loading can take up to 20 min
How SKT Hadoop DW Helped
✓ Support for unstructured data through variable column schema
✓ 100x increase in data processing capacity
✓ Decreased data loading time by 10x (2 min)
✓ Minimized user action for pivot/unpivot
Thank you.

The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 

Kürzlich hochgeladen (20)

VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 

IEEE International Conference on Data Engineering 2015

  • 1. SKT Hadoop DW
 SK telecom Corporate R&D Center
 Yousun Jeong
  • 2. Copyright @ 2015 by SK Telecom. All rights reserved.
 Table of Contents
 1. Big Data in SKT
 2. What is Hadoop DW?
 3. SQL on Hadoop: TAJO
 4. Hadoop DW Commercialization Cases
  • 3. 1. Big Data in SKT
 High TCO for Data Management: 250 TB/day (91.25 PB/year). 4 Hadoop clusters with various commercial MPP databases for analytics. Platform: Hadoop+Hive and MPP DBMS. High TCO for data management (too much data is loaded into the MPP DBMS).
 One Unified Solution: 30 PB+ (compressed) on 1000+ nodes. 10+ Hadoop clusters with Tajo & Spark for all purposes. Platform: Hadoop+Tajo+Spark. Affordable & faster (unified framework for Big Data).
 Diagram labels (same in both panels): Operational Systems; Integration Layer; Data Warehouse; Marts; Marketing, Sales, ERP, SCM; ODS; Staging Area; Mart A, Mart B, Mart C, Mart D.
  • 4. 1. Big Data in SKT: SKT Hadoop Clusters
 ✓ Optimized configuration of a large-scale cluster
 ✓ Operation know-how of managing 1000+ nodes
 ✓ Fault-tolerant and effective resource management system
 Diagram labels: Data Collector (data collection & pre-processing); Main Cluster; Analysis; R&D Cluster; Service Logic; Repository; Service Cluster; App. 1 … App. N; T-Hadoop; Data Feeding; Commercialize; Develop.
 Figures shown: ~250 TB/day; cluster sizes of 700+, 200+, 100+ and 150+ nodes.
  • 5. 2. What is Hadoop DW?
 【 Hadoop DW Infrastructure 】 "Hadoop S/W and commodity H/W based cost-effective IT infrastructure system"
 【 Legacy IT Infrastructure 】 "High-price, high-performance proprietary IT infrastructure system"
 Data: Hadoop DW: structured/unstructured data, scale-out structure (petabyte, exabyte). Legacy: structured data, scale-up structure (terabyte).
 Cost: Hadoop DW: low price ($200 ~ $1,000 / TB). Legacy: high price ($5,000 ~ $50,000 / TB).
 H/W: Hadoop DW: commodity H/W (x86 server). Legacy: high-performance H/W (MPP, fabric switch, etc.).
 S/W: Hadoop DW: Hadoop architecture, SQL on Hadoop. Legacy: proprietary S/W (RDBMS, etc.). Further labels: Transaction/Batch Processing (SQL); Hadoop File System.
 Solution: SKT Hadoop DW. The Hadoop DW provides a data warehouse based on the Hadoop architecture for enterprise environments, so that users can accommodate massive and growing amounts of data at low cost.
 ※ MPP: Massively Parallel Processing; SAN: Storage Area Network; NAS: Network Attached Storage; RDBMS: Relational DB Management System; SQL: Structured Query Language
  • 6. 3. SQL on Hadoop - TAJO
 High-speed SQL-on-Hadoop processing engine
 • 3~5x improvement in processing speed over Hive under the TPC-H procedure
 • 80~100% of Impala's response speed, without a data size limit
 • Full ANSI-SQL support for easy RDBMS migration
 [ Legacy Approach (MR) ] Hive on MapReduce over HDFS (Hadoop cluster): partially distributed, sequential processing.
 [ Tajo Approach ] Tajo over HDFS (Hadoop cluster + Tajo): fully distributed, vector processing.
 Processing speed: process more data on the same clusters with improved processing speed.
 Response speed: try more queries for analysis with improved response speed. Query on Hadoop cluster: min. Query on Hadoop cluster + Tajo: few sec~min (up to 10x).
  • 7. 3. SQL on Hadoop - TAJO [ Tajo Features ] SQL Support ▪ ANSI SQL support ▪ Partition Type ▪ Meta Store Service Stability ▪ High Availability ▪ Resource Manager ▪ Fair Scheduler Performance ▪ High-speed processing ▪ Shuffling ▪ Dynamic Query Optimizer ▪ Query Rewriting System Integration ▪ BI Connector ▪ Proxy Support ▪ Tajo-R Function Support ▪ Analytic Function ▪ Hive Function [ Performance Comparison ] [ Apache Top-Level Project ]
  • 8. 3.1 Tajo Architecture
 Clients: Tajo CLI; JDBC Driver; ODBC Driver
 Tajo Master: Client Service Handler; SQL Parser; Query Rewriter; Logical Planner; Logical Optimizer; Query Manager; Resource Manager; Tajo Catalog; HCatalog
 Persistent Storage: Derby Store; MySQL Store; PostgreSQL Store
 Worker: 1. Query Master (Client Service Handler, Global Planner); 2. TaskRunner (Local Query Engine, Storage Manager)
 Storage: Local; HDFS/HBase; S3/Swift
  • 9. 3.1 Technical Characteristic - Logical Flow (Data Processing)
 Tajo Master: SQL Parser (query parsing); Logical/Global Planner (decomposition of a work unit); Resource Manager (work units delivered to the server)
 Tajo Workers (five shown): Physical Planner (decomposing the task operation unit); Query Engine (unit operation); Storage Manager (disk data I/O control)
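The master/worker flow on this slide can be illustrated with a small sketch. It is not Tajo code: `WorkUnit`, `plan`, `run_unit` and `execute` are hypothetical names, the "query" is a hard-coded filtered sum, and a thread pool stands in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    """Hypothetical work unit: aggregate one partition of the input."""
    partition_id: int
    rows: tuple
    threshold: int


def plan(partitions, threshold):
    """Master side: decompose one logical query into per-partition work units."""
    return [WorkUnit(i, tuple(rows), threshold) for i, rows in enumerate(partitions)]


def run_unit(unit):
    """Worker side: run the unit locally and return a partial sum."""
    return sum(v for v in unit.rows if v > unit.threshold)


def execute(partitions, threshold, workers=3):
    """Deliver the work units to a pool of workers and merge the partial results."""
    units = plan(partitions, threshold)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_unit, units))
    return sum(partials)


# Roughly: SELECT sum(v) FROM t WHERE v > 4, over three partitions.
partitions = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
total = execute(partitions, threshold=4)
print(total)
```

Each partition becomes one work unit, so adding workers only changes how many units run at once, not the result.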
  • 10. 3.1 Technical Characteristic - JIT Query Engine
 Behavior depends on the operand characteristic:
 - 1 + 2 = 3
 - "a" + "b" = "ab"
 - {1,2} + {3,4} = {4,6}
 - 1 + {1,2} = {2,3}
 <Existing methods> Implemented as a binary that has to consider all possible cases -> performance degradation (call, if, switch below 50%). Example: switch(operand) case numeric: add numeric; case string: add string.
 <JIT methods> Real-time code generation based on operand type; combined operations can be processed by compiler optimization. Four functions in a single operation (+2, -1, *1): Result = A x (1-B) + (1+C)
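The contrast between the two methods can be sketched in Python, under loud assumptions: `interpret` plays the role of the existing method (it branches on the node kind for every row), while `generate` builds source text for the whole expression once and compiles it. A real JIT engine emits bytecode or machine code; `eval` on generated source is only a stand-in, and all names here are hypothetical.

```python
# Expression tree for the slide's example: A * (1 - B) + (1 + C)
EXPR = ("+", ("*", "A", ("-", 1, "B")), ("+", 1, "C"))

OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
}


def interpret(node, row):
    """Existing method: walk the tree and branch on node kind for every row."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, row), interpret(right, row))
    if isinstance(node, str):   # column reference
        return row[node]
    return node                 # literal constant


def to_source(node):
    """Render the expression tree as Python source text."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_source(left)} {op} {to_source(right)})"
    if isinstance(node, str):
        return f"row[{node!r}]"
    return repr(node)


def generate(node):
    """JIT-style method: generate one specialized function for the expression."""
    source = f"lambda row: {to_source(node)}"
    return eval(compile(source, "<generated>", "eval"))


rows = [{"A": 2, "B": 3, "C": 4}, {"A": 10, "B": 0, "C": 1}]
compiled = generate(EXPR)
interpreted_results = [interpret(EXPR, r) for r in rows]
generated_results = [compiled(r) for r in rows]
print(interpreted_results, generated_results)
```

The generated function evaluates all four operations in one straight-line expression, with no per-node dispatch at run time.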
  • 11. 3.1 Technical Characteristic - Vectorized Query Engine
 <Tuple at a time> DB; 1 operation/record: a1 + b1, a2 + b2, a3 + b3, a4 + b4, a5 + b5, a6 + b6
 <Vectorized engine> Vectorized data; 1 operation/vector:
 A[] = {a1, a2, a3, a4, a5, a6}
 B[] = {b1, b2, b3, b4, b5, b6}
 C[] = A[] + B[]
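A minimal sketch of the two processing shapes follows. The function names and the batch size are assumptions for illustration. Pure Python gains no SIMD speed-up from this, so the code shows only the control-flow difference: one operation per record versus one operation per batch of values.

```python
from operator import add


def add_tuple_at_a_time(a, b):
    """One add operation is issued per record."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out


def add_vectorized(a, b, batch_size=4):
    """One add operation is issued per batch (vector) of values."""
    out = []
    for start in range(0, len(a), batch_size):
        end = start + batch_size
        out.extend(map(add, a[start:end], b[start:end]))
    return out


A = [1, 2, 3, 4, 5, 6]
B = [10, 20, 30, 40, 50, 60]
C_tuple = add_tuple_at_a_time(A, B)
C_vector = add_vectorized(A, B)
print(C_tuple, C_vector)
```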
  • 12. 3.1 Technical Characteristic - Storage Manager
 Diagram labels: Tajo Worker (three shown, one performing a scan); Storage Manager; Scan Scheduler; Request queue (three shown); Disk Scanner (three shown); Pre-fetching Buffer; Bulk Read; Fine granularity file request.
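The slide gives only diagram labels, so the following is one possible reading, not Tajo's actual scheduler: fine-grained file requests are queued per disk, and a scan scheduler merges contiguous or overlapping ranges into bulk reads. The request format `(disk, offset, length)` and the name `schedule` are assumptions.

```python
from collections import defaultdict


def schedule(requests):
    """Queue (disk, offset, length) requests per disk, then coalesce
    contiguous or overlapping ranges into larger bulk reads."""
    queues = defaultdict(list)          # one request queue per disk
    for disk, offset, length in requests:
        queues[disk].append((offset, length))

    bulk_reads = {}
    for disk, queue in queues.items():
        merged = []
        for offset, length in sorted(queue):
            if merged and offset <= merged[-1][0] + merged[-1][1]:
                # Range touches the previous bulk read: extend it.
                last_offset, last_length = merged[-1]
                new_length = max(last_length, offset + length - last_offset)
                merged[-1] = (last_offset, new_length)
            else:
                merged.append((offset, length))
        bulk_reads[disk] = merged
    return bulk_reads


requests = [
    ("disk0", 0, 64), ("disk1", 0, 64), ("disk0", 64, 64),
    ("disk0", 256, 64), ("disk1", 32, 64),
]
bulk_reads = schedule(requests)
print(bulk_reads)
```

Five small requests collapse into three reads here; a pre-fetching buffer would sit behind these bulk reads, which this sketch leaves out.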
  • 13. 4. Hadoop DW Commercialization Cases: Telco [ SK Telecom ]
 Business Challenge
 • Explosion of log data with LTE service
 • Increase in types of data to be analyzed
 • Insufficient DW capacity due to high cost
 How SKT Hadoop DW Helped
 ✓ 3x storage expansion under same price, or 80% reduction in unit price
 ✓ Enabled ad-hoc analysis of unstructured text data sets for daily
 ✓ Hadoop DW decreased the contents-based analysis process time from a few hours to 20 minutes at most
 Category          MPP DBMS             Hadoop DW
 Raw Data Size     0.5 TB/day           4 TB/day
 Total ETL Time    Average of 3 hours   Average of 6 hours
 DW Creation       30 minutes           40 minutes
 Mart Creation     1 hour               1 hour 40 minutes
 Report Creation   1 hour 30 minutes    2 hours 4 minutes
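A quick check of the table's first two rows, dividing daily raw data size by average total ETL time to get a rough throughput figure. That metric is an assumption introduced here for illustration; the slide does not state it.

```python
# Figures taken from the table: raw data per day and average total ETL time.
mpp_dbms = {"tb_per_day": 0.5, "etl_hours": 3}
hadoop_dw = {"tb_per_day": 4, "etl_hours": 6}


def throughput(system):
    """Rough ETL throughput in TB per hour (illustrative metric)."""
    return system["tb_per_day"] / system["etl_hours"]


volume_ratio = hadoop_dw["tb_per_day"] / mpp_dbms["tb_per_day"]
time_ratio = hadoop_dw["etl_hours"] / mpp_dbms["etl_hours"]
throughput_ratio = throughput(hadoop_dw) / throughput(mpp_dbms)

print(f"data volume: {volume_ratio:.0f}x, ETL time: {time_ratio:.0f}x, "
      f"throughput: {throughput_ratio:.1f}x")
```

Under this reading, the Hadoop DW handles 8 times the data in twice the time, which is about 4 times the throughput per hour.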
  • 14. 4. Hadoop DW Commercialization Cases: Manufacturer [ Global Top-5 Semiconductor Player ]
 Business Challenge
 • Collects an immense amount of unstructured measurement data during manufacturing
 • RDBMS & BI tools cannot handle this type of data
 • Even data loading can take up to 20 min
 How SKT Hadoop DW Helped
 ✓ Support for unstructured data through variable column schema
 ✓ 100x increase in data processing capacity
 ✓ Decreased data loading time by 10x (2 min)
 ✓ Minimized user action for pivot/unpivot
  • 15. Thank you.