The document discusses migrating KT's CDR analysis system from a relational database to NexR's Hadoop-based Data Analytics Platform (NDAP). NDAP provides tools to help with the migration, including converting Oracle data and SQL queries to the Hive query language. The conversion process involves mapping data types, functions, and SQL syntax between Oracle and Hive. NDAP also includes performance monitoring and query optimization tools to help enterprise data engineers adapt to the new system.
2. NexR Introduction
Big data analytics firm
Working on Hadoop and big data for 5 years
Provided a NexR Hadoop solution to all major Korean telcos (KT, SKT, LG U+)
Leading a Korean Hadoop community and holding Hadoop conferences
Products
NexR Data Analytics Platform (NDAP)
iCube Cloud: cloud computing platform (like OpenStack)
Massive email archiving solution (presented in Hadoop World 2009)
3. Agenda
Voice of Customer: KT CDR Analysis System
KT requirements for system migration
NexR Data Analytics Platform (NDAP) overview
Oracle-to-Hive migration
Enterprise Hive
RHive
Lessons learned
Conclusion
You can download this presentation file:
http://www.nexr.com/hw11/ndap.pdf
4. Introduction to KT business
Business timeline
1981.12 Establishment of KT Corporation
2002.08 Privatization from government-owned company
2006.04 Commercial launch of the world's first Mobile WiMAX service (WiBro)
2008.11 Commercial launch of real-time IPTV
2009.06 Merged with KTF
2010.06 Cloud service launched
Business domain
- Mobile (2G, 3G)
- WiBro (Mobile WiMAX)
- Internet access
- IPTV
- VoIP
- Multimedia contents
- Local and international telephone
- Cloud service
[Slide also shows 2010 figures for number of employees, sales, telephone subscribers, broadband subscribers, and mobile subscribers.]
5. Introduction to KT CDR data
• KT CDR (Call Detail Record) data, unit: TB
Data | 1-month raw data | 1-month summary | Size (raw 1 yr + summary 2 yrs)
Wireless – Unrated CDR (Voice, Data, SMS, MMS) | 3.7 | 2.5 | 104
Wireless – Rated CDR | 1.5 | 0.2 | 22
Wireless – Wi-Fi | 0.4 | 0.3 | 12
Wireless – WiBro | 1.5 | 1.0 | 42
Wireline – Rated CDR | 1.5 | 1.5 | 55
IPDR – IP-TV | 1.5 | 0.1 | 19
Total | 10 | 5.6 | 254
• KT Subscriber Analysis System (SAS) for wireless CDR
Reporting, call detail summaries, subscriber call quality, call log search, etc.
Implemented with a relational database on a high-end server
- Data gathering and conversion on a server every few tens of seconds
- Daily batch extract-transform-load (ETL) with SQL queries
- Near real-time search against an indexed column (call number)
Hundreds of DB tables and over 3,000 SQL queries accumulated over 10 years
6. Current KT CDR Analysis System Architecture
[Architecture diagram] Data sources (LALA2, NIBADA, ARGOS collector servers) feed a data-converting server, which loads raw data into a relational database on a high-end server; batch ETL builds dimension and summary tables that serve real-time search and OLAP. Bottlenecks appear at data converting, batch ETL, real-time search, and OLAP.
7. New Challenges Faced
• Increasing data volume
– Popular demand for smartphones and SNS
– Need for more complicated data analysis to beat the competition
– Customer behavior analysis is needed
• Slow performance
– Peak-time performance became unacceptable
– Some CDRs were lost due to slow performance
• KT cloud business launched
– Cheaper new KT Cloud hardware is available
– Open source requirements are increasing in the company
Can traditional DB give us an answer?
8. KT meets NexR for Big Data
• Scalability
– Coping with increasing data volume and variety (wired, 2G, 3G, WiMax, LTE, WiFi, SMS, MMS, etc.)
– Enabling horizontal scalability in every data path (data collection, data storage, ETL process, data search)
• Performance
– Handling streamed CDR data in near real-time
– Completing daily ETL tasks in a given time period regardless of data increase
• Cost-Efficiency
– Reducing cost with inexpensive equipment
Project start (2011.4): replacing the traditional RDB and DW with Hadoop and similar OSS, applying the NexR solution for CDR analysis (pilot)
9. Continuous Journey for KT’s Big Data
• Step-by-step approach with NexR
Steps | Open | Coverage
Hadoop CDR Analysis Platform (Pilot) | 2012 1Q | Replacing representative data and SQLs; unrated wireless CDR
Wireless CDRs | 2012 | Changing all traditional applications to OSS; adding more views and reports
Data Integration / Advanced Analytics | 2013 | Rated CDRs, Internet access log, TV log; advanced analytics; SNS, location, etc.
External Data Sources | 2014 | Data from KT subsidiaries
10. Rethinking KT’s Requirements
Past: the KT CDR system (hundreds of DB tables, 3,000 SQLs over 10 years), run by SQL developers, DBAs, and business analysts (OLAP, SAS, etc.).
Present: data volume explosion plus data integration (IPTV logs, Internet access logs, social data).
Future: data variety plus new data interfaces, handled by data engineers. Who will they be?
11. Big Data Analytics Requirements for Enterprise
Data volume is only the basic requirement
Data integration is the fundamental requirement
(Structured data + Unstructured data)
Need to preserve the existing data and apps
Need to be familiar to enterprise data engineers
(DBA, SQL developers, business analysts, etc)
Smooth transition is also essential
What’s the solution?
12. NexR Solution: Hadoop + Hive
Hive is the best solution for a smooth transition from the database world to the Hadoop world
ANSI-SQL-based query engine: good for RDB migration
Batch data processing: ETL, reporting, ad-hoc query
Common data storage: file-based data store, good for data integration
Two questions remain: HOW TO CONVERT the existing data and SQL, and HOW TO ADAPT the enterprise data engineers (DBA, SQL developers)
13. NexR Data Analytics Platform (NDAP)
Embracing the database world into the Hadoop world
Support for migrating data and logic from the RDB to Hadoop
Support for integrating RDB and Hadoop
Offering Hive tools for DBA and SQL developers
Full package for big data analytics
From data collection to batch data processing, real-time query, and
even advanced analytics
Leveraging open source technologies
Horizontal scalability in every data processing path (collection,
batch processing, real-time query, etc)
Injecting real-world practices through collaboration with KT
14. NDAP Bird’s View
Built from open source components; each component's role is shown in parentheses.
NDAP RHive (advanced analytics): integration of R and Hive
NDAP Enterprise Hive (batch data processing): Oracle-to-Hive, Hive workflow, Hive performance monitor, query planner
NDAP Data Store (common data storage): HDFS, Sqoop-based data import/export
NDAP Search (real-time query): ElasticSearch-based distributed log search, time-ranged index sharding
NDAP Collector (streamed data collection): Flume-based data collector, checkpointing for low-overhead agents
NDAP Admin Center (coordination & management): Zookeeper-based distributed coordinator, Collectd-based system/app management
15. NDAP Architecture
[Architecture diagram] Data sources on the left: Oracle databases, an existing BI/DW ODS, and telco equipment producing streaming data. Data enters NDAP through the Data Importer (Oracle-to-Hive), the ODS Data Importer, and the Collector. Inside NDAP: Enterprise Hive with Hawk (performance monitor, query planner) and Lama (Hive workflow), RHive on top for advanced analytics, the Data Store (Hadoop), Search with a REST/JSON API, and the Admin Center. A Data Exporter feeds results back to Oracle and an OLAP server. Applications on the right: advanced analytics, DBA ETL, ad-hoc query, reporting, OLAP, and real-time query.
16. NDAP Bird’s View – Today’s focus
Today's talk:
NDAP RHive: integration of R and Hive
NDAP Enterprise Hive: Oracle-to-Hive, Hive workflow, Hive performance monitor, query planner
Refer to appendix:
NDAP Data Store: HDFS, Sqoop-based data import/export
NDAP Search: Lucene-based distributed log search engine, time-ranged index sharding
NDAP Collector: Flume-based data collector, checkpointing for low-overhead agents
NDAP Admin Center: Zookeeper-based distributed coordinator, Collectd-based system/app management
17. Enterprise Hive
Recreating Hive for Enterprise Data Engineers
Two goals
Migration of data and SQL from the RDB (Oracle) to Hive
Oracle-to-Hive support
Rich environment for Hive developers, including DW/BI teams and DBAs
Performance monitor, query planner, workflow manager
18. Is Oracle-to-Hive trivial?
Simple example (Oracle, then the Hive equivalent)
SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id
SELECT * FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)
Typical example (Oracle)
SELECT /*+ PARALLEL(K1 16) USE_NL(K1 B) */
ETL_DATE, CALL_DATE,
CASE WHEN SUBSCRIBER_TYPE ='PREMIUM'
THEN 'Y'
ELSE NVL(TO_CHAR(B.I_NCN),'X')
END AS I_NCN,
I_INOUT,VALID_CNT, I_CFC_TYPE, ……
FROM 3G_CALL_LOG K1
, SASCOMM.PHONE_MAPPING B
WHERE K1.i_etl_dt = TO_DATE('[#SAS_YDAY#]','YYYYMMDD')
AND K1.i_call_dt ||'' >= TO_DATE('[#SAS_YDAY#]','YYYYMMDD')
AND K1.i_call_dt ||'' < TO_DATE('[#SAS_YDAY#]','YYYYMMDD') + 1
and K1.I_INOUT in ('0','1')
AND DECODE(K1.I_INOUT,'0',NVL(K1.I_OUT_CTN, I_CALLING_NUM),'1',K1.I_IN_CTN) = B.I_CTN(+)
AND K1.CALL_DATE >= B.SDATE(+)
AND K1.CALL_DATE < B.EDATE(+);
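For orientation, here is a hedged HiveQL sketch of how the Oracle-specific pieces above might be rewritten. It is illustrative only, not KT's converted query: the optimizer hints are dropped, DECODE/NVL become CASE/COALESCE, and the (+) outer join becomes a LEFT OUTER JOIN; the SDATE/EDATE range conditions that were part of the Oracle outer join are left out, since Hive outer joins accept only equality predicates and they would need separate handling.

-- Hedged sketch of the conversion (column/table names taken from the example above);
-- the DECODE join key is precomputed in a subquery so the outer join stays an equi-join.
SELECT
  K1.ETL_DATE,
  K1.CALL_DATE,
  CASE WHEN K1.SUBSCRIBER_TYPE = 'PREMIUM'
       THEN 'Y'
       ELSE COALESCE(CAST(B.I_NCN AS STRING), 'X')    -- NVL(TO_CHAR(B.I_NCN),'X')
  END AS I_NCN,
  K1.I_INOUT, K1.VALID_CNT, K1.I_CFC_TYPE
FROM (
  SELECT k.*,
         CASE k.I_INOUT                               -- DECODE(...) used as the join key
           WHEN '0' THEN COALESCE(k.I_OUT_CTN, k.I_CALLING_NUM)
           WHEN '1' THEN k.I_IN_CTN
         END AS join_ctn
  FROM 3G_CALL_LOG k
  WHERE k.i_etl_dt = '[#SAS_YDAY#]'                   -- assumes dates are stored as strings
    AND k.i_call_dt >= '[#SAS_YDAY#]'
    AND k.I_INOUT IN ('0', '1')
) K1
LEFT OUTER JOIN SASCOMM.PHONE_MAPPING B               -- Oracle (+) outer join
  ON (K1.join_ctn = B.I_CTN);
-- CALL_DATE >= B.SDATE / CALL_DATE < B.EDATE were outer-join conditions in Oracle;
-- they need separate treatment (e.g., pre-filtering B) rather than a direct port.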
19. Enterprise Hive – Oracle-to-Hive
Enhancing Hive by
Fixing Hive code (JIRA issues HIVE-2253, HIVE-2503, HIVE-2329, HIVE-2332, etc.)
Adding Hive UDFs and UDAFs for Oracle compatibility (a registration sketch follows below)
Enterprise Hive provides
Conversion rules, a guide, and a process covering:
Oracle data types that are not supported in Hive
Oracle functions that are not supported in Hive
Three conversion points to consider
Data model and data types
Basic functions, aggregate and analytic functions
SQL syntax
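As a rough illustration of the Oracle-compatibility UDF point above, the sketch below registers and uses such functions in Hive. The jar path and class names are hypothetical placeholders, not NexR's actual ones.

-- Hypothetical registration of Oracle-compatibility UDFs (jar and class names are placeholders):
ADD JAR /path/to/oracle-compat-udfs.jar;
CREATE TEMPORARY FUNCTION nvl    AS 'com.example.hive.udf.GenericUDFNvl';
CREATE TEMPORARY FUNCTION decode AS 'com.example.hive.udf.GenericUDFDecode';

-- With the UDFs registered, an Oracle expression such as
--   DECODE(I_INOUT, '0', NVL(I_OUT_CTN, I_CALLING_NUM), '1', I_IN_CTN)
-- can stay nearly unchanged in the converted query:
SELECT decode(I_INOUT, '0', nvl(I_OUT_CTN, I_CALLING_NUM), '1', I_IN_CTN) AS ctn
FROM 3g_call_log;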
20. Oracle-to-Hive – Data Model, Types, Functions
Note: Hive's built-in functions follow MySQL function syntax.

Data model: Oracle Table -> Hive Table; Oracle Partition -> Hive Partition; Oracle Sampling -> Hive Bucket

Data types
Oracle | Hive
NUMBER(n) | TINYINT / INT / BIGINT
NUMBER(n,m) | FLOAT / DOUBLE
VARCHAR2 | STRING
DATE | STRING ("yyyy-MM-dd HH:mm:ss" format)

Basic functions
Function type | Oracle | Hive
Math | round, ceil, mod, power, sqrt, sin/cos | round, ceil, pmod, power, sqrt, sin/cos
Character | substr, trim, lpad/rpad, ltrim/rtrim, replace | substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace
Null | coalesce, nvl, nvl2 | coalesce (no nvl, nvl2)

Added basic functions (Hive UDFs)
Condition | DECODE, GREATEST
Null | NVL, NVL2
Type conversion | TO_NUMBER, TO_CHAR, TO_DATE, INSTR4, DATE_FORMAT, LAST_DAY

Hive data types are designed to map onto Java data types.
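A minimal sketch of the type mapping above applied to a hypothetical CDR table (table and column names are illustrative, not the KT schema):

-- Hive DDL; the comments name the Oracle types each column would replace.
CREATE TABLE call_log (
  call_id    BIGINT,   -- Oracle NUMBER(10)
  duration   DOUBLE,   -- Oracle NUMBER(8,2)
  caller_no  STRING,   -- Oracle VARCHAR2(20)
  call_dt    STRING    -- Oracle DATE, kept as 'yyyy-MM-dd HH:mm:ss' text
)
PARTITIONED BY (etl_dt STRING)             -- Oracle partition -> Hive partition
CLUSTERED BY (caller_no) INTO 16 BUCKETS;  -- Oracle sampling -> Hive bucket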
21. Oracle-to-Hive – SQL Syntax & Analytic Functions
Most unsupported Oracle SQL syntax can be converted using join syntax.

IN subquery
  Oracle: SELECT * FROM Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d)
  Hive:   SELECT * FROM Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)
NOT IN subquery
  Oracle: SELECT * FROM Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)
  Hive:   SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL
JOIN
  Oracle: SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id
  Hive:   SELECT * FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)
RANK (analytic function)
  Oracle: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp
  Hive:   SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary)
          FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e
MIN (aggregate function used as analytic)
  Oracle: SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM emp
  Hive:   SELECT dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON emp.dept = tmp.dept

Oracle analytic functions are sometimes used for statistical processing (5% of queries in the KT case)
Implemented some analytic functions (RANK, DENSE_RANK, ROW_NUMBER, LAG, MIN, MAX, SUM)
22. Oracle-to-Hive – Example
select /*+ use_nl(E emp_idx1) */
D.dname, E.empno, E.ename,
decode(nvl(JOB, 'SALESMAN'), 'SALESMAN', sal, 0) sal,
RANK() over (PARTITION BY D.deptno ORDER BY sal desc) ranking
from dept D, emp E
where D.deptno = E.deptno
and E.ename in (select ename
from bonus
where job in ('SALESMAN', 'CLERK'));
select X.dname, X.empno, X.ename, X.sal,
       nexr_rank(HASH(X.deptno, X.sal), X.sal) ranking
from (
  select D.dname, D.deptno, E.empno, E.ename,
         (case coalesce(JOB, 'SALESMAN') when 'SALESMAN' then sal else 0 end) sal
  from dept D
  join emp E on (D.deptno = E.deptno)
  join bonus B on (E.ename = B.ename)
  where B.job in ('SALESMAN', 'CLERK')
) X
distribute by hash(X.deptno, X.sal) sort by X.deptno, X.sal;
23. NDAP Process for RDB Migration
Preparation -> Conversion -> Validation -> Optimization

Preparation: convert the Oracle schema to a Hive schema; load the data into Hive using Sqoop
Conversion: function conversion; SQL conversion (one-to-one, by conversion rules)
Validation: check data compatibility, both semantically and syntactically (see the sketch below)
Optimization: rewrite Hive queries when more performance is needed

The case of the KT CDR migration
Chose 100 representative SQLs for ETL and successfully converted them
Current step: the 200-300 most frequently used SQLs
Next step (2012): all 3,000 SQLs
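A small sketch of the Validation step, comparing simple aggregates of a converted Hive table against the same figures pulled from Oracle. The call_log table and its columns are the illustrative ones from the earlier DDL sketch, not the KT schema.

-- Run the same aggregates on both sides and diff the results.
SELECT COUNT(*)       AS row_cnt,     -- compare against the Oracle row count
       SUM(valid_cnt) AS sum_valid,   -- plus a few checksum-style aggregates
       MIN(call_dt)   AS min_dt,
       MAX(call_dt)   AS max_dt
FROM call_log
WHERE etl_dt = '20111007';            -- validate one ETL partition at a time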
24. Enterprise Hive – Rich Environment for Hive
Building up a Hive ecosystem by
Adding assistant programs that help DBAs and SQL developers
Enterprise Hive provides
Hive performance monitor and query planner
Hive workflow manager
25. Hawk – Hive Performance Monitor
Difficulty of Hive performance diagnostics
Metrics and logs from Hive and Hadoop are separated
Lack of a historical and statistical view of performance
Hawk performance monitor for DBAs
Integrated view of a Hive query and its corresponding MapReduce jobs
Hourly/daily/weekly/monthly performance views of each query
[Figures: Hawk screenshot and Hawk architecture]
26. Hawk – Hive Query Planner
Difficulty of the default Hive query planner
Too complicated because it shows the details of MapReduce execution
Aimed at Hive internals developers, not DBAs
Hawk query planner for DBAs
Displays a Hive query at the HQL operator level (familiar to DBAs)
Shows the performance result together with the query
[Figures: default Hive query plan, Hawk query plan, and performance result]
27. Lama – Hive Workflow Manager
Workflow development and management tool for Hive
Managing data processing jobs for Hive
Choosing Oozie as a core workflow engine
Providing web-based interface
Workflow editing & management, user management, job scheduling,
project management, etc
On-demand workflow changes at runtime
A failed workflow needs to be fixed and resumed at runtime
Not supported in most workflow engines
Patched Oozie to support suspend/resume per action (i.e., per Hive query)
Future plan
Supporting other data processing jobs like Pig, Sqoop, MapReduce,
HDFS, SSH, and Java
28. NDAP Process for Batch Data Processing
Analysis -> Development -> Execution -> Management

Analysis: analyze the service request (SR); Hive data and query modeling
Development: workflow development, testing, and validation (Lama Workflow Manager)
Execution: workflow deployment and scheduling; workflow suspend/fix/resume on failure (Lama Workflow Manager)
Management: performance monitoring, diagnostics, and optimization (Hawk Performance Monitor, Hawk Query Planner)
30. R for Advanced Analytics
R (GNU open source)
Programming language and software environment for statistical
computing and graphics (wikipedia)
4,000+ R libraries (more than SAS’s functionality)
Becoming a de facto standard among statisticians
R for Big Data
R runs on a single node
Some parallel R packages
snowfall, rpvm, rmpi, etc
Recent attempts to combine R and Hadoop
RHIPE(Purdue), RHadoop(RA), Ricardo(IBM)
31. RHive
Marrying R and Hive for Big Data Analytics
Most R programmers are familiar with SQL
Hive can hide the details of Hadoop and MapReduce
Inspired by IBM Ricardo (R + Jaql)
R: strong for deep analytics, but lacks massive data manipulation
Hive: strong for massive data manipulation, but lacks analytical functionality
Providing Hive interfaces in the R environment
Allowing R programmers to use familiar SQL for big data manipulation
Released as open source (Apache license version 2)
Source: https://github.com/nexr/RHive
CRAN: http://cran.r-project.org/web/packages/RHive
32. RHive API and Architecture
RHive API
rhive.connect(): connect R to Hive
rhive.query(): send a Hive query and return the result
rhive.export(): export R functions to R processes running on the MR nodes
rhive.exportAll(): export R functions and R objects to R processes running on
the MapReduce nodes
rhive.close(): close a Hive connection
[Figure: RHive architecture]
33. RHive Sample – Flight Delay Prediction
R: Building a prediction model of flight delay using linear regression with a training data set (sampled from Hive)
Hive: Running the prediction model(R objects) with an entire data set in Hive
Data set: airline on-time performance (http://stat-computing.org/dataexpo/2009/), flight arrival and departure details for all commercial flights within the USA from October 1987 to April 2008.

library(RHive)
rhive.connect("127.0.0.1")

# get a training data set from Hive
trainset <- rhive.query("SELECT dayofweek,arrdelay,distance FROM airlines",fetchsize=30,limit=100)

# convert to numeric, and filter out missing values
trainset$arrdelay <- as.numeric(trainset$arrdelay)
trainset$distance <- as.numeric(trainset$distance)
trainset <- trainset[!(is.na(trainset$arrdelay) | is.na(trainset$distance)),]

# create a prediction model using R model objects and internal functions
model <- lm(arrdelay ~ distance + dayofweek,data=trainset)
rhpredict <- function(arg1,arg2,arg3) {
  if(arg1 == "NULL" | arg2 == "NULL" | arg3 == "NULL")
    return(0.0)
  res <- predict.lm(model, data.frame(dayofweek=arg1, arrdelay=arg2, distance=arg3))
  return(as.numeric(res))
}
null <- "NULL"

# set up R objects in Hive
rhive.assign("null", null)
rhive.assign("rhpredict", rhpredict)
rhive.assign("model", model)

# export the R prediction model and run it in Hive
rhive.exportAll("rhpredict", c("10.1.3.2","10.1.3.3","10.1.3.4","10.1.3.5","10.1.3.6","10.1.3.7"))
rhive.query("create table delaypredict as select R('rhpredict', dayofweek, arrdelay, distance, 0.0) from airlines")
34. RHive Demo
35. Lessons Learned
RDB migration to open source is complicated, time-consuming, and labor-intensive.
It becomes practical with experience and a defined migration process.
Average time to convert one query (200-300 lines on average):
8 hours -> 2 hours after 4 months (4 times faster)
Prior database migration experience helps (the work is similar to an Oracle-to-MySQL migration)
Today's data engineers are not familiar with open source software like Hadoop.
They want tools similar to the ones they already use.
Open source projects such as Hadoop and MapReduce are not easy for current IT managers;
they are technology-driven, not demand-driven.
Open source technologies need to be wrapped in familiar interfaces that hide the details.
36. Lessons Learned
Open source software is not a panacea. Choosing the right open source project is the first
significant step. Combining several OSS projects is common, and modifying OSS source code
is inevitable if requirements are not negotiable.
We combined two separate open source projects, Hive and ElasticSearch, for batch data
processing and real-time query on Hadoop as a common data store.
We modified Hive, ElasticSearch, Flume, Oozie, Zookeeper, etc.
Integrating various types of data is a critical issue for an enterprise. In particular,
the structured data of databases and DWs needs to be coupled with unstructured data to
better understand customers' needs.
It is necessary to embrace current data and business logic in the new environment.
RDB/DW and Hadoop each have their pros and cons, so it is necessary to find the right mix.
37. Conclusion
Big data analytics for telco and enterprises
Smooth transition from RDB/DW to
NexR Data Analytics Platform (NDAP)
38. NexR NDAP Team
Jaesun Han Wonkuk Yang
Sangmin Kwak Sebong Oh
JeongMin Kwon SungHan Woo
Keumju Kim Dongmin Yu
Daegeun Kim Choonghyun Ryu
Minseok Kim Bokju Yun
Minwoo Kim Jonghee Lee
Yeonseop Kim HyungJoo, Lim
Youngwoo Kim HeeWon Jeon
Hyeon-Cheol Nah GooBum Jung
SeungWoo Ryu Sunghwan Cho
Seoeun Park Junho Cho
Young-Geun Park ByungMyon Chae
Eun-Sook Park Yungtai Choi
Chihoon Byun Choi Jong-wook
SeongHwa Ahn Inho Han
Youngbae An Seonghak Hong
39. Thank you
Presentation file: http://www.nexr.com/hw11/ndap.pdf
Contact
jason.han@nexr.com
twitter: @jaesun_han
Section map: KT CDR System (Slide 4), NDAP Overview (Slide 14), Enterprise Hive (Slide 17), RHive (Slide 30), Appendix (Slide 37)
41. NDAP Collector
Flume-based scalable data collector
Choosing Flume due to the flexible architecture (source, decorator, sink)
Adding a checkpoint mode and rolling/dedup
Adding a checkpoint reliability mode
Chukwa’s checkpoint is grafted onto Flume
Less resource consumption in agents than Flume E2E mode
Minimizing the amount of log data retransmitted when an agent fails
Rolling and deduplication
Rolling fragmented log data periodically in Hadoop
Removing duplicated log data in case of failover
[Architecture diagram: Flume agents (source, decorator, sink) send log data and checkpoint their positions; a Zookeeper-coordinated Rolling/Dedup Manager with a workflow scheduler triggers rolling/dedup MapReduce execution against the Data Store (Hadoop) and Search.]
42. NDAP Search: Near Real-Time Indexing
Near real-time indexing using RAM Index
Adding RAM index for near real-time indexing in ElasticSearch
Flushing the RAM index into the disk index after a given time period or on buffer overflow
When searching, both RAM index and disk index are examined
[Diagram: the Indexer's IndexWriter adds documents to a RAM index buffer and commits them to the disk index; the Searcher's IndexReader reads both the RAM index and the disk index.]
43. NDAP Search: Index Split Strategies
Modifying ElasticSearch to add more index split schemes for log search
Log searches usually have a time constraint, such as daily or monthly ranges
Combining time-based index split and size-based index split
Time-based index split
Splitting an index according to a given time period
Improving indexing and search performance
Easy to implement auto-retention
Size-based index split
Splitting an index according to a given size
Resolving a big index performance problem
[Diagram: time-based index partitions (e.g., 2011.10.08 through 2011.10.30), each split into size-based index sequences (0001, 0002, 0003), mapped onto ElasticSearch index shards with one primary and replicas; a search fans out over the relevant partitions.]
44. NDAP Admin Center: Distributed Coordinator
Zookeeper-based distributed coordinator
Zookeeper handles the coordination among NDAP components
Patching several issues of Zookeeper and ZkClient
Providing common libraries for NDAP components
Group membership, master election, distributed lock, distributed queue
Easy to use and more reliable than other recipes, especially for read-and-write problems
[Diagram: NDAP Search and NDAP Collector use the Zookeeper recipes (group membership, master election, distributed lock, distributed queue), which wrap a patched ZkClient thread and a patched Zookeeper ensemble; the recipes are easy, reusable, and fault tolerant, while the raw client threads are complex, unique, and fragile.]
45. NDAP Admin Center: System/App Management
Collectd-based system and application monitoring
Server resource monitoring: CPU, memory, disk, process, vmem, TCP connections, etc.
Application monitoring: Hadoop, ElasticSearch, Flume, Zookeeper, Memcached, Collectd, etc.
Plug-in architecture: more applications, such as NoSQL stores, can be added
Resource-centric view
Displays the resource status of all nodes on one screen for a specific resource (CPU, memory, etc.)
Most system management tools (Ganglia, Nagios, etc.) offer only a node-centric view
[Diagram: Collectd agents on each server report metrics to the NDAP Admin server, which checks thresholds/severity and feeds the management dashboard.]