4. What is unstructured information?
Unstructured Data refers to information that either does
not have a pre-defined data model and/or does not fit
well into relational tables. Unstructured information is
typically text-heavy, but may contain data such as dates,
numbers, and facts as well. This results in irregularities
and ambiguities that make it difficult to understand using
traditional computer programs as compared to data
stored in fielded form in databases or annotated
(semantically tagged) in documents.
-- from Wikipedia http://en.wikipedia.org/wiki/Unstructured_data
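To make the definition concrete, here is a minimal Python sketch of pulling dates and numbers out of free text. The sample sentence, the regular expressions, and the `extract_fields` helper are all assumptions made up for illustration; real text would need far more patterns, which is exactly the irregularity the definition describes.

```python
import re

# Hypothetical free-text note; the content is invented for illustration.
note = "Customer called on 2013-04-12 about order 8841 and asked for a refund of 120 USD."

# ISO-style dates only (YYYY-MM-DD); other date formats would need more patterns.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Standalone integers that are not part of a hyphenated date.
NUMBER_RE = re.compile(r"(?<![\d-])\d+(?![\d-])")


def extract_fields(text):
    """Return the dates and standalone integers found in a piece of free text."""
    dates = DATE_RE.findall(text)
    numbers = [int(n) for n in NUMBER_RE.findall(text)]
    return {"dates": dates, "numbers": numbers}


record = extract_fields(note)
print(record)
```

The same facts stored in fielded form (for example columns `call_date`, `order_id`, `refund_amount`) would need no pattern matching at all.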
12. Hadoop is more than just Hadoop
[Diagram labels: Big Data Applications; Pig; SQL; Hive; ZooKeeper; RAW]
13. The Hadoop ecosystem
ZooKeeper – distributed coordination service
HBase – distributed column-oriented database for random read/write
Hive – SQL-like data warehouse on top of Hadoop
Pig – high-level scripting language for data processing
Mahout – a scalable machine learning library for MapReduce
Sqoop – SQL-to-Hadoop connector
Flume – a distributed streaming data collection framework
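Several of the tools above sit on top of the MapReduce processing model. As a rough illustration of that model only, here is a single-process Python sketch of a word count. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big cluster"])
print(counts)
```

On a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data over the network; this sketch only shows the data flow.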
24. Enterprise Hadoop adoption strategy
[Diagram, layers from top to bottom]
Familiar End User Tools: PowerView, Excel with PowerPivot, Predictive Analytics, Embedded BI
BI Platform: SSAS, SSRS
Connectors
Hadoop
Unstructured data sources: Sensors, Devices, Web Crawlers, Log
Structured data sources: ERP, CRM, LOB Apps
31. Introducing Etu Appliance
Big Data End-to-End Solution in a Box
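An appliance that combines storage and compute, simplified and optimized:
• Deploys 100+ nodes within 10 minutes
• Data ingestion: 1U outperforms 8U
• Optimized for Big Data processing
• Scalable: public-cloud-grade compute architecture
• Reliable: carrier-grade system quality
• Performance: enterprise-grade innovation and results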
32. Integrating three data temperatures: Hot / Warm / Cold
Hot Data
Online structured data
Online semi-structured / unstructured data
OLTP, OLAP
Warm Data
Online semi-structured / unstructured data
Hadoop-based Solution
Cold Data
Offline data
SAN / NAS / Scale-out NAS
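The tiering idea can be sketched as a small routing rule in Python. This is a deliberate simplification and an assumption, not the slide's exact mapping: the slide also lists semi-structured and unstructured data under Hot Data, whereas the sketch sends all online structured data to the hot tier and all online semi-structured or unstructured data to the warm tier. The `Tier` enum and `choose_tier` function are hypothetical names.

```python
from enum import Enum


class Tier(Enum):
    """Storage tiers, each labeled with the platform the slide names for it."""
    HOT = "OLTP / OLAP"
    WARM = "Hadoop-based solution"
    COLD = "SAN / NAS / scale-out NAS"


def choose_tier(online: bool, structured: bool) -> Tier:
    """Pick a tier from two flags (simplified rule, see the assumption above)."""
    if not online:
        # Offline data goes to cold storage regardless of its structure.
        return Tier.COLD
    # Online data: structured goes to hot, everything else to warm.
    return Tier.HOT if structured else Tier.WARM


for online, structured in [(True, True), (True, False), (False, True)]:
    tier = choose_tier(online, structured)
    print(online, structured, tier.name, tier.value)
```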