9. Data Analysis Process with Hadoop
HADOOP | FEATURES | TOOLS
HADOOP: 2 quad-cores, 8GB RAM, 4TB HDD x 60 nodes
        4 quad-cores, 16GB RAM, 4TB HDD x 30 nodes
TOOLS:  SAS, WEKA, R, etc.
15. SEARCH SPAM INDEX
Mission
Identify the impact of spam on search users
Data
Search Log : Text with Delimiter
Post Filtered Documents : JSON Format
Operation Deleted Documents : XML Format
Task
Query - Session - Doc. 1 - Doc. 2 - Doc. 3 - Doc. 4
Click?
OUTER JOIN
Type? (Ham, Spam, OP Del.)
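
The outer join in this task can be sketched in plain Python as follows. This is a minimal illustration, not the job that was actually run on the cluster: the field names (`query`, `session`, `doc_id`, `clicked`), the sample records, and the helper `full_outer_join` are all assumptions, and the document types are assumed to have already been extracted from the JSON and XML sources into a simple lookup.

```python
from collections import defaultdict

def full_outer_join(clicks, doc_types):
    """Full outer join of click records and document types on doc_id.

    Hypothetical helper: rows with no match on one side get None
    for the fields that side would have supplied.
    """
    clicks_by_doc = defaultdict(list)
    for rec in clicks:
        clicks_by_doc[rec["doc_id"]].append(rec)

    rows = []
    for doc_id in sorted(set(clicks_by_doc) | set(doc_types)):
        doc_type = doc_types.get(doc_id)            # None: no type label known
        recs = clicks_by_doc.get(doc_id) or [None]  # [None]: never shown in the log
        for rec in recs:
            rows.append({
                "doc_id": doc_id,
                "query": rec["query"] if rec else None,
                "session": rec["session"] if rec else None,
                "clicked": rec["clicked"] if rec else None,
                "type": doc_type,
            })
    return rows

# Made-up sample data standing in for the search log and the labeled documents.
clicks = [
    {"query": "q1", "session": "s1", "doc_id": "d1", "clicked": True},
    {"query": "q1", "session": "s1", "doc_id": "d2", "clicked": False},
    {"query": "q2", "session": "s2", "doc_id": "d4", "clicked": True},
]
doc_types = {"d1": "ham", "d2": "spam", "d3": "op_del"}

rows = full_outer_join(clicks, doc_types)
for row in rows:
    print(row)
```

Keeping unmatched rows from both sides is the point of the outer join: a labeled document that never appears in the log (`d3`) and a logged document with no label (`d4`) both remain visible in the result.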
18. BLOG CLASSIFICATION
Mission
Cluster bad blogs through unsupervised learning
Data
30 Days Blog Documents
Task
Blog - Document’s Feature Analysis with Fixed Interval
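
One possible reading of feature analysis with a fixed interval is to bucket each blog's documents into equal-width time windows and compute features per window. The sketch below assumes that reading. The features (post count and average length), the 7-day interval, the sample posts, and the helper `interval_features` are hypothetical and are not taken from the slides.

```python
from collections import defaultdict
from datetime import date

def interval_features(posts, start, interval_days):
    """Aggregate per-blog features over fixed-width day intervals.

    posts: iterable of (blog_id, posted_on, text) tuples.
    Returns {(blog_id, interval_index): {"posts": n, "avg_chars": mean length}}.
    """
    buckets = defaultdict(lambda: {"posts": 0, "chars": 0})
    for blog_id, posted_on, text in posts:
        idx = (posted_on - start).days // interval_days  # which window the post falls in
        bucket = buckets[(blog_id, idx)]
        bucket["posts"] += 1
        bucket["chars"] += len(text)
    return {
        key: {"posts": b["posts"], "avg_chars": b["chars"] / b["posts"]}
        for key, b in buckets.items()
    }

# Made-up sample posts; dates and text are placeholders.
posts = [
    ("blogA", date(2011, 1, 1), "hello"),
    ("blogA", date(2011, 1, 3), "spam spam"),
    ("blogA", date(2011, 1, 9), "x"),
    ("blogB", date(2011, 1, 2), "abcd"),
]
features = interval_features(posts, start=date(2011, 1, 1), interval_days=7)
for key in sorted(features):
    print(key, features[key])
```

Per-interval feature vectors like these could then be handed to a clustering tool such as WEKA or R, which matches the Hadoop-then-tools split described earlier in the deck.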
22. ADVANTAGE OF HADOOP
ADVANTAGE
Low analysis cost!
No more sampling!
Low operation cost!
Programming Language Independent
Various support tools
DISADVANTAGE
Conceptual Change is Needed.
Project under active development.
Version upgrade is not supported.