Big Java, Big Data

蘇國鈞 / Monster Su
monster.kcsu@gmail.com
http://www.facebook.com/monster.kcsu

July 20, 2012

. Profile

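Graduate of the Graduate Institute of Electrical Engineering, National Taiwan University
Currently head of the teaching section at the Information Technology Training Center, Digital Education Institute, Institute for Information Industry (III)
More than ten years of experience teaching Java as an instructor

Familiar with XML/Web Services, Design Patterns, and Java EE specifications such as EJB/JPA; open source frameworks such as Struts/Spring/Hibernate; and application servers such as JBoss AS and GlassFish

Currently responsible for promoting cloud computing technologies, mainly cloud platforms such as Apache Hadoop, Google App Engine, and Microsoft Azure, and their integration with smart handheld devices running iOS, Android, and Windows Phone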

. Outline
1 Introduction
2 Big Java
    Virtual Machine
    Multi-Core/Multi-Thread Support
    Language Extension
    Hadoop Platform
3 Big Data
    RDBMS Data
    NoSQL Data
    Social Network Data
4 Tool
5 Summary

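Big Data is a hot trend
. http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/

Mentions of the term grew 12-fold over the last 12 months!

The impact of Big Data
. The marketing case of shelving diapers next to beer - analysis after the fact

The scary side of Big Data
. A store knew a daughter was pregnant before her father did - prediction before the fact

. Sources of Big Data

Besides structured data from traditional RDBMSs, there is also unstructured data from logs, audio, video, and images, plus content created and stored by social network users.

Characteristics of Big Data: the three Vs (Volume, Velocity, Variety)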
. http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/

A Very Short History of Big Data
. http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/

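1941 The term "Information Explosion" first appears
1944 American university libraries double their holdings every 16 years. If that held, by 2040 the Yale library would have 2 × 10^8 volumes, 6,000 miles of shelves, and need 6,000 cataloguers
2005 "Data is the next Intel Inside. SQL is the new HTML." (Tim O'Reilly)
2007 IDC estimated 161 EB of digital data for 2006, growing to 988 EB in 2010. According to the 2010 and 2011 survey reports, the actual figures were 1,200 EB in 2010 and 1,800 EB in 2011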

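. Big Data applications
Monitor heart rate, blood pressure, respiration, and other data in a premature infant ward so that treatment can start immediately
Analyze traffic so that road users can choose efficient routes based on the latest conditions collected by sensors
Combine crime data, counter-terrorism surveillance, and traffic control into a crime-fighting system that analyzes crime patterns in real time to plan the best police dispatch
Analyze every credit card transaction to find suspicious ones and reduce losses from fraud and theft
…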

Big Data success stories - IBM
. http://www-01.ibm.com/software/tw/data/bigdata/cases.html

Big Data success stories - Microsoft
. http://www.windowsazure.com/en-us/home/case-studies/

. Ideas for tackling Big Data problems
Big Java - make full use of what the hardware and software offer:
    Virtual Machine
    Multi-Core/Multi-Thread Support
    Language Extension
    Hadoop Platform
Big Data - make it easier to handle more diverse data:
    RDBMS Multi-Tenancy Support
    NoSQL Database Support
    Framework for RDBMS/NoSQL/Social

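. 32-Bit vs. 64-Bit CPU
A 32-bit CPU has a 4 GB address space
On Windows, for example, kernel mode takes 2 GB, leaving only 2 GB for user mode. The JVM itself also needs part of that, so the address space actually usable for the heap is only about 1.2-1.8 GB
A 64-bit CPU offers more registers, a larger address space, a larger heap, and more threads
The drawbacks are slightly lower performance and slightly higher memory use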

Hotspot VM FAQ
. http://www.oracle.com/technetwork/java/hotspotfaq-138619.html

A 64-bit implementation means that many
of the built-in Java types are doubled in
size from 32 to 64. This is not true.
All existing 100% pure Java programs
would continue running just as they do
under a 32-bit VM.
There's no public API that allows you to
distinguish between 32 and 64-bit
operation.
However, if you'd like to write code which is
platform specific (shame on you), …
Related session: Sunny Chan - Five new features you need to know about the Java 7 Hotspot VM

64-Bit JVM Optimization
. 32-Bit Speed + 64-Bit Space

Idea:
    64-bit addresses must be 8-byte aligned for good performance
    Because every address is divisible by 8, its lowest 3 bits are always 0
    Exploit the unused higher-order bits and the lower-order zero bits
Benefits:
    64-bit execution
    32-bit pointer length (address >> 3)
    32 GB heap size (4 GB × 2^3)
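
The arithmetic behind this idea can be sketched in a few lines of Java. This only illustrates the address >> 3 encoding and the resulting 32 GB limit; it is not HotSpot's actual implementation, and the class and method names are made up for the example.

```java
// Illustration only: packs an 8-byte-aligned 64-bit address into 32 bits.
class CompressedPointerSketch
{
    // 8-byte alignment means the lowest 3 bits of every address are zero.
    static final int ALIGNMENT_SHIFT = 3;

    // Largest address range reachable with a 32-bit compressed pointer.
    static long maxAddressableBytes()
    {
        return (1L << 32) << ALIGNMENT_SHIFT; // 2^35 bytes = 32 GB
    }

    // Drops the three zero bits so the address fits in 32 bits.
    static int compress(long address)
    {
        if ((address & 7L) != 0)
            throw new IllegalArgumentException("address is not 8-byte aligned");
        if (address < 0 || address >= maxAddressableBytes())
            throw new IllegalArgumentException("address is outside the 32 GB range");
        return (int) (address >>> ALIGNMENT_SHIFT);
    }

    // Restores the original address; the mask treats the int as unsigned.
    static long decompress(int compressed)
    {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args)
    {
        long address = 20L * 1024 * 1024 * 1024 + 64; // 20 GB + 64 bytes
        int compressed = compress(address);
        System.out.println(decompress(compressed) == address); // prints true
        System.out.println(maxAddressableBytes() / (1024L * 1024 * 1024) + " GB"); // prints 32 GB
    }
}
```

The 20 GB address in main does not fit in a plain 32-bit pointer, yet it round-trips through the 32-bit compressed form because the three alignment bits carry no information.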

64-Bit JVM Optimization
. 32-Bit Speed + 64-Bit Space

Similar techniques under different names:
Sun
    Compressed Oops
    (Oops = Ordinary Object Pointers)
    (on by default since JDK6u23)
IBM
    Pointer Compression/Compressed Refs
Oracle/BEA
    JRockit -XX Command-line Options

. Is this a 32-bit or a 64-bit environment?
Check through system properties:
    os.arch
    sun.arch.data.model

Ask the Java interpreter (the java command):
    Request a heap of 4 GB or more (java -Xms4g -Xmx4g)
    Run java -d32 -version and java -d64 -version
    Run java -version (JDK 1.6 and later)
    Run file /usr/bin/java (Unix environments)

Is this a 32-bit or a 64-bit environment?
. System Property

    public class OSArchitecture
    {
        public static void main(String[] args)
        {
            // Operating system architecture, e.g. "x86" or "amd64"
            System.out.print("os.arch=");
            System.out.println(System.getProperty("os.arch"));
            // "32" or "64" on Sun/Oracle JVMs; the property is
            // vendor-specific and may be null on other JVMs
            System.out.print("sun.arch.data.model=");
            System.out.println(System.getProperty("sun.arch.data.model"));
        }
    }

Is this a 32-bit or a 64-bit environment?
. Java Interpreter

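Built-in multi-core/thread support
. JDK 1.0 - J2SE 1.4

Provides Thread and Runnable, and supports synchronized, wait, and notify
In most JVMs a Java thread is an OS thread
When work goes through the Thread class, the OS can spread the workload across the cores
However, the larger the JVM heap, the fewer threads can exist at the same time
References to immutable objects should be declared final
References to shared mutable state should be declared volatile and protected through other concurrency mechanisms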
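
A minimal sketch of these building blocks, using only Thread, Runnable, synchronized, and final. The class name, thread count, and iteration count are arbitrary choices for the example and do not come from the slides.

```java
// Several threads increment one shared counter; synchronized keeps it correct.
class SynchronizedCounterSketch
{
    // The lock reference never changes, so it is declared final.
    private final Object lock = new Object();
    // Mutable shared state: every access goes through synchronized (lock).
    private int count = 0;

    void increment()
    {
        synchronized (lock) { count++; }
    }

    int get()
    {
        synchronized (lock) { return count; }
    }

    // Starts the given number of threads, each incrementing the shared counter.
    static int runDemo(int threads, final int incrementsPerThread)
    {
        final SynchronizedCounterSketch counter = new SynchronizedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(new Runnable()
            {
                public void run()
                {
                    for (int j = 0; j < incrementsPerThread; j++)
                        counter.increment();
                }
            });
            workers[i].start();
        }
        try
        {
            for (Thread worker : workers)
                worker.join(); // wait for every worker to finish
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return counter.get();
    }

    public static void main(String[] args)
    {
        System.out.println(runDemo(4, 100000)); // prints 400000
    }
}
```

Without the synchronized blocks the increments from different threads could interleave and some updates would be lost, so the final count could come out lower than expected.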

JSR 166: Concurrency Utilities
. Java SE 5 - Java SE 7

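Doug Lea is the spec lead and also the author of Concurrent Programming in Java
Java SE 5 added the java.util.concurrent package, the start of explicit multi-core support
It introduces parallelism through APIs at the threading/hardware level
Java SE 7 added the Fork/Join Framework
The G1 garbage collector in JDK7u4 and later also takes full advantage of multi-core architectures
Related session: 何永琳 - Multi-core software development: introduction to and implementation of the Actor Model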
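
As a rough illustration of the Fork/Join Framework added in Java SE 7, the sketch below sums an array by splitting it recursively. The class name and the threshold value are assumptions made for the example; a suitable threshold depends on the workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a slice of an array; large slices are split into two subtasks.
class ForkJoinSumSketch extends RecursiveTask<Long>
{
    // Slices at or below this size are summed sequentially (arbitrary value).
    private static final int THRESHOLD = 10000;

    private final long[] data;
    private final int from;
    private final int to; // exclusive

    ForkJoinSumSketch(long[] data, int from, int to)
    {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute()
    {
        if (to - from <= THRESHOLD)
        {
            long sum = 0;
            for (int i = from; i < to; i++)
                sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSumSketch left = new ForkJoinSumSketch(data, from, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(data, mid, to);
        left.fork();                      // run the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Runs the task on a pool sized to the available processors.
    static long parallelSum(long[] data)
    {
        ForkJoinPool pool = new ForkJoinPool();
        try
        {
            return pool.invoke(new ForkJoinSumSketch(data, 0, data.length));
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args)
    {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;
        System.out.println(parallelSum(data)); // prints 500000500000
    }
}
```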

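. Number of processors currently available
Runtime.getRuntime().availableProcessors()

The number of CPUs the Java application can use
If the CPU supports Hyper-Threading, the return value is twice the number of physical cores

    public class AvailableProcessors
    {
        public static void main(String[] args)
        {
            // Number of logical processors available to this JVM
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }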

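. Recommended thread pool size
Thread pool size:
    Tune it according to Runtime.getRuntime().availableProcessors()
    For a CPU-bound application, a thread pool as large as the number of CPUs usually performs better

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    ExecutorService e =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    e.execute(new Runnable() {
        public void run() {
            // do one task
        }
    });
    // Let submitted tasks finish, then release the pool's threads
    e.shutdown();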

Ateji PX
. Java Parallel Programming Made Simple - http://www.ateji.com/px/

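Designed by Patrick Viry and released in July 2010
A thread is essentially a hardware concept, not a natural construct, so it is not intuitive
Introduces parallelism at the language level
Supports both shared-memory and distributed-memory architectures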

JOMP
. http://www2.epcc.ed.ac.uk/computing/research_activities/jomp/

OpenMP-Like Shared-Memory Parallel
Programming in Java
Provides a set of compiler directives, library routines, and system properties
The JOMP preprocessor turns the annotated source code into ordinary Java
Performance is about the same as a hand-coded multi-threaded version

    //omp parallel shared(a,b,n)
    {
        //omp for
        for (i = 1 ; i < n ; i++) {
            b[i] = (a[i] + a[i-1]) * 0.5;
        }
    }
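
For comparison, here is roughly what a hand-coded multi-threaded version of the same loop could look like in plain Java. This is a sketch written for this comparison, not the code the JOMP preprocessor generates; the class name, method name, and chunking scheme are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hand-coded equivalent of the directive-based loop: b[i] = (a[i] + a[i-1]) * 0.5
class HandCodedParallelLoop
{
    static double[] smooth(final double[] a, int threads)
    {
        final int n = a.length;
        final double[] b = new double[n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try
        {
            List<Future<?>> pending = new ArrayList<Future<?>>();
            // Split the index range 1..n-1 into one contiguous chunk per thread.
            int chunk = Math.max(1, (n - 1 + threads - 1) / threads);
            for (int start = 1; start < n; start += chunk)
            {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                pending.add(pool.submit(new Runnable()
                {
                    public void run()
                    {
                        // Each task writes a disjoint slice of b, so no locking is needed.
                        for (int i = from; i < to; i++)
                            b[i] = (a[i] + a[i - 1]) * 0.5;
                    }
                }));
            }
            for (Future<?> task : pending)
                task.get(); // wait for the chunk and surface any failure
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        catch (ExecutionException ex)
        {
            throw new IllegalStateException(ex.getCause());
        }
        finally
        {
            pool.shutdown();
        }
        return b;
    }

    public static void main(String[] args)
    {
        double[] b = smooth(new double[] {1, 3, 5, 7}, 2);
        System.out.println(java.util.Arrays.toString(b)); // prints [0.0, 2.0, 4.0, 6.0]
    }
}
```

The two directive comments in the JOMP version replace all of the chunking, submission, and waiting code shown here.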

. Apache Hadoop

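Scenarios that suit Hadoop:
    Scalability to several thousand nodes
    Hardware and software failures are treated as routine
    Few files to process, but each one is very large
    Data may be written only once but is read many times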

. Theoretical foundations of Hadoop
Hadoop largely follows three papers published by Google:
SOSP 2003
    The Google File System
OSDI 2004
    MapReduce: Simplified Data Processing on Large Clusters
OSDI 2006
    Bigtable: A Distributed Storage System for Structured Data

Hadoop versions
. http://hadoop.apache.org/common/releases.html

Before 0.20: very old, no longer recommended
0.20.X: incompatible with versions before 0.20
0.20.20X.Y: legacy stable; the latest is the security branch release 0.20.205.0
1.0.X: current stable
0.23.X: current alpha, includes MapReduce 2
1.1.X: current beta
0.22.X: has new features but no security support

. The Hadoop 1.0.X series
Based on 0.20.205.0, the security branch of the 0.20 series, with performance enhancements on top of bug fixes
Billed as the enterprise-ready release
Provides the core Hadoop features MapReduce and HDFS, integrates HBase, and adds Kerberos authentication
Provides a RESTful API for HDFS (WebHDFS)

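. MapReduce data flow (counting votes in an election)
Input Splitting: Central Election Commission -> polling stations -> ballot boxes -> ballots
    Find a way to turn the input data into a set of (k1, v1) pairs
Mapper: (ballot n, vote for XXX) -> (XXX, 1)
    Each (k1, v1) is handed to a Mapper, which outputs (k2, v2)
Shuffling: n × (XXX, 1) -> (XXX, n)
    Pairs with the same k2 can first be collected into List(k2, v2), or even combined into (k2, sum(v2))
Reducer: a set of (XXX, n) -> (XXX, m)
    Gathering the output of every Reducer gives the final result, List(k3, v3)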
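
The vote-counting flow can be mimicked in plain Java, without Hadoop, to make the three stages concrete. This is a single-process sketch for illustration only; the class and method names are made up and nothing here uses the Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process imitation of map -> shuffle -> reduce for counting votes.
class VoteCountSketch
{
    // Map: (k1 = ballot number, v1 = candidate) -> (k2 = candidate, v2 = 1)
    static Map.Entry<String, Integer> map(int ballotNo, String candidate)
    {
        return new AbstractMap.SimpleEntry<String, Integer>(candidate, 1);
    }

    // Shuffle: collect every v2 that shares the same k2.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped)
    {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : mapped)
        {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null)
            {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: (k2 = candidate, list of v2) -> v3 = total votes for that candidate
    static int reduce(String candidate, List<Integer> values)
    {
        int sum = 0;
        for (int value : values)
            sum += value;
        return sum;
    }

    // Runs the three stages in order over a list of ballots.
    static Map<String, Integer> count(List<String> ballots)
    {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<Map.Entry<String, Integer>>();
        for (int i = 0; i < ballots.size(); i++)
            mapped.add(map(i, ballots.get(i)));

        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet())
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        return result;
    }

    public static void main(String[] args)
    {
        List<String> ballots = Arrays.asList("A", "B", "A", "C", "A", "B");
        System.out.println(count(ballots)); // prints {A=3, B=2, C=1}
    }
}
```

In a real cluster the map calls run on many nodes at once and the shuffle moves data across the network; the data shapes at each stage are the same as in this sketch.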

Hello MapReduce - Word Count
. Image source: http://www.rabidgremlin.com/data20/

Mapper stage
. One (k1, v1) at a time -> batches of (k2, v2)

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // (k1, v1) -> (k2, v2)
    public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // k1=key (position in the file, unused), v1=value (content of one line)
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException
        {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens())
            {
                word.set(itr.nextToken());
                // k2=word (the word), v2=1 (seen once)
                context.write(word, one);
            }
        }
    }

Reducer stage
. A batch of (k2, v2) with the same k2 -> one (k3, v3)

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // (k2, v2) -> (k3, v3)
    public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private IntWritable result = new IntWritable();

        // k2=key (the word), v2=values (one entry per occurrence of the word)
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values)
                sum += val.get();
            result.set(sum);
            // k3=key (the word), v3=result (total number of occurrences)
            context.write(key, result);
        }
    }

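. The MapReduce Job/Driver
A Job is a unit of work on the client side
It has three main parts: the input data, the MapReduce program, and the configuration
Hadoop splits a Job into Tasks, namely Map Tasks and Reduce Tasks
Job execution is controlled by two kinds of node, the JobTracker and the TaskTrackers
The JobTracker schedules Tasks onto multiple TaskTrackers and coordinates them
Each TaskTracker runs the Tasks assigned to it and reports progress back to the JobTracker

How MapReduce works
. Image source: Hadoop: The Definitive Guide 3rd Ed.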

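. Tying everything together with a Job

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountJob
    {
        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count");

            job.setJarByClass(com.wordcount.WordCountJob.class);
            job.setMapperClass(com.wordcount.WordCountMapper.class);
            job.setReducerClass(com.wordcount.WordCountReducer.class);

            // Relative paths are resolved against the working directory of
            // the default file system. The three-argument Path constructor
            // rejects a relative path combined with a scheme.
            FileInputFormat.addInputPath(job, new Path("input"));
            FileOutputFormat.setOutputPath(job, new Path("output"));

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Block until the job finishes; exit code 0 means success
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }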

Spring Data
. http://www.springsource.org/spring-data

Supports JPA, MongoDB, Hadoop, …

. Integrating Hadoop with Spring Data

    <?xml version="1.0" encoding="UTF-8"?>
    <beans
        xmlns="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:context="http://www.springframework.org/schema/context"
        xmlns:hdp="http://www.springframework.org/schema/hadoop"
        xmlns:p="http://www.springframework.org/schema/p"
        xsi:schemaLocation="...">

        <hdp:configuration />

        <hdp:job id="wordcountJob"
            validate-paths="false"
            input-path="/home/user/hadoop/wordcount/input"
            output-path="/home/user/hadoop/wordcount/sdoutput"
            mapper="com.javatwo.hadoop.WordCountMapper"
            reducer="com.javatwo.hadoop.WordCountReducer" />

        <bean id="jobRunner"
            class="org.springframework.data.hadoop.mapreduce.JobRunner"
            p:jobs-ref="wordcountJob" />

    </beans>

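. Integrating Hadoop with Spring Data

    import org.springframework.context.support.AbstractApplicationContext;
    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class WordCountJobSpringData
    {
        public static void main(String[] args)
        {
            // Loading the context instantiates the beans declared in
            // springdata.xml, including jobRunner
            AbstractApplicationContext ctx =
                new ClassPathXmlApplicationContext("springdata.xml");
            // Close the context cleanly when the JVM exits
            ctx.registerShutdownHook();
        }
    }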

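. Multitenancy
"Tenant" originally means a tenant farmer who rents someone else's land to farm
In cloud computing it refers to each of the different clients using a given service/application
Multitenancy is about whether one application can serve multiple tenants at the same time
How to isolate and protect each tenant's data is the biggest problem multitenancy has to solve
The Application Service Provider is back from the dead!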

Hibernate ORM
. Hibernate->Hibernate Core->Hibernate Distribution->Hibernate ORM

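Version 4.0 Final was released in December 2011
Baseline requirements are JDK 1.6 and JDBC 4.0
It mainly changes some of the current practices in the Hibernate core:
    Support for multitenant databases
    Switch to JBoss Logging
    A different cache mechanism
    Groundwork for OSGi support
The latest release is 4.1.5.SP1, published in July 2012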

Hibernate ORM - Multitenancy
. DATABASE - Separate Database

Image source: http://msdn.microsoft.com/en-us/library/aa479086.aspx

. SCHEMA - Shared Database, Separate Schema


. DISCRIMINATOR - Shared Database, Shared Schema


. http://docs.jboss.org/hibernate/orm/4.1/devguide/en-US/html/ch16.html

The hibernate.multiTenancy setting selects one of three approaches, or none:
NONE
    Multitenancy is not used
DATABASE
    Separate database
SCHEMA
    Shared database, separate schema
DISCRIMINATOR
    Shared database, shared schema, with discriminator data (support planned for 5.0)
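
To make the difference between the three approaches concrete, the sketch below shows where a lookup for one tenant would end up under each of them. It does not use the Hibernate API; the JDBC URLs, the reader table, and the tenant_id column are hypothetical names chosen for illustration.

```java
// Shows how the tenant identifier is applied under each isolation approach.
class TenantIsolationSketch
{
    enum Strategy { DATABASE, SCHEMA, DISCRIMINATOR }

    // Describes the connection and query used for one tenant.
    // The URLs, table name, and column name are hypothetical.
    static String route(Strategy strategy, String tenantId)
    {
        switch (strategy)
        {
            case DATABASE:
                // One database per tenant: the tenant selects the connection.
                return "jdbc:example://host/" + tenantId
                    + " -> SELECT * FROM reader";
            case SCHEMA:
                // One shared database: the tenant selects the schema.
                return "jdbc:example://host/shared"
                    + " -> SELECT * FROM " + tenantId + ".reader";
            case DISCRIMINATOR:
                // One shared table: the tenant is a column value bound as a parameter.
                return "jdbc:example://host/shared"
                    + " -> SELECT * FROM reader WHERE tenant_id = ? [" + tenantId + "]";
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args)
    {
        for (Strategy strategy : Strategy.values())
            System.out.println(strategy + ": " + route(strategy, "monster"));
    }
}
```

Isolation is strongest with separate databases and weakest with a shared table, where every query has to carry the tenant filter.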

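. The MultiTenantConnectionProvider class

    // Package names as of Hibernate 4.1
    import org.hibernate.HibernateException;
    import org.hibernate.service.jdbc.connections.spi.AbstractMultiTenantConnectionProvider;
    import org.hibernate.service.jdbc.connections.spi.ConnectionProvider;

    public class MySQLMultiTenantConnectionProvider
        extends AbstractMultiTenantConnectionProvider
    {
        // Used when Hibernate needs a connection that is not tied to a tenant
        @Override
        protected ConnectionProvider getAnyConnectionProvider()
        {
            return MySQLConnectionProviderBuilder.MONSTER_CONNECTION_PROVIDER;
        }

        // Picks the connection provider (and so the database) for one tenant
        @Override
        protected ConnectionProvider selectConnectionProvider(String tenantId)
        {
            if ("monster".equals(tenantId))
                return MySQLConnectionProviderBuilder.MONSTER_CONNECTION_PROVIDER;
            else if ("supreme".equals(tenantId))
                return MySQLConnectionProviderBuilder.SUPREME_CONNECTION_PROVIDER;
            throw new HibernateException("Unknown tenantId: " + tenantId);
        }
    }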

. Adjusting the configuration file

    <hibernate-configuration>
        <session-factory>
            <property name="hibernate.connection.driver_class">
                com.mysql.jdbc.Driver
            </property>
            <property name="hibernate.connection.username">root</property>
            <property name="hibernate.connection.password">password</property>
            <property name="hibernate.dialect">
                org.hibernate.dialect.MySQL5InnoDBDialect
            </property>

            <property name="hibernate.multiTenancy">DATABASE</property>

            <property name="hibernate.multi_tenant_connection_provider">
                com.javatwo.helper.MySQLMultiTenantConnectionProvider
            </property>

            <mapping resource="com/javatwo/model/Reader.hbm.xml" />
        </session-factory>
    </hibernate-configuration>

. Changing how the Session is obtained

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;

    public class MultiTenancy
    {
        public static void insertReader(String tenantId)
        {
            Reader reader = new Reader(tenantId, tenantId, 1, tenantId+"@iii");
            SessionFactory factory = HibernateUtil.getSessionFactory();
            // Open a session bound to this tenant instead of calling openSession() directly
            Session session =
                factory.withOptions().tenantIdentifier(tenantId).openSession();
            try
            {
                session.beginTransaction();
                session.save(reader);
                session.getTransaction().commit();
            }
            catch (Exception ex) { session.getTransaction().rollback(); }
            finally { session.close(); }
        }

        public static void main(String[] args)
        {
            insertReader("monster"); insertReader("supreme");
        }
    }

. Multiple connection pools are created at deployment

. At run time, data is written to separate databases

. Java Persistence API
JSR 317 - Java Persistence 2.0:
    RI: EclipseLink 2.3, bundled with GlassFish 3.1.1
    Supports shared multitenant tables through a tenant discriminator column
JSR 338 - Java Persistence 2.1:
    The specification is not final yet; it is currently in Early Draft Review
    RI: EclipseLink 2.4, released in June 2012 together with Eclipse Juno (4.2)
    Expected to support tenant isolation and NoSQL

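. Problems with RDBMSs
Schema changes:
    To change its data columns, Twitter spent a week just running ALTER TABLE to modify the schema
Join performance:
    Storage used to be expensive, so data was normalized and then stitched back together with joins. But joins are slow, even more so when the data is spread over several machines
Consistency:
    Online banking requires guaranteed consistency, even immediate consistency
    A social networking site only needs eventual consistency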

. NoSQL arrives
Wikipedia：
It does not use SQL as its query language.
It may not give full ACID guarantees.
It has a distributed, fault-tolerant
architecture.
BigData Diary：
NoSQL is a movement promoting a loosely
defined class of non-relational data stores.
These data stores may not require fixed
table schemas, usually avoid join
operations and typically scale horizontally.

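. Why people use NoSQL
Advantages of NoSQL:
    Raw Performance
    Transparent Scalability
People move away from RDBMSs not because they cannot be made fast, but because it is hard to set one up as a cluster that also does sharding.
SQL will not disappear, though. It will coexist with NoSQL, because each has its own uses.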
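
Sharding itself is a simple idea: each key is assigned to one of several nodes. The sketch below uses plain hash-modulo placement, which is an assumption made for illustration and is not taken from any particular NoSQL product. With this scheme, changing the number of shards moves most keys, which is why stores such as Dynamo and Cassandra use consistent hashing instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hash-modulo sharding: every key maps deterministically to one shard.
class HashShardingSketch
{
    static int shardFor(String key, int shardCount)
    {
        if (shardCount <= 0)
            throw new IllegalArgumentException("shardCount must be positive");
        // Mask the sign bit so the remainder is never negative.
        return (key.hashCode() & 0x7fffffff) % shardCount;
    }

    // Groups the keys by the shard they are assigned to.
    static Map<Integer, List<String>> distribute(List<String> keys, int shardCount)
    {
        Map<Integer, List<String>> shards = new TreeMap<Integer, List<String>>();
        for (String key : keys)
        {
            int shard = shardFor(key, shardCount);
            List<String> bucket = shards.get(shard);
            if (bucket == null)
            {
                bucket = new ArrayList<String>();
                shards.put(shard, bucket);
            }
            bucket.add(key);
        }
        return shards;
    }

    public static void main(String[] args)
    {
        List<String> zips = Arrays.asList("90210", "10001", "60601", "94105");
        System.out.println(distribute(zips, 3));
    }
}
```

Because a lookup only needs the key to find the right node, adding nodes spreads both data and load, which is the transparent scalability the list above refers to.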

. Common NoSQL databases
Common NoSQL categories and products:
    Key/Value: HBase, Dynamo, Cassandra
    In-Memory: memcached
    Document: CouchDB, MongoDB
    Graph: Neo4j

Related session: 顏佑霖 - An introduction to Apache Cassandra

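. Spring Data
Aims to integrate the important data access technologies through Spring:

Access RDBMSs, NoSQL stores, and MapReduce frameworks through Spring
It is essentially an umbrella name for a set of technologies; the way each RDBMS/NoSQL store is accessed still differs

Related session: 葉政達 - Integrating Spring MVC with RequireJS, Backbone.js, and Spring Data JPA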

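. Data access with the MongoDB Java Driver

    import com.google.gson.Gson;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;

    public class MongoDB
    {
        public static void main(String[] args) throws Exception
        {
            Mongo mongo = new Mongo("localhost", 27017);
            DB db = mongo.getDB("JavaTwo");
            DBCollection collection = db.getCollection("zips");

            // Query by example: find the first document whose zip is 90210
            BasicDBObject doc = new BasicDBObject();
            doc.put("zip", "90210");
            doc = (BasicDBObject) collection.findOne(doc);

            // Convert the document's JSON form into the City class by hand
            Gson gson = new Gson();
            City city = gson.fromJson(doc.toString(), City.class);
            Location loc = city.getLoc();

            System.out.println("City = " + city.getCity());
            System.out.println("Location = " + loc.getY() + ", " + loc.getX());
        }
    }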

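. Data access with Spring Data MongoDB

    import com.mongodb.Mongo;
    import org.springframework.data.mongodb.core.MongoOperations;
    import org.springframework.data.mongodb.core.MongoTemplate;
    import org.springframework.data.mongodb.core.query.Criteria;
    import org.springframework.data.mongodb.core.query.Query;

    public class SpringData
    {
        public static void main(String[] args) throws Exception
        {
            Mongo mongo = new Mongo("localhost", 27017);
            MongoOperations mongoOps = new MongoTemplate(mongo, "JavaTwo");

            Query query = new Query(Criteria.where("zip").is("90210"));
            System.out.println("Found = " + mongoOps.count(query, "zips"));

            // The template maps the document to the City class directly
            City city = mongoOps.findOne(query, City.class, "zips");
            Location loc = city.getLoc();

            System.out.println("City = " + city.getCity());
            System.out.println("Location = " + loc.getY() + ", " + loc.getX());
        }
    }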

Spring Social
. http://www.springsource.org/spring-social

Currently supports Facebook, Twitter, and LinkedIn

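. Spring Social
Makes it easy to integrate with various SaaS offerings
(providers not supported yet are contributed by the community)
The genuinely complicated part is the variety of authentication and authorization schemes
Each SaaS provider implements OAuth 1.0, 1.0a, or 2.0
When learning, start with a service such as Twitter that offers public data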

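. Data access with Spring Social Twitter

    import java.util.List;

    import org.springframework.social.twitter.api.TimelineOperations;
    import org.springframework.social.twitter.api.Tweet;
    import org.springframework.social.twitter.api.impl.TwitterTemplate;

    public class Timeline
    {
        public static void main(String[] args)
        {
            // No credentials: only public data is accessible
            TwitterTemplate twitterTemplate = new TwitterTemplate();
            TimelineOperations timelineOps = twitterTemplate.timelineOperations();
            List<Tweet> tweetList = timelineOps.getUserTimeline("JavaTWO2011");

            for (Tweet tweet: tweetList)
            {
                System.out.println("Time: " + tweet.getCreatedAt());
                System.out.println("From: " + tweet.getFromUser());
                System.out.println("Text: " + tweet.getText());
                System.out.println();
            }
        }
    }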


. Don't reinvent the wheel
Existing tools:
    Hive: Data Warehouse
    Pig: Data Analysis
    R: Statistics and Graphics
    Nutch/Lucene/Solr: Search Engine
    Mahout: Machine Learning
    …

Related session: 王建興 - "Panning for gold in the sand": open source recommender systems

Hadoop World 2011
. http://www.theregister.co.uk/2011/11/09/hadoop_kernel_distro/

Background:
    Hosted by Cloudera (this year O'Reilly takes over as host)
    1,400 attendees from 580 companies
    The statistics exclude larger companies such as Facebook, Google, Yahoo!, and eBay

. Hadoop World 2011
Number of Hadoop nodes:
    2011: 120 (66 in 2010)
    40%: 10-100
    52%: 100-1000
Amount of Hadoop data:
    2011: 202 PB in total (3.4 times the 2010 figure)
    76: 100 TB - 1 PB
    74: more than 1 PB
    Largest: 20 PB

InformationWeek 2012/01
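. State of Database Technology report

760 responses:
NoSQL
    60% have not heard of it or are not interested
    36% are investigating
    4% have hands-on experience
Using an off-premises or cloud-hosted service as the primary transactional database
    55% have no plans
    29% are investigating
    12% are using one now
    (5% pilot, 5% self-managed, 2% cloud-managed)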

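. Summary
NoSQL/Big Data is the trend, and the technology changes quickly
The JVM is increasingly able to exploit the underlying hardware and software
Hadoop/MapReduce is a platform that has to be supported
Multitenancy can solve part of the RDBMS problems
There seems to be no consensus yet on a uniform way to work with NoSQL
Spring Data/Social are fairly easy frameworks to learn
Hopefully all of this carries over smoothly to Java EE 7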

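Memory capacity of the human brain - 2.5 PB
. http://www.scientificamerican.com/article.cfm?id=what-is-the-memory-capacity

Johnny Mnemonic (捍衛機密)
. http://www.imdb.com/title/tt0113481/

Highlight clip: http://youtu.be/oVNUwbWDJbg

Information Technology Training Center, Digital Education Institute, Institute for Information Industry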
. http://www.iiiedu.org.tw/taipei
