4. Outline
1 Introduction
2 Big Java
3 Big Data
4 Tool
5 Summary
5. Outline
1 Introduction
2 Big Java
Virtual Machine
Multi-Core/Multi-Thread Support
Language Extension
Hadoop Platform
3 Big Data
RDBMS Data
NoSQL Data
Social Network Data
4 Tool
5 Summary
6. Big Data Is a Hot Trend
http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/
Mentions over the past 12 months have grown twelvefold!
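9. Sources of Big Data
Besides the structured data held in traditional RDBMSs, big data also includes unstructured data from logs, audio, video, and images, and content created and stored by social-network users.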
10. Characteristics of Big Data: The Three Vs (Volume, Velocity, Variety)
http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/
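11. A Very Short History of Big Data
http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/
1941 The term "information explosion" is used for the first time.
1944 American university libraries double their holdings every 16 years. If that continues, by 2040 Yale University will have about 2 × 10^8 (200 million) volumes on 6,000 miles of shelving, and cataloging them will take 6,000 librarians.
2005 Data is the next Intel inside. SQL is the new HTML. (Tim O'Reilly)
2007 IDC estimates that there were 161 EB of digital data in 2006 and that this will grow to 988 EB by 2010. The 2010 and 2011 surveys actually found 1,200 EB in 2010 and 1,800 EB in 2011.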
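13. Big Data Applications
Monitoring heart rate, blood pressure, respiration, and other data in premature-infant wards so that treatment can start immediately
Analyzing traffic so that travelers can choose efficient routes based on the latest sensor data
Combining crime data, counter-terrorism surveillance, and traffic control into one crime-fighting system that analyzes crime patterns in real time to plan the best police deployment
Analyzing every credit card transaction to find suspicious ones and cut losses from fraud and theft
…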
14. Big Data Success Stories - IBM
http://www-01.ibm.com/software/tw/data/bigdata/cases.html
15. Big Data Success Stories - Microsoft
http://www.windowsazure.com/en-us/home/case-studies/
16. Ideas for Solving Big Data Problems
Big Java - make full use of hardware and software capabilities:
Virtual Machine
Multi-Core/Multi-Thread Support
Language Extension
Hadoop Platform
Big Data - make it easier to handle more diverse kinds of data:
RDBMS Multi-Tenancy Support
NoSQL Database Support
Framework for RDBMS/NoSQL/Social
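19. 32-Bit vs. 64-Bit CPU
A 32-bit CPU has a 4 GB address space.
On Windows, for example, kernel mode reserves 2 GB, which leaves only 2 GB for user mode. The JVM itself also uses part of that, so in practice only about 1.2-1.8 GB of address space is left for the heap.
A 64-bit CPU provides more registers, a larger address space, a larger heap, and room for more threads.
The drawbacks are slightly lower performance and slightly higher memory use, mainly because pointers and object references are twice as large.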
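To check how much heap a particular JVM will actually allow, use `Runtime.maxMemory()`. The sketch below is minimal, and the class name `HeapInfo` is made up for illustration. On a 32-bit JVM the value should stay well under the 2 GB user-mode limit described above. A 64-bit JVM can be started with a much larger `-Xmx`.

```java
public class HeapInfo {
    // Maximum heap the JVM will try to use, in megabytes
    // (reflects -Xmx, or the JVM's default when -Xmx is not given).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap = " + maxHeapMegabytes() + " MB");
        System.out.println("os.arch  = " + System.getProperty("os.arch"));
    }
}
```

Run it with different `-Xmx` settings on a 32-bit and a 64-bit JVM to compare them. On 32-bit Windows, a request that is too large typically makes the JVM fail at startup.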
20. HotSpot VM FAQ
http://www.oracle.com/technetwork/java/hotspotfaq-138619.html
Misconception: a 64-bit implementation means that many of the built-in Java types are doubled in size from 32 to 64 bits. This is not true.
All existing 100% pure Java programs would continue running just as they do under a 32-bit VM.
There's no public API that allows you to distinguish between 32 and 64-bit operation.
However, if you'd like to write code which is platform specific (shame on you), …
Related session: Sunny Chan - Five New Features You Need to Know About the Java 7 HotSpot VM
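The FAQ discourages platform-specific checks, but for that case HotSpot sets a non-standard system property, `sun.arch.data.model`, whose value is "32" or "64". The sketch below reads that property. The class name `DataModel` is illustrative. Other JVMs are not required to set the property, so the code falls back to "unknown".

```java
public class DataModel {
    // HotSpot-specific property: "32" or "64"; may be absent on other JVMs.
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        return model != null ? model : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```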
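28. JSR 166: Concurrency Utilities
Java SE 5 - Java SE 7
Doug Lea is the spec lead and the author of Concurrent Programming in Java.
Java SE 5 added the java.util.concurrent package, which began Java's multi-core support.
It introduces parallelism through APIs at the threading/hardware level.
Java SE 7 added the Fork/Join Framework.
The G1 garbage collector (JDK 7u4 and later) also takes full advantage of multi-core architectures.
Related session: 何永琳 - Multi-Core Software Development: An Introduction to and Implementation of the Actor Model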
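The Fork/Join Framework added in Java SE 7 splits a task recursively until the pieces are small enough to compute directly. Its pool lets idle worker threads steal queued subtasks from busy ones. The sketch below applies this pattern to summing an array with `RecursiveTask`. The class name and the `THRESHOLD` of 10,000 elements are illustrative, not tuned values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000; // illustrative cut-off; tune per workload
    private final long[] data;
    private final int from, to; // half-open range [from, to)

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) { // small enough: sum sequentially
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                     // queue the left half for another worker
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to availableProcessors()
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data)); // 500000500000
    }
}
```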
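29. Number of Available Processors
Runtime.getRuntime().availableProcessors()
Returns the number of processors available to the Java application.
It counts logical processors, so on a CPU with Hyper-Threading the value is typically twice the number of physical cores.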
public class AvailableProcessors
{
    public static void main(String[] args)
    {
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
30. Recommended Thread Pool Size
Size the thread pool according to Runtime.getRuntime().availableProcessors().
For a CPU-bound application, a thread pool with as many threads as there are processors usually performs better.
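import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One worker thread per available processor
ExecutorService e =
    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
e.execute(new Runnable() {
    public void run() {
        // do one task
    }
});
e.shutdown(); // accept no new tasks; already-submitted tasks still run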
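Here the slide's snippet is expanded into a complete CPU-bound example. The class name is hypothetical, and so is the prime-counting workload, chosen only because trial division keeps every core busy. The sketch splits a range into one chunk per processor, submits each chunk as a `Callable`, and adds up the `Future` results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundPool {
    // Counts primes in [from, to) by trial division: deliberately CPU-bound work.
    static int countPrimes(int from, int to) {
        int count = 0;
        for (int n = Math.max(from, 2); n < to; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) count++;
        }
        return count;
    }

    // Splits [0, limit) into one chunk per processor and adds up the partial counts.
    static int parallelCountPrimes(int limit) {
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> parts = new ArrayList<Future<Integer>>();
            int chunk = Math.max(1, (limit + nThreads - 1) / nThreads);
            for (int start = 0; start < limit; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, limit);
                parts.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return countPrimes(from, to);
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // blocks until that chunk has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelCountPrimes(1000000)); // 78498 primes below one million
    }
}
```

Submitting `Callable`s rather than `Runnable`s lets each task return a value. `Future.get()` also rethrows any exception thrown inside a task, wrapped in an `ExecutionException`.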
33. JOMP
http://www2.epcc.ed.ac.uk/computing/research_activities/jomp/
OpenMP-like shared-memory parallel programming in Java
Provides a set of compiler directives, library routines, and system properties
The JOMP preprocessor translates the annotated source code into ordinary Java
Performance is close to that of hand-coded multithreaded versions
//omp parallel shared(a,b,n)
{
    //omp for
    for (i = 1 ; i < n ; i++) {
        b[i] = (a[i] + a[i-1]) * 0.5;
    }
}
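For comparison with the JOMP directives, here is roughly what a hand-coded multithreaded version of the same loop looks like in plain Java. The sketch assumes one contiguous block of iterations per processor. Waiting on every `Future` stands in for the barrier at the end of an OpenMP work-sharing loop. The class and method names are hypothetical, and this is not the code the JOMP preprocessor actually generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandCodedLoop {
    // Parallel version of: for (i = 1; i < n; i++) b[i] = (a[i] + a[i-1]) * 0.5;
    // Iterations are independent (a is only read, each b[i] is written once),
    // so contiguous blocks of i can safely run on different threads.
    static void smooth(final double[] a, final double[] b, int n) {
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> parts = new ArrayList<Future<?>>();
            int chunk = Math.max(1, (n - 1 + nThreads - 1) / nThreads); // covers i = 1 .. n-1
            for (int start = 1; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, n);
                parts.add(pool.submit(new Runnable() {
                    public void run() {
                        for (int i = from; i < to; i++) {
                            b[i] = (a[i] + a[i - 1]) * 0.5;
                        }
                    }
                }));
            }
            // Waiting for every block plays the role of the barrier after "omp for".
            for (Future<?> part : parts) {
                part.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] a = {0, 2, 4, 6};
        double[] b = new double[a.length];
        smooth(a, b, a.length);
        System.out.println(Arrays.toString(b)); // [0.0, 1.0, 3.0, 5.0]
    }
}
```

The JOMP version above needs only two directive comments, while the hand-coded version must do all of this chunking and joining explicitly.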
36. The Theoretical Foundations of Hadoop
Hadoop is based mainly on three papers published by Google:
2003, SOSP: The Google File System
2004, OSDI: MapReduce: Simplified Data Processing on Large Clusters
2006, OSDI: Bigtable: A Distributed Storage System for Structured Data
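The MapReduce model from the 2004 paper can be demonstrated without a cluster. The sketch below is a single-process, in-memory word count and does not use the Hadoop API. `map` emits (word, 1) pairs, a shuffle step groups the pairs by word, and `reduce` sums each group. The class and method names are illustrative. In Hadoop the same three steps run as distributed Mapper and Reducer tasks over data stored in HDFS.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key; TreeMap keeps keys sorted,
    // mirroring the sort-by-key step that precedes reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = groups.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                groups.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: combine all values for one key into a single count.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line)); // in Hadoop, each map call could run on a different node
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("the quick brown fox", "the lazy dog")));
        // prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```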
66. The Rise of NoSQL
Wikipedia:
It does not use SQL as its query language.
It may not give full ACID guarantees.
It has a distributed, fault-tolerant
architecture.
BigData Diary:
NoSQL is a movement promoting a loosely
defined class of non-relational data stores.
These data stores may not require fixed
table schemas, usually avoid join
operations and typically scale horizontally.
69. Spring Data
http://www.springsource.org/spring-data
Supports JPA, MongoDB, Hadoop, …
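70. Spring Data
Aims to integrate the major data access technologies through Spring:
Access RDBMSs, NoSQL stores, and MapReduce frameworks through Spring
It is essentially an umbrella name; each RDBMS/NoSQL store is still accessed in its own way
Related session: 葉政達 - Integrating Spring MVC with RequireJS, Backbone.js, and Spring Data JPA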
71. Data Access with the MongoDB Java Driver
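import com.google.gson.Gson;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

// City and Location are the application's own POJOs (not shown on this slide)
public class MongoDB
{
    public static void main(String[] args) throws Exception
    {
        // Connect with the legacy (2.x) MongoDB Java driver
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("JavaTwo");
        DBCollection collection = db.getCollection("zips");

        // Query by example: { "zip" : "90210" }
        BasicDBObject query = new BasicDBObject();
        query.put("zip", "90210");
        BasicDBObject doc = (BasicDBObject) collection.findOne(query);

        // The driver returns a generic document; map its JSON form onto City with Gson
        Gson gson = new Gson();
        City city = gson.fromJson(doc.toString(), City.class);
        Location loc = city.getLoc();

        System.out.println("City = " + city.getCity());
        System.out.println("Location = " + loc.getY() + ", " + loc.getX());

        mongo.close();
    }
}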
72. Data Access with Spring Data MongoDB
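import com.mongodb.Mongo;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// City and Location are the application's own POJOs (not shown on this slide)
public class SpringData
{
    public static void main(String[] args) throws Exception
    {
        Mongo mongo = new Mongo("localhost", 27017);
        // MongoTemplate implements MongoOperations and maps documents to POJOs
        MongoOperations mongoOps = new MongoTemplate(mongo, "JavaTwo");

        // Equivalent to the query { "zip" : "90210" }
        Query query = new Query(Criteria.where("zip").is("90210"));
        System.out.println("Found = " + mongoOps.count(query, "zips"));

        // No manual JSON conversion: the result is mapped directly onto City
        City city = mongoOps.findOne(query, City.class, "zips");
        Location loc = city.getLoc();

        System.out.println("City = " + city.getCity());
        System.out.println("Location = " + loc.getY() + ", " + loc.getX());

        mongo.close();
    }
}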
74. Spring Social
http://www.springsource.org/spring-social
Currently supports Facebook, Twitter, and LinkedIn
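75. Spring Social
Makes it easy to integrate various SaaS providers (providers not yet supported are covered by the community)
The genuinely complicated part is the variety of authentication and authorization schemes
Each SaaS provider requires its own OAuth 1.0, 1.0a, or 2.0 flow
When learning, start with a service such as Twitter that offers public data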
76. Data Access with Spring Social Twitter
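import java.util.List;
import org.springframework.social.twitter.api.TimelineOperations;
import org.springframework.social.twitter.api.Tweet;
import org.springframework.social.twitter.api.impl.TwitterTemplate;

public class Timeline
{
    public static void main(String[] args)
    {
        // No-arg constructor: unauthenticated access to public data only.
        // Twitter later retired unauthenticated API access, so this no longer
        // works against the live service without OAuth credentials.
        TwitterTemplate twitterTemplate = new TwitterTemplate();
        TimelineOperations timelineOps = twitterTemplate.timelineOperations();
        List<Tweet> tweetList = timelineOps.getUserTimeline("JavaTWO2011");

        for (Tweet tweet: tweetList)
        {
            System.out.println("Time: " + tweet.getCreatedAt());
            System.out.println("From: " + tweet.getFromUser());
            System.out.println("Text: " + tweet.getText());
            System.out.println();
        }
    }
}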