Cache Coherence for GPU Architectures

Inderpreet Singh, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, Tor M. Aamodt, "Cache Coherence for GPU Architectures," in Proceedings of the 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA-19)

Agenda
• Challenges with CPU coherence on GPUs
• Temporal Coherence: rethinking coherence for GPUs
• What is the cost of providing coherence?

Why provide coherence?
1. Inter-workgroup communication
2. Atomic operations
   (Source cited on slide: "Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems," ISPASS 2012)
3. Task queues

Cache Coherence
Programmer's view
[Figure: four processors (P) attached to a single shared memory]
Appearance: one global copy of every location

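To make the "one global copy" appearance concrete, here is a minimal sketch of what goes wrong without coherence. The `PrivateCache` class, the write-through policy, and the address `0x10` are illustrative assumptions, not taken from the slides.

```python
class PrivateCache:
    """Hypothetical private cache with no coherence: it never hears about remote writes."""

    def __init__(self, memory):
        self.memory = memory   # shared backing store (a plain dict here)
        self.lines = {}        # locally cached copies

    def load(self, addr):
        # Fill on a miss, then always serve from the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        # Write-through: update the local copy and the shared memory.
        self.lines[addr] = value
        self.memory[addr] = value


memory = {0x10: 1}
cache0 = PrivateCache(memory)
cache1 = PrivateCache(memory)

first_read = cache1.load(0x10)   # cache1 now holds its own copy
cache0.store(0x10, 2)            # cache0 updates memory
stale_read = cache1.load(0x10)   # cache1 still serves its old copy
```

After the store, memory holds the new value but `cache1` keeps returning the old one, which breaks the single-copy appearance that a coherence protocol is meant to preserve.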
Cache Coherence
Multicores and GPUs
[Figure: a multicore and a GPU side by side, each with private L1 caches above an L2 and memory]

Cache Coherence
Heterogeneous Systems
[Figure: processors (P) and GPU cores with private L1 caches, each group with its own L2, sharing one memory]
How to provide coherence?

Challenges

Challenges with coherence
[Figure, animated: two L1 caches above a shared L2, with numbered steps 1 to 3]

Challenge 1: Traffic
[Figure, animated: several L1 caches above a shared L2]
30% more traffic than current GPUs

Challenge 2: Buffer Overhead
[Figure, animated: several L1 caches and a protocol buffer above a shared L2]
Coherence protocol buffers require 28% of total L2

Challenge 3: Complexity
[Figure, animated: two L1 caches above a shared L2, with numbered steps 1 to 3]
Incoherent protocol: 4 states
Coherent protocol: 16 states

Coherence Overhead
Coherence messages
1. Traffic overhead
2. Area overhead
3. Protocol complexity

How to achieve coherence without messages?

TEMPORAL COHERENCE

Temporal Coherence
Time-based approach
- Trigger protocol events on timer alerts
[Figure, animated: two L1 caches above a shared L2]

Temporal Coherence
[Figure, animated: a clock, two L1 caches, and a shared L2; a load (step 1) brings a block into an L1]
- An L1 block carries a local timestamp (LT): valid if TIME < LT
- An L2 block carries a global timestamp (GT): shared if TIME < GT

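The two timestamp checks can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names, and the rule that the L2 keeps the latest expiry it has handed out, are assumptions consistent with the example frames on the following slides.

```python
from dataclasses import dataclass


@dataclass
class L1Block:
    data: int
    lt: int          # local timestamp (LT): time at which this copy expires


@dataclass
class L2Block:
    data: int
    gt: int = 0      # global timestamp (GT): latest expiry handed to any L1


def l1_valid(block, now):
    # An L1 copy may be read only while TIME < LT.
    return now < block.lt


def l2_shared(block, now):
    # The L2 treats the block as possibly cached in some L1 while TIME < GT.
    return now < block.gt


def serve_load(l2_block, now, lifetime):
    # Hypothetical L2 load handler: grant a copy that self-expires at
    # now + lifetime, and remember the latest expiry granted so far.
    expiry = now + lifetime
    l2_block.gt = max(l2_block.gt, expiry)
    return L1Block(data=l2_block.data, lt=expiry)


l2 = L2Block(data=7)
copy_a = serve_load(l2, now=0, lifetime=20)   # LT = 20
copy_b = serve_load(l2, now=5, lifetime=20)   # LT = 25, GT becomes 25
```

Because every copy expires on its own, the L2 never has to send an invalidation: it only has to compare the clock with GT.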
Temporal Coherence: example
[Figure, animated: two L1 caches and the L2 entry for one line, annotated with timestamps]
TIME 0: Load. The requesting L1 receives the line with timestamp 20. Line shared till 20.
TIME 5: Load from the other L1, which receives the line with timestamp 25. Line shared till 25.
TIME 15: Write arrives. Both L1 copies (20 and 25) are still valid, so the write waits.
TIME 20: The copy with timestamp 20 expires. The write is still waiting on the copy with timestamp 25.
TIME 25: The last copy expires. The write proceeds.

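The timeline above can be replayed with a small model. This is a sketch under stated assumptions: `StrongL2Line` and its methods are hypothetical names, both loads use a lifetime of 20 to match the timestamps in the frames, and a write is modeled as completing once every granted copy has expired.

```python
class StrongL2Line:
    """Hypothetical model of one L2 line where writes wait for all copies to expire."""

    def __init__(self, data):
        self.data = data
        self.shared_until = 0   # global timestamp: latest expiry granted to any L1

    def load(self, now, lifetime):
        # Grant a self-expiring copy and extend the sharing window if needed.
        expiry = now + lifetime
        self.shared_until = max(self.shared_until, expiry)
        return self.data, expiry

    def write(self, now, value):
        # The write may only take effect once no L1 can still hold a valid copy.
        # The value is recorded immediately here for brevity; the returned time
        # is when the write is considered complete.
        done_at = max(now, self.shared_until)
        self.data = value
        return done_at


line = StrongL2Line(data=1)
_, lt_a = line.load(now=0, lifetime=20)    # first L1: valid till 20
_, lt_b = line.load(now=5, lifetime=20)    # second L1: valid till 25
done_at = line.write(now=15, value=2)      # must wait for both copies
stall = done_at - 15
```

With these inputs the write issued at time 15 completes at time 25, a stall of 10 time units caused entirely by the longest outstanding lifetime.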
Temporal Coherence
• No coherence messages
• All transactions are 2-hop
• Minimal protocol complexity
• Supports strong and weak memory models
• Enables optimized communication (ask me later...)

How to set the block lifetime?
• Longer = writes may stall
• Shorter = may not exploit temporal locality
• Lifetime predictor at L2, adjusted on:
  - Load to expired block (for temporal locality)
  - Store to unexpired block (reduce write stalls)
  - Eviction of unexpired block (reduce L2 eviction stalls)

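One way to read the three predictor events is as a single lifetime value that is nudged up or down. The sketch below assumes exactly that; the class name, the initial lifetime, the step sizes, and the bounds are all hypothetical and are not taken from the slides.

```python
class LifetimePredictor:
    """Hypothetical L2-side predictor: one lifetime value adjusted by three events."""

    def __init__(self, initial=20, step_up=4, step_down=8, lo=0, hi=1024):
        # All constants are illustrative assumptions.
        self.lifetime = initial
        self.step_up = step_up
        self.step_down = step_down
        self.lo = lo
        self.hi = hi

    def _lengthen(self):
        self.lifetime = min(self.hi, self.lifetime + self.step_up)

    def _shorten(self):
        self.lifetime = max(self.lo, self.lifetime - self.step_down)

    def on_load_to_expired_block(self):
        # The copy expired before it was reused: a longer lifetime would
        # have captured this temporal locality.
        self._lengthen()

    def on_store_to_unexpired_block(self):
        # A write found live copies: a shorter lifetime reduces write stalls.
        self._shorten()

    def on_eviction_of_unexpired_block(self):
        # An L2 eviction found live copies: a shorter lifetime reduces
        # eviction stalls.
        self._shorten()


predictor = LifetimePredictor(initial=20, step_up=4, step_down=8)
predictor.on_load_to_expired_block()        # 20 -> 24
predictor.on_store_to_unexpired_block()     # 24 -> 16
predictor.on_eviction_of_unexpired_block()  # 16 -> 8
```

The asymmetric steps are a design choice of this sketch only: they make the predictor back off from stalls faster than it grows toward longer lifetimes.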
Temporal Coherence (Weak)
[Figure, animated: a write arrives at the shared L2 while L1 copies with timestamps 20 and 25 are still valid]
• Sensitive to misprediction
• Resource stalls
• Hurts GPU applications
Goal: eliminate write stalls!

Temporal Coherence (Weak): example
[Figure, animated: two L1 caches and the L2 entry for one line; L1 copies that predate the write are marked OLD]
TIME 15: Write. The L1 copies with timestamps 20 and 25 become OLD, and the write does not wait for them to expire.
TIME 15: Fence. The fence waits (......) while OLD copies are still unexpired.
TIME 20: The OLD copy with timestamp 20 expires. The fence is still waiting on the OLD copy with timestamp 25.
TIME 25: The last OLD copy expires. The fence completes.

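The weak variant can be sketched by moving the wait from the write to the fence. This is an illustration under assumptions: `WeakL2Line`, `Core`, and `pending_until` are hypothetical names, and the rule that a write returns the time by which all older copies will have expired is inferred from the frames above rather than stated on the slides.

```python
class WeakL2Line:
    """Hypothetical model of one L2 line where writes are applied without waiting."""

    def __init__(self, data):
        self.data = data
        self.shared_until = 0   # latest expiry granted to any L1 copy

    def load(self, now, lifetime):
        expiry = now + lifetime
        self.shared_until = max(self.shared_until, expiry)
        return self.data, expiry

    def write(self, now, value):
        # Apply the write immediately and report when the last OLD copy
        # in any L1 will have expired.
        self.data = value
        return self.shared_until


class Core:
    """Hypothetical core that tracks how long its writes remain only partly visible."""

    def __init__(self):
        self.pending_until = 0

    def write(self, line, now, value):
        visible_everywhere_at = line.write(now, value)
        self.pending_until = max(self.pending_until, visible_everywhere_at)
        return now   # the write itself finishes without stalling

    def fence(self, now):
        # The fence is the only point that waits for OLD copies to expire.
        return max(now, self.pending_until)


line = WeakL2Line(data=1)
line.load(now=0, lifetime=20)    # copy valid till 20
line.load(now=5, lifetime=20)    # copy valid till 25

core = Core()
write_done = core.write(line, now=15, value=2)
fence_done = core.fence(now=15)
```

In this model the write issued at time 15 finishes at time 15, and only the fence is held until time 25, matching the walkthrough above.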
Temporal Coherence (Weak)
• No access stalls
• Efficient GPU applications
• Aggressive lifetime predictors
• Supports weak memory models

Coherence Applications
• Lock-based programs
  - Barnes Hut
  - Cloth Physics
  - Place-and-Route
• Stencil
  - Max-Flow Min-cut
  - 3D equation solver
• Load balancing
  - Octree Partitioning

Interconnect Traffic
GPU Applications (do not need coherence)
[Figure, animated: bar chart of interconnect traffic, axis 0 to 2, with bars NO.CC, MESI, GPU-VI, and TC; one bar is off-scale and labeled 2.3; annotations on the chart: Wr-Through, .8x, .3x, No msgs]

Speedup
Coherence Applications
[Figure, animated: bar chart of speedup, axis 0 to 1.75, with bars NO L1, MESI, GPU-VI, and TC; annotation on the chart: Need a 32KB directory]

Protocol Complexity (number of states)

                NonCoherent   GPU-VI   Temporal Coherence
L1 Stable            2           2             2
L1 Transient         2           1             1
L2 Stable            2           5             5
L2 Transient         2          10             3

What did we learn?
• Throughput and heterogeneous architectures require a more streamlined caching framework.
• Single-chip integration enables mechanisms that we can exploit to simplify communication protocols.
• Efficient coherence protocols enable programmers to deploy accelerators for wider purposes.

Contact: ashriram@cs.sfu.ca or aamodt@ece.ubc.ca
• Obtain GPGPU-Sim with coherence support:
  http://www.ece.ubc.ca/~isingh/gpgpusim-ruby.tar.gz

Interconnect Energy
[Figure: normalized interconnect power and normalized energy, axis 0.0 to 1.6, broken down into Router (Static), Router (Dynamic), Link (Static), and Link (Dynamic); panels for inter-workgroup and intra-workgroup applications; configurations NO-COH or NO-L1, MESI, GPU-VI, GPU-Vini, and TCW]

[Figure 8: Breakdown of interconnect traffic for coherent and non-coherent GPU memory systems. (a) Inter-workgroup communication. (b) Intra-workgroup communication. Traffic categories: RCL, ST, LD, REQ, INV, ATO. Configurations: NO-COH or NO-L1, MESI, GPU-VI, GPU-Vini, and TCW. Benchmarks shown include BH, CC, CL, DLB, STN, VPR, HSP, KMN, LPS, NDL, RG, SR, and AVG. One off-scale bar is labeled 2.27.]

TC-Strong vs TC-Weak
[Figure: speedup over all applications for TCSUO, TCSOO, TCS, TCW, and TCW w/ predictor; two panels: fixed lifetime for all applications, and best lifetime for each application]
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

PL-4049, Cache Coherence for GPU Architectures, by Arrvindh Shriraman and Tor Aamodt