Cache Coherence for GPU Architectures

Inderpreet Singh, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, Tor M. Aamodt, Cache Coherence for GPU Architectures, in Proceedings of the 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA-19)

Agenda
• Challenges with CPU coherence on GPUs
• Temporal Coherence: rethinking coherence for GPUs
• What is the cost of providing coherence?

Why provide coherence?
1. Inter-workgroup communication
2. Atomic operations
   Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems, ISPASS 2012
3. Task queues

Cache Coherence
Programmer's view: processors (P P P P) on top of one Shared Memory
Appearance: One global copy of every location

Cache Coherence
[Diagram: Multicores and GPUs, each shown as per-core L1 caches above an L2 and Memory]

Cache Coherence
Heterogeneous Systems
[Diagram: processors (P) and GPU cores with their own L1 and L2 caches, all attached to one Memory]
How to provide coherence?

Challenges

Challenges with coherence
[Diagram: two L1 caches and a Shared L2 exchanging numbered protocol messages (1, 2, 3)]

Challenge 1: Traffic
[Diagram: coherence messages between several L1 caches and the Shared L2]
30% more traffic than current GPUs

Challenge 2: Buffer Overhead
[Diagram: several L1 caches and a Shared L2 with a Protocol Buffer]
Coherence protocol buffers require 28% of total L2

Challenge 3: Complexity
Incoherent protocol: 4 states
Coherent protocol: 16 states

Coherence Overhead
Coherence messages
1. Extra interconnect traffic
2. Area overhead
3. Protocol complexity

How to achieve coherence without messages?

TEMPORAL COHERENCE

Temporal Coherence
Time-based Approach
- trigger protocol events on timer alerts
[Diagram: two L1 caches and a Shared L2]

Temporal Coherence
The L1 caches and the Shared L2 all see the same Clock (TIME).
- Load: the L1 copy gets a local timestamp (LT). Valid if TIME < LT
- The Shared L2 keeps a global timestamp (GT) for the block. Shared if TIME < GT

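The validity rule on this slide can be written down in a few lines. The following is a minimal sketch, not the authors' implementation: the class and function names (`L1Block`, `L2Block`, `load`) are hypothetical, and the strict `<` comparison is an assumption about how an expired timestamp is treated.

```python
from dataclasses import dataclass


@dataclass
class L1Block:
    data: int
    local_timestamp: int  # LT: time at which this private copy self-invalidates


@dataclass
class L2Block:
    data: int
    global_timestamp: int = 0  # GT: latest LT handed out to any L1


def l1_valid(block: L1Block, now: int) -> bool:
    # Assumed strict comparison: the copy is usable only while now < LT.
    return now < block.local_timestamp


def l2_shared(block: L2Block, now: int) -> bool:
    # While now < GT, some L1 may still hold an unexpired copy.
    return now < block.global_timestamp


def load(l2: L2Block, now: int, lifetime: int) -> L1Block:
    # The L2 grants a timestamp and remembers the furthest expiry it has granted.
    lt = now + lifetime
    l2.global_timestamp = max(l2.global_timestamp, lt)
    return L1Block(l2.data, lt)
```

Because each L1 copy expires on its own, the L2 never has to send an invalidation; it only has to compare TIME against GT.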
Temporal Coherence: example
TIME 0
- Load: one L1 fetches the line and holds it with timestamp 20
- The L2 records timestamp 20: Line shared till 20
TIME 5
- Load: a second L1 fetches the same line and holds it with timestamp 25
- The L2 raises its timestamp to 25: Line shared till 25
TIME 15
- Write arrives at the L2 while both L1 copies (20 and 25) are still valid
- The L2 timestamp is 25, so the write has to wait
TIME 20
- The L1 copy with timestamp 20 expires; the copy with timestamp 25 is still valid
- Write is still waiting
TIME 25
- The L1 copy with timestamp 25 expires
- The L2 timestamp has been reached, so the write completes

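The timeline in this example (loads at TIME 0 and TIME 5, a write at TIME 15 that finishes at TIME 25) can be replayed with a small simulation. This is a sketch under stated assumptions: a fixed lifetime of 20 is inferred from the timestamps 20 and 25, the names `simulate`, `L1-a` and `L1-b` are made up for illustration, and loads that arrive after the write are not modeled.

```python
def simulate(loads, write_time, lifetime, horizon):
    """Replay one cache line under the stalling (strong) scheme.

    loads: dict mapping a hypothetical L1 name to the time of its load.
    Returns (time at which the write completes, timestamp held by each L1).
    """
    global_ts = 0        # timestamp kept by the shared L2
    local_ts = {}        # timestamp held by each L1 copy
    write_done_at = None
    for now in range(horizon + 1):
        # Each load grants a lease and may push the L2 timestamp forward.
        for l1, t in loads.items():
            if t == now:
                local_ts[l1] = now + lifetime
                global_ts = max(global_ts, local_ts[l1])
        # The write may only complete once every granted lease has expired.
        if write_done_at is None and now >= write_time and now >= global_ts:
            write_done_at = now
    return write_done_at, local_ts


done, leases = simulate({"L1-a": 0, "L1-b": 5}, write_time=15, lifetime=20, horizon=40)
print(done, leases)  # 25 {'L1-a': 20, 'L1-b': 25}
```

The write waits 10 time units even though no message is ever sent to either L1, which is the stall the weak variant later removes.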
Temporal Coherence
• No coherence messages
• All transactions are 2-hop
• Protocol complexity minimal
• Supports strong and weak memory models
• Enables optimized communication (ask me later...)

How to set the block lifetime?
• Longer = writes may stall
• Shorter = may not exploit temporal locality
• Lifetime predictor at L2.
-Load to expired block (for temporal locality)
-Store to unexpired block (reduce write stalls)
-Eviction of unexpired block (reduce L2 eviction stalls)

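One way to read the three predictor events is as feedback on a single lifetime value: a load to an expired block suggests the lifetime was too short, while a store to or eviction of an unexpired block suggests it was too long. The sketch below assumes exactly that; the class name, the step size and the bounds are hypothetical and not taken from the slides.

```python
class LifetimePredictor:
    """Hypothetical per-L2 lifetime predictor driven by the three events."""

    def __init__(self, initial=20, step=4, lowest=0, highest=1024):
        self.lifetime = initial
        self.step = step          # assumed fixed adjustment per event
        self.lowest = lowest
        self.highest = highest

    def _adjust(self, delta):
        # Keep the prediction inside the assumed bounds.
        self.lifetime = max(self.lowest, min(self.highest, self.lifetime + delta))

    def on_load_to_expired_block(self):
        # The copy expired before it was reused: lengthen for temporal locality.
        self._adjust(+self.step)

    def on_store_to_unexpired_block(self):
        # A write found live copies: shorten to reduce write stalls.
        self._adjust(-self.step)

    def on_eviction_of_unexpired_block(self):
        # An eviction found live copies: shorten to reduce L2 eviction stalls.
        self._adjust(-self.step)

    def predict(self):
        return self.lifetime
```

A new load would then be granted `now + predictor.predict()` as its timestamp.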
Temporal Coherence (Weak)
A Write that waits at the Shared L2 for L1 copies (timestamps 20 and 25) to expire:
- Sensitive to misprediction
- Resource stalls
- Hurts GPU applications

Goal: eliminate Write Stalls!

Temporal Coherence (Weak): example
TIME 15
- Write goes through without waiting; the L1 copies with timestamps 20 and 25 now hold the OLD value
- Fence issued after the write waits (L2 timestamp is 25)
TIME 20
- The L1 copy with timestamp 20 expires; one OLD copy (timestamp 25) remains
- Fence is still waiting
TIME 25
- The last OLD copy expires
- Fence completes

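The weak variant shown here moves the wait from the write to the fence. The following is a minimal sketch of that idea, assuming the L2 answers each write with the time at which all old copies will have expired and that the core remembers the latest such time. The names `WeakL2Line`, `Core` and `write_completion_time` are illustrative and not taken from the slides.

```python
class WeakL2Line:
    """Hypothetical model of one L2 line that never stalls a write."""

    def __init__(self, value=0):
        self.value = value
        self.global_ts = 0  # latest timestamp granted to any L1 copy

    def load(self, now, lifetime):
        expiry = now + lifetime
        self.global_ts = max(self.global_ts, expiry)
        return self.value, expiry

    def write(self, now, value):
        # Update immediately and report when the last old copy will be gone.
        self.value = value
        return max(now, self.global_ts)


class Core:
    def __init__(self):
        # Latest time at which any of this core's writes becomes visible everywhere.
        self.write_completion_time = 0

    def store(self, line, now, value):
        visible_at = line.write(now, value)
        self.write_completion_time = max(self.write_completion_time, visible_at)

    def fence_ready(self, now):
        # The fence, not the write, waits for old copies to expire.
        return now >= self.write_completion_time


line = WeakL2Line()
line.load(0, 20)   # copy valid until 20
line.load(5, 20)   # copy valid until 25
core = Core()
core.store(line, 15, 42)
print(line.value, core.fence_ready(15), core.fence_ready(20), core.fence_ready(25))
# 42 False False True
```

Programs that rarely issue fences pay the waiting cost rarely, which is why this variant suits weak memory models.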
Temporal Coherence (Weak)
• No Access Stalls
• Efficient GPU applications
• Aggressive lifetime predictors
• Supports weak memory models

Coherence Applications
• Lock-based programs
-Barnes Hut
-Cloth Physics
-Place-and-Route
• Stencil
-Max-Flow Min-cut
-3D equation solver
• Load balancing
-Octree Partitioning

Interconnect Traffic
GPU Applications (do not need coherence)
[Bar chart: interconnect traffic for NO.CC, MESI, GPU-VI and TC; axis from 0 to 2, with a 2.3 label appearing with the MESI bar. Annotations: "Wr-Through", ".8x", ".3x", "No msgs"]

Speedup
Coherence Applications
[Bar chart: speedup for NO L1, MESI, GPU-VI and TC; axis from 0 to 1.75. Annotation: "Need a 32KB directory"]

Protocol Complexity
Number of states:

                NonCoherent   GPU-VI   Temporal Coherence
L1 Stable            2           2             2
L1 Transient         2           1             1
L2 Stable            2           5             5
L2 Transient         2          10             3

What did we learn?
• Throughput and heterogeneous architectures require a more streamlined caching framework.
• Single-chip integration enables mechanisms that we can exploit to simplify communication protocols.
• Efficient coherence protocols enable programmers to deploy accelerators for wider purposes.

Contact: ashriram@cs.sfu.ca or aamodt@ece.ubc.ca
• Obtain GPGPU-Sim with coherence support: http://www.ece.ubc.ca/~isingh/gpgpusim-ruby.tar.gz

Interconnect Energy
[Figure: Normalized Energy and Normalized Power of the interconnect, split into Router (Static), Router (Dynamic), Link (Static) and Link (Dynamic), for Interworkgroup and Intraworkgroup applications. Configurations: NO-COH / NO-L1, MESI, GPU-VI, GPU-Vini, TCW]

[Figure 8: Breakdown of interconnect traffic for coherent and non-coherent GPU memory systems. Panels: (a) Inter-workgroup communication, (b) Intra-workgroup communication. Traffic categories: RCL, INV, REQ, ATO, ST, LD. Configurations: NO-COH / NO-L1, MESI, GPU-VI, GPU-Vini, TCW. Benchmarks on the axis: BH, CC, CL, DLB, STN, VPR, HSP, KMN, LPS, NDL, RG, SR, AVG]

TC-Strong vs TC-Weak
[Figure: Speedup over all applications for TCSUO, TCSOO, TCS, TCW and TCW w/ predictor. Panels: "Fixed lifetime for all applications" and "Best lifetime for each application"]
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

PL-4049, Cache Coherence for GPU Architectures, by Arvindh Shriraman and Tor Aamodt