SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-1
Chap. 9 Pipeline and Vector Processing
9-1 Parallel Processing
 Simultaneous data processing tasks for the purpose of increasing the
computational speed
 Perform concurrent data processing to achieve faster execution time
 Multiple Functional Unit : Fig. 9-1
Separate the execution unit into eight functional units operating in parallel
 Computer Architectural Classification
Data-Instruction Stream : Flynn
Serial versus Parallel Processing : Feng
Parallelism and Pipelining : Händler
 Flynn’s Classification
1) SISD (Single Instruction - Single Data stream)
» for practical purpose: only one processor is useful
» Example systems : Amdahl 470V/6, IBM 360/91
Parallel Processing Example
A d d e r - s u b t r a c t o r
In t e g e r m u lt ip ly
F lo a t in t - p o in t
a d d - s u b t r a c t
In c r e m e n t e r
S h if t u n it
L o g ic u n i t
F lo a t in t - p o in t
d iv i d e
F lo a t in t - p o in t
m u lt ip ly
P r o c e s s o r
r e g is t e r s
T o M e m o r y
=
C U M MP U
IS D S
IS
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-2
2) SIMD
(Single Instruction - Multiple Data stream)
» vector or array operations 에 적합한 형태
one vector operation includes many
operations on a data stream
» Example systems : CRAY -1, ILLIAC-IV
3) MISD
(Multiple Instruction - Single Data stream)
» Data Stream 에 Bottle neck 으로 인해
실제로 사용되지 않음
C U
P U 1
P U n
P U 2
M M 1
M M n
M M 2
D S 1
D S 2
D S n
IS
IS
S h a r e d m e m m o r y
P U 1
P U n
P U 2
D S
C U 1
C U n
C U 2
I S 1
I S 2
I S n
M M 1M M n M M 2
I S 1
I S 2
I S n
D S
S h a r e d m e m o r y
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-3
4) MIMD
(Multiple Instruction - Multiple Data stream)
» 대부분의 Multiprocessor
System 에서 사용됨
 Main topics in this Chapter
Pipeline processing : Sec. 9-2
» Arithmetic pipeline : Sec. 9-3
» Instruction pipeline : Sec. 9-4
Vector processing :adder/multiplier pipeline 이용 , Sec. 9-6
Array processing : 별도의 array processor 이용 , Sec. 9-7
» Attached array processor : Fig. 9-14
» SIMD array processor : Fig. 9-15
Large vector, Matrices,
그리고 Array Data 계산
P U 1
P U n
P U 2
D S
C U 1
C U n
C U 2
I S 1
I S 2
I S n
IS 1
IS 2
IS n
M M 1
M M n
M M 2
S h a r e d m e m o r y
vv
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-4
9-2 Pipelining
 Pipelining 의 원리
Decomposing a sequential process into suboperations
Each subprocess is executed in a special dedicated segment concurrently
 Pipelining 의 예제 : Fig. 9-2
Multiply and add operation : ( for i = 1, 2, …, 7 )
3 개의 Suboperation Segment 로 분리
» 1) : Input Ai and Bi
» 2) : Multiply and input Ci
» 3) : Add Ci
Content of registers in pipeline example : Tab. 9-1
 General considerations
4 segment pipeline : Fig. 9-3
» S : Combinational circuit for Suboperation
» R : Register(intermediate results between the segments)
Space-time diagram : Fig. 9-4
» Show segment utilization as a function of time
Task : T1, T2, T3,…, T6
» Total operation performed going through all the segment
435
4,2*13
2,1
RRR
CiRRRR
BiRAiR
+←
←←
←←
CiBiAi +*
Segment
versus
clock-cycle
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-5
 Speedup S : Nonpipeline / Pipeline
S = n • tn / ( k + n - 1 ) • tp = 6 • 6 tn / ( 4 + 6 - 1 ) • tp = 36 tn / 9 tn = 4
» n : task number ( 6 )
» tn : time to complete each task in nonpipeline ( 6 cycle times = 6 tp)
» tp : clock cycle time ( 1 clock cycle )
» k : segment number ( 4 )
If n→ ∞ 이면 , S = tn / tp
한 개의 task 를 처리하는 시간이 같을 때
즉 , nonpipeline ( tn ) = pipeline ( k • tp )
이라고 가정하면 ,
S = tn / tp = k • tp / tp = k
따라서 이론적으로 k 배 (segment 개수 )
만큼 처리 속도가 향상된다 .
 Pipeline 에는 Arithmetic Pipeline(Sec. 9-3) 과 Instruction Pipeline(Sec. 9-4) 이 있
다
Sec. 9-3 Arithmetic Pipeline
 Floating-point Adder Pipeline Example : Fig. 9-6
Add / Subtract two normalized floating-point binary number
» X = A x 2a
= 0.9504 x 103
1 8765432 9
1
4
3
2
C lo c k c y c le s
T 1 T 6T 3 T 5T 2 T 4
T 1 T 6T 3 T 5T 2 T 4
T 1 T 6T 3 T 5T 2 T 4
T 1 T 6T 3 T 5T 2 T 4
Segment
Pipeline 에서의 처리 시간 = 9 clock cycles
k + n - 1 ≈ n
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-6
4 segments suboperations
» 1) Compare exponents by subtraction :
3 - 2 = 1
X = 0.9504 x 103
Y = 0.8200 x 102
» 2) Align mantissas
X = 0.9504 x 103
Y = 0.08200 x 103
» 3) Add mantissas
Z = 1.0324 x 103
» 4) Normalize result
Z = 0.1324 x 104
R
C o m p a r e
e x p o n e n t s
b y s u b t r a c t io n
R
C h o o s e e x p o n e n t A lig n m a n t is s a s
R
A d d o r s u b t r a c t
m a n t is s a s
R
N o r m a liz e
r e s u lt
R
R
A d ju s t
e x p o n e n t
R
R
a b BA
E x p o n e n t s M a n t is s a s
D if f e r e n c e
S e g m e n t 1 :
S e g m e n t 4 :
S e g m e n t 3 :
S e g m e n t 2 :
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-7
9-4 Instruction Pipeline
 Instruction Cycle
1) Fetch the instruction from memory
2) Decode the instruction
3) Calculate the effective address
4) Fetch the operands from memory
5) Execute the instruction
6) Store the result in the proper place
 Example : Four-segment Instruction Pipeline
Four-segment CPU pipeline : Fig. 9-7
» 1) FI : Instruction Fetch
» 2) DA : Decode Instruction & calculate EA
» 3) FO : Operand Fetch
» 4) EX : Execution
Timing of Instruction Pipeline : Fig. 9-8
» Instruction 3 에서 Branch 명령 실행
S e g m e n t 1 :
S e g m e n t 4 :
S e g m e n t 3 :
S e g m e n t 2 :
F e t c h i n s t r u c t io n
f r o m m e m o r y
D e c o d e i n s t r u c t i o n
a n d c a l c u l a t e
e f f e c t i v e a d d r e s s
F e t c h o p e r a n d
f r o m m e m o r y
E x e c u t e in s t r u c t io n
B r a n c h ?
I n t e r r u p t ?
I n t e r r u p t
h a n d lin g
U p d a t e P C
E m p t y p i p e
1 32
1
4
3
2
7
6
5
87654 9 1 21 11 0 1 3
F I E XF OD A
F I E XF OD A
F I E XF OD A
F I E XF OD A
F I E XF OD A
F I E XF OD A
F I E XF OD A
F I
In s t r u c t io n :
( B r a n c h )
S t e p :
BranchNo Branch
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-8
 Pipeline Conflicts : 3 major difficulties
1) Resource conflicts
» memory access by two segments at the same time
2) Data dependency
» when an instruction depend on the result of a previous instruction, but this result is not
yet available
3) Branch difficulties
» branch and other instruction (interrupt, ret, ..) that change the value of PC
 Data Dependency 해결 방법
Hardware 적인 방법
» Hardware Interlock
previous instruction 의 결과가 나올 때 까지 Hardware 적인 Delay 를 강제 삽입
» Operand Forwarding
previous instruction 의 결과를 곧바로 ALU 로 전달 ( 정상적인 경우 , register 를 경유함 )
Software 적인 방법
» Delayed Load
previous instruction 의 결과가 나올 때 까지 No-operation instruction 을 삽입
 Handling of Branch Instructions
Prefetch target instruction
» Conditional branch 에서 branch target instruction ( 조건 맞음 ) 과 다음 instruction ( 조
건 안 맞음 ) 을 모두 fetch
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-9
Branch Target Buffer : BTB
» 1) Associative memory 를 이용하여 branch target address 이후에 몇 개에 instruction
을 미리 BTB 에 저장한다 .
» 2) 만약 branch instruction 이면 우선 BTB 를 검사하여 BTB 에 있으면 곧바로 가져온
다 (Cache 개념 도입 )
Loop Buffer
» 1) small very high speed register file (RAM) 을
이용하여 프로그램에서 loop 를 detect 한다 .
» 2) 만약 loop 가 발견되면 loop 프로그램 전체를
Loop Buffer 에 load 하여 실행하면 외부
메모리를 access 하지 않는다 .
Branch Prediction
» Branch 를 predict 하는 additional hardware logic 사용
 Delayed Branch 해결 방법
Fig. 9-8 에서와 같이 branch instruction 이
pipeline operation 을 지연시키는 경우
예제 : Fig. 9-10, p. 318, Sec. 9-5
» 1) No-operation instruction 삽입
» 2) Instruction Rearranging : Compiler 지원
1 32 654
1 . L o a d
4 . S u b t r a c t
3 . A d d
2 . In c r e m e n t
I EA
I EA
I EA
I EA
( a ) U s in g n o - o p e r a t io n in s t r u c t io n s
C lo c k c y c le s :
1 32 654
I EA
I EA
I EA
I EA
( b ) R e a r r a n g in g t h e in s t r u c t io n s
7
I EA
C lo c k c y c le s :
5 . B r a n c h t o X
8 . In s t r u c t io n in X
6 . N o - o p e r a t io n
7 . N o - o p e r a t io n
7 1 098
I EA
I EA
I EA
I EA
1 . L o a d
5 . S u b t r a c t
4 . A d d
2 . In c r e m e n t
3 . B r a n c h t o X
6 . In s t r u c t io n in X
8
I EA
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-10
9-5 RISC Pipeline
 RISC CPU 의 특징
Instruction Pipeline 을 이용함
Single-cycle instruction execution
Compiler support
 Example : Three-segment Instruction Pipeline
3 Suboperations Instruction Cycle
» 1) I : Instruction fetch
» 2) A : Instruction decoded and ALU operation
» 3) E : Transfer the output of ALU to a register,
memory, or PC
Delayed Load : Fig. 9-9(a)
» 3 번째 Instruction(ADD R1 + R3) 에서 Conflict 발생
4 번째 clock cycle 에서 2 번째 Instruction (LOAD R2)
실행과 동시에 3 번째 instruction 에서 R2 를 연산
» Delayed Load 해결 방법 : Fig. 9-9(b)
No-operation 삽입
Delayed Branch : Sec. 9-4 에서 이미 설명
1 32 654
1 . L o a d R 1
4 . S t o r e R 3
3 . A d d R 1 + R 2
2 . L o a d R 2
I EA
I EA
I EA
I EA
( a ) P ip e lin e t im in g w it h d a t a c o n f lic t
1 32 654
1 . L o a d R 1
4 . A d d R 1 + R 2
2 . L o a d R 2
I EA
I EA
I EA
I EA
( b ) P ip e lin e t im in g w it h d e la y e d lo a d
5 . S t o r e R 3
3 . N o - o p e r a t io n
7
I EA
C lo c k c y c le s :
C lo c k c y c le s :
Conflict 발생
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-11
9-6 Vector Processing
 Science and Engineering Applications
Long-range weather forecasting, Petroleum explorations, Seismic data analysis,
Medical diagnosis, Aerodynamics and space flight simulations, Artificial
intelligence and expert systems, Mapping the human genome, Image processing
 Vector Operations
Arithmetic operations on large arrays of numbers
Conventional scalar processor
» Machine language
Vector processor
» Single vector instruction
Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I ≤ 100 go to 20
Continue
» Fortran language
DO 20 I = 1, 100
20 C(I) = A(I) + B(I)
C(1:100) = A(1:100) + B(1:100)
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-12
 Vector Instruction Format : Fig. 9-11
ADD A B C 100
 Matrix Multiplication
3 x 3 matrices multiplication : n2
= 9 inner product
» : 이와 같은 inner product 가 9 개
Cumulative multiply-add operation : n3
= 27 multiply-add
» : 이와 같은 multiply-add 가 3 개
따라서 9 X 3 multiply-add = 27
O p e r a t io n
c o d e
B a s e a d d r e s s
s o u r c e 1
B a s e a d d r e s s
s o u r c e 2
B a s e a d d r e s s
d e s t in a t io n
V e c t o r
le n g t h










=










×










333231
232221
131211
333231
232221
131211
333231
232221
131211
ccc
ccc
ccc
bbb
bbb
bbb
aaa
aaa
aaa
31132112111111 bababac ++=
bacc ×+=
3113211211111111 bababacc +++=
    
C11 의 초기값 = 0
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-13
 Pipeline for calculating an inner product : Fig. 9-12
Floating point multiplier pipeline : 4 segment
Floating point adder pipeline : 4 segment
예제 )
» after 1st clock input
» after 8th clock input
» Four section summation
kk BABABABAC ++++= 332211
S o u r c e
A
S o u r c e
B
M u lt ip lie r
p ip e lin e
A d d e r
p ip e lin e
» after 4th clock input
A1B1
S o u r c e
A
S o u r c e
B
M u lt ip lie r
p ip e lin e
A d d e r
p ip e lin e
A4B4 A3B3 A2B2 A1B1
S o u r c e
A
S o u r c e
B
M u lt ip lie r
p ip e lin e
A d d e r
p ip e lin e
S o u r c e
A
S o u r c e
B
M u lt ip lie r
p ip e lin e
A d d e r
p ip e lin e
» after 9th, 10th, 11th ,...
A8B8 A7B7 A6B6 A5B5 A4B4 A3B3 A2B2 A1B1 A8B8 A7B7 A6B6 A5B5 A4B4 A3B3 A2B2 A1B1
5511 BABA +
, , ,  
6622 BABA +




+++++
+++++
+++++
++++=
161612128844
151511117733
141410106622
1313995511
BABABABA
BABABABA
BABABABA
BABABABAC
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-14
 Memory Interleaving : Fig. 9-13
Simultaneous access to memory from two or
more source using one memory bus system
AR 의 하위 2 bit 를 사용하여 4 개중 1 개의
memory module 선택
예제 ) Even / Odd Address Memory Access
A R
M e m o r y
a r r a y
D R
A R
M e m o r y
a r r a y
D R
A R
M e m o r y
a r r a y
D R
A R
M e m o r y
a r r a y
D R
A d d r e s s b u s
D a t a b u s
 Supercomputer
Supercomputer = Vector Instruction + Pipelined floating-point arithmetic
Performance Evaluation Index
» MIPS : Million Instruction Per Second
» FLOPS : Floating-point Operation Per Second
megaflops : 106
, gigaflops : 109
Cray supercomputer : Cray Research
» Clay-1 : 80 megaflops, 4 million 64 bit words memory
» Clay-2 : 12 times more powerful than the clay-1
VP supercomputer : Fujitsu
» VP-200 : 300 megaflops, 32 million memory, 83 vector instruction, 195 scalar
instruction
» VP-2600 : 5 gigaflops
Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing
9-15
9-7 Array Processors
 Performs computations on large arrays of data
 Array Processing
Attached array processor : Fig. 9-14
» Auxiliary processor attached to a general purpose computer
SIMD array processor : Fig. 9-15
» Computer with multiple processing units operating in parallel
Vector 계산 C = A + B 에서 ci = ai + bi 를
각각의 PEi 에서 동시에 실행
Vector processing : Adder/Multiplier pipeline 이용
Array processing : 별도의 array processor 이용
G e n e r a l- p u r p o s e
c o m p u t e r
In p u t - O u t p u t
in t e r f a c e
A t t a c h e d a r r a y
P r o c e s s o r
M a in m e m o r y L o c a l m e m o r y
H ig h - s p e e d m e m o r y t o -
m e m o r y b u s
P E 1
P E 3
P E 2
M 1
M 3
M 2
P E n M n
M a s t e r c o n t r o l
u n it
M a i n m e m o r y

Weitere ähnliche Inhalte

Was ist angesagt?

INSTRUCTION LEVEL PARALLALISM
INSTRUCTION LEVEL PARALLALISMINSTRUCTION LEVEL PARALLALISM
INSTRUCTION LEVEL PARALLALISM
Kamran Ashraf
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
Anuj Modi
 

Was ist angesagt? (20)

DMA operation
DMA operationDMA operation
DMA operation
 
Computer Organization and Architecture.
Computer Organization and Architecture.Computer Organization and Architecture.
Computer Organization and Architecture.
 
Modes of data transfer
Modes of data transferModes of data transfer
Modes of data transfer
 
INSTRUCTION LEVEL PARALLALISM
INSTRUCTION LEVEL PARALLALISMINSTRUCTION LEVEL PARALLALISM
INSTRUCTION LEVEL PARALLALISM
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
 
Memory organization in computer architecture
Memory organization in computer architectureMemory organization in computer architecture
Memory organization in computer architecture
 
Instruction cycle
Instruction cycleInstruction cycle
Instruction cycle
 
Basic Computer Organization and Design
Basic  Computer  Organization  and  DesignBasic  Computer  Organization  and  Design
Basic Computer Organization and Design
 
Cpu organisation
Cpu organisationCpu organisation
Cpu organisation
 
Direct memory access (dma)
Direct memory access (dma)Direct memory access (dma)
Direct memory access (dma)
 
Input Output - Computer Architecture
Input Output - Computer ArchitectureInput Output - Computer Architecture
Input Output - Computer Architecture
 
Signed Addition And Subtraction
Signed Addition And SubtractionSigned Addition And Subtraction
Signed Addition And Subtraction
 
Pipelining powerpoint presentation
Pipelining powerpoint presentationPipelining powerpoint presentation
Pipelining powerpoint presentation
 
Computer architecture addressing modes and formats
Computer architecture addressing modes and formatsComputer architecture addressing modes and formats
Computer architecture addressing modes and formats
 
Microprogrammed Control Unit
Microprogrammed Control UnitMicroprogrammed Control Unit
Microprogrammed Control Unit
 
Von Neumann Architecture
Von Neumann ArchitectureVon Neumann Architecture
Von Neumann Architecture
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
 
Central processing unit
Central processing unitCentral processing unit
Central processing unit
 
Instruction Execution Cycle
Instruction Execution CycleInstruction Execution Cycle
Instruction Execution Cycle
 
Input output interface
Input output interfaceInput output interface
Input output interface
 

Andere mochten auch (12)

Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
Prim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmPrim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithm
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
input output Organization
input output Organizationinput output Organization
input output Organization
 
Tiny os
Tiny osTiny os
Tiny os
 
Structure and Typedef
Structure and TypedefStructure and Typedef
Structure and Typedef
 
Union from C and Data Strutures
Union from C and Data StruturesUnion from C and Data Strutures
Union from C and Data Strutures
 
Data retrieval in sensor networks
Data retrieval in sensor networksData retrieval in sensor networks
Data retrieval in sensor networks
 
c and data structures first unit notes (jntuh syllabus)
c and data structures first unit notes (jntuh syllabus)c and data structures first unit notes (jntuh syllabus)
c and data structures first unit notes (jntuh syllabus)
 
Branch and bound
Branch and boundBranch and bound
Branch and bound
 
Classification and prediction
Classification and predictionClassification and prediction
Classification and prediction
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 

Ähnlich wie pipeline and vector processing

“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
Edge AI and Vision Alliance
 
2007 Tidc India Profiling
2007 Tidc India Profiling2007 Tidc India Profiling
2007 Tidc India Profiling
danrinkes
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
Jeff Larkin
 

Ähnlich wie pipeline and vector processing (20)

Pipelining And Vector Processing
Pipelining And Vector ProcessingPipelining And Vector Processing
Pipelining And Vector Processing
 
CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdfCS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
 
Parallel Processing Techniques Pipelining
Parallel Processing Techniques PipeliningParallel Processing Techniques Pipelining
Parallel Processing Techniques Pipelining
 
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.pptComputer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
 
Unit 3-pipelining & vector processing
Unit 3-pipelining & vector processingUnit 3-pipelining & vector processing
Unit 3-pipelining & vector processing
 
pipelining ppt.pdf
pipelining ppt.pdfpipelining ppt.pdf
pipelining ppt.pdf
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
 
CO Module 5
CO Module 5CO Module 5
CO Module 5
 
Core pipelining
Core pipelining Core pipelining
Core pipelining
 
BTCS501_MM_Ch9.pptx
BTCS501_MM_Ch9.pptxBTCS501_MM_Ch9.pptx
BTCS501_MM_Ch9.pptx
 
Computer Organozation
Computer OrganozationComputer Organozation
Computer Organozation
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Short.course.introduction.to.vhdl
Short.course.introduction.to.vhdlShort.course.introduction.to.vhdl
Short.course.introduction.to.vhdl
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
A Comparative Study of PID and Fuzzy Controller for Speed Control of Brushles...
A Comparative Study of PID and Fuzzy Controller for Speed Control of Brushles...A Comparative Study of PID and Fuzzy Controller for Speed Control of Brushles...
A Comparative Study of PID and Fuzzy Controller for Speed Control of Brushles...
 
Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016
 
2007 Tidc India Profiling
2007 Tidc India Profiling2007 Tidc India Profiling
2007 Tidc India Profiling
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
 
Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)
Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)
Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)
 

Mehr von Acad (15)

routing alg.pptx
routing alg.pptxrouting alg.pptx
routing alg.pptx
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptx
 
Computer Science basics
Computer Science basics Computer Science basics
Computer Science basics
 
Union
UnionUnion
Union
 
Stacks
StacksStacks
Stacks
 
Str
StrStr
Str
 
Functions
FunctionsFunctions
Functions
 
File
FileFile
File
 
Ds
DsDs
Ds
 
Dma
DmaDma
Dma
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation method
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryption
 
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
 
Literature survey on peer to peer botnets
Literature survey on peer to peer botnetsLiterature survey on peer to peer botnets
Literature survey on peer to peer botnets
 
multi processors
multi processorsmulti processors
multi processors
 

Kürzlich hochgeladen

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Kürzlich hochgeladen (20)

Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 

pipeline and vector processing

  • 1. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-1 Chap. 9 Pipeline and Vector Processing 9-1 Parallel Processing  Simultaneous data processing tasks for the purpose of increasing the computational speed  Perform concurrent data processing to achieve faster execution time  Multiple Functional Unit : Fig. 9-1 Separate the execution unit into eight functional units operating in parallel  Computer Architectural Classification Data-Instruction Stream : Flynn Serial versus Parallel Processing : Feng Parallelism and Pipelining : Händler  Flynn’s Classification 1) SISD (Single Instruction - Single Data stream) » for practical purpose: only one processor is useful » Example systems : Amdahl 470V/6, IBM 360/91 Parallel Processing Example A d d e r - s u b t r a c t o r In t e g e r m u lt ip ly F lo a t in t - p o in t a d d - s u b t r a c t In c r e m e n t e r S h if t u n it L o g ic u n i t F lo a t in t - p o in t d iv i d e F lo a t in t - p o in t m u lt ip ly P r o c e s s o r r e g is t e r s T o M e m o r y = C U M MP U IS D S IS
  • 2. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-2 2) SIMD (Single Instruction - Multiple Data stream) » vector or array operations 에 적합한 형태 one vector operation includes many operations on a data stream » Example systems : CRAY -1, ILLIAC-IV 3) MISD (Multiple Instruction - Single Data stream) » Data Stream 에 Bottle neck 으로 인해 실제로 사용되지 않음 C U P U 1 P U n P U 2 M M 1 M M n M M 2 D S 1 D S 2 D S n IS IS S h a r e d m e m m o r y P U 1 P U n P U 2 D S C U 1 C U n C U 2 I S 1 I S 2 I S n M M 1M M n M M 2 I S 1 I S 2 I S n D S S h a r e d m e m o r y
  • 3. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-3 4) MIMD (Multiple Instruction - Multiple Data stream) » 대부분의 Multiprocessor System 에서 사용됨  Main topics in this Chapter Pipeline processing : Sec. 9-2 » Arithmetic pipeline : Sec. 9-3 » Instruction pipeline : Sec. 9-4 Vector processing :adder/multiplier pipeline 이용 , Sec. 9-6 Array processing : 별도의 array processor 이용 , Sec. 9-7 » Attached array processor : Fig. 9-14 » SIMD array processor : Fig. 9-15 Large vector, Matrices, 그리고 Array Data 계산 P U 1 P U n P U 2 D S C U 1 C U n C U 2 I S 1 I S 2 I S n IS 1 IS 2 IS n M M 1 M M n M M 2 S h a r e d m e m o r y vv
  • 4. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-4 9-2 Pipelining  Pipelining 의 원리 Decomposing a sequential process into suboperations Each subprocess is executed in a special dedicated segment concurrently  Pipelining 의 예제 : Fig. 9-2 Multiply and add operation : ( for i = 1, 2, …, 7 ) 3 개의 Suboperation Segment 로 분리 » 1) : Input Ai and Bi » 2) : Multiply and input Ci » 3) : Add Ci Content of registers in pipeline example : Tab. 9-1  General considerations 4 segment pipeline : Fig. 9-3 » S : Combinational circuit for Suboperation » R : Register(intermediate results between the segments) Space-time diagram : Fig. 9-4 » Show segment utilization as a function of time Task : T1, T2, T3,…, T6 » Total operation performed going through all the segment 435 4,2*13 2,1 RRR CiRRRR BiRAiR +← ←← ←← CiBiAi +* Segment versus clock-cycle
  • 5. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-5  Speedup S : Nonpipeline / Pipeline S = n • tn / ( k + n - 1 ) • tp = 6 • 6 tn / ( 4 + 6 - 1 ) • tp = 36 tn / 9 tn = 4 » n : task number ( 6 ) » tn : time to complete each task in nonpipeline ( 6 cycle times = 6 tp) » tp : clock cycle time ( 1 clock cycle ) » k : segment number ( 4 ) If n→ ∞ 이면 , S = tn / tp 한 개의 task 를 처리하는 시간이 같을 때 즉 , nonpipeline ( tn ) = pipeline ( k • tp ) 이라고 가정하면 , S = tn / tp = k • tp / tp = k 따라서 이론적으로 k 배 (segment 개수 ) 만큼 처리 속도가 향상된다 .  Pipeline 에는 Arithmetic Pipeline(Sec. 9-3) 과 Instruction Pipeline(Sec. 9-4) 이 있 다 Sec. 9-3 Arithmetic Pipeline  Floating-point Adder Pipeline Example : Fig. 9-6 Add / Subtract two normalized floating-point binary number » X = A x 2a = 0.9504 x 103 1 8765432 9 1 4 3 2 C lo c k c y c le s T 1 T 6T 3 T 5T 2 T 4 T 1 T 6T 3 T 5T 2 T 4 T 1 T 6T 3 T 5T 2 T 4 T 1 T 6T 3 T 5T 2 T 4 Segment Pipeline 에서의 처리 시간 = 9 clock cycles k + n - 1 ≈ n
  • 6. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-6 4 segments suboperations » 1) Compare exponents by subtraction : 3 - 2 = 1 X = 0.9504 x 103 Y = 0.8200 x 102 » 2) Align mantissas X = 0.9504 x 103 Y = 0.08200 x 103 » 3) Add mantissas Z = 1.0324 x 103 » 4) Normalize result Z = 0.1324 x 104 R C o m p a r e e x p o n e n t s b y s u b t r a c t io n R C h o o s e e x p o n e n t A lig n m a n t is s a s R A d d o r s u b t r a c t m a n t is s a s R N o r m a liz e r e s u lt R R A d ju s t e x p o n e n t R R a b BA E x p o n e n t s M a n t is s a s D if f e r e n c e S e g m e n t 1 : S e g m e n t 4 : S e g m e n t 3 : S e g m e n t 2 :
  • 7. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-7 9-4 Instruction Pipeline  Instruction Cycle 1) Fetch the instruction from memory 2) Decode the instruction 3) Calculate the effective address 4) Fetch the operands from memory 5) Execute the instruction 6) Store the result in the proper place  Example : Four-segment Instruction Pipeline Four-segment CPU pipeline : Fig. 9-7 » 1) FI : Instruction Fetch » 2) DA : Decode Instruction & calculate EA » 3) FO : Operand Fetch » 4) EX : Execution Timing of Instruction Pipeline : Fig. 9-8 » Instruction 3 에서 Branch 명령 실행 S e g m e n t 1 : S e g m e n t 4 : S e g m e n t 3 : S e g m e n t 2 : F e t c h i n s t r u c t io n f r o m m e m o r y D e c o d e i n s t r u c t i o n a n d c a l c u l a t e e f f e c t i v e a d d r e s s F e t c h o p e r a n d f r o m m e m o r y E x e c u t e in s t r u c t io n B r a n c h ? I n t e r r u p t ? I n t e r r u p t h a n d lin g U p d a t e P C E m p t y p i p e 1 32 1 4 3 2 7 6 5 87654 9 1 21 11 0 1 3 F I E XF OD A F I E XF OD A F I E XF OD A F I E XF OD A F I E XF OD A F I E XF OD A F I E XF OD A F I In s t r u c t io n : ( B r a n c h ) S t e p : BranchNo Branch
  • 8. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-8  Pipeline Conflicts : 3 major difficulties 1) Resource conflicts » memory access by two segments at the same time 2) Data dependency » when an instruction depend on the result of a previous instruction, but this result is not yet available 3) Branch difficulties » branch and other instruction (interrupt, ret, ..) that change the value of PC  Data Dependency 해결 방법 Hardware 적인 방법 » Hardware Interlock previous instruction 의 결과가 나올 때 까지 Hardware 적인 Delay 를 강제 삽입 » Operand Forwarding previous instruction 의 결과를 곧바로 ALU 로 전달 ( 정상적인 경우 , register 를 경유함 ) Software 적인 방법 » Delayed Load previous instruction 의 결과가 나올 때 까지 No-operation instruction 을 삽입  Handling of Branch Instructions Prefetch target instruction » Conditional branch 에서 branch target instruction ( 조건 맞음 ) 과 다음 instruction ( 조 건 안 맞음 ) 을 모두 fetch
  • 9. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-9 Branch Target Buffer : BTB » 1) Associative memory 를 이용하여 branch target address 이후에 몇 개에 instruction 을 미리 BTB 에 저장한다 . » 2) 만약 branch instruction 이면 우선 BTB 를 검사하여 BTB 에 있으면 곧바로 가져온 다 (Cache 개념 도입 ) Loop Buffer » 1) small very high speed register file (RAM) 을 이용하여 프로그램에서 loop 를 detect 한다 . » 2) 만약 loop 가 발견되면 loop 프로그램 전체를 Loop Buffer 에 load 하여 실행하면 외부 메모리를 access 하지 않는다 . Branch Prediction » Branch 를 predict 하는 additional hardware logic 사용  Delayed Branch 해결 방법 Fig. 9-8 에서와 같이 branch instruction 이 pipeline operation 을 지연시키는 경우 예제 : Fig. 9-10, p. 318, Sec. 9-5 » 1) No-operation instruction 삽입 » 2) Instruction Rearranging : Compiler 지원 1 32 654 1 . L o a d 4 . S u b t r a c t 3 . A d d 2 . In c r e m e n t I EA I EA I EA I EA ( a ) U s in g n o - o p e r a t io n in s t r u c t io n s C lo c k c y c le s : 1 32 654 I EA I EA I EA I EA ( b ) R e a r r a n g in g t h e in s t r u c t io n s 7 I EA C lo c k c y c le s : 5 . B r a n c h t o X 8 . In s t r u c t io n in X 6 . N o - o p e r a t io n 7 . N o - o p e r a t io n 7 1 098 I EA I EA I EA I EA 1 . L o a d 5 . S u b t r a c t 4 . A d d 2 . In c r e m e n t 3 . B r a n c h t o X 6 . In s t r u c t io n in X 8 I EA
  • 10. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-10 9-5 RISC Pipeline  RISC CPU 의 특징 Instruction Pipeline 을 이용함 Single-cycle instruction execution Compiler support  Example : Three-segment Instruction Pipeline 3 Suboperations Instruction Cycle » 1) I : Instruction fetch » 2) A : Instruction decoded and ALU operation » 3) E : Transfer the output of ALU to a register, memory, or PC Delayed Load : Fig. 9-9(a) » 3 번째 Instruction(ADD R1 + R3) 에서 Conflict 발생 4 번째 clock cycle 에서 2 번째 Instruction (LOAD R2) 실행과 동시에 3 번째 instruction 에서 R2 를 연산 » Delayed Load 해결 방법 : Fig. 9-9(b) No-operation 삽입 Delayed Branch : Sec. 9-4 에서 이미 설명 1 32 654 1 . L o a d R 1 4 . S t o r e R 3 3 . A d d R 1 + R 2 2 . L o a d R 2 I EA I EA I EA I EA ( a ) P ip e lin e t im in g w it h d a t a c o n f lic t 1 32 654 1 . L o a d R 1 4 . A d d R 1 + R 2 2 . L o a d R 2 I EA I EA I EA I EA ( b ) P ip e lin e t im in g w it h d e la y e d lo a d 5 . S t o r e R 3 3 . N o - o p e r a t io n 7 I EA C lo c k c y c le s : C lo c k c y c le s : Conflict 발생
  • 11. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-11 9-6 Vector Processing  Science and Engineering Applications Long-range weather forecasting, Petroleum explorations, Seismic data analysis, Medical diagnosis, Aerodynamics and space flight simulations, Artificial intelligence and expert systems, Mapping the human genome, Image processing  Vector Operations Arithmetic operations on large arrays of numbers Conventional scalar processor » Machine language Vector processor » Single vector instruction Initialize I = 0 20 Read A(I) Read B(I) Store C(I) = A(I) + B(I) Increment I = I + 1 If I ≤ 100 go to 20 Continue » Fortran language DO 20 I = 1, 100 20 C(I) = A(I) + B(I) C(1:100) = A(1:100) + B(1:100)
  • 12. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-12  Vector Instruction Format : Fig. 9-11 ADD A B C 100  Matrix Multiplication 3 x 3 matrices multiplication : n2 = 9 inner product » : 이와 같은 inner product 가 9 개 Cumulative multiply-add operation : n3 = 27 multiply-add » : 이와 같은 multiply-add 가 3 개 따라서 9 X 3 multiply-add = 27 O p e r a t io n c o d e B a s e a d d r e s s s o u r c e 1 B a s e a d d r e s s s o u r c e 2 B a s e a d d r e s s d e s t in a t io n V e c t o r le n g t h           =           ×           333231 232221 131211 333231 232221 131211 333231 232221 131211 ccc ccc ccc bbb bbb bbb aaa aaa aaa 31132112111111 bababac ++= bacc ×+= 3113211211111111 bababacc +++=      C11 의 초기값 = 0
  • 13. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-13  Pipeline for calculating an inner product : Fig. 9-12 Floating point multiplier pipeline : 4 segment Floating point adder pipeline : 4 segment 예제 ) » after 1st clock input » after 8th clock input » Four section summation kk BABABABAC ++++= 332211 S o u r c e A S o u r c e B M u lt ip lie r p ip e lin e A d d e r p ip e lin e » after 4th clock input A1B1 S o u r c e A S o u r c e B M u lt ip lie r p ip e lin e A d d e r p ip e lin e A4B4 A3B3 A2B2 A1B1 S o u r c e A S o u r c e B M u lt ip lie r p ip e lin e A d d e r p ip e lin e S o u r c e A S o u r c e B M u lt ip lie r p ip e lin e A d d e r p ip e lin e » after 9th, 10th, 11th ,... A8B8 A7B7 A6B6 A5B5 A4B4 A3B3 A2B2 A1B1 A8B8 A7B7 A6B6 A5B5 A4B4 A3B3 A2B2 A1B1 5511 BABA + , , ,   6622 BABA +     +++++ +++++ +++++ ++++= 161612128844 151511117733 141410106622 1313995511 BABABABA BABABABA BABABABA BABABABAC
  • 14. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-14  Memory Interleaving : Fig. 9-13 Simultaneous access to memory from two or more source using one memory bus system AR 의 하위 2 bit 를 사용하여 4 개중 1 개의 memory module 선택 예제 ) Even / Odd Address Memory Access A R M e m o r y a r r a y D R A R M e m o r y a r r a y D R A R M e m o r y a r r a y D R A R M e m o r y a r r a y D R A d d r e s s b u s D a t a b u s  Supercomputer Supercomputer = Vector Instruction + Pipelined floating-point arithmetic Performance Evaluation Index » MIPS : Million Instruction Per Second » FLOPS : Floating-point Operation Per Second megaflops : 106 , gigaflops : 109 Cray supercomputer : Cray Research » Clay-1 : 80 megaflops, 4 million 64 bit words memory » Clay-2 : 12 times more powerful than the clay-1 VP supercomputer : Fujitsu » VP-200 : 300 megaflops, 32 million memory, 83 vector instruction, 195 scalar instruction » VP-2600 : 5 gigaflops
  • 15. Computer System Architecture Dept. of Info. Of ComputerChap. 9 Pipeline and Vector ProcessingChap. 9 Pipeline and Vector Processing 9-15 9-7 Array Processors  Performs computations on large arrays of data  Array Processing Attached array processor : Fig. 9-14 » Auxiliary processor attached to a general purpose computer SIMD array processor : Fig. 9-15 » Computer with multiple processing units operating in parallel Vector 계산 C = A + B 에서 ci = ai + bi 를 각각의 PEi 에서 동시에 실행 Vector processing : Adder/Multiplier pipeline 이용 Array processing : 별도의 array processor 이용 G e n e r a l- p u r p o s e c o m p u t e r In p u t - O u t p u t in t e r f a c e A t t a c h e d a r r a y P r o c e s s o r M a in m e m o r y L o c a l m e m o r y H ig h - s p e e d m e m o r y t o - m e m o r y b u s P E 1 P E 3 P E 2 M 1 M 3 M 2 P E n M n M a s t e r c o n t r o l u n it M a i n m e m o r y