Operating Systems - Distributed Parallel Computing
1. Operating Systems
CMPSCI 377
Distributed Parallel Programming
Emery Berger
University of Massachusetts Amherst
2. Outline
Previously:
Programming with threads
Shared memory, single machine
Today:
Distributed parallel programming
Message passing
some material adapted from slides by Kathy Yelick, UC Berkeley
3. Why Distribute?
SMP (symmetric multiprocessor):
easy to program
but limited: the bus becomes a bottleneck when processors are not operating locally
Typically < 32 processors
Expensive ($$$)
[Diagram: processors P1, P2, ..., Pn, each with a cache ($), sharing one network/bus to memory]
4. Distributed Memory
Vastly different platforms
Networks of workstations
Supercomputers
Clusters
5. Distributed Architectures
Distributed memory machines:
local memory but no global memory
Individual nodes often SMPs
Network interface for all interprocessor communication: message passing
[Diagram: nodes P0, P1, ..., Pn, each with local memory and a network interface (NI), connected by an interconnect]
6. Message Passing
Program: a number of independent, communicating processes
Thread + local address space only
Shared data: partitioned
Communicate by send & receive events
Cluster: messages sent over sockets
[Diagram: processes P0, P1, ..., Pn over a network, each with private copies of variables s and i; one process executes "send P1, s" while another executes "receive Pn, s" and then uses the value in "y = ..s.."]
7. Message Passing
Pros: efficient
Makes data sharing explicit
Can communicate only what is strictly
necessary for computation
No coherence protocols, etc.
Cons: difficult
Requires manual partitioning
Divide up problem across processors
Unnatural model (for some)
Deadlock-prone (hurray)
8. Message Passing Interface
Library approach to message-passing
Supports most common architectural
abstractions
Vendors supply optimized versions
⇒ programs run on different machines, but with
(somewhat) different performance
Bindings for popular languages
Especially Fortran, C
Also C++, Java
9. MPI execution model
Spawns multiple copies of same program
(SPMD = single program, multiple data)
Each one is a different “process”
(different local memory)
Can act differently by determining which
process “self” corresponds to (its rank)
10. An Example
#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Hello world from process %d of %d\n",
         rank, size);
  MPI_Finalize();
  return 0;
}

% mpirun -np 10 exampleProgram
11. An Example
(Same program as above; each of the next four slides annotates one call.)
MPI_Init(&argc, &argv);   // initializes MPI (passes arguments in)
12. An Example
MPI_Comm_size(MPI_COMM_WORLD, &size);   // returns # of processors in “world”
13. An Example
MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which processor am I?
14. An Example
MPI_Finalize();   // we’re done sending messages
15. An Example
% mpirun -np 10 exampleProgram
Hello world from process 5 of 10
Hello world from process 3 of 10
Hello world from process 9 of 10
Hello world from process 0 of 10
Hello world from process 2 of 10
Hello world from process 4 of 10
Hello world from process 1 of 10
Hello world from process 6 of 10
Hello world from process 8 of 10
Hello world from process 7 of 10
% // what happened? The ten processes run concurrently, so their
// output is interleaved in a nondeterministic order.
16. Message Passing
Messages can be sent directly to another
processor
MPI_Send, MPI_Recv
Or to all processors
MPI_Bcast (acts as the send at the root and as the receive on every other process)
17. Send/Recv Example
Send data from process 0 to all
“Pass it along” communication
Operations:
MPI_Send(data, count, MPI_INT, dest, 0, MPI_COMM_WORLD);
MPI_Recv(data, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
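For reference, the actual C prototypes look like this (a sketch of the standard bindings: the buffer is a void pointer, the 0 passed above is the message tag, and MPI_Recv additionally fills in an MPI_Status; newer MPI versions declare the send buffer const):

  int MPI_Send(void * buf, int count, MPI_Datatype datatype,
               int dest, int tag, MPI_Comm comm);
  int MPI_Recv(void * buf, int count, MPI_Datatype datatype,
               int source, int tag, MPI_Comm comm, MPI_Status * status);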
18. Send & Receive
int main(int argc, char * argv[]) {
  int rank, value, size;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  do {
    if (rank == 0) {
      scanf("%d", &value);
      MPI_Send(&value, 1, MPI_INT, rank + 1,
               0, MPI_COMM_WORLD);
    } else {
      MPI_Recv(&value, 1, MPI_INT, rank - 1,
               0, MPI_COMM_WORLD, &status);
      if (rank < size - 1)
        MPI_Send(&value, 1, MPI_INT, rank + 1,
                 0, MPI_COMM_WORLD);
    }
    printf("Process %d got %d\n", rank, value);
  } while (value >= 0);
  MPI_Finalize();
  return 0;
}

Send integer input in a ring
19. Send & Receive
(Same code as above; the annotated argument is the send destination.)
MPI_Send(&value, 1, MPI_INT, rank + 1,   // send destination: the next rank
         0, MPI_COMM_WORLD);
20. Send & Receive
(Same code; the annotated argument is the receive source.)
MPI_Recv(&value, 1, MPI_INT, rank - 1,   // receive from: the previous rank
         0, MPI_COMM_WORLD, &status);
21. Send & Receive
(Same code; the 0 argument in each send and receive is the message tag.)
MPI_Send(&value, 1, MPI_INT, rank + 1,
         0, MPI_COMM_WORLD);            // 0 = message tag
MPI_Recv(&value, 1, MPI_INT, rank - 1,
         0, MPI_COMM_WORLD, &status);   // 0 = message tag
22. Exercise
Compute expensiveComputation(i) on n processors;
process 0 computes & prints sum
// MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
int main(int argc, char * argv[]) {
  int rank, size;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (rank == 0) {
    int sum = 0;
    printf("sum = %d\n", sum);
  } else {
  }
  MPI_Finalize();
  return 0;
}
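One possible solution sketch for the blanks above (not from the original deck; expensiveComputation is the hypothetical function named in the exercise, and here rank 0 also computes its own piece): every nonzero rank computes its value and sends it to process 0, which collects and sums the partial results before printing.

  if (rank == 0) {
    int sum = expensiveComputation(0);       // rank 0 contributes its own piece
    for (int i = 1; i < size; i++) {
      int value;
      // collect one partial result from each other rank, in any arrival order
      MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
               MPI_COMM_WORLD, &status);
      sum += value;
    }
    printf("sum = %d\n", sum);
  } else {
    int value = expensiveComputation(rank);  // compute my piece
    MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  }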
23. Broadcast
Send and receive: point-to-point
Can also broadcast data
Source sends to everyone else
24. Broadcast
#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
  int rank, value;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  do {
    if (rank == 0)
      scanf("%d", &value);
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d got %d\n", rank, value);
  } while (value >= 0);
  MPI_Finalize();
  return 0;
}

Repeatedly broadcast input (one integer) to all
25. Broadcast
(Same code as above; the first MPI_Bcast argument is the value to send or receive.)
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   // &value: send/receive buffer
26. Broadcast
(Same code; the second argument is how many to send/receive.)
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   // 1: the count
27. Broadcast
(Same code; the third argument is the datatype.)
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   // MPI_INT: the datatype
28. Broadcast
(Same code; the fourth argument says who is “root” for the broadcast.)
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   // 0: the root’s rank
29. Communication Flavors
Basic communication
blocking = wait until done
point-to-point = from me to you
broadcast = from me to everyone
Non-blocking
Think create & join, fork & wait…
MPI_Isend, MPI_Irecv (a minimal sketch follows this list)
MPI_Wait, MPI_Waitall, MPI_Test
Collective (e.g., MPI_Bcast, MPI_Reduce)
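A minimal non-blocking sketch (not from the deck; assumes exactly two processes, e.g. mpirun -np 2): each process posts its receive, starts its send, then waits on both requests, much like fork & wait.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char * argv[]) {
    int rank, sendbuf, recvbuf;
    MPI_Request reqs[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;    // my partner: rank 0 <-> rank 1
    sendbuf = rank;
    // both calls return immediately; the transfers proceed in the background
    MPI_Irecv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);
    // ... could overlap useful computation with communication here ...
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  // block until both complete
    printf("Process %d got %d\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
  }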
30. The End
31. Scaling Limits
Kernel used in atmospheric models
99% floating-point ops; multiplies/adds
Sweeps through memory with little reuse
One “copy” of code running independently on varying numbers of procs
[Figure: scaling results for this kernel; from Pat Worley, ORNL]