Ciaran Cox (1115773)
MA5605: Financial Computing 3 Assignment
0.1 Explicit Time Stepping (C File Appendix .1)
Revisiting the explicit time stepping problem from task 2 of assignment 2, this code can be parallelised in multiple places. Each entry of the updated vector along x uses 3 values from the previous vector back in time, so the entries of the new time vector are independent of one another. Therefore this for loop can be computed in parallel, and it does not matter in what order the elements of the new time vector are placed.
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=1;i<N;i++)
Unew[i]=alpha*Uold[i-1]+(1-2*alpha)*Uold[i]
+alpha*Uold[i+1]+k*Function(a+i*h,t-k);
The same parallel technique can be used on the initial condition and on the copying of the new x vector over the old one when iterating up through time. A sum reduction can be done on the error summation.
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:sum)
for(i=0;i<=N;i++)
sum+=pow(check(a+i*h,T)-Uold[i],2);
Each thread computes its own part of the summation and the reduction clause then brings the partial sums back together at the end of the for loop. The computation times were taken for 1 thread and 2 threads; the times in Table 1 are an average of 3 runs, shown alongside the convergence of the error. As the grid is refined the error converges, at the cost of an increasing computation time.
Table 1: Times and errors for parallelised algorithm
M N error 1 thread (s) 2 threads (s)
131072 128 8.41137e-05 1.90976 1.36499
524288 256 2.97356e-05 15.158 9.24518
2097152 512 1.05128e-05 119.0797 70.27753
8388608 1024 3.71683e-06 956.022 684.3453
2 threads is quicker than 1 thread for every problem size, and by a similar ratio in each case, due to the portion of the code that has been parallelised. Note that in every row α = k/h² = 2N²/M = 1/4, so the explicit scheme always satisfies its stability restriction α ≤ 1/2, which is why M is increased by a factor of four whenever N is doubled. The algorithm was then run on a cluster; Table 2 gives the average of 3 runs for each thread count together with the corresponding speed-ups.
Table 2: Times and speed ups on the cluster
M N 1 thread (s) 2 threads (s) 4 threads (s) 8 threads (s)
131072 128 2.2872 1.82623 1.4034 1.49945
Speed up 1.2524 1.6298 1.5254
524288 256 17.3948 10.9538 6.8831 7.0257
Speed up 1.588 2.5272 2.4759
2097152 512 134.877 77.9357 46.2854 36.0823
Speed up 1.7306 2.914 3.738
8388608 1024 1074.44 581.364 330.6903 215.02
Speed up 1.8481 3.2491 4.9969
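Here the speed-up on p threads is the 1-thread time divided by the p-thread time, S_p = T_1/T_p; for example, the first row gives 2.2872/1.82623 ≈ 1.2524 on 2 threads.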
From 4 threads to 8 threads on the first two rows the speed-up drops. However, for larger values of M a further speed-up is observed because each thread has more work to do. For the first two rows the maximum speed-up is reached at 4 threads; increasing the thread count beyond this decreases the speed-up, because not enough work is distributed to each individual thread and the threads end up waiting on one another.
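As a rough sanity check, assuming Amdahl's law applies (a model not used in the assignment itself): with speed-up S(p) = 1/((1 − f) + f/p) and the measured S(2) = 1.8481 on the largest problem, the parallel fraction is f ≈ 0.92, which predicts S(8) ≈ 5.1, close to the measured 4.9969.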
0.2 Implicit Time Stepping - Serial Case (C File Appendix .2)
The initial condition gives the first time vector for 0 < n < N. Moving up one more time level implicitly gives an (N−1)×(N−1) tri-diagonal system, with the forcing term added to each of the values in the previous time vector; the left and right boundary conditions also need to be added to the first and last entries of the right-hand side respectively. Writing α = k/h², the implicit (backward Euler) update at each interior point is

\[
-\alpha U^m_{n-1} + (1+2\alpha)U^m_n - \alpha U^m_{n+1} = U^{m-1}_n + k\, f(x_n,t_m), \qquad n = 1,\dots,N-1,
\]

which gives the corresponding matrix system shown below.

\[
\begin{pmatrix}
1+2\alpha & -\alpha & 0 & \cdots & 0 \\
-\alpha & 1+2\alpha & -\alpha & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & -\alpha & 1+2\alpha & -\alpha \\
0 & \cdots & 0 & -\alpha & 1+2\alpha
\end{pmatrix}
\begin{pmatrix}
U^m_1 \\ U^m_2 \\ \vdots \\ U^m_{N-2} \\ U^m_{N-1}
\end{pmatrix}
=
\begin{pmatrix}
U^{m-1}_1 + k\, f(x_1,t_m) + \alpha U_L(t_m) \\
U^{m-1}_2 + k\, f(x_2,t_m) \\
\vdots \\
U^{m-1}_{N-2} + k\, f(x_{N-2},t_m) \\
U^{m-1}_{N-1} + k\, f(x_{N-1},t_m) + \alpha U_R(t_m)
\end{pmatrix}
\]
Solving the linear system is done using the CG algorithm with ε = 1e−10. After each step forward through time the right-hand side vector is rebuilt from the previous solution, with the forcing term and boundary conditions updated. The initial guess for the system is set to 0 and passed into the CG algorithm. The average number of CG iterations, together with the errors, is shown below.
Table 3: Implicit Time stepping errors and average CG iterations
M N Error Time (s) Avg. CG iterations
131072 256 3.11153e-05 21.654 23.9947
524288 512 1.09378e-05 156.757 22.9984
2097152 1024 3.7316e-06 1267.71 22.99
8388608 2048 9.6833e-07 10413.3 22.9993
This shows strong convergence to the solution as N and M are increased, while the number of CG iterations remains roughly constant on average. The computation time, however, grows faster than for the previous explicit method, reaching just under three hours for the final computation.
0.3 Implicit Time Stepping - OpenMP (C File Appendix .3)
The previous implicit algorithm was parallelised in multiple places, including inside the CG algorithm. The initial condition, and the addition of the forcing term to each component of the right-hand side of the system at each time step, are done with a simple for pragma. Inside the CG algorithm, the initial computation of the residual and of the two norms (r.h.s. squared and residual squared) is done in parallel. Within the while loop, the pq and rr computations use the reduction pragma, and the updates of p, x and r are done in parallel along with the calculation of q. Once out of the function, the replacement of the old vector with the new solution vector is done in parallel, concluding with a reduction pragma for the average of the iterations and the error summation.
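For example, the inner product of p and q inside the while loop of the CG algorithm is computed with a reduction; this fragment is taken from appendix .3:

pq=0;
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:pq)
for (i=0;i<n;i++)
{
pq+=p[i]*q[i];
}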
Table 4: Times and errors for parallelised algorithm
M N error 1 thread (s) 2 threads (s)
131072 256 3.11153e-05 25.2604 24.9901
524288 512 1.09378e-05 189.347 142.821
2097152 1024 3.7316e-06 1456.87 990.64
8388608 2048 9.6833e-07 11413.5 7180.39
With these changes the errors remain the same, but compared with the serial version the computation time increases, more drastically for the smaller problems than for the bigger ones. With the extra overhead of the parallel directives, the computation time for 1 thread is longer than running the program in serial.
On the cluster, bigger problems ran quicker as more threads were used, because more time was spent on processing than on communication between threads. Tabulated below are the times and speed-ups for different numbers of threads run on the cluster, with each reading an average of three runs.
Table 5: Times and speed ups on the cluster
M N 1 thread (s) 2 threads (s) 4 threads (s) 8 threads (s)
131072 256 27.4566 50.8358 61.1607 89.3802
Speed up 0.5401 0.4489 0.3072
524288 512 199.298 249.5213 258.126 348.0613
Speed up 0.7987 0.7721 0.5726
2097152 1024 1535.14 1357.613 1251.647 1511.127
Speed up 1.1308 1.2265 1.0159
8388608 2048 11542.6 8407.076 6232.403 6782.21
Speed up 1.373 1.852 1.702
Speed-up clearly decreases for the smaller problems, whereas it increases for the bigger ones. For M = 2097152, N = 1024 a speed-up is achieved up to 4 threads, but at 8 threads the time is back to roughly the original 1-thread level. On the biggest problem speed-up is successfully achieved up to 4 threads, with a decrease when moving to 8 threads. In conclusion, 8 threads is inefficient for this problem.
Going beyond OpenMP
In the CG algorithm the two initial norms, the residual squared and the r.h.s. squared, are independent of each other and could be computed on separate machines. Also, inside the while loop the updates of x and r can be done on separate machines, and the average number of iterations and the error summation do not rely on each other, so they too can be computed on separate machines. These few changes may speed up the algorithm, but communication time between machines then becomes another factor. Alternatively, by partitioning the initial condition and formulating intermediate boundary conditions between the partitions, each partition becomes a separate boundary value problem. Each partition can then be computed on a separate machine and brought back together again at maturity. The explicit method would need to be used to avoid solving a tri-diagonal linear system of equations at each point in time: the partition boundary conditions would be computed first, and then each partition solved explicitly in a for loop at each point in time, as sketched below.
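The following is a minimal sketch of such a partitioned explicit step. It is an illustration only, not part of the submitted code: it assumes MPI is available, that the helper functions Function, leftbound and rightbound from appendix .1 are linked in, and that the N−1 interior points divide evenly over the processes; the function name explicit_step_mpi and the ghost-cell layout are hypothetical.

/*Sketch: one explicit time step with the interior points split over MPI ranks.
Each rank owns nloc interior points stored in uold[1..nloc]; uold[0] and
uold[nloc+1] are ghost cells holding the neighbouring values.*/
#include<mpi.h>
double Function(double, double); /*forcing term, appendix .1*/
double leftbound(double);        /*left boundary condition, appendix .1*/
double rightbound(double);       /*right boundary condition, appendix .1*/
void explicit_step_mpi(double *uold, double *unew, long nloc,
                       double a, double h, double k, double alpha,
                       double t, int rank, int size)
{
    int left  = (rank>0)      ? rank-1 : MPI_PROC_NULL;
    int right = (rank<size-1) ? rank+1 : MPI_PROC_NULL;
    long offset = (long)rank*nloc; /*global index of the point before uold[1]*/
    long i;
    /*exchange ghost cells with the neighbouring partitions*/
    MPI_Sendrecv(&uold[1],1,MPI_DOUBLE,left,0,
                 &uold[nloc+1],1,MPI_DOUBLE,right,0,
                 MPI_COMM_WORLD,MPI_STATUS_IGNORE);
    MPI_Sendrecv(&uold[nloc],1,MPI_DOUBLE,right,1,
                 &uold[0],1,MPI_DOUBLE,left,1,
                 MPI_COMM_WORLD,MPI_STATUS_IGNORE);
    /*physical boundaries come from the analytic boundary conditions*/
    if(rank==0)      uold[0]=leftbound(t-k);
    if(rank==size-1) uold[nloc+1]=rightbound(t-k);
    /*same explicit update as the serial code, on the local partition only*/
    for(i=1;i<=nloc;i++)
        unew[i]=alpha*uold[i-1]+(1-2*alpha)*uold[i]
               +alpha*uold[i+1]+k*Function(a+(offset+i)*h,t-k);
}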
.1 Task 1
/*Ciaran Cox (1115773) 1115773@my.brunel.ac.uk*/
/*relevant libraries*/
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<omp.h>
#define PI 3.14159265358979323846264338327950288
#define NUM_THREADS 1
/*Function prototypes*/
double Function(double, double);
double leftbound(double);
double rightbound(double);
double initial(double);
double check(double, double);
/*main function*/
int main()
{
/*Defining variables and parameters*/
long int M,N,i,j;
double a,b,h,T,k,alpha,*Unew,*Uold,error,sum=0,t1,t2;
int status1, status2;
/*Input from user for N and M*/
printf("Enter N:n");
status1=scanf("%ld",&N);
printf("Enter M:n");
status2=scanf("%ld",&M);
/*checking input is valid*/
if(status1!=1 || status2!=1)
{
printf("incorrect input...exitingn");
exit(1);
}
printf("MtNterrortttimen");
t1=omp_get_wtime();
/*Dynamically allocating memory*/
if((Unew=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
if((Uold=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
a=0; b=1; T=2; h=(b-a)/N; k=T/M; alpha=k*(1/pow(h,2));
/*Initial conditions*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=0;i<=N;i++)
{
Uold[i]=initial(a+i*h);
}
/*Iterating up through time*/
for(j=1;j<=M;j++)
{
double t = j*k;
Unew[0]=leftbound(t);
Unew[N]=rightbound(t);
/*solving explicitly*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=1;i<N;i++)
{
Unew[i]=alpha*Uold[i-1]+(1-2*alpha)*Uold[i]
+alpha*Uold[i+1]+k*Function(a+i*h,t-k);
}
/*replacing old vector with new vector*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=0;i<=N;i++)
{
Uold[i]=Unew[i];
}
}
/*Computing error*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:sum)
for(i=0;i<=N;i++)
{
sum+=pow(check(a+i*h,T)-Uold[i],2);
}
error=sqrt(sum);
/*freeing memory*/
free(Unew); free(Uold);
t2=omp_get_wtime();
/*prints results for required input from user*/
printf("%dt%dt%1gt%lgnnn",M,N,error,t2-t1);
}
/*Function for the forcing term*/
double Function(double x, double t)
{
double out;
out=exp(-2*t)*cos(2*PI*x)*(-PI*cos(PI*t)
+(4*pow(PI,2)-2)*(1-sin(PI*t)));
return out;
}
/*Function for the left boundary condition*/
double leftbound(double t)
{
double out;
out=check(0.0,t);
return out;
}
/*Function for the right boundary condition*/
double rightbound(double t)
{
double out;
out=check(1.0,t);
return out;
}
/*Function for the initial condition*/
double initial(double x)
{
double out;
out=check(x,0.0);
return out;
}
/*Function for the exact solution*/
double check(double x, double t)
{
double out;
out=(1-sin(PI*t))*exp(-2*t)*cos(2*PI*x);
return out;
}
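Each of these listings times itself with omp_get_wtime, so OpenMP support must be enabled when compiling. Assuming gcc, something along the lines of the following works, where task1.c is just a placeholder name for whichever listing is being built:

gcc -O2 -fopenmp task1.c -o task1 -lm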
.2 Task 2
/*Ciaran Cox (1115773) 1115773@my.brunel.ac.uk*/
/*relevant libraries*/
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<omp.h>
#define PI (4.0*atan(1.0))
/*Function prototypes*/
double Function(double, double);
double leftbound(double);
double rightbound(double);
double initial(double);
double check(double, double);
int cg(double, double, int, double*, double*, double);
/*main function*/
int main()
{
/*Defining variables and parameters*/
long int M,N,i,j;
double a,b,h,T,k,alpha,error,sum=0.0,t1,t2,*xx,
*it=NULL,*bb0=NULL,*sol=NULL,itav=0.0,eps;
int status1, status2;
/*Input from user for N and M*/
printf("Enter N:n");
status1=scanf("%ld",&N);
printf("Enter M:n");
status2=scanf("%ld",&M);
/*checking input is valid*/
if(status1!=1 || status2!=1)
{
printf("incorrect input...exitingn");
exit(1);
}
printf("MtNterrortttimetCG-iterationsn");
t1=omp_get_wtime();
/*Dynamically allocating memory*/
if((xx=(double*)malloc((N-1)*sizeof(double)))==NULL) exit(1);
if((bb0=(double*)malloc((N-1)*sizeof(double)))==NULL) exit(1);
if((it=(double*)malloc(M*sizeof(double)))==NULL) exit(1);
if((sol=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
a=0.0; b=1.0; T=2.0; h=(b-a)/N; k=T/M; alpha=k/h/h; eps=1e-10;
/*initial conditions*/
for(i=1;i<=N-1;i++)
{
bb0[i-1]=initial(a+i*h);
}
/*Iterating up through time*/
for(j=1;j<=M;j++)
{
/*setting guess to zero*/
for(i=0;i<N-1;i++)
{
xx[i]=0;
}
/*adjusting for forcing term*/
for(i=1;i<=N-1;i++)
{
bb0[i-1]+=k*Function(a+i*h,j*k);
}
/*incorporating boundary conditions*/
bb0[0]+=leftbound(j*k)*alpha;
bb0[N-2]+=rightbound(j*k)*alpha;
/*CG-Algorithm, solving linear system*/
it[j-1]=cg(1+2*alpha,-alpha,N-1,xx,bb0,eps);
/*replacing old vector with new solution vector*/
for(i=0;i<N-1;i++)
{
bb0[i]=xx[i];
}
}
/*creating solution vector for error computation*/
for(i=1;i<=N-1;i++)
{
sol[i]=bb0[i-1];
}
sol[0]=leftbound(T); sol[N]=rightbound(T);
/*average of CG iterations*/
for(j=0;j<M;j++)
{
itav+=it[j];
}
itav=itav/M;
/*Computing error*/
for(i=0;i<=N;i++)
{
sum+=pow(check(a+i*h,T)-sol[i],2);
}
error=sqrt(sum);
/*freeing memory*/
free(xx); free(bb0); free(it); free(sol);
t2=omp_get_wtime();
/*prints results for required input from user*/
printf("%dt%dt%1gt%lgt%lgnnn",M,N,error,t2-t1,itav);
}
/*Function for the forcing term*/
double Function(double x, double t)
{
double out;
out=exp(-2*t)*cos(2*PI*x)*(-PI*cos(PI*t)
+(4*pow(PI,2)-2)*(1-sin(PI*t)));
return out;
}
/*Function for the left boundary condition*/
double leftbound(double t)
{
double out;
out=check(0.0,t);
return out;
}
/*Function for the right boundary condition*/
double rightbound(double t)
{
double out;
out=check(1.0,t);
return out;
}
/*Function for the initial condition*/
double initial(double x)
{
double out;
out=check(x,0.0);
return out;
}
/*Function for the exact solution*/
double check(double x, double t)
{
double out;
out=(1-sin(PI*t))*exp(-2*t)*cos(2*PI*x);
return out;
}
/*Function for CG-Algorithm*/
int cg(double A,double B,int n,double *x,double *b,double eps)
{
int i,j,k;
double rr,pq,bb,alpha,beta,rrold;
double *r=NULL,*p=NULL,*q=NULL;
k=0;
if (n<=2) { return 0; }
if( (r=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
if( (p=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
if( (q=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
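/*initial residual r = b - Ax for the tri-diagonal matrix (A on the diagonal, B on the off-diagonals)*/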
r[0]=b[0]-(A*x[0]+B*x[0+1]);
for (i=1;i<n-1;i++)
{
r[i]=b[i]-(B*x[i-1]+A*x[i]+B*x[i+1]);
}
r[n-1]=b[n-1]-(B*x[n-2]+A*x[n-1]);
bb=0;
for (i=0;i<n;i++)
{
bb+=b[i]*b[i];
}
rr=0;
for (i=0;i<n;i++)
{
rr+=r[i]*r[i];
}
k=0;
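/*iterate until the residual norm falls below eps times the norm of b*/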
while (sqrt(rr)>eps*sqrt(bb))
{
k=k+1;
if (k==1)
{
for (i=0;i<n;i++)
{
p[i]=r[i];
}
beta=0;
}
else
{
beta=rr/rrold;
for (i=0;i<n;i++)
{
p[i]=r[i]+beta*p[i];
}
}
q[0]=A*p[0]+B*p[0+1];
for (i=1;i<n-1;i++)
{
q[i]=B*p[i-1]+A*p[i]+B*p[i+1];
}
q[n-1]=B*p[n-2]+A*p[n-1];
pq=0;
for (i=0;i<n;i++)
{
pq+=p[i]*q[i];
}
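/*step length alpha = (r.r)/(p.Ap), then update solution x and residual r*/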
alpha=rr/pq;
for (i=0;i<n;i++)
{
x[i]=x[i]+alpha*p[i];
r[i]=r[i]-alpha*q[i];
}
rrold=rr;
rr=0;
for (i=0;i<n;i++)
{
rr+=r[i]*r[i];
}
}
free(p); free(q); free(r);
return k;
}
.3 Task 3
/*Ciaran Cox (1115773) 1115773@my.brunel.ac.uk*/
/*relevant libraries*/
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<omp.h>
#define PI (4.0*atan(1.0))
#define NUM_THREADS 2
/*Function prototypes*/
double Function(double, double);
double leftbound(double);
double rightbound(double);
double initial(double);
double check(double, double);
int cg(double, double, int, double*, double*, double);
/*main function*/
int main()
{
/*Defining variables and parameters*/
long int M,N,i,j;
double a,b,h,T,k,alpha,error,sum=0.0,t1,t2,*xx,
*it=NULL,*bb0=NULL,*sol=NULL,itav=0.0,eps;
/*Input from user for N and M*/
printf("Enter N:n"); scanf("%ld",&N);
if(N%1!=0)
{
print("Input not integern");
exit(1);
}
printf("Enter M:n"); scanf("%ld",&M);
printf("MtNterrortttimetcg-iterationsn");
t1=omp_get_wtime();
/*Dynamically allocating memory*/
if((xx=(double*)malloc((N-1)*sizeof(double)))==NULL) exit(1);
if((bb0=(double*)malloc((N-1)*sizeof(double)))==NULL) exit(1);
if((it=(double*)malloc(M*sizeof(double)))==NULL) exit(1);
if((sol=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
a=0.0; b=1.0; T=2.0; h=(b-a)/N; k=T/M; alpha=k/h/h; eps=1e-10;
/*Initial conditions*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=1;i<=N-1;i++)
{
bb0[i-1]=initial(a+i*h);
}
/*Iterating up through time*/
for(j=1;j<=M;j++)
{
/*setting guess to zero*/
for(i=0;i<N-1;i++)
{
xx[i]=0;
}
/*Adjusting for forcing term*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=1;i<=N-1;i++)
{
bb0[i-1]+=k*Function(a+i*h,j*k);
}
/*boundary conditions*/
bb0[0]+=leftbound(j*k)*alpha;
bb0[N-2]+=rightbound(j*k)*alpha;
/*solving system of equations by CG-algorithm*/
it[j-1]=cg(1+2*alpha,-alpha,N-1,xx,bb0,eps);
/*replacing old vector with new solution vector*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=0;i<N-1;i++)
{
bb0[i]=xx[i];
}
}
/*setting up solution vector for error computation*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=1;i<=N-1;i++)
{
sol[i]=bb0[i-1];
}
sol[0]=leftbound(T); sol[N]=rightbound(T);
/*average of CG-algorithm iterations*/
#pragma omp parallel for num_threads(NUM_THREADS) private(j) reduction(+:itav)
for(j=0;j<M;j++)
{
itav+=it[j];
}
itav=itav/M;
/*Computing error*/
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:sum)
for(i=0;i<=N;i++)
{
sum+=pow(check(a+i*h,T)-sol[i],2);
}
error=sqrt(sum);
/*freeing memory*/
free(xx); free(bb0); free(it); free(sol);
t2=omp_get_wtime();
/*printing results for required input from user*/
printf("%dt%dt%1gt%lgt%lgnnn",M,N,error,t2-t1,itav);
}
/*Function for the forcing term*/
double Function(double x, double t)
{
double out;
out=exp(-2*t)*cos(2*PI*x)*(-PI*cos(PI*t)
+(4*pow(PI,2)-2)*(1-sin(PI*t)));
return out;
}
/*Function for the left boundary condition*/
double leftbound(double t)
{
double out;
out=check(0.0,t);
return out;
}
/*Function for the right boundary condition*/
double rightbound(double t)
{
double out;
out=check(1.0,t);
return out;
}
/*Function for the initial condition*/
double initial(double x)
{
double out;
out=check(x,0.0);
return out;
}
/*Function for the exact solution*/
double check(double x, double t)
{
double out;
out=(1-sin(PI*t))*exp(-2*t)*cos(2*PI*x);
return out;
}
/*Function for CG-Algorithm*/
int cg(double A,double B,int n,double *x,double *b,double eps)
{
int i,j,k;
double rr,pq,bb,alpha,beta,rrold;
double *r=NULL,*p=NULL,*q=NULL;
k=0;
if (n<=2) { return 0; }
if( (r=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
if( (p=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
if( (q=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
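/*initial residual r = b - Ax; the interior entries are computed in parallel below*/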
r[0]=b[0]-(A*x[0]+B*x[0+1]);
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for (i=1;i<n-1;i++)
{
r[i]=b[i]-(B*x[i-1]+A*x[i]+B*x[i+1]);
}
r[n-1]=b[n-1]-(B*x[n-2]+A*x[n-1]);
bb=0;
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:bb)
for (i=0;i<n;i++)
{
bb+=b[i]*b[i];
}
rr=0;
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:rr)
for (i=0;i<n;i++)
{
rr+=r[i]*r[i];
}
k=0;
while (sqrt(rr)>eps*sqrt(bb))
{
k=k+1;
if (k==1)
{
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for (i=0;i<n;i++)
{
p[i]=r[i];
}
beta=0;
}
else
{
beta=rr/rrold;
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for (i=0;i<n;i++)
{
p[i]=r[i]+beta*p[i];
}
}
q[0]=A*p[0]+B*p[0+1];
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for (i=1;i<n-1;i++)
{
q[i]=B*p[i-1]+A*p[i]+B*p[i+1];
}
q[n-1]=B*p[n-2]+A*p[n-1];
pq=0;
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:pq)
for (i=0;i<n;i++)
{
pq+=p[i]*q[i];
}
alpha=rr/pq;
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for (i=0;i<n;i++)
{
x[i]=x[i]+alpha*p[i];
r[i]=r[i]-alpha*q[i];
}
rrold=rr;
rr=0;
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:rr)
for (i=0;i<n;i++)
{
rr+=r[i]*r[i];
}
}
free(p); free(q); free(r);
return k;
}
