assignment_3
Ciaran Cox (1115773)
MA5605: Financial Computing 3 Assignment
0.1 Explicit Time Stepping (C File Appendix .1)
Revisiting the explicit time-stepping problem from Task 2 of Assignment 2, the code can be parallelised in multiple places. Each entry of the updated vector along x uses only 3 values from the previous time vector, so the entries are independent of one another. This for loop can therefore be computed in parallel, and the order in which the elements of the new time vector are computed does not matter.
#pragma omp parallel for num_threads(NUM_THREADS) private(i)
for(i=1;i<N;i++)
    Unew[i]=alpha*Uold[i-1]+(1-2*alpha)*Uold[i]
           +alpha*Uold[i+1]+k*Function(a+i*h,t-k);
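A minimal, self-contained sketch of this loop is given below. The name explicit_step is hypothetical, and it assumes the forcing values f(x_i, t−k) are precomputed in an array f instead of being computed by Function inside the loop; the boundary entries Unew[0] and Unew[N] are left for the caller to set. Compiled without -fopenmp, the pragma is ignored and the loop simply runs serially.

```c
/* One explicit step for the interior points i = 1..N-1:
   Unew[i] = alpha*Uold[i-1] + (1-2*alpha)*Uold[i] + alpha*Uold[i+1] + k*f[i].
   f[i] is assumed to hold the forcing term at (x_i, t-k).
   Unew[0] and Unew[N] are not touched; the caller sets them. */
void explicit_step(long N, double alpha, double k,
                   const double *Uold, const double *f, double *Unew)
{
    long i;
    /* Each Unew[i] reads only Uold, so the iterations are independent
       and can be shared among threads in any order. */
    #pragma omp parallel for
    for (i = 1; i < N; i++)
        Unew[i] = alpha * Uold[i - 1] + (1.0 - 2.0 * alpha) * Uold[i]
                + alpha * Uold[i + 1] + k * f[i];
}
```

For example, with N = 4, α = 0.25, k = 0.5, Uold = (0, 1, 2, 3, 4) and f = (0, 1, 1, 1, 0), the interior values of Unew become 1.5, 2.5 and 3.5.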
The same parallel technique can be applied to the initial condition and to copying the new vector over the old one at each step up through time. A sum reduction can be used for the error summation.
#pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:sum)
for(i=0;i<=N;i++)
    sum+=pow(check(a+i*h,T)-Uold[i],2);
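A self-contained version of this reduction is sketched below as a hypothetical function squared_error. It assumes the exact values are passed in as an array rather than computed with check, replaces pow(d, 2) by d*d, and leaves the final square root to the caller, so it needs no maths library. Compiled without -fopenmp, the pragma is ignored and the loop runs serially.

```c
/* Sum of squared differences between exact[i] and U[i] for i = 0..N.
   The error reported in the tables is the square root of this value. */
double squared_error(long N, const double *exact, const double *U)
{
    double sum = 0.0;
    long i;
    /* reduction(+:sum): each thread accumulates a private partial sum,
       and the partial sums are added together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i <= N; i++) {
        double d = exact[i] - U[i];
        sum += d * d;
    }
    return sum;
}
```

Since the partial sums are combined in an order that depends on how the loop is split among threads, the result can differ in the last few bits between runs with different thread counts.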
Each thread accumulates its own partial sum, and the reduction clause combines the partial sums at the end of the for loop. Computation times were recorded for 1 thread and 2 threads; the times tabulated are averages of 3 runs, together with the error.
Table 1: Times and errors for parallelised algorithm
M         N      Error          1 thread (s)   2 threads (s)
131072    128    8.41137e-05    1.90976        1.36499
524288    256    2.97356e-05    15.158         9.24518
2097152   512    1.05128e-05    119.0797       70.27753
8388608   1024   3.71683e-06    956.022        684.3453
As M and N increase, the error decreases steadily while the computation time grows. 2 threads are quicker than 1 thread for every problem size, by a broadly similar ratio (between about 1.4 and 1.7), because part of the previously serial code has been parallelised. The algorithm was then run on a cluster; average times over 3 runs, together with the speed-ups relative to 1 thread, are shown below.
Table 2: Times (s) and speed-ups on the cluster
M         N      1 thread    2 threads   4 threads   8 threads
131072    128    2.2872      1.82623     1.4034      1.49945
  Speed-up                   1.2524      1.6298      1.5254
524288    256    17.3948     10.9538     6.8831      7.0257
  Speed-up                   1.588       2.5272      2.4759
2097152   512    134.877     77.9357     46.2854     36.0823
  Speed-up                   1.7306      2.914       3.738
8388608   1024   1074.44     581.364     330.6903    215.02
  Speed-up                   1.8481      3.2491      4.9969
For the first two problem sizes, the speed-up is lower with 8 threads than with 4. For larger M, however, 8 threads give a further speed-up because each thread has more work to do. For the first two problem sizes the maximum speed-up is therefore reached with 4 threads: increasing the number of threads beyond this reduces the speed-up, because too little work is distributed to each thread and the threads spend relatively more time waiting on one another.
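The speed-ups in Table 2 are the 1-thread time divided by the p-thread time, S_p = T_1/T_p. The short sketch below computes this, together with the parallel efficiency E_p = S_p/p, an additional measure that is not part of the tables; both function names are hypothetical.

```c
/* Speed-up of a p-thread run relative to the 1-thread run: S_p = T_1 / T_p. */
double speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Parallel efficiency E_p = S_p / p (1.0 would mean perfect scaling). */
double efficiency(double t1, double tp, int p)
{
    return speedup(t1, tp) / p;
}
```

For example, speedup(1074.44, 215.02) gives about 4.997 for the largest problem on 8 threads (efficiency about 0.62), whereas the smallest problem on 8 threads reaches about 1.525 (efficiency about 0.19).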
0.2 Implicit Time Stepping - Serial Case (C File Appendix .2)
The initial condition gives the first time vector at the interior points x_n, 0 < n < N. Stepping forward one time level implicitly gives an (N−1)×(N−1) tri-diagonal system, whose right-hand side is the previous time vector plus k times the forcing term evaluated at the new time t_m. The left and right boundary conditions, multiplied by α, must also be added to the first and last (N−1th) entries respectively. The corresponding matrix system is shown below.
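$$
\begin{pmatrix}
1+2\alpha & -\alpha & 0 & \cdots & 0\\
-\alpha & 1+2\alpha & -\alpha & \ddots & \vdots\\
0 & \ddots & \ddots & \ddots & 0\\
\vdots & \ddots & -\alpha & 1+2\alpha & -\alpha\\
0 & \cdots & 0 & -\alpha & 1+2\alpha
\end{pmatrix}
\begin{pmatrix}
U^m_1\\ U^m_2\\ \vdots\\ U^m_{N-2}\\ U^m_{N-1}
\end{pmatrix}
=
\begin{pmatrix}
U^{m-1}_1 + k\,f(x_1,t_m) + \alpha U_L(t_m)\\
U^{m-1}_2 + k\,f(x_2,t_m)\\
\vdots\\
U^{m-1}_{N-2} + k\,f(x_{N-2},t_m)\\
U^{m-1}_{N-1} + k\,f(x_{N-1},t_m) + \alpha U_R(t_m)
\end{pmatrix}
$$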
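The system can be assembled and applied without storing the matrix, since it is fully described by the diagonal value A = 1+2α and the off-diagonal value B = −α (the same two numbers passed to cg in the appendix). The sketch below, with hypothetical function names, builds the right-hand side of the system above and multiplies a vector by the tri-diagonal matrix; the forcing values f(x_i, t_m) are assumed to be precomputed.

```c
/* Right-hand side at time level m, for n = N-1 unknowns:
   b[i] = Uprev[i] + k*fm[i], plus alpha*UL in the first entry and
   alpha*UR in the last. Uprev holds U^{m-1}_1..U^{m-1}_{N-1};
   fm holds f(x_1,t_m)..f(x_{N-1},t_m). */
void assemble_rhs(int n, double alpha, double k, double UL, double UR,
                  const double *Uprev, const double *fm, double *b)
{
    for (int i = 0; i < n; i++)
        b[i] = Uprev[i] + k * fm[i];
    b[0] += alpha * UL;
    b[n - 1] += alpha * UR;
}

/* y = T x, where T has A on the diagonal and B on both off-diagonals
   (A = 1+2*alpha, B = -alpha for the system above); requires n >= 2. */
void tridiag_matvec(int n, double A, double B, const double *x, double *y)
{
    y[0] = A * x[0] + B * x[1];
    for (int i = 1; i < n - 1; i++)
        y[i] = B * x[i - 1] + A * x[i] + B * x[i + 1];
    y[n - 1] = B * x[n - 2] + A * x[n - 1];
}
```

With α = 1, previous values U^{m−1}_i = 1, zero forcing and U_L = U_R = 1, the right-hand side for N − 1 = 3 unknowns is (2, 1, 2), and applying the matrix to (1, 1, 1) also gives (2, 1, 2), so the constant state is reproduced by one implicit step.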
The linear system is solved using the conjugate gradient (CG) algorithm with tolerance ε = 1e-10, iterating until the norm of the residual is at most ε times the norm of the right-hand side. After each step forward in time, the right-hand side is rebuilt from the previous solution, with the forcing term and boundary conditions updated for the new time. The initial guess passed to the CG algorithm is set to 0 at every step. The errors and the average number of CG iterations are shown below.
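A minimal sketch of a matrix-free CG solver for this tri-diagonal system is given below. It uses the same stopping rule as the appendix code (stop once ‖r‖ ≤ ε‖b‖), compared in squared form so that no sqrt is needed, but it is a restructured version rather than the author's code: the function names and the iteration cap are assumptions added for the sketch.

```c
#include <stdlib.h>

/* q = T p for the symmetric tri-diagonal T (A on the diagonal, B off it). */
static void tri_mv(int n, double A, double B, const double *p, double *q)
{
    q[0] = A * p[0] + B * p[1];
    for (int i = 1; i < n - 1; i++)
        q[i] = B * p[i - 1] + A * p[i] + B * p[i + 1];
    q[n - 1] = B * p[n - 2] + A * p[n - 1];
}

static double dot(int n, const double *u, const double *v)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += u[i] * v[i];
    return s;
}

/* Solves T x = b by conjugate gradients; T must be symmetric positive
   definite (true for A = 1+2*alpha, B = -alpha, alpha > 0). x holds the
   initial guess on entry and the solution on exit. Returns the number
   of iterations, or -1 if n < 2 or allocation fails. */
int cg_tridiag(double A, double B, int n, double *x, const double *b, double eps)
{
    if (n < 2) return -1;
    double *r = malloc(n * sizeof *r);
    double *p = malloc(n * sizeof *p);
    double *q = malloc(n * sizeof *q);
    if (!r || !p || !q) { free(r); free(p); free(q); return -1; }

    tri_mv(n, A, B, x, q);                       /* r = b - T x, first p = r */
    for (int i = 0; i < n; i++) {
        r[i] = b[i] - q[i];
        p[i] = r[i];
    }
    double rr = dot(n, r, r), bb = dot(n, b, b);
    const int maxit = 10 * n;                    /* safety cap (assumption) */
    int k = 0;
    while (rr > eps * eps * bb && k < maxit) {   /* ||r|| > eps*||b|| */
        k++;
        tri_mv(n, A, B, p, q);                   /* q = T p */
        double step = rr / dot(n, p, q);         /* CG step length */
        for (int i = 0; i < n; i++) {
            x[i] += step * p[i];
            r[i] -= step * q[i];
        }
        double rrnew = dot(n, r, r);
        double beta = rrnew / rr;                /* next search direction */
        for (int i = 0; i < n; i++)
            p[i] = r[i] + beta * p[i];
        rr = rrnew;
    }
    free(r); free(p); free(q);
    return k;
}
```

For example, with A = 3, B = −1 and b = (2, 1, 2), starting from x = 0, it returns x ≈ (1, 1, 1) after a few iterations.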
Table 3: Implicit time-stepping errors and average CG iterations
M         N      Error          Time (s)    Avg. CG iterations
131072    256    3.11153e-05    21.654      23.9947
524288    512    1.09378e-05    156.757     22.9984
2097152   1024   3.7316e-06     1267.71     22.99
8388608   2048   9.6833e-07     10413.3     22.9993
The error decreases steadily as N and M are increased, while the average number of CG iterations stays almost constant at about 23. The computation times are considerably longer than for the explicit method, reaching just under three hours (10413.3 s) for the final computation.
0.3 Implicit Time Stepping - OpenMP (C File Appendix .3)
The implicit algorithm was parallelised in several places, including inside the CG algorithm. The initial condition and the addition of the forcing term to each component of the right-hand side at each time step use a simple parallel for pragma. Inside the CG algorithm, the initial residual and the two squared norms (of the right-hand side and of the residual) are computed in parallel. Inside the while loop, the inner products pq and rr use a reduction clause, and the updates of p, x and r, along with the calculation of q, are also done in parallel. After the CG call, copying the new solution vector over the old one is done in parallel. Finally, reduction clauses are used for the average number of iterations and for the error summation.
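The parallel loops described above can be sketched as two small kernels: an inner product using a sum reduction (as for rr and pq) and the x and r updates. This is an illustration under assumptions rather than the appendix code: the function names are hypothetical and the two updates are fused into a single loop. The product q = Ap can be parallelised in the same way as the explicit-scheme loop, since each interior row is independent. Without -fopenmp the pragmas are ignored and the kernels run serially.

```c
/* Inner product u.v with an OpenMP sum reduction (as used for rr and pq). */
double dot_omp(int n, const double *u, const double *v)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += u[i] * v[i];
    return s;
}

/* x <- x + step*p and r <- r - step*q, fused into one parallel loop;
   every index i is updated independently of the others. */
void update_xr_omp(int n, double step, const double *p, const double *q,
                   double *x, double *r)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        x[i] += step * p[i];
        r[i] -= step * q[i];
    }
}
```

Each parallel for is a fork-join point with an implicit barrier at its end, and one CG iteration passes through several of them. For small n this fixed cost can outweigh the work given to each thread, which is one plausible contributor to the slow-downs seen for the smaller problems.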
Table 4: Times and errors for parallelised algorithm
M         N      Error          1 thread (s)   2 threads (s)
131072    256    3.11153e-05    25.2604        24.9901
524288    512    1.09378e-05    189.347        142.821
2097152   1024   3.7316e-06     1456.87        990.64
8388608   2048   9.6833e-07     11413.5        7180.39
With these changes the errors remain the same. Using 2 threads gives almost no improvement for the smallest problem (25.2604 s against 24.9901 s) but a clear improvement for the larger problems. Because of the overhead added by the parallelisation, the 1-thread run is slower than the serial program of Table 3. On the cluster, bigger problems ran faster as more threads were used (up to 4 threads), because the computation done by each thread outweighed the overhead of coordinating the threads. The table below gives the times and speed-ups for different numbers of threads on the cluster; each reading is an average of three runs.
Table 5: Times (s) and speed-ups on the cluster
M         N      1 thread    2 threads   4 threads   8 threads
131072    256    27.4566     50.8358     61.1607     89.3802
  Speed-up                   0.5401      0.4489      0.3072
524288    512    199.298     249.5213    258.126     348.0613
  Speed-up                   0.7987      0.7721      0.5726
2097152   1024   1535.14     1357.613    1251.647    1511.127
  Speed-up                   1.1308      1.2265      1.0159
8388608   2048   11542.6     8407.076    6232.403    6782.21
  Speed-up                   1.373       1.852       1.702
For the smaller problems, adding threads makes the program slower (speed-ups below 1), whereas the bigger problems do benefit. For M = 2097152, N = 1024, the speed-up increases up to 4 threads but falls back to roughly the 1-thread time with 8 threads. On the biggest problem a speed-up is achieved up to 4 threads (1.852), with a decrease when moving to 8 threads. We conclude that 8 threads are inefficient for this problem.
Going beyond OpenMP
In the CG algorithm, the two initial norms (residual squared and right-hand side squared) are independent of each other and could be computed on separate machines. Likewise, in the while loop the updates of x and r could be done on separate machines. The average number of iterations and the error summation do not depend on each other, so they too could be computed on separate machines. These changes may speed up the algorithm, but the communication time between machines then becomes an additional cost. Alternatively, the initial condition could be partitioned and intermediate boundary conditions formulated between partitions, so that each partition becomes a separate boundary value problem. Each partition could then be computed on a separate machine and the results brought back together at maturity. An explicit method would be needed to avoid solving a tri-diagonal linear system at each point in time. The boundary conditions between partitions would have to be computed first, and then each partition solved explicitly in a for loop at each point in time.
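One way to read the partitioning idea for the explicit scheme is sketched below, under stated assumptions. The interior points are split into P contiguous partitions; each partition copies its own previous-time values plus one interface value on each side into a private buffer and then advances its points using only that buffer. Because the edge points of a partition depend on the neighbours' values from the previous step, the interface values have to be refreshed at every time step. Here all partitions live in one process and the copy is a memcpy from the shared array; on separate machines it would become a message exchanged with each neighbour. The function name partitioned_step is hypothetical, and f is assumed to hold the forcing term at (x_i, t−k).

```c
#include <stdlib.h>
#include <string.h>

/* One explicit step on U[0..N] with the interior points 1..N-1 split into
   P contiguous partitions. The caller sets U[0] and U[N] to the boundary
   values at the new time afterwards. Returns 0, or -1 if malloc fails. */
int partitioned_step(long N, int P, double alpha, double k,
                     const double *f, double *U)
{
    long n = N - 1;                                /* number of interior points */
    double *Unew = malloc((N + 1) * sizeof *Unew);
    if (Unew == NULL) return -1;

    for (int part = 0; part < P; part++) {         /* partitions are independent */
        long lo = 1 + part * n / P;                /* first point owned */
        long hi = (part + 1) * n / P;              /* last point owned */
        long len = hi - lo + 1;
        if (len <= 0) continue;                    /* more partitions than points */
        double *loc = malloc((len + 2) * sizeof *loc);
        if (loc == NULL) { free(Unew); return -1; }
        /* "receive" U[lo-1] and U[hi+1] (interface values) plus own values */
        memcpy(loc, &U[lo - 1], (len + 2) * sizeof *loc);
        for (long i = 1; i <= len; i++)
            Unew[lo + i - 1] = alpha * loc[i - 1] + (1 - 2 * alpha) * loc[i]
                             + alpha * loc[i + 1] + k * f[lo + i - 1];
        free(loc);
    }
    for (long i = 1; i < N; i++)                   /* bring partitions back together */
        U[i] = Unew[i];
    free(Unew);
    return 0;
}
```

Because each partition reads only previous-time values and writes its own entries of Unew, the loop over partitions could be distributed across threads or machines while performing the same update as the single explicit loop.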
.1 Task 1
/*Ciaran Cox (1115773) 1115773@my.brunel.ac.uk*/
/*relevant libraries*/
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<omp.h>
#define PI 3.14159265358979323846264338327950288
#define NUM_THREADS 1
/*Function prototypes*/
double Function(double, double);
double leftbound(double);
double rightbound(double);
double initial(double);
double check(double, double);
/*main function*/
int main()
{
    /*Defining variables and parameters*/
    long int M,N,i,j;
    double a,b,h,T,k,alpha,*Unew,*Uold,error,sum=0,t1,t2;
    int status1, status2;
    /*Input from user for N and M*/
    printf("Enter N:\n");
    status1=scanf("%ld",&N);
    printf("Enter M:\n");
    status2=scanf("%ld",&M);
    /*checking input is valid*/
    if(status1!=1 || status2!=1)
    {
        printf("incorrect input...exiting\n");
        exit(1);
    }
    printf("M\tN\terror\t\ttime\n");
    t1=omp_get_wtime();
    /*Dynamically allocating memory*/
    if((Unew=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
    if((Uold=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
    a=0; b=1; T=2; h=(b-a)/N; k=T/M; alpha=k*(1/pow(h,2));
    /*Initial conditions*/
    #pragma omp parallel for num_threads(NUM_THREADS) private(i)
    for(i=0;i<=N;i++)
    {
        Uold[i]=initial(a+i*h);
    }
    /*Iterating up through time*/
    for(j=1;j<=M;j++)
    {
        double t = j*k;
        Unew[0]=leftbound(t);
        Unew[N]=rightbound(t);
        /*solving explicitly*/
        #pragma omp parallel for num_threads(NUM_THREADS) private(i)
        for(i=1;i<N;i++)
        {
            Unew[i]=alpha*Uold[i-1]+(1-2*alpha)*Uold[i]
                   +alpha*Uold[i+1]+k*Function(a+i*h,t-k);
        }
        /*replacing old vector with new vector*/
        #pragma omp parallel for num_threads(NUM_THREADS) private(i)
        for(i=0;i<=N;i++)
        {
            Uold[i]=Unew[i];
        }
    }
    /*Computing error*/
    #pragma omp parallel for num_threads(NUM_THREADS) private(i) reduction(+:sum)
    for(i=0;i<=N;i++)
    {
        sum+=pow(check(a+i*h,T)-Uold[i],2);
    }
    error=sqrt(sum);
    /*freeing memory*/
    free(Unew); free(Uold);
    t2=omp_get_wtime();
    /*prints results for required input from user (M and N are long, hence %ld)*/
    printf("%ld\t%ld\t%lg\t%lg\n\n\n",M,N,error,t2-t1);
    return 0;
}
/*Function for the forcing term*/
double Function(double x, double t)
{
    double out;
    out=exp(-2*t)*cos(2*PI*x)*(-PI*cos(PI*t)
        +(4*pow(PI,2)-2)*(1-sin(PI*t)));
    return out;
}
/*Function for the left boundary condition*/
double leftbound(double t)
{
    double out;
    out=check(0.0,t);
    return out;
}
/*Function for the right boundary condition*/
double rightbound(double t)
{
    double out;
    out=check(1.0,t);
    return out;
}
/*Function for the initial condition*/
double initial(double x)
{
    double out;
    out=check(x,0.0);
    return out;
}
/*Function for the exact solution*/
double check(double x, double t)
{
    double out;
    out=(1-sin(PI*t))*exp(-2*t)*cos(2*PI*x);
    return out;
}
.2 Task 2
/*Ciaran Cox (1115773) 1115773@my.brunel.ac.uk*/
/*relevant libraries*/
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<omp.h>
#define PI (4.0*atan(1.0))
/*Function prototypes*/
double Function(double, double);
double leftbound(double);
double rightbound(double);
double initial(double);
double check(double, double);
int cg(double, double, int, double*, double*, double);
/*main function*/
int main()
{
    /*Defining variables and parameters*/
    long int M,N,i,j;
    double a,b,h,T,k,alpha,error,sum=0.0,t1,t2,*xx,
           *it=NULL,*bb0=NULL,*sol=NULL,itav=0.0,eps;
    int status1, status2;
    /*Input from user for N and M*/
    printf("Enter N:\n");
    status1=scanf("%ld",&N);
    printf("Enter M:\n");
    status2=scanf("%ld",&M);
    /*checking input is valid*/
    if(status1!=1 || status2!=1)
    {
        printf("incorrect input...exiting\n");
        exit(1);
    }
    printf("M\tN\terror\t\ttime\tCG-iterations\n");
    t1=omp_get_wtime();
    /*Dynamically allocating memory*/
    if((xx=(double*)malloc((N-1)*sizeof(double)))==NULL) exit(1);
    if((bb0=(double*)malloc((N-1)*sizeof(double)))==NULL) exit(1);
    if((it=(double*)malloc(M*sizeof(double)))==NULL) exit(1);
    if((sol=(double*)malloc((N+1)*sizeof(double)))==NULL) exit(1);
    a=0.0; b=1.0; T=2.0; h=(b-a)/N; k=T/M; alpha=k/h/h; eps=1e-10;
    /*initial conditions*/
    for(i=1;i<=N-1;i++)
    {
        bb0[i-1]=initial(a+i*h);
    }
    /*Iterating up through time*/
    for(j=1;j<=M;j++)
    {
        /*setting guess to zero*/
        for(i=0;i<N-1;i++)
        {
            xx[i]=0;
        }
        /*adjusting for forcing term*/
        for(i=1;i<=N-1;i++)
        {
            bb0[i-1]+=k*Function(a+i*h,j*k);
        }
        /*incorporating boundary conditions*/
        bb0[0]+=leftbound(j*k)*alpha;
        bb0[N-2]+=rightbound(j*k)*alpha;
        /*CG-Algorithm, solving linear system*/
        it[j-1]=cg(1+2*alpha,-alpha,N-1,xx,bb0,eps);
        /*replacing old vector with new solution vector*/
        for(i=0;i<N-1;i++)
        {
            bb0[i]=xx[i];
        }
    }
    /*creating solution vector for error computation*/
    for(i=1;i<=N-1;i++)
    {
        sol[i]=bb0[i-1];
    }
    sol[0]=leftbound(T); sol[N]=rightbound(T);
    /*average of CG iterations*/
    for(j=0;j<M;j++)
    {
        itav+=it[j];
    }
    itav=itav/M;
    /*Computing error*/
    for(i=0;i<=N;i++)
    {
        sum+=pow(check(a+i*h,T)-sol[i],2);
    }
    error=sqrt(sum);
    /*freeing memory*/
    free(xx); free(bb0); free(it); free(sol);
    t2=omp_get_wtime();
    /*prints results for required input from user (M and N are long, hence %ld)*/
    printf("%ld\t%ld\t%lg\t%lg\t%lg\n\n\n",M,N,error,t2-t1,itav);
    return 0;
}
/*Function for the forcing term*/
double Function(double x, double t)
{
    double out;
    out=exp(-2*t)*cos(2*PI*x)*(-PI*cos(PI*t)
        +(4*pow(PI,2)-2)*(1-sin(PI*t)));
    return out;
}
/*Function for the left boundary condition*/
double leftbound(double t)
{
    double out;
    out=check(0.0,t);
    return out;
}
/*Function for the right boundary condition*/
double rightbound(double t)
{
    double out;
    out=check(1.0,t);
    return out;
}
/*Function for the initial condition*/
double initial(double x)
{
    double out;
    out=check(x,0.0);
    return out;
}
/*Function for the exact solution*/
double check(double x, double t)
{
    double out;
    out=(1-sin(PI*t))*exp(-2*t)*cos(2*PI*x);
    return out;
}
/*Function for CG-Algorithm*/
int cg(double A,double B,int n,double *x,double *b,double eps)
{
    int i,j,k;
    double rr,pq,bb,alpha,beta,rrold;
    double *r=NULL,*p=NULL,*q=NULL;
    k=0;
    if (n<=2) { return 0; }
    if( (r=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
    if( (p=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
    if( (q=(double*)malloc(n*sizeof(double)))==NULL) exit(1);
    r[0]=b[0]-(A*x[0]+B*x[0+1]);
    for (i=1;i<n-1;i++)
    {
        r[i]=b[i]-(B*x[i-1]+A*x[i]+B*x[i+1]);
    }
    r[n-1]=b[n-1]-(B*x[n-2]+A*x[n-1]);
    bb=0;
    for (i=0;i<n;i++)
    {
        bb+=b[i]*b[i];
    }
    rr=0;
    for (i=0;i<n;i++)
    {
        rr+=r[i]*r[i];
    }
    k=0;
    while (sqrt(rr)>eps*sqrt(bb))
    {
        k=k+1;
        if (k==1)
        {
            for (i=0;i<n;i++)
            {
                p[i]=r[i];
            }
            beta=0;
        }
        else
        {
            beta=rr/rrold;
            for (i=0;i<n;i++)
            {
                p[i]=r[i]+beta*p[i];
            }
        }
        q[0]=A*p[0]+B*p[0+1];
        for (i=1;i<n-1;i++)
        {
            q[i]=B*p[i-1]+A*p[i]+B*p[i+1];
        }
        q[n-1]=B*p[n-2]+A*p[n-1];
        pq=0;
        for (i=0;i<n;i++)
        {
            pq+=p[i]*q[i];
        }
        alpha=rr/pq;