Paper No. 1068, Proc. 8th East Asia-Pacific Conference on Structural Engineering and Construction (EASEC-8),
Singapore, December 5-7, 2001.




A PARALLEL IMPLEMENTATION OF THE ELEMENT-FREE GALERKIN METHOD

W. Barry¹ and T. Vacharasintopchai²


ABSTRACT : This work focuses on the application of parallel processing to element-free Galerkin method
analyses, particularly in the formulation of the stiffness matrix, the assembly of the system of discrete equations,
and the solution for nodal unknowns. The objective is to significantly reduce the analysis time while retaining
high efficiency and accuracy. Several relatively low-cost Intel Pentium-based personal computers are joined
together to form a parallel computer. The processors communicate via a local high-speed network using the
Message Passing Interface. Load balancing is achieved through the use of a dynamic queue server that assigns
tasks to available processors. Benchmark problems in 3D structural mechanics are analyzed to demonstrate that
the parallelized computer program can provide substantially shorter run time than its serial counterpart, without
loss of solution accuracy.

KEYWORDS : meshless method, parallel processing, element-free Galerkin method, EFGM, queue server,
Beowulf, solid mechanics


1. INTRODUCTION

In performing the finite element analysis of structural components, meshing, which is the process of
discretizing the problem domain into small sub-regions or elements with specific nodal connectivities,
can be a tedious and time-consuming task. Although some relatively simple geometric configurations
may be meshed automatically, some complex geometric configurations require manual preparation of
the mesh. The element-free Galerkin method (EFGM), one of the recently developed meshless
methods, avoids the need for meshing by employing a moving least-squares (MLS) approximation for
the field quantities of interest. With EFGM, the discrete model of the problem domain is completely
described by nodes and a description of the problem domain boundary. This is a particular advantage
for problems involving propagating cracks or large deformations since no remeshing is required at
each step of the analysis. Detailed formulations of the MLS approximation functions and the
application of EFGM to problems in solid mechanics may be found in [1].
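To make the MLS idea concrete, the Python sketch below evaluates one-dimensional MLS shape functions with a linear basis p(x) = [1, x] and a cubic spline weight function of the kind often used in EFGM analyses (see [1] for the general formulation). It is an illustration under simplifying assumptions, not the authors' code: the problem is 1D rather than 3D, every node uses the same support size d_max, and the function names are hypothetical. The shape function of node I is phi_I(x) = p(x)^T A(x)^-1 w_I(x) p(x_I), where A(x) = sum_I w_I(x) p(x_I) p(x_I)^T is a moment matrix that must be formed and inverted at every evaluation point.

```python
def cubic_spline_weight(r):
    """Cubic spline weight of the normalized distance r = |x - x_I| / d_max."""
    r = abs(r)
    if r <= 0.5:
        return 2.0 / 3.0 - 4.0 * r**2 + 4.0 * r**3
    if r <= 1.0:
        return 4.0 / 3.0 - 4.0 * r + 4.0 * r**2 - (4.0 / 3.0) * r**3
    return 0.0  # node I has no influence outside its support


def mls_shape_functions(x, nodes, d_max):
    """Return phi_I(x) for every node, using the linear basis p = [1, x]."""
    w = [cubic_spline_weight((x - xi) / d_max) for xi in nodes]

    # Moment matrix A(x) = sum_I w_I * p(x_I) p(x_I)^T (symmetric 2x2).
    a11 = sum(w)
    a12 = sum(wi * xi for wi, xi in zip(w, nodes))
    a22 = sum(wi * xi * xi for wi, xi in zip(w, nodes))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("singular moment matrix: too few nodes inside the support")

    # c = A^{-1} p(x), using the closed-form inverse of a 2x2 matrix.
    c1 = (a22 - a12 * x) / det
    c2 = (a11 * x - a12) / det

    # phi_I(x) = w_I(x) * p(x_I)^T c
    return [wi * (c1 + c2 * xi) for wi, xi in zip(w, nodes)]
```

For example, `mls_shape_functions(0.4, [0.0, 0.25, 0.5, 0.75, 1.0], 0.6)` returns shape function values that sum to one and reproduce x = 0.4 exactly (up to round-off), which is the consistency property a linear basis guarantees. Repeating this least-squares fit at every integration point is what makes EFGM more expensive than FEM.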

However, the advantage of avoiding the requirement of a mesh does not come cheaply, as EFGM is
much more computationally expensive than the finite element method (FEM). The increased
computational cost is especially evident for three-dimensional and non-linear applications of the
EFGM, due to the usage of MLS shape functions, which are formulated by a least-squares procedure
at each integration point. This computational costliness is the predominant drawback of EFGM.

Parallel processing has long been used to improve the performance of scientific computing programs.
Typically, a parallel computer program employs the ‘divide and conquer’ paradigm [2], in which a large
task is partitioned into several smaller tasks that are then assigned to the available processors.
Efficient load balancing ensures that all processors remain busy as long as unfinished tasks remain.
The most common approach taken in computational mechanics is domain decomposition [3], a static
load-balancing method in which the tasks are identified before the analysis and assigned to the
processors, together with any data they require. Because of the complex nodal connectivities that arise
in the EFGM, domain decomposition may not be the most efficient approach; therefore, a dynamic
load-balancing scheme based on the concept of a queue server is employed in this work.

¹ Asian Institute of Technology, Thailand, Assistant Professor
² Asian Institute of Technology, Thailand, Graduate Student

2. THE AIT BEOWULF

The effort to deliver low-cost, high-performance computing platforms to scientific communities has
been ongoing for many years. A network of personal computers is attractive for this purpose because it
has the same architecture as a distributed-memory multicomputer system [4]. Many research groups
have connected commodity off-the-shelf PCs with fast LAN connections to build parallel computers.
Parallel computers of this type are termed Beowulf computers after the NASA project of the same name
[5]. Because of the high communication start-up time and the limited bandwidth of the underlying
network architectures [6], they are best suited to coarse-grained applications that are not
communication intensive.
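The effect of start-up time can be quantified with the linear message cost model used in parallel computing texts such as [10], in which sending a message of m words takes t_s + t_w * m, where t_s is the start-up time and t_w the per-word transfer time. The Python sketch below uses hypothetical parameter values, not measurements of any real cluster, to compare sending the same amount of data as many small messages or as a few large ones.

```python
def message_time(m_words, t_s, t_w):
    """Time to send one message of m_words words under the model t_s + t_w * m."""
    return t_s + t_w * m_words


def total_comm_time(total_words, n_messages, t_s, t_w):
    """Time to send total_words split evenly into n_messages messages (no overlap)."""
    return n_messages * message_time(total_words / n_messages, t_s, t_w)


# Hypothetical network parameters: 100 microseconds start-up, 1 microsecond per word.
T_S, T_W = 1.0e-4, 1.0e-6

fine = total_comm_time(10_000, 1_000, T_S, T_W)  # fine-grained: 1,000 messages of 10 words
coarse = total_comm_time(10_000, 10, T_S, T_W)   # coarse-grained: 10 messages of 1,000 words
```

Under these assumed values the fine-grained pattern spends 0.11 s communicating against 0.011 s for the coarse-grained one, almost all of the difference being start-up cost; this is the sense in which Beowulf clusters favour coarse-grained applications.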

The AIT Beowulf, a four-node Beowulf-class parallel computer, was assembled following the
guidelines in [5] and [7]. Red Hat Linux 6.0, including both the server and workstation operating
system packages, was installed on each node. The AIT Beowulf has a distributed-memory
multiple-instruction, multiple-data (MIMD) architecture, so a message-passing infrastructure is needed.
The mpich library [8], which is the most widely used free implementation of the Message Passing
Interface (MPI), was chosen for the AIT Beowulf. Meschach, a C library for matrix computations [9], is
employed for the serial matrix operations performed on each processor.

3. THE QUEUE SERVER

Load balancing plays a crucial role in the performance of parallel software. If unbalanced workloads
are assigned to the processors, some processors finish their work early and must wait for the others,
which reduces efficiency and increases run-time. In this work, a dynamic load-balancing agent named
Qserv is developed within the framework of the EFGM. Qserv balances the computational load among
the processors of the AIT Beowulf at run-time by acting as a clerk that directs queued tasks to available
processors. When a processor finishes a task, it requests another task from Qserv, which continues
assigning tasks to processors until no unfinished tasks remain.
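The benefit of assigning tasks at run-time can be illustrated with a small Python simulation. It assumes idealized conditions (hypothetical task costs, no communication or queue-server overhead) and is not a model of the AIT Beowulf: `static_makespan` splits the task list into contiguous blocks before the run, as a simple static decomposition would, whereas `dynamic_makespan` gives the next task to whichever processor becomes idle first, which is how a queue server behaves.

```python
import heapq


def static_makespan(costs, nprocs):
    """Finish time when tasks are split into contiguous blocks before the run."""
    n = len(costs)
    loads = []
    for p in range(nprocs):
        start, end = p * n // nprocs, (p + 1) * n // nprocs
        loads.append(sum(costs[start:end]))
    return max(loads)


def dynamic_makespan(costs, nprocs):
    """Finish time when each idle processor requests the next task from a queue."""
    # Heap of (time at which the processor becomes idle, processor id).
    idle = [(0.0, p) for p in range(nprocs)]
    heapq.heapify(idle)
    for cost in costs:  # tasks are served in identifier order 0, 1, 2, ...
        t, p = heapq.heappop(idle)          # earliest idle processor asks for work
        heapq.heappush(idle, (t + cost, p))  # it is busy until t + cost
    return max(t for t, _ in idle)
```

With `costs = [8, 8, 8, 8, 1, 1, 1, 1]` on two processors, the static split gives one processor 32 units of work and the other 4, so `static_makespan` returns 32, while `dynamic_makespan` returns 18, the ideal value for 36 units of work on two processors. Greedy dynamic assignment is not optimal for every cost distribution, but it adapts to cost variations that are not known before the analysis.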

Figure 1 presents a flowchart of the queue server designed and implemented in this work. To separate
the dynamic workload allocation from normal operations, Qserv communicates with the processors
through UNIX sockets, a concept developed at the University of California at Berkeley [4]. When the
Qserv process starts, it creates a socket to which the processors can connect simultaneously. Initially,
the total number of unprocessed subtasks known to Qserv is zero, and one processor, usually the
master processor, must inform Qserv of the actual value. This number is stored in the max_num
variable and can be changed by the processors through the SET_MAX_NUM request. A processor asks
Qserv for a subtask to work on through the GET_NUM request and is assigned the numerical identifier
of an unprocessed subtask, ranging from zero to max_num. When the unprocessed subtasks are
exhausted, an ALL_DONE message is sent to the requesting processor. During the execution of Qserv,
a process can also reset the subtask identifier counter through the RESET_COUNTER request. Qserv
continues serving tasks to processors until it receives the TERMINATE request.
[Figure 1: Flowchart of the Queue Server. After initializing the socket and setting runstate = READY,
count = 0 and max_num = 0, the server repeats the following until runstate = TERMINATE, after which
it closes the socket: accept a client connection request and update max_fd; for each client fd <= max_fd,
receive request_msg (on a receive error, close that connection and update max_fd) and process it.
TERMINATE sets runstate = TERMINATE; RESET_COUNTER sets count = 0; SET_MAX_NUM reads the new
max_num from the client; GET_NUM sends 'count' to the client and sets count = count + 1 if
count <= max_num, and otherwise sends ALL_DONE. The server then moves to the next client.
Variables: fd = current client identifier; max_fd = number of client connections maintained;
runstate = run state of the server program; count = current counter value; max_num = maximum
counter value; request_msg = current client's request message.]
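The request handling in Figure 1 can be sketched as a small state machine. The Python code below is a hypothetical illustration of the counter logic only: the socket layer (creating the socket, accepting connections, and tracking fd and max_fd) is omitted, and the class and function names are not taken from the authors' implementation. As in the flowchart, GET_NUM is answered with count while count <= max_num, so identifiers 0 to max_num inclusive are served before ALL_DONE is returned.

```python
ALL_DONE = "ALL_DONE"


class QservState:
    """Counter logic of a queue server modelled on Figure 1 (sockets omitted)."""

    def __init__(self):
        self.runstate = "READY"
        self.count = 0    # next subtask identifier to hand out
        self.max_num = 0  # largest subtask identifier to hand out

    def handle(self, request_msg, value=None):
        """Process one request; return the reply sent to the client, if any."""
        if request_msg == "TERMINATE":
            self.runstate = "TERMINATE"
        elif request_msg == "RESET_COUNTER":
            self.count = 0
        elif request_msg == "SET_MAX_NUM":
            self.max_num = value
        elif request_msg == "GET_NUM":
            if self.count <= self.max_num:
                reply = self.count
                self.count += 1
                return reply
            return ALL_DONE
        else:
            raise ValueError(f"unknown request: {request_msg!r}")
        return None


def worker_loop(server, process_subtask):
    """Hypothetical worker: keep requesting subtasks until the server answers ALL_DONE."""
    while True:
        reply = server.handle("GET_NUM")
        if reply == ALL_DONE:
            break
        process_subtask(reply)
```

After `handle("SET_MAX_NUM", 3)`, a worker running `worker_loop` processes subtasks 0, 1, 2 and 3 and then stops. Because every GET_NUM advances the single shared counter, workers whose requests are handled one at a time, as in the flowchart's client loop, can never receive the same identifier.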

4. SOFTWARE IMPLEMENTATION

When a parallel program is run, each processor runs one copy of the executable program, termed a
process. One process is designated the master process, and the remaining processes are worker
processes. The master process has the MPI process identifier 0 by default. In addition to performing
the basic tasks of a worker process, the master process coordinates the tasks among all the workers.
Therefore, the master process is assigned to the server node, which is the most powerful node of the
AIT Beowulf in terms of both processor speed and main memory.

A flowchart of the main program for both the master node and the worker nodes is presented in
Figure 2. The analysis procedure can be grouped into five phases, namely, the pre-processing phase,
the stiffness matrix formulation phase, the force vector formulation phase, the solution phase, and the
post-processing phase. A custom-made parallel Gaussian elimination equation solver, based on the
algorithm presented in [10], is employed in the solution phase, since the available public-domain
parallel equation solvers are typically efficient only for banded, sparse matrices, which does not match
the dense structure of the EFGM global stiffness matrix.
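The solution phase can be illustrated with a sequential Python simulation of parallel Gaussian elimination using a row-cyclic mapping, in which row i is owned by process i mod p. This mapping is one of several discussed in texts such as [10]; which variant the authors' solver uses is not stated, so the mapping here is an assumption, and the function name is hypothetical. At step k the owner of row k broadcasts it, and every process eliminates column k from the rows it owns below row k. No pivoting is performed, which is acceptable for a symmetric positive-definite stiffness matrix once the boundary conditions have been applied.

```python
def solve_row_cyclic(a, b, nprocs):
    """Gaussian elimination on rows distributed cyclically over nprocs processes.

    The processes are simulated one after another; local[p] maps each global
    row index owned by process p to [row coefficients, right-hand side].
    """
    n = len(a)
    local = [
        {i: [list(a[i]), float(b[i])] for i in range(p, n, nprocs)}
        for p in range(nprocs)
    ]

    # Forward elimination.
    for k in range(n):
        owner = k % nprocs
        pivot_row, pivot_rhs = local[owner][k]
        pivot_row = list(pivot_row)  # the copy every process receives
        if pivot_row[k] == 0.0:
            raise ZeroDivisionError(f"zero pivot in row {k}")
        for p in range(nprocs):  # with MPI, each process would do this concurrently
            for i, entry in local[p].items():
                if i <= k:
                    continue  # only rows below the pivot are updated
                row = entry[0]
                factor = row[k] / pivot_row[k]
                for j in range(k, n):
                    row[j] -= factor * pivot_row[j]
                entry[1] -= factor * pivot_rhs

    # Back substitution: the owner of row k computes x[k] and broadcasts it.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        row, rhs = local[k % nprocs][k]
        s = sum(row[j] * x[j] for j in range(k + 1, n))
        x[k] = (rhs - s) / row[k]
    return x
```

Because the active part of the matrix shrinks as k increases, the cyclic mapping keeps the remaining rows spread evenly over the processes, whereas a contiguous block mapping would leave the owners of the first rows idle late in the elimination.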

[Figure 2: Flowcharts of the Master and Worker Modules. Master process: dd_input (process the input
file); broadcast the processed input data; connect to the queue server; ddefg_stiff (form the stiffness
matrix); form the concentrated load vector; ddforce (form the distributed load vector); assemble the
global force vectors; master_ddsolve (apply B.C.'s, then solve equations); write nodal displacements to
the output file; ddpost (post-process for desired displacements and stresses); write the post-processed
results to the output file; disconnect from the queue server. Worker processes: receive the processed
input data; connect to the queue server; ddefg_stiff, ddforce and ddpost, whose results are gathered by
the master; worker_ddsolve (solve equations), which collaborates with master_ddsolve; disconnect from
the queue server.]
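The "gather" step of the stiffness matrix formulation in Figure 2 can be illustrated with the Python sketch below. It assumes, hypothetically, that each subtask identifier handed out by Qserv corresponds to a group of integration points whose contributions a process adds into its own full-size copy of the global stiffness matrix (consistent with the full storage on each processor noted in Section 5), and that the master then sums these partial matrices. With MPI, such a summation would typically use `MPI_Reduce` with the `MPI_SUM` operation; the paper does not say which call its code uses. The spring-like cell contribution is a toy stand-in for the real EFGM integrand.

```python
def zeros(n):
    """Full n x n matrix of zeros (dense storage, as on each processor)."""
    return [[0.0] * n for _ in range(n)]


def toy_cell_contribution(cell):
    """Hypothetical subtask: a 2x2 spring-like block coupling dofs cell and cell + 1."""
    k = float(cell + 1)
    i, j = cell, cell + 1
    return [(i, i, k), (i, j, -k), (j, i, -k), (j, j, k)]


def assemble_partial(ndof, cell_ids, contribution):
    """Work done by one process: add the contributions of its assigned subtasks."""
    k_local = zeros(ndof)
    for cell in cell_ids:
        for i, j, value in contribution(cell):
            k_local[i][j] += value
    return k_local


def gather_sum(partials):
    """Work done by the master: sum the partial matrices from all processes."""
    ndof = len(partials[0])
    k_global = zeros(ndof)
    for k_local in partials:
        for i in range(ndof):
            for j in range(ndof):
                k_global[i][j] += k_local[i][j]
    return k_global
```

Whatever assignment of subtasks Qserv produces at run-time, the summed result equals a serial assembly over all subtasks, because matrix addition does not depend on which process computed each contribution. The price of this simplicity is that every process holds ndof x ndof entries.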

5. NUMERICAL RESULTS

Several 3D elastostatic examples are solved to illustrate the performance and to verify the validity of
the parallel EFGM analysis code. The results obtained for each analysis closely matched the analytical
solutions [11], as in previous serial EFGM work [1]. Thus, the main focus of these numerical examples
is the run-time and efficiency of the parallel implementation of the EFGM. Four test cases, with
increasing numbers of degrees of freedom, are analyzed using one to four processors: 1) a linear
displacement patch test (336 d.o.f.); 2) a cantilever beam with end loading (825 d.o.f.); 3) pure
bending of a thick arch (975 d.o.f.); and 4) a perforated tension strip (2850 d.o.f.). The speedup of the
overall solution process, of the computation and assembly of the global stiffness matrix, and of the
solution of the discrete system of equations is shown in Figures 3 to 5, respectively. Figure 4 shows
that, when the number of degrees of freedom is less than 1,000, the speedup of the stiffness matrix
formulation phase gradually approaches its theoretical limit, which equals the number of processors
used in the analysis. However, the speedup begins to decrease when the number of degrees of freedom
exceeds 1,000, apparently because memory page-file swapping begins on each processor. This may
occur because the current implementation stores the full global stiffness matrix on each processor.
Figure 5 shows that the optimal points, in terms of speedup, for the parallel Gaussian elimination solver
are near 350, 550, and 600 equations for two, three, and four processors, respectively. When the
number of equations is greater than 1,000, the speedup of the solver begins to decrease, possibly for
the same reason as in the stiffness matrix formulation phase, namely the onset of page-file swapping.
Hence, the current implementation appears to be scalable up to approximately 1,000 degrees of
freedom.

[Figure 3: Overall Speedup of the EFGM Analysis Code. Plot of overall speedup vs. degrees of freedom,
curves NP1 to NP4.]

[Figure 4: Speedup of the Stiffness Computation Module. Plot of stiffness speedup vs. degrees of
freedom, curves NP1 to NP4.]

[Figure 5: Speedup of the Gaussian Elimination Solver. Plot of solver speedup vs. degrees of freedom,
curves NP1 to NP4.]
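The quantities plotted in Figures 3 to 5 and the memory limitation discussed above can be checked with a few lines of arithmetic. The sketch below uses the usual definitions of speedup, S_p = T_1 / T_p, and efficiency, E_p = S_p / p, where T_1 and T_p are the run times on one and on p processors. It also estimates the storage of a full n x n stiffness matrix, assuming 8-byte double-precision entries; the precision is an assumption, since the paper does not state it.

```python
def speedup(t_1, t_p):
    """Speedup S_p = T_1 / T_p of a run on p processors."""
    return t_1 / t_p


def efficiency(t_1, t_p, p):
    """Parallel efficiency E_p = S_p / p; 1.0 means ideal (linear) speedup."""
    return speedup(t_1, t_p) / p


def dense_matrix_bytes(ndof, bytes_per_entry=8):
    """Memory for a full ndof x ndof matrix, assuming 8-byte entries by default."""
    return ndof * ndof * bytes_per_entry
```

For the largest test case (2850 d.o.f.), `dense_matrix_bytes(2850)` gives 64,980,000 bytes, about 62 MiB, which is already close to the 64 megabytes available on each worker node before any other arrays are counted; for 1,000 d.o.f. the matrix alone needs 8,000,000 bytes. This is consistent with, though not proof of, the page-file swapping explanation given above.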

6. CONCLUSION

The AIT Beowulf, a high-performance yet low-cost parallel computer built from a network of commodity
personal computers, was assembled, and a parallel implementation of the element-free Galerkin
method was developed on this platform. Four desirable properties of parallel software, namely
concurrency, scalability, locality, and modularity, were taken into account in the design of the parallel
version of the element-free Galerkin method. A dynamic load-balancing algorithm was used for the
computation of the structural stiffness matrix and the external force vector, and a parallel Gaussian
elimination algorithm was employed to solve for the nodal unknowns (displacements). Several
numerical examples showed that the displacements and stresses obtained from the parallel
implementation closely matched the analytical solutions and exactly matched the solutions obtained
with the sequential element-free Galerkin method software. With Qserv, a dynamic load-balancing
algorithm, high scalability was obtained for three-dimensional structural mechanics problems of up to
approximately 1,000 degrees of freedom. However, scalability was not achieved for larger problems,
because the full stiffness matrix had to be stored on each processor while only 64 megabytes of
memory were available on each worker node. The parallel Gaussian elimination equation solver took
less time to solve the system of equations than its sequential counterpart. For larger systems of
equations, the efficiency of the parallel equation solver tended to increase because of the increased
computation-to-communication ratio. Nevertheless, in the current implementation of the parallel EFGM
analysis code, high efficiency was not obtained when the number of equations exceeded 1,000.
Refinement of the memory management algorithms is recommended so that the parallel EFGM
analysis code can scale to problems much larger than 1,000 degrees of freedom.

7. REFERENCES

[1]  T. Belytschko, Y. Krongauz, D. Organ, M. Fleming, and P. Krysl, “Meshless methods: An
     overview and recent developments”, Computer Methods in Applied Mechanics and
     Engineering, Vol. 139, No. 1-4, pp. 3-47, 1996.
[2]  H. Adeli and O. Kamal, Parallel Processing in Structural Engineering, Elsevier Science
     Publishers Ltd., U.K., 1993.
[3]  K. T. Danielson, S. Hao, W. K. Liu, A. Uras, and S. Li, “Parallel computation of meshless
     methods for explicit dynamic analysis”, accepted for publication in International Journal for
     Numerical Methods in Engineering, 1999.
[4]  C. Brown, UNIX Distributed Programming, Prentice Hall International (UK) Limited, UK, 1994.
[5]  P. Merkey, “Beowulf: Introduction & overview”, Center of Excellence in Space Data and
     Information Sciences, Universities Space Research Association, Goddard Space Flight Center,
     Maryland, USA, September 1998, URL: http://www.beowulf.org/intro.html.
[6]  M. Baker and R. Buyya, “Cluster computing: The commodity supercomputer”, Software: Practice
     and Experience, Vol. 29, No. 6, pp. 551-576, 1999.
[7]  J. Radajewski and D. Eadline, “Beowulf HOWTO”, November 1998,
     URL: http://www.linux.org/help/ldp/howto/Beowulf-HOWTO.html.
[8]  W. Gropp and E. Lusk, User's Guide for mpich, a Portable Implementation of MPI, Technical
     Report ANL-96/6, Argonne National Laboratory, USA, 1996.
[9]  D. E. Stewart and Z. Leyk, Meschach: Matrix Computations in C, Proceedings of the Center for
     Mathematics and Its Applications, Vol. 32, Australian National University, 1994.
[10] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and
     Analysis of Algorithms, The Benjamin/Cummings Publishing Company, Inc., USA, 1994.
[11] S. P. Timoshenko and J. N. Goodier, Theory of Elasticity, 3rd ed., McGraw-Hill, 1970.

 
Semantic Web Services for Computational Mechanics : A Literature Survey and R...
Semantic Web Services for Computational Mechanics : A Literature Survey and R...Semantic Web Services for Computational Mechanics : A Literature Survey and R...
Semantic Web Services for Computational Mechanics : A Literature Survey and R...
 

Kürzlich hochgeladen

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 

Kürzlich hochgeladen (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

A Parallel Implementation of the Element-Free Galerkin Method

Paper No. 1068, Proc. 8th East Asia-Pacific Conference on Structural Engineering and Construction (EASEC-8), Singapore, December 5-7, 2001.

A PARALLEL IMPLEMENTATION OF THE ELEMENT-FREE GALERKIN METHOD

W. Barry¹ and T. Vacharasintopchai²

ABSTRACT: This work focuses on the application of parallel processing to element-free Galerkin method analyses, particularly to the formulation of the stiffness matrix, the assembly of the system of discrete equations, and the solution for the nodal unknowns. The objective is to significantly reduce the analysis time while retaining high efficiency and accuracy. Several relatively low-cost Intel Pentium-based personal computers are joined together to form a parallel computer. The processors communicate over a local high-speed network using the Message Passing Interface. Load balancing is achieved through a dynamic queue server that assigns tasks to available processors. Benchmark problems in 3D structural mechanics are analyzed to demonstrate that the parallelized computer program can provide substantially shorter run times than its serial counterpart without loss of solution accuracy.

KEYWORDS: meshless method, parallel processing, element-free Galerkin method, EFGM, queue server, Beowulf, solid mechanics

1. INTRODUCTION

In the finite element analysis of structural components, meshing, which is the process of discretizing the problem domain into small sub-regions or elements with specific nodal connectivities, can be a tedious and time-consuming task. Although relatively simple geometries may be meshed automatically, some complex geometries require manual preparation of the mesh. The element-free Galerkin method (EFGM), one of the recently developed meshless methods, avoids the need for meshing by employing a moving least-squares (MLS) approximation for the field quantities of interest.
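The MLS approximation mentioned above can be illustrated with a minimal one-dimensional sketch. It assumes a linear basis p(x) = [1, x] and the cubic spline weight function often used in EFG work; the basis, weight function, support size, and function names are illustrative choices, not taken from this paper. At every evaluation point a small moment matrix A(x) (2×2 here, 4×4 for a linear basis in 3D) has to be assembled and inverted, which is the per-integration-point least-squares work that makes EFGM more expensive than FEM.

```python
from typing import List


def cubic_spline_weight(r: float) -> float:
    """Cubic spline weight, r = |x - x_I| / d_I; zero outside the support."""
    r = abs(r)
    if r <= 0.5:
        return 2.0 / 3.0 - 4.0 * r**2 + 4.0 * r**3
    if r <= 1.0:
        return 4.0 / 3.0 - 4.0 * r + 4.0 * r**2 - (4.0 / 3.0) * r**3
    return 0.0


def mls_shape_functions(x: float, nodes: List[float], support: float) -> List[float]:
    """Return phi_I(x) for every node using the linear basis p(x) = [1, x].

    Assumption: every node shares the same support radius `support`.
    """
    weights = [cubic_spline_weight((x - xi) / support) for xi in nodes]
    # Moment matrix A(x) = sum_I w_I p(x_I) p(x_I)^T (symmetric 2x2).
    a11 = sum(weights)
    a12 = sum(w * xi for w, xi in zip(weights, nodes))
    a22 = sum(w * xi * xi for w, xi in zip(weights, nodes))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-14:
        raise ValueError("moment matrix is singular: too few nodes cover x")
    # c = A^{-1} p(x); then phi_I(x) = c . (w_I p(x_I)).
    c1 = (a22 - a12 * x) / det
    c2 = (a11 * x - a12) / det
    return [w * (c1 + c2 * xi) for w, xi in zip(weights, nodes)]


if __name__ == "__main__":
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    phi = mls_shape_functions(0.3, grid, support=0.6)
    print(phi)
    print(sum(phi))                                      # partition of unity
    print(sum(p * xi for p, xi in zip(phi, grid)))       # reproduces x = 0.3
```

Because the shape functions reproduce constants and linear fields exactly, they pass a linear patch test; the price is that the small solve above is repeated at every quadrature point, and in 3D each point typically sees many more nodes.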
With EFGM, the discrete model of the problem domain is completely described by nodes and a description of the problem-domain boundary. This is a particular advantage for problems involving propagating cracks or large deformations, since no remeshing is required at each step of the analysis. Detailed formulations of the MLS approximation functions and the application of EFGM to problems in solid mechanics may be found in [1].

However, avoiding the mesh does not come cheaply: EFGM is much more computationally expensive than the finite element method (FEM). The increased computational cost is especially evident in three-dimensional and non-linear applications of EFGM, because the MLS shape functions are formulated by a least-squares procedure at each integration point. This computational cost is the predominant drawback of EFGM.

Parallel processing has long been available as a technique for improving the performance of scientific computing programs. Typically, a parallel computer program employs the ‘divide and conquer’ paradigm [2], which involves partitioning a large task into several smaller tasks that are then assigned to the available processors. Efficient load balancing ensures that all processors are kept busy with assigned tasks as long as unfinished tasks remain. The most common approach in computational mechanics is domain decomposition [3], a static load-balancing method in which the tasks are identified prior to the analysis and assigned to each processor, along with any data that may be required. Because of the complex nodal connectivities that arise in EFGM, domain decomposition may not be the most efficient approach; a dynamic class of load balancing based on the concept of a queue server is therefore employed in this work.

¹ Asian Institute of Technology, Thailand, Assistant Professor
² Asian Institute of Technology, Thailand, Graduate Student

2. THE AIT BEOWULF

The effort to deliver low-cost, high-performance computing platforms to scientific communities has been ongoing for many years. A network of personal computers is attractive for this purpose since it has the same architecture as a distributed-memory multicomputer system [4]. Many research groups have assembled commodity off-the-shelf PCs and fast LAN connections to build parallel computers. Parallel computers of this type, termed Beowulf computers after the NASA project of the same name [5], are suited to coarse-grained applications that are not communication-intensive, because of the high communication start-up time and the limited bandwidth of the underlying network architectures [6].

The AIT Beowulf, a four-node Beowulf-class parallel computer, was assembled following the guidelines in [5] and [7]. Red Hat Linux 6.0, including both the server and workstation operating system packages, was installed on each node. The AIT Beowulf is a message-passing multiple-instruction, multiple-data (MIMD) architecture, and thus a message-passing infrastructure is needed.
The mpich library [8], the most widely used free implementation of the Message Passing Interface, was chosen for the AIT Beowulf. Meschach, a matrix computation library for C [9], is employed for the serial matrix operations performed on each processor.

3. THE QUEUE SERVER

Load balancing plays a crucial role in the performance of parallel software. If unbalanced workloads are assigned to the processors, some may finish their work and be forced to wait for the others, which reduces efficiency and increases run time. In this work, a dynamic load-balancing agent named Qserv is developed within the framework of EFGM. Qserv balances the computational load among the processors of the AIT Beowulf at run time by acting as a clerk that directs queued tasks to available processors. When a processor finishes a task, it requests another one from Qserv, which continues assigning tasks to processors until no unfinished tasks remain. Figure 1 presents a flowchart of the queue server designed and implemented in this work.

To separate the dynamic workload allocation from normal operations, the communication between Qserv and the processors is done through the UNIX socket mechanism developed at the University of California at Berkeley [4]. When the Qserv process is initiated, it creates a socket to which the processors can connect simultaneously. Initially, the total number of unprocessed subtasks known to Qserv is zero, and one processor, usually the master, must inform Qserv of the actual value. This number is stored in the max_num variable and can be altered by processors through the SET_MAX_NUM request. A processor asks Qserv for a subtask to work on through the GET_NUM request and is assigned the numerical identifier of an unprocessed subtask, ranging from zero to max_num. When the unprocessed subtasks are exhausted, an ALL_DONE signal is sent to the requesting processor. During the execution of Qserv, a process can also reset the subtask identifier counter with the RESET_COUNTER request. Qserv continues serving tasks to processors until the TERMINATE signal is received.
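The request protocol described above can be sketched as a small state machine. This is a minimal in-process model, not the Qserv code: in place of the real UNIX-socket server, client threads call a hypothetical `QueueServerState.handle` method under a lock, and the return values are assumptions. The text says identifiers range from zero to max_num; this sketch assumes max_num is the number of subtasks and therefore hands out 0 through max_num − 1.

```python
import threading
from typing import List, Optional, Union

ALL_DONE = "ALL_DONE"


class QueueServerState:
    """Hypothetical in-process model of Qserv's counter logic.

    Request names follow the paper; the method name, return values and the
    lock standing in for socket multiplexing are assumptions of this sketch.
    """

    def __init__(self) -> None:
        self.count = 0                 # next unprocessed subtask identifier
        self.max_num = 0               # subtask count; zero until a client sets it
        self.running = True            # cleared by TERMINATE
        self._lock = threading.Lock()  # serve one request at a time

    def handle(self, request: str, value: Optional[int] = None) -> Union[int, str, None]:
        with self._lock:
            if request == "SET_MAX_NUM":
                if value is None or value < 0:
                    raise ValueError("SET_MAX_NUM needs a non-negative subtask count")
                self.max_num = value
                return None
            if request == "GET_NUM":
                # Assumption: identifiers run from 0 to max_num - 1.
                if self.count < self.max_num:
                    task_id = self.count
                    self.count += 1
                    return task_id
                return ALL_DONE
            if request == "RESET_COUNTER":
                self.count = 0
                return None
            if request == "TERMINATE":
                self.running = False
                return None
            raise ValueError(f"unknown request: {request!r}")


def worker(server: QueueServerState, finished: List[int]) -> None:
    """Request subtasks until ALL_DONE; a faster worker simply asks more often."""
    while True:
        task = server.handle("GET_NUM")
        if task == ALL_DONE:
            return
        finished.append(task)  # a real worker would compute this subtask here


def run_phase(server: QueueServerState, num_tasks: int, num_workers: int) -> List[List[int]]:
    """One analysis phase: reset the counter, publish the count, let workers drain it."""
    server.handle("RESET_COUNTER")
    server.handle("SET_MAX_NUM", num_tasks)  # usually sent by the master process
    per_worker: List[List[int]] = [[] for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(server, per_worker[i]))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return per_worker


if __name__ == "__main__":
    qserv = QueueServerState()
    stiffness = run_phase(qserv, num_tasks=20, num_workers=4)
    force = run_phase(qserv, num_tasks=8, num_workers=4)  # counter reused after reset
    qserv.handle("TERMINATE")
    print([len(ids) for ids in stiffness], [len(ids) for ids in force])
```

Since a worker asks for the next identifier only when it has finished its previous one, uneven subtask costs are absorbed automatically: no worker is bound to a fixed partition, which is the property the paper relies on instead of static domain decomposition.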
Figure 1: Flowchart of the Queue Server (variables: fd = current client identifier; max_fd = number of client connections maintained; runstate = run state of the server program; count = current counter value; max_num = maximum counter value; request_msg = current client's request message)

4. SOFTWARE IMPLEMENTATION

When a parallel program is run, each processor holds one copy of the executable program, termed a process. One process is designated the master process, while the remaining processes are worker processes. The MPI default process identifier of the master is 0. In addition to performing the basic tasks of a worker process, the master process coordinates the tasks among all the workers. The master process is therefore assigned to run on the server node, which is the most powerful machine in the AIT Beowulf in terms of both processor speed and core memory.

A flowchart of the main program for the master and worker nodes is presented in Figure 2. The analysis procedure can be grouped into five phases: the pre-processing phase, the stiffness matrix formulation phase, the force vector formulation phase, the solution phase, and the post-processing phase. A custom parallel Gaussian elimination equation solver, based on the algorithm presented in [10], is employed in the solution phase, since the available public-domain parallel equation solvers are typically efficient only for banded, sparse matrices, whereas the EFGM global stiffness matrix is dense.

Figure 2: Flowcharts of the Master and Worker Modules
Master process: dd_input (process the input file); broadcast the processed input data; connect to the queue server; ddefg_stiff (form the stiffness matrix; results gathered from the workers); form the concentrated load vector; ddforce (form the distributed load vector; gathered); assemble the global force vectors; master_ddsolve (apply B.C.'s, then solve the equations in collaboration with the workers); write nodal displacements to the output file; ddpost (post-process for the desired displacements and stresses; gathered); write the post-processed results to the output file; disconnect from the queue server.
Worker processes: receive the processed input data; connect to the queue server; ddefg_stiff; ddforce; worker_ddsolve (solve the equations in collaboration with the master); ddpost; disconnect from the queue server.

5. NUMERICAL RESULTS

Several 3D elastostatic examples are solved to illustrate the performance and to verify the validity of the parallel EFGM analysis code. The results obtained for each analysis closely matched the analytical solutions [11], as in previous serial EFGM work [1].
Thus, the main focus of these numerical examples is to investigate the run time and efficiency of the parallel implementation of EFGM. Four test cases with increasing numbers of degrees of freedom are analyzed using processor counts ranging from one to four: 1) a linear displacement patch test (336 d.o.f.); 2) a cantilever beam with end loading (825 d.o.f.); 3) pure bending of a thick arch (975 d.o.f.); and 4) a perforated tension strip (2850 d.o.f.). The speedups of the overall solution process, of the computation and assembly of the global stiffness matrix, and of the solution of the discrete system of equations are shown in Figures 3 to 5, respectively.

Figure 3: Overall Speedup of the EFGM Analysis Code (overall speedup vs. degrees of freedom; curves NP1 to NP4)

Figure 4 shows that, when the number of degrees of freedom is less than 1,000, the speedup of the stiffness matrix formulation phase gradually approaches its theoretical limit, which equals the number of processors used in the analysis. However, the speedup begins to decrease when the number of degrees of freedom exceeds 1,000, apparently because memory page-file swapping begins on each processor. This may occur because the current implementation stores the full global stiffness matrix on each processor. Figure 5 shows that the optimal points, in terms of speedup, for the parallel Gaussian elimination solver are near 350, 550, and 600 equations for two, three, and four processors, respectively. When the number of equations exceeds 1,000, the speedup of the solver begins to decrease, possibly for the same reason as in the stiffness matrix formulation phase, namely the onset of page-file swapping. Hence, the current implementation can be considered scalable up to about 1,000 degrees of freedom.
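The discussion above rests on two quantities: parallel speedup and the memory needed to store the global stiffness matrix in full. A minimal sketch of both follows, assuming the usual definitions S_p = T_1 / T_p and E_p = S_p / p (the paper does not spell them out) and 8-byte double-precision matrix entries (the storage precision is not stated). The timings in the usage lines are made-up numbers, not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p; its ideal (theoretical) limit is the processor count p."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """E_p = S_p / p; equals 1.0 exactly when the speedup reaches p."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return speedup(t_serial, t_parallel) / processors


def dense_matrix_bytes(dof: int, bytes_per_entry: int = 8) -> int:
    """Bytes for a fully stored dof x dof matrix (8-byte doubles assumed)."""
    return dof * dof * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical timings: 120 s on one processor, 40 s on four.
    print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4))  # 3.0 0.75
    # Full stiffness matrix for the largest test case (2850 d.o.f.).
    print(dense_matrix_bytes(2850))                          # 64980000
```

Under these assumptions, a fully stored stiffness matrix for the 2850-d.o.f. perforated strip needs about 65 × 10^6 bytes on every processor, comparable to the 64 megabytes available on each worker node, whereas the 975-d.o.f. arch needs under 8 × 10^6 bytes. The estimate covers only the matrix, so it is a lower bound on the memory each process actually uses; its quadratic growth with the number of degrees of freedom is why full storage on every processor limits scalability.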
Figure 4: Speedup of the Stiffness Computation Module (stiffness speedup vs. degrees of freedom; curves NP1 to NP4)
Figure 5: Speedup of the Gaussian Elimination Solver (solver speedup vs. degrees of freedom; curves NP1 to NP4)

6. CONCLUSION

The AIT Beowulf, a high-performance yet low-cost parallel computer assembled from a network of commodity personal computers, was established, and a parallel implementation of the element-free Galerkin method was developed on this platform. Four desired properties of parallel software, namely concurrency, scalability, locality, and modularity, were taken into account in the design of the parallel version of the element-free Galerkin method. A dynamic load-balancing algorithm was used for the computation of the structural stiffness matrix and the external force vector, and a parallel Gaussian elimination algorithm was employed in the solution for the nodal unknowns (displacements).

Several numerical examples showed that the displacements and stresses obtained from the parallel implementation closely matched the analytical solutions and exactly matched the solutions obtained by the sequential element-free Galerkin method software. With Qserv, a dynamic load-balancing algorithm, high scalability was obtained for three-dimensional structural mechanics problems of up to approximately 1,000 degrees of freedom. However, scalability was not achieved for larger problems, because the full stiffness matrix had to be stored on each processor while only 64 megabytes of memory were available on each worker node.

The parallel Gaussian elimination equation solver took less time to solve the system of equations than its sequential counterpart. With larger systems of equations, the efficiency of the parallel equation solver tended to increase because of the increased computation-to-communication ratio. Nevertheless, in the current implementation of the parallel EFGM analysis code, high efficiency was not obtained when the number of equations exceeded 1,000. Refinement of the memory management algorithms is recommended so that the parallel EFGM analysis code may scale to problems much larger than 1,000 degrees of freedom.

7. REFERENCES

[1] T. Belytschko, Y. Krongauz, D. Organ, M. Fleming, and P. Krysl, “Meshless methods: An overview and recent developments”, Computer Methods in Applied Mechanics and Engineering, Vol. 139, No. 1-4, pp. 3-47, 1996.
[2] H. Adeli and O. Kamal, Parallel Processing in Structural Engineering, Elsevier Science Publishers Ltd., U.K., 1993.
[3] K. T. Danielson, S. Hao, W. K. Liu, A. Uras, and S. Li, “Parallel computation of meshless methods for explicit dynamic analysis”, accepted for publication in International Journal for Numerical Methods in Engineering, 1999.
[4] C. Brown, UNIX Distributed Programming, Prentice Hall International (UK) Limited, UK, 1994.
[5] P. Merkey, “Beowulf: Introduction & overview”, Center of Excellence in Space Data and Information Sciences, Universities Space Research Association, Goddard Space Flight Center, Maryland, USA, September 1998, URL: http://www.beowulf.org/intro.html.
[6] M. Baker and R. Buyya, “Cluster computing: The commodity supercomputer”, Software: Practice and Experience, Vol. 29, No. 6, pp. 551-576, 1999.
[7] J. Radajewski and D. Eadline, “Beowulf HOWTO”, November 1998, URL: http://www.linux.org/help/ldp/howto/Beowulf-HOWTO.html.
[8] W. Gropp and E. Lusk, User's Guide for mpich, a Portable Implementation of MPI, Technical Report ANL-96/6, Argonne National Laboratory, USA, 1996.
[9] D. E. Stewart and Z. Leyk, Meschach: Matrix Computations in C, Proceedings of the Center for Mathematics and Its Applications, Vol. 32, Australian National University, 1994.
[10] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, The Benjamin/Cummings Publishing Company, Inc., USA, 1994.
[11] S. P. Timoshenko and J. N. Goodier, Theory of Elasticity, 3rd ed., McGraw-Hill, 1970.