libHPC: Software sustainability and reuse through metadata preservation

Jeremy Cohen, John Darlington, Brian Fuchs
London e-Science Centre / Department of Computing, Imperial College London
David Moxey, Chris Cantwell, Pavel Burovskiy, Spencer Sherwin
Department of Aeronautics, Imperial College London
Neil Chue Hong
Software Sustainability Institute, University of Edinburgh

First Workshop on Maintainable Software Practices in e-Science, Chicago
Tuesday 9th October 2012

Introduction

•  Decision making – building scientific software can be hard

•  Abstraction – hide the complexity

•  Efficiency – achieve the performance

•  Aim for a universal technology that spans all application domains, machines, metrics

•  Coordination forms – a different approach to task specification

•  Components – encapsulated building blocks

[Figure: three axes. Applications: Num. Intensive, Data Intensive, Bioinformatics, CFD. Machines: Cluster, Cloud, Multi-core, GPU, FPGA. Metrics: Cost, Time, Energy.]

Information and decisions

Why is software development and re-use hard?

•  A particular piece of code is the result of many development decisions

•  Developers invest significant knowledge about the task to be solved

…however…

•  Decisions made by developers cannot be reconstructed from the code

•  Loss of original information and structure invested by developer(s)


Understanding the code structure, the options available and the decisions
made during development is important for:

•  Portability; optimisation on different architectures

•  Long-term sustainability

Need an explicit representation of decisions and alternatives:

•  Decision tree used to represent this (structure)

•  Metadata used to annotate decision tree (information)

•  Modifications can be made to the decision tree (based on metadata
analysis), which can then be mapped to modified code


e.g. code that uses a solver:

•  Many options to select suitable solver – abstract components

•  Choice dependent on problem being addressed, parameters, etc.

•  Represent solver choice as a tree of component alternatives: leaf
nodes are concrete implementations, higher-level nodes are abstract

[Figure: tree of component alternatives; the abstract nodes are annotated with Matrix and Vector types]

Linear Solver
    LU
        Sequential LU
        Parallel LU (OpenMP)
        Parallel LU (MPI)
    Jacobi
        Sequential Jacobi
        Parallel Jacobi (UPC)
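The tree of alternatives can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the libHPC implementation: the `Component` class, the metadata keys `parallel` and `model`, and the `select` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Component:
    """A node in the tree of component alternatives (hypothetical structure)."""
    name: str
    metadata: Dict[str, object] = field(default_factory=dict)
    alternatives: List["Component"] = field(default_factory=list)

    @property
    def is_concrete(self) -> bool:
        # Leaf nodes stand for concrete implementations; inner nodes are abstract.
        return not self.alternatives

    def leaves(self) -> Iterator["Component"]:
        """Yield every concrete implementation at or below this node."""
        if self.is_concrete:
            yield self
        for alternative in self.alternatives:
            yield from alternative.leaves()


def select(root: Component, **required: object) -> List[str]:
    """Return the names of concrete implementations whose metadata matches."""
    return [
        leaf.name
        for leaf in root.leaves()
        if all(leaf.metadata.get(key) == value for key, value in required.items())
    ]


# Metadata values here are assumptions made for illustration only.
solver = Component("Linear Solver", alternatives=[
    Component("LU", alternatives=[
        Component("Sequential LU", {"parallel": False}),
        Component("Parallel LU (OpenMP)", {"parallel": True, "model": "OpenMP"}),
        Component("Parallel LU (MPI)", {"parallel": True, "model": "MPI"}),
    ]),
    Component("Jacobi", alternatives=[
        Component("Sequential Jacobi", {"parallel": False}),
        Component("Parallel Jacobi (UPC)", {"parallel": True, "model": "UPC"}),
    ]),
])

print(select(solver, parallel=False))
print(select(solver, model="MPI"))
```

Keeping the metadata on the nodes rather than in the code is what lets a tool choose a different leaf for a different machine without touching the application.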

Abstractions

a Encapsulation
Encapsulate functions as components (reuse)

Allow alternatives

a Functional properties
Referentially transparent a Encapsulation

Church-Rosser a Alternative behaviours

Abstractions – alternative behaviours

i.e. Church-Rosser: the order in which sub-expressions are reduced does not change the result

(4 + 3) - (2 + 1)
→ 7 - (2 + 1)   or   (4 + 3) - 3
→ 7 - 3
→ 4
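The two reduction paths can be replayed with a small Python sketch. It is only an illustration of the arithmetic example, not a general statement of the Church-Rosser theorem; the tuple encoding of expressions and the `step` and `trace` helpers are assumptions made for this example.

```python
from typing import List, Tuple, Union

# An expression is an int or a tuple (operator, left, right).
Expr = Union[int, Tuple[str, "Expr", "Expr"]]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}


def step(expr: Expr, leftmost: bool) -> Expr:
    """Perform one reduction step, preferring the left or the right operand."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    # Reduce the left operand if it is reducible and either preferred
    # or the only reducible operand; otherwise reduce the right operand.
    if not isinstance(left, int) and (leftmost or isinstance(right, int)):
        return (op, step(left, leftmost), right)
    return (op, left, step(right, leftmost))


def trace(expr: Expr, leftmost: bool) -> List[Expr]:
    """Return every intermediate expression until a plain number remains."""
    steps = [expr]
    while not isinstance(expr, int):
        expr = step(expr, leftmost)
        steps.append(expr)
    return steps


expression = ("-", ("+", 4, 3), ("+", 2, 1))
left_first = trace(expression, leftmost=True)
right_first = trace(expression, leftmost=False)
print(left_first)
print(right_first)
```

The second entry of each trace differs, but both traces meet again at `("-", 7, 3)` and end at the same value.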

Application flow and specification

We represent application elements using two techniques

•  Data processing – core code that forms application building blocks
→ Components (first-order functions)

•  Control flow, orchestration
→ Higher-order functions
→ Coordination Forms, e.g. Pipe, Parallel, Map / Reduce, …

Coordination Forms

•  A functional/mathematical approach to job specification

•  Based on work by Darlington et al.:
J. Darlington, Y. Guo, H. W. To and J. Yang. Functional skeletons for parallel coordination.
In Proceedings of EURO-PAR '95 Parallel Processing, LNCS 966, pp. 55-66.
Springer, Berlin/Heidelberg, 1995.

•  Applied to components – define application flow

•  May be:
•  General – applicable to most applications – e.g. PIPE, PAR

•  Iterative patterns – e.g. FARM, ITERATE

•  Domain-specific higher-level forms – e.g. Monte Carlo

•  Extensible – new patterns can be introduced

Coordination Forms

•  A given form may have multiple underlying implementations
•  E.g. PAR may provide sequential, multi-threaded and MPI parallel
implementations

•  Forms aim to be as lightweight as possible
•  They result in code that can be run

•  They intelligently glue together component building blocks

•  PIPE as an example – functions f1 to fn with initial input a:

PIPE [f1, f2, …, fn] a = (f1 ∘ f2 ∘ … ∘ fn) a
                       = f1(f2(…(fn(a))))
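The PIPE definition can be read as right-to-left function composition. A minimal Python sketch, assuming single-argument functions; `f1`, `f2` and `f3` are placeholder functions invented for the example, and this `PIPE` is not the project's implementation:

```python
from functools import reduce
from typing import Any, Callable, Sequence


def PIPE(functions: Sequence[Callable[[Any], Any]], a: Any) -> Any:
    """Apply fn first and f1 last, i.e. compute f1(f2(...(fn(a))))."""
    return reduce(lambda value, f: f(value), reversed(functions), a)


def f1(x: int) -> int:
    return x + 1


def f2(x: int) -> int:
    return x * 10


def f3(x: int) -> int:
    return x - 3


result = PIPE([f1, f2, f3], 5)
print(result, f1(f2(f3(5))))
```

Because the list is consumed from its last element, the component written first in the list is the one applied last.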

Coordination Forms – Implementation

•  Prototype implementation in Python
•  Class wrappers for component and parameter metadata –
concrete implementation code selectable

PIPE – Compose a series of components in the order specified
PIPE ([component list], initial input)

Additional parameters can be added in the component list, e.g. (multiply, 2)

PAR – Run a series of components independently (perhaps in parallel)
PAR ([component list], [(input1), (input2), …, (inputn)])
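One way a single form could offer several underlying implementations is a backend switch, sketched below. The `backend` argument, the `BACKENDS` table and both helper functions are assumptions made for illustration; the slides do not describe how the prototype selects an implementation.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence, Tuple

Components = Sequence[Callable[..., Any]]
Inputs = Sequence[Tuple[Any, ...]]


def par_sequential(components: Components, inputs: Inputs) -> Tuple[Any, ...]:
    """Run each component on its own input tuple, one after another."""
    return tuple(f(*args) for f, args in zip(components, inputs))


def par_threads(components: Components, inputs: Inputs) -> Tuple[Any, ...]:
    """Run each component in its own worker thread; results keep list order."""
    with ThreadPoolExecutor(max_workers=max(1, len(components))) as pool:
        futures = [pool.submit(f, *args) for f, args in zip(components, inputs)]
        return tuple(future.result() for future in futures)


# Hypothetical registry of interchangeable implementations of the same form.
BACKENDS = {"sequential": par_sequential, "threads": par_threads}


def PAR(components: Components, inputs: Inputs, backend: str = "sequential") -> Tuple[Any, ...]:
    """Run components independently using the selected implementation."""
    if len(components) != len(inputs):
        raise ValueError("PAR needs exactly one input tuple per component")
    return BACKENDS[backend](components, inputs)


print(PAR([operator.add, operator.mul], [(1, 2), (3, 4)]))
print(PAR([operator.add, operator.mul], [(1, 2), (3, 4)], backend="threads"))
```

Both calls return the same tuple, so the caller's job specification does not change when the implementation does.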

E.g. for components add, multiply, divide:

2 * ( (245+34) / (6+8) )

PIPE([(multiply, 2), divide, PAR([add,add],[(245,34),(6,8)])])
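The arithmetic example can be worked through with the sketch below. It follows the two-argument signature `PIPE([component list], initial input)` given above and passes the PAR result as that initial input, which differs slightly from the nested call form on the slide. The handling of `(component, extra)` tuples and of tuple-valued results is an assumption about how extra parameters might be bound, not the prototype's documented behaviour.

```python
import math
from typing import Any, Callable, Sequence, Tuple


def PAR(components: Sequence[Callable[..., Any]],
        inputs: Sequence[Tuple[Any, ...]]) -> Tuple[Any, ...]:
    """Run each component on its own input tuple (sequentially in this sketch)."""
    return tuple(f(*args) for f, args in zip(components, inputs))


def PIPE(stages: Sequence[Any], initial_input: Any) -> Any:
    """Apply stages right to left; a (component, extra...) stage gets extras first."""
    value = initial_input
    for stage in reversed(stages):
        func, *extra = stage if isinstance(stage, tuple) else (stage,)
        # A tuple result from the previous stage is unpacked into arguments.
        args = value if isinstance(value, tuple) else (value,)
        value = func(*extra, *args)
    return value


def add(x: float, y: float) -> float:
    return x + y


def multiply(x: float, y: float) -> float:
    return x * y


def divide(x: float, y: float) -> float:
    return x / y


sums = PAR([add, add], [(245, 34), (6, 8)])   # (245 + 34, 6 + 8)
result = PIPE([(multiply, 2), divide], sums)  # 2 * (279 / 14)
print(sums, result)
```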

Bioinformatics: Genome Read Pre-Processing/Mapping

Input files:
•  Reference Genome – FASTA file
•  Reads from sequencing machine – FASTQ (paired short read set, single FASTQ file)

((sr1, sr2), u) = PAR([fastq_split, bwa_index],
                      [(short_read_file, None, None), (ref_genome_file,)])

(v, w) = PAR([bwa_aln, bwa_aln],
             [(ref_genome_file, sr1, None),
              (ref_genome_file, sr2, None)])

result = PIPE([samtools_index, samtools_sort,
               (samtools_import, ref_genome_file),
               bwa_sampe],
              [ref_genome_file, [v, w], [sr1, sr2], None])

[Figure: workflow. Short Read Set (Paired, single FASTQ file) → FASTQ split → SR_1, SR_2. Reference Genome (FASTA file) → bwa index → FASTA file + index file. SR_1, SR_2 → bwa aln (one per read file) → bwa sampe, generate alignment (paired ended) → SAM file → samtools import → BAM file → samtools sort → sorted BAM file → samtools index → OUTPUT]

LibHPC Project

•  LibHPC

•  Two-year project under the EPSRC HPC Software Programme

•  Imperial College London (Computing (LeSC), Aeronautics,
ICT)
•  SSI, Edinburgh

•  Implementing/demonstrating framework with main
supporting application (Nektar++) + other exemplars

Example

[Figure: layered architecture, top to bottom]

•  High-level Application Description / Job Specification (Co-ordination Forms, DSLs, etc.)

•  Job Specification Analysis/Processing

•  Optimising Software Component Library & Metadata; Resource Discovery & Metadata

•  FEM Codes; Domain-specific Application Support Libraries

•  Hardware Resources

Nektar++ - Hybrid Assembly

•  Nektar++ operates on matrices based on input mesh
•  Each element of input mesh is mapped to an (elemental) matrix
•  There are two matrix assembly strategies:
•  Local
•  Global

[Figure: matrix diagrams with panels Local Assembly, Global Assembly and Hybrid Assembly]