libHPC: Software sustainability and
reuse through metadata preservation
Jeremy Cohen, John Darlington, Brian Fuchs
London e-Science Centre / Department of Computing, Imperial College London
David Moxey, Chris Cantwell, Pavel Burovskiy, Spencer Sherwin
Department of Aeronautics, Imperial College London
Neil Chue Hong
Software Sustainability Institute, University of Edinburgh



First Workshop on Maintainable Software Practices in e-Science, Chicago
Tuesday 9th October 2012
Introduction


•  Decision making – building scientific software can be hard

•  Abstraction – hide the complexity

•  Efficiency – achieve the performance

•  Aim for a universal technology that spans all application
   domains, machines, metrics

•  Coordination forms – a different approach to task
   specification

•  Components – encapsulated building blocks

[Figure: three axes. Applications: Num. Intensive, Data Intensive, Bioinformatics, CFD. Machines: Cluster, Cloud, Multi-core, GPU, FPGA. Metrics: Cost, Time, Energy.]
Information and decisions

Why are software development and re-use hard?

•  A particular piece of code is the result of many development decisions

•  Developers invest significant knowledge about the task to be solved



                               …however…



•  Decisions made by developers cannot be reconstructed from the code

•  Loss of original information and structure invested by developer(s)
Information and decisions

Understanding the code structure, the options available and the decisions
made during development is important for:

•  Portability; optimisation on different architectures

•  Long-term sustainability



Need an explicit representation of decisions and alternatives:

•  Decision tree used to represent this (structure)

•  Metadata used to annotate decision tree (information)

•  Modifications can be made to the decision tree (based on metadata
   analysis), which can then be mapped to modified code
Information and decisions

 e.g. code that uses a solver:

 •  Many options to select suitable solver – abstract components

 •  Choice dependent on problem being addressed, parameters, etc.

 •  Represent solver choice on a tree of component alternatives: leaf
    nodes are concrete implementations, higher-level nodes are abstract

 Linear Solver (Matrix, Vector → Vector)
   LU (Matrix, Vector → Vector)
     Sequential LU
     Parallel LU (OpenMP)
     Parallel LU (MPI)
   Jacobi (Matrix, Vector → Vector)
     Sequential Jacobi
     Parallel Jacobi (UPC)
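The solver tree can be sketched as a small data structure in which every node carries metadata and only the leaves are concrete. This is a minimal sketch, not the libHPC implementation: the `Node` class, the `leaves` and `select` helpers, and all metadata keys and values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node in the tree of component alternatives (hypothetical structure)."""
    name: str
    metadata: Dict[str, object] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_concrete(self) -> bool:
        # Leaf nodes stand for concrete implementations.
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Collect every concrete implementation at or below a node."""
    if node.is_concrete:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def select(node: Node, **required: object) -> Optional[Node]:
    """Return the first concrete implementation whose metadata matches."""
    for leaf in leaves(node):
        if all(leaf.metadata.get(k) == v for k, v in required.items()):
            return leaf
    return None


# The tree from the slide; the metadata keys and values are assumptions.
linear_solver = Node("Linear Solver", {"inputs": ("Matrix", "Vector"), "output": "Vector"}, [
    Node("LU", {"method": "direct"}, [
        Node("Sequential LU", {"parallel": None}),
        Node("Parallel LU (OpenMP)", {"parallel": "shared-memory"}),
        Node("Parallel LU (MPI)", {"parallel": "distributed"}),
    ]),
    Node("Jacobi", {"method": "iterative"}, [
        Node("Sequential Jacobi", {"parallel": None}),
        Node("Parallel Jacobi (UPC)", {"parallel": "pgas"}),
    ]),
])

choice = select(linear_solver, parallel="distributed")
print(choice.name if choice else "no match")  # Parallel LU (MPI)
```

Keeping the abstract nodes in the tree is what allows a different leaf to be chosen later, for example when the code moves to another architecture.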
Abstractions


      a Encapsulation
               Encapsulate functions as components (reuse)

               Allow alternatives



      a Functional properties
               Referentially transparent a Encapsulation

               Church-Rosser a Alternative behaviours
Abstractions – alternative behaviours

i.e. Church-Rosser


                     (4 + 3) – (2 + 1)


          7 – (2 + 1)               (4 + 3) – 3


                          7–3

                            4
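The diagram above can be reproduced with a few lines of Python: reducing the left operand first or the right operand first gives different intermediate expressions but the same final value. This is only an illustration for side-effect-free arithmetic; the tuple encoding of expressions and the `step` and `normalise` helpers are assumptions made for this sketch.

```python
import operator
from typing import List, Union

# An expression is an int or a tuple (op, left, right).
Expr = Union[int, tuple]
OPS = {"+": operator.add, "-": operator.sub}


def step(expr: Expr, left_first: bool) -> Expr:
    """Perform one reduction, preferring the left or the right operand."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    left_reducible = not isinstance(left, int)
    right_reducible = not isinstance(right, int)
    if not left_reducible and not right_reducible:
        return OPS[op](left, right)
    if left_reducible and (left_first or not right_reducible):
        return (op, step(left, left_first), right)
    return (op, left, step(right, left_first))


def normalise(expr: Expr, left_first: bool) -> List[Expr]:
    """Reduce until a plain number remains, recording every intermediate form."""
    trace = [expr]
    while not isinstance(expr, int):
        expr = step(expr, left_first)
        trace.append(expr)
    return trace


expr = ("-", ("+", 4, 3), ("+", 2, 1))       # (4 + 3) - (2 + 1)
left_trace = normalise(expr, left_first=True)
right_trace = normalise(expr, left_first=False)

print(left_trace[1])                     # ('-', 7, ('+', 2, 1))
print(right_trace[1])                    # ('-', ('+', 4, 3), 3)
print(left_trace[-1], right_trace[-1])   # 4 4
```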
Application flow and specification


We represent application elements using two techniques

•  Data processing – core code that forms application building
   blocks
        a   Components (first-order functions)



•  Control flow, orchestration
        a   High-order functions

        a   Coordination Forms

                 e.g. Pipe, Parallel, Map / Reduce, …
Coordination Forms


•  A functional/mathematical approach to job specification

•  Based on work by Darlington, et al.
   J. Darlington, Y. Guo, H. W. To and J. Yang. Functional skeletons for parallel coordination.
   In proceedings of EURO-PAR ’95 Parallel Processing, LNCS 966/1995, p. 55-66, 1995.
   Springer Berlin/Heidelberg


•  Applied to components – define application flow

•  May be:
  •  General – applicable to most applications – e.g. PIPE, PAR

  •  Iterative patterns – e.g. FARM, ITERATE

  •  Domain-specific higher-level forms – e.g. Monte Carlo

  •  Extensible – new patterns can be introduced
Coordination Forms


•  A given form may have multiple underlying implementations
  •  E.g. PAR may provide sequential, multi-threaded and MPI parallel
     implementations

•  Forms aim to be as lightweight as possible
  •  They result in code that can be run

  •  They intelligently glue together component building blocks

•  PIPE as an example – functions f1 to fn with initial input a:

        PIPE [f1, f2, …, fn] a = (f1 ∘ f2 ∘ … ∘ fn) a

                               = f1(f2(…(fn(a))))
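The PIPE definition is ordinary function composition, applied from right to left. A minimal sketch in Python follows; the `pipe` function and the three toy functions are hypothetical and are not taken from the libHPC prototype.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def pipe(functions: Sequence[Callable[[Any], Any]], a: Any) -> Any:
    """PIPE [f1, ..., fn] a = f1(f2(...(fn(a))))."""
    # fn is applied first and f1 last, so walk the list from right to left.
    return reduce(lambda value, f: f(value), reversed(functions), a)


def increment(x):
    return x + 1


def double(x):
    return x * 2


def square(x):
    return x * x


# increment(double(square(3))) = increment(double(9)) = increment(18)
result = pipe([increment, double, square], 3)
print(result)  # 19
```

An empty list leaves the input unchanged, which matches composition having the identity function as its neutral element.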
Coordination Forms – Implementation

•  Prototype implementation in Python
•  Class wrappers for component and parameter metadata –
   concrete implementation code selectable

PIPE – Compose a series of components in the order specified
PIPE ([component list], initial input)

Additional parameters can be added in component list

PAR – Run a series of components independently (perhaps in parallel)
PAR ([component list], [(input1), (input2), …, (inputn)])

E.g. for components add, multiply, divide:

2 * ( (245+34) / (6+8) )

PIPE([(multiply, 2), divide], PAR([add, add], [(245, 34), (6, 8)]))
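A runnable sketch of the two forms, following the signatures PIPE([component list], initial input) and PAR([component list], [inputs]). Several details here are assumptions rather than the prototype's behaviour: extra parameters in a (component, parameter) tuple are passed before the piped value, a tuple result is unpacked into positional arguments for the next component, and the `threads` flag stands in for one of the alternative underlying implementations. The PAR result is passed to PIPE as its initial input.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence, Tuple, Union

# A component is a function or a (function, extra parameters...) tuple.
Component = Union[Callable[..., Any], Tuple[Any, ...]]


def _call(component: Component, value: Any) -> Any:
    """Invoke one component on one input value."""
    if isinstance(component, tuple):
        func, extra = component[0], component[1:]
    else:
        func, extra = component, ()
    # Assumption: a tuple input is unpacked into positional arguments.
    args = value if isinstance(value, tuple) else (value,)
    return func(*extra, *args)


def PIPE(components: Sequence[Component], initial: Any) -> Any:
    """Compose components right to left: f1(f2(...fn(initial)))."""
    value = initial
    for component in reversed(components):
        value = _call(component, value)
    return value


def PAR(components: Sequence[Component], inputs: Sequence[Any],
        threads: bool = False) -> tuple:
    """Run each component on its own input and return the results as a tuple."""
    if len(components) != len(inputs):
        raise ValueError("PAR needs exactly one input per component")
    if threads:
        # One possible underlying implementation: a thread pool.
        with ThreadPoolExecutor() as pool:
            return tuple(pool.map(_call, components, inputs))
    # Default underlying implementation: plain sequential execution.
    return tuple(_call(c, i) for c, i in zip(components, inputs))


def add(x, y):
    return x + y


def multiply(x, y):
    return x * y


def divide(x, y):
    return x / y


sums = PAR([add, add], [(245, 34), (6, 8)])    # (279, 14)
result = PIPE([(multiply, 2), divide], sums)   # 2 * (279 / 14), about 39.857
print(sums, result)
```

Because the caller only names the form, switching `threads` on or off changes how the work is executed without touching the job specification.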
Bioinformatics: Genome Read Pre-Processing/Mapping

Input files –
   Reference Genome – FASTA file
   Reads from sequencing machine – FASTQ

((sr1, sr2), u) = PAR([fastq_split, bwa_index],
   [(short_read_file, None, None), (ref_genome_file,)])

(v, w) = PAR([bwa_aln, bwa_aln],
   [(ref_genome_file, sr1, None),
    (ref_genome_file, sr2, None)])

result = PIPE([samtools_index, samtools_sort,
               (samtools_import, ref_genome_file),
               bwa_sampe],
         [ref_genome_file, [v, w], [sr1, sr2], None])

[Figure: workflow diagram. Reference Genome (FASTA file) → bwa index → FASTA file + index file. Short Read Set (Paired), single FASTQ file → FASTQ split → SR_1, SR_2 → bwa aln (one per read set) → bwa sampe, generate alignment (paired ended) → SAM file → samtools import → BAM file → samtools sort → sorted BAM file → samtools index → OUTPUT.]
LibHPC Project


•  LibHPC

 •  Two-year project under EPSRC HPC Software Programme

 •  Imperial College London (Computing (LeSC), Aeronautics,
    ICT)
 •  SSI, Edinburgh

•  Implementing/demonstrating framework with main
   supporting application (Nektar++) + other exemplars
Example


[Figure: framework architecture, top to bottom]

  High-level Application Description / Job Specification
  (Co-ordination Forms, DSLs, etc.)

  Job Specification Analysis/Processing

  Software Component Library & Metadata  |  Resource Discovery & Metadata
  Domain-specific Application Support Libraries
  (side label: Optimising FEM Codes)

  Hardware Resources
Nektar++ - Hybrid Assembly


•  Nektar++ operates on
   matrices based on input
   mesh
•  Each element of input mesh
   is mapped to an (elemental)
   matrix
•  There are two matrix
   assembly strategies:
 •  Local

 •  Global

Nektar++ - Hybrid Assembly

[Figure: matrix diagrams with panels "Local Assembly" and "Global Assembly"]

Nektar++ - Hybrid Assembly

[Figure: matrix diagram with panel "Hybrid Assembly"]
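The difference between the local and global strategies can be illustrated on a toy problem. The sketch below is not Nektar++ code: the two-element 1D mesh, the elemental matrices and all function names are hypothetical. It applies the same operator to a vector in two ways, once through a single assembled global matrix and once element by element, and the results agree. A hybrid strategy would presumably mix the two across the mesh; that is not shown here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

# Hypothetical 1D mesh: two linear elements sharing the middle node.
local_to_global = [[0, 1], [1, 2]]
elemental = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[2.0, -2.0], [-2.0, 2.0]],
]
n_global = 3


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def assemble_global() -> Matrix:
    """Global strategy: sum every elemental matrix into one global matrix."""
    g = [[0.0] * n_global for _ in range(n_global)]
    for dofs, m in zip(local_to_global, elemental):
        for i, gi in enumerate(dofs):
            for j, gj in enumerate(dofs):
                g[gi][gj] += m[i][j]
    return g


def apply_global(u: Vector) -> Vector:
    """Multiply by the assembled global matrix."""
    return matvec(assemble_global(), u)


def apply_local(u: Vector) -> Vector:
    """Local strategy: scatter to elements, multiply, gather (sum) back."""
    result = [0.0] * n_global
    for dofs, m in zip(local_to_global, elemental):
        local_u = [u[g] for g in dofs]
        local_r = matvec(m, local_u)
        for g, value in zip(dofs, local_r):
            result[g] += value
    return result


u = [1.0, 2.0, 4.0]
print(apply_global(u))  # [-1.0, -3.0, 4.0]
print(apply_local(u))   # [-1.0, -3.0, 4.0]
```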
Thank You

 
ScienceSoft: Open Software for Open Science
ScienceSoft: Open Software for Open ScienceScienceSoft: Open Software for Open Science
ScienceSoft: Open Software for Open Science
 
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
 
The Relationship Between Development Problems and Use of Software Engineering...
The Relationship Between Development Problems and Use of Software Engineering...The Relationship Between Development Problems and Use of Software Engineering...
The Relationship Between Development Problems and Use of Software Engineering...
 

libHPC: Software sustainability and reuse through metadata preservation

  • 1. libHPC: Software sustainability and reuse through metadata preservation. Jeremy Cohen, John Darlington, Brian Fuchs (London e-Science Centre / Department of Computing, Imperial College London); David Moxey, Chris Cantwell, Pavel Burovskiy, Spencer Sherwin (Department of Aeronautics, Imperial College London); Neil Chue Hong (Software Sustainability Institute, University of Edinburgh). First Workshop on Maintainable Software Practices in e-Science, Chicago, Tuesday 9th October 2012
  • 2. Introduction •  Decision making – building scientific software can be hard •  Abstraction – hide the complexity •  Efficiency – achieve the performance •  Aim for a universal technology that spans all application domains, machines, metrics •  Coordination forms – a different approach to task specification •  Components – encapsulated building blocks [Figure: three axes. Applications: Num. Intensive, Data Intensive, Bioinformatics, CFD. Machines: Cluster, Cloud, Multi-core, GPU, FPGA. Metrics: Cost, Time, Energy.]
  • 3. Information and decisions Why is software development and re-use hard? •  A particular piece of code is the result of many development decisions •  Developers invest significant knowledge about the task to be solved …however… •  Decisions made by developers cannot be reconstructed from the code •  Loss of original information and structure invested by developer(s)
  • 4. Information and decisions Understanding the code structure, the options available and the decisions made during development is important for: •  Portability; optimisation on different architectures •  Long-term sustainability Need an explicit representation of decisions and alternatives: •  Decision tree used to represent this (structure) •  Metadata used to annotate decision tree (information) •  Modifications can be made to the decision tree (based on metadata analysis), which can then be mapped to modified code
  • 5. Information and decisions e.g. code that uses a solver: •  Many options to select suitable solver – abstract components •  Choice dependent on problem being addressed, parameters, etc. •  Represent solver choice on a tree of component alternatives; leaf nodes are concrete implementations, higher-level nodes are abstract [Figure: component tree with Matrix and Vector inputs and Vector outputs. Linear Solver → LU, Jacobi. LU → Sequential LU, Parallel LU (OpenMP), Parallel LU (MPI). Jacobi → Sequential Jacobi, Parallel Jacobi (UPC).]
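The solver tree on this slide can be sketched as a small data structure. This is only an illustration, not the libHPC implementation: the `Node` class, the `leaves` and `select` helpers, and the `parallelism` metadata key with its values are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node in the component decision tree (hypothetical structure)."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_concrete(self) -> bool:
        # Leaf nodes are concrete implementations; inner nodes are abstract.
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Collect every concrete implementation below a node."""
    if node.is_concrete:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def select(node: Node, **required: str) -> List[str]:
    """Return names of concrete implementations whose metadata matches."""
    return [
        leaf.name
        for leaf in leaves(node)
        if all(leaf.metadata.get(k) == v for k, v in required.items())
    ]


# Tree taken from the slide; the metadata values are assumptions.
solver = Node("Linear Solver", children=[
    Node("LU", children=[
        Node("Sequential LU", {"parallelism": "none"}),
        Node("Parallel LU (OpenMP)", {"parallelism": "shared-memory"}),
        Node("Parallel LU (MPI)", {"parallelism": "distributed"}),
    ]),
    Node("Jacobi", children=[
        Node("Sequential Jacobi", {"parallelism": "none"}),
        Node("Parallel Jacobi (UPC)", {"parallelism": "distributed"}),
    ]),
])

print(select(solver, parallelism="none"))
```

Keeping the alternatives and their metadata in the tree, rather than only the chosen implementation in the code, is what lets a later analysis pick a different leaf for a different machine.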
  • 6. Abstractions •  Encapsulation: encapsulate functions as components (reuse); allow alternatives •  Functional properties: referentially transparent → encapsulation; Church-Rosser → alternative behaviours
  • 7. Abstractions – alternative behaviours, i.e. Church-Rosser: (4 + 3) – (2 + 1) reduces to either 7 – (2 + 1) or (4 + 3) – 3; both reduce to 7 – 3, and then to 4
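The two reduction paths on this slide can be reproduced with a short sketch. The nested-tuple expression encoding and the `step` and `normalise` helpers are assumptions made for illustration; the point is only that reducing the left or the right sub-expression first reaches the same final value.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def step(expr, leftmost=True):
    """Perform one reduction step; return (new_expr, changed)."""
    if isinstance(expr, int):
        return expr, False
    op, left, right = expr
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right), True
    # Choose which sub-expression to try first.
    order = ("left", "right") if leftmost else ("right", "left")
    for side in order:
        sub = left if side == "left" else right
        new_sub, changed = step(sub, leftmost)
        if changed:
            if side == "left":
                return (op, new_sub, right), True
            return (op, left, new_sub), True
    return expr, False


def normalise(expr, leftmost=True):
    """Reduce until no step applies; return every intermediate expression."""
    trace = [expr]
    changed = True
    while changed:
        expr, changed = step(expr, leftmost)
        if changed:
            trace.append(expr)
    return trace


# (4 + 3) - (2 + 1) from the slide
expr = ("-", ("+", 4, 3), ("+", 2, 1))
left_first = normalise(expr, leftmost=True)
right_first = normalise(expr, leftmost=False)
print(left_first)
print(right_first)
```

Both traces pass through different intermediate expressions and end at the same value, which is the property that allows alternative evaluation orders (and so alternative implementations) to be swapped safely.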
  • 8. Application flow and specification We represent application elements using two techniques •  Data processing – core code that forms application building blocks → Components (first-order functions) •  Control flow, orchestration → Higher-order functions → Coordination Forms, e.g. Pipe, Parallel, Map / Reduce, …
  • 9. Coordination Forms •  A functional/mathematical approach to job specification •  Based on work by Darlington et al.: J. Darlington, Y. Guo, H. W. To and J. Yang. Functional skeletons for parallel coordination. In Proceedings of EURO-PAR ’95 Parallel Processing, LNCS 966/1995, pp. 55-66, 1995. Springer Berlin/Heidelberg •  Applied to components – define application flow •  May be: •  General – applicable to most applications – e.g. PIPE, PAR •  Iterative patterns – e.g. FARM, ITERATE •  Domain-specific higher-level forms – e.g. Monte Carlo •  Extensible – new patterns can be introduced
  • 10. Coordination Forms •  A given form may have multiple underlying implementations •  E.g. PAR may provide sequential, multi-threaded and MPI parallel implementations •  Forms aim to be as lightweight as possible •  They result in code that can be run •  They intelligently glue together component building blocks •  PIPE as an example – functions f1 to fn with initial input a: PIPE [f1, f2, …, fn] a = (f1 ∘ f2 ∘ … ∘ fn) a = f1(f2(…(fn(a))))
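A minimal sketch of these two ideas, assuming nothing about the actual libHPC code: `pipe` follows the PIPE definition above as right-to-left function composition, and `par` picks between two interchangeable implementations. The function names, the `implementation` argument and the `PAR_IMPLEMENTATIONS` table are hypothetical; an MPI variant is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def pipe(functions, a):
    """PIPE [f1, ..., fn] a = f1(f2(...(fn(a)))), so fn is applied first."""
    return reduce(lambda value, f: f(value), reversed(functions), a)


def par_sequential(functions, inputs):
    """Run each function on its own argument tuple, one after another."""
    return [f(*args) for f, args in zip(functions, inputs)]


def par_threaded(functions, inputs):
    """Same contract as par_sequential, but submitted to a thread pool."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, *args) for f, args in zip(functions, inputs)]
        return [future.result() for future in futures]


# One form, several underlying implementations (names are assumptions).
PAR_IMPLEMENTATIONS = {"sequential": par_sequential, "threaded": par_threaded}


def par(functions, inputs, implementation="sequential"):
    return PAR_IMPLEMENTATIONS[implementation](functions, inputs)


def double(x):
    return 2 * x


def increment(x):
    return x + 1


def square(x):
    return x * x


# double(increment(square(3))) = 2 * (9 + 1)
print(pipe([double, increment, square], 3))
print(par([pow, max], [(2, 5), (3, 7)], implementation="threaded"))
```

Because both `par` variants return results in input order, the caller's job specification does not change when the implementation is swapped.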
  • 11. Coordination Forms – Implementation •  Prototype implementation in Python •  Class wrappers for component and parameter metadata – concrete implementation code selectable •  PIPE – compose a series of components in the order specified: PIPE([component list], initial input). Additional parameters can be added in the component list •  PAR – run a series of components independently (perhaps in parallel): PAR([component list], [(input1), (input2), …, (inputn)]) •  E.g. for components add, multiply, divide, the expression 2 * ( (245+34) / (6+8) ) is written as: PIPE([(multiply, 2), divide, PAR([add,add],[(245,34),(6,8)])])
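The arithmetic example on this slide can be run with a small stand-in for the prototype. This is a sketch under stated assumptions, not the authors' code: the `Component` wrapper and its metadata fields are hypothetical, a `(component, parameter)` tuple is taken to mean the parameter is passed as the first argument, and a list flowing between stages is unpacked into positional arguments. The slide writes the PAR call as the last entry of the component list; here it is passed as the initial input, following the stated `PIPE([component list], initial input)` signature.

```python
class Component:
    """Wrap a callable together with descriptive metadata (hypothetical fields)."""

    def __init__(self, func, **metadata):
        self.func = func
        self.metadata = metadata

    def __call__(self, *args):
        return self.func(*args)


add = Component(lambda a, b: a + b, name="add", arity=2)
multiply = Component(lambda a, b: a * b, name="multiply", arity=2)
divide = Component(lambda a, b: a / b, name="divide", arity=2)


def PAR(components, inputs):
    """Run each component on its own input tuple (sequentially in this sketch)."""
    if len(components) != len(inputs):
        raise ValueError("PAR needs one input tuple per component")
    return [component(*args) for component, args in zip(components, inputs)]


def PIPE(stages, initial_input):
    """Apply the stages so that the first listed stage is the outermost call."""
    value = initial_input
    for stage in reversed(stages):
        extra = ()
        if isinstance(stage, tuple):
            # (component, parameter, ...) carries additional parameters.
            stage, *extra = stage
        # Assumption: a list or tuple between stages is unpacked as arguments.
        args = value if isinstance(value, (list, tuple)) else (value,)
        value = stage(*extra, *args)
    return value


# 2 * ((245 + 34) / (6 + 8))
sums = PAR([add, add], [(245, 34), (6, 8)])
result = PIPE([(multiply, 2), divide], sums)
print(sums, result)
```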
  • 12. Bioinformatics: Genome Read Pre-Processing/Mapping [Figure: workflow diagram. Input files: reads from the sequencing machine (paired short read set, single FASTQ file) and the reference genome (FASTA file). Stages: FASTQ split → SR_1, SR_2; bwa index → FASTA file + index file; bwa aln (once per read set); bwa sampe – generate alignment (paired ended) → SAM file; samtools import → BAM file; samtools sort → sorted BAM file; samtools index → OUTPUT.] Job specification:
      ((sr1,sr2), u) = PAR([fastq_split, bwa_index], [(short_read_file, None, None), (ref_genome_file,)])
      (v, w) = PAR([bwa_aln, bwa_aln], [(ref_genome_file, sr1, None), (ref_genome_file, sr2, None)])
      result = PIPE([samtools_index, samtools_sort, (samtools_import, ref_genome_file), bwa_sampe], [ref_genome_file, [v,w], [sr1, sr2], None])
  • 13. LibHPC Project •  LibHPC •  Two year project under EPSRC HPC Software Programme •  Imperial College London (Computing (LeSC), Aeronautics, ICT) •  SSI, Edinburgh •  Implementing/demonstrating framework with main supporting application (Nektar++) + other exemplars
  • 14. Example [Figure: layered architecture diagram. Labels, in extraction order: High-level Application Description / Job Specification (Co-ordination Forms, DSLs, etc.); Job Specification Analysis/Processing; Optimising; Software Component Library & Metadata; Resource Discovery & Metadata; Domain-specific Libraries; FEM Codes; Application Support Libraries; Hardware Resources.]
  • 15. Nektar++ - Hybrid Assembly •  Nektar++ operates on matrices based on input mesh •  Each element of input mesh is mapped to an (elemental) matrix •  There are two matrix assembly strategies: •  Local •  Global
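The difference between the two strategies can be illustrated on a toy problem. The sketch below is not Nektar++ code and makes several assumptions: a 1D mesh of two-node linear elements, a simple stiffness-like elemental matrix, and the reading of "local" as applying elemental matrices element by element versus "global" as first summing them into one global matrix. All function names are hypothetical.

```python
def elemental_matrix(h):
    """2x2 matrix for one linear element of length h (toy example)."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def connectivity(n_elements):
    """Element e touches global nodes e and e + 1."""
    return [(e, e + 1) for e in range(n_elements)]


def matvec_global(n_elements, h, x):
    """Assemble one global matrix, then multiply it by x."""
    n = n_elements + 1
    K = [[0.0] * n for _ in range(n)]
    Ke = elemental_matrix(h)
    for nodes in connectivity(n_elements):
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                K[gi][gj] += Ke[i][j]
    return [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]


def matvec_local(n_elements, h, x):
    """Never form the global matrix: gather, apply the elemental matrix, scatter-add."""
    y = [0.0] * (n_elements + 1)
    Ke = elemental_matrix(h)
    for nodes in connectivity(n_elements):
        local_x = [x[g] for g in nodes]  # gather
        for i, gi in enumerate(nodes):
            y[gi] += sum(Ke[i][j] * local_x[j] for j in range(2))  # scatter-add
    return y


x = [0.0, 1.0, 4.0, 9.0]
y_global = matvec_global(3, 1.0, x)
y_local = matvec_local(3, 1.0, x)
print(y_global, y_local)
```

Both routes compute the same product; they differ in memory layout and in how much work is shared between elements, which is the kind of trade-off a per-architecture choice between strategies has to weigh.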
  • 16. Nektar++ - Hybrid Assembly [Figure: matrix diagrams illustrating Local Assembly and Global Assembly.]
  • 17. Nektar++ - Hybrid Assembly [Figure: matrix diagram illustrating Hybrid Assembly.]