1. Jenkins + CVMFS :
Distributed Development,
Centralised Delivery
Bruce Becker | bbecker@csir.co.za
Coordinator: SAGrid
SANREN, Meraka Institute, CSIR
Stefanus Riekert | RiekertSJPK@ufs.ac.za
HPC Application Engineer
University of the Free State
2. Bruce Becker: Coordinator, SAGrid | bbecker@csir.co.za | http://www.sagrid.ac.za
Outline
● What users want
● SAGrid VO – a catch-all VO with many applications
● Problem statements:
● Problem 1: "the usual problem" – maintaining
applications in a distributed computing environment
● Problem 2: "another usual problem" – maintaining a
complex application inventory
● General solution : CVMFS + Jenkins
● Some specifics of SAGrid CI platform
● Outlook
SAGrid as a catch-all VO
● The South African National Grid operates a
catch-all VO which all South African researchers
can use to access computing and data
resources.
● SAGrid VO is not a domain-specific VO, so the applications
it supports have widely varying uses
● Applications requested by users or communities
themselves
What users want
Amazing infrastructure
Some users want highly
varied, modular
application selection
Vertically integrated
Highly specialised
applications
Highly trained support
What users get sometimes
The problem (1) -
"the usual problem"
● Software distribution was done mostly by hand:
● Someone from the ops team develops a script to install the application
● Apps installed via job submission
● Tags applied via script or by the job itself
● Issues:
● Major overhead of work
● Inconsistent installation procedures between applications and sites
● Bottleneck in porting applications (has to be done by someone in the
VO)
● Duplication of effort, especially in dependencies of applications
● Difficult to manage application lifecycles
The problem (2) -
what about the community ?
● Managing the inventory in a catch-all VO can be complex
when there are many applications
● Prioritising porting requests depends on the knowledge
of the expert porting the application
● Can lead to major delays in porting and deploying applications
● However, a user or community usually has an expert who
knows how to tune, port and configure the application
properly, as well as its dependencies
● Usually, "they" have to conform to "us": learn grid tools and
terminology, etc.
Problem (3) :
Changes to the playing field
● New middleware stacks
● New architectures – GPGPU, ARM
Questions to answer
● How do we lower the barrier to entry to the grid or
cloud infrastructure ?
● How can the application expert prove to the resource
provider that the application will actually run on the
execution environment of the site ?
● How can we manage the lifecycle of applications
across multiple versions, architectures, configurations ?
● How can we ensure that once applications are
"certified", they are actually available on as many sites
as possible ?
General Solution:
Jenkins + CVMFS
● The issues outlined are "typical" of a large
software project
● Usually solved by judicious use of a Continuous
Integration system
● Once applications have been "ported", put them
into a trusted repository
● Previously: built RPMs, but these required site-admin
intervention
● Now: one-time configuration with CVMFS
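The one-time site configuration mentioned above might look roughly like the sketch below. The repository name sagrid.ac.za comes from the "Next steps" slide; the proxy URL, the Stratum server URL, the keys directory and the helper name write_cvmfs_config are placeholders invented for illustration, not SAGrid's actual settings.

```shell
#!/bin/bash
# Sketch: write a minimal CVMFS client configuration for one repository.
# All URLs and paths written here are placeholders (assumptions).
write_cvmfs_config() {
    local conf_dir="${1:-/etc/cvmfs}"   # where the client reads its config
    local repo="${2:-sagrid.ac.za}"     # repository to mount

    mkdir -p "$conf_dir/config.d"

    # Site-wide settings: which repositories to mount, which proxy to use.
    cat > "$conf_dir/default.local" <<EOF
CVMFS_REPOSITORIES=$repo
CVMFS_HTTP_PROXY="http://squid.example.org:3128"
EOF

    # Per-repository settings: where to fetch data and how to verify it.
    cat > "$conf_dir/config.d/$repo.conf" <<EOF
CVMFS_SERVER_URL="http://stratum.example.org/cvmfs/@fqrn@"
CVMFS_KEYS_DIR=/etc/cvmfs/keys/$repo
EOF
}

# Example (as root on a worker node):
#   write_cvmfs_config /etc/cvmfs sagrid.ac.za
```

Once the files are in place, a site admin would typically run `cvmfs_config probe` to confirm that the repository mounts; after that, new application versions arrive without further site intervention.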
First, some changes
● Distribute the effort, centralise the tools
● Move repository from "closed" SVN repo
– https://ops.sagrid.ac.za/trac/svn/repo
● to git
– https://github.com/SAGridOps/SoftwareInstallation
● Don't have to give write access to a single repo, instead
accept pull requests
● Take advantage of all the GitHub infrastructure
● Expand possible contributors to those "outside" the
infrastructure
● Recognise individuals' contribution
Let the robots do the work
● Define what we want to deploy – let the experts
take care of how to deploy
● DevOps paradigm – same review/tag/release
mechanisms on operations code as we have for
scientific applications
● Teach a marketable skill
● Allow specialisation
● Enable remote management of complex services
● Ensure that published methodology is adopted
methodology
Quality Control and feedback
● Ensure that
requested
applications are
included in the repo
● Provide testing and
QA infrastructure
● Self-serve to users
The CI environment
● Jenkins is extremely flexible... can do almost anything
● AuthN/AuthZ
● Currently using GitHub OAuth
● Take advantage of future Identity Federation
● We wanted to simulate different execution
environments
● Already in production
● Planned for future
● Track and re-use dependencies
Matrix-based builds
● Independent builds and build statuses for
different configurations:
● Application name
● Version
● OS
● Architecture
● … can add specific tuning configurations...
● We can see exactly what's broken where – build
more resilient integration code.
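To make the matrix idea concrete, the sketch below enumerates every cell of a small build matrix along the axes listed above. Jenkins does this expansion itself for a matrix project; the function name list_matrix and the example axis values (versions, OS labels, architectures) are assumptions chosen only for illustration.

```shell
#!/bin/bash
# Sketch: print one line per cell of a build matrix
# (application name x version x OS x architecture).
list_matrix() {
    local name="$1" versions="$2" oses="$3" arches="$4"
    local v o a
    for v in $versions; do          # word-splitting on spaces is intended
        for o in $oses; do
            for a in $arches; do
                # Each cell is built and reported independently.
                echo "$name/$v/$o/$a"
            done
        done
    done
}

# Example axis values (hypothetical labels, not the real SAGrid matrix):
list_matrix gsl "1.16 2.1" "sl6 u1404" "x86_64"
```

Because each cell carries its own build status, a failure in one OS and architecture combination shows up without masking the combinations that still pass.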
Typical workflow
[Workflow diagram. Roles: application developer, infrastructure expert.
Stages: testing matrix, dev/stage environment.
Steps shown: defines relevant tests in Jenkins; reads description of
execution environment tests; writes code to pass required tests;
promote a build to CVMFS.]
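The final step of the workflow, promoting a build to CVMFS, could be sketched as follows. The talk lists the promoted-builds mechanism as future work, so this is only one possible shape: the function name promote_build, the DRY_RUN switch and the choice to unpack the artifact at the top of the repository are assumptions. The `cvmfs_server transaction`, `abort` and `publish` subcommands are the standard way to open, discard and commit changes on a CVMFS release manager machine.

```shell
#!/bin/bash
# Sketch: publish a tested build artifact into a CVMFS repository.
# Set DRY_RUN=1 to print the commands instead of running them.
promote_build() {
    local artifact="$1"                # build.tar.gz produced by the CI job
    local repo="${2:-sagrid.ac.za}"    # target CVMFS repository
    local run="${DRY_RUN:+echo}"       # "echo" when DRY_RUN is set, else empty

    # Open a writable transaction on the repository.
    $run cvmfs_server transaction "$repo" || return 1

    # Unpack the artifact; roll back the transaction if that fails.
    if ! $run tar -xzf "$artifact" -C "/cvmfs/$repo"; then
        $run cvmfs_server abort -f "$repo"
        return 1
    fi

    # Sign and publish the new repository revision.
    $run cvmfs_server publish "$repo"
}

# Example:
#   DRY_RUN=1 promote_build "$REPO_DIR/build.tar.gz" sagrid.ac.za
```

Keeping promotion in one function means a Jenkins job can call it only for builds that passed every cell of the testing matrix.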
Dependency management
simple case
● Common problem with applications :
need a specific version of a
compiler
● Compiling the compiler can itself be
tricky...
● Jenkins tests the full dependency
chain necessary
Generic build script
# Set up the environment
# GADGET requires HDF5, FFTW2, ZLIB and openmpi
module add ci
module add fftw/2.1.5
module add hdf5
module add openmpi
module add gsl

# Clean build: remove old copies, then retrieve dependency artifacts
rm -rf $FFTW_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/fftw/$FFTW_VERSION/build.tar.gz -C /
rm -rf $HDF5_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/hdf5/$HDF5_VERSION/build.tar.gz -C /
rm -rf $OPENMPI_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/openmpi/$OPENMPI_VERSION/build.tar.gz -C /
rm -rf $GSL_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/gsl/$GSL_VERSION/build.tar.gz -C /
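The repeated remove-and-unpack pairs in the script above could be folded into one helper, sketched below. The artifact path layout is the one used on the slide; the function name fetch_dep and the REPO_ROOT and INSTALL_ROOT overrides are hypothetical additions, included so the helper can be tried outside the CI environment.

```shell
#!/bin/bash
# Sketch: fetch one dependency artifact and unpack it, replacing any
# previous copy. Relies on SITE, OS and ARCH being set by the CI job.
fetch_dep() {
    local name="$1" version="$2" install_dir="$3"
    local repo_root="${REPO_ROOT:-/repo}"       # assumption: overridable root
    local install_root="${INSTALL_ROOT:-/}"     # assumption: overridable target
    local artifact="$repo_root/$SITE/$OS/$ARCH/$name/$version/build.tar.gz"

    # Fail early, before deleting anything, if the artifact is missing.
    if [ ! -f "$artifact" ]; then
        echo "missing artifact: $artifact" >&2
        return 1
    fi

    # Clean out the previous installation, if a directory was given.
    if [ -n "$install_dir" ]; then
        rm -rf "$install_dir"
    fi

    tar -xzf "$artifact" -C "$install_root"
}

# Example, mirroring the slide:
#   fetch_dep fftw "$FFTW_VERSION" "$FFTW_DIR"
#   fetch_dep hdf5 "$HDF5_VERSION" "$HDF5_DIR"
```

Checking for the artifact before the `rm -rf` means a missing dependency build leaves the existing installation untouched instead of half-removing it.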
Generic build script (continued)
# Actually build...
make install DESTDIR=$WORKSPACE/build
# Create the artifact
mkdir -p $REPO_DIR
rm -rf $REPO_DIR/*
tar cvzf $REPO_DIR/build.tar.gz -C $WORKSPACE/build apprepo
# Create the modulefile
(
cat <<MODULE_FILE
#%Module1.0
## $NAME modulefile
##
proc ModulesHelp { } {
puts stderr " This module does nothing but alert the user"
puts stderr " that the [module-info name] module is not available"
}
prereq gsl
prereq fftw/2.1.5
prereq hdf5
module-whatis "$NAME $VERSION."
setenv GSL_VERSION $VERSION
setenv GSL_DIR /apprepo/$::env(SITE)/$::env(OS)/$::env(ARCH)/$NAME/$VERSION
prepend-path LD_LIBRARY_PATH $::env(GSL_DIR)/lib
MODULE_FILE
) > modules/$VERSION
So, it works ! … almost
Next steps
● We have an open, collaborative, low-barrier platform for researchers
to bring applications to the grid
● Small technical tasks :
● Implement promoted builds mechanism to populate sagrid.ac.za CVMFS repo
● Implement SAML AuthN, integrate IdF
● Probes to check that CVMFS is mounted on sites (?)
● Operating in "stealth mode" at the moment – not advertising, but open
to anyone who is interested, so that we can collect feedback
● Addressing specific user communities to test drive the system:
● Machine learning astro applications (rapid prototyping)
● Bioinformatics application suites (complex ecosystem)
● Present next phase of the project in November in Cape Town – move
to production
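The probe listed under the small technical tasks, checking that CVMFS is mounted on sites, could be sketched as below. The function name check_cvmfs_mounted and the optional root argument are assumptions; the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL), which is also an assumption about how such a probe would be consumed.

```shell
#!/bin/bash
# Sketch: report whether a CVMFS repository is reachable on this node.
check_cvmfs_mounted() {
    local repo="${1:-sagrid.ac.za}"   # repository name
    local root="${2:-/cvmfs}"         # mount root (overridable for testing)

    # Listing the directory triggers the automount on a typical client;
    # a mounted repository should also contain at least one entry.
    if ls "$root/$repo" >/dev/null 2>&1 &&
       [ -n "$(ls -A "$root/$repo" 2>/dev/null)" ]; then
        echo "OK: $root/$repo is mounted and not empty"
        return 0
    fi

    echo "CRITICAL: $root/$repo is not accessible"
    return 2
}

# Example:
#   check_cvmfs_mounted sagrid.ac.za
```

Run as a grid job or a local monitoring check, a probe like this would indicate which sites can actually see the certified applications.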