HPC Web overview - Mobyle Workshop - September 28, 2012
1. Pasteur Institute – Mobyle Developers Workshop
28 September 2012
Jennifer Dommer, HPC Web Developer
Alex Levitsky, HPC Infrastructure Team Lead
NIAID OCICB Bioinformatics & Computational
Biosciences Branch (BCBB)
2. Outline
What is HPC Web?
Project Goals and Background (5 min.)
HPC Web Design (10 min.)
Use of the Mobyle Framework in HPC Web (10 min.)
HPC Web Video Demos (15 min.)
BMID and BMPS Overview
HPC Web Next Steps
Questions/Discussion (10 min.)
3. What is HPC Web?
Web application developed by National Institute of Allergy
and Infectious Diseases (NIAID) Bioinformatics and
Computational Biosciences Branch (BCBB)
HPC Web Team:
• Alex Levitsky, HPC Infrastructure Team Lead
• Vivek Gopalan, Former HPC Infrastructure Team Lead
• Jennifer Dommer, Software Developer
• Jie Li, Former Software Developer
• Ramandeep Kaur, Software Developer
• Karlynn Noble, Designer/Communications
• Darrell Hurt, Mariam Quinones, Andrew Oler, Vijay
Nagarajan, Xavier Ambroggio, Kurt Wollenberg, Mike
Dolan, Burke Squires, Maarten Leerkes, Subject Matter
Experts
• Nick Weber, Project Manager
• Tram Huyen, Project Sponsor
4. What is HPC Web?
Web interface to NIAID High Performance Computing
(HPC) cluster
Leverages Mobyle framework for job submission, data
management, and pipeline creation
6. Project Goals
Democratize access to high performance computing
resources
• Allow non-command-line-savvy bench researchers
to access sophisticated computational tools and
infrastructure for their high-throughput research data
Provide capabilities to:
• Engage an interactive user community
• Access, manage, and share HPC files through an
intuitive web interface
• Run, track progress, and re-run jobs using simple
web forms and interfaces
• Create simple, automated analysis pipelines
7. Project Background
2010
• NIAID HPC infrastructure established
– Small cluster of ~5 nodes, 30 cores
• Late 2010 HPC Web v1 released
– Static content about how to use HPC resources, which
applications were installed, and how to use them
– Frameworks established, including integration of Mobyle
– Simple functionality for requesting accounts and support,
viewing cluster status, engaging with community, etc.
– Integrated with custom UCSC Track Manager application
2011
• HPC Web phase II development began
– Cluster had grown from 5 to nearly 40 nodes, from 30 to nearly
400 cores
– Project scope expanded to include job submission, data management,
and pipeline creation from the web
8. Project Background (continued)
2012
• Cluster continuing to grow (now ~50 nodes, 600+ cores,
GPU- and Infiniband-enabled)
• Approximately 750 TB data, with plans in place to
expand data storage and implement hierarchical storage
management / archiving mechanisms to support future
growth
• HPC Web Phase II released in May 2012
– ~20 applications with Mobyle interfaces, for a total of ~60 forms
for job submission (including sub-packages for applications,
e.g., tools within SAMtools suite)
– Limited number of standardized workflow templates
E.g., RNA-seq-single-sample-mapping, which maps RNA-seq
reads to a reference genome using TopHat, then passes
the alignment file to 1) Cufflinks to assemble transcripts and
quantify expression and to 2) SAMtools to index the
alignment file
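The RNA-seq template has a fan-out shape: one mapping step feeds two independent downstream steps. A minimal Python sketch of that dependency structure follows. The step names are illustrative labels, and this is an assumption about how such a workflow could be modeled, not a description of how HPC Web stores its templates.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
# Step names are hypothetical labels, not HPC Web identifiers.
WORKFLOW = {
    "tophat_map_reads": set(),
    "cufflinks_assemble": {"tophat_map_reads"},
    "samtools_index": {"tophat_map_reads"},
}


def execution_order(workflow):
    """Return one valid run order in which every step follows its inputs."""
    return list(TopologicalSorter(workflow).static_order())


order = execution_order(WORKFLOW)
print(order)
```

The mapping step always comes first; the two downstream steps do not depend on each other, so either may run second (or both in parallel on a cluster).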
9. HPC Web Server
[Diagram: HPC Web job submission implementation schema using Mobyle. Components shown: HPC Web server with authorization module, Mobyle library, and DRMAA library; storage (/Shared folder, /group folder, /application folder); SGE submit host; SGE compute nodes. Connections are labeled "Apache user" (hpcwebadm).]
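The submission path in the schema (web server, DRMAA library, SGE submit host) can be sketched in Python. This sketch assumes the `drmaa` Python bindings are installed and configured against the cluster. `JobSpec` and `submit` are hypothetical names introduced here, and the code is not taken from Mobyle or HPC Web.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobSpec:
    """Hypothetical container for what a web form would collect."""
    command: str
    args: List[str] = field(default_factory=list)
    workdir: str = "."


def submit(spec: JobSpec) -> str:
    """Submit one job through DRMAA and return the scheduler's job id."""
    # Imported lazily so this module loads on machines without DRMAA.
    import drmaa

    with drmaa.Session() as session:
        template = session.createJobTemplate()
        try:
            template.remoteCommand = spec.command
            template.args = spec.args
            template.workingDirectory = spec.workdir
            return session.runJob(template)
        finally:
            session.deleteJobTemplate(template)


# Illustrative job description; file names are placeholders.
spec = JobSpec(
    command="bl2seq",
    args=["-i", "first.fasta", "-j", "second.fasta", "-p", "blastn"],
)
```

Calling `submit(spec)` on a configured submit host would run the job under the account of the calling process, which in the schema is the Apache user.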
10. HPC Web Mobyle Job Management Interface
Let's focus on the job bl2seq.T11045404625893
11. Mobyle job results page for bl2seq.T11045404625893
BLAST result obtained from server
12. SGE accounting details for job
bl2seq.T11045404625893
Job runs using SGE
DRMAA library is used
for job submission from
Mobyle
Job runs as Apache user
We could show any of
these parameters in the
HPC Web interface
• Start time
• Queue time
• End time
• CPU time
qacct command for the job
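Extracting the listed parameters from `qacct -j` output could look like the sketch below. It assumes the usual "key value" line layout and the field names `qsub_time`, `start_time`, `end_time`, and `cpu`. The sample text is made up for illustration and is not the accounting record of the real job.

```python
import re

# Illustrative text in the "key  value" layout that `qacct -j` prints.
# The values are invented for the example, not taken from the real job.
SAMPLE = """\
jobname      bl2seq
qsub_time    Fri Sep 28 10:00:00 2012
start_time   Fri Sep 28 10:00:05 2012
end_time     Fri Sep 28 10:00:09 2012
cpu          1.250
"""

FIELDS = ("qsub_time", "start_time", "end_time", "cpu")


def parse_qacct(text, wanted=FIELDS):
    """Collect selected accounting fields from qacct-style output."""
    record = {}
    for line in text.splitlines():
        match = re.match(r"(\S+)\s+(.*\S)", line)
        if match and match.group(1) in wanted:
            record[match.group(1)] = match.group(2)
    return record


details = parse_qacct(SAMPLE)
print(details)
```

On a real submit host the input text would come from running `qacct -j` with the job's id, for example through `subprocess.run`.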
13. HPC Web Video Demos
Navigating the HPC Web interface:
• http://www.youtube.com/watch?feature=player_embedded&v=cxxALr5PGlY
Using My File Manager in HPC Web
• http://www.youtube.com/watch?feature=player_embedded&v=9K8h2l28S2Y
Submitting jobs to the cluster from HPC Web
• http://www.youtube.com/watch?feature=player_embedded&v=9K8h2l28S2Y
14. BCBB Mobyle Interface Designer (BMID)
A web-based GUI for creating Mobyle XML using
drag-and-drop options and wizards
Eliminates the need to manually generate XML,
aiming to facilitate community creation of interfaces
and minimize development “bottlenecks”
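To give a sense of what a designer tool automates, the sketch below builds a small program description with Python's standard library. The element names (`program`, `head`, `command`, `parameters`, `parameter`, `prompt`) are a simplified subset assumed for illustration. The output is not validated against the Mobyle schema, and `build_interface` is a hypothetical helper, not part of BMID.

```python
import xml.etree.ElementTree as ET


def build_interface(name, command, params):
    """Build a simplified program description from (name, prompt) pairs."""
    program = ET.Element("program")
    head = ET.SubElement(program, "head")
    ET.SubElement(head, "name").text = name
    ET.SubElement(head, "command").text = command
    parameters = ET.SubElement(program, "parameters")
    for param_name, prompt in params:
        parameter = ET.SubElement(parameters, "parameter")
        ET.SubElement(parameter, "name").text = param_name
        ET.SubElement(parameter, "prompt").text = prompt
    return program


# Parameter names and prompts are placeholders for the example.
root = build_interface(
    "bl2seq",
    "bl2seq",
    [("first_sequence", "First sequence"),
     ("second_sequence", "Second sequence")],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A real interface definition carries more detail per parameter (data types, command-line formatting, validation), which is the part that is tedious to write by hand.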
16. BCBB Mobyle Pipeline System (BMPS)
Leverages Mobyle framework to string applications
together such that the output of one process becomes
the input of the next
Simplifies analysis by automating a standard set of
procedures that may have previously required manual
processing
Enables sharing of useful/novel pipelines among
users
Facilitates QC analysis by making it easy to iteratively
tweak one or a few parameters of an application
within a saved pipeline and validate results
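The two ideas on this slide, chaining outputs to inputs and re-running a saved pipeline with one parameter changed, can be sketched as follows. The step functions and `run_pipeline` are toy, hypothetical stand-ins; BMPS itself runs cluster applications through Mobyle, which this sketch does not attempt to reproduce.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step is (name, function, default parameters).
Step = Tuple[str, Callable[..., Any], Dict[str, Any]]


def run_pipeline(steps: List[Step], data: Any,
                 overrides: Optional[Dict[str, Dict[str, Any]]] = None) -> Any:
    """Feed each step's output into the next, applying per-step overrides."""
    overrides = overrides or {}
    for name, func, params in steps:
        merged = {**params, **overrides.get(name, {})}
        data = func(data, **merged)
    return data


def trim(reads, min_length=3):
    """Toy filter: keep reads at least min_length characters long."""
    return [read for read in reads if len(read) >= min_length]


def count(reads):
    """Toy summary: number of reads that survived earlier steps."""
    return len(reads)


PIPELINE = [("trim", trim, {"min_length": 3}), ("count", count, {})]

reads = ["ACGT", "AC", "ACGTA"]
baseline = run_pipeline(PIPELINE, reads)
tweaked = run_pipeline(PIPELINE, reads, {"trim": {"min_length": 5}})
print(baseline, tweaked)
```

Keeping the saved pipeline unchanged and passing the tweak as an override makes it easy to compare the two results side by side.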
17. Example BMPS Template
Other BMPS template examples
available in HPC Web:
• ChIP-seq-with-control
• Map-reads-and-index
• Fastq-quality-boxplot
18. Next Steps in HPC Web Development
Continued development of web forms, especially for
NGS and structural biology applications
BMID interface enhancements
BMPS/Pipeline system enhancements, including
additional templates
Integration with Mobyle2 framework
19. Feature Request Considerations
Workflow template sharing between HPC users
Data sharing with non-HPC account holders, including
those outside NIH
Ability for users to create their own application
interfaces using BCBB Mobyle Interface Designer
(BMID), and share interfaces with others