The document summarizes several projects undertaken by the HPC Lab, including developing software and algorithms for graph analysis on emerging platforms (CASS-MT), genome assembly (GALAXY), and RNA structure prediction (GTFold). It also mentions projects involving graph benchmarks (Graph500), dynamic graph packages for Intel platforms (STING), and phylogenetics research on the IBM Blue Waters supercomputer (PetaApps).
HPC lab projects
1. HPC Lab
David A. Bader, E. Jason Riedy, Henning Meyerhenke, (horde of students...)
2. HPC Lab Projects
• UHPC (DARPA)
– Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
– CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
• GTFOLD (NIH): Combinatorial and Computational Methods for the Analysis, Prediction, and Design of Viral RNA Structures
• PETA-APPS (NSF): Petascale Simulation for Understanding Whole-Genome Evolution
• Graph500 (Sandia): Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms
• STING (Intel): An open-source dynamic graph package for Intel platforms
• CASS-MT (DoD): Graph Analytics for Streaming Data on Emerging Platforms
• GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
3. HPC Lab Projects
And yet more...
• Burton (NSF): Develop software and algorithmic infrastructure for massively multithreaded architectures
• Dynamic Graph Data Structures in X10 (IBM): Develop and evaluate graph data structures in X10
• I/UCRC Center for Hybrid and Multicore Productivity Research, CHMPR (NSF)
4. Ubiquitous High Performance Computing (DARPA): Echelon
Overall goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks
Architectural Drivers:
Energy Efficiency
Security and Dependability
Programmability
Program Objectives:
One PFLOPS, single cabinet including self-contained cooling
50 GFLOPS/W (equivalent to 20 pJ/FLOP)
Total cabinet power budget 57 kW, including processing resources, storage, and cooling
Security embedded at all system levels
Parallel, efficient execution models
Highly programmable parallel systems
Scalable systems – from terascale to petascale
David A. Bader (CSE), Echelon Leadership Team
“NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems” – MarketWatch
Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
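The power targets above are mutually consistent; a quick back-of-the-envelope check (a sketch of the arithmetic, not taken from the slides):

```python
# Arithmetic check on the quoted UHPC targets: 1 PFLOPS at 50 GFLOPS/W.
pflops = 1e15            # sustained FLOP/s per cabinet
gflops_per_watt = 50e9   # efficiency target, FLOP/s per watt

compute_watts = pflops / gflops_per_watt       # power for compute alone
picojoules_per_flop = 1e12 / gflops_per_watt   # energy per FLOP, in pJ

print(compute_watts, picojoules_per_flop)  # 20000.0 20.0
```

So compute alone needs 20 kW, matching the 20 pJ/FLOP figure and leaving the remainder of the 57 kW cabinet budget for storage and cooling.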
5. Ubiquitous High Performance Computing (DARPA): CHASM
Overall goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks
Architectural Drivers:
New architectures require new benchmarks
Evaluating usability requires applications
Existing metrics do not encompass all UHPC goals
Program Objectives:
Develop applications, benchmarks, and metrics
Drive UHPC development
Support performance analysis of UHPC systems
Dan Campbell, GTRI, co-PI
CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
6. GTFold (NIH): RNA Secondary Structure Prediction
Program Goals: accurate structure of large viruses such as:
• Influenza
• HIV
• Polio
• Tobacco Mosaic
• Hanta
FACULTY
Christine Heitsch (Mathematics)
David A. Bader
Steve Harvey (Biology)
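GTfold implements full thermodynamic nearest-neighbor models; as a rough illustration of the underlying dynamic-programming idea, here is the classic Nussinov base-pair maximization (a textbook simplification, not GTfold's actual algorithm):

```python
# Nussinov-style DP: maximize the number of nested base pairs in an RNA
# sequence, with a minimum hairpin-loop length. A toy sketch only.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, hairpin loops >= min_loop."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # interval length j - i
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                  # case: i left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:    # case: i pairs with k
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov_pairs("GGGAAAUCC"))  # 3 (a small stem-loop)
```

The O(n^3) recurrence is the same shape that thermodynamic folders fill in, which is why large viral genomes make parallelization worthwhile.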
7. PetaApps (NSF): Phylogenetics Research on IBM Blue Waters
As part of the IBM PERCS team, we designed the IBM Blue Waters supercomputer that will sustain petascale performance on our applications, under the DARPA High Productivity Computing Systems program.
• GRAPPA: Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms
• Freely available, open source, GNU GPL
• Already used by other computational phylogeny groups: Caprara, Pevzner, LANL, FBI, Smithsonian Institution, Aventis, GlaxoSmithKline, PharmCos.
• Gene-order Phylogeny Reconstruction
• Breakpoint Median
• Inversion Median
• Over one-billion-fold speedup from previous codes
• Parallelism scales linearly with the number of processors
FACULTY
David A. Bader, CSE
www.phylo.org
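The breakpoint median mentioned above builds on the breakpoint distance between gene orders; a toy illustration of that distance (GRAPPA itself is highly tuned C code; this sketch only shows the underlying measure):

```python
# Breakpoint distance between two signed, circular gene orders: count the
# gene adjacencies of one order that do not survive in the other.

def adjacencies(perm):
    """Ordered adjacency pairs of a signed circular gene order."""
    ext = perm + [perm[0]]
    return {(ext[i], ext[i + 1]) for i in range(len(perm))}

def breakpoints(p, q):
    """Adjacencies of p absent from q (the circle may be read either way)."""
    aq = adjacencies(q)
    aq |= {(-b, -a) for (a, b) in aq}  # same circle traversed in reverse
    return sum(1 for adj in adjacencies(p) if adj not in aq)

# Inverting the segment (2, 3) creates exactly two breakpoints:
print(breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))  # 2
```

A breakpoint median seeks a gene order minimizing the total breakpoint distance to three given orders, the NP-hard kernel inside GRAPPA's search.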
8. Graph500 (SNL): Exploration of shared-memory graph benchmarks
• Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms.
• NOT LINPACK!
• Spec, reference implementations at http://graph500.org
• Ranking debuted at SC10
• Press: IEEE Spectrum, Computerworld, HPCWire, MIT Tech. Review, EE Times, slashdot, etc.
Problem-size classes:
Class        Problem Size
Toy (10)     17 GiB
Mini (11)    140 GiB
Small (12)   1.1 TiB
Medium (13)  18 TiB
Large (14)   140 TiB
Huge (15)    1.1 PiB
[Figure: example graphs. Image sources: Nexus (Facebook application); Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722–1736, 2003.]
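The class sizes in the table follow from the benchmark's scale parameter (2^scale vertices). A sketch of the arithmetic, assuming the standard Graph500 edge factor of 16, edges stored as two 8-byte endpoints, and the quoted sizes read in decimal units; the scale values per class are my assumption:

```python
# Estimate the edge-list footprint of each Graph500 problem class.
EDGE_FACTOR = 16        # edges per vertex in the generated graph
BYTES_PER_EDGE = 2 * 8  # two 8-byte vertex IDs per edge

# Assumed scale (log2 of vertex count) for each class shown in the table.
classes = {"Toy": 26, "Mini": 29, "Small": 32, "Medium": 36, "Large": 39, "Huge": 42}

def edge_list_bytes(scale):
    """Bytes needed to hold the edge list of a graph with 2**scale vertices."""
    return (2 ** scale) * EDGE_FACTOR * BYTES_PER_EDGE

for name, scale in classes.items():
    print(f"{name:7s} scale={scale}: {edge_list_bytes(scale) / 1e9:,.0f} GB")
```

Under these assumptions the estimates reproduce the table (e.g. Toy: 2^26 vertices → about 17 GB of edge data).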
10. CASS-MT: Center for Adaptive Supercomputing Software
• DoD-sponsored, launched July 2008
• Pacific Northwest National Laboratory (lead)
– Georgia Tech, Sandia, WA State, Delaware
• The newest breed of supercomputers have hardware set up not just for speed, but also to better tackle large networks of seemingly random data. And now, a multi-institutional group of researchers has been awarded more than $12M to develop software for these supercomputers. Applications include anywhere complex webs of information can be found: from internet security and power grid stability to complex biological networks.
12. GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
Parallel Genome Sequence Assembly
Next-generation sequencing experiments produce a large number of short base-pair strings (reads)
Task: assemble (concatenate) reads appropriately into larger substrings (contigs)
Two main assembly approaches, both graph-based (de Bruijn vs. overlap/string graph)
Objectives: improve running time and ultimately also assembly accuracy
Approach:
Use overlap/string graph for higher accuracy
Parallelism to reduce running time
Compression to reduce memory consumption
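The overlap-based approach can be illustrated with a toy greedy assembler (a sketch only; the project's actual assemblers use far more sophisticated string-graph construction and parallel data structures):

```python
# Toy overlap-based assembly: repeatedly merge the pair of reads with the
# longest suffix-prefix overlap until no overlap remains. Illustrative only;
# real assemblers build an overlap/string graph instead of greedy merging.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len)."""
    for l in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:l]):
            return l
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads by best overlap; returns the resulting contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, bi, bj = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    l = overlap(a, b, min_len)
                    if l > best_len:
                        best_len, bi, bj = l, i, j
        if best_len == 0:
            break  # no overlaps left; remaining reads are separate contigs
        merged = reads[bi] + reads[bj][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (bi, bj)] + [merged]
    return reads

# Four overlapping reads assemble into a single contig:
print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
```

The all-pairs overlap step is the expensive part, which is exactly where parallelism and compressed indexing pay off in real assemblers.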
13. Pasqual: New memory-efficient, parallel, fast sequence assembler
Experimental Results: Memory Usage and Running Time
● Pasqual: our parallel (shared-memory, OpenMP) sequence assembler
● Run on a commodity server (8 cores, 16 hyperthreads)
● Memory usage reduced to ca. 50% for large data sets
● Running time compared to sequential assemblers: 24 to 325 times faster!
● Biologists can assemble larger data sets faster