1. Open Gateway Computing
Environments: Software for
Science Gateways
Marlon Pierce, Suresh Marru, Raminder
Singh, Gerald Guo, Archit Kulshrestha, Ye
Fan, Patanachai Tangchaisin, and
collaborators.
2. What Is a Science Gateway?
• User Interface and supporting Web services to
scientific applications, data sets, and resources running
on cyberinfrastructure.
– Science portals, Grid Computing Environments, …
– Broaden and simplify usage
• Cyberinfrastructure: Distributed computing resources
and overlaying middleware for scientific computing.
– Prominent examples include TeraGrid, Open Science Grid
– Middleware includes Globus, Condor, iRods/SRB, …
– Some of these approaches are being pushed by scientific cloud
computing
– That is another topic
3. TeraGrid is one of the largest investments in shared CI
from NSF’s Office of Cyberinfrastructure
4. Cyberinfrastructure Layers
[Figure: layered architecture diagram]
• User Interfaces: Web/Gadget Container, Web/Gadget Interfaces, Web Enabled Desktop Applications, Gateway Abstraction Interfaces
• Gateway Software: Application Abstractions, Fault Tolerance, User Management, Information Services, Monitoring, Workflow System, Provenance & Metadata Management, Auditing & Reporting, Registry, Security
• Resource Middleware: Cloud Interfaces, Grid Middleware, SSH & Resource Managers
• Compute Resources: Computational Clouds, Computational Grids, Local Resources
• Color coding: OGCE Gateway Components; Complementary Gateway Components; Dependent resource provider components
5. Open Gateway Computing
Environments
• The OGCE team develops software for building
secure, Web-based Science Gateways
– Chemistry, Bioinformatics, Biophysics,
Environmental Sciences
• OGCE is funded by the National Science
Foundation’s Software Development for
Cyberinfrastructure (SDCI) program.
7. OGCE Software
Name: Description
• OGCE Gadget Container: An OpenSocial and Google gadget-compatible Web container for running Web gadgets.
• GFAC: A Web service for generating, securely invoking, and managing the lifecycle of scientific applications on Grids and Clouds.
• Workflow Tools: Composer (XBaya), enactment (“interpreter”) engines, event system, and service registry to support scientific workflows on Grids and Clouds.
• Gadgets and Gadget Building Tools: Tools for building secure Google-gadget based Science Gateways.
9. OGCE Components in Action
Featured Gateway: OGCE Components Used
• UltraScan: GFAC scientific application management service
• GridChem, ParamChem: XBaya workflow composer, OGCE Messenger Service, XRegistry
• SimpleGrid: OGCE Gadget Container (in development)
• Purdue CCSM Portal: Gadget Container and gadget building libraries (in development)
• BioVLAB: GFAC, XBaya, XRegistry, Workflow Interpreter Service
10. Software Strategy
• We develop downloadable, packaged, open source
software
• SourceForge
• Focus: a) gadget container and b) tools for running
science applications and workflows on grids and
clouds.
• Provide a tool set that can be used in whole or in part.
– If you just want GFac, then you can use it without buying
an entire framework.
• Out of our scope: visualization, security, information
services, data and metadata provenance and
management.
– MyProxy, TG IIS, Globus, Condor, XMC Cat, iRods, etc.
11. Apache Incubators
• Joining Apache is key to our software sustainability
strategy
– Open source licensing, meritocracy, visibility
• Vigyan: tools for science gateway services and
workflows
– XBaya, GFAC, Messenger, XRegistry
– Collaboration with WSO2/LSF, IBM
– Builds on Apache Axis2, Apache ODE
• Rave: OpenSocial gadget manager, general purpose
gadgets
– Collaboration with Hippo, Mitre, SURFnet
– Builds on Apache Shindig
12. The OGCE Gadget Container
Managing layouts, look and feel, and
behind-the-scenes services for
aggregated Web gadgets
13. The OGCE Gadget Container allows you to build portals
out of public and private Google Open Social gadgets.
Supports HTTPS. Downloadable, packaged software.
14. The OGCE Application Registry gadget allows users to
interactively register hosts and applications that are
dynamically wrapped as Web services.
16. Mobile Support
Gadget Container is built with HTML, JavaScript and CSS.
Works in both iPhone and Android native browsers without modification.
Developing layout managers better suited to limited screen real estate.
17. Feature Groups and Features
• Look and Feel: Tabbed and Tree layout managers, 2 and 3 column layouts, default maximized views of gadgets, customizable color styling.
• Security: Supports end-to-end SSL between browser, container, and gadgets; OpenID authentication; OAuth-secured gadgets; MyProxy logins; limited Grid credential sharing between gadgets; CILogon for InCommon login.
• Inter-Gadget Communication: Supports OpenAjax publish-subscribe style messaging between gadgets. PMRPC JavaScript messaging support in development.
• REST Service API: Layouts, logins, sign-ups, user administration, user identification, and Grid credentials all accessible via REST service calls as well as the user interface.
• Open Source Social Networking: All code is open source and builds on Apache Shindig 2.0.
• Gadget Development: Support for GWT-based gadgets and YUI JavaScript libraries in development.
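The publish-subscribe style of inter-gadget messaging mentioned above can be illustrated with a minimal sketch. This is a generic, in-process Python illustration of the pattern only. It is not the OpenAjax Hub API or OGCE container code, and the `Hub` class, the `job.status` topic, and the payload fields are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[str, Any], None]


class Hub:
    """Toy topic-based message hub (hypothetical; not the OpenAjax Hub API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver payload to each subscriber of topic; return the delivery count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


# One "gadget" listens for job status messages; another publishes them.
hub = Hub()
received = []
hub.subscribe("job.status", lambda topic, payload: received.append((topic, payload)))
delivered = hub.publish("job.status", {"job_id": "demo-1", "state": "DONE"})
```

The publisher never references the subscriber directly, which is what lets independently written gadgets cooperate inside one container.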
21. BioVLAB Application Deployment Procedure
• Develop a command line app.
• Install the app. in Amazon EC2
• Let the app. store any output to Amazon S3
• Make a virtual machine image
• Register the app. by using GFac
• Instantiate EC2 and run the app. by using XBaya
[Figure labels: User, Admin, GFac Registration form]
22. BioVLAB-Microarray
• Analysis of high throughput microarray experiments
• Multiple tasks in a single batch
• Output of a task can be plugged into another task
• Repeat the same set of tasks with small changes to parameters
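The two ideas above, chaining task outputs into task inputs and repeating the same task set with slightly different parameters, can be sketched as follows. This is a toy illustration under stated assumptions, not BioVLAB or XBaya code: the `normalize` and `keep_above` tasks, the sample values, and the thresholds are all hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

Values = List[float]
Task = Callable[[Values], Values]


def normalize(values: Values) -> Values:
    """Scale values so that the largest one becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]


def keep_above(threshold: float) -> Task:
    """Build a filtering task for one threshold setting."""
    def task(values: Values) -> Values:
        return [v for v in values if v >= threshold]
    return task


def run_pipeline(tasks: List[Task], data: Values) -> Values:
    """Feed the output of each task into the next task."""
    return reduce(lambda current, task: task(current), tasks, data)


def sweep(data: Values, thresholds: List[float]) -> Dict[float, Values]:
    """Repeat the same task sequence while changing one parameter."""
    return {t: run_pipeline([normalize, keep_above(t)], data) for t in thresholds}


results = sweep([2.0, 4.0, 8.0], thresholds=[0.3, 0.6])
```

Each entry in `results` corresponds to one run of the full task sequence, so comparing runs only requires comparing dictionary values.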
26. UltraScan2 High Level Overview
[Figure: architecture diagram. Labels shown: User, Web Server, MySQL DB, US LIMS, GridControl, UTHSCSA Jacinto, TIGRE/Globus, TeraGrid, Terascale storage, Network, High Performance Computing Clusters]
28. UltraScan Collaboration
• Immediate Goals: Use GFAC as a replacement job submission service.
– GRAM 2, 4, 5 independence
– Significant effort into GRAM5 testing on Ranger.
• Longer term goals
– Integrate with TG information services to provide better job scheduling.
• OGCE Resource Prediction Service
– Support UNICORE job management.
[Figure: Current Architecture]
29. UltraScan Problems and Solutions Provided by OGCE
• Problem: Gateway code can only submit to resources with GRAM4 installed and running.
– Solution: GFAC supports different providers such as GRAM2/4/5, Condor, Local, and Remote using SSH keys. There is a generic GUI to configure them all.
• Problem: Adding a new resource is time consuming.
– Solution: The user needs to fill in two web forms to configure a new resource.
• Problem: The local cluster needed GRAM4 installed.
– Solution: We can directly invoke mpirun on a local or remote cluster using the local/remote providers.
• Problem: TACC resources like Lonestar and Ranger decided not to install GRAM4 and to move to GRAM5.
– Solution: It was easy to start using GRAM5 in GFAC, but time consuming to get GRAM5 running operationally on these resources.
• Problem: Job failures and missing status.
– Solution: Retry mechanism for certain GRAM error codes. We are still trying to find how to deal with missing status or reconnect to those jobs, as the Globus API does not support that.
• Problem: Restart of jobs was not provided in the Gateway even when the application supports checkpointing.
– Solution: Added restart job support from checkpoint files.
• Problem: UltraScan3 needs to rewrite all these components again because it uses a different technology.
– Solution: Provided a REST interface to OGCE services, so clients in different languages can call the same interfaces for the required operations.
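Two of the solutions above, a common interface over different job submission providers and a retry mechanism for certain error codes, can be sketched together. This is a minimal illustration under stated assumptions and not GFAC's actual design: the class names, the `submit` signature, the returned job identifiers, and the retryable error code 1001 are all hypothetical, and no real GRAM, Condor, or SSH calls are made.

```python
from abc import ABC, abstractmethod


class TransientSubmitError(Exception):
    """Submission failure carrying a (hypothetical) numeric error code."""

    def __init__(self, code: int) -> None:
        super().__init__(f"submission failed with code {code}")
        self.code = code


class Provider(ABC):
    """Common interface that hides how a job reaches a resource."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Submit a command and return a job identifier."""


class LocalProvider(Provider):
    def submit(self, command: str) -> str:
        return f"local:{command}"


class FlakyRemoteProvider(Provider):
    """Stand-in for a remote provider that fails a fixed number of times."""

    def __init__(self, failures: int, code: int) -> None:
        self.remaining = failures
        self.code = code

    def submit(self, command: str) -> str:
        if self.remaining > 0:
            self.remaining -= 1
            raise TransientSubmitError(self.code)
        return f"remote:{command}"


RETRYABLE_CODES = {1001}  # hypothetical code, not a real GRAM error number


def submit_with_retry(provider: Provider, command: str, max_attempts: int = 3) -> str:
    """Retry only failures whose code is listed as retryable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return provider.submit(command)
        except TransientSubmitError as err:
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
    raise RuntimeError("unreachable")


local_id = submit_with_retry(LocalProvider(), "mpirun ./app")
remote_id = submit_with_retry(FlakyRemoteProvider(failures=2, code=1001), "mpirun ./app")
```

Because the gateway only calls `Provider.submit`, adding a new submission mechanism means adding one class rather than changing gateway code.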
30. GFac Current & Future Features
[Figure: GFac architecture diagram, built on Apache Axis2]
• Components shown: Input Handlers, Output Handlers, Registry Interface, Scheduling Interface, Monitoring Interface, Fault Tolerance, Data Management Abstraction, Job Management Abstraction, Auditing, Checkpoint Support
• Resources shown: Globus, Campus Resources, Amazon, Eucalyptus, Unicore, Condor
• Color coding: Existing Features; Planned/Requested Features
31. GRAM5 Testing
• Developed a testing harness to run different test cases.
• Started with a small number of jobs and increased the concurrency later.
• Watched job behavior on the resource and monitored the GRAM log.
– We found many issues in the logs and are working with the Globus team to fix them.
• Recorded all job run data and built a Google gadget to graph the different runs on different resources.
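A harness of the kind described above, which starts small and then raises concurrency, might look like the following sketch. It rests on assumptions not stated in the slides: each concurrent worker is assumed to submit one batch of jobs in sequence, and `fake_submit` is a placeholder that always succeeds rather than a real GRAM5 submission, so the counts it produces are not test results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def fake_submit(job_id: int) -> bool:
    """Placeholder for a real job submission; always reports success."""
    return True


def run_case(concurrency: int, batch_size: int,
             submit: Callable[[int], bool]) -> Tuple[int, int]:
    """Run `concurrency` workers that each submit `batch_size` jobs.

    Returns a (passed, failed) pair for the whole case.
    """
    def worker(worker_id: int) -> List[bool]:
        first_job = worker_id * batch_size
        return [submit(first_job + i) for i in range(batch_size)]

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = [ok for batch in pool.map(worker, range(concurrency)) for ok in batch]
    passed = sum(outcomes)
    return passed, len(outcomes) - passed


# Increase concurrency step by step, keeping the batch size fixed.
cases = [(1, 10), (3, 10), (5, 10)]
report: Dict[Tuple[int, int], Tuple[int, int]] = {
    case: run_case(case[0], case[1], fake_submit) for case in cases
}
```

Swapping `fake_submit` for a function that performs a real submission and returns its success flag would keep the rest of the harness unchanged.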
32. TG Resources and patterns
Version    Resource   Endpoint
GT 5.0.2   QueenBee   queenbee.loni-lsu.teragrid.org:2120/jobmanager-pbs
GT 5.0.2   Ranger     login5.ranger.tacc.teragrid.org:2120/jobmanager-sge
GT 5.0.2   Lonestar   gatekeeper.lonestar.tacc.teragrid.org:2120/jobmanager-lsf
Patterns:
Concurrent jobs   Batch size   Total jobs   Job status (Pass : Fail)
1                 10           10           10:0
3                 10           30           30:0
5                 10           50           50:0
10                10           100          20:0
20                10           200          40:0
50                10           500          100:0
100               10           1000         200:0
200               5            1000         Not tested (Need allocation)
500               2            1000         Not tested (Need allocation)
33. GFAC Integration
• UltraScan job submission previously relied on GRAM4
– GFAC integrated as middleware to abstract submission process
– GRAM5, UNICORE and any future mechanism
• Science Gateway is in active use
– Initial testing done on IU Quarry node
– Extensively tested job submission process using GFAC to LONI's QueenBee and TACC's Ranger
– Deployed 26 October 2010
– Implementation details available: http://wiki.bcf.uthscsa.edu/cauma/wiki/US2GFACTesting
35. GridChem Science Gateway
• A chemistry/material Science Gateway for running
computational chemistry codes, workflows, and parameter
sweeps.
• Integrates molecular science applications and tools for
community use.
• 400+ users heavily using TeraGrid. Consistently one of the top 5
TeraGrid Gateway users.
• Supports all popular Chemistry applications including
Gaussian, GAMESS, NWChem, QMCPack, Amber, MolPro, and
CHARMM
• ParamChem is a follow-on project to develop workflows for
chemical parameter studies and provide the infrastructure to
execute them.
36. Empirical Force Fields Parameterization: Need, Process
• Lack of accurate force fields produces erroneous property estimation
Fig. 1. Errors (V) in electrostatic potential on a surface at 1.8 times van der Waals radii around N-methyl propanamide for two models. (Left) Point charges; (right) charge, dipole, and quadrupole on C, N, and O; charge and dipole on H. The errors are much reduced in the multipole approach.
A. J. Stone, Science 321, 787-789 (2008). Published by AAAS.
Vanommeslaeghe et al., J. Comp. Chem. 2010, 31, 671-690
38. Conclusions
• Our project focus is providing long-term
sustainable software for science gateways.
• What we learned:
– Try to serve a few high profile collaborators very well.
• Derive good software engineering practices from this:
versioning, code reviews, testing, packaging, portability, …
– Define and keep to your project’s scope.
– Let the collaborations determine the direction of
innovation
• This is more than just getting “customer requirements”.
Collaborators expect you to know your field and guide them.
• There is a tension between this and research
– “Collaborators, not customers” is the resolution.
39. More Information
• OGCE Web Site: http://www.collab-ogce.org
• News Feed/Blog: http://collab-ogce.blogspot.com
• Contact us:
– ogce-discuss@googlegroups.com
– http://groups.google.com/group/ogce-discuss/
• Software Downloads: Software is available as
tagged SVN releases from our SourceForge project.
– http://sourceforge.net/projects/ogce/
– See http://www.collab-ogce.org/ogce/index.php/Portal_download
41. OGCE Partners and People
Institution: People
• Indiana University: Marlon Pierce, Suresh Marru, Raminder Singh, Archit Kulshrestha, Gerald Guo
• NCSA/UIUC: Sudhakar Pamidighantam, Shaowen Wang, Yan Liu
• Purdue University: Carol Song, Lan Zhao, David Braun, Shawn Wu
• UTHSCSA: Emre Brookes, Borries Demeler, Bruce Dubbs
42. Award Highlights
• Full Circle Development
– Directly fund both software developers and gateway
consumers.
• Directly supported (non-IU) gateways:
– UltraScan (UTHSCSA), GridChem (NCSA), SimpleGrid
(UIUC), Purdue CCSM and Environmental Gateways
– Among the most used TG gateways.
• Sustainability strategy: Apache Incubator for
workflow suite of tools
– XBaya, GFac, and supporting services.
43. SimpleGrid, GISolve
• Short term goal: develop SimpleGrid Gadgets
deployable into gadget container.
– Must meet security requirements
– Support PHP development
– Support interactivity requirements
• Integrate YUI JavaScript libraries with Gadget
JavaScript.
• Longer term goals: investigate workflow, job
management tools. Apply to GISolve
44. Purdue CCSM and Data Portals
• Short term goals: Develop CCSM and data
management gadgets and necessary backing
middleware.
– Interactivity and security requirements.
– Significant requirements overlap with SimpleGrid
• Longer term goals: Build gateways out of
gadgets hosted by multiple containers;
examine workflow and other tools.
48. BioVLAB-MMIA
• MicroRNAs (miRNAs)
– small (19-22 nucleotide) non-protein-coding RNA molecules
– regulate the expression of specific gene products
– effect translational blockade or message degradation
• MMIA: microRNA and mRNA integrated analysis
• Computation in the Cloud
• MMIA expertise in workflow
54. BioVLAB Summary
• Usability (Reconfigurable environments)
– By adopting the SaaS model of Cloud Computing, BioVLAB lets end-users simply launch the pre-composed BioVLAB workflows.
– With XBaya, users can easily customize a workflow by modifying just a few components and input parameters.
• Flexibility (Full privileges)
– Following the IaaS model, BioVLAB workflow developers have flexibility in handling computing resources and implementing applications with Amazon Cloud. They can choose specific system resources to satisfy their needs, with fully controlled access.
• Reduced processing time and cost
– Users can run as many servers as they need and control their usage time. This reduces research cost and the initial time needed to construct physical infrastructure for research.
55. Background: What is AUC?
• AUC (analytical ultracentrifugation) is an important technique for the solution study of macromolecules
– Molecules are not fixed to a microscope grid
– Molecules are not distorted by crystal packing forces (vs X-Ray crystallography)
– Very large size range (complements cryo-EM and NMR)
– Dynamic processes can be studied
– Conformational changes
56. Background: What is AUC?
• Sample placed in cell
• Run ultracentrifuge
– Usually 20-60k RPM
• Collect data
– 4 to 24 hours or more
• Analyze the data
57. TG SG Usage 2007-10
• Job statistics for UltraScan project for approximately the last 4 years.
• Only partial data is available for 2007 (2nd half) and 2010 (thru June), and only successful runs are included.
• Totals of CPU hours consumed from TeraGrid, UTHSCSA and international resources
• Number of investigators whose data were analyzed (left Y-axis), and number of submitted jobs (right Y-axis).
• Both panels indicate increasing usage and need for TeraGrid resources and an increasing number of investigators requiring access to these resources.
59. User Community: Publications
• Since the development of our advanced methods, virtually every publication from our lab has used these methods
• We currently count 35 peer reviewed journal publications and poster abstracts
• Many additional presented talks where these methods have provided important new detail to the investigations of biological as well as synthetic polymer systems
• We are aware of at least another 25 publications that were facilitated by our methods from other laboratories using our TeraGrid applications
60. Conclusion
• We focus initially on one component per
gateway.
– SimpleGrid, CCSM, Data Portal: gadgets
• Other gadget based gateways at UC
– GridChem: Xbaya
– UltraScan: GFac
• Goal is to establish an Apache-style
meritocracy for contributed code.
• Making distributed teams work: hacking
retreats.
61. OGCE Gateway Tool Adaptation & Reuse
[Figure: tools flow from contributing projects into OGCE, which re-engineers, generalizes, builds, tests, and releases them for reuse by other gateways.]
• Projects named in the figure: LEAD, GridChem, UltraScan, TeraGrid User Portal, OGCE Team, OVP/RST/MIG, BioVLab, ODI, Bio Drug Screen, EST Pipeline, Future Grid
• Tools named in the figure: GFac (including Axis2 GFac), XBaya, XRegistry and XRegistry Interface, FTR, Eventing System (including Axis2 Eventing System), Experiment Builder, GC Middleware, Resource Discovery Service, GPIR, File Browser, Workflow Suite, Gadget Container, GTLab, JavaScript CoG, Resource Prediction, Swarm->GFac
62. Software Strategy
• Focus on gadget container and tools for running
science applications on grids and clouds.
• Provide a tool set that can be used in whole or in
part.
– If you just want GFac, then you can use it without
buying an entire framework.
• Outsource security, information services, data
and metadata, etc to other providers.
– MyProxy, TG IIS, Globus, Condor, XMC Cat, iRods, etc.
63. Advanced Support Scenarios
• GridChem/ParamChem workflow support
• UltraScan Job Submission (GFAC)
• EST Pipeline
– Bioinformatics pipeline for managing mass job
submission.
64. More Information
• This is downloadable, packaged software.
– Apache Maven build system provides everything you need to build the gadget container, gadgets, workflow composer, and backing services.
– Get code by anonymous SVN checkout.
• Email: mpierce@cs.indiana.edu, smarru@cs.indiana.edu, ogce-discuss@googlegroups.com
• OGCE Web Site: www.collab-ogce.org
• Blog/News Feed: http://collab-ogce.blogspot.com/
65. Acknowledgements and People
• Funding by TeraGrid GIG, RP and by OCI SDCI
• IU: Marlon Pierce, Suresh Marru, Raminder
Singh, Archit Kulshrestha, Zhenhua Guo
• TACC: Maytal Dahan, Rion Dooley
• SDSC: Nancy Wilkins-Diehr, Jeff Sale
• SDSU: Mary Thomas
67. Molecular Force Field Cyberenvironments
[Figure: Parameter Initialization and Optimization Workflow]
• Inputs shown: Parameter definitions, Model/Reference Data Definition, Merit Function Specification, Optimization Methods Choice, Expert Interface
• Steps shown: Consistency Checker, Optimization Job Launcher, Workflow Manager, Optimization Monitor, Parameter Testing, Parameter Sensitivity Analysis, Update Parameter Database with new set, Notification of End of Workflow
• Decision points shown: Optimization Incomplete?, Job Completed?, Successful Testing
68. OGCE Alumni
• We also gratefully acknowledge the
contributions of participants in previous
incarnations of the OGCE:
– TACC: Maytal Dahan, Rion Dooley
– SDSU: Mary Thomas
– SDSC: Nancy Wilkins-Diehr, Jeff Sale
– LSF: Srinath Perera, Sanjiva Weerawarana