Talk given at APAN26 in beautiful Queenstown, New Zealand. Updates my INGRID talk with new material on caBIG--including some nice slides provided by Ravi Madduri.
2. 2
Thanks!
DOE Office of Science
NSF Office of Cyberinfrastructure
National Institutes of Health
Colleagues at Argonne, U.Chicago, USC/ISI,
OSU, Manchester, and elsewhere
5. 5
Scientific Communication, ~2000
Data ArchivesData Archives
User
Analysis toolsAnalysis tools
Gateway
Figure: S. G. Djorgovski
Discovery toolsDiscovery tools
Service-Oriented Science
6. 6
Application Scenario
Location A
Microarray, Protein,
Image data
Location B
Microarray, Protein,
Image data
Location C
Microarray, Protein,
Image data
Location C
Image Analysis
Location D
Image Analysis
Microarray and protein
databases at other institutions
Different database
systems, data
representations,
security
Different
program
invocation,
remote access,
data transfer
7. 7
caBIG: sharing of infrastructure, applications, and
data.
Data
Integration!
Services
& Cancer Biology
Globus
8. 8
Service-Oriented Science
People create services (data, code, instr.) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
9. 9
Creating Services
People create services (data, code, instr.) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
10. 10
Anatomy of a Service
op1 opN (meta)data
Implementation(s)
Clients
Registry
Management
Clients
…
Service
Service
Attribute
Authority
Attribute
Authority
Persistence
11. 11
Creating Services (~2005)
“This full-day tutorial provides an
introduction to programming Java
services with the latest version of the
Globus Toolkit version 4 (GT4). The
tutorial teaches how to build a Java
Service that makes use of GT4
mechanisms for state management,
security, registry and related topics.”
14. 14Center for Enabling Distributed Petascale Science
Workflow Automation at DOE Facilities
Automation
Reproducibility
Security
Reusability
Storage
Metadata
Analysis
Visualization
Advanced Photon
Source
15. 15
Discovering Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
16. 16
The ultimate arbiter?
Types, ontologies
Can I use it?
Billions of services
Discovering Services
Assume success
Syntax,
semantics
Permissions
Reputation
A B
18. 18
Discovery (2):
Standardized Vocabularies
Core Services
Grid
Service
Uses Terminology
Described In
Cancer Data
Standards
Repository
Enterprise
Vocabulary
Services
References
Objects
Defined in
Service
Metadata
Publishes
Subscribes to
and Aggregates
Queries
Service
Metadata
Aggregated In
Registers To
Discovery
Client API
Index
Service
Globus
22. 22
Composing Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
23. 23
Composing Services:
E.g., BPEL Workflow System
Data Service
@ uchicago.edu
Analytic service
@ osu.edu
Analytic service
@ duke.edu
<BPEL
Workflow
Doc>
<Workflow
Inputs>
<Workflow
Results>
BPEL
Engine
link
caBiG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.
link
link
link
See also Kepler & Taverna
Globus
27. 27
Publishing Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
30. 30
Hosting Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
31. 31
The Two Dimensions
of Service-Oriented Science
Decompose across network
Clients integrate dynamically
Select & compose services
Select “best of breed” providers
Publish result as new services
Decouple resource & service providers
Function
Resource
Data Archives
Analysis tools
Discovery toolsUsers
Fig: S. G. Djorgovski
33. 33
Putting It Together for the Example Scenario
Location A
Microarray, Protein,
Image data
Location B
Microarray, Protein,
Image data
Location C
Microarray, Protein,
Image data
Location C
Image Analysis
Location D
Image Analysis
caGrid Service
Interfaces
caGrid
Environ-
ment
Registered
Object
Definitions
Advertise-
ment
Log on, Grid
credentials
Query and
Analysis
Workflow
Discovery
Microarray & protein
databases at other
institutions
34. 34
Lessons Learned
A convenient higher-level abstraction
Suitable for a subset of scientific use cases
Infrastructure need to be sustainable
Integrates well with hospital/cancer
center/experimental facility IT
infrastructure
Workflows are attractive to users
Scalability and provenance are important
No vendor lock-in (if you are careful)
User experience remains ambiguous
Early adopters are enthusiastic (50+ services)
35. 35
Services for Science
A new approach to communicating
A (not-so new) approach to structuring systems
They’re real
Excellent infrastructure and tools (Globus,
Introduce, gRAVI, Taverna, Swift, etc., etc.)
Substantial numbers of services out there
They’re challenging
Sociology: incentives, rewards
Infrastructure: hosting
Provenance: justifying “results”
Scaling: services, requests
Hinweis der Redaktion
I would mention data integration here: integration of different data types: tissue, (pre)clinical, genomics, proteomics….
Developer benefits:
Developers spend time developing scientific applications instead of rewriting code to turn into services.
Developer time spared through code reuse—legacy apps are wrappered into services instead of rewritten as services
Learning gRavi easier than learning service development—especially marshaling of security resources!
Traditional applications are not secure and can not provide an audit trail
Scientific benefits:
Scientists spend time developing workflows rather than moving data from application to application
Scientists interact with GUI to focus on scientific process not how to run applications.
Services provide easy access to all scientific tools in one integrated GUI.
Access to grid resources promotes cross disciplinary collaboration as it becomes easier to share data and applications across institutional boundaries
Use funnel animation?
The GT4 plug-in support semantic based service query. Users can input multiple (up to three, in current scenario three is enough. We can add more in our program upon request) service query criteria and input the corresponding value.
We can combine multiple criteria. The initial GUI only shows one query criteria, but more can be added by clicking “Add Service Query” button. For example, we can query the caGrid services whose “Research Center” name is “Ohio State University”, with Service Name “DICOMDataService”, and has operation “PullOp”.