Opportunities and Challenges for Running Scientific Workflows on the Cloud
Yong Zhao, Xubo Fei, Ioan Raicu, Shiyong Lu
Cyber-Enabled Distributed Computing and Knowledge Discovery
(CyberC), 2011 International Conference
Ying Lian
Computer Science, WSU
4. INTRODUCTION
Cloud computing is gaining tremendous momentum in both
academia and industry.
“Cloud Computing”: a large-scale distributed computing
paradigm that is driven by economies of scale, in which a
pool of abstracted, virtualized, dynamically-scalable,
managed computing power, storage, platforms, and
services are delivered on demand to external customers
over the Internet.
So far mostly applied to Web applications and business
applications. To support workflow applications,
a link is still missing.
5. INTRODUCTION
Manage and run workflow applications on the cloud
(especially data-intensive scientific workflows)
Several Scientific workflow management systems
(SWFMSs) have been applied.
Cloud Workflow: specification, execution, and
provenance tracking of scientific workflows, as well as
the management of data and computing resources to
enable the running of scientific workflows on the Cloud
Following sections: Meaning, challenges, research
opportunities
6. OPPORTUNITIES
Keywords: Infinite computing resource
1. The scale of scientific problems that can be addressed
by scientific workflows is now greatly increased. It
was previously upper-bounded by the size of a dedicated
resource pool, with limited resource sharing in
the form of virtual organizations.
Data size (e.g. GenBank doubles every 9-12 months): needs
vast storage space
Complexity of the applications (e.g. protein simulation by
iterative algorithms with a huge number of parameters): needs
massive computing resources
7. OPPORTUNITIES
2. The on-demand resource allocation
mechanism in Cloud has a number of
advantages over the traditional cluster/Grid
environments for scientific workflows:
Improve resource utilization: different stages of a
workflow require different numbers of resources.
Faster turn-around time for end users: dynamic scale
out/in
Enable a new generation of workflows: collaborative
scientific workflows, in which user interaction and
collaboration patterns are favored
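The utilization argument above can be made concrete with a small sketch. The stage names, node counts, and durations are hypothetical numbers chosen for illustration, and the model ignores provisioning delay, so this is a rough comparison under stated assumptions rather than a measurement.

```python
# Hypothetical workflow: (stage name, nodes needed, hours). Illustrative numbers only.
STAGES = [("preprocess", 4, 1.0), ("simulate", 64, 2.0), ("aggregate", 8, 0.5)]

def node_hours_static(stages):
    """A fixed pool must be sized for the widest stage and held for the whole run."""
    peak = max(nodes for _, nodes, _ in stages)
    total_hours = sum(hours for _, _, hours in stages)
    return peak * total_hours

def node_hours_on_demand(stages):
    """With on-demand allocation each stage holds only the nodes it needs."""
    return sum(nodes * hours for _, nodes, hours in stages)

def utilization(stages):
    """Fraction of the statically reserved node-hours that does useful work."""
    return node_hours_on_demand(stages) / node_hours_static(stages)
```

With these numbers the fixed pool reserves 224 node-hours while the stages only use 136, so roughly 40% of a statically sized cluster would sit idle.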
8. OPPORTUNITIES
3. Much bigger room for trade-off between
performance and cost.
Spectrum of resource investment: from dedicated
private resources, through hybrid local & cloud, to full
outsourcing on clouds
Cloud computing brings the opportunity to improve
the performance/cost ratio
But the optimization of this ratio and automatic tradeoff mechanism remain challenging.
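A toy model of the performance/cost trade-off, as a sketch only. It assumes independent, equal-length tasks, per-started-hour billing, and prices in integer cents; the function names and the pricing rule are assumptions made for illustration, not something the paper specifies.

```python
import math

def makespan_hours(num_tasks, task_hours, instances):
    """Independent, equal-length tasks run in waves of `instances` tasks."""
    return math.ceil(num_tasks / instances) * task_hours

def cost_cents(num_tasks, task_hours, instances, price_cents):
    """Assumed pricing: every instance is billed per started hour for the whole run."""
    billed_hours = math.ceil(makespan_hours(num_tasks, task_hours, instances))
    return instances * billed_hours * price_cents

def cheapest_within_deadline(num_tasks, task_hours, price_cents,
                             deadline_hours, max_instances):
    """Return (instances, makespan, cost) of the cheapest option meeting the
    deadline, preferring the faster option on equal cost, or None if none fits."""
    options = []
    for n in range(1, max_instances + 1):
        t = makespan_hours(num_tasks, task_hours, n)
        if t <= deadline_hours:
            options.append((cost_cents(num_tasks, task_hours, n, price_cents), t, n))
    if not options:
        return None
    c, t, n = min(options)  # tuple order: cost first, then makespan
    return n, t, c
```

For 100 half-hour tasks at 10 cents per instance-hour and a 5-hour deadline, 10 instances and 50 instances cost the same in this model, but the latter finishes five times sooner. Real pricing, data transfer fees, and task dependencies make the optimization much harder, which is the challenge the slide points at.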
9. CHALLENGES
Architectural challenges
Integration challenges
Computing challenges
Data management challenges
Language challenges
Service management challenges
10. Architectural Challenges
User interface customizability and support
Reproducibility support
Heterogeneous and distributed services and
software tools integration
Heterogeneous and distributed data product
management
High-end computing support
Workflow monitoring and failure handling
Interoperability
12. Deploy the architecture: solutions
Operation: SWFMS running out of the Cloud. No concern of
vendor lock-in. SWFMS itself cannot benefit from the
scalability.
Task management: Not on a batch-based schedule. Deploy
immediately without sequence. Cost of storage of
provenance & data products.
Workflow management: Presentation Layer deployed at a
client machine. Suitable for ad hoc domain-specific
requirement. More dependent on Cloud platform.
All in the Cloud: SWFMS inside the cloud, and accessed via
Web browser. Highly scalable: Software as a Service.
Cost; Dependency; Vendor lock-in.
13. Integration Challenges
How to integrate scientific workflow systems with Cloud
infrastructure and resources?
Operation layer: applications, services, and tools are hosted in
the Cloud, while the scheduling and management of a
workflow stay outside the Cloud (e.g. Google Maps services,
with ad hoc scripts and programs used to glue the services
together).
Task management layer: resource provisioning. (e.g. Nimbus)
Workflow management layer: Debugging, monitoring, and
provenance tracking
All in cloud: porting issue. Need a workflow engine at cloud
end, and web interface or thin client at user end
14. Language Challenges
MapReduce: a widely used computing model with two
key functions, Map and Reduce. --White-Box
SwiftScript serves as a general purpose coordination
language, where existing applications can be invoked
without modification. --Black-Box
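A minimal single-process sketch of the Map and Reduce functions, using word counting as the example. The driver `map_reduce` and the function names are hypothetical and stand in for what a real framework does at scale; the point is only that the user writes the application logic inside the two functions, which is one way to read the "White-Box" label.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)

def map_reduce(records, mapper, reducer):
    """Tiny driver: map every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Calling `map_reduce(["cloud workflow", "Cloud data"], map_fn, reduce_fn)` counts "cloud" twice and the other words once. A coordination language such as SwiftScript would instead invoke an existing word-count program unchanged and only describe how its inputs and outputs connect.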
15. Language Challenges
Handle the mapping from input and output data into
logical structures.
Support large-scale parallelism via either implicit
parallelism, or explicit declaratives.
Support data partitioning and task partitioning.
Require a scalable, reliable, and efficient runtime system
that can support Cloud-scale task scheduling and
dispatching, provide error recovery and fault tolerance.
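Data partitioning combined with parallel execution can be sketched with the Python standard library. This is a local thread pool, not a Cloud-scale runtime; `partition` and `run_partitioned` are hypothetical helper names, and scheduling, fault tolerance, and error recovery are deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(items, num_parts):
    """Split a list into at most num_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_parts)
    chunks, start = [], 0
    for i in range(num_parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than parts
            chunks.append(items[start:end])
        start = end
    return chunks

def run_partitioned(task, items, num_parts=4):
    """Apply `task` to every chunk concurrently and return results in chunk order."""
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        return list(pool.map(task, partition(items, num_parts)))
```

The caller never spawns threads explicitly: `run_partitioned(sum, list(range(10)))` expresses what to compute per partition, and the runtime decides how to run it, which is the flavor of implicit parallelism the slide asks a workflow language to support.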
16. Computing Challenges
Workflow systems may not be able to talk to Cloud
resources directly; middleware services are needed
(e.g. Nimbus or Falkon to handle the resource provisioning
and task dispatching).
More complicated when considering workflow resource
requirements, data dependencies, and Cloud virtualization.
A SWFMS will try to recover automatically when non-fatal
errors happen. Smart-return: detailed execution info is
logged for workflow restart.
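The idea of logging execution info so that a workflow can be restarted can be sketched as follows. This is not how any particular SWFMS implements it: the JSON log format, the `run_workflow` name, and the assumption that tasks run sequentially and return JSON-serializable results are all made up for illustration.

```python
import json
import os

def run_workflow(tasks, log_path):
    """Run (name, callable) tasks in order, skipping any already recorded in the log.

    Returns the list of task names actually executed in this invocation.
    Task results must be JSON-serializable (an assumption of this sketch).
    """
    done = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            done = json.load(f)
    executed = []
    for name, func in tasks:
        if name in done:
            continue  # finished in an earlier run; its logged result is reused
        done[name] = func()  # an exception here propagates and leaves the log intact
        executed.append(name)
        with open(log_path, "w") as f:
            json.dump(done, f)  # persist after every task so a restart can resume
    return executed
```

If the second of two tasks fails, running the same workflow again after fixing it executes only that second task, because the first one is already in the log.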
17. Data management challenges
When data intensiveness increases, the management of
data resources and of dataflow between the storage and
compute resources becomes the bottleneck.
Data locality: as CPU cycles get cheaper and data volumes
inflate, data location becomes the main challenge, rather
than the computational resources.
Combining compute and data management: need to
minimize the amount of data movement. Otherwise,
significant underutilization of raw resources will result.
Provenance: derivation history of a data product. Tracking
is needed across service providers and across different
abstraction layers. Secure access is also missing for now.
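Minimizing data movement can be illustrated with a greedy placement rule: run a task on the node that already holds most of its input bytes. The node names, dataset sizes, and both function names are hypothetical, and a real scheduler would also weigh node load and network topology.

```python
def bytes_to_move(task_inputs, node, location, size):
    """Bytes that must be transferred if the task runs on `node`."""
    return sum(size[d] for d in task_inputs if location[d] != node)

def place_task(task_inputs, nodes, location, size):
    """Pick the node that minimizes data movement (ties go to the first node listed)."""
    return min(nodes, key=lambda n: bytes_to_move(task_inputs, n, location, size))

# Hypothetical example: dataset -> node holding it, and dataset -> size in MB.
LOCATION = {"a": "n1", "b": "n2", "c": "n2"}
SIZE_MB = {"a": 10, "b": 500, "c": 40}
```

Here `place_task(["a", "b", "c"], ["n1", "n2"], LOCATION, SIZE_MB)` chooses `n2`, which needs only the 10 MB dataset shipped to it, whereas running on `n1` would move 540 MB.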
18. Service management challenges
The engineering of the components of an SWFMS as
services:
thousands of services developed and available for the
myExperiment project
the LEAD system has developed a tool to wrap and convert
ordinary science applications into services
The orchestration and invocation of services from an
SWFMS
managing the large number of service instances
data movements across different service instances
19. RESEARCH DIRECTIONS
Emphasis on workflow reference architecture and direct
research effort to foregoing layers
Great leap on Middleware development: resource
management, monitoring, messaging
Many-Task Computing (MTC): preliminarily applied in Grids
and supercomputers, expected to be greatly improved for
the Cloud
Scripting: mixture of semantics, combination of
application of services…
Cost optimization: very challenging, but rewarding too
20. RESEARCH DIRECTIONS
SWFMS security
Access control: critical because of the nature of
clouds (dynamic, with large-scale data and service sharing)
Information flow control: ensure that information related to
a scientific workflow is propagated only to authorized ends
Secure electronic transaction protocol: pay-as-you-go
pricing model
21. CONCLUSIONS
As more customers and applications migrate into the Cloud,
the need for workflow systems to manage
complex tasks will become more urgent
For now, mash-ups and MapReduce-style task management
have been acting in place of a workflow system in the
Cloud
The opportunities and challenges in bringing workflow
systems into the Cloud are discussed
They identify key research directions in realizing scientific
workflows in Cloud environments
Cloud Computing has proven to be one of the great disruptive technologies of our time, and the effects of its increasing adoption and maturation will ripple out. Cloud Computing is here to stay, and as developers become more aware of the immense potential.