2. Data: document access is through the DataManager interface locally, DataPeer remotely.
• Each DataManager instance has a unique namespace within its peer group and stores identifiers in that namespace.
• A stored identifier resolves to one of: an LSID reference, a file reference, a list (depth plus a list of child identifiers) or an error (depth plus detail).
• Zero or more reference scheme instances point to identical immutable data.
• Reference schemes are configured through plugins (an extension point); each scheme declares the locational context it requires:
  – LSID: no context required?
  – URL: local network name, subnet mask
  – File: file system name and mount point
  – …?: whatever you need here
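The reference-scheme extension point described above might look roughly like the following sketch. The interface, method names and context keys here are illustrative assumptions, not the actual T2 plugin API:

```java
import java.util.*;

public class RefSchemeSketch {
    // Hypothetical extension-point contract: each scheme declares the
    // locational context it needs before it can be dereferenced.
    interface ReferenceScheme {
        /** Context keys this scheme needs before it can be resolved. */
        Set<String> requiredContext();
        /** Dereference to the immutable data, given a locational context. */
        byte[] resolve(Map<String, String> context);
    }

    // A file reference is only valid where its file system is mounted.
    static class FileReference implements ReferenceScheme {
        final String path;
        FileReference(String path) { this.path = path; }
        public Set<String> requiredContext() {
            return Set.of("filesystem.name", "mount.point");
        }
        public byte[] resolve(Map<String, String> ctx) {
            if (!ctx.keySet().containsAll(requiredContext()))
                throw new IllegalStateException("wrong locational context");
            return ("contents of " + ctx.get("mount.point") + path).getBytes();
        }
    }

    public static void main(String[] args) {
        ReferenceScheme ref = new FileReference("/data/result.txt");
        System.out.println(ref.requiredContext().size()); // 2
    }
}
```

A URL scheme would declare network name and subnet mask as its required context in the same way.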
3. Example nested list structure: List1 contains List2 (holding Leaf1 and Leaf2) and List3 (holding Leaf3). Identifiers in the boxes are those from the previous slide.
Appears on the data link as:
Leaf3[1,0] List3[1] Leaf2[0,1] Leaf1[0,0] List2[0] List1[]
• A downstream process filters on the event depth it needs:
  – If the minimum depth is too high it iterates, discarding all but the finest-grained events.
  – If the maximum depth is too low it wraps the data in a new single-element collection, discarding all but the root event.
• Processors (or, more accurately, service proxies) can now emit results piece by piece: a sensor proxy that can emit a temperature reading / cell count / image every ten seconds, or a database query that returns rows one at a time from the data server.
• Management of collection events is handled by the framework.
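The depth matching above (iterate when the data is too deep, wrap when it is too shallow) can be sketched as a small Java helper. `iterate` and `wrap` are hypothetical names for illustration, not framework API:

```java
import java.util.*;

public class DepthAdapter {
    /** Flatten a nested collection until each emitted element has the wanted depth. */
    static List<Object> iterate(List<?> collection, int depth, int wanted) {
        List<Object> out = new ArrayList<>();
        if (depth == wanted) { out.add(collection); return out; }
        for (Object child : collection) {
            if (child instanceof List<?>)
                out.addAll(iterate((List<?>) child, depth - 1, wanted));
            else
                out.add(child); // leaf reached: finest-grained event
        }
        return out;
    }

    /** Wrap an item in singleton lists until it reaches the wanted depth. */
    static Object wrap(Object item, int depth, int wanted) {
        while (depth < wanted) {
            item = Collections.singletonList(item);
            depth++;
        }
        return item;
    }

    public static void main(String[] args) {
        // The structure from the slide: List1 = [List2=[Leaf1, Leaf2], List3=[Leaf3]]
        List<Object> list1 = Arrays.asList(
            Arrays.asList("Leaf1", "Leaf2"), Arrays.asList("Leaf3"));
        // A depth-2 collection fed to a process wanting depth-0 items:
        System.out.println(iterate(list1, 2, 0)); // [Leaf1, Leaf2, Leaf3]
        // A depth-0 leaf fed to a process wanting a depth-1 collection:
        System.out.println(wrap("Leaf1", 0, 1));  // [Leaf1]
    }
}
```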
4. Taverna 2 opens up the per-processor dispatch logic.
• Dispatch layers can ignore, pass unmodified, block, modify or act on any message, and can communicate with adjacent layers.
• Each processor contains a single stack of arbitrarily many dispatch layers.
• A single dispatch layer receives job specification messages (a job queue & service list, or a single job & service list) from the layer above, and data and error messages from the layer below; it emits result and fault messages.
• Dispatch layer composition allows for complex control flow within a given processor.
• DispatchLayer is an extensibility point. Use it to implement dynamic binding, caching, recursive behaviour…?
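The layer contract might be sketched as below: jobs travel down the stack, results and faults travel back up, and each layer is free to forward, transform or consume a message. The interface and class names are assumptions for illustration, not the real T2 `DispatchLayer` API:

```java
import java.util.*;

public class DispatchStackSketch {
    // Hypothetical two-direction message contract for one layer.
    interface Layer {
        void jobDown(String job);      // job messages from the layer above
        void resultUp(String result);  // result messages from the layer below
    }

    // A pass-through layer that just logs and forwards in both directions.
    static class Logging implements Layer {
        Layer below, above;
        final List<String> log = new ArrayList<>();
        public void jobDown(String j)  { log.add("down:" + j); if (below != null) below.jobDown(j); }
        public void resultUp(String r) { log.add("up:" + r);   if (above != null) above.resultUp(r); }
    }

    // The bottom-most layer invokes the service and sends the result back up.
    static class Invoke implements Layer {
        Layer above;
        public void jobDown(String j)  { above.resultUp("result(" + j + ")"); }
        public void resultUp(String r) { /* nothing sits below this layer */ }
    }

    public static void main(String[] args) {
        Logging top = new Logging();
        Invoke bottom = new Invoke();
        top.below = bottom;
        bottom.above = top;
        top.jobDown("job1");
        System.out.println(top.log); // [down:job1, up:result(job1)]
    }
}
```

Composition follows naturally: any number of layers can be chained between `top` and `bottom`, each seeing every message in both directions.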
5. This dispatch stack configuration replicates the current Taverna 1 processor logic, in that retry sits within failover and both sit within the parallelize layer.
Parallelize
• Ensures that at least 'n' jobs are pulled from the queue and sent to the layer below.
• Reacts to faults and results by pulling more jobs off the queue and sending them down, passing the fault or result message back up to the stack manager.
Failover
• Responds to job events from above by storing the job, removing all but one service from the service list and passing the job down.
• Responds to faults by fetching the corresponding job, rewriting the original service set to include only the next service and resending the job down. If no more services are available, propagates the fault upwards.
• Responds to results by discarding any failover state for that job.
Retry
• Responds to jobs by storing the job along with an initial retry count of zero.
• Responds to faults by checking the retry count, and either incrementing it and resending the job, or propagating the fault message if the count is exceeded.
Invoke
• Responds to jobs by invoking the first concrete service in the service list with the specified input data.
• Sends fault and result messages to the layer above.
Layers can occur multiple times: you could easily have retry both above and below the failover layer, for example.
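As a concrete illustration of one of these layers, the retry behaviour described above can be sketched in a few lines. The class and method names, and the use of plain strings for job identity, are simplifying assumptions:

```java
import java.util.*;

public class RetryLayerSketch {
    // Minimal retry logic: store a retry count per job, resend the job on a
    // fault, and propagate the fault once the count is exceeded.
    static class RetryLayer {
        final int maxRetries;
        final Map<String, Integer> counts = new HashMap<>();
        final List<String> sentDown = new ArrayList<>(); // jobs sent to layer below
        final List<String> faultsUp = new ArrayList<>(); // faults propagated above

        RetryLayer(int maxRetries) { this.maxRetries = maxRetries; }

        void receiveJob(String jobId) {     // job from the layer above
            counts.put(jobId, 0);           // initial retry count of zero
            sentDown.add(jobId);
        }

        void receiveFault(String jobId) {   // fault from the layer below
            int n = counts.merge(jobId, 1, Integer::sum);
            if (n <= maxRetries) sentDown.add(jobId); // increment and resend
            else faultsUp.add(jobId);                 // count exceeded: give up
        }

        void receiveResult(String jobId) {  // success: discard retry state
            counts.remove(jobId);
        }
    }

    public static void main(String[] args) {
        RetryLayer retry = new RetryLayer(2);
        retry.receiveJob("job1");
        retry.receiveFault("job1"); // retry 1
        retry.receiveFault("job1"); // retry 2
        retry.receiveFault("job1"); // exhausted, fault propagated
        System.out.println(retry.sentDown); // [job1, job1, job1]
        System.out.println(retry.faultsUp); // [job1]
    }
}
```

Failover follows the same shape, but rewrites the job's service set instead of counting attempts.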
6. ‘Service’ in this case means ‘Taverna 2 proxy to something we can invoke’ – the name might change!
Service invocation is asynchronous by default – all AsynchronousService implementations should return control immediately and, ideally, use thread pooling amongst instances of that type.
Results and failure messages are pushed to an AsynchronousServiceCallback object, which also provides the necessary context to the invocation:
• DataManager – resolves input data references; registers result data to get an identifier to return.
• SecurityManager – provides a set of security agents available to manage authentication against protected resources.
• Provenance Connector – allows explicit push of actor-state P-assertions to a connected provenance store for invocation-specific metadata capture.
• Message Push – used to push fault and result messages back to the invocation layer of the dispatch stack.
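The asynchronous invocation pattern above can be sketched with standard Java concurrency. The `Callback` interface here is a toy stand-in for `AsynchronousServiceCallback` (which, per the slide, also exposes the DataManager, SecurityManager and provenance connector); `invokeAsync` is a hypothetical name:

```java
import java.util.concurrent.*;

public class AsyncInvokeSketch {
    // Toy callback contract: results and faults are pushed, never returned.
    interface Callback {
        void result(String data);
        void fault(Exception e);
    }

    // A pool shared amongst instances of the service type, as the slide suggests.
    static final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Returns control immediately; the pool runs the invocation and pushes
    // the outcome to the callback.
    static void invokeAsync(String input, Callback cb) {
        pool.submit(() -> {
            try {
                cb.result("echo:" + input); // stand-in for the real invocation
            } catch (Exception e) {
                cb.fault(e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> got = new CompletableFuture<>();
        invokeAsync("42", new Callback() {
            public void result(String d) { got.complete(d); }
            public void fault(Exception e) { got.completeExceptionally(e); }
        });
        System.out.println(got.get(5, TimeUnit.SECONDS)); // echo:42
        pool.shutdown();
    }
}
```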
7. In this scenario the security agent is discovered based on the service, a message is passed to the agent to be signed, and that signed message is relayed on to the service. The client consults a policy engine, and the agent holds the set of credentials. Credentials never leave the agent!
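The ‘credentials never leave the agent’ property can be illustrated with standard Java crypto: the agent keeps the private key in a private field, callers hand it messages and get back only signatures. The `SecurityAgent` class is a toy stand-in, not the T2 security agent:

```java
import java.nio.charset.StandardCharsets;
import java.security.*;

public class SecurityAgentSketch {
    // The agent alone holds the private key; callers only ever see signatures.
    static class SecurityAgent {
        private final PrivateKey key;   // never exposed outside the agent
        final PublicKey publicKey;      // safe to publish for verification

        SecurityAgent() throws Exception {
            KeyPair kp = KeyPairGenerator.getInstance("RSA").generateKeyPair();
            key = kp.getPrivate();
            publicKey = kp.getPublic();
        }

        byte[] sign(byte[] message) throws Exception {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(message);
            return s.sign();
        }
    }

    public static void main(String[] args) throws Exception {
        SecurityAgent agent = new SecurityAgent();
        byte[] msg = "invoke service".getBytes(StandardCharsets.UTF_8);
        byte[] sig = agent.sign(msg);   // the message goes to the agent...
        Signature v = Signature.getInstance("SHA256withRSA");
        v.initVerify(agent.publicKey);
        v.update(msg);
        System.out.println(v.verify(sig)); // true; the credentials never left
    }
}
```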
8. Taverna 2 combines data managers, workflow enactors and security agents into transient collaborative virtual experiments within a peer group (e.g. a JXTA group acting as a virtual experiment session). These groups can be shared, their membership managed over time, and they can persist beyond a single workflow run. Each user contributes a policy engine and a set of credentials; the group also links external services, external data stores (e.g. SRB), an enactor and multiple data managers.
9. Define a workflow as nested boundaries of control.
Each boundary pushes its identifier onto an ID stack on data entering it and pops it when exiting.
When a new ID is created, the controlling entity registers with a singleton monitor tree, attaching to the parent identified by the path defined by the previous value of the ID stack on that data.
[Diagram: a monitor tree rooted at WF1, containing processors P1, P2 and P3 and nested-workflow nodes WF1_1 (iteration over a nested workflow here), WF2_1 and WF2_2, the WF2 instances each containing Q1.]
Each node defines a set of properties. If a property is mutable it can be used to steer the enactment. Properties could include parallelism setting, service binding criteria, current job queue length, queue consumption, number of failures in the last minute…
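The push/pop and registration mechanics above can be sketched as follows; the method names and the map-based tree are illustrative assumptions, not the T2 monitor API:

```java
import java.util.*;

public class MonitorTreeSketch {
    // Singleton monitor tree: maps a parent path (an ID-stack snapshot)
    // to the child node IDs registered under it.
    static final Map<List<String>, List<String>> tree = new HashMap<>();

    // On entering a boundary of control: register the new node under the
    // parent identified by the previous value of the ID stack, then push.
    static void enter(Deque<String> idStack, String boundaryId) {
        List<String> parentPath = new ArrayList<>(idStack);
        tree.computeIfAbsent(parentPath, k -> new ArrayList<>()).add(boundaryId);
        idStack.addLast(boundaryId);
    }

    // On exiting a boundary of control: pop its identifier.
    static void exit(Deque<String> idStack) {
        idStack.removeLast();
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        enter(stack, "WF1");    // registers WF1 under the root path []
        enter(stack, "P1");     // registers P1 under [WF1]
        exit(stack);            // leave P1
        enter(stack, "WF1_1");  // registers WF1_1 under [WF1]
        System.out.println(tree.get(Arrays.asList("WF1"))); // [P1, WF1_1]
    }
}
```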
10. Due December 2007 in ‘visible to end user’ form.
This release will probably not include everything, especially steering agents and virtual experiment management.
Early tech preview real soon now [tm].
Complete code rewrite; current status is around 90% complete on the enactor and data manager core.
Java code in CVS on SourceForge, project name is ‘taverna’, CVS module is ‘t2core’.
Licensed under the LGPL at present.
Hands-on session later if anyone’s interested?
11. Core Investigators: Matthew Addis, Andy Brass, Alvaro Fernandes, Rob Gaizauskas, Carole Goble, Chris Greenhalgh, Luc Moreau, Norman Paton, Peter Rice, Alan Robinson, Robert Stevens, Paul Watson, Anil Wipat
Postgraduates: Tracy Craddock, Keith Flanagan, Antoon Goderis, Alastair Hampshire, Duncan Hull, Martin Szomszor, Kaixuan Wang, Qiuwei Yu, Jun Zhao
Research and Development: Nedim Alpdemir, Pinar Alper, Khalid Belhajjame, Tim Carver, Rich Cawley, Justin Ferris, Matthew Gamble, Kevin Glover, Mark Greenwood, Ananth Krishna, Matt Lee, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Arijit Mukherjee, Tom Oinn, Stuart Owen, Juri Papay, Savas Parastatidis, Matthew Pocock, Stefan Rennick-Egglestone, Ian Roberts, Martin Senger, Nick Sharman, Stian Soiland, Victor Tan, Franck Tanoh, Daniele Turi, Alan R. Williams, David Withers, Katy Wolstencroft and Chris Wroe
Pioneers: Hannah Tipney, May Tassabehji, Medical Genetics team at St Marys Hospital, Manchester, UK; Simon Pearce, Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Doug Kell, Peter Li, Manchester Centre for Integrative Systems Biology, UoM, UK; Andy Brass, Paul Fisher, Bio-Health Informatics Group, UoM, UK; Simon Hubbard, Faculty of Life Sciences, UoM, UK
Funding: EPSRC, Wellcome Trust, OMII-UK
Industrial: Dennis Quan, Sean Martin, Michael Niemi (IBM), Mark Wilkinson (BioMOBY)
Additional T2 thanks to Matthew Pocock, Thomas Down & David DeRoure amongst others!
Please see http://www.mygrid.org.uk/wiki/Mygrid/Acknowledgements for most up to date list