How to Troubleshoot Apps for the Modern Connected Worker
A View on eScience
1. Collaborative eScience:
Evolving Approaches
Charles Severance
Rutgers CyberInfrastructure Meeting
April 4, 2006
www.dr-chuck.com
2. Outline
• A look back at the past 15 years
• Putting the “collab” in Collaborative eScience
• The current tools of Collaborative eScience
– Collaboration
– Portals
– Repository
• Reflecting on 15 years of Experience
– What is wrong with Middleware?
– Authorization and Authentication - Are we there yet?
• A “future” eScience Case Study
3. The Founding Concepts
• Scientific Domain
• Groups of People
• Common User Interface
• Data Sharing
– In the moment
– Long-term
• Experimental Equipment
• Compute
• Visualization
4. Over 15 Years of Collaborative eScience
SCIGate ?
OGCE Grid Portal
NEESGrid NEESIT
Worktools CHEF Sakai
Globus Tool Kit
UARC/SPARC
1991 - 1999 2000 2001 2002 2003 2004 2005 2006 2007
8. SPARC Software
• Written from scratch
– No Middleware
– No Portal Technology
• Three rewrites over 10 years
– NextStep
– Java Applets with server support
– Browser based - kind of like a portal
• At the end, in 2001 - it was ready for another
rewrite
9. Keys to SPARC Success
• Ten years of solid funding
– Team consistency
– Long enough to learn from “mistakes”
• Long term relationship between IT folks and
scientists - evolved over time - relationship
was “grey”
• Software rewritten several times over life of
project based on evolving user needs and
experience with each version of the program
• Portion of effort was invested in evaluation of
usability - feedback to developers
10. After SPARC: Now What?
• Getting people together is an important part of
collaborative eScience
– WorkTools - Based on Lotus Notes
– CHEF - Collaborative framework - Based on Java and
Jetspeed
– Sakai - Collaboration and Learning Environment - Java
• Critical point: Collaborative software is only one
component of eScience
• UM Focus: Building reusable user interface
technologies for the people part of collaborative
eScience
11. WorkTools
WorkTools - The “organic”
single-server approach - if
you build it (and give away
free acounts), they will
come…
Over 9000 users (2000 active) at the end of 2003
12. CompreHensive CollaborativE
Framework (CHEF)
• Fall 2001: CHEF Development begins
– Generalized extensible framework for building
collaboratories
• Funded internally at UM
• All JAVA - Open Source
– Jakarta Jetspeed Portal
– Jakarta Tomcat Servlet Container
– Jakarta Turbine Service Container
• Build community of developers through workshops
and outreach
13. CHEF Applications
• CourseTools Next Generation
• WorkTools Next Generation
• NEESGrid
• NSF National Middleware Grid Portal
14. NEESGrid - The Equipment
Network for
Earthquake
Engineering
Simulation
NSF Funded.
NCSA, ANL, USC/ISI, UM, USC, Berkeley, MSU
16. Overall Data Modeling Efforts
NEES
Site
Specifications Site A Site B Site C
Database
Equipment Equipment People
People
Project Trials
Experiments
Description Experiments Trials
Domain
Tsnumai Shake Table Geotech Centrifuge
Specific
Specimen Specimen Specimen Specimen
models
Common
Elements Units Sensors Descriptions
Data / Data Data Data
Observations
17. Capturing Video and Data
PTZ/ Still Camera Control
USB Capture Gateway
DT Client
Video DT Main System
BT848
Frames
DT Client
Data
DAQ
Capture
DT Client
Simulation
Coordinator Site B
Site A
18. Data Monitoring Tools
Still Image / Camera Control
Camera Control Still image
Gateway camera
control
^
< >
^
DT Main System ~ < >
Thumb-
nail
Creare
viewers
19. What’s in a name?
Sakai is named after
Hiroyuki Sakai of the Food
Channel Television
program “Iron Chef”.
Hiroyuki is renowned for his
fusion of French and
Japanese cuisine.
20. Sakai General Collaborative Tools
• Announcements
• Assignments
• Chat Room
• Threaded Discussion
• Drop Box
• Email Archive
• Message Of The Day
• News/RSS
• Preferences
• Resources
• Schedule
• Web Content
• Worksite Setup
• WebDAV
21. Requirements Overlap
Grid Computing
Quizzes Physics
Visualization
Grading Tools Research
Syllabus Collaboration
SCORM
Teaching
and Data Repository
Learning
Chat
Discussion Earthquake
Resources Research Large Data
Collaboration Libraries
23. Additional General Collaboration
Tools Under Development
• Wiki based on
Radeox
• Blog
• Shared Display
• Shared
Whiteboard
• Multicast Audio
• Multicast Video
These are works-in-progress by members of the Sakai eResearch community. There are no dates for release.
27. Integration and Applications
Atlas ITER CMS
Administration and Users
Configure: Atlas Portal
Experiment Gateway
Process Portal Gateway Desktop Gateway Technologies
Control
Knowledge Store
…
Sakai
SRB
KnowledgeStore
Opal Management
Clarens Experiment Components
Simulation
Process
Metadata
Control
…
…
Configure: ITER Portal
Experiment
Process Services and
BlueGene
MetaData
Control
Security
Clarens
Components
Identity
Globus
ORNL
Sakai
Knowledge Store
Opal
SRB
Sakai
…
SRB
Opal
Clarens
Metadata SciGate Petascale Petascale Resources
Production Compute Data
29. Scope of Collaborative E-Science
“..composing and orchestrating “..interoperability is key…”
many technologies…”
Portal
Technology
Data
Collaborative Sources
Tools
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
30. Portals are an excellent
technology for building a
User Interface for federated user interface
Collaborative E- across these disparate
components using
Science standards like JSR-168.
Portal
Technology
Data
Collaborative Sources
Tools
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
31. Portals may only
be an intermediate
step in the
Desktop
Applications
process..
Portal
Technology
Data
Collaborative Sources
Tools
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
32. Sakai is focused primarily
Focus of Sakai on integration with portals
Activity in eScience and working closely with
data repositories.
Dis
cus
sF
irst
Portal
Technology
Data
Collaborative Sources
Tools
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
33. Collaboration .vs. Portal
• Basic organization is about the • Basic organization is about the
shape of the people and groups thing it represents - Teragrid,
• Customization based on the “group NVO
leaders” • Site customization is based on
• New groups form quickly and the resource owners
organically
• Sometimes there is an
• Doing one thing at a time - chat,
upload - perhaps multiple active individual customization aspect
windows on a desktop • Many small rectangles to
• Very interactive provide a great deal of
• Think of navigation as picking a tool information on a single screen
or switching from one class to • Portals think of rectangles
another operating independently - like
• Think “Application” windows
• Think “Dashboard”
34. Sakai Portlet Version 0.2
Announcements (sakai.announcements)
Assignments (sakai.assignment)
• Tree View Chat Room (sakai.chat)
Discussion (sakai.discussion)
• Gallery View Gradebook (sakai.gradebook.tool)
Email Archive (sakai.mailbox)
• Proxy portlets Membership (sakai.membership)
Message Forums (sakai.messageforums)
• Source in SVN Preferences Tool (sakai.preferences)
Presentation (sakai.presentation)
• Configurable via Profile (sakai.profile)
Resources (sakai.resources)
properties file Wiki (sakai.rwiki)
Tests & Quizzes (sakai.samigo)
Roster (sakai.site.roster)
Schedule (sakai.schedule)
Site Info (sakai.siteinfo)
Syllabus (sakai.syllabus)
35. Sakai JSR-168 Portlet
• Web Services are used to login to Sakai
establish a session and retrieve a list of
Sakai Sites, Pages, and Tools
• The portlet is 100% stock JSR-168
– Works in Pluto, uPortal, and GridSphere
36. Three Variations
• Display the Sakai gallery - all of Sakai
except for the login and branding.
• Retrieve the hierarchy of sites, pages
and tools display in a tree view with the
portlet and show selected tools/pages in
an iframe within the portlet
• Proxy tool placement for a particular
Sakai tool such as sakai.preferences
46. Sakai is focused primarily
Focus of Sakai on integration with portals
Activity in eScience and working closely with
data repositories.
Portal
Technology
Data
Collaborative Sources
Tools
N ow
c us s
Di s
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
47. Collaboration .vs. Repository
• Many different systems may be • Generally one system for the
active at the same time area
• Systems evolve, improve, and • Long term strategic choice for
are often replaced every few institution
years • System focused on accessing,
• Systems focused on the indexing, curation, and storage
dynamic needs of users and • Millions of high quality objects
applications properly indexed
• Thousands of simultaneous • Data and metadata quality
online users • Must enforce standards and
• Performance tuning workflow to insure data quality
• Must be very easy to use; • Most use is very purposeful:
almost unnoticeable search, publish, add value
• Used informally hundreds of • Think “Library”
times per day per user
• Think “E-Mail”
48. Inbound Object Flow
Search
The lens or
View
disseminator
Reuse
understands
Sakai DR the data
model and is
Index Lens capable of
rendering the
objects. The
Store
Prepare for
lens is part of
Create and storage the DR.
use in
Data
native form
Model
Curate, convert,
update and
maintain over
Ingest time
The DR establishes a data model for “site” Preparation for storage may
objects. The CLE hands sites to the DR. The include cleanup, conversion,
DR may have to do “model” or content cleanup copyright clearance, and other
before completing the ingest process. workflow steps.
49. Outbound Object Flow
Sakai can find and
re-use objects in the
Search
repository.
View
View
Sakai
Search Index Lens Lens
Reuse
Reuse Data
Model
Data
DR Model
50. Sakai and Repositories Going
Forward
• Instead of solving the problem by creating a single
DR technology that is a superset - which might take
years
• Focus on data portability between systems - reduce
the impedance mismatch (or needed conversion
between systems)
• RDF enables object portability across systems,
languages, and technologies
51. Sakai Repository Approach
• Move Sakai and other Collaboration systems toward
RDF
– Experiment with using RDF as native storage format
– High Performance RDF - Fedora testing - 180M tuples -
complex queries - 70ms
• Move data repositories toward RDF
– Move from schema-based stovepipe objects to OWL/RDF
based objects with referential integrity
– Explore dimensions of portability of disseminator / lenses -
this is an important research area
• Get started immediately….
54. Where is the Middleware?
“..composing and orchestrating “..interoperability is key…”
many technologies…”
Portal
Technology
Data
Collaborative Sources
Tools
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
55. Is Middleware The Universal Connector?
Portal
Collaborative
Technology Data
Tools
Sources
Identity
ACL
Middleware
Shared
Data Knowledge
Compute
Repository Tools
56. The Universal Connectors
Portal
Collaborative
Technology Data
Tools
Sources
tcp/ip http/https
Identity
ACL web services
Shared
Data Knowledge
Compute
Repository Tools
57. Is Middleware “inside” each
application?
Portal
Technology
Data
Collaborative Sources
Tools
Identity
ACL Shared Data
Repository Knowledge
Compute
Tools
58. Middleware is simply another component -
used as needed
Portal
Technology
Data
Collaborative Identity Sources
Tools ACL
Middleware
Shared Data
Repository Knowledge
Compute
Tools
59. Identity and Access Control: A very important
function of Middleware
Portal
Technology
Data
Collaborative Identity Sources
ACL
Lets Talk about This
Tools
Middleware
Shared Data
Repository Knowledge
Compute
Tools
60. Chalk Talk:Identity and Access Control
CAS
??
Shibboleth
MyProxy GridShib
Globus
Identity
K.X509 ACL
Kerberos
???
LDAP
Cosign
PubCookie
Competition Collaboration Convergence
61. Identity and ACL: Goal State
• One server - one software distribution
• Virtual Organization Software
• Supports all protocols
– Globus Certificate Authority
– Shibboleth
– LDAP
– MyProxy
– Kerberos
• Who will do this? Who will fund this? Who
can get these competitors to cooperate?
64. The pre-requisites
• My net worth is $5B (I give myself grants)
• I encounter some tech-savvy scientists in a field who
are using technology to do world-class research…
• They have never been visited by any other computer
scientist…
• They are working in groups of 1-30 geographically
distributed around the world
• They all work on a beach with Internet2 connections
and wide-open wireless and favourable exchange
rates
65. D
A
Experiments
Compute
E
B Remote Observation
Data Models
Vol 4
Vol 3
Vol 2
F Vol 1
C eDocuments
Tutorials
66. Step 1: Visit The Scientists
• Understand what they are doing and how
they are doing it?
• Ask them how they would like to improve it.
• Show each application to other scientists.
Ask the other scientists how they would
improve it.
• Help each group improve their work - help
them using whatever technology they are
currently using
67. Step 2: Add some technology
• Install the super-multi-protocol Virtual
Organization software and provide a NOC for
the VO software - identity and simple
attributes
• Install Sakai - point it at the VO software for
identity add icon at the top of Sakai
• Give each scientist an account in the VO
• Give each effort in the field a site within Sakai
68. Heart Study Collaboratory
Login
My Workspace A B C D E Open Forum
Home
Chat
Resources
Tutorials
Site B
Mail List
Live Meetings
69. Step 2: Use the VO
• For those who want to protect their
information, help them add SSO to their
sites, backed by the VO service
• Since it is multi-protocol - likely there
will be no modification of the underlying
science code - only a server
configuration change Identity
ACL
70. D
A
Experiments
Compute
E
B Remote Observation
Data Models Heart Study Collaboratory
Login Vol 4
My Workspace A B C D E Open Forum Vol 3
Home
Vol 2
F
Chat
Resources Vol 1
Tutorials
Site B
Mail List
C
Live Meetings
Identity eDocuments
ACL
Tutorials
71. Step 4: Unique Identifier
Service
• Come up with a way for any member of the VO to “get” a unique
identifier
• Demand some information (build a little data model)
– Person’s name and organization (implicit from request)
– What kind of thing this will represent (experiment, document, image
series)
– Simple description
– Keyword/value extensions
• Build an simple way request and retrieve these through a simple
web service - capture implicit metadata from request (when, IP
address, etc). Make sure it works from perl!
• Encourage community to start marking “stuff” with these
identifiers in their stovepipes
72. Step 5: Data Models
• Begin to work with subsets of the field to try to
find common data models across stovepipes
• Start simple - use very simple RDF - human
readable
• Broaden / deepen model slowly - explore
variations
• Define simple file-system pattern for storing
metadata associated with a file and/or a
directory
73. Step 6: A Backup-Style Repo
• Build a data repository which will function as
a backup
• Basic idea - each time you get identifier - this
enables backup space - any data and/or
metadata can be uploaded under that
particular identifier and left in the repository
• Make the repo multi-protocol, FTP, DAV,
Web-Service with attachments, GridFTP, etc.
• Make it so there can be a network of
cooperating repositories
74. Local
Central Repo
Repo D
A
Experiments
Compute
E
B Remote Observation
Data Models
GUID Vol 4
Vol 3
Service Vol 2
F Vol 1
Identity C Heart Study Collaboratory
My Workspace A B C D E
Login
Open Forum eDocuments
ACL
Home
Chat
Resources
Tutorials Tutorials
Site B
Mail List
Local
Repo
Live Meetings
75. Year 4 and on…
• Once the basic stovepipes have been “brought in
from the cold” and made part of a community with no
harm, the next steps are to begin to work “cross-
stovepipe”
– Evolve data models to be far richer with many variants
– Build value added tools that are aware of the data models
and are usable across stovepipes
• Teach the community to build and share tools - gently
encourage development standards - Java / JSR-168
perhaps
• Most important: Always listen to the users
76. Science at … start at the center
and work outwards…
the center of New Tools
eScience
Data Models
New Technologies
Connect
Communicate
Repositories Science
Priority
Scientists
Enhance
Data Storage
… apply technology
when the users will New Approaches
see it as a “win” …
77. Conclusion
• Many years ago, eScience had science as its
main focus
• Custom approaches resulted in too many
unique solutions
• Computer scientists began a search for the
“magic bullet” - each group found a different
magic bullet
• Each group now competes for mind share
(and funding) to be the “one true” magic bullet
78. Conclusion (cont)
• One way to solve the “many competing technologies”
solution is to form “super groups” which unify the
technologies
• No single technology gets to claim “they are the one”
(Middleware is not “in the middle”)
• Each technology needs to become a drop-in
service/component which is available for use only
when appropriate
• Once we can get past looking at the technologies as
the main focus, we get back to science as the main
focus
79. Lets remember why we started this
whole field in the first place…
• Scientific Domain
• Groups of People
• Common User Interface
To download
• Data Sharing www.dr-chuck.com
– In the moment “Chuck’s Talks”
– Long-term
• Experimental Equipment
• Compute
• Visualization
Hinweis der Redaktion
Upper Atmosphere Research Collaboratory Space Physics Aeronomy Research Collaboratory
SPARC - Space and Aronomy Research
SPARC - Space and Aronomy Research
An advanced users view of SPARC; lots of information in multiple SPARC pages! The typical SPARC collaboration features down the left hand side, page information, resource list, active user list, and chat Streaming video in the lower left. Lots of data and model views throughout the rest of the screen shot.
Funding ended in 2001 - Project continued on inertia for a few years - but then faded away as its technology became dated wnd there was no further investment in “the little things” (I.e. no one to call and ask questions of).
This is a bit of the UM perspective.
Web-based file-system
Very much influenced by CHEF. Note that UMICH thinking of teaching and learning as collaboration. No real learning focused tools - all generic collaboration tools.
Sakai
Sakai is a Collaborative Learning Environment. Sakai can be applied in both a teaching and learning environment as well as research collaboration.
Cross cutting issues include identity, authorization, and access to resources. Desktop.
OGCE architecture and approach is to invest in building reusable components which allow collaborative eScience technologies to be placed in a portal to provide a uniform user interface. Standards such as JSR-168 and WSRP insure that elements can be reused across projects. Over time, the web/portal paradigm will likely be replaced by a desktop paradigm - but the portal provides a least common denominator and allows us to deploy solutions for collaborative eScience in the short term.
Mention WSRP 2.0 Mention PLEX - Flexible, pluggable, Eclipse-based desktop “personal application portal” - very experimental in the UK funded by JISC/CETIS.
Sakai provides the Grid-aware collaboration tools which allow people to work together collaboratively. Sakai can be used standalone, but when Sakai is used in conjunction with a JSR-168 portal it allows eScience providers an easy platform to integrate across the whole scope of their eScience application. The Sakai focus on data repository integration is intended to allow the “capture” of the human collaborative activities for permanent storage in the data repository along with all of the other science data to be stored in the repository. Sakai also participated in the cross-cutting issues of federated identity and access control.
Note display is in Pluto - most portals display Portlet links across the top so in portals like uPortal and GridSphere, you won’t see two columns of buttons.
When the Portlet Starts, it puts up a login screen or auto-logs in depending on configuration. The login is done with a web service. Once the session is established, the portlet emits an iframe, requesting the gallery view of Sakai.
Note display is in Pluto - most portals display Portlet links across the top so in portals like uPortal and GridSphere, you won’t see two columns of buttons.
When the Portlet Starts, it puts up a login screen or auto-logs in depending on configuration. The login is done with a web service. Once the session is established, the site/tool list is retrieved and the portlet is displayed with the hierarchy displayed as a tree. As a page or tool are selected, the tree control places the iFrame with the Charon URL to the site into the iFrame. This iframe URL uses session based launch to bypass Sakai’s login process.
When there are more than one site with the calendar tool with in the Sakai server, the user is given the choice as to which Sakai calendar is to be displayed. This choice is saved in the user’s preferences. If there is only one calendar, this step is skipped and the user goes directly to the tool. This “only one placement scenario” happens for “My Workspace” tools like Preferences. So, the user never sees this step is skipped when placing the Preferences portlet.
This portlet logs in like any other version of the Sakai portlet. It knows that it is a “sakai.calendar”. It retrieves the entire list of sites, pages, and tools, and searches for sakai.calendar placements. If it finds 2 or more placements, it puts up a selection dialog (1) - the user chooses which placement they are interested in, and then that tool placement is displayed (2). The placement is also put in the user’s preferences so the next time the tool comes right up, bypassing the selection.
Sakai provides the Grid-aware collaboration tools which allow people to work together collaboratively. Sakai can be used standalone, but when Sakai is used in conjunction with a JSR-168 portal it allows eScience providers an easy platform to integrate across the whole scope of their eScience application. The Sakai focus on data repository integration is intended to allow the “capture” of the human collaborative activities for permanent storage in the data repository along with all of the other science data to be stored in the repository. Sakai also participated in the cross-cutting issues of federated identity and access control.
Collaborative systems often allow materials to be sloppy (copyright for example). IR’s must be very clear on copyright as there are likely to be different rules w.r.t. materials in IR’s. This may require some cleanup on materials from collab systems when moving to an IR.
One of the goals here is data portability - making data relevant *beyond* the initial application that created it. At some level, IMS Common Cartridge will move us along this path somewhat. A cartridge format could be usable as an export format as well as an import format.
Remember that this is only how Sakai will interoperate with the repository. Other applications will use the repo as well. TwinPeaks is a start in the direction of reuse of content from a repository.
At some level RDF is a low-level enabler which is a pre-requisite. In a way RDF is like “Ethernet” is for the Internet. Additional aspects beyond RDF include OWL (Web Ontology Language) defines the models of data represented in RDF. And beyond RDF and OWL, there is the development of software that understands different onltolgies at increasing levels of semantic understanding.
Middleware is a great thing…
I call this model: “middleware in the middle”. Effectively nothing happens unless the middleware is involved. The middleware’s performance, complexity, reliability, and ease of use become the rate-limiting factor… To meet this need, middleware must be very fast, very simple, very easy to understand, very stable (I.e. not change so much) work with many different languages. Middleware must stand the test of time in production environments…
If we want cross-language, cross environment, there are actually very few technologies which reach the “universal connector” status.
There is no middleware that is sufficient to be a part of every single application ever written across all languages. So where is it?
Middleware is no better, worse, or even that much different than any of the other components. It is used by the other components *if and only if* those components find the services provided by the middleware useful. Middleware gets no special “must be used” status - if it adds value to an eScience application it will be used - if not - developers/integrators will find another way.
Middleware is no better, worse, or even that much different than any of the other components. It is used by the other components *if and only if* those components find the services provided by the middleware useful. Middleware gets no special “must be used” status - if it adds value to an eScience application it will be used - if not - developers/integrators will find another way.
What to do and who will fund it? For portals this took 5 years - once we set out to do it.
The key of this sequence is how little each of the science activities has been affected so far. I understand this is a fantasy. But the underlying principle is the Hippocratic oath - “first, do no harm”.
Infrastructure is what is in the middle. There is a long way between middleware and infrastructure.