A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
1. Curating Services and Workflows: The Good, the Bad and the Ugly. A Personal Story in the Small. Professor Carole Goble, The University of Manchester, UK [email_address] Keynote: 3rd International Digital Curation Conference, Washington DC, 11-13 December 2007
3. ID MURA_BACSU STANDARD; PRT; 429 AA.
DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE ENOLPYRUVYL TRANSFERASE) (EPT).
GN MURA OR MURZ.
OS BACILLUS SUBTILIS.
OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC BACILLUS.
KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT CONFLICT 374 374 S -> A (IN REF. 3).
SQ SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
[GSK]
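The record on slide 3 is a flat file in which each field is introduced by a two-letter line code (ID, DE, GN, OS, KW and so on). As a rough illustration of what a script has to do before it can use such a record, here is a minimal Python sketch. It assumes one field per line with the line code separated from the value by whitespace; the `RECORD` excerpt is shortened from the slide, and the function name `parse_flat_file` is hypothetical rather than part of any real library.

```python
from collections import defaultdict

# Shortened excerpt of the slide's record, laid out one field per line
# (an assumption about the original layout).
RECORD = """\
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE)
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
"""


def parse_flat_file(text):
    """Group the lines of a flat-file record by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Everything before the first space is the line code.
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    # Continuation lines that share a code are joined into one string.
    return {code: " ".join(values) for code, values in fields.items()}


record = parse_flat_file(RECORD)
# Keywords are separated by semicolons and end with a full stop.
keywords = [k.strip() for k in record["KW"].rstrip(".").split(";")]
print(record["ID"])
print(keywords)
```

Even this small parser encodes assumptions about the format that are written down nowhere in the data itself.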
4. Programmatic Interfaces to Services (Web Services not Web Sites). Clients: Your Script, Your Workflow, Your Application. Service Registry. Web Services: SeqFetch Service, BLAT Service, BLAST Service, GO Service. Interface Description Document: WSDL, WADL. Adapted from Lincoln Stein. European Bioinformatics Institute: API submissions have risen to 3,166,901 for 2007 (Sarah Hunter).
9. Because software needs curating too. http://www.omii.ac.uk Manchester, Southampton, Edinburgh, European Bioinformatics Institute
11. Workflows are reading publications. Workflows are processing the data. Workflows are part of curation pipelines. Workflows are another form of outcome to publish and curate alongside data and publications.
18. Aerospace Engine Design: 90% of design is variant design. 70% of information is taken from previous designs. Source: Silvia Wong, University of Southampton, UK
19. Digital Library; Graduate Students; Undergraduate Students; e-Experimentation; e-Scientists; Certified Experimental Results & Analyses; Data, Metadata & Ontologies; Workflows; Institutional Archive; Local Web Publisher Holdings; Virtual Learning Environment; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata. Adapted from the eBank project.
29. Applications and Scientists need a Curated Registry of Services. Note: Registry, not repository. Services are hosted elsewhere. (Just having a workflow system isn’t enough.)
31. Building Annotation Commodities. Object: Service, Endpoint, Workflow, file, etc. Annotation Model: Functional, Operational, Provenance, Reputation. Descriptions: Ontologies, Controlled vocabulary, Tags, Folksonomy, Free text. Layered, Enrichment, Augmentation. Annotation model uses Semantic Web technologies: OWL and RDFS. The perspective of the scientist. Managed, centralised curation process. 700+ class domain ontology. Service Ontology. 3500+ Services.
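The annotation model on slide 31 can be pictured as a small data structure: an object (service, endpoint, workflow or file) that accumulates annotations, each belonging to one facet (functional, operational, provenance, reputation) and expressed in one form (ontology term, controlled vocabulary, tag, free text). The Python sketch below is only an illustration under those assumptions. The class names, field names and the example URL are hypothetical; the actual model is stated on the slide to use OWL and RDFS, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field

# Facets and description forms named on the slide.
FACETS = ("functional", "operational", "provenance", "reputation")
FORMS = ("ontology", "controlled_vocabulary", "tag", "free_text")


@dataclass
class Annotation:
    facet: str   # which aspect of the object is described
    form: str    # how the description is expressed
    value: str
    author: str  # who supplied it, e.g. a curator or a user

    def __post_init__(self):
        if self.facet not in FACETS:
            raise ValueError(f"unknown facet: {self.facet}")
        if self.form not in FORMS:
            raise ValueError(f"unknown form: {self.form}")


@dataclass
class RegistryEntry:
    name: str
    kind: str      # "service", "endpoint", "workflow" or "file"
    location: str  # a registry stores a pointer; the object lives elsewhere
    annotations: list = field(default_factory=list)

    def annotate(self, facet, form, value, author):
        """Layer one more annotation onto the entry."""
        self.annotations.append(Annotation(facet, form, value, author))

    def by_facet(self, facet):
        """Return the values of all annotations for one facet."""
        return [a.value for a in self.annotations if a.facet == facet]


# Hypothetical entry: the URL is a placeholder, not a real service.
entry = RegistryEntry("myalignscript.pl", "service", "http://example.org/align")
entry.annotate("functional", "free_text",
               "A tool to compare multiple protein structures", "user")
entry.annotate("functional", "ontology", "performs_task: alignment", "curator")
entry.annotate("reputation", "tag", "reliable", "user")
print(entry.by_facet("functional"))
```

Keeping the author on each annotation lets curator-supplied and community-supplied descriptions sit side by side on the same entry.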
34. Increasing value, increased automation, better understanding versus investment (cost, effort), from Folksonomy Tagging to Ontology Curation. Example annotations: ‘myalignscript.pl’; ‘A tool to compare multiple protein structures’; performs_task : alignment, input_type{seq_a} : sequence…, output_type{score} : d_value; output{score} is_distance_between pair {input{sequence a}, input{sequence b}}. Capabilities: Manual use of tools, web pages; Scripted tool invocation; Naïve workflow systems; Service Configuration; Basic ‘discovery’ style service annotations; Guided workflow construction; Guided workflow reuse; Workflow validation; Dynamic Service Substitution; Semantically enriched data; Knowledge driven visualization; Automated Workflow Construction.
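To make the step from free-text description to structured annotation concrete, the following Python sketch shows how annotations of the form performs_task / input_type / output_type could support guided workflow construction: given one service, suggest the services able to consume its output. Only `myalignscript.pl` and its annotation values come from the slide; the other two catalogue entries, the class name and the function names are hypothetical, and matching on exact type names is a simplifying assumption (an ontology would also allow subtype matching).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAnnotation:
    name: str
    performs_task: str
    input_types: tuple
    output_type: str


CATALOGUE = [
    # Hypothetical service, added for illustration only.
    ServiceAnnotation("seq_fetch", "retrieval", ("accession",), "sequence"),
    # Annotation values taken from the slide.
    ServiceAnnotation("myalignscript.pl", "alignment",
                      ("sequence", "sequence"), "d_value"),
    # Hypothetical service, added for illustration only.
    ServiceAnnotation("cluster_by_distance", "clustering",
                      ("d_value",), "tree"),
]


def find_by_task(catalogue, task):
    """Discovery: which services perform the given task?"""
    return [s for s in catalogue if s.performs_task == task]


def suggest_next(catalogue, service):
    """Guided construction: which services accept this service's output?"""
    return [s for s in catalogue if service.output_type in s.input_types]


aligner = find_by_task(CATALOGUE, "alignment")[0]
print([s.name for s in suggest_next(CATALOGUE, aligner)])
```

A free-text description such as ‘A tool to compare multiple protein structures’ supports keyword search but none of this matching, which is the trade-off between investment and value that the slide charts.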
35. Progressive Curation: Just enough, Just in time. Jam today and Jam tomorrow. Gain versus Pain: Very BAD; Good, but Unlikely; Just right.
36. Applications and Scientists needed a Curated Repository of Workflows. “Find a workflow like this one that I can edit to do something else.” That’s really hard.
40. Local Libraries and Warehouses of Workflows, trapped in their enterprises or platforms
56. Curation by the Monks, Curation by the Masses, Automated Curation, Curation by Developers: seed, refine, validate. A Change in the World. The WS4LS BioCatalogue Project, Manchester & EBI.
57. Challenges: where to start? If we thought about them hard we wouldn’t have done it. So we didn’t. It’s, er, my experiment. National Centre for e-Social Science
63. Do we still need curators? “Hell is other people’s metadata”
65. Pay as you Go, Emergent Curation. Gain versus Pain: Very BAD; Good, but Unlikely; Just right. From Folksonomy Tagging to Hard Core Ontology Curation.
66. Must be careful to avoid technology seduction. Computer people want to do interesting stuff; curators want stability and reliability; users want simplicity. Smart tools and good interfaces often outwit clever techniques. Bummer. However…
Editor's Notes
3rd International Digital Curation Conference "Curating our Digital Scientific Heritage: a Global Collaborative Challenge" 11-13 December 2007 Renaissance Washington DC Hotel Washington DC, USA http://www.dcc.ac.uk/events/dcc-2007/