BioIT Europe 2010 - BioCatalogue
1. The Reality of Web Services in the Life Sciences Professor Carole Goble [email_address] University of Manchester, UK myGrid Project BioIT World Europe 2010, Hannover http://www.biocatalogue.org
8. Reusing Public and Third Party Web Services Where… can I find them? advertise mine? What… do they do? can I use them? How… do they work? up to date? reliable? Who… provides them? recommends them? knows about them?
13. An Open, Public, Curated, Boutique Catalogue for Web Services serving the Life Sciences for the Bioinformatics Community http://www.biocatalogue.org Launched June 2009 Nucl Acids Res, June 2010, Web Servers issue doi:10.1093/nar/gkq394
29. Content Capture & Curation People Powered Content Reward and Attribution Sensitivities Tools Bringing a Community together Automation Core Contribution & Curation Coordination Governance
31. Curating third party services is HARD The Reality of Web Services in the Life Sciences The Reality of (Expert) Crowd Sourcing Contributions for a Web Service Catalogue
32. Eight years ago Lincoln Stein said… “An interface is a contract between data provider and data consumer” Stein L. Creating a bioinformatics nation. Nature 2002;417:119-120.
44. The SOAP/REST technical view over services is not enough Need a functional / task-oriented view
46. Complexity because it’s a database really SABIO–RK Service only Taverna workflow find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions.
52. Credits Thomas Laurent Hamish McWilliams Franck Tanoh Jiten Bhagat Carole Goble Steve Pettifer Katy Wolstencroft Robert Stevens David De Roure Mannie Tagarira Jerzy Orlowski Sergejs Aleksejevs Rodrigo Lopez Eric Nzuobontane
54. Thank You http://www.biocatalogue.org About Us - http://wiki.biocatalogue.org API Docs - http://apidocs.biocatalogue.org 11th July 2010 ISMB 10 Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: BioCatalogue: a universal catalogue of web services for the life sciences, Nucl. Acids Res., 2010. doi:10.1093/nar/gkq394
Editor's notes
(Yes, it's boring. But GOOD.) Silver bullets
Long tail middle two…. Long tail of consumers
In his visionary comment, Lincoln Stein called for standardization in bioinformatics, suggesting web services (http://www.w3.org/standards/webofservices) as the unifying platform for programmatic interfaces to tools and data sources (Stein, 2002). Nowadays, the ELIXIR project chooses SOAP web services for programmatic access to all considered bioinformatics databases and tools (http://www.elixir-europe.org/page.php?page=wp7). The Web Service Interoperability Organisation (WS-I, http://ws-i.org), supported by the main IT companies, constrains even more strictly the W3C's SOAP-service standards in order to maximize interoperability among the web services and the web-service programmatic libraries.
The same service can have two different implementations, one in each (SOAP and REST).
Reconciling Web Services and REST Services http://www.w3.org/2005/Talks/1115-hh-k-ecows/#(1)
WADL: The REST answer to WSDL http://searchsoa.techtarget.com/tip/0,289483,sid26_gci1265367,00.html
WSDL: short for Web Services Description Language, an XML-formatted language used to describe a Web service's capabilities as collections of communication endpoints capable of exchanging messages. WSDL is an integral part of UDDI, an XML-based worldwide business registry. WSDL is the language that UDDI uses. WSDL was developed jointly by Microsoft and IBM.
REST: short for Representational State Transfer, an architectural style for large-scale software design. REST was first articulated by Roy Fielding in his dissertation as: "REST emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems. I describe the software engineering principles guiding REST and the interaction constraints chosen to retain those principles, contrasting them to the constraints of other architectural styles. Finally, I describe the lessons learned from applying REST to the design of the Hypertext Transfer Protocol and Uniform Resource Identifier standards, and from their subsequent deployment in Web client and server software."
Before we start, a basic terminology heads-up:
SOAP refers to Simple Object Access Protocol.
HTTP-based APIs are APIs that are exposed as one or more HTTP URIs; typical responses are in XML or JSON, and response schemas are custom per object.
REST, on the other hand, adds the use of standardized URIs and gives importance to the HTTP verb used (i.e. GET, POST, PUT, etc.).
Although the number of Web Services has grown considerably in the last few years, the hype surrounding SOAP has barely reduced.
Internet architects have come up with a surprisingly good argument for pushing SOAP aside: there’s a better method for building Web services in the form of Representational State Transfer (REST). REST is more of an old philosophy than a new technology, although the realization came late. Whereas SOAP looks to jump-start the next phase of Internet development with a host of new specifications, the REST philosophy espouses that the existing principles and protocols of the Web are enough to create robust Web services. This means that developers who understand HTTP and XML can start building Web services right away, without needing any toolkits beyond what they normally use for Internet application development.
boutiques
Guesstimate: 3000+ Web Services publicly available in the Life Sciences
Scientists are naughty. Reuse is hard: "I used it last time so it will work again the same way…" Damn! Services change location, capabilities and signatures (BioMART changed its interface three times in 2006). New ones appear and existing ones disappear (SeqHound). They decay and become outdated or unreliable.
Sustainability; Rich enough annotation; Customisation; Curation; Community engagement; Accessibility and availability
Find, Publish, Understand, Use, Monitor, Curate, Archive
Investigator and project specific registries: EMBRACE, BioSapien, Stargate Portal
Community lists: Bioinformatics Links Directory, BioLinks, BioPlanet
Project specialist registries: BioMOBY Central, DAS Registry, myGrid Registry, Sswap
General catalogues and search engines: SeekDa!, Web Services List, XMethods
Provide a single registration point for Web Service providers and a single search site for scientists and developers.
Provide a curated catalogue of life science web services.
Providers, expert curators and users will provide oversight, monitor the catalogue and provide high quality annotations for services.
BioCatalogue as a place where the community can find, contact and meet the experts and maintainers of these services.
A means to pool metadata about services
A means to discover and reuse services
A means to curate services
A platform for service monitoring and analytics
A generic service annotation model for community annotation
Jazz up?
A means to pool metadata about services in the wild
A means to discover and reuse those services
A means to curate services
A platform for service monitoring and analytics
A supermarket
To do the following: Find, Publish, Understand, Use, Monitor, Curate, Retire
11 chemistry web services (EBI, ChemSpider, PubChem) > 60 operations on chemistry and cheminformatics data.
Small but beautiful
Annotate the annotations
Search, Browse, Filter, Follow
Annotation platform Would like it to be an execution platform.
We need to provide comprehensive APIs to the registry.
Export & import standards: WSDL, SAWSDL, SA-REST, WSMO, …; RDF and SPARQL
Web 2.0: open REST interface; plugin & mash-up; open to Google; URLs for bookmarking
Development model: perpetual beta; user driven; BioCatalogue Friends
Open platform with open REST interfaces. Web 2.0 site and development. Open source code base.
Ownership and responsibility
Sensitivities
Provide plenty of advance warning.
“An interface is a contract between data provider and data consumer.”
Document the interface; warn if it is unstable.
Do not make changes lightly: even little fiddly changes break things (like changing internal ids).
When possible, maintain legacy interfaces until clients can port their scripts.
Support as many interfaces as you can: HTML, text only (better), HTTP, REST, SOAP.
Easy interfaces + power user interfaces.
Web Service Life Cycle. Our current versioning capabilities consist of monitoring changes to the WSDL, updating the existing entry in place and then adding entries to the change log. We asked users what would be useful, and they said that something like a change log / revision history would be very useful.
S. J. Schultheiss et al. (2010) PLoS Comp Biol (in review)
‣ 64% of services used by researchers without computational background
‣ 58% of services developed by students only, difficult to maintain after graduation
‣ 24% of services will not be maintained
External install-ability: they only care that it works on their own machines.
Services are in constant and often silent change. Dynamic and unstable. Metadata decay (esp. on service instances). Workflow decay. Monitoring and repair. BioNanny. Implications for preservation, not fossilisation. Implications for sustainability.
The ENSEMBL database: the Ensembl API is updated at every Ensembl release and needs to be used with only the same database version as the API, e.g. you can't use the 59 API on a 55 database or vice versa. It definitely is kept up-to-date and the latest version is 59.
First of all, check that you are using the API code checked out from the version 59 branch of the Ensembl CVS.
Then, if you are, it sounds like the Registry call you are making might simply be returning the wrong result. It might be worth you checking this with the Ensembl helpdesk (helpdesk@ensembl.org).
You could try loading each species explicitly to see if that fixes it (http://www.ensembl.org/info/docs/api/registry.html).
cheers, Richard
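The versioning approach described in the note above (watch the WSDL, update the entry in place, append to a change log) could be sketched roughly as follows. This is a minimal illustration, not BioCatalogue's actual implementation: the `ServiceEntry` class, its fields and the choice of a SHA-256 digest are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def fingerprint(wsdl_text: str) -> str:
    """Return a stable digest of a WSDL document's text."""
    return hashlib.sha256(wsdl_text.encode("utf-8")).hexdigest()


@dataclass
class ServiceEntry:
    """Hypothetical catalogue entry; not BioCatalogue's real data model."""
    name: str
    wsdl_digest: str
    change_log: list = field(default_factory=list)


def record_if_changed(entry: ServiceEntry, wsdl_text: str) -> bool:
    """Update the entry in place and log a change when the WSDL differs."""
    new_digest = fingerprint(wsdl_text)
    if new_digest == entry.wsdl_digest:
        return False  # nothing changed, nothing to log
    entry.change_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "old": entry.wsdl_digest,
        "new": new_digest,
    })
    entry.wsdl_digest = new_digest
    return True
```

Hashing the raw text also flags whitespace-only edits; a real monitor would probably compare parsed structure instead, so that only changes to operations and messages end up in the revision history users asked for.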
So this is an ongoing and dynamic live system.
Know your audience and environment
Provide documentation and assistance
Assist users and involve the community
Use an existing framework, in a standard best practice way
Make it portable
Be explicit about changes
Leave a forwarding address
Find someone else to do it
Plan the end of the service life cycle
S. J. Schultheiss et al. (2010) PLoS Comp Biol (in review)
http://xml.nig.ac.jp/rest/Invoke?service={x}&method={y}&...
http://xml.nig.ac.jp/{service}/{method}?...
http://www.ebi.ac.uk/cgi-bin/dbfetch?db={db}&id={id}&format={f}
http://www.ebi.ac.uk/dbfetch/{db_name}/{id}?format={f}
http://www.myexperiment.org/workflow.xml?id={id}
http://www.myexperiment.org/workflows/{id}
Consistent implementation
Structure of the URIs should be intuitively understandable and predictable; URLs and parameters should be self-descriptive
Unambiguous use of URL parameter names
If XML is used as data exchange standard, there must be an XSD schema
Need for standards, for example conformance to WADL
Proper use of HTTP status codes and HTTP verbs
Avoid polymorphic services, where values of certain parameters determine the combination of other parameters that the service expects to find in the URL
REST in Practice, Savas Parastatidis http://restinpractice.com/default.aspx
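As a small illustration of the URI guidelines in the notes above (predictable structure, self-descriptive and unambiguous parameter names), the path-style dbfetch template from the slide can be filled in like this. The `expand` helper and the example values are assumptions for the sketch, and nothing here checks whether the endpoint still answers at that address.

```python
from urllib.parse import quote

# Path-style template copied from the slide notes; its current availability
# is not assumed here.
DBFETCH_TEMPLATE = "http://www.ebi.ac.uk/dbfetch/{db_name}/{id}?format={f}"


def expand(template: str, **values: str) -> str:
    """Fill a {name}-style URI template, percent-encoding each value."""
    encoded = {key: quote(value, safe="") for key, value in values.items()}
    # str.format raises KeyError if a placeholder has no value, so every
    # parameter the template names must be supplied explicitly.
    return template.format(**encoded)


# Illustrative values only (hypothetical database name and identifier).
url = expand(DBFETCH_TEMPLATE, db_name="uniprot", id="P12345", f="fasta")
```

Because every placeholder is named in the template itself, a reader can predict the URL shape without consulting separate documentation, which is the point the guidelines make about self-descriptive URIs.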
If you type 'togows' you'll find a number of services with an unreasonable number of operations (from 72 to 369). They make annotation and usage very difficult.
Each SOAP operation has a URL, which is an endpoint. SOAP uses the web as a transport protocol. Each service has a base URL/endpoint; this is monitored along with the WSDL URL.
Services vs Operations. As mentioned, one of the biggest issues is that the people who build the services have a lot of implicit and assumed knowledge that they then don't share with the people consuming the service (or don't share in an accessible way). The big providers tend to have very long documentation pages / user guides that are sometimes hard to use as a "quick start".
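To make the services-versus-operations distinction concrete, here is a rough sketch that lists the operations declared under each portType of a WSDL 1.1 document, which is one way to spot services with dozens or hundreds of operations. The sample document and its operation names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def operations_per_port_type(wsdl_text: str) -> dict:
    """Map each WSDL 1.1 portType name to the names of its operations."""
    root = ET.fromstring(wsdl_text)
    return {
        port_type.get("name"): [
            op.get("name")
            for op in port_type.findall("wsdl:operation", WSDL_NS)
        ]
        for port_type in root.findall("wsdl:portType", WSDL_NS)
    }


# Invented sample with hypothetical operation names.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Demo">
  <portType name="DemoPort">
    <operation name="run"/>
    <operation name="getStatus"/>
    <operation name="getResult"/>
  </portType>
</definitions>"""

counts = {name: len(ops) for name, ops in operations_per_port_type(SAMPLE).items()}
```

A count like this says nothing about what each operation does; it only shows how much annotation work a single service can represent.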
So we need to describe not just the interface but also the behaviour, in terms of user function abstractions (goals).
Services in the wild are worse than we think; we have come across these different types of service: operation orchestration or pattern based; asynchronous services; server-like services (e.g. Soaplab).
Multiple operations -> 1 task: when these services are annotated at the level of individual operations, a gap remains between the user's perspective of service operations as tasks with a well-defined function and the service provider's technological view. We argue that this gap can be filled by annotating at a higher level of abstraction; that is what we call the FU (functional unit).
KEGG: Kyoto Encyclopedia of Genes and Genomes
Asynchronous pattern
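A rough sketch of the asynchronous pattern: several operations (submit, poll for status, fetch the result) that a user thinks of as one task. `FakeAsyncService` is an in-memory stand-in written for this example, and its operation names are hypothetical rather than taken from any real service.

```python
import time


class FakeAsyncService:
    """In-memory stand-in for a remote job service (hypothetical operations)."""

    def __init__(self, polls_until_done: int = 2):
        self._remaining = {}
        self._inputs = {}
        self._polls_until_done = polls_until_done

    def submit(self, data: str) -> str:
        job_id = f"job-{len(self._inputs) + 1}"
        self._inputs[job_id] = data
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def status(self, job_id: str) -> str:
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return "RUNNING"
        return "DONE"

    def result(self, job_id: str) -> str:
        return self._inputs[job_id].upper()


def run_task(service, data: str, interval: float = 0.0, max_polls: int = 10) -> str:
    """Expose submit/poll/fetch as one task, as a user would think of it."""
    job_id = service.submit(data)
    for _ in range(max_polls):
        if service.status(job_id) == "DONE":
            return service.result(job_id)
        time.sleep(interval)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")
```

Annotating `submit`, `status` and `result` one by one describes the plumbing; annotating `run_task` describes what the user actually wants done.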
Because the services have been ripped from their Orchestration Fabric
User perspective vs execution/developer perspective. To annotate web services clearly we need another layer of abstraction, independent of the technology used. This presentation uses a number of examples to define the FU. The work presented here stems from the observation that current annotation models force users to think in terms of service interfaces rather than high-level functionality. FU: the elementary unit of information used to describe a service. Using widely used web services in the Life Sciences, we define FUs as configurations and compositions of underlying service operations. An FU is limited to the set of operations that are part of the same service.
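One possible way to model an FU as described above (a configuration or composition of operations drawn from a single service) is sketched below. The class and field names are assumptions made for illustration; they are not the annotation model BioCatalogue uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """One operation exposed by a service (names here are hypothetical)."""
    service: str
    name: str


@dataclass
class FunctionalUnit:
    """A task-level description built from operations of ONE service."""
    task: str
    operations: list
    fixed_parameters: dict = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the rule that an FU never spans more than one service.
        services = {op.service for op in self.operations}
        if len(services) != 1:
            raise ValueError("a functional unit must draw on exactly one service")

    @property
    def service(self) -> str:
        return self.operations[0].service
```

The `task` field carries the user-facing description, while `operations` and `fixed_parameters` record the composition and configuration that realise it.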
While this would not be surprising when trying to connect operations from heterogeneous services, single-service workflows that require adapters seem indicative either of a poor service interface design or of the need to perform a complex query using data from a single database. The following example illustrates one of these complex composite FUs involving SABIO-RK. Fig. 4(a) shows a composite FU as an ideal sequence of processors. The purpose of this biochemical FU is to find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions. This is an ideal workflow in the sense that it “skips over” the adapters that are required to make the data pipeline work in practice for identifying chemical reactions for a given set of metabolites using data within SABIO-RK. The relevant fragment of the actual workflow is shown in Fig. 4(b). The additional processors are scripts that perform local data manipulation (in this case, set intersection and parsing of lines in a text file). When these composite functional units are properly annotated, the significant effort required for their design translates into high added value for third party users who discover them through BioCatalogue.
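The two kinds of adapter mentioned above (parsing lines of a text response and set intersection) are simple to write but easy to overlook when a workflow is described. A minimal sketch, with invented identifiers and no call to any real SABIO-RK operation:

```python
def parse_lines(text: str) -> list:
    """Adapter: split a text response into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def intersect(*id_lists) -> list:
    """Adapter: keep identifiers present in every list, in first-list order."""
    if not id_lists:
        return []
    common = set(id_lists[0]).intersection(*id_lists[1:])
    return [item for item in id_lists[0] if item in common]


# Hypothetical text responses, one per metabolite, listing reaction identifiers.
reactions_a = parse_lines("R1\n\n R2 \nR3\n")
reactions_b = parse_lines("R3\nR1\n")
shared = intersect(reactions_a, reactions_b)
```

In the ideal workflow these steps are invisible; in the actual one they sit between service calls, which is why annotating the composite FU as a whole saves third party users from rediscovering them.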
Just sticking the Java API out there is harsh
http://en.wikipedia.org/wiki/WS-Security
Team curation: make it fun!
An environment in which to bring providers, consumers and experts together, maybe through the use of *discussions*. I'm not sure this is happening, i.e. the social element. Are providers too afraid to throw themselves out there a bit more? Are consumers quickly getting turned off using certain services because they are hard to use and there is no one to contact about this? Things like BioStar (http://biostar.stackexchange.com/) seem to be very active with questions/answers. But this is disjoint from the actual information on the web services (maybe this is just how the web tends to work organically anyway?). More to come as I think about it. Jits
If you build it they will not come. We have done little on getting people to come to the site to make comments; the workflow is: find it, download it, bye. The front page: I don't know where to go to discuss anything. There isn't a discuss button. Where would you start? If BioCatalogue was integrated into bioeclipse and you could comment from there, that might help. Or from Eclipse in general. Why would the providers care? BioCatalogue is not a social web site.
Figuring out a stranger’s web service is very hard Attribution Curating is hard Third party is hard Responsibility and ownership Reward and credit and downloads/access for providers