The document discusses the motivation for developing Semantic Automated Discovery and Integration (SADI) services as a way to represent information that cannot be captured directly on the Semantic Web, such as the outputs of analytical algorithms and statistical analyses. It presents SADI as a design pattern for making web services interoperable with the Semantic Web by explicitly labeling the relationships between entities.
The document discusses linked data and how it can be used to share information on the web in a structured format. It provides an overview of linked data and the Resource Description Framework (RDF), describes how URIs can be used to name things and link data on the web, and gives examples of publishing and querying linked data using RDF and SPARQL. Recent developments in using linked data by Facebook, Google, and other companies are also mentioned.
Bio2RDF Release 2 is an updated version that provides improved coverage of life science linked data through additional datasets and properties. It features open source conversion scripts, a common URI pattern for integration, and dataset provenance information. The release includes 19 datasets with over 1 billion RDF triples that can be queried using SPARQL endpoints or through federated queries across datasets using the Semanticscience Integrated Ontology. Future work aims to incorporate additional large datasets and consolidate provenance.
The Role of Metadata in Reproducible Computational Research - Jeremy Leipzig
Reproducible computational research (RCR) provides the keystone to the scientific method, packaging the transformation of raw data into published results in a manner that can be communicated to others. Developing RCR standards has been a growing concern of statisticians, data scientists, and informatics professionals. Metadata provides context and provenance for raw data, and is essential to both the discovery and validation of RCR. This presentation will give an overview of emerging metadata standards in data, analysis, pipelines, tools, and publications.
This document provides an overview of querying Bio2RDF data using SPARQL. It begins with an introduction to SPARQL and the basic anatomy of a SPARQL query. It then provides examples of different types of SPARQL queries (SELECT, CONSTRUCT, ASK, DESCRIBE) with sample data and queries from Bio2RDF. The document concludes with information on using Bio2RDF summary metrics to develop queries and examples of SPARQL queries over Bio2RDF data.
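The query forms mentioned above can be sketched against a Bio2RDF-style endpoint. The two illustrative queries below are shown together for comparison; the class URI is a placeholder, not actual Bio2RDF vocabulary:

```sparql
# SELECT: return variable bindings as a table
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?gene ?label
WHERE {
  ?gene a <http://example.org/ns/Gene> ;   # hypothetical class URI
        dcterms:title ?label .
}
LIMIT 10

# ASK: return a boolean -- does any matching triple exist?
ASK { ?gene <http://purl.org/dc/terms/title> "BRCA1" }
```

CONSTRUCT and DESCRIBE follow the same graph-pattern anatomy but return RDF triples rather than a result table.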
This document describes the clinical case of a 72-year-old patient with ductal carcinoma in situ (DCIS) and subsequent breast carcinoma. She underwent partial chemotherapy, radical mastectomy, and left axillary lymphadenectomy, which resulted in wound complications. Subsequent examinations revealed an ulcerated tumor lesion in the duodenum, so she underwent an exploratory laparotomy that found peripancreatic lymphadenopathy.
Cristóvão Buarque, Brazil's minister of education, responded to the suggestion of internationalizing the Amazon by saying that, although governments do not care for it properly, the Amazon belongs to Brazil and the Amazonian countries. From a humanist perspective, however, other natural and cultural resources should be internationalized first, such as oil reserves, the financial capital of rich countries, and the world's leading museums. Buarque argued that if Americans want to internationalize
The document discusses SADI (Semantic Automated Discovery and Integration), which provides best practices for creating semantic web services. SADI web services explicitly create RDF triples linking input and output data to describe their semantics. This allows services to be discovered and workflows to be automatically generated. The SADI Taverna plugin and SHARE system are presented, which allow searching for desired properties to automatically add and connect SADI services into workflows. SHARE also uses SADI to automatically construct workflows to answer SPARQL queries by discovering necessary analytical services on the web.
This document provides guidance for workforce training institutions to adjust their programs under a competency-based approach. It explains the characteristics of a competency-based offering and analyzes the key moments in implementing competency-based programs, such as the focus of the institutional educational project, the relevance of the offering, program naming, competency profiles and maps, curriculum design, training processes, and learning assessment. Additionally, it provides recommendations.
This document discusses the concept of Lean Clinical Workplace Design, which integrates Evidence Based Design, Design Thinking, and Lean Process Efficiency. It traces the development of this concept through the author's professional experiences at various healthcare organizations. These experiences demonstrated how observing clinical processes, workflows, and spatial layouts can improve efficiency. The document also discusses how Evidence Based Design, Lean Thinking, and human factors research have independently aimed to optimize healthcare design and delivery. It argues that combining these three approaches into Lean Clinical Workplace Design could provide more comprehensive, balanced solutions that improve both financial and clinical outcomes for healthcare professionals and patients.
1. Waves can transfer energy without transferring matter. The document discusses different types of waves including transverse, longitudinal, plane, and sound waves. It also covers key wave concepts such as amplitude, wavelength, frequency, speed, and direction of propagation.
2. The document discusses various wave phenomena including reflection, refraction, diffraction, and interference. Activities are suggested to observe and analyze these phenomena using tools like ripple tanks and computer simulations. Formulas related to speed, wavelength and frequency are also introduced.
3. The document covers additional topics related to waves and oscillations including standing waves, resonance, and applications to sound waves and the electromagnetic spectrum. Learning outcomes focus on describing, analyzing and solving problems involving different types of waves.
This document presents the agenda for Library Week at a school. It includes puppet shows for different grades, storytellings of popular folk tales, and presentations by the author José Antonio Núñez. It also describes the school's Book Fair, with sections for Early Years, Lower School, Upper School, teachers, and parents to promote reading.
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho... - Mark Wilkinson
Some of the recent work we've been doing with SADI and SHARE, using SHARE as a mechanism for dynamically converting OWL Classes into workflows in a data-dependent manner; OWL, in this case, is acting as an abstract workflow model. The slides in the middle are the usual SADI/SHARE explanation; the slides at the end show how we're using these dynamically generated workflows to "personalize" medical information on the Web for a particular patient's profile.
Abercrombie & Fitch has faced several class-action lawsuits for discriminatory hiring practices and offensive marketing. Some examples included a shirt depicting slanted-eyed men in conical hats with the slogan "Two Wongs Can Make it White", and a hiring ratio of 6 white employees to every 1 minority. While Abercrombie claims their intent is humor, many find shirts with phrases like "Buddha Bash" and images that stereotype Asian culture to be inappropriate and discriminatory. A review of historical discrimination cases and the evolving role of the EEOC provides context for how Abercrombie's practices relate to issues of race, gender, and equal employment opportunity.
The European Union has agreed on a package of sanctions against Russia over its invasion of Ukraine. The sanctions include restrictions on transactions with key Russian banks and a ban on the sale of aircraft and equipment to Russia. EU leaders hope the sanctions will increase economic pressure on Russia and deter it from continuing its aggression against Ukraine.
The document discusses the benefits of graviola for treating cancer. It claims that scientific studies have shown graviola contains substances such as acetogenins that are effective against cancerous tumors, especially of the lung, breast, and prostate. Acetogenins are said to act selectively, killing cancer cells without harming healthy ones, and to be 10,000 times more effective than the adriamycin used in chemotherapy. Graviola is presented as a good complementary alternative.
Science in the Web, from hypothesis to result. Publishing in silico experiments IN the Web allows us to immediately and precisely disseminate new knowledge that can affect other Web Science experiments. This is the "singularity" where a new discovery is immediately put into practice.
This is a brief version of earlier talks, but I think it might explain more emphatically what I think Web Science is, and why I believe it is realistic, and how SADI/SHARE technologies (or technologies like them) are important to achieve the vision
This document discusses reducing heap memory stress in Java applications by using off-heap memory techniques. It provides an overview of Java memory fundamentals and the limitations of on-heap caching. It then introduces Apache DirectMemory as an open source project that implements an off-heap caching solution using ByteBuffers to improve performance by reducing garbage collection overhead. Examples of using DirectMemory for multi-layer caching and as a cache server are also presented.
This document provides a tutorial for searching the IntOGen database to find information on altered genes and biological processes in different cancer types. It demonstrates how to search for breast cancer, sort the results to find the most significantly downregulated or mutated genes, and view module and experiment details for specific genes like CDKN2A. The tutorial explains how to navigate between genes, tumor types, modules, and experiments to discover knowledge on cancer alterations.
Exploring the power and benefits of using WordPress plugins, how to build a WordPress plugin in a few simple steps, plus a good solid list of plugin resources.
Setting up the Red5 environment, building sample applications and integrating with Flash. We will look at how Red5 works within the Flash IDE and build a sample chat application, video streaming, and a multi-user environment.
This document discusses Bio2RDF, a project that converts life science databases into RDF and makes them accessible via SPARQL endpoints. It provides background on the need for data integration, describes how Bio2RDF was implemented including the conversion process and architecture, and outlines future goals like adding more datasets and developing new services.
Producing, publishing and consuming linked data - CSHALS 2013 - François Belleau
This document discusses lessons learned from the Bio2RDF project for producing, publishing, and consuming linked data. It outlines three key lessons: 1) How to efficiently produce RDF using existing ETL tools like Talend to transform data formats into RDF triples; 2) How to publish linked data by designing URI patterns, offering SPARQL endpoints and associated tools, and registering data in public registries; 3) How to consume SPARQL endpoints by building semantic mashups using workflows to integrate data from multiple endpoints and then querying the mashup to answer questions.
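The "consume" step described above is often realized with SPARQL 1.1 federation. A minimal sketch, assuming two endpoints that share compatible URIs (the endpoint URL and the `hasTarget` predicate are placeholders, not Bio2RDF vocabulary):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?drug ?label ?target
WHERE {
  # triples already loaded into the local mashup
  ?drug rdfs:label ?label .
  # pull matching data from a remote endpoint at query time
  SERVICE <http://example.org/sparql> {
    ?drug <http://example.org/ns/hasTarget> ?target .
  }
}
```

Federation trades load-time integration (the ETL approach in lesson 1) for query-time integration, which avoids copying data but depends on the remote endpoint's availability and performance.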
#mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit - Terry Reese
MarcEdit is a metadata editing tool that has been in development since 1999. It started as a way for the creator to better understand MARC and circumvent other editing software. Over time it has grown to support a wide range of metadata standards and be used in over 190 countries. Recently, the developer has been focusing on integrating linked data and semantic capabilities through tools like the Link Identifiers tool, which embeds URIs from controlled vocabularies into MARC records, and the Validate Headings tool, which uses identifiers from id.loc.gov to validate and correct headings. The goal is for MarcEdit to help catalogers start experimenting with linked data approaches and integrating semantic concepts into legacy MARC data.
Open (linked) bibliographic data - Edmund Chamberlain (University of Cambridge) - RDTF-Discovery
The document discusses the Cambridge University Library's decision to expose its bibliographic data as open linked data through the COMET (Cambridge Open METadata) project in order to share data with other institutions, gain insights from analyses of the data, and explore the potential of linked open data for libraries. Some challenges mentioned include choosing an open license, mapping data to RDF vocabularies, and using triplestores to publish and link the data. Future plans include encouraging other libraries to adopt similar approaches and expanding the types of library data exposed through linked open data.
The document discusses the Cambridge University Library's decision to expose its bibliographic data as open linked data through the COMET (Cambridge Open METadata) project. Some of the challenges addressed are licensing, mapping data to RDF vocabularies, and using triplestores. The benefits expected include understanding linked data capabilities, limitations of MARC, and opportunities for future development using linked open data.
This document discusses how semantic web technologies like RDF and SPARQL can help navigate complex bioinformatics databases. It describes a three step method for building a semantic mashup: 1) transform data from sources into RDF, 2) load the RDF into a triplestore, and 3) explore and query the dataset. As an example, it details how Bio2RDF transformed various database cross-reference resources into RDF and loaded them into Virtuoso to answer questions about namespace usage.
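A namespace-usage question like the one in step 3 is typically answered with an aggregate query over the loaded triplestore. A sketch, in which the cross-reference and namespace predicates are illustrative assumptions rather than Bio2RDF's actual schema:

```sparql
# Count how often each namespace appears as the target of a cross-reference
SELECT ?namespace (COUNT(?xref) AS ?uses)
WHERE {
  ?record <http://example.org/ns/xref> ?xref .
  ?xref   <http://example.org/ns/inNamespace> ?namespace .
}
GROUP BY ?namespace
ORDER BY DESC(?uses)
```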
This document discusses URLs and URL design. Some key points covered include:
- URLs should be meaningful and describe the content or functionality behind them. File structure and naming conventions in URLs can help with this.
- URL rewriting techniques like Pretty URLs can make URLs cleaner and more readable for users and search engines.
- Namespaces, routing conventions, and RESTful design principles can help organize URLs and map URLs to application functionality.
- Vanity URLs, long URLs, and duplicate or dangling URLs should generally be avoided for usability and maintenance reasons.
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present the release 3.0 of the LOD2 stack, which contains updates to:
*) Virtuoso 7 [Openlink]: the original row store of the Virtuoso 6 universal server has now been replaced by a column store, significantly increasing the performance of SPARQL queries; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Manager Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through the use of its many extensions. It also allows operations for extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary amount of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of the Data Cube linked data representation of statistical data. It has support for the more advanced DataCube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly into the lod2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
A practical guide on how to query and visualize Linked Open Data with eea.daviz Plone add-on.
In this presentation you will get an introduction to Linked Open Data and where it is applied. We will see how to query this large open data cloud over the web with the SPARQL query language. We will then go through real examples and create interactive, live data visualizations with full data traceability using eea.sparql and eea.daviz.
Presented at the PLOG2013 conference http://www.coactivate.org/projects/plog2013
Richard Wallis is a technology evangelist who works on semantic technologies and linked data. He gave a presentation in June 2012 about cultural linked data and how libraries, archives, and museums can help provide the backbone of information on the Web of Data, similar to how they have historically served as the backbone of information for centuries in other formats.
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos... - Chunlei Wu
Overview of BioThings project (https://biothings.io) with the highlight of BioThings Studio tool, a web development environment for building Biomedical APIs
GDG Meets U event - Big data & Wikidata - no lies codelab - Camelia Boban
This document discusses using SPARQL to query RDF data from DBPedia. It provides an overview of key concepts like RDF triples, SPARQL, and Apache Jena framework. It also includes a sample SPARQL query to retrieve cities in Abruzzo, Italy with a population over 50,000. Resources and prefixes for working with DBPedia, Wikidata, and other linked data sets are listed.
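The Abruzzo query described above might look roughly like this against the DBpedia endpoint. The property choices (`dbo:region`, `dbo:populationTotal`) reflect common DBpedia modelling, but the exact query in the codelab may differ:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population
WHERE {
  ?city a dbo:City ;
        dbo:region dbr:Abruzzo ;
        dbo:populationTotal ?population .
  FILTER (?population > 50000)
}
ORDER BY DESC(?population)
```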
The document discusses representing data in the Resource Description Framework (RDF). It describes how relational data can be represented as RDF triples with rows becoming subjects, columns becoming properties, and values becoming objects. It also discusses using URIs instead of internal IDs and names to allow data integration. The document then covers serializing RDF data in different formats like RDF/XML, N-Triples, N3, and Turtle and describes syntax for representing literals, language tags, and abbreviating subject and predicate pairs.
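The row-to-triples mapping described above can be illustrated in Turtle; the URIs and vocabulary here are invented for the example. A row becomes a subject URI, each column a property, and each cell value an object:

```turtle
@prefix ex:  <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The row with primary key 42 in a "person" table becomes one subject:
<http://example.org/person/42>
    a ex:Person ;                             # table -> class
    ex:name "Ada Lovelace" ;                  # column -> property, cell -> literal
    ex:born "1815-12-10"^^xsd:date ;          # typed literal with a datatype
    ex:knows <http://example.org/person/7> .  # foreign key -> link to another subject
```

Using resolvable URIs in place of internal row IDs is what lets a third party merge this data with other datasets, as the summary notes.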
Presentation at the EMBL-EBI Industry RDF meeting - Johannes Keizer
The document discusses how AGROVOC, AGRIS, and the CIARD RING leverage RDF vocabularies and technologies to improve data interoperability. It provides examples of how AGRIS retrieves information on its centers through SPARQL queries of the RING, and how data in AGRIS is associated with RING URIs for centers to allow retrieving records by center. The RING is an openly accessible RDF store of datasets described using DCAT, accessible via its SPARQL endpoint.
SADI CSHALS 2013
1. Semantic Automated Discovery and Integration
SADI Services Tutorial
Mark Wilkinson
Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia
Vancouver, BC, Canada.
3. A lot of important information cannot be represented
on the Semantic Web
For example, all of the data that results from
analytical algorithms and statistical analyses
(I’m purposely excluding databases from the list of examples
for reasons I will discuss in a moment)
8. Traditional definitions of The Deep Web
include databases that have Web FORM interfaces.
HOWEVER
The Life Science Semantic Web community
is encouraging the establishment of SPARQL endpoints
as the way to serve that same data to the world
(i.e. NOT through Web Services)
11. “We need to commit specific hardware for
that [mySQL] service. We don’t use the
same servers for mySQL as for the
Website...”
“...we resolve the situation by asking the
user to stop hammering the server. This
might involve temporary ban on the IP...”
- ENSEMBL Helpdesk
12. So... there appear to be good reasons
why most data providers do not expose
their databases for public query!
16. A message posted to the Bio2RDF mailing list last week from Jerven Bolleman, one of the team members behind UniProt’s push for RDF...
Date: Tue, 19 Feb 2013 13:11:22 +0100
Subject: SPARQL or not?
Hi Bio2RDF maintainers,
I keep on noticing this rather expensive query.
CONSTRUCT
{ <http://bio2rdf.org/search/Paget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/bio2rdf_resource:SearchResults> .
<http://bio2rdf.org/search/Paget> <http://bio2rdf.org/bio2rdf_resource:hasSearchResult> ?s .
<http://bio2rdf.org/search/Paget> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?s .
?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s <http://purl.org/dc/elements/1.1/title> ?title .
?s <http://purl.org/dc/terms/title> ?dctermstitle .
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel .
?s ?p ?o .}
WHERE
{ ?s ?p ?o
FILTER contains(str(?o), "Paget")
OPTIONAL
{ ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
OPTIONAL
{ ?s <http://purl.org/dc/elements/1.1/title> ?title }
OPTIONAL
{ ?s <http://purl.org/dc/terms/title> ?dctermstitle }
OPTIONAL
{ ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
OPTIONAL
{ ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel }
}
OFFSET 0
LIMIT 500
It comes from the example queries on the bio2rdf landing page.
Its extremely resource consuming and totally useless as it will never ever run in time.
Can you please change this query to something useful and workable. And at least cache the results if you ever get them.
Regards,
Jerven
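Much of the cost of that query comes from the FILTER contains(str(?o), "Paget") clause, which forces the engine to string-convert and scan every literal in the store. Endpoints that expose a full-text index can answer the same kind of keyword search far more cheaply; a sketch of the idea, assuming a Virtuoso endpoint with free-text indexing enabled (bif:contains is Virtuoso-specific, not standard SPARQL):

```sparql
# Sketch only: uses Virtuoso's free-text index via the non-standard
# bif:contains predicate, instead of scanning every literal with
# FILTER contains().
SELECT ?s ?o
WHERE {
  ?s ?p ?o .
  ?o bif:contains "Paget" .
}
LIMIT 500
```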
17. The same email, with the key phrase called out:
“I keep on noticing this rather expensive query.”
18. The same email, with the key phrase called out:
“It comes from THE EXAMPLE QUERIES on the Bio2RDF landing page” (my emphasis added)
19. The same email, with the key phrase called out:
“It’s extremely resource-consuming and totally useless as it will never run in time”
20. So even people who are world-leaders in RDF and SPARQL
write “expensive” and “useless” queries
that (already!) are making life difficult for
SPARQL endpoint providers
I believe that situation will only get worse
as more people begin to use the Semantic Web
and as SPARQL itself becomes richer and more SQL-like
21. In My Opinion
History tells us, and this story supports,
that SPARQL endpoints might not be widely adopted
by primary bioinformatics data providers
Historically, the majority of bioinformatics data hosts
have opted for API/Service-based
access to their resources
22. In My Opinion
Moreover, I am still obsessed with interoperability!
Having a unified way to discover, and access,
bioinformatics resources
whether they be databases or algorithms
just seems like a Good Thing™
23. In My Opinion
So we need to find a way to make Web Services
play nicely with the Semantic Web
28. causally related with
http://semanticscience.org/resource/SIO_000243
SIO_000243:
<owl:ObjectProperty rdf:about="&resource;SIO_000243">
<rdfs:label xml:lang="en">is causally related with</rdfs:label>
<rdf:type rdf:resource="&owl;SymmetricProperty"/>
<rdf:type rdf:resource="&owl;TransitiveProperty"/>
<dc:description xml:lang="en">A transitive, symmetric, temporal relation
in which one entity is causally related with another non-identical entity.</dc:description>
<rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
</owl:ObjectProperty>
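The same definition is easier to read in Turtle; my transcription of the RDF/XML above, with the usual prefixes assumed:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix sio:  <http://semanticscience.org/resource/> .

sio:SIO_000243 a owl:ObjectProperty ,
        owl:SymmetricProperty ,
        owl:TransitiveProperty ;
    rdfs:label "is causally related with"@en ;
    dc:description "A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity."@en ;
    rdfs:subPropertyOf sio:SIO_000322 .
```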
30. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
OWL-S
SAWSDL
WSDL-S
Others...
31. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result
32. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(usually through “semantic annotation” of XML Schema)
Describe how the system manipulates the data
Describe how the world changes as a result
33. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(in the least-semantic case, the input and output data is “vanilla” XML)
Describe how the system manipulates the data
Describe how the world changes as a result
34. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(in the “most semantic” case (WSDL), RDF is converted into XML, then back to RDF again)
Describe how the system manipulates the data
Describe how the world changes as a result
35. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(the rigidity of XML Schema is the antithesis of the Semantic Web!)
Describe how the system manipulates the data
Describe how the world changes as a result
36. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(so... perhaps we shouldn’t be using XML Schema at all...??)
Describe how the system manipulates the data
Describe how the world changes as a result
37. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
Describe how the system manipulates the data (HARD!)
Describe how the world changes as a result
38. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result (Unnecessary?)
40. Scientific Web Services
are DIFFERENT!
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
41. “The service interfaces within bioinformatics are relatively
simple. An extensible or constrained interoperability framework
is likely to suffice for current demands: a fully generic
framework is currently not necessary.”
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
42. Scientific Web Services are DIFFERENT
They’re simpler!
Rather than waiting for a solution to the more general problem
(which may be years away... or more!)
can we solve the Semantic Web Service problem
within the scientific domain
while still being fully standards-compliant?
44. Vis-à-vis being Semantic Webby,
what is missing from this list?
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result
46. causally related with
http://semanticscience.org/resource/SIO_000243
The Semantic Web works because of relationships!
47. causally related with
http://semanticscience.org/resource/SIO_000243
The Semantic Web works because of relationships!
In 2008 I proposed that, in the Semantic Web world,
algorithms should be viewed as “exposing” relationships
between the input and output data
49. SADI
[Diagram: an input sequence (has_seq_string “AACTCTTCGTAGTG...”) is run through
BLAST; the output sequence “has homology to” the Terminal Flower gene
(type species: A. thal.)]
SADI requires you to explicitly declare,
as part of your analytical output,
the biological relationship that your
algorithm “exposed”.
50. Another “philosophical” decision was
to abandon XML Schema
In a world that is moving towards
RDF representations of all data
it makes no sense to convert semantically rich RDF
into semantic-free Schema-based XML
then back into RDF again
51. The final philosophical decision was
to abandon SOAP
The bioinformatics community seems to be
very receptive to pure-HTTP interfaces
(e.g. the popularity of REST-like APIs)
So SADI uses simple HTTP POST
of just the RDF input data
(no message scaffold whatsoever)
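A minimal sketch of such an invocation, using only the Python standard library; the service URL is a hypothetical placeholder, and a real SADI service may negotiate other RDF serializations:

```python
# Sketch of a SADI-style invocation: HTTP POST of the raw RDF input
# graph, with no SOAP envelope or other message scaffold.
# The service URL used by callers is a hypothetical placeholder.
import urllib.request

def build_sadi_request(service_url: str, rdf_input: str) -> urllib.request.Request:
    """Package an RDF/XML input graph as a plain HTTP POST."""
    return urllib.request.Request(
        service_url,
        data=rdf_input.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml",
                 "Accept": "application/rdf+xml"},
        method="POST",
    )

def invoke_sadi_service(service_url: str, rdf_input: str) -> str:
    """Send the POST and return the service's RDF output graph."""
    with urllib.request.urlopen(build_sadi_request(service_url, rdf_input)) as resp:
        return resp.read().decode("utf-8")
```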
54. ID        Name          Height  Weight  Age
    24601     Jean Valjean  1.8m    84kg    45
    7474505B  Jake Blues    1.73m   101kg   31
    6         —             1.88m   75kg    39
    ...       ...           ...     ...     ...
55. (the same patient table)
56. OWL-DL Classes
(the same patient table)
57. Property restrictions
in OWL Class definition
(the same patient table)
58. (the same patient table)
59. A reasoner determines that Patient #24601
is an OWL Individual of the Input service Class
(the same patient table)
60. NOTE THE URI OF THE INPUT INDIVIDUAL:
Patient:24601
(the same patient table)
61. (the same patient table, now with a computed BMI column)
    ID        Name          Height  Weight  Age  BMI
    24601     Jean Valjean  1.8m    84kg    45   25.9
    7474505B  Jake Blues    1.73m   101kg   31
    6         —             1.88m   75kg    39
    ...       ...           ...     ...     ...
62. NOTE THE URI OF THE OUTPUT INDIVIDUAL:
Patient:24601
(the same table, with BMI)
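A hedged sketch of what this service’s input class might look like in OWL (Turtle syntax); the property names and namespace here are invented for illustration, not the actual ontology:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/bmi-demo#> .

# The input class is defined, not asserted: any record carrying both a
# height and a weight value qualifies, so a reasoner can recognize
# Patient:24601 as an instance without anyone typing it by hand.
:BMIServiceInput a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            [ a owl:Restriction ;
              owl:onProperty :hasHeight ;
              owl:someValuesFrom xsd:decimal ]
            [ a owl:Restriction ;
              owl:onProperty :hasWeight ;
              owl:someValuesFrom xsd:decimal ]
        )
    ] .
```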
64. The URI of the input is linked by a
meaningful predicate to the output
(either literal output or another URI)
65. Therefore, by connecting SADI services
together in a workflow you end-up with an
unbroken chain of Linked Data
68. The SHARE registry
indexes all of the input/output/relationship
triples that can be generated by all known services
This is how SHARE discovers services
69. We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis
simply by building a model in the Web
describing what the answer
(if one existed)
would look like
72. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
73. Original Study Simplified
Using what is known about interactions in fly & yeast
predict new interactions with your
protein of interest
74. “Pseudo-code” Abstracted Workflow
Given a protein P in Species X
Find proteins similar to P in Species Y
Retrieve interactors in Species Y
Sequence-compare Y-interactors with Species X genome
(1) Keep only those with homologue in X
Find proteins similar to P in Species Z
Retrieve interactors in Species Z
Sequence-compare Z-interactors with (1)
Putative interactors in Species X
76. Modeling the science...
ProbableInteractor:
  is homologous to (Potential Interactor from ModelOrganism1…)
  and
  is homologous to (Potential Interactor from ModelOrganism2…)
Probable Interactor is defined in OWL as a subClass: something that appears
as a potential interactor in both comparator model organisms.
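In Turtle, that class definition might look roughly like this; class and property names are paraphrased from the slide, not the actual terms in InteractingProteins.owl:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.org/interactors#> .

# ProbableInteractor is *defined* (owl:equivalentClass), never asserted:
# a reasoner classifies into it anything homologous to a potential
# interactor from BOTH comparator model organisms.
:ProbableInteractor a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            [ a owl:Restriction ;
              owl:onProperty :isHomologousTo ;
              owl:someValuesFrom :PotentialInteractorFromModelOrganism1 ]
            [ a owl:Restriction ;
              owl:onProperty :isHomologousTo ;
              owl:someValuesFrom :PotentialInteractorFromModelOrganism2 ]
        )
    ] .
```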
77. Running the Web Science Experiment
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
78. The tricky bit is...
In the abstract, the search for homology is “generic”:
ANY Protein, ANY model system.
But when the machine does the experiment, it will need
to use (at least) two organism-specific resources,
because the answer requires information from two declared species:
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
79. This is the question we ask:
(the query language here is SPARQL)
PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor .
}
The URL of our OWL model (ontology) defining Probable Interactors
80. Each relationship (property-restriction)
in the OWL Class is then matched
with a SADI Service
The matched SADI Service can
generate data that fulfils that
property restriction
(i.e. produces triples with that S/P/O pattern)
81. SHARE chains these SADI services
into an analytical workflow...
...the outputs from that workflow are
Instances (OWL Individuals) of
Probable Interactors
82. SHARE derived (and executed) the following workflow automatically
These are different
SADI Web Services...
...selected at run-time
based on the same model
84. Keys to Success:
1: Use standards
2: Focus on predicates, not classes
3: Use these predicates to define, rather than assert, classes
4: Make sure all URIs resolve, and resolve to something useful
5: Never leave the RDF world... (abandon vanilla XML,
even for Web Services!)
6: Use reasoners... Everywhere... Always!
90. Taverna
• Contextual service discovery
• Automatic RDF serialization and
deserialization between SADI and non-SADI
services
• Note that Taverna is not as rich a client as
SHARE. The reason is that SHARE will
aggregate and re-reason after every service
invocation. There is no (automatic) data
aggregation in Taverna.
91. Using SADI services – building a workflow
The next step in the workflow is to find a SADI service that takes the
genes from getKEGGGenesByPathway and returns the proteins
that those genes code for.
92. Using SADI services – building a workflow
Right-click on the service output port and click Find services that
consume KEGG_Record…
93. Using SADI services – building a workflow
Select getUniprotByKeggGene from the list of SADI services and
click Connect.
94. Using SADI services – building a workflow
The getUniprotByKeggGene service is added to the workflow and
automatically connected to the output from
getKEGGGenesByPathway.
95. Using SADI services – building a workflow
Add a new workflow output called protein and connect the output
from the getUniprotByKeggGene service to it.
96. Using SADI services – building a workflow
The next step in the workflow is to find a SADI service that takes the
proteins and returns sequences of those proteins. Right-click on the
encodes output port and click Find services that consume
UniProt_Record…
97. Using SADI services – building a workflow
The UniProt info service attaches the property hasSequence so
select this service and click Connect.
98. Using SADI services – building a workflow
The UniProt info service is added to the workflow and automatically
connected to the output from getUniprotByKeggGene.
99. Using SADI services – building a workflow
Add a new workflow output called sequence and connect the
hasSequence output from the UniProt info service to it.
100. Using SADI services – building a workflow
The KEGG pathway we’re interested in is “hsa00232”, so we’ll add it as
a constant value. Right-click on the KEGG_PATHWAY_Record
input port and click Constant value.
101. Using SADI services – building a workflow
Enter the value hsa00232 and click OK.
102. Using SADI services – building a workflow
The workflow is now complete and ready to run.
103. IO Informatics Knowledge Explorer plug-in
• “Bootstrapping” of semantics using known
URI schema (identifiers.org, LSRN, Bio2RDF,
etc.)
• Contextual service discovery
• Automatic packaging of appropriate data
from your data-store and automated service
invocation using that data.
• This uses some not-widely-known services and
metadata that is in the SHARE registry!
104. The SADI plug-in to the
IO Informatics’
Knowledge Explorer
...a quick explanation of how
we “boot-strap” semantics...
106. Sentient Knowledge Explorer is a retrieval, integration,
visualization, query, and exploration environment for semantically
rich data
107. Most imported data-sets will already have
properties (e.g. “encodes”)
…and the data will already be typed
(e.g. “Gene” or “Protein”)
…so finding SADI Services to consume that
data is ~trivial
112. In the case of LSRN URIs, they resolve to:
<lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO">
  <dc:identifier>CHO</dc:identifier>
  <sio:SIO_000671> <!-- has identifier -->
    <lsrn:DragonDB_Locus_Identifier>
      <sio:SIO_000300>CHO</sio:SIO_000300> <!-- has value -->
    </lsrn:DragonDB_Locus_Identifier>
  </sio:SIO_000671>
</lsrn:DragonDB_Locus_Record>
113. In the case of LSRN URIs, they resolve to:
(the same LSRN record shown above)
The Semanticscience Integrated Ontology
(Dumontier) has a model for how to describe
database records, including explicitly making
the record identifier an attribute of that
record; in our LSRN metadata, we also
explicitly rdf:type both records and identifiers.
114. Now we have enough information to start exploring global data...
123. HTTP POST the URI to the SHARE
Resolver Service
It will (try to) return you SIO-compliant
RDF metadata about that URI
(this is a typical SADI service)
The resolver currently recognizes a few
different shared-URI schemes
(e.g. Bio2RDF, Identifiers.org)
and can be updated with new patterns
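That pattern-matching can be sketched as a small lookup table; the regexes and scheme labels below are illustrative guesses, not the resolver’s actual configuration:

```python
# Sketch of a pattern table for recognizing shared-URI schemes such as
# Bio2RDF, Identifiers.org and LSRN. Patterns are illustrative only.
import re

URI_PATTERNS = [
    (re.compile(r"^http://bio2rdf\.org/(?P<ns>[^:/]+):(?P<id>.+)$"), "bio2rdf"),
    (re.compile(r"^http://identifiers\.org/(?P<ns>[^/]+)/(?P<id>.+)$"), "identifiers.org"),
    (re.compile(r"^http://lsrn\.org/(?P<ns>[^:/]+):(?P<id>.+)$"), "lsrn"),
]

def recognize(uri: str):
    """Return (scheme, namespace, identifier) for a recognized URI, else None."""
    for pattern, scheme in URI_PATTERNS:
        match = pattern.match(uri)
        if match:
            return scheme, match.group("ns"), match.group("id")
    return None
```

Adding support for a new URI scheme is then just a matter of appending one (pattern, label) pair to the table.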
124. Next problem:
Knowledge Explorer
and therefore the plug-in
are written in C#
All of our interfaces are
described in OWL
C# reasoners are
extremely limited at this
time
125. This problem manifests itself in two ways:
1. An individual on the KE canvas has all the
properties required by a Service in the registry, but
is not rdf:typed as that Service’s input type: how
do you discover that Service so that you can add it
to the menu?
2. For a selected Service from the menu, how does the
plug-in know which data-elements it needs to
extract from KE to send to that service in order to
fulfil its input property-restrictions?
126. If I select a canvas node, and ask SADI to
find services, it will...
128. Nevertheless:
(a) The service can be discovered based on JUST this node selection
(b) The service can be invoked based on JUST this node selection
129. Voila!
How did the plug-in discover the service,
and determine which data was required to
access that service based on an OWL Class
definition, without a reasoner?
130. SELECT ?x ?y
FROM knowledge_explorer_database
WHERE {
  ?x foaf:name ?y
}
Convert the Input OWL Class def’n
into an ~equivalent SPARQL query;
store it together with the Service Description,
with an INDEX, in the service Registry.
Service Description:
INPUT OWL Class (NamedIndividual): things with
a “name” property from the “foaf” ontology
OUTPUT OWL Class (GreetedIndividual): things with
a “greeting” property from the “hello” ontology
The service provides a “greeting”
property based on a “name” property
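A toy sketch of that conversion for the single-restriction case; the function name is invented, and the FROM clause mirrors the slide rather than real SPARQL dataset syntax:

```python
# Toy sketch of the registry-side trick described above: at registration
# time, a simple input OWL class (here, a single property restriction)
# is rewritten into an ~equivalent SPARQL query, so a reasoner-less
# client can find matching individuals by triple pattern alone.
# This is illustrative, not the actual SADI registry code.
def restriction_to_sparql(property_uri: str,
                          graph: str = "knowledge_explorer_database") -> str:
    """Rewrite 'things with a <property_uri> value' as a SPARQL SELECT."""
    return (
        "SELECT ?x ?y\n"
        f"FROM {graph}\n"
        "WHERE {\n"
        f"  ?x <{property_uri}> ?y .\n"
        "}"
    )
```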
131. Just to ensure that I don’t over-trivialize this point,
the REAL SPARQL query that extracts the input for this service is...
133. Summary
While the Knowledge Explorer plug-in has similar
functionality to other tools we have built for SADI, it
takes advantage of some features of the SADI Registry,
and SADI in general, that are not widely-known.
We hope that the availability of these features
encourages development of SADI tooling in other
languages that have limited access to reasoning.