Distributed Query Processing for Federated RDF Data Management - Olaf Goerlitz
PhD defense talk about SPLENDID, a state-of-the-art implementation for efficient distributed SPARQL query processing on Linked Data using SPARQL endpoints and voiD descriptions.
aRangodb, a package for using ArangoDB with R - GraphRM
Talk language: Italian.
Description:
In this talk we will discuss how to integrate and use ArangoDB, a multi-model database with native graph support, with R. We will then present aRangodb, the package we developed to provide a simpler and more intuitive interface to the database. During the talk we will show how the package can be used for data science through some concrete case studies.
Speaker:
Gabriele Galatolo - Data Scientist - Kode srl
The seminar presents the emerging topic of the Web of Data within the Semantic Web. It examines the difficulties encountered in accessing the enormous amount of information currently available on the Web, and the advantages of an approach based on the interactive construction of queries.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 - Ontotext
These are slides from a live webinar held in January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how to create your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start managing your data assets smartly, visually represent your data model and gain insights from it.
Property graph vs. RDF Triplestore comparison in 2020 - Ontotext
This presentation ranges from an introduction to what graph databases are, to a table comparing RDF and property graphs, plus two diagrams presenting the market circa 2020.
[Conference] Cognitive Graph Analytics on Company Data and News - Ontotext
Ontotext introduced their cognitive analytics platform that performs cognitive graph analytics on company data and news. The platform builds large knowledge graphs by integrating data from multiple sources and uses text mining to link news articles to entities in the knowledge graph. It provides functionality for node ranking, similarity analysis and data cleaning to consolidate and reconcile company records across datasets. The platform was demonstrated through a knowledge graph containing over 2 billion facts built by integrating datasets like DBpedia, Geonames, and news article metadata.
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes - Ontotext
This presentation will provide a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches that work well with taxonomies and conceptual models of a few thousand concepts, but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning over virtual KGs, which is often infeasible.
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking - Ontotext
A presentation by Ontotext's CEO Atanas Kiryakov, given during Semantics 2018, an annual conference that brings together researchers and professionals from all over the world to share knowledge and expertise on semantic computing.
1) The document compares different methods for representing statement-level metadata in RDF, including RDF reification, singleton properties, and RDF*.
2) It benchmarks the storage size and query execution time of representing biomedical data using each method in the Stardog triplestore.
3) The results show that RDF* requires fewer triples but the database size is larger, and it outperforms the other methods for complex queries.
This document summarizes an introductory webinar on building an enterprise knowledge graph from RDF data using TigerGraph. It introduces RDF and knowledge graphs, demonstrates loading DBpedia data into a TigerGraph graph database using a universal schema, and provides examples of queries to extract information from the graph such as related people, publishers by location, and related topics for a given predicate. The webinar encourages attendees to learn more about graph databases and TigerGraph through additional resources and future webinar episodes.
MongoDB and Spring - Two leaves of a same tree - MongoDB
Enterprise systems evolve at a tremendous pace these days: all sorts of new frameworks, databases, operating systems, deployment strategies and infrastructures emerge to adjust to ever-growing business demands.
The integration between the Spring Framework and MongoDB tends to be somewhat unknown. This presentation covers the different projects that compose the Spring ecosystem (Spring Data, Spring Boot, Spring IO, etc.) and how to move from pure Java projects to massive enterprise systems that require these systems to interact.
This document summarizes an internship at Lama Capital Management focused on programming financial strategies. The internship involved:
1) Using JavaScript to plot return-vs-frequency curves from calculations in Python, linking the frontend and backend of websites.
2) Scraping historical data from websites using Python scripts to develop trading strategies and test them on daily data over 15 days.
3) Implementing strategies like ACD and RSI in Python, including defining triggers and optimizing parameters like ATR, entry/exit points, and profit booking levels to maximize win rates.
4) Programming in Python to retrieve real-time market data, generate buy/sell signals, and place live trades through APIs.
How Google is using linked data today and vision for tomorrow - Vasu Jain
In this presentation, I will discuss how modern search engines, such as Google, make use of Linked Data spread in Web pages for displaying Rich Snippets. I will also present an example of the technology and analyze its current uptake.
I then sketch some ideas on how Rich Snippets could be extended in the future, in particular for multimedia documents.
Original paper:
http://scholar.google.com/citations?view_op=view_citation&hl=en&user=K3TsGbgAAAAJ&authuser=1&citation_for_view=K3TsGbgAAAAJ:u-x6o8ySG0sC
Another Presentation by Author: https://docs.google.com/present/view?id=dgdcn6h3_185g8w2bdgv&pli=1
Multiplatform Solution for Graph Datasources - Stratio
One of the top banks in Europe needed a system providing better performance, scaling almost linearly with the increase in the information to be analyzed, and allowing the processes currently executed on the host to be moved to a Big Data infrastructure. Over the course of a year we worked on a system that provides greater agility, flexibility and simplicity for users viewing information when profiling, and that can now analyze the structure of profile data. It is a powerful way to run online queries against a graph database, integrated with Apache Spark and different graph libraries. Essentially, we obtain all the necessary information through Cypher queries sent to a Neo4j database.
Using the latest Big Data technologies, such as Spark DataFrames, HDFS, Stratio Intelligence and Stratio Crossdata, we have developed a solution able to obtain critical information from multiple datasources, such as text files or graph databases.
While the adoption of machine learning and deep learning techniques continues to grow, many organizations find it difficult to actually deploy these sophisticated models into production. It is common to see data scientists build powerful models, yet these models are not deployed because of the complexity of the technology used or a lack of understanding of the process of pushing models into production.
As part of this talk, I will review several deployment design patterns for both real-time and batch use cases. I'll show how these models can be deployed as scalable, distributed deployments within the cloud, scaled across Hadoop clusters, served as APIs, and embedded within streaming analytics pipelines. I will also touch on topics related to security, end-to-end governance, pitfalls, challenges, and useful tools across a variety of platforms. This presentation will involve demos and sample code for the deployment design patterns.
Generating Executable Mappings from RDF Data Cube Data Structure Definitions - Christophe Debruyne
Data processing is increasingly the subject of various internal and external regulations, such as GDPR, which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of and for describing datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset "just in time". We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes into account regulations to generate mappings that are compliant. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and demonstrate it using a running example. Some of the more technical aspects are also described.
Reference: Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350
Guest lecture at the Syracuse University School of Information Studies eScience Librarianship Lecture Series (08 Dec 2011).
Description: It’s your government, is it your data? New approaches to building interlinked catalogs of government-produced data. Dr. John S. Erickson, Director of Web Science Operations for the Tetherless World Constellation at Rensselaer Polytechnic Institute will present technical methods being developed to manage the delivery of large-scale open government data projects based on semantic web and linked data best practices.
Data Day Seattle 2017: Scaling Data Science at Stitch Fix - Stefan Krawczyk
At Stitch Fix we have a lot of Data Scientists: around eighty at last count. One reason why I think we have so many is that we do things differently. To get their work done, Data Scientists have access to whatever resources they need (within reason), because they're end-to-end responsible for their work; they collaborate with their business partners on objectives and then prototype, iterate, productionize, monitor and debug everything and anything required to get the desired output. They're full data-stack data scientists!
The teams in the organization do a variety of different tasks:
- Clothing recommendations for clients.
- Clothes reordering recommendations.
- Time series analysis & forecasting of inventory, client segments, etc.
- Warehouse worker path routing.
- NLP.
… and more!
They're also quite prolific at what they do -- we are approaching 4500 job definitions at last count. So one might be wondering: how have we enabled them to get their jobs done without getting in the way of each other?
This is where the Data Platform team comes into play. With the goal of lowering the cognitive overhead and engineering effort required on the part of the Data Scientist, the Data Platform team tries to provide abstractions and infrastructure to help the Data Scientists. The relationship is a collaborative partnership, where the Data Scientist is free to make their own decisions and thus choose the way they do their work, and the onus then falls on the Data Platform team to convince Data Scientists to use their tools; the easiest way to do that is by designing the tools well.
In regard to scaling Data Science, the Data Platform team has helped establish some patterns and infrastructure that help alleviate contention. Contention on:
- Access to Data
- Access to Compute Resources:
  - Ad-hoc compute (think prototype, iterate, workspace)
  - Production compute (think where things are executed once they're needed regularly)
For the talk (and this post) I focused only on how we reduced contention on Access to Data and Access to Ad-hoc Compute to enable Data Science to scale at Stitch Fix. With that, I invite you to take a look through the slides.
Boost your data analytics with open data and public news content - Ontotext
Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of source.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g. for people and organisations)
View topics linked implicitly (e.g. daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
This document outlines an intro to JavaScript fundamentals course, including:
- An overview of the instructor and TAs
- A description of the agenda which includes learning key JavaScript concepts, assignments, and an answer key
- Explanations of how the web works, client/server relationships, and an example using Facebook
- The history and modern use of JavaScript
- Demonstrations of JavaScript fundamentals like variables, functions, if/else statements, comparing values, and using parameters
- Encouragement to use online resources like Google and Repl.it for hands-on practice
- Information on continuing learning opportunities from Thinkful
These are our contributions to Data Science projects, as developed in our startup. They are part of partner trainings and of in-house design, development and testing of course material and concepts in Data Science and Engineering. It covers data ingestion, data wrangling, feature engineering, data analysis, data storage, data extraction, querying data, and formatting and visualizing data for various dashboards. Data is prepared for accurate ML model predictions and Generative AI apps.
This is our project work at our startup for Data Science. This is part of our internal training and focused on data management for AI, ML and Generative AI apps
This document outlines an intro to JavaScript fundamentals course, including:
- An overview of the instructor and TAs
- Learning key JavaScript concepts like variables, functions, if/else statements
- Examples of how the web works with clients and servers
- A brief history of JavaScript and how it has evolved
- Using Repl.it to do hands-on coding challenges
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every Sprint - TEST Huddle
EuroSTAR Software Testing Conference 2012 presentation on How To Regression Test A Billion Rows Of Financial Data Every Sprint by Matt Archer.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
This document discusses creating a knowledge graph for Irish history as part of the Beyond 2022 project. It will include digitized records from core partners documenting seven centuries of Irish history. Entities like people, places, and organizations will be extracted from source documents and related in a knowledge graph using semantic web technologies. An ontology was created to provide historical context and meaning to the relationships between entities in Irish history. Tools will be developed to explore and search the knowledge graph to advance historical research.
This document presents an interest-based approach for propagating RDF updates between a source dataset and local replicas. The traditional approach of fully synchronizing all changes is not scalable. The proposed approach uses SPARQL queries to define interests, and only propagates changes that match the interests to the replicas. This cuts down the size of updates significantly. Experimental results show the interesting changes were 0.38-4.38% of removed triples and 0.34-1.81% of added triples, reducing overhead of synchronization.
Profiling User Interests on the Social Semantic Web - Fabrizio Orlandi
Fabrizio Orlandi's PhD Viva @Insight NUI Galway (ex-DERI) - 31/03/2014.
Supervisors: Alexandre Passant and John G. Breslin.
Examiners: Fabien Gandon and Stefan Decker
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web - Fabrizio Orlandi
This document discusses improving user interest profiling techniques by leveraging linked data, the provenance of data, and the social semantic web. It aims to address challenges like information isolation across social media sites and the lack of provenance on the web of data. Key research questions focus on how to extract and aggregate user information from social media following linked data principles, the role of provenance for user profiling, and how to use the web of data and semantic technologies to enrich profiles. The work aims to represent user profiles interoperably and adapt profiling algorithms to different social media and data origins.
Semantic user profiling and personalised filtering of the Twitter stream - Fabrizio Orlandi
Presentation at Kno.e.sis - Feb 2012.
The presentation describes my current PhD research at DERI and the work done in 5 weeks during a collaboration with Kno.e.sis, with Pavan Kapanipathi, Prof. Amit Sheth, Prof. T. K. Prasad and the rest of the group.
- video: http://youtu.be/MmF5HxIVUwA
Semantic Representation of Provenance in Wikipedia - Fabrizio Orlandi
This document discusses representing provenance information from Wikipedia articles using semantic web technologies. The authors present a semantic model based on SIOC and the W7 model to represent provenance using RDF triples. They describe extracting provenance data from Wikipedia revisions and applying their model to over 166 articles in the "Semantic Web" category. An application was created to access and expose the provenance data, allowing statistics about article edits to be viewed on Wikipedia pages and as linked open data. Future work could include refining the provenance model and improving the performance of the application.
Semantic search on heterogeneous wiki systems - Wikimania 2010 - Fabrizio Orlandi
1) The document proposes using Linked Data principles and extending the SIOC ontology to semantically interconnect heterogeneous wiki systems and enable semantic search across them.
2) Key wiki features like categorization, tagging, discussions, and versioning are modeled in the extended SIOC ontology.
3) Plugins are developed for MediaWiki and DokuWiki to export semantic data using the extended SIOC model, allowing semantic queries across wiki platforms.
Semantic Search on Heterogeneous Wiki Systems - WikiSym 2010 - Fabrizio Orlandi
This document discusses enabling semantic search across heterogeneous wiki systems by extending the Semantically Interlinked Online Communities (SIOC) ontology to model relevant wiki features. It proposes modeling multi-authoring, categories, tagging, discussions, backlinks, and page versioning in SIOC. It also describes a MediaWiki exporter that generates RDF using the extended SIOC model to expose wiki data and link wiki pages following Linked Data practices.
Semantic Search on Heterogeneous Wiki Systems - poster - Fabrizio Orlandi
This document describes a system for enabling semantic search across heterogeneous wiki systems using Semantic Web technologies. The key contributions are:
1) Developing a common RDF model for representing wiki structure and contributions to encompass previous models.
2) Extracting semantic data from different wiki engines and loading it into a Sesame RDF store, totaling around 45,500 triples.
3) Building an application with a simple interface that allows semantic searching and browsing across linked wikis in less than 3 seconds.
Semantic Search on Heterogeneous Wiki Systems - Short - Fabrizio Orlandi
1) The document discusses a system to enable semantic search across heterogeneous wiki systems using Semantic Web technologies.
2) Key aspects of the system include a common semantic model based on the SIOC ontology to represent wiki structure and contributions, data extractors to translate wiki data to RDF, and an application with a user interface to enable semantic search and browsing across different interlinked wikis.
3) The system was able to semantically search and link information across 5 different wiki sites containing over 3000 articles and 700 users.
Enabling cross-wikis integration by extending the SIOC ontology - Fabrizio Orlandi
This document discusses enabling cross-wiki integration by extending the SIOC (Semantically-Interlinked Online Communities) ontology. It presents an approach to represent wiki structures and social interactions in a unified way using SIOC. An exporter was developed to translate MediaWiki pages into SIOC data following Linked Data principles. Querying this integrated data across wikis and other social platforms was demonstrated. Further work is needed to develop exporters for other wiki platforms and improve modeling of wiki page content and versioning systems.
Global Situational Awareness of A.I. and where it's headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Enhanced Enterprise Intelligence with your personal AI Data Copilot - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
3. Knowledge Graphs - Example
Image source: https://aws.amazon.com/neptune/
When did this occur? What is the time span? (Valid time)
4. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
What's the confidence of this fact? (Certainty)
5. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
When were these facts created? What's their time validity? (Transaction time)
What's the confidence of this fact? (Certainty)
6. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
When were these facts created? What's their time validity? (Transaction time)
What's the confidence of this fact? (Certainty)
Where does this data come from? (Provenance)
7. Popular Use Cases for Contextual Metadata
● Temporal aspects of facts are usually not reflected in KGs (When are specific statements - triples - valid?)
● Facts extracted from heterogeneous data sources hold different degrees of certainty, depending on the source or the extraction/generation process
● Missing efficient solutions for managing the dynamics (the evolution) of KGs (When were specific statements added/updated?)
● Need for data provenance: what's the origin of the data?
8. Data Provenance with PROV-O
Provenance (W3C definition¹):
“Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.”
PROV-O: W3C ontology (OWL) based on the core PROV data model
http://www.w3.org/TR/prov-o/
¹ https://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
11. Example of Statement-Level Metadata
Subject | Predicate | Object | Starts | Ends
Cristiano Ronaldo | team | Real Madrid | 1 July 2009 | 10 July 2018
Cristiano Ronaldo | team | Juventus | 11 July 2018 |
How to represent this in a graph? This is the problem of n-ary (not binary) relations...
12. RDF graphs vs. Property graphs
RDF Graphs
● Formally defined data model
● Various well-defined serialization formats
● Well-defined query language with a formal semantics
● Natural support for globally unique identifiers
● Semantics of data can be made explicit in the data itself
● W3C recommendations (standards!)
● High usage complexity
Labeled-Property Graphs (e.g. Neo4j)
● Easy to manage statement-level metadata
● Efficient graph traversals
● Fast and scalable implementations
● No open standards defined
● Different proprietary implementations and query languages
● Good adoption in enterprise
13. RDF graphs vs. Property graphs
RDF Graphs
● Vertices: every statement produces two vertices in the graph; some are uniquely identified by URIs (Resources), some are property values (e.g. Literals)
● Edges: every statement produces an edge, uniquely identified by a URI
● Vertices and edges have NO internal structure
Labeled-Property Graphs (e.g. Neo4j)
● Vertices: unique id + set of key-value pairs
● Edges: unique id + set of key-value pairs
● Vertices and edges have internal structure
14. RDF graphs vs. Property graphs
Query: Who likes a person named "Ann"?
SPARQL:
SELECT ?who
WHERE {
  ?who :likes ?a .
  ?a rdf:type :Person .
  ?a :name ?aName .
  FILTER regex(?aName, "Ann")
}
Cypher (Neo4j):
MATCH (who)-[:LIKES]->(a:Person)
WHERE a.name CONTAINS 'Ann'
RETURN who
15. Statement-Level Metadata with Property Graphs
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
In a property graph, the dates become key-value properties on the edge itself:
(Cristiano Ronaldo) -[team { starts: 2009-07-01, ends: 2018-07-10 }]-> (Real Madrid)
16. Modelling (1) - RDF Reification
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Reified form (the statement node Stmt1 describes the original triple):
Subject | Predicate | Object
Cristiano_Ronaldo | team | Real_Madrid
Stmt1 | type | Statement
Stmt1 | subject | Cristiano_Ronaldo
Stmt1 | predicate | team
Stmt1 | object | Real_Madrid
Stmt1 | starts | 2009-07-01
Stmt1 | ends | 2018-07-10
17. Modelling (1) - RDF Reification
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object
Cristiano_Ronaldo | team | Real_Madrid
Stmt1 | type | Statement
Stmt1 | subject | Cristiano_Ronaldo
Stmt1 | predicate | team
Stmt1 | object | Real_Madrid
Stmt1 | starts | 2009-07-01
Stmt1 | ends | 2018-07-10
Pros:
1. Easy to understand
Cons:
1. Not scalable => takes 4N extra triples to represent N statements
2. No formal semantics defined
3. Discouraged in LOD!
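In Turtle, the reified version above looks as follows (a minimal sketch: the ex: namespace is illustrative, while rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard reification vocabulary):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .  # illustrative namespace

# The original triple stays in the data...
ex:Cristiano_Ronaldo ex:team ex:Real_Madrid .

# ...and a statement node reifies it so that metadata can attach to it
ex:Stmt1 a rdf:Statement ;
         rdf:subject   ex:Cristiano_Ronaldo ;
         rdf:predicate ex:team ;
         rdf:object    ex:Real_Madrid ;
         ex:starts     "2009-07-01"^^xsd:date ;
         ex:ends       "2018-07-10"^^xsd:date .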
18. Modelling (2) - Singleton Property
Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property." In Proceedings of the 23rd International Conference on World Wide Web, ACM, 2014.
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object
Cristiano_Ronaldo | team#1 | Real_Madrid
team#1 | singletonPropertyOf | team
team#1 | starts | 2009-07-01
team#1 | ends | 2018-07-10
19. Modelling (2) - Singleton Property
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object
Cristiano_Ronaldo | team#1 | Real_Madrid
team#1 | singletonPropertyOf | team
team#1 | starts | 2009-07-01
team#1 | ends | 2018-07-10
Pros:
1. More scalable => only 1 extra triple
Cons:
1. Less intuitive
2. Large number of unique predicates
3. Requires verbose constructs in queries
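A minimal Turtle sketch of the same data under the singleton-property pattern (the ex: namespace is illustrative, and the singleton predicate is written team_1 here because "#" cannot appear in a Turtle local name):

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .  # illustrative namespace

# team_1 is a fresh predicate used for exactly this one statement...
ex:Cristiano_Ronaldo ex:team_1 ex:Real_Madrid .

# ...linked back to the generic predicate and carrying the metadata
ex:team_1 ex:singletonPropertyOf ex:team ;
          ex:starts "2009-07-01"^^xsd:date ;
          ex:ends   "2018-07-10"^^xsd:date .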
20. Modelling (3) - RDF* and SPARQL*
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
RDF extension for nested triples:
<< :Cristiano_Ronaldo :team :Real_Madrid >>
    :starts "2009-07-01" ;
    :ends "2018-07-10" .
SPARQL extension with nested triple patterns:
SELECT ?player WHERE {
  << ?player :team :Real_Madrid >> :starts ?date .
  FILTER (?date >= "2009-07-01")
}
21. Modelling (3) - RDF* and SPARQL*
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
1. Purely syntactic “sugar” on top of standard RDF and SPARQL
   a. Can be parsed directly into standard RDF and SPARQL
   b. Can be implemented easily by a small wrapper on top of any existing RDF store (DBMS)
2. A logical model in its own right, with the possibility of a dedicated physical schema
   a. Extension of the RDF data model and of SPARQL to capture the notion of nested triples
   b. Supported by some of the most popular triplestores (e.g. Jena, Blazegraph)
O. Hartig: “Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF.” In Proc. of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2017.
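For contrast, here is a sketch of the same query written over standard reification (prefixes as in the earlier examples; the example.org namespace is illustrative). This is the verbosity that the nested-triple syntax removes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX :    <http://example.org/>

SELECT ?player WHERE {
  # One reified statement node per annotated triple
  ?st rdf:type      rdf:Statement ;
      rdf:subject   ?player ;
      rdf:predicate :team ;
      rdf:object    :Real_Madrid ;
      :starts       ?date .
  FILTER (?date >= "2009-07-01"^^xsd:date)
}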
22. Modelling (3) - RDF* and SPARQL*
A recent effort and solution, receiving wider attention and support. Since 2020, part of the W3C “RDF dev community group”: https://w3c.github.io/rdf-star/
Now you can also test it live on Yago (https://yago-knowledge.org)
Try --> https://bit.ly/2V4ARXL
23. Modelling (4) - Named Graphs (Quads)
Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th International Conference on World Wide Web. ACM, 2005.
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object | NG
Cristiano_Ronaldo | team | Real_Madrid | graph_1
graph_1 | starts | 2009-07-01 | graph_X
graph_1 | ends | 2018-07-10 | graph_X
24. Modelling (4) - Named Graphs (Quads)
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object | NG
Cristiano_Ronaldo | team | Real_Madrid | graph_1
graph_1 | starts | 2009-07-01 | graph_X
graph_1 | ends | 2018-07-10 | graph_X
Pros:
1. Intuitive - creates N named graphs for N sources
2. Attaches metadata to a whole set of triples
3. RDF and SPARQL standards: https://www.w3.org/TR/sparql11-query/#specifyingDataset
Cons:
1. Restricts usage of named graphs to provenance only
2. Requires verbose constructs in queries
A possible specification is N-Quads, which extends N-Triples with an optional context value at the fourth position: http://www.w3.org/TR/n-quads/ (W3C Recommendation)
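A minimal N-Quads sketch of the quad table above (IRIs under example.org are illustrative); the fourth element names the graph each triple belongs to:

<http://example.org/Cristiano_Ronaldo> <http://example.org/team> <http://example.org/Real_Madrid> <http://example.org/graph_1> .
<http://example.org/graph_1> <http://example.org/starts> "2009-07-01" <http://example.org/graph_X> .
<http://example.org/graph_1> <http://example.org/ends> "2018-07-10" <http://example.org/graph_X> .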
25. Data Provenance with PROV-O - Example
Expressing statements about statements using Named Graphs and PROV-O:
:graphName prov:wasAttributedTo :Fabrizio
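Spelled out as a minimal TriG sketch (the default namespace is illustrative; prov:wasAttributedTo comes from PROV-O):

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org/> .  # illustrative namespace

# The fact lives inside a named graph...
:graphName {
    :Cristiano_Ronaldo :team :Real_Madrid .
}

# ...and the graph IRI itself carries the provenance metadata
:graphName prov:wasAttributedTo :Fabrizio .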
28. Modelling (5) - Qualifiers in Wikidata
wd:Cristiano_Ronaldo --p:member_of_sports_team--> wds:Statement --ps:member_of_sports_team--> wd:Real_Madrid
wds:Statement --pq:start_time--> 2009-07-01
wds:Statement --pq:end_time--> 2018-07-10
(wd:Cristiano_Ronaldo --wdt:member_of_sports_team--> wd:Real_Madrid is the direct, unqualified triple.)
The prefix p: points not to the object but to a statement node; this node is then the subject of other triples. The prefix ps: within the statement node retrieves the object. The prefix pq: within the statement node retrieves the qualifier information.
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
(see: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks)
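A short SPARQL sketch following this pattern, as it could be run against the Wikidata endpoint (assuming the usual Wikidata identifiers: Q11571 for Cristiano Ronaldo, P54 for "member of sports team", and P580/P582 for the start/end time qualifiers):

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX p:   <http://www.wikidata.org/prop/>
PREFIX ps:  <http://www.wikidata.org/prop/statement/>
PREFIX pq:  <http://www.wikidata.org/prop/qualifier/>

SELECT ?team ?start ?end WHERE {
  wd:Q11571 p:P54 ?stmt .            # p: leads to the statement node
  ?stmt ps:P54 ?team .               # ps: retrieves the object (the team)
  OPTIONAL { ?stmt pq:P580 ?start }  # qualifier: start time
  OPTIONAL { ?stmt pq:P582 ?end }    # qualifier: end time
}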
31. Summary - Statement-Level Metadata in RDF
1) Standard Reification
2) Singleton Property
3) RDF* / SPARQL*
4) Named Graphs (Quads)
5) Wikidata Qualifiers
32. Research in our group…
How can we effectively represent and manage temporal dynamics and uncertainty of facts in knowledge graphs?
Current activities:
● Model and characterise facts in KGs according to temporal and uncertainty aspects
● Develop solutions for real-time processing, update and propagation of changes in KGs
● Evaluate the developed solutions, applying them to different use cases
33. Research in our group…
- RDF* Observatory: benchmarking RDF*/SPARQL* engines: https://github.com/dgraux/RDFStarObservatory
- A real-time dashboard for Wikidata edits
- Summarising and verbalising the evolution of KGs with Formal Concept Analysis
- A scalable and efficient storage layer for temporal KGs
34. Some Industrial Use-Cases
1) Finance (temporal aspects): data about companies, their shares and markets is complex, available and very time-dependent. → See the “Thomson Reuters” and “Bloomberg” KGs
2) Law / Court Cases (uncertainty): legal search and Q&A systems on large corpora of court cases need the uncertainty dimension for their different information extraction systems. → See “Wolters Kluwer’s KG” and Google’s “Knowledge Vault”
3) News & Social Media (dynamics): very time-dependent and uncertain data which needs an efficient management solution for its dynamics. → See the “GDELT” Global Knowledge Graph project