Stardog
Linked Data Catalog
      Héctor Pérez-Urbina
     Edgar Rodríguez-Díaz

       Clark & Parsia, LLC
 {hector, edgar}@clarkparsia.com
Who are we?
● Clark & Parsia is a semantic software startup
● HQ in Washington, DC & office in Boston
● Provides software development and integration
  services
● Specializing in Semantic Web, web services, and
  advanced AI technologies for federal and
  enterprise customers

        http://clarkparsia.com/
        Twitter: @candp
What's SLDC?
● Stardog Linked Data Catalog
● A catalog of data sources
    ○ Semi-structured
    ○ Relational
    ○ Object-oriented
    ○ ...
● Provides a coherent view over existing data
  repositories so that users and/or
  applications can easily find them and query
  them
Use Cases
● Sources
   ○ Management, import, subscription,
     categorization, sharing
● Query
   ○ Management, sharing, results export
   ○ Querying
      ■ Metadata, external sources, integration
● Locating sources
   ○ Search, browse
● NLP/AI
   ○ Entity extraction, graph algorithms, clustering
     analysis
[Architecture diagram, top to bottom: Application layer, Middleware layer, NLP/AI analytics layer, Data layer]
Demo
Semantic Technologies
● W3C standards
   ○ RDF(S), OWL, SPARQL
● Lower operational costs and raise productivity
   ○ Cooperation without coordination
   ○ Appropriate abstractions
   ○ Declarative is better than imperative
   ○ Correctness when it matters; sloppiness
     when it doesn’t
Data Model
● Similar to DCAT from W3C
   ○ Catalog entries
● Enhanced with
   ○ SSD
   ○ VoID datasets
   ○ SKOS background models
   ○ Axioms & rules
Modeling the Domain
● Use of axioms to model
  relationships between
  classes
   ○ :Query subClassOf :Resource
   ○ :Entry subClassOf :Resource
● Retrieve the resources
  user :u can see
   ○ SELECT ?resource
     WHERE { ?resource
     type :Resource . }
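A minimal Python sketch of what the query above relies on, assuming a toy in-memory set of facts rather than Stardog itself: with the two subClassOf axioms in place, asking for instances of :Resource also returns individuals that were only asserted to be a :Query or an :Entry. The individuals :q1 and :e1 and the helper names are hypothetical.

```python
# Toy subclass reasoning over plain tuples; this is not Stardog's API.
SUBCLASS_OF = {":Query": ":Resource", ":Entry": ":Resource"}  # axioms from the slide

# Hypothetical data: nothing is asserted to be a :Resource directly.
TYPES = {(":q1", ":Query"), (":e1", ":Entry")}

def superclasses(cls):
    """Yield cls and every class reachable from it via subClassOf."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)

def instances_of(target):
    """Answer 'SELECT ?x WHERE { ?x type target }' with subclass inference."""
    return sorted(x for x, cls in TYPES if target in superclasses(cls))

print(instances_of(":Resource"))  # both :e1 and :q1 are returned
```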
Security
● Authentication
   ○ Shiro-Based implementation
   ○ Extensible to LDAP and/or AD
● Authorization
   ○ Eat-your-own-dog-food approach
   ○ Reasoning-Based
   ○ Use of axioms & rules
Deriving Permissions
● Users have permission
  roles
● Permission roles have
  permission relations with
  resources
Deriving Permissions
● If a user has a permission role containing a
  read permission associated to a resource,
  then the user has the same permission over
  the resource
     :permissionRole(?user,?role),
     :readPermission(?role,?resource) ->
     :readUserPermission(?user,?resource)
● Everybody has read access to public
  resources
     :User(?user),
     :PublicResource(?resource) ->
     :readUserPermission(?user,?resource)
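The two rules above can be sketched in plain Python as a single forward-chaining pass. This is only an illustration under simple assumptions: facts are (predicate, subject, object) tuples, and the users, role, and sources in the usage note are made up. It is not how Stardog evaluates rules.

```python
def derive_read_permissions(facts):
    """Apply the two read-permission rules once and return all facts."""
    derived = set(facts)
    users = {s for p, s, o in facts if p == "type" and o == ":User"}
    public = {s for p, s, o in facts if p == "type" and o == ":PublicResource"}
    user_roles = {(s, o) for p, s, o in facts if p == ":permissionRole"}
    role_reads = {(s, o) for p, s, o in facts if p == ":readPermission"}

    # Rule 1: permissionRole(u, r), readPermission(r, x) -> readUserPermission(u, x)
    for user, role in user_roles:
        for granting_role, resource in role_reads:
            if granting_role == role:
                derived.add((":readUserPermission", user, resource))

    # Rule 2: User(u), PublicResource(x) -> readUserPermission(u, x)
    for user in users:
        for resource in public:
            derived.add((":readUserPermission", user, resource))
    return derived
```

With hypothetical facts stating that :alice and :bob are users, :alice has role :analyst, :analyst has read permission on :source1, and :source2 is public, the result grants :alice read access to both sources and :bob read access to :source2 only.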
Deriving Permissions
● User :user1 has delete permissions over any
  source
   ○ :deleteUserPermission(?user,:anySource),
     :DataSource(?source) ->
     :deleteUserPermission(?user,?source)
   ○ :user1 :deleteUserPermission :anySource
● Everybody has all permissions to the resources
  they created
   ○ :resourceCreator(?user,?resource) ->
     :allUserPermissions(?user,?resource)
   ○ :allUserPermissions(?user,?resource) ->
     :readUserPermission(?user,?resource)
   ○ ...
Impact of Reasoning
Can user :user1 delete resource :source1?
     ASK WHERE {
         { :user1 :deleteUserPermission :source1 . }
         UNION
         { :user1 :permissionRole ?role .
           ?role :deletePermission :source1 . }
         UNION
         { :user1 :resourceCreator :source1 . }
         UNION
         { :user1 :deleteUserPermission :anyResource . }
         UNION
         { :user1 :allUserPermissions :source1 . }
         UNION
         { ... }
         UNION
         ...
Impact of Reasoning
● Are you sure you're not missing anything?
● New awesome way of getting delete permissions
  you came up with yesterday
● Model knowledge where it belongs and let the
  reasoner do the work for you:
    ASK WHERE {
        { :user1 :deleteUserPermission :source1 . }
    }
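To make the contrast concrete, here is a small Python sketch in which the permission rules live in one place and the question "can :user1 delete :source1?" becomes a single lookup. It assumes facts are (predicate, subject, object) tuples and computes a naive fixpoint. The rule taking :allUserPermissions to :deleteUserPermission is an assumed analogue of the read rule shown earlier, and the role-based delete rule is inferred from the UNION branch of the long query; all function names are hypothetical.

```python
def apply_rules(facts):
    """Run every delete-related rule once; return only the new facts."""
    new = set()
    for p, s, o in facts:
        if p == ":resourceCreator":
            new.add((":allUserPermissions", s, o))
        elif p == ":allUserPermissions":
            # Assumed analogue of the readUserPermission rule.
            new.add((":deleteUserPermission", s, o))
        elif p == ":permissionRole":
            new |= {(":deleteUserPermission", s, res)
                    for q, role, res in facts
                    if q == ":deletePermission" and role == o}
        elif p == ":deleteUserPermission" and o == ":anySource":
            new |= {(":deleteUserPermission", s, src)
                    for q, src, cls in facts
                    if q == "type" and cls == ":DataSource"}
    return new - facts

def closure(facts):
    """Repeat the rules until nothing new is derived."""
    facts = set(facts)
    while True:
        new = apply_rules(facts)
        if not new:
            return facts
        facts |= new

def can_delete(facts, user, resource):
    """The one-pattern ASK query, evaluated over the inferred facts."""
    return (":deleteUserPermission", user, resource) in closure(facts)
```

Adding a new way of obtaining delete permission means adding a branch to apply_rules; can_delete and every caller stay unchanged.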
Too much Inference?
When I say
   :deleteUserPermission domain :User
   :deleteUserPermission range :Resource
I mean that for every triple
  :user1 :deleteUserPermission :resource1
the individual :user1 must be an instance of
:User and :resource1 of :Resource.

But the reasoner doesn't find the error!!
Typing Constraint
Only users can have delete user permissions
  ● :deleteUserPermission domain :User
  ● :user1 :deleteUserPermission :resource1


                 OWA                     CWA
Consistent       true                    false
Reason           Infer that              Assume that
                 :user1 type :User       :user1 type not :User
CWA or OWA?
● Which one?
   ○ Of course use both!
● Some axioms should be interpreted under
  CWA
        :deleteUserPermission domain :User
● And others under OWA
        :SuperUser subClassOf   :User
● So the right thing happens
        :user1 :deleteUserPermission :resource1
        :user1 type :SuperUser
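A rough Python sketch of mixing the two readings, assuming triples are (subject, predicate, object) tuples: the subClassOf axiom is used to infer types (OWA), while the domain axiom is used only as a check against the inferred types (CWA). The dictionary and function names are hypothetical and do not reflect Stardog's validation API.

```python
# Interpreted under OWA: used to infer additional types.
SUBCLASS_OF = {":SuperUser": ":User"}
# Interpreted under CWA: used as an integrity constraint, never to infer.
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}

def inferred_types(data):
    """Map each individual to its asserted types plus all superclasses."""
    types = {}
    for s, p, o in data:
        if p == "type":
            cls = o
            while cls is not None:
                types.setdefault(s, set()).add(cls)
                cls = SUBCLASS_OF.get(cls)
    return types

def violations(data):
    """Return triples whose subject is not known to be in the required domain."""
    types = inferred_types(data)
    return sorted(
        (s, p, o) for s, p, o in data
        if p in DOMAIN_CONSTRAINTS
        and DOMAIN_CONSTRAINTS[p] not in types.get(s, set())
    )
```

With both triples from the slide the data validates, because :user1 is inferred to be a :User through :SuperUser. Without the type triple, the permission triple is reported as a violation.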
SLDC for Data Integration
● SLDC provides descriptions of data sources,
  relationships between them, and information
  to query them
● We can treat data sources as an integrated
  single data source
    ○ Distributed querying
    ○ AI analytics
● Virtual, materialized, hybrid
Mappings
● Simple
   ○ pops:Employee subClassOf foaf:Person
   ○ pops:Project equivalentTo foaf:Project
   ○ pops:hasEmployee subPropertyOf foaf:member
● SWRL-Based
   ○ pops:firstName(?person, ?first),
     pops:lastName(?person, ?last),
     swrlb:concat(?name, ?first, " ", ?last) ->
     foaf:name(?person, ?name)
   ○ pops:worksOnProject(?person,?project),
     pops:ActiveProject(?project) ->
     foaf:currentProject(?person,?project)
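The two SWRL-based mappings can be sketched in Python roughly as follows. The sketch assumes (subject, predicate, object) tuples, at most one first and last name per person, and class membership stated with rdf:type; the function name is hypothetical.

```python
def map_to_foaf(triples):
    """Derive foaf:name and foaf:currentProject triples from pops: data."""
    first = {s: o for s, p, o in triples if p == "pops:firstName"}
    last = {s: o for s, p, o in triples if p == "pops:lastName"}
    active = {s for s, p, o in triples
              if p == "rdf:type" and o == "pops:ActiveProject"}

    derived = set()
    # firstName + " " + lastName -> foaf:name (the swrlb:concat rule)
    for person in first.keys() & last.keys():
        derived.add((person, "foaf:name", f"{first[person]} {last[person]}"))
    # worksOnProject on an active project -> foaf:currentProject
    for s, p, o in triples:
        if p == "pops:worksOnProject" and o in active:
            derived.add((s, "foaf:currentProject", o))
    return derived
```

Only projects typed as pops:ActiveProject produce a foaf:currentProject triple; other pops:worksOnProject triples are left unmapped by this rule.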
Summing Up
● SLDC is a linked data catalog
    ○ Manage a variety of sources
    ○ Find sources
    ○ Query sources
● Implemented using Semantic Technologies
    ○ Reasoning
       ■ Axioms & Rules
    ○ Data validation
    ○ Data integration
Questions?
Why?
● Large organizations
   ○ Disparate departments
   ○ Independent, isolated sources
● Where is what?
   ○ Do we have a data source about clients?
   ○ Where is it?
● Who created what?
   ○ Who owns it?
● Who has access to what?
   ○ Do I have access to it?
   ○ Who do I talk to to get it?
Source Management
● Management
    ○ Create, delete, update, clone
● Import
    ○ RDF, HTML, XML
● Subscription
    ○ Endpoint location
● Categorization
    ○ Categories
    ○ External vocabularies
● Sharing
    ○ To specific users
    ○ Public
Querying Sources
● Querying metadata
    ○ Queries about the catalog itself
● External query
    ○ Querying a particular source
● Integrated query
    ○ Querying a set of integrated sources
● Query management
● Query sharing
● Results export
Finding Sources
● Browse
   ○ Facets
   ○ Pelorus
● Search
   ○ Text-based search
   ○ Rich query language
Last but not least
● NLP processing
   ○ Entity/Event extraction from natural language
     source descriptions
   ○ Better source classification & search
● Graph algorithms
   ○ What's the shortest path between these
     resources?
● Clustering
   ○ Can we discover similar sources based on
     given criteria?
Axioms
● It's not always about simple taxonomies...
● What about domain/range axioms?
   ○ :someProperty domain :SomeClass
   ○ :a :someProperty :b
   ○ :SomeClass(x)?
● What about complex subclass chains?
   ○ :SomeClass subClassOf :someProperty
     some :OtherClass
   ○ :someProperty some :OtherClass subClassOf
     :AnotherClass
   ○ :a type :SomeClass
   ○ :AnotherClass(x)?
● What about cardinality constraints, universal
  quantification, datatype reasoning, ...?
Data Validation
● Fundamental data management problem
   ○ Verify data integrity and correctness
   ○ Data corruption can lead to failures in applications, errors
     in decision making, security vulnerabilities, etc.
● Relevant in many scenarios
   ○ Storing data for stand-alone applications
   ○ Exchanging data in distributed settings
● For some use cases, data validation is critical but
  we still want to do it intelligently
Participation Constraint
Each resource must have been created by a user
 ● :Resource subClassOf inv(resourceCreator) some
   :User
 ● :resource1 type :Resource


                 OWA                                  CWA
Consistent       true                                 false
Reason           Infer that                           Assume that
                 ● _:b :resourceCreator :resource1    _:b :resourceCreator :resource1
                 ● _:b type :User                     is false
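Read under CWA, the participation constraint amounts to a check like the following Python sketch. It assumes (subject, predicate, object) tuples with explicit type triples; the function name is hypothetical.

```python
def resources_without_creator(triples):
    """Return resources with no known :resourceCreator that is a :User."""
    resources = {s for s, p, o in triples if p == "type" and o == ":Resource"}
    users = {s for s, p, o in triples if p == "type" and o == ":User"}
    created = {o for s, p, o in triples
               if p == ":resourceCreator" and s in users}
    return sorted(resources - created)
```

For the data on the slide, which contains only ":resource1 type :Resource", the check reports :resource1. It stops doing so once a creator typed as :User is asserted.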
Uniqueness Constraint
Each data source must belong to at most one
catalog entry
 ● :dataSource inverseFunctional
 ● :entry1 :dataSource :dataSource1
 ● :entry2 :dataSource :dataSource1

                 OWA                        CWA
Consistent       true                       false
Reason           Infer that                 Assume that
                 :entry1 sameAs :entry2     :entry1 sameAs :entry2 is false
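Under CWA, and assuming distinct names denote distinct individuals, the uniqueness constraint can be checked by grouping catalog entries by data source. A minimal Python sketch over (subject, predicate, object) tuples, with a hypothetical function name:

```python
from collections import defaultdict

def shared_data_sources(triples):
    """Map each data source claimed by more than one entry to those entries."""
    entries_by_source = defaultdict(set)
    for s, p, o in triples:
        if p == ":dataSource":
            entries_by_source[o].add(s)
    return {source: sorted(entries)
            for source, entries in entries_by_source.items()
            if len(entries) > 1}
```

For the two triples on the slide this reports :dataSource1 with :entry1 and :entry2, which is the violation the CWA column describes.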

 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Stardog Linked Data Catalog

  • 1. Stardog Linked Data Catalog Héctor Pérez-Urbina Edgar Rodríguez-Díaz Clark & Parsia, LLC {hector, edgar}@clarkparsia.com
  • 2. Who are we? ● Clark & Parsia is a semantic software startup ● HQ in Washington, DC & office in Boston ● Provides software development and integration services ● Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers http://clarkparsia.com/ Twitter: @candp
  • 3. What's SLDC? ● Stardog Linked Data Catalog ● A catalog of data sources ○ Semi-structured ○ Relational ○ Object-oriented ○ ... ● Provides a coherent view over existing data repositories so that users and/or applications can easily find and query them
  • 4. Use Cases ● Sources ○ Management, import, subscription, categorization, sharing ● Query ○ Management, sharing, results export ○ Querying ■ Metadata, external sources, integration ● Locating sources ○ Search, browse ● NLP/AI ○ Entity extraction, graph algorithms, clustering analysis
  • 5. Application layer Middleware layer NLP/AI analytics layer Data layer
  • 7. Semantic Technologies ● W3C standards ○ RDF(S), OWL, SPARQL ● Lower operational costs and raise productivity ○ Cooperation without coordination ○ Appropriate abstractions ○ Declarative is better than imperative ○ Correctness when it matters; sloppiness when it doesn’t
  • 8. Data Model ● Similar to DCAT from W3C ○ Catalog entries ● Enhanced with ○ SSD ○ VoID datasets ○ SKOS background models ○ Axioms & rules
  • 9. Modeling the Domain ● Use of axioms to model relationships between classes ○ :Query subClassOf :Resource ○ :Entry subClassOf :Resource ● Retrieve the resources user :u can see ○ SELECT ?resource WHERE { ?resource type :Resource . }
  • 10. Security ● Authentication ○ Shiro-Based implementation ○ Extensible to LDAP and/or AD ● Authorization ○ Eat-your-own-food approach ○ Reasoning-Based ○ Use of axioms & rules
  • 11. Deriving Permissions ● Users have permission roles ● Permission roles have permission relations with resources
  • 12. Deriving Permissions ● If a user has a permission role containing a read permission associated to a resource, then the user has the same permission over the resource :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource) ● Everybody has read access to public resources :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
  • 13. Deriving Permissions ● User :user1 has delete permissions over any source ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source) ○ :user1 :deleteUserPermission :anySource ● Everybody has all permissions to the resources they created ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource) ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource) ○ ...
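The permission rules on slides 12 and 13 can be read as forward-chaining rules applied until no new facts appear. The following is a minimal sketch of that idea in plain Python, not Stardog's reasoner. The triple representation, the function name `derive_permissions`, and the sample facts (`:user2`, `:analyst`, `:query1`, `:report1`) are assumptions made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def derive_permissions(facts: Set[Triple]) -> Set[Triple]:
    """Apply the slide rules repeatedly until no new triple is derived."""
    derived = set(facts)
    while True:
        new: Set[Triple] = set()
        users = {s for s, p, o in derived if p == "type" and o == ":User"}
        public = {s for s, p, o in derived if p == "type" and o == ":PublicResource"}
        sources = {s for s, p, o in derived if p == "type" and o == ":DataSource"}
        for s, p, o in derived:
            # :permissionRole(?user,?role), :readPermission(?role,?resource)
            #   -> :readUserPermission(?user,?resource)
            if p == ":permissionRole":
                for s2, p2, o2 in derived:
                    if s2 == o and p2 == ":readPermission":
                        new.add((s, ":readUserPermission", o2))
            # :deleteUserPermission(?user,:anySource), :DataSource(?source)
            #   -> :deleteUserPermission(?user,?source)
            if p == ":deleteUserPermission" and o == ":anySource":
                new.update((s, ":deleteUserPermission", src) for src in sources)
            # :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource)
            if p == ":resourceCreator":
                new.add((s, ":allUserPermissions", o))
            # :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource)
            if p == ":allUserPermissions":
                new.add((s, ":readUserPermission", o))
        # :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
        new.update((u, ":readUserPermission", r) for u in users for r in public)
        if new <= derived:
            return derived
        derived |= new


# Hypothetical sample data, not taken from the slides.
facts: Set[Triple] = {
    (":user1", "type", ":User"),
    (":user2", "type", ":User"),
    (":user1", ":deleteUserPermission", ":anySource"),
    (":source1", "type", ":DataSource"),
    (":user2", ":permissionRole", ":analyst"),
    (":analyst", ":readPermission", ":source1"),
    (":user2", ":resourceCreator", ":query1"),
    (":report1", "type", ":PublicResource"),
}
closure = derive_permissions(facts)

# With the closure computed, "can :user1 delete :source1?" is a single lookup.
can_delete = (":user1", ":deleteUserPermission", ":source1") in closure
```

Once the rules have been applied, the permission check is one membership test, which is the point the deck makes about keeping the ASK query short.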
  • 14. Impact of Reasoning Can user :user1 delete resource :source1? ASK WHERE { { :user1 :deleteUserPermission :source1 . } UNION { :user1 :permissionRole ?role . ?role :deletePermission :source1 . } UNION { :user1 :resourceCreator :source1 . } UNION { :user1 :deleteUserPermission :anyResource . } UNION { :user1 :allUserPermissions :source1 . } UNION { ... } UNION ...
  • 15. Impact of Reasoning ● Are you sure you're not missing anything? ● New awesome way of getting delete permissions you came up with yesterday ● Model knowledge where it belongs and let the reasoner do the work for you: ASK WHERE { { :user1 :deleteUserPermission :source1 . } }
  • 16. Too much Inference? When I say :deleteUserPermission domain :User and :deleteUserPermission range :Resource, I mean that for every triple :user1 :deleteUserPermission :resource1, the individual :user1 must be an instance of :User and :resource1 an instance of :Resource. But the reasoner doesn't find the error!!
  • 17. Typing Constraint Only users can have delete user permissions ● :deleteUserPermission domain :User ● :user1 :deleteUserPermission :resource1
  • 18. Typing Constraint Only users can have delete user permissions ● :deleteUserPermission domain :User ● :user1 :deleteUserPermission :resource1
                 | OWA                          | CWA
      Consistent | true                         | false
      Reason     | Infer that :user1 type :User | Assume that :user1 type not :User
  • 19. CWA or OWA? ● Which one? ○ Of course use both! ● Some axioms should be interpreted under CWA :deleteUserPermission domain :User ● And others under OWA :SuperUser subClassOf :User ● So the right thing happens :user1 :deleteUserPermission :resource1 :user1 type :SuperUser
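Mixing the two readings can be sketched as a two-step check: subclass axioms are used to infer types (open world), and the domain axiom is then checked as a constraint against the inferred data (closed world). This is an illustrative sketch in plain Python, not how Stardog implements validation. The dictionaries `SUBCLASS_OF` and `DOMAIN_CONSTRAINTS` and both function names are assumptions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Read under OWA: used to infer additional types.
SUBCLASS_OF = {":SuperUser": ":User"}
# Read under CWA: used to check data rather than to infer from it.
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}


def infer_types(data: Set[Triple]) -> Set[Triple]:
    """OWA step: add every type implied by the subClassOf axioms."""
    inferred = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "type" and o in SUBCLASS_OF:
                implied = (s, "type", SUBCLASS_OF[o])
                if implied not in inferred:
                    inferred.add(implied)
                    changed = True
    return inferred


def domain_violations(data: Set[Triple]) -> List[Triple]:
    """CWA step: report triples whose subject is not known to be in the domain."""
    closed = infer_types(data)
    return sorted(
        (s, p, o)
        for s, p, o in data
        if p in DOMAIN_CONSTRAINTS
        and (s, "type", DOMAIN_CONSTRAINTS[p]) not in closed
    )


# The data from slide 19: :user1 is typed only as :SuperUser.
valid_data: Set[Triple] = {
    (":user1", ":deleteUserPermission", ":resource1"),
    (":user1", "type", ":SuperUser"),
}
# The data from slides 17 and 18: :user1 has no type at all.
invalid_data: Set[Triple] = {
    (":user1", ":deleteUserPermission", ":resource1"),
}

valid_report = domain_violations(valid_data)
invalid_report = domain_violations(invalid_data)
```

The first data set passes because :user1 is inferred to be a :User through :SuperUser; the second is flagged because nothing says :user1 is a :User.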
  • 20. SLDC for Data Integration ● SLDC provides descriptions of data sources, relationships between them, and information to query them ● We can treat data sources as an integrated single data source ○ Distributed querying ○ AI analytics ● Virtual, materialized, hybrid
  • 23. Mappings ● Simple ○ pops:Employee subClassOf foaf:Person ○ pops:Project equivalentTo foaf:Project ○ pops:hasEmployee subPropertyOf foaf:member ● SWRL-Based ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name) ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
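The two SWRL-based mappings can be pictured as functions from source-vocabulary facts to target-vocabulary facts. The sketch below uses plain Python dictionaries and sets in place of a rule engine. The function names and the sample people and projects are hypothetical.

```python
from typing import Dict, Set, Tuple


def apply_name_mapping(
    first_names: Dict[str, str], last_names: Dict[str, str]
) -> Dict[str, str]:
    """pops:firstName + pops:lastName -> foaf:name, joined with a space."""
    return {
        person: f"{first} {last_names[person]}"
        for person, first in first_names.items()
        if person in last_names  # the rule fires only when both names exist
    }


def apply_current_project_mapping(
    works_on: Set[Tuple[str, str]], active_projects: Set[str]
) -> Set[Tuple[str, str]]:
    """pops:worksOnProject + pops:ActiveProject -> foaf:currentProject."""
    return {
        (person, project)
        for person, project in works_on
        if project in active_projects
    }


# Hypothetical source data in the pops vocabulary.
first_names = {":p1": "Ada", ":p2": "Alan"}
last_names = {":p1": "Lovelace"}
works_on = {(":p1", ":projA"), (":p1", ":projB"), (":p2", ":projB")}
active_projects = {":projA"}

foaf_names = apply_name_mapping(first_names, last_names)
foaf_current = apply_current_project_mapping(works_on, active_projects)
```

Here :p2 gets no foaf:name because it has no last name, and only the assignment to the active project becomes a foaf:currentProject fact.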
  • 24. Summing Up ● SLDC is a linked data catalog ○ Manage a variety of sources ○ Find sources ○ Query sources ● Implemented using Semantic Technologies ○ Reasoning ■ Axioms & Rules ○ Data validation ○ Data integration
  • 26. Why? ● Large organizations ○ Disparate departments ○ Independent, isolated sources ● Where is what? ○ Do we have a data source about clients? ○ Where is it? ● Who created what? ○ Who owns it? ● Who has access to what? ○ Do I have access to it? ○ Who do I talk to to get it?
  • 27. Source Management ● Management ○ Create, delete, update, clone ● Import ○ RDF, HTML, XML ● Subscription ○ Endpoint location ● Categorization ○ Categories ○ External vocabularies ● Sharing ○ To specific users ○ Public
  • 28. Querying Sources ● Querying metadata ○ Queries about the catalog itself ● External query ○ Querying a particular source ● Integrated query ○ Querying a set of integrated sources ● Query management ● Query sharing ● Results export
  • 29. Finding Sources ● Browse ○ Facets ○ Pelorus ● Search ○ Text-based search ○ Rich query language
  • 30. Last but not least ● NLP processing ○ Entity/Event extraction from natural language source descriptions ○ Better source classification & search ● Graph algorithms ○ What's the shortest path between these resources? ● Clustering ○ Can we discover similar sources based on given criteria?
  • 31. Axioms ● It's not always about simple taxonomies... ● What about domain/range axioms? ○ :someProperty domain :SomeClass ○ :a :someProperty :b ○ :SomeClass(x)? ● What about complex subclass chains? ○ :SomeClass subClassOf :someProperty some :OtherClass ○ :someProperty some :OtherClass subClassOf :AnotherClass ○ :a type :SomeClass ○ :AnotherClass(x)? ● What about cardinality constraints, universal quantification, datatype reasoning, ...?
  • 32. Data Validation ● Fundamental data management problem ○ Verify data integrity and correctness ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc. ● Relevant in many scenarios ○ Storing data for stand-alone applications ○ Exchanging data in distributed settings ● For some use cases, data validation is critical but we still want to do it intelligently
  • 33. Participation Constraint Each resource must have been created by a user ● :Resource subClassOf inv(resourceCreator) some :User ● :resource1 type :Resource
                 | OWA                                                           | CWA
      Consistent | true                                                          | false
      Reason     | Infer that _:b :resourceCreator :resource1 and _:b type :User | Assume that _:b :resourceCreator :resource1 is false
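Under the closed-world reading, the participation constraint amounts to listing resources with no known creator who is a user. A minimal sketch in plain Python follows; the function name `resources_without_creator` and the sample data are assumptions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def resources_without_creator(data: Set[Triple]) -> List[str]:
    """CWA check: every :Resource needs a :resourceCreator that is a :User."""
    resources = {s for s, p, o in data if p == "type" and o == ":Resource"}
    users = {s for s, p, o in data if p == "type" and o == ":User"}
    created = {
        o for s, p, o in data if p == ":resourceCreator" and s in users
    }
    return sorted(resources - created)


# Hypothetical data: :resource2 has a creator, :resource1 does not.
data: Set[Triple] = {
    (":resource1", "type", ":Resource"),
    (":resource2", "type", ":Resource"),
    (":user1", "type", ":User"),
    (":user1", ":resourceCreator", ":resource2"),
}
missing = resources_without_creator(data)
```

An open-world reasoner would instead accept the same data and assume an unnamed creator exists for :resource1.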
  • 34. Uniqueness Constraint Each data source must belong to at most one catalog entry ● :dataSource inverseFunctional ● :entry1 :dataSource :dataSource1 ● :entry2 :dataSource :dataSource1
                 | OWA                                | CWA
      Consistent | true                               | false
      Reason     | Infer that :entry1 sameAs :entry2  | Assume that :entry1 sameAs :entry2 is false
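The two readings of the inverse-functional axiom can be contrasted directly: the open-world reading derives sameAs facts, while the closed-world reading reports a violation. The sketch below is plain Python for illustration only. The function names and the extra `:entry3` / `:dataSource2` facts are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def entries_by_source(data: Set[Triple]) -> Dict[str, List[str]]:
    """Group catalog entries by the data source they point to."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in data:
        if p == ":dataSource":
            grouped[o].add(s)
    return {source: sorted(entries) for source, entries in grouped.items()}


def infer_same_as(data: Set[Triple]) -> Set[Triple]:
    """OWA reading: entries sharing a source are inferred to be the same."""
    return {
        (a, "sameAs", b)
        for entries in entries_by_source(data).values()
        for a, b in combinations(entries, 2)
    }


def uniqueness_violations(data: Set[Triple]) -> Dict[str, List[str]]:
    """CWA reading: distinct names are distinct entries, so sharing is an error."""
    return {
        source: entries
        for source, entries in entries_by_source(data).items()
        if len(entries) > 1
    }


data: Set[Triple] = {
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
    (":entry3", ":dataSource", ":dataSource2"),  # hypothetical extra entry
}
same_as = infer_same_as(data)
violations = uniqueness_violations(data)
```

The same input yields one inferred sameAs triple under the first function and one reported violation under the second.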