INDEXING AND SEARCHING RDF DATASETS
Improving Performance of Semantic Web Applications with Lucene, SIREn and RDF


                      Mike Hugo
                     Entagen, LLC
slides and sample code can be found at
https://github.com/mjhugo/rdf-lucene-siren-presentation
ENTAGEN
ACCELERATING INSIGHT
AGENDA
SPARQL
LUCENE
SIREN
TripleMap.com
LINKING OPEN DATA
LIFE SCIENCE LINKED DATA
WHAT'S A TRIPLE?

Subject   Predicate         Object

<Mike>    <name>            "Mike Hugo"
<Mike>    <lives_in_city>   "Minneapolis"
<Mike>    <daughter>        <Lydia>
<Lydia>   <name>            "Lydia Hugo"
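A minimal Java sketch of the triple model above, assuming plain strings such as <Mike> stand in for real URIs. TripleGraph and its objects helper are hypothetical names, not part of any RDF library. The sketch shows how following the <daughter> triple leads from one resource to another, whose own <name> triple can then be read.

```java
import java.util.List;

// Hypothetical in-memory graph; URIs are plain strings such as "<Mike>".
class TripleGraph {
    record Triple(String subject, String predicate, String object) {}

    // The four example triples from the slide.
    static final List<Triple> TRIPLES = List.of(
            new Triple("<Mike>", "<name>", "\"Mike Hugo\""),
            new Triple("<Mike>", "<lives_in_city>", "\"Minneapolis\""),
            new Triple("<Mike>", "<daughter>", "<Lydia>"),
            new Triple("<Lydia>", "<name>", "\"Lydia Hugo\""));

    // Returns every object o such that (subject, predicate, o) is in the graph.
    static List<String> objects(String subject, String predicate) {
        return TRIPLES.stream()
                .filter(t -> t.subject().equals(subject) && t.predicate().equals(predicate))
                .map(Triple::object)
                .toList();
    }

    public static void main(String[] args) {
        // Follow the <daughter> relationship, then read the linked resource's <name>.
        for (String child : objects("<Mike>", "<daughter>")) {
            System.out.println(child + " has name " + objects(child, "<name>"));
        }
    }
}
```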
Ready, GO!
select id, label
from targets
where label = '${queryValue}'

select id, label
from targets
where label ilike '%${queryValue}%'
SELECT ?uri ?type ?label WHERE {
  ?uri rdfs:label ?label .
  ?uri rdf:type ?type .
  FILTER (?label = '${params.query}')
} LIMIT 10

SELECT ?uri ?type ?label WHERE {
  ?uri rdfs:label ?label .
  ?uri rdf:type ?type .
  FILTER regex(?label,
    '\\Q${params.query}\\E', 'i')
} LIMIT 10

The 'i' flag makes the match case insensitive; \Q...\E treats the query as a literal value rather than as a regular expression (a Java regex construct, so this depends on the triplestore's regex implementation).
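The \Q...\E quoting and the 'i' flag can be tried directly in Java's java.util.regex, where that quoting syntax comes from. This sketch (class and method names are hypothetical) contrasts a quoted pattern with an unquoted one: without quoting, user input such as c+ is read as a regular expression ("one or more c") instead of the text "c+".

```java
import java.util.regex.Pattern;

class LiteralLabelMatch {
    // Case-insensitive "contains" in which the user input is a literal string:
    // Pattern.quote wraps it in \Q...\E, the same quoting used in the FILTER above.
    // CASE_INSENSITIVE alone folds ASCII only; add Pattern.UNICODE_CASE for other scripts.
    static boolean quotedMatches(String label, String userInput) {
        return Pattern.compile(Pattern.quote(userInput), Pattern.CASE_INSENSITIVE)
                .matcher(label)
                .find();
    }

    // Same test without quoting: regex metacharacters in the input change its meaning.
    static boolean rawMatches(String label, String userInput) {
        return Pattern.compile(userInput, Pattern.CASE_INSENSITIVE)
                .matcher(label)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("c+"));                   // \Qc+\E
        System.out.println(quotedMatches("Imatinib", "IMATINIB")); // true
        System.out.println(quotedMatches("Vitamin C", "c+"));      // false: looks for the text "c+"
        System.out.println(rawMatches("Vitamin C", "c+"));         // true: "c+" means "one or more c"
    }
}
```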
DEMO
Baseline SPARQL Query Performance
FASTER!
Java API
Indexing and Searching Text
http://wiki.apache.org/lucene-java/PoweredBy
Indexing and storage

Document
  field     value
  ID        2
  name      "Mike Hugo"
  company   "Entagen"
  bio       "lorem ipsum dolor sum etc..."

Index: many such documents. A field value can be indexed (analyzed into searchable terms, e.g. name "mike hugo"), stored (kept verbatim so it can be returned), or both.

Query: name: mike
Matching documents: id 2, returned with its stored fields:
  field     value
  ID        2
  name      "Mike Hugo"
  company   "Entagen"
  bio       "lorem ipsum dolor sum etc..."
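The indexed-versus-stored distinction can be modeled with a few maps. This is a toy stand-in, not Lucene's API: ToyIndex, its crude lowercase-and-split analyzer, and the choice to index bio without storing it are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of indexed vs. stored fields (not Lucene's API).
class ToyIndex {
    // field -> term -> ids of documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // document id -> fields kept verbatim for display
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    // A crude analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Fields in "indexed" become searchable; fields in "stored" are returned with hits.
    void addDocument(Map<String, String> indexed, Map<String, String> stored) {
        int docId = storedFields.size();
        storedFields.add(Map.copyOf(stored));
        indexed.forEach((field, value) -> {
            for (String term : analyze(value)) {
                postings.computeIfAbsent(field, f -> new HashMap<>())
                        .computeIfAbsent(term, t -> new TreeSet<>())
                        .add(docId);
            }
        });
    }

    // Single-term query such as name:mike; returns the stored fields of each hit.
    List<Map<String, String>> search(String field, String term) {
        Set<Integer> ids = postings.getOrDefault(field, Map.of())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : ids) hits.add(storedFields.get(id));
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(
                Map.of("name", "Mike Hugo", "company", "Entagen", "bio", "lorem ipsum dolor sum etc..."),
                Map.of("ID", "2", "name", "Mike Hugo"));
        System.out.println(index.search("name", "mike"));
    }
}
```

Here a search on bio still finds the document, but the hit carries no bio value, which is the trade-off behind Field.Store.NO.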
Simplest Solution
Lucene index of rdfs:label
Build the Index
// Build a SPARQL query to find all the rdfs:label properties
String queryLabels = """
    SELECT ?uri ?label
    WHERE {
        ?uri rdfs:label ?label .
    }
"""

try {
    // Execute the SPARQL query
    sparqlQueryService.executeForEach(repository, queryLabels) {
        String uri = it.uri.stringValue()
        String label = it.label.stringValue()

        // Instantiate a new Lucene Document
        Document doc = new Document()

        // Add the Subject URI to the Document (key: SUBJECT_URI_FIELD, value: uri)
        doc.add(new Field(SUBJECT_URI_FIELD,
                uri,
                Field.Store.YES,
                Field.Index.ANALYZED))

        // Add the Label field to the document (but don't store it)
        doc.add(new Field(LABEL_FIELD,
                label,
                Field.Store.NO,
                Field.Index.ANALYZED))

        // Add the document to the Index
        writer.addDocument(doc)
    }
} finally {
    writer.close()   // Close the index
}
Query the Index
def query = {
    // Create a Lucene Query from user input: query this field (LABEL_FIELD) for this value
    Query query = new QueryParser(
            Version.LUCENE_CURRENT,
            LABEL_FIELD,
            new StandardAnalyzer())
        .parse(params.query)

    def s = new Date().time
    List results = executeQuery(query)
    def e = new Date().time

    render(view: 'index', model: [results: ...
}

List executeQuery(Query query) {
    IndexSearcher searcher = luceneSearche...
    // Search the index (limit 10) for matching documents
    ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs

    List results = []
    def connection = repository.connection
    scoreDocs.each {
        // For each matching document, get the doc and extract the Subject URI
        Document doc = searcher.doc(it.doc)
        String uri = doc[SUBJECT_URI_FIELD]
        // Using the Subject URI, load properties from the triplestore
        Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
        results.add([
                uri: uri,
                type: labelAndType.type,
                label: labelAndType.label])
    }
    connection.close()
    // Return results containing Subject URI, Type, and Label
    return results
}
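The query flow above is two-phase: the index only answers "which subject URIs match?", and the triplestore supplies the type and label to display. A stdlib-only Java sketch of that pattern, with hypothetical maps standing in for both Lucene and the triplestore (getLabelAndType becomes a map lookup, and the label is split on whitespace as a crude analyzer):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins: a label index that keeps only subject URIs, and a map playing the triplestore.
class SearchThenHydrate {
    record Hit(String uri, String type, String label) {}

    // Phase 1 store: analyzed label term -> subject URIs (the label itself is not kept here).
    private final Map<String, Set<String>> labelIndex = new HashMap<>();
    // Phase 2 store: subject URI -> {rdf:type, rdfs:label}.
    private final Map<String, Map<String, String>> triplestore = new HashMap<>();

    void add(String uri, String type, String label) {
        triplestore.put(uri, Map.of("rdf:type", type, "rdfs:label", label));
        for (String term : label.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                labelIndex.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(uri);
            }
        }
    }

    // Search the index (up to "limit" hits), then load type and label for each URI.
    List<Hit> search(String userInput, int limit) {
        List<Hit> results = new ArrayList<>();
        for (String uri : labelIndex.getOrDefault(userInput.toLowerCase(Locale.ROOT), Set.of())) {
            if (results.size() == limit) break;
            Map<String, String> props = triplestore.get(uri);
            results.add(new Hit(uri, props.get("rdf:type"), props.get("rdfs:label")));
        }
        return results;
    }

    public static void main(String[] args) {
        SearchThenHydrate app = new SearchThenHydrate();
        app.add("<DB00619>", "<drugbank:drugs>", "Imatinib");
        System.out.println(app.search("imatinib", 10));
    }
}
```

Keeping the display values in the triplestore is what allows LABEL_FIELD to use Field.Store.NO: the index only has to make the label searchable.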
DEMO
Lucene Index of Searchable Labels
WHAT ABOUT ENTITY RELATIONSHIPS?
WHAT ABOUT OTHER PROPERTIES?
Lucene Extension
Indexing and Searching Semi-Structured Data
Document
  field     value
  URI       <DB00619>
  triples   <DB00619> rdfs:label "Imatinib" .
            <DB00619> rdf:type <drugbank:drugs> .
            <DB00619> drugbank:brandName "Gleevec" .
            <DB00619> drugbank:target <targets/1588> .
Build the Index
Connection connection = repository.connection
try {
    // Select all Subject URIs from the triplestore
    String subjectUris = """
        SELECT distinct ?uri
        WHERE {
            ?uri ?p ?o .
        }
    """

    // Execute the SPARQL query; for each URI, create a new Document
    sparqlQueryService.executeForEach(repository, subjectUris) {
        def doc = new Document()

        // Add the Subject URI to the Document
        String subjectUri = it.uri.stringValue()
        doc.add(new Field(SUBJECT_URI_FIELD,
                subjectUri,
                Field.Store.YES,
                Field.Index.ANALYZED))

        // Get an NTriples string from the triplestore
        StringWriter triplesStringWriter = new StringWriter()
        NTriplesWriter nTriplesWriter =
            new NTriplesWriter(triplesStringWriter)
        connection.exportStatements(
                new URIImpl(subjectUri),
                null, null, false,
                nTriplesWriter)

        // Add the NTriples string to the document
        doc.add(new Field(TRIPLES_FIELD,
                triplesStringWriter.toString(),
                Field.Store.NO,
                Field.Index.ANALYZED))

        // Add the document to the index
        writer.addDocument(doc)
    }
} finally {
    connection.close()
}
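The indexing loop above turns each subject into one document whose triples field holds that subject's statements as N-Triples text. A rough stdlib sketch of just the grouping step, assuming the triples are already available as strings; EntityDocuments is a hypothetical name, and the prefixed names mirror the slide, whereas real N-Triples output (as written by NTriplesWriter) uses full IRIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "one document per subject": each subject's triples become one text value.
class EntityDocuments {
    record Triple(String subject, String predicate, String object) {}

    // Groups triples by subject and writes each group as "s p o ." lines (N-Triples style).
    static Map<String, String> buildDocuments(List<Triple> triples) {
        Map<String, StringBuilder> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.subject(), k -> new StringBuilder())
                    .append(t.subject()).append(' ')
                    .append(t.predicate()).append(' ')
                    .append(t.object()).append(" .\n");
        }
        Map<String, String> documents = new LinkedHashMap<>();
        bySubject.forEach((subject, text) -> documents.put(subject, text.toString()));
        return documents;
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
                new Triple("<DB00619>", "rdfs:label", "\"Imatinib\""),
                new Triple("<DB00619>", "rdf:type", "<drugbank:drugs>"),
                new Triple("<DB00619>", "drugbank:brandName", "\"Gleevec\""),
                new Triple("<DB00619>", "drugbank:target", "<targets/1588>"));
        buildDocuments(triples).forEach((uri, text) -> System.out.print(uri + "\n" + text));
    }
}
```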
Query the Index
// Search for a predicate of rdfs:label (note: this could be any predicate!)
SirenCellQuery predicate =
    new SirenCellQuery(
        new SirenTermQuery(
            new Term(TRIPLES_FIELD,
                RDFS.LABEL.stringValue())))
predicate.constraint = PREDICATE_CELL

// ... and for an object matching the user input
SirenCellQuery object =
    new SirenCellQuery(
        new SirenTermQuery(
            new Term(TRIPLES_FIELD,
                params.query.toLowerCase())))
object.constraint = OBJECT_CELL

// Query the Triples field: both clauses are required
SirenTupleQuery query = new SirenTupleQuery()
query.add(predicate, SirenTupleClause.Occur.MUST)
query.add(object, SirenTupleClause.Occur.MUST)
Query against the <DB00619> document: "imatinib"
  in the triples field,
  where predicate = rdfs:label
  and object = "imatinib"
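To see how the two cell clauses differ from a plain keyword search, here is a toy Java evaluator of the same idea: each line is a tuple of subject, predicate and object cells, and every required clause has to hold within a single tuple. This is an illustrative assumption about the matching semantics, not SIREn's implementation; the tokenizer and the cell constant values are made up for the sketch.

```java
import java.util.List;
import java.util.Locale;

// Toy evaluator for cell-constrained tuple queries (illustrative; not SIREn's implementation).
class ToyTupleQuery {
    static final int SUBJECT_CELL = 0, PREDICATE_CELL = 1, OBJECT_CELL = 2;

    // One required clause: the given cell must contain the given term.
    record CellClause(int cell, String term) {}

    // Splits "s p o ." into [s, p, o]; the object keeps any spaces it contains.
    static String[] cells(String line) {
        String body = line.strip();
        if (body.endsWith(".")) body = body.substring(0, body.length() - 1).strip();
        return body.split("\\s+", 3);
    }

    // Crude tokenizer: lowercase and split on characters other than letters, digits, ':' and '/'.
    static boolean cellContains(String cell, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String token : cell.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}:/]+")) {
            if (token.equals(wanted)) return true;
        }
        return false;
    }

    // A document matches if at least one tuple (line) satisfies every clause.
    static boolean matches(String triplesField, List<CellClause> must) {
        for (String line : triplesField.split("\n")) {
            if (line.isBlank()) continue;
            String[] c = cells(line);
            boolean all = true;
            for (CellClause clause : must) {
                if (clause.cell() >= c.length || !cellContains(c[clause.cell()], clause.term())) {
                    all = false;
                    break;
                }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = String.join("\n",
                "<DB00619> rdfs:label \"Imatinib\" .",
                "<DB00619> rdf:type <drugbank:drugs> .",
                "<DB00619> drugbank:brandName \"Gleevec\" .",
                "<DB00619> drugbank:target <targets/1588> .");
        // predicate = rdfs:label AND object = "imatinib", within one tuple
        System.out.println(matches(doc, List.of(
                new CellClause(PREDICATE_CELL, "rdfs:label"),
                new CellClause(OBJECT_CELL, "imatinib"))));
    }
}
```

Because both clauses must hold in the same tuple, "gleevec" does not match as an rdfs:label even though it appears in the document, as the object of drugbank:brandName.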
List executeQuery(Query query) {
    IndexSearcher searcher = sirenSearcherM...
    // Search the index (limit 10) for matching documents
    ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs

    List results = []
    def connection = repository.connection
    scoreDocs.each {
        // For each matching document, get the doc and extract the Subject URI
        Document doc = searcher.doc(it.doc)
        String uri = doc[SUBJECT_URI_FIELD]
        // Using the Subject URI, load properties from the triplestore
        Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
        results.add([
                uri: uri,
                type: labelAndType.type,
                label: labelAndType.label])
    }
    connection.close()
    // Return results containing Subject URI, Type, and Label
    return results
}
DEMO
SIREn Index of RDF Entities
FLEXIBILITY
Queries against the same <DB00619> document:
  predicate = rdfs:label AND object = "imatinib"
  object = "imatinib" (any predicate)
  object = "imatinib" OR object = "gleevec"
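Dropping the predicate clause and OR-ing object values, as in the last query above, can be sketched the same way. ObjectDisjunction is a hypothetical stdlib illustration rather than SIREn code; it assumes the wanted terms are already lowercase and uses the same crude tokenizer idea as before.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: object = "imatinib" OR object = "gleevec", with any predicate.
class ObjectDisjunction {
    // True if the object cell of any "s p o ." line contains one of the wanted (lowercase) terms.
    static boolean anyObjectMatches(String triplesField, Set<String> wantedTerms) {
        for (String line : triplesField.split("\n")) {
            String[] cells = line.strip().replaceAll("\\s*\\.$", "").split("\\s+", 3);
            if (cells.length < 3) continue; // skip blank or malformed lines
            for (String token : cells[2].toLowerCase(Locale.ROOT).split("[^\\p{Alnum}:/]+")) {
                if (wantedTerms.contains(token)) return true;
            }
        }
        return false;
    }
}
```

Only the object cell is inspected, so a term that appears solely in the subject (such as db00619) does not count as a match.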
MORE THAN LITERALS
Query: predicate = brandName
Query: predicate = target
RELATIONSHIPS
Query: object = <targets/1588>
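A relationship query compares the object cell against a URI instead of a literal word, which turns the index into a reverse lookup ("which entities point at this target?"). A small hypothetical sketch that scans per-subject documents and returns the subjects having some triple whose object is the given URI:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: find every entity document with a tuple whose object is a given URI.
class RelationshipLookup {
    // documents: subject URI -> that subject's triples as "s p o ." lines
    static List<String> subjectsPointingTo(Map<String, String> documents, String objectUri) {
        List<String> subjects = new ArrayList<>();
        documents.forEach((subject, triples) -> {
            for (String line : triples.split("\n")) {
                String[] cells = line.strip().replaceAll("\\s*\\.$", "").split("\\s+", 3);
                if (cells.length == 3 && cells[2].equals(objectUri)) {
                    subjects.add(subject);
                    return; // stop scanning this document after its first hit
                }
            }
        });
        return subjects;
    }
}
```

With the <DB00619> document from the slides, looking up <targets/1588> returns <DB00619>.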
DEMO
Searching SIREn Index for Relationships
Distributed
Indexing and Searching Semi-Structured Data
Replication
400 Million Documents
> 12 Billion Triples
Query Parser
subject   predicate   object
DEMO
SIREn in action on TripleMap.com
SPARQL

    LUCENE

          SIREN
    TripleMap.com
QUESTIONS?

    mike@entagen.com / twitter: @piragua



                                TripleMap

http://www.entagen.com   http://www.triplemap.com

 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Improving RDF Search Performance with Lucene and SIREN

  • 1. INDEXING AND SEARCHING RDF DATASETS Improving Performance of Semantic Web Applications with Lucene, SIREn and RDF Mike Hugo Entagen, LLC
  • 2. slides and sample code can be found at https://github.com/mjhugo/rdf-lucene-siren-presentation
  • 5. 17
  • 11. SPARQL LUCENE
  • 12. SPARQL LUCENE SIREN
  • 13. SPARQL LUCENE SIREN TripleMap.com
  • 17. WHAT’S A TRIPLE? Subject Predicate Object
  • 18. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo”
  • 19. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo” . <Mike> <lives_in_city> “Minneapolis” .
  • 20. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo” . <Mike> <lives_in_city> “Minneapolis” . <Mike> <daughter> <Lydia> .
  • 21. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo” . <Mike> <lives_in_city> “Minneapolis” . <Mike> <daughter> <Lydia> . <Lydia> <name> “Lydia Hugo” .
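The family example above can be held as a plain list of subject/predicate/object tuples; answering "what is the name of Mike's daughter?" then means following one triple to reach <Lydia> and a second to reach her name. A minimal stdlib-only sketch; the `Triple` class and the `objectOf` helper are illustrative names, not part of any RDF library.

```java
import java.util.List;
import java.util.Optional;

// Illustrative triple type: subject, predicate and object kept as plain strings.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

class FamilyGraph {
    // The four triples from the slides: URIs in <...>, literals in double quotes.
    static final List<Triple> TRIPLES = List.of(
            new Triple("<Mike>", "<name>", "\"Mike Hugo\""),
            new Triple("<Mike>", "<lives_in_city>", "\"Minneapolis\""),
            new Triple("<Mike>", "<daughter>", "<Lydia>"),
            new Triple("<Lydia>", "<name>", "\"Lydia Hugo\""));

    // Object of the first triple with the given subject and predicate, if any.
    static Optional<String> objectOf(String subject, String predicate) {
        return TRIPLES.stream()
                .filter(t -> t.subject.equals(subject) && t.predicate.equals(predicate))
                .map(t -> t.object)
                .findFirst();
    }

    public static void main(String[] args) {
        // Two hops: <Mike> <daughter> ?d, then ?d <name> ?name.
        String daughter = objectOf("<Mike>", "<daughter>").orElseThrow();
        System.out.println(objectOf(daughter, "<name>").orElseThrow()); // prints "Lydia Hugo" (quotes included)
    }
}
```

Because an object such as <Lydia> can itself be a subject, a set of triples forms a graph, which is what lets a query walk from one entity to another.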
  • 27. select id, label from targets where label = '${queryValue}'
  • 28. select id, label from targets where label ilike '%${queryValue}%'
  • 29. SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER (?label = '${params.query}') } LIMIT 10
  • 30. SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER regex(?label, '\\Q${params.query}\\E', 'i') } LIMIT 10
  • 31. SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER regex(?label, '\\Q${params.query}\\E', 'i') } LIMIT 10 (the 'i' flag makes the match case insensitive; \Q…\E makes the regex treat the query as a literal value)
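The regex FILTER above combines two things: `\Q…\E` quoting, so characters such as `(` or `.` in the user's input lose their regex meaning, and the `'i'` flag for case-insensitive matching. Java's `Pattern.quote` produces the same kind of quoting (and, unlike naive wrapping, copes with input that itself contains `\E`). The sketch below applies that filter to an in-memory list of labels; it is only an analogy for the SPARQL FILTER, assuming a Java-style regex engine, and the sample labels are made up. Note that the scan has to test every label, since nothing narrows the candidates first.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LiteralRegexFilter {
    // Analogue of FILTER regex(?label, '\\Q<input>\\E', 'i'): a case-insensitive
    // substring match in which regex metacharacters in the input are literal.
    static List<String> filter(List<String> labels, String userInput) {
        Pattern p = Pattern.compile(Pattern.quote(userInput), Pattern.CASE_INSENSITIVE);
        // Every label is tested; there is no index to skip non-matching ones.
        return labels.stream()
                .filter(label -> p.matcher(label).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> labels = List.of("Imatinib", "Imatinib mesylate", "Aspirin", "Receptor (5-HT2A)");
        System.out.println(filter(labels, "imatinib")); // [Imatinib, Imatinib mesylate]
        // Without quoting, "(5-ht" would be an invalid regex (unclosed group).
        System.out.println(filter(labels, "(5-ht"));    // [Receptor (5-HT2A)]
    }
}
```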
  • 35. Java API Indexing and Searching Text
  • 37. indexing storage
  • 39. Document (field: value): ID: 2; name: “Mike Hugo”; company: “Entagen”; bio: “lorem ipsum dolor sum etc...”
  • 40. Index: each field can be Indexed (the analyzed terms that searches match against, e.g. name “mike hugo”) and/or Stored (the original value returned with results, e.g. ID 2, name “Mike Hugo”, company “Entagen”); the bio “lorem ipsum dolor sum etc...” is indexed but not stored
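Indexing and storing are separate decisions: the indexed form of a field is the list of terms an analyzer produces (for the name, "mike" and "hugo"), while the stored form is the original value handed back with search results. The sketch below assumes a toy analyzer that lowercases and splits on anything that is not a letter or digit; Lucene's `StandardAnalyzer` does more (grammar-based tokenization, stop-word removal), so treat this as an approximation only. `ToyAnalyzer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ToyAnalyzer {
    // Approximate analysis: lowercase, then split on runs of non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split yields a leading empty string if text starts with a separator
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String stored = "Mike Hugo";             // kept verbatim if the field is stored
        List<String> indexed = analyze(stored);  // what the inverted index actually sees
        System.out.println(stored + " -> " + indexed); // Mike Hugo -> [mike, hugo]
    }
}
```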
  • 41. Query: name: mike
  • 42. Query: name: mike; Matching Documents: id 2
  • 43. Matching document: id 2
  • 44. Matching document: id 2
  • 45. Matching document id 2, with its fields: ID 2; name “Mike Hugo”; company “Entagen”; bio “lorem ipsum dolor sum etc...”
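The `name: mike` query works by looking up the term "mike" in the postings for the name field, which list the IDs of documents containing it (here, ID 2); the stored fields of those documents are then returned. A stdlib-only sketch of that flow using a hypothetical `ToyIndex` class; a real Lucene index keeps far more (term frequencies, positions, scoring data) and does not live in a `HashMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyIndex {
    // field -> term -> IDs of documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // doc ID -> stored field values (only fields added with store = true)
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void addField(int docId, String field, String value, boolean store) {
        // Index: lowercase and split into terms, then record the doc ID per term.
        for (String term : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
        // Store: keep the original value only if asked to.
        if (store) {
            stored.computeIfAbsent(docId, id -> new HashMap<>()).put(field, value);
        }
    }

    // Single-term query such as "name: mike": returns the stored fields of each match.
    List<Map<String, String>> search(String field, String term) {
        Set<Integer> ids = postings.getOrDefault(field, Map.of())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(stored.getOrDefault(id, Map.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addField(2, "id", "2", true);
        index.addField(2, "name", "Mike Hugo", true);
        index.addField(2, "company", "Entagen", true);
        index.addField(2, "bio", "lorem ipsum dolor sum etc...", false); // indexed, not stored
        // Finds document 2; its stored id, name and company come back, but no bio.
        System.out.println(index.search("name", "mike"));
    }
}
```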
  • 47. Lucene index of rdfs:label
  • 49. Build a SPARQL query to find all the rdfs:label properties:
      String queryLabels = """
          SELECT ?uri ?label WHERE {
              ?uri rdfs:label ?label .
          }
      """
  • 50. Execute the SPARQL query:
      sparqlQueryService.executeForEach(repository, queryLabels) {
          String uri = it.uri.stringValue()
          String label = it.label.stringValue()
          def doc = new Document()
          doc.add(new Field(SUBJECT_URI_FIELD, uri,
                  Field.Store.YES, Field.Index.ANALYZED))
          doc.add(new Field(LABEL_FIELD, label,
                  Field.Store.NO, Field.Index.ANALYZED))
          writer.addDocument(doc)
      }
  • 51. Instantiate a new Lucene Document:
      Document doc = new Document()
  • 52. Add the Subject URI to the Document (key: SUBJECT_URI_FIELD, value: uri):
      doc.add(new Field(SUBJECT_URI_FIELD, uri,
              Field.Store.YES, Field.Index.ANALYZED))
  • 53. Add the Label field to the document (but don’t store it):
      doc.add(new Field(LABEL_FIELD, label,
              Field.Store.NO, Field.Index.ANALYZED))
  • 54. Add the document to the index, then close the index:
      writer.addDocument(doc)
      ...
      } finally {
          writer.close() // Close index
      }
  • 56. Create a Lucene Query from user input (query LABEL_FIELD for the value in params.query):
      def query = {
          Query query = new QueryParser(Version.LUCENE_CURRENT, LABEL_FIELD,
                  new StandardAnalyzer(Version.LUCENE_CURRENT))
              .parse(params.query)
          def s = new Date().time
          List results = executeQuery(query)
          def e = new Date().time
          render(view: 'index', model: [results: ...
  • 57. Search the index (limit 10) for matching documents:
      IndexSearcher searcher = luceneSearche…
      ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs
      List results = []
      def connection = repository.connection
      scoreDocs.each {
          Document doc = searcher.doc(it.doc)
          String uri = doc[SUBJECT_URI_FIELD]
          Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
          results.add([uri: uri, type: labelAndType.type, label: labelAndType.label])
      }
      connection.close()
      return results
  • 58. For each matching document, get the doc and extract the Subject URI:
      scoreDocs.each {
          Document doc = searcher.doc(it.doc)
          String uri = doc[SUBJECT_URI_FIELD]
          ...
      }
  • 59. Using the Subject URI, load properties from the triplestore:
      Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
  • 60. Return results containing Subject URI, Type, and Label:
      results.add([uri: uri, type: labelAndType.type, label: labelAndType.label])
      ...
      return results
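The search path in these slides is a two-step lookup: the index yields at most ten matching Subject URIs, and the label and type for each are then loaded from the triplestore. In the sketch below, a plain `Map` stands in for both the triplestore and `sparqlQueryService.getLabelAndType`, and a ranked list of URIs stands in for Lucene's `ScoreDoc[]`; the class name, URIs and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SearchThenHydrate {
    // Hypothetical stand-in for the triplestore: subject URI -> {label, type}.
    static final Map<String, Map<String, String>> TRIPLESTORE = Map.of(
            "http://example.org/drug/1", Map.of("label", "Imatinib", "type", "drug"),
            "http://example.org/target/1", Map.of("label", "Example target", "type", "target"));

    // rankedUris plays the role of the index hits, best match first.
    static List<Map<String, String>> hydrate(List<String> rankedUris, int limit) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String uri : rankedUris.subList(0, Math.min(limit, rankedUris.size()))) {
            Map<String, String> labelAndType = TRIPLESTORE.get(uri);
            if (labelAndType == null) continue; // index and store out of sync: skip the hit
            Map<String, String> row = new LinkedHashMap<>();
            row.put("uri", uri);
            row.put("type", labelAndType.get("type"));
            row.put("label", labelAndType.get("label"));
            results.add(row);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(hydrate(List.of("http://example.org/drug/1"), 10));
        // [{uri=http://example.org/drug/1, type=drug, label=Imatinib}]
    }
}
```

Storing only the URI keeps the index small and leaves the triplestore as the source of truth for labels and types, at the cost of one store lookup per hit.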
  • 61. DEMO Lucene Index of Searchable Labels
  • 62. WHAT ABOUT ENTITY RELATIONSHIPS?
  • 63. WHAT ABOUT OTHER PROPERTIES?
  • 64. Lucene Extension Indexing and Searching Semi-Structured Data
  • 66. Document:
      URI      <DB00619>
      triples  <DB00619> rdfs:label "Imatinib" .
               <DB00619> rdf:type <drugbank:drugs> .
               <DB00619> drugbank:brandName "Gleevec" .
               <DB00619> drugbank:target <targets/1588> .
  • 68. Select all Subject URIs from the triplestore:
      Connection connection = repository.connection
      try {
          String subjectUris = """
              SELECT distinct ?uri WHERE { ?uri ?p ?o . }
          """
  • 69. Execute the SPARQL query; for each URI, create a new Document:
      sparqlQueryService.executeForEach(repository, subjectUris) {
          def doc = new Document()
          String subjectUri = it.uri.stringValue()
  • 70. Add the Subject URI to the Document:
      doc.add(new Field(SUBJECT_URI_FIELD, subjectUri,
              Field.Store.YES, Field.Index.ANALYZED))
  • 71. Get an NTriples string from the triplestore:
      StringWriter triplesStringWriter = new StringWriter()
      NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter)
      connection.exportStatements(
              new URIImpl(subjectUri), null, null, false, nTriplesWriter)
  • 72. Add the NTriples string to the document:
      doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(),
              Field.Store.NO, Field.Index.ANALYZED))
  • 73. Add the document to the index:
      writer.addDocument(doc)
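The indexing loop turns each subject into one document whose triples field holds all of that subject's statements serialized as N-Triples, as in the <DB00619> example. The sketch below does the same grouping over an in-memory list of statements with a hand-rolled serializer, whereas the slides delegate this to Sesame's `exportStatements` with an `NTriplesWriter`. The `Statement` class, the example.org URIs and the minimal escaping are assumptions of the sketch; it is not a complete N-Triples writer (no language tags, datatypes or blank nodes).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EntityDocuments {
    // Illustrative statement: the object is either a URI or a plain literal.
    static final class Statement {
        final String s, p, o;
        final boolean objectIsLiteral;

        Statement(String s, String p, String o, boolean objectIsLiteral) {
            this.s = s;
            this.p = p;
            this.o = o;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    // Escape backslash, double quote and newline, then wrap the literal in quotes.
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n") + "\"";
    }

    // subject URI -> N-Triples text of every statement about that subject
    static Map<String, String> bySubject(List<Statement> statements) {
        Map<String, StringBuilder> docs = new LinkedHashMap<>();
        for (Statement st : statements) {
            String object = st.objectIsLiteral ? literal(st.o) : "<" + st.o + ">";
            docs.computeIfAbsent(st.s, k -> new StringBuilder())
                .append('<').append(st.s).append("> <").append(st.p).append("> ")
                .append(object).append(" .\n");
        }
        Map<String, String> result = new LinkedHashMap<>();
        docs.forEach((subject, text) -> result.put(subject, text.toString()));
        return result;
    }

    public static void main(String[] args) {
        String drug = "http://example.org/DB00619";
        List<Statement> statements = List.of(
                new Statement(drug, "http://www.w3.org/2000/01/rdf-schema#label", "Imatinib", true),
                new Statement(drug, "http://example.org/brandName", "Gleevec", true),
                new Statement(drug, "http://example.org/target", "http://example.org/targets/1588", false));
        // One map entry, i.e. one future index document, per subject.
        System.out.print(bySubject(statements).get(drug));
    }
}
```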
  • 75. Query the Triples field:
      SirenCellQuery predicate = new SirenCellQuery(
              new SirenTermQuery(new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue())))
      predicate.constraint = PREDICATE_CELL
      SirenCellQuery object = new SirenCellQuery(
              new SirenTermQuery(new Term(TRIPLES_FIELD, params.query.toLowerCase())))
      object.constraint = OBJECT_CELL
  • 76. Query the Triples field for a predicate:
      SirenCellQuery predicate = new SirenCellQuery(
              new SirenTermQuery(new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue())))
      predicate.constraint = PREDICATE_CELL
  • 77. Query the Triples field for a predicate of rdfs:label (note: this could be any predicate!):
      new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue())
  • 78. Query the Triples field with a tuple query requiring both cells:
      Query query = new SirenTupleQuery()
      query.add(predicate, SirenTupleClause.Occur.MUST)
      query.add(object, SirenTupleClause.Occur.MUST)
  • 79. Query the Triples field for an object:
      SirenCellQuery object = new SirenCellQuery(
              new SirenTermQuery(new Term(TRIPLES_FIELD, params.query.toLowerCase())))
      object.constraint = OBJECT_CELL
  • 80. Query the Triples field for an object matching the user input:
      new Term(TRIPLES_FIELD, params.query.toLowerCase())
  • 81. Query against the <DB00619> document (slide 66): “imatinib”
  • 82. Query against the <DB00619> document: the triples field
  • 83. Query against the <DB00619> document: predicate = rdfs:label
  • 84. Query against the <DB00619> document: predicate = rdfs:label AND object = “imatinib”
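The walkthrough above narrows the match from "anywhere in the triples field" to "the predicate and object cells of the same tuple". A toy model of that conjunctive, cell-constrained tuple query: each document is a list of (subject, predicate, object) tuples, and a tuple matches only if its predicate cell contains the predicate term and its object cell contains the lowercased user input. This models the matching semantics only; it is not SIREn's API or index structure, and the class name and tokenization rule (split on whitespace and quotes) are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TupleMatcher {
    static final int SUBJECT_CELL = 0;
    static final int PREDICATE_CELL = 1;
    static final int OBJECT_CELL = 2;

    // Cell-level term match: split the cell on whitespace and quotes, compare lowercased tokens.
    static boolean cellContains(String[] tuple, int cell, String term) {
        String[] tokens = tuple[cell].toLowerCase(Locale.ROOT).split("[\\s\"]+");
        return Arrays.asList(tokens).contains(term.toLowerCase(Locale.ROOT));
    }

    // Both clauses are MUST, and both must hold within the same tuple.
    static boolean matches(List<String[]> doc, String predicate, String objectTerm) {
        for (String[] tuple : doc) {
            if (cellContains(tuple, PREDICATE_CELL, predicate)
                    && cellContains(tuple, OBJECT_CELL, objectTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> db00619 = List.of(
                new String[] {"<DB00619>", "rdfs:label", "\"Imatinib\""},
                new String[] {"<DB00619>", "rdf:type", "<drugbank:drugs>"},
                new String[] {"<DB00619>", "drugbank:brandName", "\"Gleevec\""},
                new String[] {"<DB00619>", "drugbank:target", "<targets/1588>"});
        System.out.println(matches(db00619, "rdfs:label", "imatinib")); // true
        System.out.println(matches(db00619, "rdfs:label", "gleevec"));  // false: Gleevec is a brandName
    }
}
```

Requiring both cells inside one tuple is what stops the document matching "rdfs:label = gleevec" merely because "gleevec" appears elsewhere in its triples.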
  • 85. Search the index (limit 10) for matching documents:
      List executeQuery(Query query) {
          IndexSearcher searcher = sirenSearcherM…
          ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs
          List results = []
          def connection = repository.connection
          scoreDocs.each {
              Document doc = searcher.doc(it.doc)
              String uri = doc[SUBJECT_URI_FIELD]
              Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
              results.add([uri: uri, type: labelAndType.type, label: labelAndType.label])
          }
          connection.close()
          return results
      }
  • 86. For each matching document, get the doc and extract the Subject URI:
      scoreDocs.each {
          Document doc = searcher.doc(it.doc)
          String uri = doc[SUBJECT_URI_FIELD]
          ...
      }
  • 87. Using the Subject URI, load properties from the triplestore:
      Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
  • 88. Return results containing Subject URI, Type, and Label:
      results.add([uri: uri, type: labelAndType.type, label: labelAndType.label])
      ...
      return results
  • 89. DEMO SIREn Index of RDF Entities
  • 91. Query against the <DB00619> document (slide 66): predicate = rdfs:label AND object = “imatinib”
  • 92. Query against the <DB00619> document: object = “imatinib” (any predicate)
  • 93. Query against the <DB00619> document: object = “imatinib” OR object = “gleevec”
  • 95. Query against the <DB00619> document: predicate = brandName
  • 96. Query against the <DB00619> document: predicate = target
  • 102. Query against the <DB00619> document (slide 66): object = <targets/1588>
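Because an object cell can hold a URI as well as a literal, the same cell matching finds relationships: an object-only query for <targets/1588> returns every entity that links to that target, whatever the predicate, and several values act like "object = A OR object = B". The toy sketch below scans a few entity documents; only DB00619's triples come from the slides, while <drugX> and <drugY> are made up. It compares whole object cells for simplicity, and an inverted index would answer this from its postings rather than by scanning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelationshipLookup {
    // Subjects with at least one tuple whose object cell equals one of the given values
    // (an object-only query; several values behave like object = A OR object = B).
    static List<String> subjectsWithObject(Map<String, List<String[]>> docs, Set<String> objects) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> entry : docs.entrySet()) {
            for (String[] tuple : entry.getValue()) {
                if (objects.contains(tuple[2])) {
                    hits.add(entry.getKey());
                    break; // one matching tuple is enough for this document
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> docs = new LinkedHashMap<>();
        docs.put("<DB00619>", List.<String[]>of(
                new String[] {"<DB00619>", "drugbank:brandName", "\"Gleevec\""},
                new String[] {"<DB00619>", "drugbank:target", "<targets/1588>"}));
        // Hypothetical entities for illustration.
        docs.put("<drugX>", List.<String[]>of(
                new String[] {"<drugX>", "drugbank:target", "<targets/1588>"}));
        docs.put("<drugY>", List.<String[]>of(
                new String[] {"<drugY>", "rdfs:label", "\"Aspirin\""}));
        System.out.println(subjectsWithObject(docs, Set.of("<targets/1588>"))); // [<DB00619>, <drugX>]
    }
}
```

The explicit `List.<String[]>of(...)` type witness keeps a single array argument from being spread into a list of strings.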
  • 103. DEMO Searching SIREn Index for Relationships
  • 104. Distributed Indexing and Searching Semi-Structured Data
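The slides do not show how the index is distributed. One common pattern, sketched here purely as an assumption, is to assign each entity document to a shard by hashing its Subject URI, send the query to every shard, and merge the per-shard top-k lists by score. The `Hit` class, shard count and scores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ShardedSearch {
    static final class Hit {
        final String uri;
        final float score;

        Hit(String uri, float score) {
            this.uri = uri;
            this.score = score;
        }

        @Override
        public String toString() {
            return uri + "=" + score;
        }
    }

    // Deterministic shard assignment from the Subject URI (floorMod keeps it non-negative).
    static int shardFor(String subjectUri, int shardCount) {
        return Math.floorMod(subjectUri.hashCode(), shardCount);
    }

    // Gather each shard's ranked hits and keep the global top k by score.
    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        List<Hit> all = new ArrayList<>();
        perShardHits.forEach(all::addAll);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return all.subList(0, Math.min(k, all.size()));
    }

    public static void main(String[] args) {
        System.out.println(shardFor("http://example.org/DB00619", 4)); // some shard in 0..3
        List<List<Hit>> perShard = List.of(
                List.of(new Hit("<a>", 2.5f), new Hit("<b>", 0.7f)),
                List.of(new Hit("<c>", 1.9f)));
        System.out.println(mergeTopK(perShard, 2)); // [<a>=2.5, <c>=1.9]
    }
}
```

Merging per-shard top-k lists gives the correct global top k only if scores are comparable across shards, which for Lucene-style scoring depends on the shards sharing consistent term statistics.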
  • 111. 400 Million Documents > 12 Billion Triples
  • 113. Query Parser subject predicate object
  • 114. DEMO SIREn in action on TripleMap.com
  • 115. DEMO SIREn in action on TripleMap.com
  • 116. SPARQL LUCENE SIREN TripleMap.com
  • 117. QUESTIONS? mike@entagen.com / twitter: @piragua TripleMap http://www.entagen.com http://www.triplemap.com