Insiders, January 2010

Using the Web of Data for Information Extraction

Topics: SCOOBIE, SPARQL, RDFa, D2R Server, RDF, SQUIN, Epiphany, Linked Data, OBIE




                        Benjamin Adrian
                        http://www.dfki.uni-kl.de/~adrian
Are you still surfing ...

… or overloaded?

A simple question ...

What are the cities of the universities in Rhineland-Palatinate, and
what is the unemployment rate of these cities?

The same question as a SPARQL query (http://www.w3.org/TR/rdf-sparql-query/):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurostat: <http://www4.wiwiss.fu-berlin.de/eurostat/resource/eurostat/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbpedia_cat: <http://dbpedia.org/resource/Category:>

SELECT ?dbpcity ?cityName ?ur WHERE {
  ?uni      skos:subject  dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
            dbpedia:city  ?dbpcity .
  ?dbpcity  owl:sameAs    ?statcity .
  ?statcity rdfs:label    ?cityName ;
            eurostat:unemployment_rate_total ?ur .
}

… and its answer.

dbpcity                                 cityName    ur
http://dbpedia.org/resource/Koblenz     Koblenz     8.8
http://dbpedia.org/resource/Trier       Trier       7.3

Data Sources:
    http://epp.eurostat.ec.europa.eu
    http://www4.wiwiss.fu-berlin.de/eurostat/
    http://wiki.dbpedia.org

Query Engine:
    SQUIN - Query the Web of Linked Data
    http://squin.sourceforge.net/

So much data out there, too much?

What data do you have?

Are you still surfing ...

Agenda

To use the Web of Data for information extraction, you have to
understand its basics.
●   RDF on one slide
●   Publish data in RDF with D2R Server
●   Publish RDF as Linked Data
●   Query Linked Data with SPARQL and SQUIN
●   Use RDF for information extraction
●   Bring Linked Data to text via RDFa

Wouldn't this be nice.

[Figure, built up over four slides. Elements: Data, Text, Extraction Pipeline, User-defined Filter, Extraction Results, annotated text. Arrow labels: enrich, annotate, populate.]

RDF on one slide

@prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix acm: <http://acm.rkbexplorer.com/description/> .

dblp_author:Michael_Gillmann
    foaf:name      "Michael Gillmann" ;
    rdfs:seeAlso   <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ;
    rdf:type       foaf:Agent ;
    owl:sameAs     acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ;
    foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/conf/icdar/SchulzEGAAD09> .

<http://dblp.l3s.de/d2r/resource/publications/conf/icdar/SchulzEGAAD09>
    dc:creator dblp_author:Michael_Gillmann ;
    dc:creator dblp_author:Markus_Ebbecke ;
    dc:title   "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

* From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf

RDF on one slide: the parts of the example

●   Vocabularies: bound to short prefixes with @prefix (e.g., foaf, rdf, rdfs, dc, owl)
●   URLs / URIs: the names of resources (e.g., dblp_author:Michael_Gillmann)
●   Subjects: the resource a statement is about (the author, the publication)
●   Predicates: the properties (foaf:name, rdfs:seeAlso, rdf:type, owl:sameAs, foaf:isMakerOf, dc:creator, dc:title)
●   Objects: the values, either other resources or literals such as "Michael Gillmann"

RDF data is graph data.

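To make the graph view concrete, here is a minimal Python sketch that stores a few triples from the "RDF on one slide" example as labeled edges and walks them. The `pub:` prefix is a hypothetical abbreviation for the publication URI, and the helper names are illustrative only.

```python
from collections import defaultdict

# A few triples from the example; "pub:" is a made-up shorthand for the
# publication URI, introduced only to keep the lines short.
TRIPLES = [
    ("dblp_author:Michael_Gillmann", "foaf:name", '"Michael Gillmann"'),
    ("dblp_author:Michael_Gillmann", "rdf:type", "foaf:Agent"),
    ("dblp_author:Michael_Gillmann", "foaf:isMakerOf", "pub:SchulzEGAAD09"),
    ("pub:SchulzEGAAD09", "dc:creator", "dblp_author:Michael_Gillmann"),
    ("pub:SchulzEGAAD09", "dc:creator", "dblp_author:Markus_Ebbecke"),
]


def build_graph(triples):
    """Index triples as node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(target for _, target in graph.get(node, []))
    return seen


graph = build_graph(TRIPLES)
nodes = reachable(graph, "dblp_author:Michael_Gillmann")
```

Subjects and objects become nodes and predicates become edge labels, so the co-author is reachable from Michael Gillmann via the shared publication.
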
Publishing relational data in RDF

D2R Server - Publishing Relational Databases on the Semantic Web
http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/

Two small command line calls, first generating the mapping file and then serving it:

./generate-mapping \
     -o mydatabase.n3 \
     -b http://projects.dfki.uni-kl.de/mydatabase/ \
     jdbc:mysql://localhost:3306/mydatabase

./d2r-server \
     -p 80 \
     -b http://projects.dfki.uni-kl.de/mydatabase/ \
     mydatabase.n3

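Conceptually, such a mapping turns every table row into a resource and every column value into a triple. The Python sketch below illustrates that idea only; it is not D2R Server's mapping language, and the URI patterns, the `projects` table, and its row are assumptions made for the example.

```python
def row_to_triples(base_uri, table, key_column, row):
    """Turn one relational row into (subject, predicate, object) triples."""
    # Assumed URI pattern: <base>resource/<table>/<primary key>
    subject = f"{base_uri}resource/{table}/{row[key_column]}"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            # The key already lives in the subject URI; NULLs yield no triple.
            continue
        # Assumed property pattern: <base>vocab/<table>_<column>
        predicate = f"{base_uri}vocab/{table}_{column}"
        triples.append((subject, predicate, str(value)))
    return triples


BASE = "http://projects.dfki.uni-kl.de/mydatabase/"
row = {"id": 7, "name": "Example Project", "budget": None}  # hypothetical row
triples = row_to_triples(BASE, "projects", "id", row)
```
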
Linked Data: Linking RDF data from different sources

[Figure: four separate datasets: Customer DB, Employees DB, Project DB, DBpedia]

How to interlink these datasets?

Linked Data: Linking RDF data from different sources

Linked Data Principles (Tim Berners-Lee, 2006)

1. Use URIs as names for things
   (e.g., http://dbpedia.org/resource/Berlin)
2. Use HTTP URIs so that people can look up those names
3. Provide useful information in RDF when someone looks up a URI
4. Include links to other URIs to enable discovery of more information

Example:

<http://dbpedia.org/resource/Berlin>
    owl:sameAs opencyc:en/CityOfBerlinGermany ;
    owl:sameAs opencyc:en/Berlin_StateGermany ;
    owl:sameAs <http://sws.geonames.org/2950159/> ;
    owl:sameAs <http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin> ;
    owl:sameAs freebase:http://dbpedia.org/resource/Berlin .

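Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for RDF instead of HTML. The following Python sketch only builds such a request and does not send it; the chosen media types and the function name are assumptions, and whether a given server still answers this way is not guaranteed.

```python
import urllib.request


def rdf_lookup_request(uri):
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # Prefer Turtle, fall back to RDF/XML; the preference order is an assumption.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return urllib.request.Request(uri, headers={"Accept": accept})


request = rdf_lookup_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. The links found in the returned RDF (principle 4) are what a client follows next.
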
SPARQL: Querying RDF data

SPARQL is the RDF query language.
In contrast to SQL, its data model is graph-oriented rather than set-oriented.

Some examples (both assume PREFIX foaf: <http://xmlns.com/foaf/0.1/>):

     Resulting in tuples:
     SELECT ?interest ?friend WHERE {
         <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
         ?friend foaf:interest ?interest .
     }

     Resulting in a graph:
     CONSTRUCT { ?friend foaf:interest ?interest } WHERE {
         <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
         ?friend foaf:interest ?interest .
     }

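The difference between the two result forms can be sketched with a toy pattern matcher in Python (3.8 or later). This is an illustration of the idea, not a SPARQL engine; the friends `ex:alice` and `ex:bob` and their interests are invented data.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on mismatch."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, triples):
    """Evaluate a list of triple patterns by joining them one after another."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


def select(variables, solutions):
    """SELECT: project each solution onto a tuple of variable values."""
    return [tuple(s[v] for v in variables) for s in solutions]


def construct(template, solutions):
    """CONSTRUCT: instantiate the template triple once per solution."""
    return {tuple(s.get(t, t) for t in template) for s in solutions}


ME = "<http://www.w3.org/People/Berners-Lee/card#i>"
DATA = [  # invented example data
    (ME, "foaf:knows", "ex:alice"),
    (ME, "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:interest", "ex:rdf"),
    ("ex:bob", "foaf:interest", "ex:sparql"),
]
WHERE = [(ME, "foaf:knows", "?friend"), ("?friend", "foaf:interest", "?interest")]

solutions = solve(WHERE, DATA)
rows = select(["?interest", "?friend"], solutions)
graph = construct(("?friend", "foaf:interest", "?interest"), solutions)
```

Both queries share the same WHERE clause and therefore the same solutions. SELECT returns them as rows of values, while CONSTRUCT turns them back into triples, that is, into a new graph.
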
SPARQL: Query Linked Data from different sources

[Figure: four separate datasets: Customer DB, Employees DB, Project DB, DBpedia]

How to access these datasets with a single SPARQL query?

SPARQL: Query Linked Data from different sources

SQUIN: Query the Web of Linked Data
http://squin.sourceforge.net/

SQUIN follows a link traversal approach over HTTP URIs.

[Figure: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, and SQUIN in the center]

Remember:

SELECT DISTINCT ?c ?cityName ?ur
WHERE {
  ?u skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
     dbpedia:city ?c .
  ?c owl:sameAs [ rdfs:label ?cityName ;
                  eurostat:unemployment_rate_total ?ur ]
}

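The link traversal idea can be sketched in Python as follows. This is not SQUIN's implementation: a dictionary stands in for the Web, `ex:uni` is a hypothetical university resource, and the sketch only gathers data, whereas a real engine interleaves traversal with query evaluation.

```python
from collections import deque

# A simulated "web": dereferencing a URI returns the triples published there.
WEB = {
    "ex:uni": [("ex:uni", "dbpedia:city", "dbpedia:Koblenz")],
    "dbpedia:Koblenz": [("dbpedia:Koblenz", "owl:sameAs", "eurostat:Koblenz")],
    "eurostat:Koblenz": [
        ("eurostat:Koblenz", "rdfs:label", '"Koblenz"'),
        ("eurostat:Koblenz", "eurostat:unemployment_rate_total", '"8.8"'),
    ],
}


def dereference(uri):
    """Stand-in for an HTTP GET on the URI."""
    return WEB.get(uri, [])


def is_literal(term):
    """Literals are written in double quotes in this sketch."""
    return term.startswith('"')


def traverse(seed_uris, max_lookups=100):
    """Collect triples by following URIs found in already retrieved data."""
    queue, visited, collected = deque(seed_uris), set(), []
    while queue and len(visited) < max_lookups:
        uri = queue.popleft()
        if uri in visited:
            continue
        visited.add(uri)
        for triple in dereference(uri):
            collected.append(triple)
            # Follow the subject and object of each triple, but not literals.
            for term in (triple[0], triple[2]):
                if not is_literal(term) and term not in visited:
                    queue.append(term)
    return collected


collected = traverse(["ex:uni"])
```

Starting from a single seed URI, the traversal crosses the owl:sameAs link from DBpedia to Eurostat and picks up the unemployment rate without knowing in advance which sources to ask.
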
Using RDF and Linked Data for Information Extraction

[Figure. Nodes: User, Linked Data, Query, Text, Extraction Pipeline, Result Graph. Edge labels: asks, question, about, to, answers.]

Using RDF and Linked Data for Information Extraction

What data do we have?
Example RDF data:
<http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
    rdf:type     foaf:Document ;
    dc:creator   dblp_author:Markus_Ebbecke ;
    dc:title     "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

Classes:              foaf:Document, foaf:Person
Instances:            .../SchulzEGAAD09, .../Markus_Ebbecke
Datatype Properties:  dc:title, foaf:name, foaf:firstName, foaf:surName
Object Properties:    dc:creator, foaf:knows
Literals:             "Markus", "Ebbecke",
                      "Seizing the Treasure: Transferring Knowledge in Invoice Analysis"

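The five groups can be derived from instance data with simple heuristics, as the Python sketch below shows. It is an illustration under assumptions: `pub:` is a made-up shorthand for the publication URI, and a real system would more likely read the property and class definitions from the ontology than guess them from the triples.

```python
def categorize(triples):
    """Sort the terms of a small RDF graph into the five groups named above."""
    names = ("classes", "instances", "datatype_properties",
             "object_properties", "literals")
    groups = {name: set() for name in names}
    for subject, predicate, obj in triples:
        groups["instances"].add(subject)
        if predicate == "rdf:type":
            # The object of rdf:type is a class.
            groups["classes"].add(obj)
        elif obj.startswith('"'):
            # A quoted object is a literal, so the property is a datatype property.
            groups["datatype_properties"].add(predicate)
            groups["literals"].add(obj)
        else:
            # Otherwise the property links two resources.
            groups["object_properties"].add(predicate)
            groups["instances"].add(obj)
    return groups


PUB = "pub:SchulzEGAAD09"  # hypothetical shorthand for the publication URI
TRIPLES = [
    (PUB, "rdf:type", "foaf:Document"),
    (PUB, "dc:creator", "dblp_author:Markus_Ebbecke"),
    (PUB, "dc:title",
     '"Seizing the Treasure: Transferring Knowledge in Invoice Analysis"'),
]
groups = categorize(TRIPLES)
```
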
SCOOBIE: Domain Adaptation

[Figure. Inputs: Structured Data (Vocabulary Data, Instance Data) and Text Corpus Data. Phases: Data Preprocessing & Learning (offline), Information Extraction (online). Intermediate data: Patterns and Gazetteers Data.]

SCOOBIE: Eco System

[Figure. Rows: Session Data, Tasks, API. Session Data labels: Index, Domain Knowledge (Text Corpus, Instances, Ontology, Patterns + Gazetteers), Models (Training Corpus, Models). Tasks: Preprocess, Train, Extract. API: inputs and outputs marked I / O.]

SCOOBIE: OBIE Pipeline

Normalization                        Text Extraction
                                     Language Detection
Segmentation                         Tokenization
                                     Sentence Extraction
                                     POS-Tagging
Symbolization                        Named Entity Recognition
                                     Structured Entity Recognition
                                     Noun Phrase Chunking
                                     Symbol Recognition
Instantiation                        Instance Recognition
                                     Instance Disambiguation
                                     Chunk Classification
Contextualization                    Fact Extraction
                                     Fact Selection
Population                           Query Answering

Used Machine Learning Models

Semi-Supervised Learning
●   CRF-based Noun Phrase Chunker

Supervised Learning
●   Gazetteer matching statistics (Named Entity Recognition)
●   Regex matching statistics (Structured Entity Recognition)

Unsupervised or Instance-based Learning
●   TF/IDF-based instance re-ranking (Instance Disambiguation)
●   K-Nearest-Neighbor chunk classifier (Chunk Classification)
●   Spreading Activation-based fact ranking (Fact Selection)

Used Machine Learning: Conditional Random Field

CRFs are sequence taggers:

Train it with:   Bill      CAPITALIZED    noun
                 slept     LOWERCASE      non-noun
                 here      LOWERCASE      non-noun

Test it with:    He        CAPITALIZED
                 visited   LOWERCASE
                 London    CAPITALIZED

CRF results:     noun
                 non-noun
                 non-noun

MALLET - MAchine Learning for LanguagE Toolkit
http://mallet.cs.umass.edu/

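The rows above pair each token with a word-shape feature and, for training, a label. A minimal Python sketch of how such rows could be produced is given below; it covers only the feature extraction, not the CRF itself, and the whitespace-separated one-token-per-line layout is assumed from the example rather than taken from a tool's documentation.

```python
def shape(token):
    """Return the coarse word-shape feature used in the example rows."""
    return "CAPITALIZED" if token[:1].isupper() else "LOWERCASE"


def to_rows(tokens, labels=None):
    """Format tokens as 'token feature [label]' rows, one token per line."""
    labels = labels or [None] * len(tokens)
    rows = []
    for token, label in zip(tokens, labels):
        fields = [token, shape(token)]
        if label:
            fields.append(label)  # training rows carry the gold label last
        rows.append(" ".join(fields))
    return "\n".join(rows)


training = to_rows(["Bill", "slept", "here"], ["noun", "non-noun", "non-noun"])
testing = to_rows(["He", "visited", "London"])
print(training)
print()
print(testing)
```
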
Bringing Linked Data to Text

Annotate plain text or HTML with RDF data.
   I'm working at DFKI.

RDFa offers an HTML extension:

   I'm working at
   <span about="dbpedia:DFKI" property="rdfs:label">
   DFKI</span>

Now let's generate RDFa automatically ...

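As a rough illustration of automatic generation, the Python sketch below wraps known labels in RDFa spans of the form shown above. A plain dictionary lookup stands in for the extraction pipeline here; the function name and the label-to-URI table are assumptions, and this is not how Epiphany or SCOOBIE are implemented.

```python
import html
import re


def annotate(text, entities):
    """Wrap each known label found in text in an RDFa <span>."""
    if not entities:
        return html.escape(text, quote=False)
    # Longest labels first, so that longer names win over their prefixes.
    labels = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, labels)) + r")\b")
    parts = []
    for index, part in enumerate(pattern.split(text)):
        if index % 2:  # odd positions hold the captured labels
            parts.append('<span about="%s" property="rdfs:label">%s</span>'
                         % (html.escape(entities[part]),
                            html.escape(part, quote=False)))
        else:  # even positions hold the plain text in between
            parts.append(html.escape(part, quote=False))
    return "".join(parts)


annotated = annotate("I'm working at DFKI.", {"DFKI": "dbpedia:DFKI"})
print(annotated)
```

Escaping both the surrounding text and the attribute value keeps the generated markup well-formed even when the input contains characters such as `<` or `&`.
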
Do you remember?

[Figure: the extraction pipeline diagram from "Wouldn't this be nice." again. Elements: Data, Text, Extraction Pipeline, User-defined Filter, Extraction Results, annotated text. Arrow labels: enrich, annotate, populate.]

Insiders
RDF Epiphany                                               January
                                                              2010



                                      Epiphany takes the
                                      original webpage
                                       …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                   38
Insiders
RDF Epiphany                                               January
                                                              2010



                                      Epiphany takes the
                                      original webpage
                                       …
                                      and SCOOBIE initialized
                                      with an RDF data set
                                      …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                   39
Insiders
RDF Epiphany                                                 January
                                                                2010



                                      Epiphany takes the
                                      original webpage
                                       …
                                      and SCOOBIE initialized
                                      with an RDF data set
                                      …
                                      It extracts RDF information
                                      from text and annotates it as
                                      RDFa
                                      …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                     40
RDF Epiphany

Epiphany takes the original web page and SCOOBIE, initialized with an RDF Linked Data set. It extracts RDF information from the text and annotates it as RDFa. Clicking on an RDFa annotation opens further information from the Linked Data set.

RDF Epiphany

At a glance
●   Epiphany is a free web service.
●   Epiphany uses SCOOBIE.
●   Epiphany can be initialized with any RDF Linked Data set.
●   Epiphany generates an RDF document about a web page.
●   Epiphany annotates RDF as RDFa in the web page.

http://projects.dfki.uni-kl.de/epiphany/

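To make the RDFa annotation step concrete, here is a minimal sketch in Python. It is not Epiphany's or SCOOBIE's implementation: the function name `annotate_rdfa` and the plain dictionary lookup are assumptions made for illustration, standing in for the real extraction pipeline. It also assumes the `rdfs` prefix is declared in the surrounding page.

```python
import html
import re


def annotate_rdfa(text, entities):
    """Wrap each known label found in `text` in an RDFa span for its URI.

    `entities` maps a label to the URI of the resource it names.
    This dictionary lookup is a stand-in for a real extraction pipeline.
    """
    if not entities:
        return html.escape(text)
    # Try longer labels first so they win over shorter overlapping ones.
    labels = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(l) for l in labels) + r")\b")
    parts, pos = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between two annotations.
        parts.append(html.escape(text[pos:match.start()]))
        uri = html.escape(entities[match.group(1)], quote=True)
        parts.append('<span about="%s" property="rdfs:label">%s</span>'
                     % (uri, html.escape(match.group(1))))
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


# The two DBpedia URIs appear in the deck's query result.
ENTITIES = {
    "Koblenz": "http://dbpedia.org/resource/Koblenz",
    "Trier": "http://dbpedia.org/resource/Trier",
}

annotated = annotate_rdfa("Trier & Koblenz are cities.", ENTITIES)
print(annotated)
```

Each span carries the resource URI in `about`, so a client-side script could use it to look up further information from the Linked Data set when the annotation is clicked.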
Summary

[Diagram. Data sources: Customer DB, Employees DB, Project DB, DBpedia, each shown with a D2R Server, and SQUIN between them. Further components: Text, User-defined Filter, Extraction Pipeline, Extraction Results, annotated text. Arrow labels: annotate, populate, enrich.]

Outlook

[Diagram. Data sources: Customer DB, Employees DB, Project DB, DBpedia, each shown with a D2R Server, and SQUIN between them. Further components: E-Mail, User-defined Filter, Extraction Pipeline, Extraction Results, annotated E-Mail. Arrow labels: annotate, populate, enrich.]

Thank you!

Using the Web of Data for Information Extraction

  • 1. Using the Web of Data for Information Extraction. Benjamin Adrian, http://www.dfki.uni-kl.de/~adrian. Insiders, January 2010. Keywords: scoobie, sparql, rdfa, D2R server, rdf, squin, epiphany, Linked Data, OBIE.
  • 2. Are you still surfing ...
  • 3. … or overloaded?
  • 4. A simple question ... What are the cities of the universities in Rhineland-Palatinate, and what is the unemployment rate of these cities?
  • 5. A simple question ... as a SPARQL query (http://www.w3.org/TR/rdf-sparql-query/):
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX eurostat: <http://www4.wiwiss.fu-berlin.de/eurostat/resource/eurostat/>
      PREFIX dbpedia: <http://dbpedia.org/ontology/>
      PREFIX dbpedia_cat: <http://dbpedia.org/resource/Category:>

      SELECT ?dbpcity ?cityName ?ur WHERE {
        ?uni skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
             dbpedia:city ?dbpcity .
        ?dbpcity owl:sameAs ?statcity .
        ?statcity rdfs:label ?cityName ;
                  eurostat:unemployment_rate_total ?ur
      }
  • 6. … and its answer.
      dbpcity                                cityName   ur
      http://dbpedia.org/resource/Koblenz    Koblenz    8.8
      http://dbpedia.org/resource/Trier      Trier      7.3
      Data sources: http://epp.eurostat.ec.europa.eu, http://wiki.dbpedia.org, http://www4.wiwiss.fu-berlin.de/eurostat/
      Query engine: SQUIN - Query the Web of Linked Data, http://squin.sourceforge.net/
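The query on slide 5 joins two data sets by following owl:sameAs links from DBpedia cities to Eurostat regions. The Python sketch below mimics that join over an in-memory list of triples. It illustrates the join logic only, not how SQUIN evaluates queries. The two result rows come from slide 6; the university names (`Uni_A`, `Uni_B`) and the `example.org` Eurostat URIs are hypothetical placeholders.

```python
# Triples are (subject, predicate, object) tuples with abbreviated predicates.
DBP = "http://dbpedia.org/resource/"
STAT = "http://example.org/eurostat/regions/"  # hypothetical namespace
CATEGORY = "dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate"

TRIPLES = [
    (DBP + "Uni_A", "skos:subject", CATEGORY),          # hypothetical university
    (DBP + "Uni_A", "dbpedia:city", DBP + "Koblenz"),
    (DBP + "Uni_B", "skos:subject", CATEGORY),          # hypothetical university
    (DBP + "Uni_B", "dbpedia:city", DBP + "Trier"),
    (DBP + "Koblenz", "owl:sameAs", STAT + "Koblenz"),
    (DBP + "Trier", "owl:sameAs", STAT + "Trier"),
    (STAT + "Koblenz", "rdfs:label", "Koblenz"),
    (STAT + "Koblenz", "eurostat:unemployment_rate_total", 8.8),
    (STAT + "Trier", "rdfs:label", "Trier"),
    (STAT + "Trier", "eurostat:unemployment_rate_total", 7.3),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def cities_with_unemployment(category):
    """Follow uni -> city -> sameAs -> (label, rate), one loop per triple pattern."""
    rows = []
    for uni in subjects("skos:subject", category):
        for city in objects(uni, "dbpedia:city"):
            for stat in objects(city, "owl:sameAs"):
                for name in objects(stat, "rdfs:label"):
                    for rate in objects(stat, "eurostat:unemployment_rate_total"):
                        rows.append((city, name, rate))
    return sorted(rows)


rows = cities_with_unemployment(CATEGORY)
for row in rows:
    print(row)
```

Each nested loop corresponds to one triple pattern in the WHERE clause. The owl:sameAs step is the one that crosses from one data set into the other.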
  • 7. So much data out there, too much?
  • 8. What data do you have?
  • 9. Are you still surfing ...
  • 10. Agenda. To use the Web of Data for information extraction, you have to understand its basics:
      ● RDF on one slide
      ● Publish data in RDF with D2R Server
      ● Publish RDF as Linked Data
      ● Query Linked Data with SPARQL and SQUIN
      ● Use RDF for information extraction
      ● Bring Linked Data to text via RDFa
  • 11-14. Wouldn't this be nice. [Diagram built up over four slides. Slide 11 shows Data only. Slide 12 adds Text, User-defined Filter, Extraction Pipeline, Extraction Results, and the label enrich. Slide 13 adds annotated text and the label annotate. Slide 14 adds the label populate.]
  • 15-20. RDF on one slide. The same listing is shown on all six slides; slides 16 to 20 highlight, in turn, the vocabularies, the URLs / URIs, the subjects, the predicates, and the objects.
      @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix dc: <http://purl.org/dc/terms/> .
      @prefix owl: <http://www.w3.org/2002/07/owl#> .
      @prefix acm: <http://acm.rkbexplorer.com/description/> .

      dblp_author:Michael_Gillmann
          foaf:name "Michael Gillmann" ;
          rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ;
          rdf:type foaf:Agent ;
          owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ;
          foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> .

      <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
          dc:creator dblp_author:Michael_Gillmann ;
          dc:creator dblp_author:Markus_Ebbecke ;
          dc:title "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

      * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf
  • 21. RDF data is graph data.
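The "graph data" reading can be sketched in a few lines of Python: each triple is a labeled edge from its subject node to its object node. This is only an illustration; the tuple representation and the `to_graph` helper are assumptions, and the prefixed names from the slides are kept as plain strings rather than parsed as RDF.

```python
from collections import defaultdict

# A few triples from the "RDF on one slide" example, as (subject, predicate, object).
PUB = "http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09"
TRIPLES = [
    ("dblp_author:Michael_Gillmann", "foaf:name", '"Michael Gillmann"'),
    ("dblp_author:Michael_Gillmann", "rdf:type", "foaf:Agent"),
    ("dblp_author:Michael_Gillmann", "foaf:isMakerOf", PUB),
    (PUB, "dc:creator", "dblp_author:Michael_Gillmann"),
    (PUB, "dc:creator", "dblp_author:Markus_Ebbecke"),
]


def to_graph(triples):
    """Hypothetical helper: map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


GRAPH = to_graph(TRIPLES)
# Following edges: the publication points back to its creators.
CREATORS = [obj for predicate, obj in GRAPH[PUB] if predicate == "dc:creator"]
```

Walking from the author node along `foaf:isMakerOf` and back along `dc:creator` forms a cycle, which is exactly what a tree-shaped or tabular model cannot express directly.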
  • 22. Publishing relational data in RDF
  • 23. Publishing relational data in RDF D2R Server - Publishing Relational Databases on the Semantic Web http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/ Two small command line calls, first to generate the mapping file and then to serve it: ./generate-mapping -o mydatabase.n3 -b http://projects.dfki.uni-kl.de/mydatabase/ jdbc:mysql://localhost:3306/mydatabase ./d2r-server -p 80 -b http://projects.dfki.uni-kl.de/mydatabase/ mydatabase.n3
  • 24. Linked Data: Linking RDF data from different sources [Diagram: Customer DB, Employees DB, Project DB, DBpedia] How to interlink these datasets?
  • 25. Linked Data: Linking RDF data from different sources Linked Data Principles (TimBL, 2006) 1. Use URIs as names for things (e.g., http://dbpedia.org/resource/Berlin) 2. Use HTTP URIs so that people can look up those names 3. Provide useful information in RDF when someone looks up a URI 4. Include links to other URIs to enable discovery of more information Example: <http://dbpedia.org/resource/Berlin> owl:sameAs opencyc:en/CityOfBerlinGermany ; owl:sameAs opencyc:en/Berlin_StateGermany ; owl:sameAs <http://sws.geonames.org/2950159/> ; owl:sameAs <http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin> ; owl:sameAs freebase:http://dbpedia.org/resource/Berlin .
  • 26. SPARQL: Querying RDF data SPARQL - the RDF query language. In contrast to SQL, its data model is not set oriented but graph oriented. Some examples: Resulting in tuples: SELECT ?interest ?friend WHERE { <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend . ?friend foaf:interest ?interest . } Resulting in a graph: CONSTRUCT { ?friend foaf:interest ?interest } WHERE { <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend . ?friend foaf:interest ?interest . }
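To make the SELECT versus CONSTRUCT distinction concrete, here is a minimal Python sketch of basic graph pattern matching over an in-memory list of triples. It is not a SPARQL engine: `match`, `solve`, `select`, and `construct` are hypothetical helpers, the `ex:` resources are invented sample data, and only triple patterns with `?variables` are handled.

```python
TIMBL = "http://www.w3.org/People/Berners-Lee/card#i"

# Invented sample data; the ex: names are placeholders, not real resources.
DATA = [
    (TIMBL, "foaf:knows", "ex:alice"),
    (TIMBL, "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:interest", "ex:rdf"),
    ("ex:bob", "foaf:interest", "ex:sparql"),
    ("ex:carol", "foaf:interest", "ex:sql"),
]

# The WHERE clause shared by both queries on the slide.
WHERE = [
    (TIMBL, "foaf:knows", "?friend"),
    ("?friend", "foaf:interest", "?interest"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None on a conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def solve(patterns, triples):
    """Join the patterns one after another, keeping every consistent variable binding."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


def select(variables, patterns, triples):
    """SELECT-like: one tuple of values per solution."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, triples)]


def construct(template, patterns, triples):
    """CONSTRUCT-like: instantiate the template triple once per solution, yielding a graph."""
    return {tuple(b.get(term, term) for term in template)
            for b in solve(patterns, triples)}


ROWS = select(["?interest", "?friend"], WHERE, DATA)
GRAPH = construct(("?friend", "foaf:interest", "?interest"), WHERE, DATA)
```

Both calls share the same matching step; they differ only in how the solutions are projected, into rows for `select` and into new triples for `construct`.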
  • 27. SPARQL: Query Linked Data from different sources [Diagram: Customer DB, Employees DB, Project DB, DBpedia] How to access these datasets with a single SPARQL query?
  • 28. SPARQL: Query Linked Data from different sources Squin: Query the Web of Linked Data http://squin.sourceforge.net/ Squin follows a link traversal approach over HTTP URIs. Remember: SELECT DISTINCT ?c ?cityName ?ur WHERE { ?u skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ; dbpedia:city ?c . ?c owl:sameAs [ rdfs:label ?cityName ; eurostat:unemployment_rate_total ?ur ] } [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, SQUIN]
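The idea behind link traversal can be sketched in Python with a fake in-memory "web". This is a deliberately simplified crawl under stated assumptions, not Squin's algorithm: it dereferences every URI it discovers up to a limit, whereas a link-traversal query engine interleaves lookups with query evaluation. `FAKE_WEB`, `dereference`, `traverse`, and the `ex:` and `stat:` names are all hypothetical.

```python
from collections import deque

# Stand-in for HTTP lookups: each URI "returns" the triples published at that URI.
FAKE_WEB = {
    "ex:some_university": [("ex:some_university", "dbpedia:city", "ex:Kaiserslautern")],
    "ex:Kaiserslautern": [("ex:Kaiserslautern", "owl:sameAs", "stat:Kaiserslautern")],
    "stat:Kaiserslautern": [("stat:Kaiserslautern", "rdfs:label", '"Kaiserslautern"')],
    "ex:unrelated": [("ex:unrelated", "rdfs:label", '"never reached"')],
}


def dereference(uri):
    """Hypothetical lookup; a real client would issue an HTTP GET asking for RDF."""
    return FAKE_WEB.get(uri, [])


def traverse(seeds, max_lookups=50):
    """Collect triples by following URIs found in already retrieved triples."""
    seen, queue, collected = set(), deque(seeds), set()
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in dereference(uri):
            collected.add(triple)
            # Subjects and non-literal objects are candidate links to follow next.
            for term in (triple[0], triple[2]):
                if not term.startswith('"') and term not in seen:
                    queue.append(term)
    return collected


COLLECTED = traverse(["ex:some_university"])
```

Starting from one seed, the crawl crosses the `owl:sameAs` link into a second dataset, so a single query evaluated over `COLLECTED` can combine facts that live on different servers.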
  • 29. Using RDF and Linked Data for Information Extraction [Diagram labels: User, Query, Linked Data, Text, Extraction Pipeline, Result Graph; arrow labels: asks question, about, to, answers]
  • 30. Using RDF and Linked Data for Information Extraction What data do we have? Example RDF data: <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> rdf:type foaf:Document ; dc:creator dblp_author:Markus_Ebbecke ; dc:title "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" . Classes: foaf:Document, foaf:Person. Instances: .../SchulzEGAAD09, .../Markus_Ebbecke. Datatype Properties: dc:title, foaf:name, foaf:firstName, foaf:surName. Object Properties: dc:creator, foaf:knows. Literals: "Markus", "Ebbecke", "Seizing the Treasure: Transferring Knowledge in Invoice Analysis".
  • 31. SCOOBIE Domain Adaption [Diagram labels: Structured Data, Text Corpus Data, Patterns and Gazetteers Data, Vocabulary Data, Instance Data; phases: Data Preprocessing & Learning (offline), Information Extraction (online)]
  • 32. SCOOBIE Eco System [Diagram labels: Index, Domain Knowledge, Models, Text Corpus, Training Corpus, Session Data, Instances, Ontology, Models, Patterns + Gazetteers; tasks: Preprocess, Train, Extract; API]
  • 33. SCOOBIE OBIE Pipeline Normalization (Text Extraction, Language Detection); Segmentation (Tokenization, Sentence Extraction, POS-Tagging); Symbolization (Named Entity Recognition, Structured Entity Recognition, Noun Phrase Chunking, Symbol Recognition); Instantiation (Instance Recognition, Instance Disambiguation, Chunk Classification); Contextualization (Fact Extraction, Fact Selection); Population (Query Answering)
  • 34. Used Machine Learning Models Semi-Supervised Learning: CRF-based Noun Phrase Chunker. Supervised Learning: Gazetteer matching statistics (Named Entity Recognition); Regex matching statistics (Structured Entity Recognition). Unsupervised or Instance-based Learning: TF/IDF-based instance re-ranking (Instance Disambiguation); K-Nearest-Neighbor chunk classifier (Chunk Classification); Spreading Activation-based fact ranking (Fact Selection).
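As one illustration of the models listed above, TF/IDF-based re-ranking of candidate instances might look roughly like the following Python sketch. The slides do not show SCOOBIE's implementation, so everything here is an assumption: the `rerank` helper, the smoothed IDF formula, the cosine scoring, and the invented `ex:` candidates with their keyword lists.

```python
import math
from collections import Counter


def rerank(text_tokens, candidates):
    """Order candidate instances by TF/IDF cosine similarity to the surrounding text.

    `candidates` maps an instance name to the tokens that describe it.
    """
    n = len(candidates)
    # Document frequency: in how many candidate descriptions does each token occur?
    df = Counter(tok for toks in candidates.values() for tok in set(toks))
    idf = {tok: math.log(n / count) + 1.0 for tok, count in df.items()}

    def vector(tokens):
        # Tokens unknown to every candidate carry no weight.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vector(text_tokens)
    return sorted(candidates,
                  key=lambda name: cosine(query, vector(candidates[name])),
                  reverse=True)


# Two invented instances that share the label "Paris".
CANDIDATES = {
    "ex:Paris_France": ["city", "france", "capital", "seine"],
    "ex:Paris_Texas": ["city", "texas", "usa"],
}
RANKED = rerank(["flight", "to", "paris", "capital", "of", "france"], CANDIDATES)
```

The token "city" occurs in both descriptions and therefore gets the lowest weight, while "capital" and "france" are distinctive and pull the first candidate to the top.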
  • 35. Used Machine Learning: Conditional Random Field CRFs are sequence taggers. Train it with: Bill CAPITALIZED noun; slept LOWERCASE non-noun; here LOWERCASE non-noun. Test it with: He CAPITALIZED; visited LOWERCASE; London CAPITALIZED. CRF results: noun; non-noun; non-noun. MALLET - MAchine Learning for LanguagE Toolkit http://mallet.cs.umass.edu/
  • 36. Bringing Linked Data to Text Annotate plain text or HTML with RDF data. I'm working at DFKI. RDFa offers an HTML extension: I'm working at <span about="dbpedia:DFKI" property="rdfs:label">DFKI</span> Now let's generate RDFa automatically ...
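A very small Python sketch of generating such RDFa automatically: look up known labels in the text and wrap each hit in a `<span>` carrying `about` and `property` attributes, as on the slide. The `annotate` helper and the one-entry gazetteer are assumptions for illustration; a real pipeline would disambiguate instances rather than rely on exact string matches.

```python
import html
import re


def annotate(text, gazetteer, prop="rdfs:label"):
    """Wrap every gazetteer label found in `text` in an RDFa span.

    `gazetteer` maps a surface label to the URI (or CURIE) of the instance it names.
    """
    if not gazetteer:
        return html.escape(text, quote=False)
    # Try longer labels first so that they win over labels they contain.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b")
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()], quote=False))
        label = m.group(1)
        out.append('<span about="%s" property="%s">%s</span>' % (
            html.escape(gazetteer[label], quote=True),
            html.escape(prop, quote=True),
            html.escape(label, quote=False),
        ))
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


ANNOTATED = annotate("I'm working at DFKI.", {"DFKI": "dbpedia:DFKI"})
```

Escaping the plain-text parts keeps the output valid HTML even when the input contains characters such as `<` or `&`.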
  • 37. Do you remember? [Diagram labels: Data, Text, annotated text, User-defined Filter, Extraction Pipeline, Extraction Results; arrow labels: annotate, populate, enrich]
  • 38. RDF Epiphany Epiphany takes the original webpage …
  • 39. RDF Epiphany Epiphany takes the original webpage … and SCOOBIE initialized with an RDF data set …
  • 40. RDF Epiphany Epiphany takes the original webpage … and SCOOBIE initialized with an RDF data set … It extracts RDF information from text and annotates it as RDFa …
  • 41. RDF Epiphany Epiphany takes the original webpage … and SCOOBIE initialized with an RDF Linked Data set … It extracts RDF information from text and annotates it as RDFa … Clicking on RDFa annotations opens further information from the Linked Data set.
  • 42. RDF Epiphany At a glance ● Epiphany is a free web service. ● Epiphany uses SCOOBIE. ● Epiphany can be initialized with any RDF Linked Data set. ● Epiphany generates an RDF document about a web page. ● Epiphany annotates RDF as RDFa in the web page. http://projects.dfki.uni-kl.de/epiphany/
  • 43. Summary [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, SQUIN, Text, annotated text, User-defined Filter, Extraction Pipeline, Extraction Results; arrow labels: annotate, populate, enrich]
  • 44. Outlook [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, SQUIN, E-Mail, annotated E-Mail, User-defined Filter, Extraction Pipeline, Extraction Results; arrow labels: annotate, populate, enrich]
  • 45. Thank you! scoobie sparql rdfa D2R server rdf squin epiphany Linked Data OBIE