SlideShare a Scribd company logo
1 of 23
Download to read offline
FactForge 
         FactForge
Data Service and the Value of 
    Inferred Knowledge
            Mariana Damova, PhD


         European Open Data Forum
                 June 2012                             
Ontotext
   – Top‐5 provider of core Semantic Technology
   – Established in year 2000; offices in Bulgaria, UK, USA
   – Active both in research and commercial projects
                                               p j

• 360° semantic technology – unique portfolio:
   – Semantic Databases: high‐performance RDF DBMS, scalable reasoning
   – Semantic Search: text‐mining (IE), metadata generation, Information Retrieval (IR)
   – Web Mining: focused crawling, screen scraping, data fusion 
   – Linked Data Management and Data Integration

   Good recognition in the SemTech community
   – Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at 
     Ontotext pages are ranked #1 for  semantic annotation and semantic repository at
     GYM, #3 for “linked data management” at Google

   Several joint ventures and subsidiaries
   – Innovantage: leading online recruitment intelligence provider in UK
Ontotext Clients (selected)

          British Broadcasting Corporation (BBC)
                – Run its World Cup 2010 sites on top of OWLIM
                – Since Mar’12 BBC Sports and 2012 Olympics sections are driven
                  Since Mar 12 BBC Sports and 2012 Olympics sections are driven 
                  by OWLIM and a Concept Extraction service developed by Ontotext

          Press Association (UK)
                – Analysis of Sports news
                  Analysis of Sports news
                – Concept extraction
                – Linked data generation

          Top‐3 USA media (not allowed to name)
          Top‐3 USA media (not allowed to name)
          The National Archives (UK) contracted Ontotext to implement 
          semantic KB and semantic search for the Government Web Archive

          British Museum (UK) Ontotext leads the development of Phase 3 of 
          ResearchSpace project on collaborative research in cultural heritage; 
          British Museum’s public SPARQL end‐point is powered by OWLIM

          de Bibliothek (Holland) 
          d Bibli h k (H ll d) aggregation of data from 150 library databases
Linked Open Data is maturing
Linked Open Data is maturing
  LOD cloud grows by billions of triples yearly
Technologies and guidelines about 
T h l i        d id li       b t
  how to produce linked data fast 
  how to assure their quality 
  h              h       l
  how to provide vertical oriented data services
                                               LOD2, LATC, baseKB 



                         European Data Forum           June2012      #4
This talk is about 
This talk is about
        reasoning 
                and 
                  d
                  coping with diversity of the data on the web of data 




                              European Data Forum          June 2012   #5
Outline

• FactForge (beta)
• Reference Layer
• Access Modes
• Querying 
   – Airports around London
   – US city – a subject of a Novel
           y        j
   – US city – contactInformation

• Challenges
• Conclusion



                               European Data Forum
FactForge (beta)




the largest body of heterogeneous general knowledge on which inference has been performed
       g       y          g       g              g                              p

– powered by OWLIM 5.0                                           – supporting SPARQL 1.1
                                   European Data Forum
Datasets

                         REASON‐ABLE VIEW
                           of LOD datasets
                  Number of explicit statements: 1,796,673,630
                      b    f    l
                             Implicit statements: 1,3
                    Retrievable statements:  14,928,925,039


                                                      CIA FactBook
   DBpedia 3.7
                     Freebase
                                     NY Times
                                                                     Lexvo



    Wordnet 3.0          Geonames                                Lingvoj
                                           MusicBrainz 




materialization is performed with respect to the semantics of OWL-Horst optimized

                                 European Data Forum
Reference Layer




                                                                   PROTON – light weight upper level ontology
                                                                             ~500 classes, ~150 properties
                                                                                          ,      p p
                                                                   http://www.ontotext.com/proton-ontology

Linking at schema level:
(1) using rdfs:subClassOf and rdfs:subPropertyOf statements;
(2) using OWL expressions where there is a difference in the conceptualization
(3) using inference rules if additional individuals are necessary in the repository to support the mapping

                                               European Data Forum                          June 2012        #9
Access modes

RDF Search - retrieve ranked list of URIs related to literals, which contain specific keywords




                                      European Data Forum                    June 2012      #10
Access modes (condt)

 Exploration ‐ traversing the data, one resource at a time 
Access modes (condt)

 Exploration ‐ traversing the data, one resource at a time,  
                inspecting inferred knowledge 



- locatedIn – Denmark, Northern Europe
- Geonames types/FearureCodes P.PPL
              yp /
- parentFeature – Denmeark, Europe
…




                                         European Data Forum    June 2012   #12
Access modes (condt)
   Exploration - traversing the data, one resource at a time,
                 inspecting inferred knowledge




- locatedIn - Europe
- subRegionOf - Europe
        g             p
- hasContactInfo –
       website via Freebase
- containsLocation
  …




                                     European Data Forum        June 2012   #13
Access modes (condt)

SPARQL endpoint




                   European Data Forum   June 2012   #14
Access modes (condt)

RelFinder




                       European Data Forum   June 2012   #15
Querying
Using LOD concepts




 SELECT * WHERE {
  ?Person dbp‐ont:birthPlace ?BirthPlace ;
       rdf:type dbp‐ont:Politician ;
 ?BirthPlace geo‐ont:parentFeature dbpedia:Germany .
 ?BirthPlace geo ont:parentFeature dbpedia:Germany
 } 




Using the intermediary layer
    g                y y




 SELECT * WHERE {
   ?Person prot:birthPlace ?BirthPlace ;
        rdf:type prot:Politicianr ;
   ?BirthPlace prot:subRegionOf dbpedia:Germany .
 }




                                                       European Data Forum
                                                                             June 2012
Find Airports near London

                                         Standard LOD vs. PROTON query
                                         13 vs. 20 results
                                         DBpedia vs DBpedia and Geonames
                                                  vs.




                   European Data Forum                June 2012    #17
Find airports near London ‐ Results comparison




 Using Geospatial index of OWLIM
     g     p



                                   European Data Forum   June 2012   #18
City – a subject of a science fiction author




                      European Data Forum      June 2012   #19
OWLIM 5.0 and SPARQL 1.1

Exemplary queries :
GROUP BY, min
GROUP BY, min
   — Minimal and maximal population counts of European countries
Federated Query between FactForge and LinkedLifeData
    —D Drugs that cure the disease from which died Alexandre Graham Bell
             th t      th di       f     hi h di d Al    d G h      B ll
Literal index over dates
     – World governors in office between 1980 and 2005
Literal index over digits
     ― European countries with population above 20 MLN
Geospatial index
Geospatial index
    — Show the distance from London of airports located at most 50 miles away from it




                                European Data Forum                 June 2012     #20
Challenges and usage

• Clean data 
   – Clean up input data

• At model level 
   – Contradiction detection
   – Consistency checking
                y       g

• Curation and upgrading methodology



         FactForge has been used as data layer infrastructure in FP7 projects, like RENDER
         FactForge has been used in tasks of
                   linked data generation from unstructured data,
                   metadata enrichment of structured data
                   providing linkages to the entire LOD cloud
                                                   for example The National Archive

                                 European Data Forum                   June 2012      #21
Acknowledgements

  Partial funding




Colleagues
Atanas Kiryakov, CEO of Ontotext
Zdravko T h
Zd k Tashev, Ontotext
                  O
Ivan Peikov, Ontotext
Rouslan Velkov, Ontotext
Kiril Simov, Ontotext
Barry Bi h
B       Bishop, Ontotext
                O
Barry Norton, Ontotext
Marin Dimitrov, Ontotext
Alex Simov, Ontotext
Jordan Di h
J d Dichev, Ontotext
                 O t t t                                k
                                                     Links
Konstantin Penchev, Ontotext                         http://ff-dev.ontotext.com
                                                     http://www.ontotext.com/owlim
                                                     http://www.ontotext.com/factforge
                                                     Email:
                                                     E il
                                                     info@factforge.net
                                   European Data Forum                 June 2012     #22
Thank you for your attention!




mariana.damova@ontotext.com

More Related Content

What's hot

Matching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesMatching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sources
IJwest
 

What's hot (15)

Linked Open Data: A simple how-to
Linked Open Data: A simple how-toLinked Open Data: A simple how-to
Linked Open Data: A simple how-to
 
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
 
Redundancy analysis on linked data #cold2014 #ISWC2014
Redundancy analysis on linked data #cold2014 #ISWC2014Redundancy analysis on linked data #cold2014 #ISWC2014
Redundancy analysis on linked data #cold2014 #ISWC2014
 
Linked Data and Sevices
Linked Data and SevicesLinked Data and Sevices
Linked Data and Sevices
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
 
Matching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesMatching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sources
 
A Unified Approach for Representing Metametadata
A Unified Approach for Representing MetametadataA Unified Approach for Representing Metametadata
A Unified Approach for Representing Metametadata
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
RDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approachRDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approach
 
Web Data Management with RDF
Web Data Management with RDFWeb Data Management with RDF
Web Data Management with RDF
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project Report
 
Verifying Integrity Constraints of a RDF-based WordNet
Verifying Integrity Constraints of a RDF-based WordNetVerifying Integrity Constraints of a RDF-based WordNet
Verifying Integrity Constraints of a RDF-based WordNet
 
Topical_Facets
Topical_FacetsTopical_Facets
Topical_Facets
 
DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World." DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World."
 

Viewers also liked

EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
European Data Forum
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
European Data Forum
 
EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere
EDF2012   Kostas Tzouma - Linking and analyzing bigdata - StratosphereEDF2012   Kostas Tzouma - Linking and analyzing bigdata - Stratosphere
EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere
European Data Forum
 
EDF2012 Andrew Farrow - (Copy)right information in the digital age
EDF2012   Andrew Farrow - (Copy)right information in the digital ageEDF2012   Andrew Farrow - (Copy)right information in the digital age
EDF2012 Andrew Farrow - (Copy)right information in the digital age
European Data Forum
 
EDF2012 Simon Redfern - Open Bank Project
EDF2012   Simon Redfern - Open Bank ProjectEDF2012   Simon Redfern - Open Bank Project
EDF2012 Simon Redfern - Open Bank Project
European Data Forum
 
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
EDF2012   Chris Taggart - How the biggest Open Database of Companies was builtEDF2012   Chris Taggart - How the biggest Open Database of Companies was built
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
European Data Forum
 
EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012  Simon Riggs - Open Data, Open Database: PostgreSQLEDF2012  Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
European Data Forum
 
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012   Florian Bauer - Using LOD to share clean energy data and knowledgeEDF2012   Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
European Data Forum
 
EDF2012 Phil Archer - Enabling Open Data Interoperability
EDF2012   Phil Archer - Enabling Open Data InteroperabilityEDF2012   Phil Archer - Enabling Open Data Interoperability
EDF2012 Phil Archer - Enabling Open Data Interoperability
European Data Forum
 

Viewers also liked (9)

EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
 
EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere
EDF2012   Kostas Tzouma - Linking and analyzing bigdata - StratosphereEDF2012   Kostas Tzouma - Linking and analyzing bigdata - Stratosphere
EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere
 
EDF2012 Andrew Farrow - (Copy)right information in the digital age
EDF2012   Andrew Farrow - (Copy)right information in the digital ageEDF2012   Andrew Farrow - (Copy)right information in the digital age
EDF2012 Andrew Farrow - (Copy)right information in the digital age
 
EDF2012 Simon Redfern - Open Bank Project
EDF2012   Simon Redfern - Open Bank ProjectEDF2012   Simon Redfern - Open Bank Project
EDF2012 Simon Redfern - Open Bank Project
 
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
EDF2012   Chris Taggart - How the biggest Open Database of Companies was builtEDF2012   Chris Taggart - How the biggest Open Database of Companies was built
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
 
EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012  Simon Riggs - Open Data, Open Database: PostgreSQLEDF2012  Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
 
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012   Florian Bauer - Using LOD to share clean energy data and knowledgeEDF2012   Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
 
EDF2012 Phil Archer - Enabling Open Data Interoperability
EDF2012   Phil Archer - Enabling Open Data InteroperabilityEDF2012   Phil Archer - Enabling Open Data Interoperability
EDF2012 Phil Archer - Enabling Open Data Interoperability
 

Similar to EDF2012 Mariana Damova - Factforge

Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
emmanuel_jamin
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
manujam
 
Approximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of DataApproximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of Data
Kathrin Dentler
 

Similar to EDF2012 Mariana Damova - Factforge (20)

Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
20140521 sem-tech-biz-guest-lecture
20140521 sem-tech-biz-guest-lecture20140521 sem-tech-biz-guest-lecture
20140521 sem-tech-biz-guest-lecture
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
 
Linked Open Data (LOD) part 3
Linked Open Data (LOD)  part 3Linked Open Data (LOD)  part 3
Linked Open Data (LOD) part 3
 
OpenAIRE schirrwagen
OpenAIRE schirrwagenOpenAIRE schirrwagen
OpenAIRE schirrwagen
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
Linked Data Management
Linked Data ManagementLinked Data Management
Linked Data Management
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
IC-SDV 2018: Martin Kracker (EPO) Linked Open EP data – a new Product from th...
IC-SDV 2018: Martin Kracker (EPO) Linked Open EP data – a new Product from th...IC-SDV 2018: Martin Kracker (EPO) Linked Open EP data – a new Product from th...
IC-SDV 2018: Martin Kracker (EPO) Linked Open EP data – a new Product from th...
 
Web Data Management in RDF Age
Web Data Management in RDF AgeWeb Data Management in RDF Age
Web Data Management in RDF Age
 
LOD2: Guest presentation: French datalift project
LOD2: Guest presentation: French datalift projectLOD2: Guest presentation: French datalift project
LOD2: Guest presentation: French datalift project
 
Datalift lod2-paris-24032011
Datalift lod2-paris-24032011Datalift lod2-paris-24032011
Datalift lod2-paris-24032011
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data Cloud
 
Approximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of DataApproximation and Self-Organisation on the Web of Data
Approximation and Self-Organisation on the Web of Data
 
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
 
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORELOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
 
Museum LOD (Ontotext, 1 May 2019, Doha, Qatar)
Museum LOD (Ontotext, 1 May 2019, Doha, Qatar)Museum LOD (Ontotext, 1 May 2019, Doha, Qatar)
Museum LOD (Ontotext, 1 May 2019, Doha, Qatar)
 
LODAC Museum -- Connecting Museums with LOD --
LODAC Museum -- Connecting Museums with LOD --LODAC Museum -- Connecting Museums with LOD --
LODAC Museum -- Connecting Museums with LOD --
 
Linked Open Data and Ontotext Projects
Linked Open Data and Ontotext ProjectsLinked Open Data and Ontotext Projects
Linked Open Data and Ontotext Projects
 

More from European Data Forum

More from European Data Forum (20)

EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
 
Barbato leit ict 15-16-17
Barbato leit ict 15-16-17Barbato leit ict 15-16-17
Barbato leit ict 15-16-17
 
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
 
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
 
EDF2014: BIG - NESSI Networking Session: Intro Presentation
EDF2014: BIG - NESSI Networking Session: Intro PresentationEDF2014: BIG - NESSI Networking Session: Intro Presentation
EDF2014: BIG - NESSI Networking Session: Intro Presentation
 
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
 
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
 
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
 
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
 
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
 
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
 
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
 
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
 
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
 
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
 
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
 
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
 
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
 
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
 
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

EDF2012 Mariana Damova - Factforge

  • 1. FactForge  FactForge Data Service and the Value of  Inferred Knowledge Mariana Damova, PhD European Open Data Forum June 2012                             
  • 2. Ontotext – Top‐5 provider of core Semantic Technology – Established in year 2000; offices in Bulgaria, UK, USA – Active both in research and commercial projects p j • 360° semantic technology – unique portfolio: – Semantic Databases: high‐performance RDF DBMS, scalable reasoning – Semantic Search: text‐mining (IE), metadata generation, Information Retrieval (IR) – Web Mining: focused crawling, screen scraping, data fusion  – Linked Data Management and Data Integration Good recognition in the SemTech community – Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at  Ontotext pages are ranked #1 for  semantic annotation and semantic repository at GYM, #3 for “linked data management” at Google Several joint ventures and subsidiaries – Innovantage: leading online recruitment intelligence provider in UK
  • 3. Ontotext Clients (selected) British Broadcasting Corporation (BBC) – Run its World Cup 2010 sites on top of OWLIM – Since Mar’12 BBC Sports and 2012 Olympics sections are driven Since Mar 12 BBC Sports and 2012 Olympics sections are driven  by OWLIM and a Concept Extraction service developed by Ontotext Press Association (UK) – Analysis of Sports news Analysis of Sports news – Concept extraction – Linked data generation Top‐3 USA media (not allowed to name) Top‐3 USA media (not allowed to name) The National Archives (UK) contracted Ontotext to implement  semantic KB and semantic search for the Government Web Archive British Museum (UK) Ontotext leads the development of Phase 3 of  ResearchSpace project on collaborative research in cultural heritage;  British Museum’s public SPARQL end‐point is powered by OWLIM de Bibliothek (Holland)  d Bibli h k (H ll d) aggregation of data from 150 library databases
  • 4. Linked Open Data is maturing Linked Open Data is maturing LOD cloud grows by billions of triples yearly Technologies and guidelines about  T h l i d id li b t how to produce linked data fast  how to assure their quality  h h l how to provide vertical oriented data services LOD2, LATC, baseKB  European Data Forum June2012 #4
  • 5. This talk is about  This talk is about reasoning  and  d coping with diversity of the data on the web of data  European Data Forum June 2012 #5
  • 6. Outline • FactForge (beta) • Reference Layer • Access Modes • Querying  – Airports around London – US city – a subject of a Novel y j – US city – contactInformation • Challenges • Conclusion European Data Forum
  • 7. FactForge (beta) the largest body of heterogeneous general knowledge on which inference has been performed g y g g g p – powered by OWLIM 5.0 – supporting SPARQL 1.1 European Data Forum
  • 8. Datasets REASON‐ABLE VIEW of LOD datasets Number of explicit statements: 1,796,673,630 b f l Implicit statements: 1,3 Retrievable statements:  14,928,925,039 CIA FactBook DBpedia 3.7 Freebase NY Times Lexvo Wordnet 3.0 Geonames Lingvoj MusicBrainz  materialization is performed with respect to the semantics of OWL-Horst optimized European Data Forum
  • 9. Reference Layer PROTON – light weight upper level ontology ~500 classes, ~150 properties , p p http://www.ontotext.com/proton-ontology Linking at schema level: (1) using rdfs:subClassOf and rdfs:subPropertyOf statements; (2) using OWL expressions where there is a difference in the conceptualization (3) using inference rules if additional individuals are necessary in the repository to support the mapping European Data Forum June 2012 #9
  • 10. Access modes RDF Search - retrieve ranked list of URIs related to literals, which contain specific keywords European Data Forum June 2012 #10
  • 11. Access modes (condt) Exploration ‐ traversing the data, one resource at a time 
  • 12. Access modes (condt) Exploration ‐ traversing the data, one resource at a time,   inspecting inferred knowledge  - locatedIn – Denmark, Northern Europe - Geonames types/FearureCodes P.PPL yp / - parentFeature – Denmeark, Europe … European Data Forum June 2012 #12
  • 13. Access modes (condt) Exploration - traversing the data, one resource at a time, inspecting inferred knowledge - locatedIn - Europe - subRegionOf - Europe g p - hasContactInfo – website via Freebase - containsLocation … European Data Forum June 2012 #13
  • 14. Access modes (condt) SPARQL endpoint European Data Forum June 2012 #14
  • 15. Access modes (condt) RelFinder European Data Forum June 2012 #15
  • 16. Querying Using LOD concepts SELECT * WHERE { ?Person dbp‐ont:birthPlace ?BirthPlace ; rdf:type dbp‐ont:Politician ; ?BirthPlace geo‐ont:parentFeature dbpedia:Germany . ?BirthPlace geo ont:parentFeature dbpedia:Germany }  Using the intermediary layer g y y SELECT * WHERE { ?Person prot:birthPlace ?BirthPlace ; rdf:type prot:Politicianr ; ?BirthPlace prot:subRegionOf dbpedia:Germany . } European Data Forum June 2012
  • 17. Find Airports near London Standard LOD vs. PROTON query 13 vs. 20 results DBpedia vs DBpedia and Geonames vs. European Data Forum June 2012 #17
  • 18. Find airports near London ‐ Results comparison Using Geospatial index of OWLIM g p European Data Forum June 2012 #18
  • 19. City – a subject of a science fiction author European Data Forum June 2012 #19
  • 20. OWLIM 5.0 and SPARQL 1.1 Exemplary queries : GROUP BY, min GROUP BY, min — Minimal and maximal population counts of European countries Federated Query between FactForge and LinkedLifeData —D Drugs that cure the disease from which died Alexandre Graham Bell th t th di f hi h di d Al d G h B ll Literal index over dates – World governors in office between 1980 and 2005 Literal index over digits ― European countries with population above 20 MLN Geospatial index Geospatial index — Show the distance from London of airports located at most 50 miles away from it European Data Forum June 2012 #20
  • 21. Challenges and usage • Clean data  – Clean up input data • At model level  – Contradiction detection – Consistency checking y g • Curation and upgrading methodology FactForge has been used as data layer infrastructure in FP7 projects, like RENDER FactForge has been used in tasks of linked data generation from unstructured data, metadata enrichment of structured data providing linkages to the entire LOD cloud for example The National Archive European Data Forum June 2012 #21
  • 22. Acknowledgements Partial funding Colleagues Atanas Kiryakov, CEO of Ontotext Zdravko T h Zd k Tashev, Ontotext O Ivan Peikov, Ontotext Rouslan Velkov, Ontotext Kiril Simov, Ontotext Barry Bi h B Bishop, Ontotext O Barry Norton, Ontotext Marin Dimitrov, Ontotext Alex Simov, Ontotext Jordan Di h J d Dichev, Ontotext O t t t k Links Konstantin Penchev, Ontotext http://ff-dev.ontotext.com http://www.ontotext.com/owlim http://www.ontotext.com/factforge Email: E il info@factforge.net European Data Forum June 2012 #22
  • 23. Thank you for your attention! mariana.damova@ontotext.com