Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

The Semantic Data Web, Sören Auer, University of Leipzig

Nächste SlideShare
Linked data life cycles
Linked data life cycles
Wird geladen in …3

Hier ansehen

1 von 59 Anzeige

Weitere Verwandte Inhalte

Diashows für Sie (20)


Ähnlich wie The Semantic Data Web, Sören Auer, University of Leipzig (20)

Weitere von LOD2 Creating Knowledge out of Interlinked Data (20)


Aktuellste (20)

The Semantic Data Web, Sören Auer, University of Leipzig

  1. 1. The Semantic Data WebSören Auer<br />
  2. 2. Problem: Try to search for these things on the current Web:<br /><ul><li>Apartments near German-Russian bilingual childcare in Berlin.
  3. 3. ERP service providers with offices in Vienna and London.
  4. 4. Researchers working on multimedia topics in Eastern Europe.</li></ul>Informationis available on the Web, but opaque to current search.<br />Why Semantic Data Web?<br />Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate such structured information from different sources:<br />Search engine<br />HTML<br />HTML<br />RDF<br />RDF<br />Web server<br />Web server<br />Web server<br />Web server<br />berlin.de<br />Has everything about childcare in Berlin.<br />Immobilienscout.de<br />Knows all about real estate offers in Germany<br />DB<br />DB<br />
  5. 5. Vom Web derDokumentezumSemantic Data Web<br />Semantic Web(Vision 1998, starting ???)<br /><ul><li>Reasoning
  6. 6. Logic, Rules
  7. 7. Trust</li></ul>Data Web (since 2006)<br /><ul><li>URI de-referencability
  8. 8. Web Data integration
  9. 9. RDF serializations</li></ul>Social Web (since 2003)<br /><ul><li>Folksonomies/Tagging
  10. 10. Reputation, sharing
  11. 11. Groups, relationships</li></ul>Web (since 1992)<br /><ul><li>HTTP
  12. 12. HTML/CSS/JavaScript</li></li></ul><li>Web 1.0 Web 2.0 Web 3.0<br />Many Web sitescontaining unstructured,textual content<br />Few large Web sites are specialized onspecific content types<br />Many Web sites containing & semantically syndicating arbitrarily structured content<br />Video<br />Pictures<br />Encyclopedicarticles<br />+<br />+<br />
  13. 13. The Long Tail of Information Domains<br />Pictures<br />The Long Tail by Chris Anderson (Wired, Oct. ´04) adopted to information domains<br />Recipes<br />News<br />Video<br />Calendar<br />Popularity<br />SemWeb supported structured content<br />Requirements-Engineering<br />Talentmanagement<br />Special interest<br />communities<br />Itinerary ofKing George<br />Gene<br />sequences<br />…<br />…<br />…<br />…<br />Currently supportedstructuredcontent types<br />Not or insufficiently supported content types<br />
  14. 14. 1. Uses RDF Data Model<br />Linked Data Web Technology<br />14.7.2011<br />takesPlaceAt<br />organizes<br />Uni Mainz<br />Semedia2011<br />Mainz<br />takesPlaceIn<br />2. Is serialised in triples:<br />UniMainzorganizes SeMedia2011<br />SeMedia2011 takesPlaceAt “20110714”^^xsd:date<br />SeMedia2011 takesPlaceAtMainz<br />3. Uses Content-negotiation<br />
  15. 15. The emerging Web of Data<br />interlink<br />2009<br />2007<br />SILK<br />DXX Engine<br />fuse<br />create <br />2008<br />poolparty<br />SemMF<br />OntoWiki<br />2008<br />Sigma<br />WiQA<br />2008<br />2008<br />ORE<br />repair<br />classify<br />Virtouso<br />2009<br />DL-Learner<br />MonetDB<br />Sindice<br />enrich<br />
  16. 16. What works now? What has to be done?<br /><ul><li>Web - a global, distributed platform for data, information and knowledge integration
  17. 17. exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF</li></ul>Achievements<br />Extension of the Web with a data commons (25B facts<br />vibrant, global RTD community<br />Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly) <br />Emerging governmental adoption in sight<br />Establishing Linked Data as a deployment path for the Semantic Web.<br /><ul><li>Challenges</li></ul>Coherence: Relatively few, expensively maintained links<br />Quality: partly low quality data and inconsistencies<br />Performance: Still substantial penalties comparedto relational <br />Data consumption: large-scale processing, schema mapping and data fusion still in its infancy<br />Usability: Establishing direct end-user tools and network effect<br />April 2008<br />July 2007<br />September 2008<br />July 2009<br />
  18. 18. Linked DataLifecycle<br />
  19. 19. Extraction<br />
  20. 20. From unstructured sources<br /><ul><li>NLP, text mining, annotation</li></ul>From semi-structured sources<br /><ul><li>DBpedia, LinkedGeoData, SCOVO/DataCube</li></ul>From structured sources<br /><ul><li>RDB2RDF</li></ul>Extraction<br />
  21. 21. extract structured information from Wikipedia & make this information available on the Web as LOD:<br /><ul><li>ask sophisticated queries against Wikipedia (e.g. universities in brandenburg, mayors of elevated towns, soccer players),
  22. 22. link other data sets on the Web to Wikipedia data
  23. 23. Represents a community consensus</li></ul>Recently launched DBpedia Live transforms Wikipedia into a structred knowledge base<br /> Transforming Wikipedia into an Knowledge Base<br />
  24. 24. Structure in Wikipedia<br />Title<br />Abstract<br />Infoboxes<br />Geo-coordinates<br />Categories<br />Images<br />Links<br />other language versions<br />other Wikipedia pages<br />To the Web<br />Redirects<br />Disambiguations<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />13<br />
  25. 25. Infobox templates<br />Wikitext-Syntax<br />{{Infobox Korean settlement<br />| title = Busan Metropolitan City<br />| img = Busan.jpg<br />| imgcaption = A view of the [[Geumjeong]] district in Busan<br />| hangul = 부산 광역시<br />...<br />| area_km2 = 763.46<br />| pop = 3635389<br />| popyear = 2006<br />| mayor = Hur Nam-sik<br />| divs = 15 wards (Gu), 1 county (Gun)<br />| region = [[Yeongnam]]<br />| dialect = [[Gyeongsang]]<br />}}<br />http://dbpedia.org/resource/Busan<br />dbp:Busan dbpp:title ″Busan Metropolitan City″<br />dbp:Busan dbpp:hangul ″부산 광역시″@Hang<br />dbp:Busan dbpp:area_km2 ″763.46“^xsd:float<br />dbp:Busan dbpp:pop ″3635389“^xsd:int<br />dbp:Busan dbpp:region dbp:Yeongnam<br />dbp:Busan dbpp:dialect dbp:Gyeongsang<br />...<br />RDF representation<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />14<br />
  26. 26. A vast multi-lingual, multi-domain knowledge base<br />DBpedia extraction results in:<br />descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases<br />labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories<br />altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English edition, 766M from other language editions<br />DBpedia made a substantial impact in science, technology and society<br />DBpedia became the central interlinking hub on the Data Web<br />Scientific publications attracted more than 500 citations<br />More than 15.000 monthly visits on DBpedia.org,numerous press articles, blog posts …<br />Ecosystem of commercial and community applications:ThomsonReuters, BBC, Neofonie, Openlink, Faviki …<br />Motivates further research need wrt. scalability, quality<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />15<br />
  27. 27. Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)<br />No agreement on a formalsemantics of RDF2RDFmapping<br /><ul><li>LOD readiness, SPARQL-SQL translation</li></ul>W3C RDB2RDF WG<br />Extraction Relational Data<br />
  28. 28. From unstructured sources<br /><ul><li>Deploy existing NLP approaches (OpenCalais, Ontos API)
  29. 29. Develop standardized, LOD enabled interfaces between NLP tools (NLP2RDF)</li></ul>From semi-structured sources<br /><ul><li>Efficient bi-directional synchronization</li></ul>From structured sources<br /><ul><li>Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)</li></ul>Orthogonal challenges<br /><ul><li>Using LOD as background knowledge
  30. 30. Provenance</li></ul>Extraction Challenges<br />
  31. 31. Storage and Querying<br />
  32. 32. Still by a factor 5-50 slower than relational data management (BSBM)<br />Performance increases steadily<br />Comprehensive, well-supported open-soure and commercial implementations are available:<br /><ul><li>OpenLink’s Virtuoso (os+commercial)
  33. 33. Big OWLIM (commercial), Swift OWLIM (os)
  34. 34. 4store (os)
  35. 35. Talis (hosted)
  36. 36. Bigdata (distributed)
  37. 37. Allegrograph (commercial)
  38. 38. Mulgara (os)</li></ul>RDF Data Management<br />
  39. 39. <ul><li>Reduce the performance gap between relational and RDF data management
  40. 40. SPARQL Query extensions
  41. 41. Spatial/semantic/temporal data management
  42. 42. More advanced query result caching
  43. 43. View maintenance / adaptive reorganization based on common access patterns
  44. 44. More realistic benchmarks</li></ul>Storage and Querying Challenges<br />
  45. 45. Authoring<br />
  46. 46. 1. Semantic (Text) Wikis<br /><ul><li>Authoring of semanticallyannotated texts</li></ul>2. Semantic Data Wikis<br /><ul><li>Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)</li></ul>Two Kinds of Semantic Wikis<br />
  47. 47. Versatile domain-independent tool<br />Serves as Linked Data / SPARQL endpoint on the Data Web<br />Open-source project hosted at Google code<br />Not just a Wiki UI, but a whole framework for the development of Semantic Web applications<br />Developed in PHP based on the Zend framework<br />Very active developer and user community<br />More than 500 downloads monthly<br />Large number of use cases<br />Management of multimedia content for a record label in the UK<br />OntoWiki – a semantic data wiki<br />
  48. 48. Dynamische Sichten auf die Wissensbasis<br />OntoWiki<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />24<br />
  49. 49. OntoWiki<br />RDF-Triple auf Resourcen-Detailseite<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />25<br />
  50. 50. OntoWiki<br />Dynamische Vorschläge aus dem Daten Web<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />26<br />
  51. 51. RDFauthor in OntoWiki<br />
  52. 52. 14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />28<br />OntoWiki: Caucasian Spiders<br />
  53. 53. OntoWiki: Supporting Requirements engineering<br />
  54. 54. Semantic Portal with OntoWiki: Vakantieland<br />
  55. 55. Linking<br />© CC-BY-NC-ND by ~Dezz~ (residae on flickr)<br />
  56. 56. Automatic<br />Semi-automatic<br /><ul><li>SILK
  57. 57. LIMES</li></ul>Manual<br /><ul><li>Sindice integration into UIs
  58. 58. Semantic Pingback</li></ul>LOD Linking<br />
  59. 59. update and notification services for LOD <br />Downward compatible with Pingback (blogosphere)<br />http://aksw.org/Projects/SemanticPingBack<br />Creating a network effect aroundLinking Data: Semantic Pingback<br />
  60. 60. Visualizing Pingbacks in OntoWiki<br />
  61. 61. Only 5% of the information on the Data Web is actually linked<br /><ul><li>Make sense of work in the de-duplication/record linkage literature
  62. 62. Consider the open world nature of Linked Data
  63. 63. Use LOD background knowledge
  64. 64. Zero-configuration linking
  65. 65. Explore active learning approaches, which integrate users in a feedback loop
  66. 66. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu)</li></ul>Interlinking Challenges<br />
  67. 67. Enrichment<br />
  68. 68. Linked Data is mainly instance data and !!!<br />ORE (Ontology Repair and Enrichment) tool allows to improve an OWL ontology by fixing inconsistencies & making suggestions for adding further axioms.<br /><ul><li>Ontology Debugging: OWL reasoning to detect inconsistencies and satisfiable classes + detect the most likely sources for the problems. user can create a repair plan, while maintaining full control.
  69. 69. Ontology Enrichment: uses the DL-Learner framework to suggest definitions & super classes for existing classes in the KB. works if instance data is available for harmonising schema and data.</li></ul>http://aksw.org/Projects/ORE<br />Enrichment & Repair<br />
  70. 70. Quality<br />Analysis<br />
  71. 71. Quality on the Data Web is varying a lot<br /><ul><li>Hand crafted or expensively curated knowledge base (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia)</li></ul>Research Challenge<br /><ul><li>Establish measures for assessing the authority, provenance, reliability of Data Web resources</li></ul>Linked Data Quality Analysis<br />
  72. 72. Evolution<br />© CC-BY-SA by alasis on flickr)<br />
  73. 73. <ul><li>unified method, for both data evolution and ontology refactoring.
  74. 74. modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution
  75. 75. allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks
  76. 76. Combined with RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem
  77. 77. patterns can be shared and reused on the Data Web.
  78. 78. declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.</li></ul>EvoPat – Pattern based KB Evolution<br />
  79. 79. Evolution Patterns<br />
  80. 80. 14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />43<br />
  81. 81. Exploration<br />
  82. 82. CatalogusProfessorumLipsiensis<br />
  83. 83. Visual Query Builder<br />
  84. 84. Relationship Finder in CPL<br />
  85. 85. PublicData.eu<br />
  86. 86. http://fintrans.publicdata.eu<br />
  87. 87. http://energy.publicdata.eu <br />
  88. 88. http://scoreboard.lod2.eu<br />
  89. 89. Anwendung BBC Music<br />39 meist besuchte Webseite<br />Integriert verschiedenste Linked Data Quellen (DBpedia, Musicbrainz, Geonames)<br />
  90. 90. LODLifecycle<br />supported byDebian based<br />LOD2 Stack<br />(to be released in September)<br />
  91. 91. <ul><li>Linked Data isrelativelysimple touse
  92. 92. The LOD ecosystemcomprisesextraction, authoring, enrichment, exploration & search
  93. 93. More workhastobedoneto bring variousbitsandpiecestogether
  94. 94. Debian based LOD2 Stackfaciliatesmanagingthe LOD lifecycle
  95. 95. CKAN.net will evolveinto a registrytordatasets, visualizations, apps, wrappers, mashups, ..
  96. 96. Linked Data is a perfecttechnologyforMedia applications</li></ul>Take homemessages<br />
  97. 97. Thanksforyourattention!<br />Sören Auer<br />http://www.informatik.uni-leipzig.de/~auer/ | http://aksw.org | http://lod2.org<br />auer@uni-leipzig.de<br />
  98. 98. AKSW Data Web Building Blocks<br />Vakantieland<br />Building Data Web applications<br />OntoWiki<br />Collaborative creation of explicit knowledge via Semantic Wikis<br />RDB2RDF<br />Mapping relational data to RDF<br />SoftWiki<br />Distributed, stakeholder driven Requirements Engineering<br />DL-Learner<br />Machine Learning for Ontologies<br />DBpedia<br />“Semantification” ofWikipedia<br />OpenResearch.org<br />A semantic Wiki for the sciences<br />ORE<br />Ontology Enrichment & Repair<br />LinkedGeoData<br />“Semantification” of OpenStreetMaps<br />xOperator<br />Combining Instant Messaging with the Data Web<br />LIMES<br />Link Discovery Framework<br />formetricspaces)<br />Triplify<br />“Semantification” of (small) Web Applications<br />CatalogusProfessorum<br />Prosopographical knowledge base<br />RDF Query Subsumption& View Maintenance<br />Scaling triple stores<br />LESS<br />Semantification Syndication<br />…<br />Foundations<br />Marrying databases with RDFand ontologies<br />Applications<br />Bringing the Data Web to end users<br />Tools<br />14.07.2011<br />Sören Auer - The emerging Web of Linked Data<br />56<br />
  99. 99. LOD2 in a Nutshell<br />57<br />Research focus<br /><ul><li>Very large RDF datamanagement
  100. 100. Enrichment &Interlinking
  101. 101. Fusion & InformationQuality
  102. 102. Adaptive UI interfaces</li></ul>Use Cases<br /><ul><li>Media & Publishing
  103. 103. Enterprise Data Webs
  104. 104. Open Gov Data</li></ul>Partners<br />Uni Leipzig, DERI Galway, FU Berlin, Semantic Web Company, OpenLink, Tenforce, Exalead, WoltersKluwer, OKFN<br />
  105. 105. Open Governmental Data –and ideal testbed for Linked Data?<br /><ul><li>European registry & collaboration platform for open governmental data
  106. 106. Outreach & involve original data providers - local, regional, national and European</li></ul>Close cooperation with W3C eGov IG, OKFN’s OpenEUdata, PSI & grassroots efforts<br />CKAN.org | OKFN’s EuOpenData group |ICT2010 Networking Session<br />UIs and Personalization<br />individual mashups of data with other sources<br />Notification/subscription service based on personal prefs<br />Transparency wishlists, upload revisions, derivates<br />create and publish queries, reports and visualizations<br />58<br />
  107. 107. Linked Enterprise Intra Data Webs can fill the gap between Intra-/Extranets and EIS/ERP<br />Facilitates data integration along value-chains within and across enterprises<br />The pragmatic, incremental, vocabulary based Linked Data approach reduces data integration costs significantly<br />The wealth of knowledge available as Linked Open Data can be leveraged as background knowledge for Enterprise applications<br />Linked Enterprise Data<br />

Hinweis der Redaktion

  • Initially, the Web consisted of many Websites containing only unstructured/textual content.The Web 2.0 extended this traditional Web with few extremely large Websites specializing on certain specific content types. Examples are:Wikipedia for encyclopedic articlesDel.icio.us for Web linksFlickr for picturesYouTube for VideosDigg for news…These websites provide specialized searching, querying, sharing, authoring specifically adopted to the content type of their concern.
  • Popular content types such as pictures, movies, calendars, encyclopedic articles, news recipes etc. are already sufficiently well supported on the Web.However, there is a long tail of special-interest content (profiles of expertise, historic data and events, bio-medical knowledge, intra-corporational knowledge etc.) which has very low or no current support (for filtering, aggregation, searching, querying, collaborative editing) on the Web.
  • http://www.flickr.com/photos/residae/2560241604/#/
  • http://www.flickr.com/photos/alasis/3541341601/sizes/l/in/photostream/