Data Integration And Visualization

  1. Data Integration and Visualization Using RDF. Ivan Ermilov, University of Leipzig
  2. Agenda • Data discovery • Data conversion • Data integration
  3. Linked Data Lifecycle http://stack.lod2.eu/blog/
  4. DATA DISCOVERY
  5. Data Discovery • Ontologies • Vocabularies • Documents
  6. Data Discovery: Ontologies • Specification of a conceptualization
  7. Data Discovery: Ontologies
  8. Data Discovery: Ontologies http://swoogle.umbc.edu/ http://watson.kmi.open.ac.uk/WatsonWUI/
  9. Data Discovery: Vocabularies. FOAF (Friend of a Friend): • A Semantic Web vocabulary used to describe people, their activities, and their relationships to one another. • It is becoming very popular: many people who discover it set up their own FOAF profile. • It serves as the base that other vocabularies extend.
  10. Data Discovery: Vocabularies http://xmlns.com/foaf/spec/
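
To make the FOAF vocabulary concrete, a minimal FOAF profile can be written out as N-Triples using only the standard library. The terms `foaf:Person`, `foaf:name`, and `foaf:knows` and the namespace `http://xmlns.com/foaf/0.1/` come from the FOAF specification; the example.org URIs and the helper names `literal` and `foaf_profile` are hypothetical illustrations, not part of any FOAF tooling.

```python
# Minimal sketch: serialize a tiny FOAF profile as N-Triples.
# The example.org URIs and the helper names are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str) -> str:
    """Quote a plain literal, escaping backslashes, quotes, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def foaf_profile(person_uri: str, name: str, knows: list[str]) -> list[str]:
    """Return N-Triples lines describing one foaf:Person and whom they know."""
    lines = [
        f"<{person_uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f"<{person_uri}> <{FOAF}name> {literal(name)} .",
    ]
    for friend in knows:
        lines.append(f"<{person_uri}> <{FOAF}knows> <{friend}> .")
    return lines


if __name__ == "__main__":
    for line in foaf_profile("http://example.org/alice#me", "Alice",
                             ["http://example.org/bob#me"]):
        print(line)
```

Because `foaf:knows` links one person URI to another, profiles published by different people connect into a graph as soon as they reference each other's URIs.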
  11. Data Discovery: Vocabularies
  12. Data Discovery: Vocabularies http://lov.okfn.org/dataset/lov/
  13. Data Discovery: Documents
      <http://www.linkedin.com/in/timbl> <http://purl.org/dc/terms/title> "Tim Berners-Lee - LinkedIn"@en .
      _:node0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Address> .
      _:node0 <http://www.w3.org/2006/vcard/ns#locality> "Greater Boston Area" .
      <http://www.linkedin.com/in/timbl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#vcalendar> .
      _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#Vevent> .
      _:node1 <http://www.w3.org/2002/12/cal/icaltzd#summary> "MIT" .
      _:node1 <http://www.w3.org/2002/12/cal/icaltzd#description> "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .
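
Each line of such a document is one N-Triples statement: a subject, a predicate, an object, and a closing period. The sketch below is a deliberately simplified parser that only handles what the sample uses (IRIs, blank nodes, and literals with an optional language tag or datatype). It is not a conforming N-Triples parser: it rejects comments and does not decode escape sequences. The function name `parse_ntriples` is hypothetical.

```python
import re

# One RDF term: an IRI, a blank node label, or a quoted literal with an
# optional @language tag or ^^<datatype>. Simplified subset of N-Triples.
TERM = (r'(<[^>]*>|_:[A-Za-z0-9_]+|"(?:[^"\\]|\\.)*"'
        r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?)')
TRIPLE = re.compile(rf"^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$")


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Split a simple N-Triples document into (subject, predicate, object) strings."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no statement
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not a simple N-Triples statement")
        triples.append(match.groups())
    return triples
```

Terms are returned in their serialized form (angle brackets, quotes, and tags included), which keeps the sketch short; a real RDF library would turn them into typed objects.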
  14. Data Discovery: Documents http://sindice.com/
  15. Data Catalogs • A community-maintained registry exists • It contains 362 data catalogs (and growing) • It is based on the CKAN data catalog platform http://datacatalogs.org/
  16. Data Catalogs http://datacatalogs.org/
  17. What is CKAN? • Metadata repository with crowd-sourcing enabled • Anyone can register and publish metadata about their datasets • Developer-friendly web application • Provides a well-documented API • Easy to install and easy to use as your own metadata repository
  18. CKAN Architecture • Packages contain resources, and you can search for them
  19. The Data Hub
  20. The Data Hub
  21. Hub of Data
  22. Hub of Data
  23. CKAN API • Well documented: http://docs.ckan.org/en/latest/api.html • Covers everything you can do with the web interface • You can write your own web interface • OKFN-maintained library for accessing the API: ckanclient (Python)
  24. CKAN API: Methods • Retrieving data • Creating new data • Updating existing data • Deleting existing data • "Data" means packages, resources, groups, tags, users, etc.
  25. CKAN API: Examples https://github.com/okfn/ckanclient
        from ckanclient import CkanClient

        def list_resource_formats(ckan_api_url, ckan_api_key):
            # Collect the distinct resource formats of all packages, sorted.
            ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
            package_list = ckan.package_list()
            formats = []
            for package in package_list:
                resource_list = package['resources']
                for resource in resource_list:
                    if resource['format'] not in formats:
                        formats.append(resource['format'])
            return sorted(formats)
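
For comparison, the same format inventory can be sketched without ckanclient, against CKAN's Action API (version 3), whose `package_search` action returns package dictionaries that include their resources. The helper names, the `rows` default, and the choice to skip empty formats are assumptions; only the first page of results is fetched, so a complete inventory would need to page through the results with the `start` parameter.

```python
import json
import urllib.parse
import urllib.request


def fetch_packages(base_url: str, rows: int = 100) -> list[dict]:
    """Fetch one page of package dicts via the CKAN package_search action."""
    query = urllib.parse.urlencode({"rows": rows})
    url = f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]["results"]


def collect_formats(packages: list[dict]) -> list[str]:
    """Return the sorted, de-duplicated resource formats of the given packages."""
    formats = set()
    for package in packages:
        for resource in package.get("resources", []):
            fmt = resource.get("format")
            if fmt:  # skip resources whose format is empty or missing
                formats.add(fmt)
    return sorted(formats)
```

Usage, for a CKAN instance of your choice: `collect_formats(fetch_packages(base_url))`. Keeping the HTTP call separate from the aggregation lets the aggregation be tested on plain dictionaries.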
  26. Use Case: CSV2RDF Conversion • Framework for CSV-to-RDF conversion • Crowd-sourcing enabled • RDF visualizations https://github.com/earthquakesan/CSV2RDF-WIKI
  27. CSV2RDF Conversion: Why CSV?
  28. CSV2RDF Conversion: Data Quality
  29. Data conversion
  30. Data Conversion • Structured: relational databases • Semi-structured: XML, HTML, XLS, CSV, APIs • Unstructured: raw text [Figure: PublicData.eu statistics]
  31. Data Conversion [Figure: XML, RDB, and spreadsheet sources] • How does government spending in certain sectors relate to my company's earnings? • How does historic spending relate to the current figures? • Give me a report about all of my customers across the whole organization.
  32. Data Conversion [Figure: custom scripts query the XML (XPath) and RDB (SQL) sources, followed by result aggregation]
  33. Merging Data with RDF [Figure: XML, RDB, and spreadsheet sources] Once in RDF: • Easily integrate your data • Map concepts to one another • Query everything with one W3C standard language (SPARQL)
  34. Merging Data with RDF: Example • Blue App has a model
  35. Merging Data with RDF: Example • Red App has a model • Need to integrate the Red & Blue models
  36. Merging Data with RDF: Example • Step 1: Merge the RDF • Same nodes (URIs) join automatically
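
Step 1 works because an RDF graph is a set of triples: merging two graphs is a set union, and a URI that appears in both graphs ends up as one shared node. A minimal sketch with plain Python tuples; the example.org URIs and property names are hypothetical stand-ins for the Blue and Red models, and objects are plain strings. A full RDF merge would also keep blank nodes from different graphs apart, which this sketch ignores by using URIs only.

```python
# An RDF graph modeled as a set of (subject, predicate, object) tuples.
# All URIs are hypothetical stand-ins for the Blue and Red app models.
Triple = tuple[str, str, str]

blue: set[Triple] = {
    ("http://example.org/emp/42", "http://example.org/blue#name", "Alice"),
    ("http://example.org/emp/42", "http://example.org/blue#dept", "Sales"),
}
red: set[Triple] = {
    ("http://example.org/emp/42", "http://example.org/red#salary", "50000"),
    ("http://example.org/emp/7", "http://example.org/red#salary", "42000"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merge graphs by set union; identical URIs collapse into one node."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: set[Triple], subject: str) -> dict[str, str]:
    """Collect predicate -> object for one subject (assumes single values)."""
    return {p: o for s, p, o in graph if s == subject}


merged = merge(blue, red)
```

After the merge, `describe(merged, "http://example.org/emp/42")` returns the Blue name and department together with the Red salary, without any mapping code, because both models used the same URI for that employee.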
  37. Merging Data with RDF: Example • Step 2: Add relationships and rules • (Relationships are also RDF)
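
Step 2 adds relationships and rules that are themselves expressed in RDF. As an illustration, the sketch below states in RDF that an app-specific property from each model is a sub-property of a shared one, then applies two RDFS entailment rules (rdfs5 and rdfs7) in a naive forward-chaining loop until no new triples appear. The choice of `rdfs:subPropertyOf` and the example.org property names are assumptions; the slides do not say which relationships or rules the example used.

```python
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
Triple = tuple[str, str, str]


def infer_subproperties(graph: set[Triple]) -> set[Triple]:
    """Forward-chain RDFS rules rdfs5 and rdfs7 until a fixpoint is reached.

    rdfs5: (p subPropertyOf q), (q subPropertyOf r) => (p subPropertyOf r)
    rdfs7: (p subPropertyOf q), (s p o)             => (s q o)
    """
    closed = set(graph)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUB_PROPERTY_OF}
        new = {(p, SUB_PROPERTY_OF, r) for p, q in sub for q2, r in sub if q == q2}
        new |= {(s, q, o) for s, p, o in closed for p2, q in sub if p == p2}
        if new <= closed:
            return closed  # nothing new can be derived
        closed |= new


# Hypothetical properties from the Blue, Red, and Green models.
BLUE_NAME = "http://example.org/blue#name"
RED_LABEL = "http://example.org/red#label"
GREEN_NAME = "http://example.org/green#name"

graph = {
    ("http://example.org/emp/42", BLUE_NAME, "Alice"),
    ("http://example.org/emp/7", RED_LABEL, "Bob"),
    # The relationships are ordinary triples in the same graph.
    (BLUE_NAME, SUB_PROPERTY_OF, GREEN_NAME),
    (RED_LABEL, SUB_PROPERTY_OF, GREEN_NAME),
}
inferred = infer_subproperties(graph)
```

Because inference only adds triples, the original Blue and Red triples remain untouched, which matches the "no difference" view each app has in the following slides.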
  38. Merging Data with RDF: Example • Step 3: Define the Green model (making use of the Red & Blue models)
  39. Merging Data with RDF: Example • What the Blue app sees: no difference!
  40. Merging Data with RDF: Example • What the Red app sees: no difference!
  41. RDF helps bridge other formats/models • Producers and consumers may use different formats/models • Rules can specify transformations • An inference engine finds a path to the desired result model [Figure: transformations through the RDF model between models A1, A2, A3, B1, B2, C1, C2 and X, Y, Z, each governed by ontologies & rules]
  42. RDB2RDF
  43. Extract, Transform, Load (ETL)
  44. Automatic Mapping
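
An automatic RDB2RDF mapping derives triples from the database schema alone. A minimal sketch in the spirit of the W3C Direct Mapping: each row becomes a subject URI built from the table name and primary key, the table becomes the row's class, each column becomes a predicate, and each non-NULL value becomes a literal. The base URI, the in-memory SQLite table, and the simplified URI scheme (no percent-encoding, no foreign-key links, no datatypes) are assumptions, so this is not a conforming Direct Mapping implementation.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(conn: sqlite3.Connection, table: str, pk: str) -> list[tuple[str, str, str]]:
    """Map every row of `table` to triples; literal values are returned as str."""
    # Table and key names are interpolated, so they must come from a trusted schema.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{pk}"')
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        values = dict(zip(columns, row))
        subject = f"{BASE}{table}/{pk}={values[pk]}"
        triples.append((subject, RDF_TYPE, f"{BASE}{table}"))
        for column, value in values.items():
            if value is not None:  # NULL values produce no triple
                triples.append((subject, f"{BASE}{table}#{column}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", "Leipzig"), (2, "Bob", None)])
triples = direct_map(conn, "person", "id")
```

The appeal of a fully automatic mapping is that it needs no configuration; the cost is that the resulting URIs and predicates mirror the database schema rather than a shared vocabulary, which is what semi-automatic mappings and R2RML address.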
  45. Semi-Automatic Mapping
  46. R2RML
  47. Sparqlify: Examples
  48. Sparqlify: Examples
  49. Sparqlify: Examples
  50. Sparqlify: Examples
  51. Sparqlify: Examples
  52. Sparqlify: CSV2RDF http://sparqlify.org/
        Prefix pdd: <http://data.publicdata.eu/>
        Prefix pdo: <http://wiki.publicdata.eu/ontology/>

        Create View Template DefaultMapping As
          Construct {
            ?s ?p1 ?o1 ;
               ?p2 ?o2 ...
          }
          With
            ?s = uri(concat(pdd:, 'csv-path/', ?rowId))
            ?p1 = uri(concat(pdo:, ?headingName1))
            ?o1 = plainLiteral(?1)
            ?p2 = ...
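
The DefaultMapping above turns each CSV row into one subject and each column heading into a predicate. The following Python sketch mimics that mapping outside Sparqlify so the shape of the output is easy to see. The `pdd:` and `pdo:` namespaces come from the slide; passing the CSV path as a parameter, using the 1-based data row number as `?rowId`, and placing headings into URIs unescaped are assumptions.

```python
import csv
import io

PDD = "http://data.publicdata.eu/"
PDO = "http://wiki.publicdata.eu/ontology/"


def nt_literal(value: str) -> str:
    """Serialize a plain literal for N-Triples (minimal escaping)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(csv_text: str, csv_path: str) -> list[str]:
    """Emulate DefaultMapping: ?s from the row id, ?p from the heading, ?o as a plain literal."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = next(reader)
    lines = []
    for row_id, row in enumerate(reader, start=1):
        subject = f"<{PDD}{csv_path}/{row_id}>"
        for heading, cell in zip(headings, row):
            predicate = f"<{PDO}{heading}>"  # assumes headings are URI-safe
            lines.append(f"{subject} {predicate} {nt_literal(cell)} .")
    return lines
```

As in the view template, every cell becomes an untyped plain literal; typing values or linking them to other resources would require a more specific mapping than the default.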
  53. Raw Text Processing: ConTEXT • No installation or configuration required • Access content from a variety of sources • Instantly show the results of text analysis to users in a variety of visualizations • Allow refinement of automatic annotations and take feedback into account • Provide a generic architecture where different modules for content acquisition, natural language processing, and visualization can be plugged together http://rdface.aksw.org/nlp/hub.php
  54. Processing Raw Text: ConTEXT
  55. Data Integration
  56. Definition • In general, the integration of multiple information systems aims to combine selected systems so that they form a unified new whole and give users the illusion of interacting with a single information system.
  57. Semantic Data Integration
  58. Federated SPARQL Queries • Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud [Figure: DBpedia and New York Times] • Query both data collections in an integrated way
  59. Federated Query Processing • Federation mediator at the server • Virtual integration of (remote) data sources • Communication via the SPARQL protocol [Figure: a query is sent to the federation mediator, which communicates with several SPARQL data sources]
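
One task of the federation mediator is to combine the partial results returned by the individual SPARQL data sources. A minimal sketch of that step: a hash join of two solution sequences on their shared variables, where each solution is a dict from variable name to value (a simplification of the SPARQL JSON results format). Source selection, query decomposition, and the HTTP calls over the SPARQL protocol are left out, and the DBpedia and New York Times bindings in the example are hypothetical.

```python
from collections import defaultdict

Solution = dict[str, str]


def join(left: list[Solution], right: list[Solution]) -> list[Solution]:
    """Join two solution sequences on their shared variables (hash join).

    Assumes every solution within a sequence binds the same variables,
    i.e. no OPTIONAL-style unbound values.
    """
    if not left or not right:
        return []
    shared = sorted(left[0].keys() & right[0].keys())
    index: defaultdict[tuple, list[Solution]] = defaultdict(list)
    for solution in right:
        index[tuple(solution[v] for v in shared)].append(solution)
    results = []
    for solution in left:
        # With no shared variables every key is (), giving a cross product.
        for match in index.get(tuple(solution[v] for v in shared), []):
            results.append({**solution, **match})
    return results


# Hypothetical partial results from two sources.
dbpedia = [
    {"person": "ex:PersonA", "birthPlace": "ex:CityX"},
    {"person": "ex:PersonB", "birthPlace": "ex:CityY"},
]
nytimes = [
    {"person": "ex:PersonA", "article": "ex:Article1"},
    {"person": "ex:PersonA", "article": "ex:Article2"},
]
joined = join(dbpedia, nytimes)
```

A hash join keeps the mediator's work proportional to the result sizes; real engines such as those listed on the next slide additionally push joins to the sources (e.g. bound joins) to reduce the amount of data transferred.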
  60. Federated Query Engines
        Engine    | Implementation language | License
        FedX      | Java                    | GNU AGPL
        SPLENDID  | Java                    | LGPL
        LHD       | Java                    | MIT
        DARQ      | Java                    | GPL
        ANAPSID   | Python                  | GNU GPL
        ADERIS    | Java                    | Apache
  61. Data Visualization
  62. LD Visualization Techniques
  63. LD Visualization Techniques
  64. LD Visualization Techniques
  65. LD Visualization Techniques
  66. Classification of Visualization Techniques
  67. Comparison of Values/Attributes http://goo.gl/IvsGbU http://goo.gl/JeFhlM
  68. Analysis of Relationships and Hierarchies
  69. Analysis of Relationships and Hierarchies http://rhizomik.net/dbpedia/treemap.jsp http://lov.okfn.org/dataset/lov/
  70. Analysis of Temporal and Geographical Events http://lov.okfn.org/dataset/lov/details/vocabulary_dcterms.html
  71. Analysis of Multidimensional Data http://mbostock.github.io/protovis/ex/cars.html
  72. Other Visualization Techniques
  73. Applications of LD Visualization Techniques
  74. Tool Types
  75. Tool Types
  76. CubeViz
  77. Facete
  78. Thank you for your attention! Ivan Ermilov, iermilov@informatik.uni-leipzig.de, University of Leipzig

Editor's notes

  • Example scenario: the DBpedia and New York Times collections, with DBpedia as a structured knowledge base and the New York Times as a news provider.
