2014.12 - Let's Disco (EDDI 2014)


More from Dr.-Ing. Thomas Hartmann (20)


  1. Slideshare: http://de.slideshare.net/boschthomas
  2. Questions? Please don't hesitate!
  3. as big as DDI-L?
  4. Disco Spec: https://github.com/linked-statistics/disco-spec/blob/master/discovery.html
  5. Triple Store: http://multiweb.gesis.org/openrdf-workbench/repositories/discotest/summary
  6. Why Disco?
  7. Why DDI as Linked Data?
  8. Use Case
  9. Where to search for data?
  10. Which microdata exist according to specific metadata?
  11. Which datasets are associated with the microdata?
  12. Which aggregated data exist according to specific metadata?
  13. Which datasets are associated with aggregated data?
  14. From which microdata datasets is the aggregated dataset derived?
  15. Which summary statistics does a variable have?
  16. Which category statistics does a variable representation have?
  17. Which microdata datasets are created by the research institute 'GESIS'?
  18. Overview
  19. Series, Studies
  20. Series Series title: CIS
  21. Study Study title: EU-LFS 1991
  22. Agents
  23. Identification, Versioning
  24. ddi:study a disco:Study ; dcterms:title "National Population and Housing Census, 1980"@en ; adms:identifier [ a adms:Identifier ; skos:notation "us:ddi:us.mpc:ARG_1980_PHC_v01_A_IPUMS:1" ; adms:schemaAgency "DDI Alliance"@en ] .
  25. ddi:study a disco:Study ; dcterms:creator [ rdfs:label "Minnesota Population Center"@en ; skos:notation "MPC" ; adms:identifier [ a adms:Identifier ; skos:notation "us.mpc" ; adms:schemaAgency "DDI Alliance"@en ] ] .
  26. Coverage
  27. Spatial Coverage
  28. <urn:ddi:de.gesis:study_EU-SILC-2005:0.1> a disco:Study ; dcterms:title "EU-SILC 2005"@en ; skos:prefLabel "2005"@en ; dcterms:spatial <http://sws.geonames.org/2782113> , ... , :AllCountriesOfStudy ;
  29. :AllCountriesOfStudy a dcterms:Location , missy:Country ; rdfs:label "all countries of study" ; missy:code "" .
  30. Countries Study: EU-LFS 2004
  31. Temporal Coverage
  32. <urn:ddi:de.gesis:study_EU-SILC-2005:0.1> a disco:Study ; dcterms:title "EU-SILC 2005"@en ; skos:prefLabel "2005"@en ; dcterms:temporal <urn:ddi:de.gesis:0ba9b4f3-ec22-4471-8ffa-a38e8ada187a:0.1> ;
  33. <urn:ddi:de.gesis:0ba9b4f3-ec22-4471-8ffa-a38e8ada187a:0.1> a disco:PeriodOfTime ; dcterms:date "Jan 1, 2005 12:00:00 AM"^^xsd:date .
  34. Year Study title: 'Structure of Earnings Survey –2006'
  35. Topical Coverage
  36. missy:PB100 a disco:Variable ; dcterms:subject [ a skos:Concept ; skos:notation "Quarter of the personal interview"@en ] .
  37. Topical Coverage Variable ID: <urn:ddi:de.gesis:variable_EU-SILC-2010-panel-p-data-2010_rev2-PB020:0.1> Variable name: 'PB020'
  38. Thematic Classification
  39. :thematicClassification a skos:ConceptScheme ; skos:hasTopConcept :concept1 , :concept2 , :concept3 . Series-Level
  40. :superConcept a skos:Concept ; skos:notation "Demographic background"@en ; skos:narrower :subConcept1 , :subConcept2 . Narrower Concepts
  41. Direct Broader Concepts Concept: 'Country'@en
  42. All (Direct + Indirect) Broader Concepts Concept: 'Country'@en
  43. Direct Narrower Concepts Concept: "Type of cooperation"@en
  44. All (Direct + Indirect) Narrower Concepts Concept: "Type of cooperation"@en
  45. Top Concepts Series: EU-SILC Thematic Classification: <urn:ddi:de.gesis:7bb54a91-4b26-4f6e-a1b7-be48cb58be24:0.1>
  46. 2-Level Concepts Series: EU-SILC Thematic Classification: <urn:ddi:de.gesis:7bb54a91-4b26-4f6e-a1b7-be48cb58be24:0.1>
  47. Lowest-Level Concepts (Leaf Concepts) Series: EU-SILC Thematic Classification: <urn:ddi:de.gesis:7bb54a91-4b26-4f6e-a1b7-be48cb58be24:0.1>
  48. Data Sets, Data Files
  49. Data Sets
  50. All Data Sets (IDs) Study: EU-SILC 2010
  51. Data Files
  52. :dataFile a disco:Datafile ; dcterms:identifier "ARG1900-P-H.dat" ; dcterms:description "Person records"@en ; disco:caseQuantity 2667714 ; dcterms:format "ascii" ; dcterms:provenance "Minnesota Population Center"@en ; owl:versionInfo "Version 1.0, IPUMS sample"@en .
  53. :dataFile a disco:Datafile ; dcterms:spatial [ a dcterms:Location ; rdfs:label "Argentina, national coverage"@en ] ; dcterms:temporal :periodOfTime .
  54. Controlled Vocabularies
  55. Variables
  56. ddi:AR80A401 a disco:Variable ; skos:notation "AR80A401" ; skos:prefLabel "Sex"@en ; disco:basedOn ddi:SexVD ; disco:question ddi:QuestionGender .
  57. ddi:SexVD a disco:RepresentedVariable ; disco:universe ddi:UniversePerson ; disco:representation ddi:SexRepr ; disco:concept ddi:IpumsC1 ; skos:prefLabel "Sex"@en ; dcterms:description "Sex data element"@en .
  58. missy:PB100 a disco:Variable ; skos:notation "PB100" ; skos:prefLabel "Quarter of the personal interview"@en ; skos:concept :concept ; disco:question :question .
  59. Variables (Names + Labels) Data Set: EU-SILC 2010 cross-sec p-data Data Set (ID): <urn:ddi:de.gesis:logicalDataSet_EU-SILC-2010-cross-sec-p-data-2010_rev3:0.1>
  60. Variable Concept Data Set: EU-SILC 2010 cross-sec p-data Data Set (ID): <urn:ddi:de.gesis:logicalDataSet_EU-SILC-2010-cross-sec-p-data-2010_rev3:0.1> Variable Name: PB010
  61. Study Data Set Variable label Variable name: 'AGE'
  62. Topical Coverage Variable name: B21 Study: "Structure of Earnings Survey -2006"@en
  63. Variables having no concepts Study title: EU-SILC 2006
  64. Variable Representation
  65. Valid Codes and Categories missy:1 a skos:Concept ; skos:notation "1" ; skos:prefLabel "January, February, March" ; disco:isValid true .
  66. Invalid Codes and Categories missy:Missing a skos:Concept ; skos:notation "M" ; skos:prefLabel "Missing" ; disco:isValid false .
  67. Variable - Variable Representation missy:PB100 a disco:Variable , missy:Variable ; skos:notation 'PB100' ; disco:representation :representationPB100 .
  68. Variable Representation :representationPB100 a disco:Representation , skos:OrderedCollection ; skos:memberList ( missy:1 missy:2 missy:3 missy:4 missy:Missing ) .
  69. Variable Representation Codes and Categories Variable: missy:PB100
  70. Descriptive Statistics
  71. Summary Statistics
  72. missy:Minimum a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:Minimum ; rdf:value "1" .
  73. Spatial Coverage of Study :AllCountriesOfStudy a dcterms:Location , missy:Country ; rdfs:label "all countries of study" ; missy:code "" .
  74. missy:Minimum a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country <http://sws.geonames.org/2921044> ; disco:summaryStatisticType ddicv-sumstats:Minimum ; rdf:value "1" .
  75. missy:Maximum a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:Maximum ; rdf:value "4" .
  76. missy:Mean a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:ArithmeticMean ; rdf:value "2.17" .
  77. missy:StandardDeviation a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:StandardDeviation ; rdf:value "0.9061" .
  78. missy:ValidCases a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:ValidCases ; rdf:value "470950" .
  79. missy:PercentOfValidCases a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:PercentOfValidCases ; rdf:value "99.1" .
  80. missy:InvalidCases a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:InvalidCases ; rdf:value "4195" .
  81. missy:PercentOfInvalidCases a disco:SummaryStatistics , missy:SummaryStatistics ; disco:statisticsVariable missy:PB100 ; missy:country :AllCountriesOfStudy ; disco:summaryStatisticType ddicv-sumstats:ValidCases ; rdf:value "0.9" .
  82. Variable Pointing to Summary Statistics missy:PB100 a disco:Variable , missy:Variable ; missy:summaryStatistics ( missy:Minimum missy:Minimum_DE missy:Maximum missy:Maximum_DE … )
  83. Summary Statistics: Minimum Variable: missy:PB100 Spatial Coverage: all countries of study
  84. Summary Statistics: Valid Cases Variable: missy:PB100 Spatial Coverage: DE
  85. Category Statistics
  86. Valid Codes and Categories missy:2 a skos:Concept , missy:Concept ; skos:notation "2" ; skos:prefLabel "April, May, June" ; disco:isValid true ; missy:categoryStatistics ( missy:CS_2_AllCountries missy:CS_2_DE ) .
  87. Invalid Codes and Categories missy:Missing a skos:Concept , missy:Concept ; skos:notation "M" ; skos:prefLabel "Missing" ; disco:isValid false ; missy:categoryStatistics ( missy:CS_M_AllCountries ) .
  88. Valid Cases missy:CS_2_AllCountries a disco:CategoryStatistics , missy:CategoryStatistics ; disco:statisticsCategory missy:2 ; missy:country :AllCountriesOfStudy ; disco:frequency 243708 ; disco:percentage 51.3 ; disco:cumulativePercentage 51.7 ; disco:computationBase "valid" .
  89. Invalid Cases missy:CS_M_AllCountries a disco:CategoryStatistics , missy:CategoryStatistics ; disco:statisticsCategory missy:Missing ; missy:country :AllCountriesOfStudy ; disco:frequency 4195 ; disco:percentage 0.9 ; disco:computationBase "invalid" .
  90. Category Statistics: Frequency (Invalid Cases) Variable: missy:PB100 Spatial Coverage: All Countries of Study Code: missy:Missing
  91. Category Statistics: Cumulative Percentage (Valid Cases) Variable: missy:PB100 Spatial Coverage: DE <http://sws.geonames.org/2921044> Code: missy:2 Category Label: 'April, May, June'
  92. Data Collection
  93. :variableYearOfBirth a disco:Variable ; skos:notation "RB080" ; skos:prefLabel "Year of birth"@en ; dcterms:subject :concept ; disco:question :questionYearOfBirth .
  94. :questionYearOfBirth a disco:Question ; disco:questionText "What is your date of birth?"@en .
  95. Variable Question Data Set: EU-SILC 2010 cross-sec p-data Data Set (ID): <urn:ddi:de.gesis:logicalDataSet_EU-SILC-2010-cross-sec-p-data-2010_rev3:0.1> Variable Name: PB010
  96. Variables Series: EU-SILC 2005 Question text: 'What is your date of birth?'@en
  97. Relationships to other Vocabularies
  98. PHDD
  99. Mapping DDI-XML to Disco
  100. DDI 4
  101. DDI 4 • Model-driven further development of DDI • Model generates multiple representations (OWL, XSD, Java, RDB, …) • Functional views are published step by step
  102. Disco + DDI 4
  103. Do Not Wait for DDI 4! • Own functional view for Disco • Mapping between Disco and DDI 4 (OWL representation) • Easy migration
  104. Let's Disco Now!
  105. Acknowledgements 26 experts from the statistical community and the Linked Data community, coming from 12 different countries, contributed to this work. They participated in the events mentioned below. • 1st workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany, in September 2011 • Working meeting in the course of the 3rd Annual European DDI Users Group Meeting (EDDI11) in Gothenburg, Sweden, in December 2011 • 2nd workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany, in October 2012 • Working meeting at GESIS - Leibniz Institute for the Social Sciences in Mannheim, Germany, in February 2013
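
Slides 78 to 81 report 470950 valid cases, 4195 invalid cases, and percentages of 99.1 and 0.9 for variable missy:PB100. The slides do not state how the percentages were derived. The sketch below assumes they are shares of the total (valid plus invalid) cases, rounded to one decimal place. The function name `case_percentages` is hypothetical and not part of Disco or Missy.

```python
def case_percentages(valid_cases: int, invalid_cases: int) -> tuple:
    """Return (percent valid, percent invalid), each rounded to one decimal.

    Assumption: percentages are taken over valid + invalid cases.
    """
    total = valid_cases + invalid_cases
    if total == 0:
        raise ValueError("no cases to summarise")
    percent_valid = round(100 * valid_cases / total, 1)
    percent_invalid = round(100 * invalid_cases / total, 1)
    return percent_valid, percent_invalid


# Case counts taken from slides 78 and 80.
percent_valid, percent_invalid = case_percentages(470950, 4195)
print(percent_valid, percent_invalid)
```

Under that assumption the computed values agree with the rdf:value literals on slides 79 and 81.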
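
Slides 43 and 44 distinguish direct narrower concepts from all (direct plus indirect) narrower concepts. The following is a minimal sketch of that distinction as a breadth-first walk over skos:narrower links, not the query used in the presentation. The first dictionary entry mirrors slide 40. `:subSubConcept` and the function names are hypothetical, added only for illustration.

```python
from collections import deque

# Direct skos:narrower links. The first entry mirrors slide 40;
# ":subSubConcept" is a hypothetical extra level added for illustration.
NARROWER = {
    ":superConcept": [":subConcept1", ":subConcept2"],
    ":subConcept1": [":subSubConcept"],
}


def direct_narrower(concept):
    """Concepts linked from `concept` by a single skos:narrower step."""
    return list(NARROWER.get(concept, []))


def all_narrower(concept):
    """Direct plus indirect narrower concepts, in breadth-first order."""
    found = []
    queue = deque(direct_narrower(concept))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # already visited; also guards against cycles
        found.append(current)
        queue.extend(direct_narrower(current))
    return found


print(direct_narrower(":superConcept"))
print(all_narrower(":superConcept"))
```

Against a triple store, the same closure can be expressed with a SPARQL 1.1 property path such as `skos:narrower+`. The broader direction on slides 41 and 42 works the same way with the links reversed.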
