2. OKFN Korea
What is linked data, Open data?
Refine
Modelling
Access
Triple Storage
other topics
image: Leo Oosterloo @ flickr.com
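3. Seoul City Data Enrichment
Goals
Design or map an ontology to enrich Seoul city data
Structuring, semantic annotation, and linking: model Seoul city data (unstructured data) with an ontology and link it to external data
English localization: provide Seoul city data that non-Korean-speaking users can use
Scope
About 40 Seoul city datasets
Cultural heritage: domestic cultural properties collected by the Cultural Heritage Administration (national treasures, treasures, designated cultural properties, intangible cultural properties, etc.)
Methodology: model the data by reusing existing RDF vocabularies
1) Data selection: select the datasets to model from the Seoul Open Data Plaza
2) Dataset field review: review how the individual fields of each dataset map to the DBpedia ontology (classes, properties)
• DBpedia ontology: covers concepts about things and includes Wikipedia infobox fields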
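4. Seoul City Data Enrichment
For example, when modelling a 'museum':
• Select the infobox template for museums from Wikipedia
• Select the vocabulary that DBpedia maps to the museum infobox
• Map the vocabulary to the dataset fields
• Decide whether to model the fields that remain unmapped (including classes and properties): a modelling tool must be chosen
• Apply a URI scheme (needs to be designed separately)
• Complete the ontology schema design
3) Data cleaning
• Clean the data with Google Refine
• Tasks to do before loading the data into Refine
• Location data: convert or add location values in the source data (Seoul city)
• English names: convert and map the Korean names (manual work required)
• Tasks to do in Refine
– Add Korean and English Wikipedia URLs
– Add DBpedia and Freebase URLs: add them using Refine reconciliation
– Build the RDF conversion mapping skeleton
– Export RDF and Excel
4) Data upload (RDF or Excel)
Select a data store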
Jena, 4Store, …
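The field review and mapping steps above can be illustrated with a small script. This is only a minimal sketch: the record layout, the field names, the base URI `http://example.org/seoul/museum/`, and the coordinate values are all hypothetical, and the choice of `dbo:Museum`, `rdfs:label`, `geo:lat`, and `geo:long` is an assumption about which existing terms might be reused, not the project's actual mapping.

```python
# Hypothetical record from a Seoul museum dataset (field names and values are illustrative).
record = {
    "id": "m001",
    "name_ko": "국립중앙박물관",
    "name_en": "National Museum of Korea",
    "lat": "37.5239",
    "lng": "126.9803",
    "closed_days": "Mon",
}

BASE = "http://example.org/seoul/museum/"  # placeholder URI scheme, to be designed separately

# Assumed mapping from dataset fields to (predicate, language tag or None).
FIELD_MAP = {
    "name_ko": ("rdfs:label", "ko"),
    "name_en": ("rdfs:label", "en"),
    "lat": ("geo:lat", None),
    "lng": ("geo:long", None),
}

def record_to_triples(rec):
    """Return (triples, unmapped_fields) for one dataset record."""
    subject = f"<{BASE}{rec['id']}>"
    triples = [(subject, "rdf:type", "dbo:Museum")]
    unmapped = []
    for field, value in rec.items():
        if field == "id":
            continue
        if field not in FIELD_MAP:
            unmapped.append(field)  # needs a modelling decision later
            continue
        predicate, lang = FIELD_MAP[field]
        obj = f'"{value}"@{lang}' if lang else f'"{value}"'
        triples.append((subject, predicate, obj))
    return triples, unmapped

triples, unmapped = record_to_triples(record)
for s, p, o in triples:
    print(s, p, o, ".")
print("unmapped fields:", unmapped)
```

Fields that come back in `unmapped` correspond to the "decide whether to model" step: they need either a reused term from another vocabulary or a new class or property.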
14. Modelling – vocabularies
Logical modelling
modelling the domain, not a particular data structure
what exists
what is asserted? what can you deduce from that?
not about constraints as such
monotonic, open world
controlled vocabulary
taxonomy
thesaurus
ontology
Ontology
15. Modelling – vocabularies
unfamiliar terminology but related to information architecture and conceptual modelling
domain-driven design
... and yes knowledge representation
16. Elements of:
Vocabulary (defining terms)
• I define a relationship called “prescribed dose.”
Schema (defining types)
• “prescribed dose” relates “treatments” to “dosages”
Taxonomy (defining hierarchies)
• Any “doctor” is a “medical professional”
RDF Schema is…
17. Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
18. Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
19. Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
school:401874 rdf:type ont:WelshEstablishment
20. Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
school:401874 rdf:type ont:WelshEstablishment
school:401874 rdf:type ont:School (inferred)
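The step from "school:401874 is a WelshEstablishment" to "school:401874 is a School" follows from the RDFS subclass rule. A minimal sketch in plain Python, with triples held as tuples; it implements only that one rule (type propagation along rdfs:subClassOf) and is not a full RDFS reasoner.

```python
# Asserted triples from the slide, as (subject, predicate, object) tuples.
asserted = {
    ("ont:School", "rdf:type", "rdfs:Class"),
    ("ont:WelshEstablishment", "rdf:type", "rdfs:Class"),
    ("ont:WelshEstablishment", "rdfs:subClassOf", "ont:School"),
    ("school:401874", "rdf:type", "ont:WelshEstablishment"),
}

def infer_types(triples):
    """Apply (x rdf:type C) + (C rdfs:subClassOf D) => (x rdf:type D) until nothing new appears."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_of = [(s, o) for s, p, o in closed if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in closed if p == "rdf:type"]
        for x, c in typed:
            for sub, sup in subclass_of:
                if c == sub and (x, "rdf:type", sup) not in closed:
                    closed.add((x, "rdf:type", sup))
                    changed = True
    return closed

inferred = infer_types(asserted) - asserted
print(inferred)
```

The loop runs to a fixpoint so that longer subclass chains are also followed, even though this example needs only one step.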
22. Modelling – RDFS
RDF vocabulary description language
class/property relations
domain
range
Already have power to do some vocabulary mapping
declare classes or properties from different vocabularies to be equivalent:
A rdfs:subClassOf B
B rdfs:subClassOf A
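In RDFS, domain and range are used for inference rather than validation: using a property lets a reasoner conclude the types of its subject and object. A minimal sketch of those two rules in plain Python; the `ex:` names are hypothetical and chosen only for illustration.

```python
def infer_domain_range(triples):
    """Infer rdf:type triples from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for prop, kind, cls in triples:
        if kind not in ("rdfs:domain", "rdfs:range"):
            continue
        for s, p, o in triples:
            if p != prop:
                continue
            # domain types the subject, range types the object
            target = s if kind == "rdfs:domain" else o
            inferred.add((target, "rdf:type", cls))
    return inferred - set(triples)

# Hypothetical vocabulary and data.
data = {
    ("ex:headTeacher", "rdfs:domain", "ex:School"),
    ("ex:headTeacher", "rdfs:range", "ex:Person"),
    ("ex:s1", "ex:headTeacher", "ex:alice"),
}
result = infer_domain_range(data)
print(result)
```

Nothing here rejects data: a resource used with `ex:headTeacher` is simply concluded to be an `ex:School`, which is the "not about constraints as such" point made earlier.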
24. Elements of ontology
Same/different identity
• “author” and “auteur” are the same relation
• two resources with the same “ISBN” are the same “book”
More expressive type definitions
• A “cycle” is a “vehicle” with at least one “wheel”
• A “bicycle” is a “cycle” with exactly two “wheels”
More expressive relation definitions
• “sibling” is a symmetric predicate
• the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
OWL is…
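Two of the statements above, the ISBN identity rule and the symmetric "sibling" relation, can be read as simple inference rules. A minimal sketch in plain Python, assuming hypothetical `ex:` names and made-up ISBN strings; it treats `ex:isbn` as an inverse functional (key-like) property and is not a general OWL reasoner.

```python
from itertools import combinations

# Hypothetical data; the ISBN strings are placeholders, not real books.
triples = {
    ("ex:book1", "ex:isbn", "isbn-A"),
    ("ex:book2", "ex:isbn", "isbn-A"),
    ("ex:book3", "ex:isbn", "isbn-B"),
    ("ex:ann", "ex:sibling", "ex:bob"),
}
INVERSE_FUNCTIONAL = {"ex:isbn"}   # same value => same resource
SYMMETRIC = {"ex:sibling"}         # (a p b) => (b p a)

def infer(triples):
    """Return triples implied by the symmetric and inverse functional declarations."""
    inferred = set()
    by_value = {}
    for s, p, o in triples:
        if p in SYMMETRIC:
            inferred.add((o, p, s))
        if p in INVERSE_FUNCTIONAL:
            by_value.setdefault((p, o), set()).add(s)
    for subjects in by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, "owl:sameAs", b))
            inferred.add((b, "owl:sameAs", a))
    return inferred - triples

result = infer(triples)
print(result)
```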
25. Answer questions of
Consistency
• Are there any contradictions in this model?
Classification
• What are all the inferred types of this resource?
Satisfiability
• Are there any classes in this ontology that cannot possibly have any members?
What can we do with OWL?
26. Building Useful Ontologies
Developing and maintaining quality ontologies is very challenging
Users need tools and services, e.g., to help check if ontology is:
Meaningful — all named classes can have instances
http://www.aber.ac.uk/compsci/public/media/presentations/OUCL-seminar.ppt
27. Building Useful Ontologies
Developing and maintaining quality ontologies is very challenging
Users need tools and services, e.g., to help check if ontology is:
Meaningful — all named classes can have instances
Correct — captures intuitions of domain experts
28. Building Useful Ontologies
Developing and maintaining quality ontologies is very challenging
Users need tools and services, e.g., to help check if ontology is:
Meaningful — all named classes can have instances
Correct — captures intuitions of domain experts
Minimally redundant — no unintended synonyms
Banana split Banana sundae
29. Modelling - OWL
richer modelling and semantics
axioms on properties
transitive, symmetric, inverseOf, ...
functional, inverse functional
equivalent property
axioms on classes
intersection, union, disjoint, equivalent
restrictions on classes
some value from, all values from, cardinality, has value, one of, keys
axioms on individuals
same as, different from, all different
imports
30. Modelling – OWL
supports much richer modelling
consistency checking of model
consistency checking of data
some surprises if you are used to schema languages
open world, no unique name assumption
can extend to closed world checking
inference
classification
inferred relationships
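One of the surprises for people coming from schema languages: with no unique name assumption, two values for a functional property are not an error. The reasoner instead concludes that the two names denote the same individual, and an inconsistency arises only if they are also declared different. A minimal sketch of that behaviour in plain Python, with hypothetical `ex:` names.

```python
def check_functional(triples, functional_props):
    """Return (implied sameAs pairs, clashing pairs) for the given functional properties."""
    different = {frozenset((s, o)) for s, p, o in triples if p == "owl:differentFrom"}
    values = {}
    for s, p, o in triples:
        if p in functional_props:
            values.setdefault((s, p), set()).add(o)
    same, clashes = set(), set()
    for vals in values.values():
        ordered = sorted(vals)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pair = frozenset((a, b))
                if pair in different:
                    clashes.add(pair)  # must be the same, yet declared different
                else:
                    same.add(pair)     # open world: infer identity, raise no error
    return same, clashes

# Hypothetical data: one school with two head teacher values.
data = {
    ("ex:school1", "ex:headTeacher", "ex:a_smith"),
    ("ex:school1", "ex:headTeacher", "ex:alice_smith"),
}
same, clashes = check_functional(data, {"ex:headTeacher"})
print("same:", same, "clashes:", clashes)

# Adding an explicit owl:differentFrom turns the inference into an inconsistency.
data_with_difference = data | {("ex:a_smith", "owl:differentFrom", "ex:alice_smith")}
same2, clashes2 = check_functional(data_with_difference, {"ex:headTeacher"})
print("same:", same2, "clashes:", clashes2)
```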
31. Modelling
Spectrum of goals and styles
Lightweight vocabularies
simple modelling
just enough agreement to get useful work done
removing boundaries to enable information to be found and connected
global consistency not possible
a little semantics goes a long way
Rich ontological models
rich domain models
need expressivity
consistency is critical
make complex inferences you can rely on, across data you trust
knowledge is power
32. Modelling
Ontology reuse
invest in complete ontology for a domain
rich but general model, may be modular inside
strong “ontological commitment”
e.g. medical ontologies
reuse small, common, vocabularies
FOAF, SIOC, Dublin Core, Org ...
pick and choose classes and properties you need
fill in a few missing links for your domain
generic reusable vocabularies
Data cube vocabulary
34. schema.org is one of a number of microdata vocabularies
it is a shared collection of microdata schemas for use by webmasters
includes a type hierarchy, like an RDFS schema
starts with top-level Thing and DataType types
properties are inherited by descendant types
Schema.org
35. annotate an item with text-valued properties using the “itemprop” attribute
microdata properties
<div itemscope>
  <p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
<div itemscope>
  <p>Flavors in my favorite ice cream:</p>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>
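To show how a consumer might read these annotations, here is a minimal sketch using Python's standard `html.parser`. It is a simplification, not a conforming microdata processor: it assumes well-nested markup with explicit end tags, no void elements, no nested `itemscope`, and only text-valued properties.

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect text-valued itemprop values for each top-level itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []       # one dict of property name -> list of values per item
        self._stack = []      # kind of each open element: "scope", "prop", or None
        self._current = None  # item currently being filled
        self._prop = None     # [name, text chunks] while inside an itemprop element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self._current = {}
            self.items.append(self._current)
            self._stack.append("scope")
        elif "itemprop" in attrs and self._current is not None:
            self._prop = [attrs["itemprop"], []]
            self._stack.append("prop")
        else:
            self._stack.append(None)

    def handle_data(self, data):
        if self._prop is not None:
            self._prop[1].append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        kind = self._stack.pop()
        if kind == "prop":
            name, chunks = self._prop
            self._current.setdefault(name, []).append("".join(chunks).strip())
            self._prop = None
        elif kind == "scope":
            self._current = None

html = """
<div itemscope>
  <p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
<div itemscope>
  <p>Flavors in my favorite ice cream:</p>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>
"""

parser = MicrodataParser()
parser.feed(html)
print(parser.items)
```

Each `itemscope` becomes one item, and a repeated `itemprop` such as "flavor" yields several values for the same property.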
38. maintains schema.org ↔ RDF mappings
there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
also provides examples, tutorials, and data dumps
Schema.rdfs.org
40. Triple Store & RDB
http://blog.gniewoslaw.pl/2012/11/relational-databases-vs-triple-stores/
41. Storage Solutions for RDF Data
Triple Table (Basic Idea)
Store all RDF triples in a single table
Create indexes on combinations of S, P, and O
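The triple table idea can be tried out with an in-memory SQLite database. This is a minimal sketch: the table name, the index names, and the choice of three index orderings (SPO, POS, OSP) are assumptions for illustration, and real triple stores typically add dictionary encoding and more index permutations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table holds every triple.
conn.execute("CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)")

# Indexes on combinations of S, P, and O so that common lookup patterns can use an index.
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.execute("CREATE INDEX idx_osp ON triples (o, s, p)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ont:School", "rdf:type", "rdfs:Class"),
        ("ont:School", "rdfs:label", "School"),
        ("ont:WelshEstablishment", "rdfs:subClassOf", "ont:School"),
        ("school:401874", "rdf:type", "ont:WelshEstablishment"),
    ],
)

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(clauses) or "1 = 1"
    query = f"SELECT s, p, o FROM triples WHERE {where} ORDER BY s, p, o"
    return conn.execute(query, params).fetchall()

print(match(p="rdf:type"))
print(match(o="ont:School"))
```

A pattern with a bound predicate can be served by `idx_pos`, and one with only the object bound by `idx_osp`, which is why several column orderings are indexed.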