3. Knowledge is power
We have developed our society with
knowledge.
Then
how will we develop society in the digital
era with knowledge?
4. Knowledge is power
Scientia est potentia.
- Sir Francis Bacon
"Pourbus Francis Bacon" by Frans Pourbus the younger - www.lazienki-krolewskie.pl. Licensed under Public domain via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Pourbus_Francis_Bacon.jpg#mediaviewer/File:Pourbus_Francis_Bacon.jpg
5. Knowledge is power in AI
• Edward Feigenbaum
– "father of expert systems"
– Knowledge is power, and the computer is an
amplifier of that power. We are now at the dawn
of a new computer revolution…
Knowledge itself is to become the
new wealth of nations.
"27. Dr. Edward A. Feigenbaum 1994-1997" by United States Air Force - United States Air Force. Licensed under Public domain via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:27._Dr._Edward_A._Feigenbaum_1994-1997.jpg#mediaviewer/File:27._Dr._Edward_A._Feigenbaum_1994-1997.jpg
http://www.computerhistory.org/fellowawards/hall/bios/Edward,Feigenbaum/
6. Knowledge Acquisition Bottleneck
• How can we tell knowledge to computers?
– Knowledge engineers and domain experts work together to
extract knowledge and transform it into a form suitable for computers. But
this is time-consuming, and the result is always insufficient and
incomplete.
• How can we understand knowledge for computers?
– Transformed knowledge is often hard to understand.
• How can we maintain knowledge for computers?
– The real world is changing.
How should the knowledge be adapted, by whom, and how?
7. Knowledge Acquisition Bottleneck
• Solutions – how we can obtain knowledge
– Ontology
• Sharable, sustainable, and formal knowledge about the
world
– Learning
• Learning from the initial knowledge (supervised
learning)
• Learning from the real world (unsupervised learning)
These solutions are still inside the computational world. But what we’ve learnt
from the expert systems issue is that the difficulty lies in the interface
between the computational world and human society.
8. The Web arrives
• The World Wide Web creates an infosphere to which
everyone can contribute information
http://www.flickr.com/photos/rorycellan/8314288381/
http://www.w3.org/2004/Talks/w3c10-HowItAllStarted
9. Semantic Web
Information Management: A Proposal
Tim Berners-Lee, CERN
March 1989, May 1990
Tim Berners-Lee, James Hendler and Ora
Lassila, "The Semantic Web", Scientific
American, May 2001, p. 29-37.
10. Semantic Web
• "The Semantic Web is an extension of
the current web in which information is
given well-defined meaning, better
enabling computers and people to work
in cooperation."
The Semantic Web, Scientific American, May 2001, Tim Berners-Lee, James Hendler
and Ora Lassila
11. Layers of Semantic Web
Tim Berners-Lee http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/
12. Layers of Semantic Web
Tim Berners-Lee http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/
Descriptions on classes
Descriptions on instances
Ontology
Linked Data
• Ontology
– Descriptions on classes
– RDFS, OWL
– Tasks
• Ontology building
– Consistency, comprehensiveness,
logicality
• Alignment of ontologies
13. Layers of Semantic Web
Tim Berners-Lee http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/
• Linked Data
– Descriptions on instances (individuals)
– RDF + (RDFS, OWL)
– Pros for Linked Data
• Easy to write (mainly fact description)
• Easy to link (fact to fact link)
– Cons for Linked Data
• Difficult to describe complex structures
• Still needs class descriptions (-> ontology)
14. Linked Data Principles
• Use URIs as names for things
• Use HTTP URIs so that people can look up
those names.
• When someone looks up a URI, provide useful
information, using the standards (RDF*,
SPARQL)
• Include links to other URIs, so that they can
discover more things.
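The four principles can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `http://example.org/` URIs, the `vocab/` property names, and the `look_up` and `discover` helpers are hypothetical, and a real deployment would serve RDF over HTTP rather than read a Python dictionary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Principles 1 and 2: things are named with HTTP URIs (all URIs are made up).
EX = "http://example.org/"
DATA: Dict[str, List[Triple]] = {
    EX + "work/1": [
        (EX + "work/1", EX + "vocab/creator", EX + "person/7"),
        (EX + "work/1", EX + "vocab/location", EX + "museum/3"),
    ],
    EX + "person/7": [(EX + "person/7", EX + "vocab/name", "Example Artist")],
    EX + "museum/3": [(EX + "museum/3", EX + "vocab/name", "Example Museum")],
}


def look_up(uri: str) -> List[Triple]:
    """Principle 3: looking up a URI returns useful information (here, triples)."""
    return DATA.get(uri, [])


def discover(start: str) -> List[str]:
    """Principle 4: follow links to other URIs, breadth first, to find more things."""
    seen: List[str] = []
    queue = [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for _subject, _predicate, obj in look_up(uri):
            # Only objects that are themselves HTTP URIs can be followed.
            if obj.startswith("http://") and obj not in seen:
                queue.append(obj)
    return seen


if __name__ == "__main__":
    for uri in discover(EX + "work/1"):
        print(uri, len(look_up(uri)), "triples")
```

Starting from one work, the sketch reaches its creator and its museum only because the description of the work links to their URIs.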
20. 570 datasets,
Last updated: 2014-08-30
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
22. LODAC (LOD for Academia) Project 2011-2016
• Collect and publish academic data as LOD
• LODAC Species: Linking species-related data by name
– Sources: Specimen DB, Species Info. DB, Taxon Name DB, GBIF, BioSci. DB, Category DB
– Names: 113,118
– Triples: 14,532,449
• LODAC Museum: Collecting and linking museum data
[Figure: raw data for entities from Source A and Source B point, via dc:references, to integrated data that holds the minimum data needed to identify entities (Work, Museum, Creator), connected by dc:creator and crm:P55_has_current_location]
• Related: Query expansion App., CKAN (Japanese): Dataset registry, DBPedia Japanese
23. LODAC Museum
• Purpose
– Enable creation, publishing, sharing and reuse of collection information
distributed across museums by introducing LOD.
– Enable unique identification of resources such as works, creators, and
institutions, and of the relations between them, on the web
• Activities
– Integrate and share collection data aggregated from data sources as RDF.
– Provide applications using generated LOD.
• Data sources
– Collection data obtained from websites of 114 museums.
– The Database of Japan Arts Thesaurus
– The database of government-designated cultural property
– Cultural Heritage Online
[Figure: Work, Creator, and Institution resources; over 40 million triples]
25. Yokohama Art Spot
• Provides information on art around Yokohama.
– A good example of how efforts by local
people can be rewarded through flexible use of the
provided data.
LODAC Museum × Yokohama Art LOD × PinQA
(Museum Collection, Local Event Information, Q&A)
[Figure: a user accesses Yokohama Art Spot (HTML, JavaScript, Python, SPARQLWrapper), which queries three RDF stores through their SPARQL endpoints: LODAC Museum (OWLIM SE), Yokohama Art LOD (ARC2), and PinQA. Resource types shown: artwork, institution, creator, event, question, answer, user. Linking property shown: ical:location.]
[COLD12] F. Matsumura, I. Kobayashi, F. Kato, T. Kamura, I. Ohmukai and H. Takeda: Producing and
Consuming Linked Open Data on Art with a Local Community, J. F. Sequeda, A. Harth and O.
Hartig eds., Proceedings of the Third International Workshop on Consuming Linked Data
(COLD 2012) (2012), CEUR Workshop Proceedings Vol-905.
27. LODAC Species: Interlinking species data
• Taxon names: 443,248
• Scientific name: 226,141
• Common name: 219,865
• hasScientificName property node: 87,160
• hasCommonName property node: 84,610
Y. Minami, H. Takeda, F. Kato, I. Ohmukai, N. Arai, U. Jinbo, M. Ito, S.
Kobayashi and S. Kawamoto: Towards a Data Hub for Biodiversity with
LOD, H. Takeda, Y. Qu, R. Mizoguchi and Y. Kitamura eds., Semantic
Technology - Second Joint International Conference, JIST 2012, Nara,
Japan, December 2-4, 2012. Proceedings, Vol 7774 of LNCS, pp 356–361,
Springer (2013).
• Integrating species databases as linked data
[JIST12]
[Figure: data model. Classes (owl:Class): Specimen, TaxonName, ScientificName, CommonName, TaxonRank; ScientificName and CommonName are linked to TaxonName by rdfs:subClassOf. Specimen properties: species, institutionName, collectedDate, collectionLocality, crm:has_current_location. Name properties: hasCommonName, hasScientificName, hasSuperTaxon, hasTaxonRank (for example, species). Instances are connected to classes by rdf:type. Named graphs for the data sources (Bryophytes, Butterfly, BDLS) carry dcterms:source and dcterms:publisher.]
29. An Application:
Query expansion for paper search
• Input: species name
• Output:
– Papers that include the species name
– Papers that include the common name
– Papers that include species of the same genus
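The expansion described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `COMMON_NAMES` table, the helper names, and the sample papers are hypothetical, and the actual application would obtain names from the LODAC Species data rather than from a hard-coded dictionary.

```python
from typing import Dict, List

# Hypothetical name data: scientific name -> common names.
COMMON_NAMES: Dict[str, List[str]] = {
    "Icterus galbula": ["Baltimore Oriole"],
    "Icterus bullockii": ["Bullock's Oriole"],
    "Passer montanus": ["Eurasian Tree Sparrow"],
}


def genus_of(scientific_name: str) -> str:
    """The genus is the first word of a binomial scientific name."""
    return scientific_name.split()[0]


def expand_query(scientific_name: str) -> Dict[str, List[str]]:
    """Expand one species name into the three groups of search terms."""
    same_genus = sorted(
        name
        for name in COMMON_NAMES
        if genus_of(name) == genus_of(scientific_name) and name != scientific_name
    )
    return {
        "species_name": [scientific_name],
        "common_names": list(COMMON_NAMES.get(scientific_name, [])),
        "same_genus_species": same_genus,
    }


def search(papers: Dict[str, str], terms: List[str]) -> List[str]:
    """Return the ids of papers whose text mentions any of the terms."""
    lowered = [term.lower() for term in terms]
    return sorted(
        paper_id
        for paper_id, text in papers.items()
        if any(term in text.lower() for term in lowered)
    )


if __name__ == "__main__":
    papers = {
        "p1": "A study of Icterus galbula nesting",
        "p2": "Baltimore Oriole migration",
        "p3": "Icterus bullockii diet",
    }
    expanded = expand_query("Icterus galbula")
    all_terms = [term for group in expanded.values() for term in group]
    print(search(papers, all_terms))
```

Searching with the species name alone would find only the first paper; the expanded terms also reach papers indexed under the common name or a sibling species.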
40. Our Society (real world)
Computational World
We’ve just dealt with knowledge fitted to the computational world
41. Three challenges to fill the gap
• Representation of Scientific Names
– Knowledge revision
• Agriculture Ontology
– Integration of domain specific terms
• Core Vocabulary
– Integration of terms across domains
43. Dynamics of Scientific Name
• A scientific name looks unique, but more precisely it is
unique only with respect to current knowledge
– Scientific names change over time as a result of new
scientific discoveries
– Information on a species is described with the names in use at
some point in time (not always the present)
• How can we represent information under knowledge
revision?
44. Northern Oriole
These birds are found in the Nearctic in
summer, primarily the eastern United
States.
Challenge
52. Event-Centric Model for Taxon Revision
- case: merge of two families -
• At time t1, Buidae is merged into Auidae.
[Figure: Event-Centric Model. Operations: ex:merge1 (rdf:type ltk:TaxonMerger) and ex:reclass1 (rdf:type ltk:ChangeHigherTaxon). Event: ex:event1, with a cka:interval from “t1” (tl:beginsAtDateTime) to “t2” (tl:endsAtDateTime). Taxon concepts connected through cka:effect: ex:Auidae_1, ex:Buidae_1, ex:Auidae_2, ex:Xus_1. Node markers: (OPR), (opr), (con), (event).]
• Different URIs: each URI is a URI for a taxon concept
• Taxon concept = Taxon + Synonym
54. Generating simpler descriptions
- From Event-centric model to Snapshot model -
• Just show the current names
[Figure: a rule derives the Snapshot Model from the Event-Centric Model. Event-Centric Model: ex:reclass1 (rdf:type ltk:ChangeHigherTaxon, which is rdfs:subClassOf cka:RelationshipEvolution) and ex:event1, with a cka:interval from “t1” (tl:beginsAtDateTime) to “t2” (tl:endsAtDateTime), involving ex:Xus_1, ex:Buidae_1, and ex:Auidae_2. Other properties shown: cka:relation, cka:assures, ltk:higherTaxon. Snapshot Model: a named graph whose name is ex:inv1, with tl:beginsAtDateTime “t1” and tl:endsAtDateTime “t2”, containing ltk:higherTaxon statements between taxon-concept URIs. Node markers: (OPR), (opr), (event), (con).]
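The step from events to a snapshot can be sketched as replaying revision events up to a chosen time. This is a simplified illustration, not the rule used in the project: the `ChangeHigherTaxon` dataclass, the integer time stamps, and the assumption that genus Xus originally belongs to Buidae are all introduced here for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeHigherTaxon:
    """One revision event: from `time` on, `taxon` is placed under `new_parent`."""
    time: int
    taxon: str
    new_parent: str


def snapshot(initial: Dict[str, str],
             events: List[ChangeHigherTaxon],
             at: int) -> Dict[str, str]:
    """Replay all events with time <= `at` and return taxon -> higher taxon."""
    current = dict(initial)
    for event in sorted(events, key=lambda e: e.time):
        if event.time <= at:
            current[event.taxon] = event.new_parent
    return current


if __name__ == "__main__":
    # Assumed starting point: genus Xus belongs to family Buidae.
    initial = {"Xus": "Buidae"}
    # At t1 (encoded as 1) Buidae is merged into Auidae, so Xus moves to Auidae.
    events = [ChangeHigherTaxon(time=1, taxon="Xus", new_parent="Auidae")]
    print("before t1:", snapshot(initial, events, at=0))
    print("from t1:  ", snapshot(initial, events, at=1))
```

The event list keeps the full history, while each snapshot shows only the names valid at one time, which is the simpler description the slide asks for.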
59. Standardization of Agricultural Activities
Background
Agricultural IT systems are widely adopted to manage and record activities
in the fields efficiently. Interoperability among these systems is needed to
integrate and analyze such records to improve the productivity of agriculture.
Issues
Data in agricultural IT systems is not easy to federate and integrate
because of the variety of terms used. This prevents federation and
integration of these systems and their data.
Purpose
To provide a standard vocabulary by defining an ontology for agricultural
activities.
Examples of term variety
http://www.toukei.maff.go.jp/dijest/kome/kome05/kome05.html
しろかき: “Puddling”
砕土: “Pulverization”
代かき: “Puddling”
代掻き: “Puddling”
代掻き作業: “Puddling Activity”
荒代(かじり): “Coarse puddling”
荒代かき: “Coarse puddling”
整地: “Land grading”
均平化: “Land leveling”
60. AGROVOC
Thesaurus
• AGROVOC is the best-known vocabulary in agriculture. It is supervised by the
Food and Agriculture Organization (FAO) and is a thesaurus containing
more than 32,000 terms from agriculture, fisheries, food, environment and
other related fields.
• AGROVOC organizes words by synonym, narrower/broader, and related
relationships.
– Example: harvesting > topping (beets), baling, gleaning, mechanical harvesting, mowing, . . .
• The narrower/broader relationship is not clearly defined, so
relationships between sibling words are often mixed and
misunderstood.
• The number of activity names for rice farming, which is important in
Asia including Japan, is insufficient.
61. Lessons learnt – What should be considered
Define hierarchy clearly
• A hierarchy is convenient for humans to understand and for computers to
process. But it is often confused by mixing different criteria for the relationships
among concepts/words. This causes difficulty when adding new concepts/words
and when integrating different hierarchies.
• Define the relationship between upper
and lower concepts clearly as the basis of classification
Accept various synonymous words
• A single concept may have multiple names by region and by crop
• Clarify an entry word and its synonyms for each concept
[Figure: Thesaurus (AGROVOC): harvesting > topping (beets), baling, gleaning, mechanical harvesting, mowing, . . . with an unclear relationship between siblings. Ontology: harvesting > mechanical harvesting, manual harvesting, . . . Each subconcept inherits Harvest ([Act]) and adds a value for [means] (byMachine, manually). Representation: ”Harvesting”.]
62. Define activity concepts
Designing of Agricultural Activity Ontology(AAO)
• Define activities with properties and their values
– “Seeding”: activity to sow seeds on fields for seed propagation.
– Purpose : seed propagation
– Place : field
– Target : seed
– Act : sow
• Define hierarchy
– The hierarchy of activities is organized by property
– New properties and their values are added:
“purpose”, “act”, “target”, “place”, “means”, “equipment”, “season”,
and “crop” in order.
– Property values are specialized
63. Formalization by Description Logics
Agricultural activity
> Crop production activity (purpose: crop production)
> Crop growth activity (purpose: crop growth)
> Activity for control of propagation (purpose: control of propagation)
> Activity for seed propagation (purpose: seed propagation)
> Seeding (act: sow, target: seed, place: field)
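One possible reading of the Seeding definition in description-logic notation is given below. It is a sketch under assumptions: each property is modelled as an existential restriction, and the concept and role names are transliterations of the slide labels rather than the identifiers actually used in AAO.

```latex
\begin{align*}
\mathit{ActivityForSeedPropagation} &\sqsubseteq
  \mathit{ActivityForControlOfPropagation}
  \sqcap \exists\, \mathit{purpose}.\mathit{SeedPropagation} \\
\mathit{Seeding} &\equiv
  \mathit{ActivityForSeedPropagation}
  \sqcap \exists\, \mathit{act}.\mathit{Sow}
  \sqcap \exists\, \mathit{target}.\mathit{Seed}
  \sqcap \exists\, \mathit{place}.\mathit{Field}
\end{align*}
```

Under this reading, any activity described with the same act, target, and place under the same purpose would be classified as Seeding, and a more specific place (for example, a kind of field) would yield a subconcept.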
64. Differentiate concepts by property
Agricultural activity >…> Activity for seed propagation > Seeding
• Seeding
– purpose : seed propagation
– place : field
– target : seed
– act : sow
• Direct sowing of rice on well-drained paddy field (well-drained paddy field < field)
– purpose : seed propagation
– place : well-drained paddy field
– target : seed
– act : sow
– crop : rice
• Direct seeding in flooded paddy field (paddy field < field)
– purpose : seed propagation
– place : paddy field
– target : seed
– act : sow
– crop : rice
65. The Structuralization of the Agricultural Activities (Protégé)
[Screenshot: Activity for seeding, shown with Direct seeding in flooded paddy field, Direct sowing of rice on well-drained paddy field, and Seeding on nursery box]
66. Polysemic concepts
• Definition of agricultural activities with multiple purposes or other
properties.
• Example: Puddling
– purpose : pulverization
– purpose : water retention
– purpose : land leveling
[Figure: Puddling has a polysemic relationship to Pulverization, Water retention, and Land leveling. Other concepts shown: Land preparation, Subsoil breaking, Pulverization by harrow, Activity for water management. The definition is shown in [disjunction form] and [conjunction form].]
67. Polysemic concepts (Protégé)
[Screenshot: Puddling shown with Water retention, Land leveling, and Pulverization]
68. Reasoning by Ontology
Reasoning by Agriculture Activity Ontology
• Infer the most feasible upper concept for the given constraints of a new word
• Example of「Making scarecrow」
– purpose (0,1) : suppression of pest animals
– act (0,1) : build
– target (0,1) : scarecrow
– means (0,1) : physical means
– upper concept : ?
[Figure: existing hierarchy. Activity for biotic control (purpose (0,1): biotic control) > Activity for suppression of pest animals (purpose (0,1): control of pest animals) > Activity for suppression of pest animals by physical means (means (0,1): physical means) and Activity for suppression of pest animals by chemical means (means (0,1): chemical means)]
69. Reasoning by Ontology
Reasoning by Agriculture Activity Ontology
• Infer the most feasible upper concept for the given constraints of a new word
• Inference with SWCLOS [1]
– Input: かかし作り (Making scarecrow) with act (0,1): make, target (0,1): scarecrow, means (0,1): 物理的手段 (physical means)
– Result: Making scarecrow is a subclass of Activity for
suppression of pest animals by physical means
[Figure: the same hierarchy of Activity for biotic control > Activity for suppression of pest animals > Activity for suppression of pest animals by physical means / by chemical means, with Making scarecrow placed in it]
[1] Seiji Koide, Theory and Implementation of Object Oriented Semantic Web Language,
PhD Thesis, Graduate University for Advanced Studies, 2011
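The kind of inference shown on these slides can be approximated with a small structural subsumption check. This is a hand-written sketch and not SWCLOS: the `ONTOLOGY` and `VALUE_PARENT` tables, the function names, and the rule "a concept subsumes a description if every inherited constraint is met by an equal or more specific value" are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Concept -> (parent concept, constraints added at this level). Structure is assumed.
ONTOLOGY: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {
    "Activity for biotic control": (None, {"purpose": "biotic control"}),
    "Activity for suppression of pest animals": (
        "Activity for biotic control",
        {"purpose": "suppression of pest animals"},
    ),
    "Activity for suppression of pest animals by physical means": (
        "Activity for suppression of pest animals",
        {"means": "physical means"},
    ),
    "Activity for suppression of pest animals by chemical means": (
        "Activity for suppression of pest animals",
        {"means": "chemical means"},
    ),
}

# Specialization between property values (assumed for this example).
VALUE_PARENT: Dict[str, str] = {"suppression of pest animals": "biotic control"}


def is_a(value: Optional[str], ancestor: str) -> bool:
    """True if `value` equals `ancestor` or specializes it."""
    while value is not None:
        if value == ancestor:
            return True
        value = VALUE_PARENT.get(value)
    return False


def constraints(concept: str) -> Dict[str, str]:
    """All constraints of a concept, with inherited values overridden by its own."""
    parent, own = ONTOLOGY[concept]
    merged = constraints(parent) if parent else {}
    merged.update(own)
    return merged


def depth(concept: str) -> int:
    parent, _ = ONTOLOGY[concept]
    return 0 if parent is None else 1 + depth(parent)


def subsumes(concept: str, description: Dict[str, str]) -> bool:
    return all(
        prop in description and is_a(description[prop], value)
        for prop, value in constraints(concept).items()
    )


def most_specific_parent(description: Dict[str, str]) -> Optional[str]:
    """Return the deepest concept whose constraints the description satisfies."""
    candidates = [c for c in ONTOLOGY if subsumes(c, description)]
    return max(candidates, key=depth) if candidates else None


if __name__ == "__main__":
    making_scarecrow = {
        "purpose": "suppression of pest animals",
        "act": "build",
        "target": "scarecrow",
        "means": "physical means",
    }
    print(most_specific_parent(making_scarecrow))
```

The description of Making scarecrow satisfies three concepts along one branch, and the deepest of them is chosen as its upper concept, which matches the result stated on the slide.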
71. Web Services based on Agriculture Activity Ontology
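Converting synonyms to core vocabulary
http://www.tanbo-kubota.co.jp/foods/watching/14_2.html
[Figure: a system sends text containing synonyms (“Puddling Activity and sowing…”) to the converting API. The API uses AAO to map “Puddling Activity” to Puddling and “sowing” to Seeding, and the receiving system gets the core vocabulary (“Puddling and seeding…”).]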
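The conversion step can be sketched as a dictionary lookup over known labels. This is not the actual web service: the `SYNONYMS` table and the `to_core_vocabulary` function are hypothetical, and a real service would read entry words and synonyms from AAO.

```python
import re
from typing import Dict

# Hypothetical synonym table: lower-cased label -> core vocabulary term.
SYNONYMS: Dict[str, str] = {
    "puddling activity": "Puddling",
    "puddling": "Puddling",
    "sowing": "Seeding",
    "seeding": "Seeding",
}


def to_core_vocabulary(text: str, synonyms: Dict[str, str] = SYNONYMS) -> str:
    """Replace every known label in `text` with its core vocabulary term."""
    # Try longer labels first so that "puddling activity" wins over "puddling".
    labels = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: synonyms[match.group(0).lower()], text)


if __name__ == "__main__":
    print(to_core_vocabulary("Puddling Activity and sowing"))
```

Matching whole labels, longest first, avoids turning “Puddling Activity” into “Puddling Activity” with only its first word replaced.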
72. Agriculture Activity Ontology(AAO): Summary
Common Agricultural VOCabulary: http://cavoc.org/
Agriculture Activity Ontology (AAO) ver 1.31: http://cavoc.org/aao/
• Standardizes the vocabulary for agricultural activities with a logical
model
• Defines concepts of agricultural activities beyond
– Conceptual variety (often dependent on crop and farm style)
– Linguistic diversity (often dependent on crop and area)
• Adopted as part of ”the guideline for agriculture activity names
for agriculture IT systems” issued by the Ministry of Agriculture, Forestry
and Fisheries (MAFF), Japan, in 2016
77. Information needed to register a new corporation
Managed by multiple agencies
Different formats
Lack of linkage
78. Public Vocabulary Framework project
- Infrastructure for Multilayer Interoperability (IMI) -
• Sharing terms
– among administration units
– among administration units and companies
– among administration units, companies and users
[Figure: governments, local governments, companies, and users exchange records whose fields overlap but are named differently, for example Product Name, Product Code, Code, Name, Address, Organization, Maker, Buyer Name, Price, Purchase Date]
79. Public Vocabulary Framework project
- Infrastructure for Multilayer Interoperability (IMI) -
• A framework that enables exchange of data by sharing primary
vocabulary.
– Provide basic common concepts
• A core and domains
• Extensible vocabulary (application vocabularies)
– For Open data and data exchanges between systems
• RDF, XML, and texts
[Figure: IMI in a layered view. Layers shown: Applications; Vocabulary; Share, Exchange, Storage (Format); Citizen ID, Enterprise ID, Character-set]
80. Vocabulary structure of IMI
• IMI consists of core vocabulary, cross domain vocabulary and
domain-specific vocabularies.
– Core Vocabulary: universal vocabularies that are widely used
in any domain. E.g. people, object, place, date.
– Domain-specific Vocabularies: vocabularies that are specialised for
use in each domain. E.g. number of beds, schedule.
[Figure: Core Vocabulary at the center, surrounded by domain-specific vocabularies (Geographical Space/Facilities, Transportation, Disaster Prevention, Finance) with example terms Shelter, Location, Hospital, Station, Disaster Restoration Cost]
82. Image of IMI vocabulary
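• A vocabulary set and an Information Exchange
Package are defined in a trial area.
Sample 1 : Definition of vocabulary
Columns in the original sheet: item name (項目名), English name (英語名), data type (データタイプ), description (項目説明), description in English (項目説明(英語)), keyword (キーワード), sample value (サンプル値), Usage Info
Item name | English name | Data type | Description (English) | Sample value | Usage Info
人 (Person) | PersonType | | | |
氏名 (Name) | PersonName | PersonNameType | Name of a Person | - |
性別 (Gender) | Gender | <abstract element, no type> | Gender of a Person | - | Substitutable Elements: GenderCode, GenderText
性別コード (Gender code) | GenderCode | CodeType | Gender of a Person | 1 | Cited from APPLIC Standard Specification V2.3, data list, Basic Resident Register: Gender
性別名 (Gender name) | GenderText | TextType | Gender of a Person | 男 (male) |
現住所 (Present address) | PresentAddress | AddressType | | - |
本籍 (Registered domicile) | | AddressType | | - |
… | … | … | … | … | …
Item name (Type/Sub-properties) | English name | Data type
氏名 (Name) | PersonNameType |
氏名 (Full name) | FullName | TextType
フリガナ (Phonetic reading) | | TextType
姓 (Family name) | FamilyName | TextType
カナ姓 (Family name in kana) | | TextType
… | … | …
Sample 2 : Information Exchange Package
[Figure: AED package. Items shown: Location (Address, LocationTwoDimensionalGeographicCoordinate); Equipment Information (Spot of Equipment, Business Hours, Owner, Access, Availability, User, Day of Installation, Homepage); AED Information (Type of Pad, Expiry date, Contact, Type, Model Number, SerialNumber, Photo, Note); Information Source]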
83. Adoption by (local) Governments
• Ministry of Economy, Trade and Industry (METI):
Corporate Information Portal
• Local Governments:
– Mori Town, Yakumo Town [Hokkaido]
– Hirono Town [Iwate]
– Ishinomaki [Miyagi]
– Ota City [Gunma]
– Kawaguchi City [Saitama]
– Kanazawa-Ward (Yokohama City) [Kanagawa]
– Shizuoka City [Shizuoka]
– Tsuruga City [Fukui]
– Osaka City [Osaka]
– Oku-izumo Town, Yasugi City [Shimane]
– Tokushima Pref., Awa City [Tokushima]
– Ube City [Yamaguchi]
– …
84. Corporate Information portal website
[Figure: the portal gathers data on corporations from government registers and applications by using an IMI-based data structure. Search items shown: Corporate number, Corporate Name, Corporation Type, Area, Resource.]
85. Benefit of the website
[Figure: the website acts as a knowledge base for all government departments and offers open data (CSV, PDF, RDF) and an API to other websites and new services]
86. Adoption by Corporate Information Portal
• This website uses the IMI core vocabulary, a national standard vocabulary
project for interoperability.
• The IMI defines basic data items (Name, Address, Corporation, Facility, ...).
• Corporate Information Domain Vocabulary (hj:), which refers to and enhances the IMI Core Vocabulary
– hj:Corporate information Type: corporateBusinessinfo, corporateActivityInfo
– hj:Corporate business information Type: name(en), codeOfIndustry, objectiveOfBusiness, abstractOfBusiness, areaOfBusiness, stakeholder, majorStockHolder, financialInformation, ・・・
– hj:Address Type: adressNumber
– hj:Stock holder Type: noOfStock, holder, ratio
– hj:Subsidy Type: ・・・
– hj:Award Type: ・・・
– hj:Certification Type: ・・・
– hj:Contact Type: ・・・
– hj:Note Type: typeOfNote, memo
– hj:Corporate activity Type: dateOfCertification, title, category, block, area, type, target, reason, amount, status, period, note
• IMI Core Vocabulary (ic:)
– ic:Organization Type: ID, name, abbreviation, alternativeName, status, abstract, contactInformation, relatedOrganization, place, address, representative, dateOfEstablishment, additionalInformation
– ic:Business unit Type: businessDomain, startDateOfFy, noOfMember, agent
– ic:Corporation Type: positionOfOrgtype, organizationType, capiltal, noOfEmployee
– ic:Address Type: ・・・
87. Public Vocabulary Framework project
- Infrastructure for Multilayer Interoperability (IMI) -
• Towards interoperability beyond regions
– Community of Practice on Core Data Models
• Sharing good practice
• Mapping between core vocabularies
• DG Informatics (EC)
• IMI (Japan)
• NIEM (USA)
[Figure: NIEM, ISA (JoinUp), UN/CEFACT, IMI]
90. Our Society (real world)
Computational World
New Technical development
Challenge #1
91. Our Society (real world)
Computational World
Forming new knowledge
Challenge #2
92. Our Society (real world)
Computational World
Forming Structure in Society
Challenge #3
93. Lessons learnt from the challenges
The challenges are
not just in the computational world
rather
between the computational and the real worlds
even
in the real world
We must be socio-computer scientists
94. Summary
Semantic Web created the first step for
knowledge representation in the computer
world
But the computational world alone is not
enough. We should commit to (or even change)
both the computational and the real world to realize a real
“knowledge is power” world. In order to do so,
we must work with people in our society.
However, linked taxonomic names are not enough. A synonym may lead to incorrect knowledge if the reader doesn’t know the background of that synonym. Sometimes a synonym comes from a change of taxonomic classification.
For example, consider the case of the species Icterus galbula (Linnaeus, 1758) and Icterus bullockii (Swainson, 1827), which have been merged and split many times.
Icterus galbula has been known since 1758.
Icterus bullockii has been known since 1827.
The Baltimore Oriole, of the eastern United States,
and Bullock’s Oriole, living in the western US,
were merged into one species, the Northern Oriole, in the 1960s.
The merge was based on the fact that they interbreed often and produce fertile offspring.
Because “galbula” is the older name, it became the accepted name.
So these names are synonyms.
I. galbula is the senior synonym, whereas I. bullockii is the junior synonym.
Of course, the knowledge attached to these names must be combined.
Moreover, from that day on, researchers who discovered new knowledge about this bird would record it along with this name.
A problem with this lump was that Bullock’s Orioles are more closely related to other species of orioles than to
Baltimore Orioles.
So the name was split again into “galbula” and “bullockii”.
If we need to find information on “galbula”, we can query by this name.
However, some information from 1960 onward includes knowledge of “bullockii”.
On the other hand,
some information about “bullockii” is missing, because some knowledge between 1960 and 1995 is recorded under the name “galbula”.
Therefore, the correct temporal context of concepts and the reasons for their changes are necessary for understanding a taxon concept.
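The need for temporal context can be sketched as follows. This is an illustration only: the `NameUsage` and `Record` dataclasses, the `USAGES` table, and the use of 1960 and 1995 as exact boundary years are assumptions based on the dates mentioned in the notes above, not data from the project.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NameUsage:
    """A period in which `name` referred to the listed taxon concepts."""
    name: str
    start: int  # first year, inclusive
    end: int    # last year, exclusive
    concepts: Tuple[str, ...]


@dataclass(frozen=True)
class Record:
    """A piece of information recorded under a name in a given year."""
    name: str
    year: int
    note: str


# Assumed timeline: merged from 1960, split again from 1995.
USAGES: List[NameUsage] = [
    NameUsage("galbula", 1758, 1960, ("Baltimore Oriole",)),
    NameUsage("bullockii", 1827, 1960, ("Bullock's Oriole",)),
    NameUsage("galbula", 1960, 1995, ("Baltimore Oriole", "Bullock's Oriole")),
    NameUsage("galbula", 1995, 9999, ("Baltimore Oriole",)),
    NameUsage("bullockii", 1995, 9999, ("Bullock's Oriole",)),
]


def concepts_for(record: Record) -> Tuple[str, ...]:
    """Taxon concepts the record's name could refer to in the record's year."""
    for usage in USAGES:
        if usage.name == record.name and usage.start <= record.year < usage.end:
            return usage.concepts
    return ()


def records_possibly_about(concept: str, records: List[Record]) -> List[Record]:
    """Records that may describe `concept` once the year of recording is considered."""
    return [record for record in records if concept in concepts_for(record)]


if __name__ == "__main__":
    records = [
        Record("galbula", 1950, "eastern nesting sites"),
        Record("galbula", 1975, "observation in the western US"),
        Record("bullockii", 2000, "diet study"),
    ]
    for record in records_possibly_about("Bullock's Oriole", records):
        print(record.year, record.name, record.note)
```

A query by the name “bullockii” alone would miss the 1975 record; taking the year into account recovers it as a record that may concern Bullock’s Oriole.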