2013 Linked Data in Practice Workshop (LDPW2013), 30 November 2013

Building DBpedia Japanese and
Linked Data Cloud in Japanese

Fumihiro Kato, Hideaki Takeda, Seiji Koide, Ikki Ohmukai
{fumi, takeda, koide, i2k}@nii.ac.jp
National Institute of Informatics (NII)
Research Organization of Information and Systems (ROIS)
Graduate University for Advanced Studies (Sokendai)
Two Driving Forces to push LOD in Japan
• LOD for ACademia (LODAC) Project, since 2010
– A research project at ROIS and NII
– Research on Linked Data for research

• Linked Open Data Initiative, Inc. (LODI), since 2012
– Non-profit organization
– Promotion of LOD in Japan
– Collaboration with various stakeholders
• Government, public sector, companies

• The members of the two groups largely overlap
LODAC Project: connecting academic data
• LODAC Location: integration of location information
• LODAC SPECIES: connecting species data by name
– Diagram components: specimen DB, species information DB, research DB, GBIF, taxon name DB, bioscience DB, application for query expansion, DBpedia Japanese
– No. of names: 113,118
– No. of triples: 14,532,449
• LODAC Museum: LOD of data in museums
– Raw data for entities (from Source A and Source B), minimum data to identify entities, and integrated data for entities
– Entities: Work, Creator, Museum
– Properties: dc:creator, dc:references, crm:P55_has_current_location
• CKAN Japanese: catalog for Open Data
LODAC Museum
• Integrated database for information on museums in Japan
– Data
• No. of museums: 114
• No. of triples: 40,059,131

• Type of information: RDF type, no. of items
– Collections (total): lodac:Specimen + lodac:Work, ca. 1,770,000
– Collections (specimen): lodac:Specimen, ca. 1,690,000
– Collections (creative and historical work): lodac:Work, ca. 130,000
– Creators: foaf:Person, ca. 8,800
– Institutes: foaf:Organization, ca. 200,000

• Integration by creator, work and institute
• Data publication in RDF
• Some applications using the data
Use: Yokohama Art Spot
• LODAC Museum × Yokohama Art LOD × PinQA
– Application using museum and local data
– Data related to art in Yokohama
• Collections
• Events
• Q&A
• http://lod.ac/apps/yas/
LODAC SPECIES: Linking Species Information with names
• Diagram: Taxon Name LOD connected to the museum specimen DB, species information DB, research DB, GBIF, and bioscience DB
• No. of species names: 113,118
• No. of triples: 14,532,449
Search application
with LODAC SPECIES

http://lod.ac/apps/lsdcs
Specified Non-profit Corporation

Linked Open Data Initiative, Inc.
Prospectus
• LOD is becoming part of the infrastructure of our society
– Its impact is similar to the impact the Web has had on society
– LOD helps our society mature and diversify

• We want to spread LOD more widely in Japan!
– For governments (central and local)
– For companies
– For citizens

• How?
– By researchers, engineers, and citizens working together
Projects
• Platforms
– CKAN Japanese
– DBpedia Japanese

• Collaborative Projects
– with Ministry of Economy, Trade and Industry (METI)
• Open Data METI

– with National Statistics Center
• Scheme Design for Area Code

– Collaboration with Sabae City
• e.g., “Sabae Burari”

• Promotional Events
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
provided by NDL
Motivation
• Data hub for Japanese resources
– To promote LOD in Japan
– To connect datasets in Japanese

• Two linguistic datasets
– DBpedia Japanese
– RDFized Japanese WordNet
DBpedia Japanese
• DBpedia i18n project
– 14 chapters

• Generated from Japanese Wikipedia dump files
– DIEF (DBpedia Information Extraction Framework)
– About 80 million triples

• Linking to
– Japanese WordNet
– Japanese Wikipedia Ontology
– other DBpedia chapters

• http://ja.dbpedia.org
i18n/l10n efforts
• IRI, IRI, IRI, ...
• Configurations for Extractors and Parsers
• DBpedia Mappings for each chapter
Extraction process

ref: D. Kontokostas et al. "Internationalization of Linked Data. The case of the Greek DBpedia edition."
Journal of Web Semantics: Science, Services and Agents on the World Wide Web, vol. 15, No.3, Sep. 2012, pp.51-61
DBpedia Information Extraction Framework
• Software to extract data from Wikipedia dumps
– includes custom extractors/parsers that apply language-specific configurations

• Extractors / Parsers
– DisambiguationExtractor
– HomepageExtractor
– ImageExtractor
– PersondataExtractor
DisambiguationExtractor
• "ja" -> "(曖昧さ回避)"
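As a rough illustration of what this per-language setting enables, the sketch below checks whether a page title carries the Japanese disambiguation marker. It is only a minimal sketch: the dictionary name and the function are hypothetical, and how DIEF itself consumes the setting is not shown on the slide.

```python
# Hypothetical per-language configuration, mirroring the slide's "ja" entry.
DISAMBIGUATION_SUFFIX = {"ja": "(曖昧さ回避)"}


def is_disambiguation_title(title: str, lang: str) -> bool:
    """Return True if the title ends with the language's disambiguation marker."""
    suffix = DISAMBIGUATION_SUFFIX.get(lang)
    # Languages without a configured marker never match.
    return suffix is not None and title.strip().endswith(suffix)


print(is_disambiguation_title("東京 (曖昧さ回避)", "ja"))
print(is_disambiguation_title("東京", "ja"))
```

A language without an entry in the dictionary simply never matches, which is why each chapter has to supply its own marker.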
HomepageExtractor
• propertyNamesMap
– "ja" -> Set("homepage", "website", " ", " ", "Web サイト", "Webサイト")

• externalLinkSectionsMap
– "ja" -> "外部リンク"

• officialMap
– "ja" -> "公式"
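The idea behind propertyNamesMap can be sketched as a lookup of infobox keys against a per-language set of names. This is a simplified sketch, not DIEF's code: the function and variable names are hypothetical, only the property names legible on the slide are used, and the sample infobox is invented for illustration.

```python
# Property names legible on the slide for "ja"; compared case-insensitively here,
# which is an assumption of this sketch.
PROPERTY_NAMES = {"ja": {"homepage", "website", "Web サイト", "Webサイト"}}


def find_homepage(infobox: dict, lang: str):
    """Return the first non-empty value whose key is a configured homepage property."""
    names = {name.lower() for name in PROPERTY_NAMES.get(lang, set())}
    for key, value in infobox.items():
        if key.strip().lower() in names and value.strip():
            return value.strip()
    return None


# Hypothetical infobox content, for illustration only.
sample = {"名前": "例", "Webサイト": "http://example.org/"}
print(find_homepage(sample, "ja"))
```

The externalLinkSectionsMap and officialMap settings would play a similar role when the homepage appears in an external links section rather than in an infobox.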
ImageExtractor
• "ja" -> """(?i)\{\{\s?(Non free|Non-free pubart)\s?\}\}""".r
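The pattern flags images whose description carries a non-free license template. The sketch below is a Python equivalent under one stated assumption: the backslash escapes that appear to be missing from the pattern as printed are restored as `\{`, `\s` and `\}`. The function name is hypothetical.

```python
import re

# Assumed reconstruction of the slide's pattern with regex escapes restored.
NON_FREE_TEMPLATE = re.compile(
    r"\{\{\s?(Non free|Non-free pubart)\s?\}\}", re.IGNORECASE
)


def has_non_free_template(wikitext: str) -> bool:
    """Return True if the wikitext contains one of the non-free templates."""
    return NON_FREE_TEMPLATE.search(wikitext) is not None


print(has_non_free_template("{{Non free}}"))
print(has_non_free_template("{{Infobox}}"))
```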
PersondataExtractor
• Names of templates for personal information
• “名前” (name)
• “別名” (alias)
• “概要” (abstract)
• dates and places for birth and death
Extracted triples after configurations
• Type: no. of triples
– disambiguation: 106,386
– homepages: 49,355
– images: 843,170
– persondata: 1,811
Image of Infobox Extraction
• Template
• Mapping Infobox to ontology (used for extraction process)
• Data Extraction
{{TemplateMapping
| mapToClass = ComicsCreator
| mappings =
{{PropertyMapping | templateProperty = 名前 | ontologyProperty = foaf:name }}
{{PropertyMapping | templateProperty = 本名 | ontologyProperty = foaf:name }}
{{PropertyMapping | templateProperty = 生年 | ontologyProperty = birthYear }}
{{PropertyMapping | templateProperty = 生地 | ontologyProperty = birthPlace }}
{{PropertyMapping | templateProperty = 没年 | ontologyProperty = deathYear }}
{{PropertyMapping | templateProperty = 没地 | ontologyProperty = deathPlace }}
{{PropertyMapping | templateProperty = 国籍 | ontologyProperty = nationality }}
{{PropertyMapping | templateProperty = 受賞 | ontologyProperty = award }}
{{PropertyMapping | templateProperty = 公式サイト | ontologyProperty = foaf:homepage }}
{{PropertyMapping | templateProperty = 画像 | ontologyProperty = foaf:depiction }}
{{PropertyMapping | templateProperty = ジャンル | ontologyProperty = genre }}
{{PropertyMapping | templateProperty = 画像サイズ | ontologyProperty = imageSize }}
{{PropertyMapping | templateProperty = 職業 | ontologyProperty = occupation }}
{{PropertyMapping | templateProperty = 代表作 | ontologyProperty = notableWork }}
}}
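To make the effect of such a mapping concrete, the following sketch applies a few of the property mappings above to an infobox and emits triples. It is a simplification under assumptions: the function and variable names are hypothetical, unprefixed ontology properties are given the dbp-owl: prefix used elsewhere in these slides, values are kept as plain strings, and the "備考" key is an invented unmapped property.

```python
ONTOLOGY_PREFIX = "dbp-owl:"  # prefix for the DBpedia ontology, as used in the slides

# A subset of the ComicsCreator property mappings shown above.
COMICS_CREATOR_MAPPING = {
    "名前": "foaf:name",
    "生年": "birthYear",
    "生地": "birthPlace",
    "受賞": "award",
    "代表作": "notableWork",
}


def apply_mapping(subject, infobox, map_to_class, mapping):
    """Turn mapped infobox properties into (subject, predicate, object) triples."""
    triples = [(subject, "rdf:type", ONTOLOGY_PREFIX + map_to_class)]
    for template_property, value in infobox.items():
        ontology_property = mapping.get(template_property)
        if ontology_property is None or not value:
            continue  # unmapped or empty properties produce no triple
        if ":" not in ontology_property:
            ontology_property = ONTOLOGY_PREFIX + ontology_property
        triples.append((subject, ontology_property, value))
    return triples


infobox = {"名前": "石ノ森章太郎", "生年": "1938", "備考": "unmapped", "代表作": "サイボーグ009"}
for triple in apply_mapping("dbp:石ノ森章太郎", infobox, "ComicsCreator", COMICS_CREATOR_MAPPING):
    print(triple)
```

Properties without a mapping are silently dropped, which is why the share of mapped templates and properties matters for coverage.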
Statistics for DBpedia Mappings
• Share of all Wikipedia templates that are mapped
– DBpedia Japanese: 4.67% (81 of 1,733)
– DBpedia (English): 6.33% (369 of 5,826)
• Share of all Wikipedia properties that are mapped
– DBpedia Japanese: 2.47% (1,581 of 62,679)
– DBpedia (English): 3.47% (6,169 of 177,599)
• Share of all template occurrences in Wikipedia that are mapped
– DBpedia Japanese: 47.99% (286,858 of 597,696)
– DBpedia (English): 82.24% (2,435,773 of 2,728,357)
• Share of all property occurrences in Wikipedia that are mapped
– DBpedia Japanese: 38.75% (3,128,208 of 8,071,982)
– DBpedia (English): 54.95% (27,283,343 of 49,654,072)
"Mapping Party"
• The mapping task is not easy; it requires knowledge of
– Wikipedia templates
– the DBpedia Ontology
– well-known vocabularies

• We held hands-on sessions
– Aug. 2012: 10 people
– Mar. 2013: 25 people
DBpedia Publishing Architecture
• URI case (diagram): URIs throughout; URIs are decoded for users
• IRI case (diagram): IRIs throughout, with an IRI-to-URI conversion step
IRI issues
1. IRIs have to be used properly in queries
2. Input URIs must be decoded to IRIs
3. Some serializations cannot use IRIs
4. Don't decode IRIs
5. Use the latest version
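The conversion between the two forms is percent-encoding of the UTF-8 bytes of non-ASCII characters. The sketch below shows both directions with Python's standard library; the function names are hypothetical, and the decoder is deliberately simple (it also decodes percent-encoded ASCII, which a stricter converter would leave alone).

```python
from urllib.parse import quote, unquote

# Characters left untouched: URI reserved characters plus "%" so that
# an already-encoded URI is not encoded a second time.
_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes."""
    return quote(iri, safe=_SAFE)


def uri_to_iri(uri: str) -> str:
    """Decode percent-encoded UTF-8 sequences back to characters."""
    return unquote(uri)


iri = "http://ja.dbpedia.org/resource/手塚治虫文化賞"
uri = iri_to_uri(iri)
print(uri)
print(uri_to_iri(uri) == iri)
```

A publishing stack that stores IRIs but receives percent-encoded URIs from browsers needs the decoding step on input, which corresponds to issue 2 above.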
Query: Notable comics written by comics creators who have received the Tezuka Osamu Cultural Prize
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://ja.dbpedia.org/resource/>
PREFIX dbp-owl: <http://dbpedia.org/ontology/>
SELECT ?creatorName ?comicName
WHERE {
  # creators typed as ComicsCreator who received the prize
  ?creator a dbp-owl:ComicsCreator ;
           dbp-owl:award dbp:手塚治虫文化賞 ;
           dbp-owl:notableWork ?comic ;
           rdfs:label ?creatorName .
  # their notable works that are typed as Comics
  ?comic a dbp-owl:Comics ;
         rdfs:label ?comicName .
}
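A query like this is sent to a SPARQL endpoint as the value of the standard `query` parameter. The sketch below only builds the request URL and checks that the Japanese IRI survives encoding; the endpoint address http://ja.dbpedia.org/sparql is an assumption (the slides give only http://ja.dbpedia.org), the function name is hypothetical, and the rdfs prefix is declared explicitly so the query does not rely on endpoint defaults.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://ja.dbpedia.org/sparql"  # assumed endpoint location

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://ja.dbpedia.org/resource/>
PREFIX dbp-owl: <http://dbpedia.org/ontology/>
SELECT ?creatorName ?comicName
WHERE {
  ?creator a dbp-owl:ComicsCreator ;
           dbp-owl:award dbp:手塚治虫文化賞 ;
           dbp-owl:notableWork ?comic ;
           rdfs:label ?creatorName .
  ?comic a dbp-owl:Comics ;
         rdfs:label ?comicName .
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query, percent-encoded as UTF-8."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
# The URL is pure ASCII, and decoding it gives back the original query text.
print(url.isascii())
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```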

Example graph (figure):
• dbp:石ノ森章太郎 rdf:type dbp-owl:ComicsCreator, foaf:Person
• dbp:石ノ森章太郎 rdfs:label 石ノ森章太郎
• dbp:石ノ森章太郎 dbp-prop:生年 1938
• dbp:石ノ森章太郎 dbp-owl:award dbp:手塚治虫文化賞
• dbp:石ノ森章太郎 dbp-owl:birthPlace dbp:宮城県
• dbp:石ノ森章太郎 dbp-owl:notableWork dbp:サイボーグ009
• dbp:サイボーグ009 rdf:type dbp-owl:Comics ; rdfs:label サイボーグ009
• dbp:宮城県 rdf:type dbp-owl:AdministrativeRegion ; rdfs:label 宮城県 ; dbp-owl:leaderName dbp:村井嘉浩
Japanese Linked Data Cloud
• 21 datasets
• Criteria
– providing more than 1,000 triples
– providing at least one of dereferenceable URIs, a data dump, or a SPARQL endpoint
– including Japanese labels
– linking to other datasets in the LOD cloud or JLDC

• An open license is not mandatory
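The inclusion criteria can be read as a simple predicate over a dataset description. The sketch below encodes them that way; the class, its field names, and the two sample datasets are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a candidate dataset."""
    name: str
    triples: int
    dereferenceable: bool
    has_dump: bool
    has_sparql_endpoint: bool
    has_japanese_labels: bool
    links_to_lod_or_jldc: bool
    open_license: bool = False  # recorded, but not required for inclusion


def meets_jldc_criteria(d: Dataset) -> bool:
    """Apply the four criteria listed on the slide."""
    accessible = d.dereferenceable or d.has_dump or d.has_sparql_endpoint
    return (
        d.triples > 1000
        and accessible
        and d.has_japanese_labels
        and d.links_to_lod_or_jldc
    )


# Invented examples: one passes without an open license, one is too small.
a = Dataset("example-a", 50_000, True, False, True, True, True, open_license=False)
b = Dataset("example-b", 800, True, True, True, True, True, open_license=True)
print(meets_jldc_criteria(a), meets_jldc_criteria(b))
```

Because the license field is not part of the predicate, a dataset can be in the JLDC without qualifying for a cloud that requires open licensing.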
JLDC with LOD cloud criteria
• 21 datasets → 9 datasets
Links to/from Japanese WordNet
• Resources
– Links: 33,017
– WN nouns: 65,788
– DBpedia IRIs: 1,456,158
– WN to DBpedia: 50.1%
– DBpedia to WN: 2.3%
• Properties
– Links: 1,245
– WN nouns: 65,788
– DBpedia IRIs: 16,020
– WN to DBpedia: 1.9%
– DBpedia to WN: 7.8%
Ongoing Work
• More Wikipedia entries and infoboxes
– Wikipedia Town

• More DBpedia mappings
– Mapping Party

• Parsers for Japanese
– Japanese Calendar: 慶応3年1月2日 => "1868-01-02"^^xsd:date
Summary
• Linked Data in Japan is steadily expanding
– Started by the research project
– Now extended to various areas

• Creating a local chapter of DBpedia is key to promoting Linked Data in the local language
– A hub in the local language
– People in any field can find connections between DBpedia and their own data

• Promotion of open licenses is still in progress
