SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Downloaden Sie, um offline zu lesen
Data description & conversion
Application and examples
Lemon-aid: using lemon to aid quantitative
historical linguistic analysis
Steven Moran & Martin Br¨ummer
University of Zurich & University of Leipzig
1 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Goals
Convert dictionary and wordlist data into the Lexicon Model
for Ontologies, aka lemon
Leverage Linked Data (LD) to combine disparate lexical
resources (50+) from the QuantHistLing research unit
Resulting LD resources provide researchers with:
more linguistic data in the Linguistic Linked Open Data Cloud
(LLOD)
a translation graph to query across the underlying lexicons and
dictionaries to extract semantically-aligned wordlists
2 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Talk map
Data description & conversion
Ontological model
Application and examples
3 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
QuantHistLing project
Quantitative Historical Linguistics (QuantHistLing) research
unit aims to uncover and clarify phylogenetic relationships
between native South American languages using quantitative
methods
http://quanthistling.info/
There are two main objectives of the project:
Digitalization of lexical resources on South American
languages and
The development of computer-assisted methods and
algorithms to quantitatively analyze the digitized data
4 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
QuantHistLing data
QuantHistLing aims to digitize around 500 works, most of
which are currently only available in print and many of which
are the only resources available for the languages that they
describe
http://quanthistling.info/index.php?id=resources
5 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
QuantHistLing data
Simple data output format that contains metadata (prefixed
with “@”) and tab-delimited lexical output
6 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
QuantHistLing data format
The first row following the metadata contains the data header
with the fields: QLCID, HEAD, HEAD DOCULECT,
TRANSLATION, TRANSLATION DOCULECT
They correspond respectively to the internal QLC unique
identifier, the headword in the dictionary, the doculect of the
headword (or in other words the language which this
particular document describes), the translation for the given
headword, and the doculect that the translation is given in
For each resource a data dump with the same format is
provided by the project
7 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Conversion from QHL-to-LD
We convert the QLC data into Linked Data that conforms to
the Lemon model with a simple Python script
Lemon is an ontological model for modeling lexicons and
machine-readable dictionaries for linking to the Semantic Web
and the Linked Data cloud
http://lemon-model.net/
Lemon developers also active in the W3C Ontology-Lexica
Community Group
Goal is to “develop models for the representation of lexica (and
machine readable dictionaries) relative to ontologies”
http://www.w3.org/community/ontolex/
8 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Implementation of QHL data in Lemon
9 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Implementation of QHL data in Lemon
10 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Implementation of QHL data in Lemon
11 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Implementation of QHL data in Lemon
12 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Implementation of QHL data in Lemon
13 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Overview
Data
Conversion
Implementation of QHL data in Lemon
According to the Lemon model, senses should link to an
ontological entity like a DBpedia resource
However, we only have the strings representing the word forms
which is not enough data to reliably link to other knowledge
bases
we argue that it is linguistically correct to link the senses of
the entries (instead of their word forms) to their respective
translations
so the ‘sense’ resources serve a purpose, even if they don’t
link the meaning of the entry
14 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Application
A major goal in historical-comparative linguistics is the
identification of cognates, i.e. sets of words in genealogically
related languages that have been derived from a common
word or root (e.g. English ‘is’, German ‘ist’, Latin ‘est’, from
Indo-European ‘esti’).
Modeling dictionaries and lexicons in a pivot ontology using
overlaps in translations is one way to merge several resources
into one RDF graph for querying and extracting
semantically-aligned wordlists, which can then be used as
input into computational historical linguistics tools such as
LingPy (List and Moran, 2013).
15 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Application
As a first step, we have converted the QHL data into RDF
and it is available online through a SPARQL endpoint.
http://linked-data.org/sparql/ (preliminary)
http://quanthistlist.info/lod/ (coming soon)
a dump is available at http://linked-data.org/datasets/
Querying the combined dictionaries and lexicons is
straightforward
16 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Return all triples
returns over 3.8 million triples
17 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Pairs of languages in the translation graph that contain
written forms for the lexical sense “casa”
18 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Implementation of QHL data in Lemon
19 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Languages in the translation graph that contain written
forms for the lexical sense “casa”
66 language pairs
marked entry shows word forms are not normalized and
contain data that can be analyzed further
20 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Queries
Can be easily extended to incorporate entire wordlists, such as
the Swadesh list (Swadesh, 1952) or Leipzig-Jakarta list
(Tadmor et al., 2010)
The combination of disparate data from many dictionaries and
lexicons is a first step in a computational historical linguistics
pipeline
Results are given in the source documents’ orthographic
representations
They must be normalized into an interlingual pivot, such as
the International Phonetic Alphabet, if phonetic or phonemic
analysis is to be applied to the data
Next step before producing phonetic alignments and cognate
judgements based on metrics and algorithms for calculating
lexical similarity
21 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Conclusion
From data being digitized and extracted from print resources,
we are creating machine-readable lexicons that are both
interoperable with each other (we link semantic senses using
the Lemon ontology model) and with other linguistics sources
We also interlink the resulting dictionary resources with other
language resources in the LLOD via ISO639-3 codes
22 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Future work - simplify!
Build algorithms that identify semantically similar
translation-pairs from terse translations
Identify that doculect translations like
“coarsely grind”
“grind up, crush well”
“grind lightly (chili pepper, millet for a quick snack)”
“grind lightly (groundnuts) with stones”
for different languages can be mapped to a simpler form such
as “to crush/grind” for initial comparative analysis
23 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Future work - annotate!
Use NLP Interchange Format (Hellmann et al., 2012) to keep
track of where information in the dictionaries comes from - or
in other words, use NIF combined with Lemon to annotate the
QHL data sources for provenance
24 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Future work - link!
Link to further resources that contain linguistic and
non-linguistic information
Typological data and geographic variables that may provide
useful information for determining the genealogical and
geographical relatedness of languages
25 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
Data description & conversion
Application and examples
Application
Examples
Conclusion
Many thanks!
Organiziers and participants of LDL-2013
QuantHistLing Research Unit - Univeristy of Marburg
(Michael Cysouw, PI)
University of Zurich, University of Marburg and University of
Leipzig
26 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing

Weitere ähnliche Inhalte

Was ist angesagt?

Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrievalmghgk
 
Range Query on Big Data Based on Map Reduce
Range Query on Big Data Based on Map ReduceRange Query on Big Data Based on Map Reduce
Range Query on Big Data Based on Map ReduceIJMER
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataMatthew Rowe
 
Coling2014:Single Document Keyphrase Extraction Using Label Information
Coling2014:Single Document Keyphrase Extraction Using Label InformationColing2014:Single Document Keyphrase Extraction Using Label Information
Coling2014:Single Document Keyphrase Extraction Using Label InformationRyuchi Tachibana
 
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II ajay pashankar
 
Short Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDFShort Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDFAkram Abbasi
 
Jarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and SolutionsJarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and SolutionsMustafa Jarrar
 
Scalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large FoldersScalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large Foldersfeiwin
 
Using Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebUsing Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebIJwest
 
Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics Jie Bao
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Serverwebhostingguy
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsShubhangi Tandon
 
Address standardization with latent semantic association
Address standardization with latent semantic associationAddress standardization with latent semantic association
Address standardization with latent semantic associationjyhuangtc
 
A wiki for_business_rules_in_open_vocabulary_executable_english
A wiki for_business_rules_in_open_vocabulary_executable_englishA wiki for_business_rules_in_open_vocabulary_executable_english
A wiki for_business_rules_in_open_vocabulary_executable_englishAdrian Walker
 

Was ist angesagt? (20)

Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrieval
 
RDF and Java
RDF and JavaRDF and Java
RDF and Java
 
Range Query on Big Data Based on Map Reduce
Range Query on Big Data Based on Map ReduceRange Query on Big Data Based on Map Reduce
Range Query on Big Data Based on Map Reduce
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic Data
 
Coling2014:Single Document Keyphrase Extraction Using Label Information
Coling2014:Single Document Keyphrase Extraction Using Label InformationColing2014:Single Document Keyphrase Extraction Using Label Information
Coling2014:Single Document Keyphrase Extraction Using Label Information
 
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II
 
Short Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDFShort Report Bridges performance gap between Relational and RDF
Short Report Bridges performance gap between Relational and RDF
 
Linked sensor data
Linked sensor dataLinked sensor data
Linked sensor data
 
5010
50105010
5010
 
Jarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and SolutionsJarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and Solutions
 
Scalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large FoldersScalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large Folders
 
Using Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic WebUsing Page Size for Controlling Duplicate Query Results in Semantic Web
Using Page Size for Controlling Duplicate Query Results in Semantic Web
 
Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Server
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
RDF for PubMedCentral
RDF for PubMedCentral RDF for PubMedCentral
RDF for PubMedCentral
 
Semantic Web Nature
Semantic Web NatureSemantic Web Nature
Semantic Web Nature
 
Linked Data Fragments
Linked Data FragmentsLinked Data Fragments
Linked Data Fragments
 
Address standardization with latent semantic association
Address standardization with latent semantic associationAddress standardization with latent semantic association
Address standardization with latent semantic association
 
A wiki for_business_rules_in_open_vocabulary_executable_english
A wiki for_business_rules_in_open_vocabulary_executable_englishA wiki for_business_rules_in_open_vocabulary_executable_english
A wiki for_business_rules_in_open_vocabulary_executable_english
 

Andere mochten auch

Perry bradley and tasmin
Perry   bradley and tasminPerry   bradley and tasmin
Perry bradley and tasminSusannaBadger
 
AS AO1 English Language
AS AO1 English LanguageAS AO1 English Language
AS AO1 English LanguagePolox3
 
Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.
Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.
Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.Kinjal Patel
 
Gruffalo Linguistic Analysis
Gruffalo Linguistic AnalysisGruffalo Linguistic Analysis
Gruffalo Linguistic AnalysisBeth Graham
 
A Level English Language Exam Prep from AQA 2011
A Level English Language Exam Prep from AQA 2011A Level English Language Exam Prep from AQA 2011
A Level English Language Exam Prep from AQA 2011ENSFCEnglish
 
Language skills for higher level responses WJEC
Language skills for higher level responses WJEC Language skills for higher level responses WJEC
Language skills for higher level responses WJEC Emma Sinclair
 
Language analysis essay writing
Language analysis essay writingLanguage analysis essay writing
Language analysis essay writingTy171
 

Andere mochten auch (11)

Perry bradley and tasmin
Perry   bradley and tasminPerry   bradley and tasmin
Perry bradley and tasmin
 
AS AO1 English Language
AS AO1 English LanguageAS AO1 English Language
AS AO1 English Language
 
Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.
Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.
Verbal and non- verbal linguistic devices in Pinter’s ‘ The Birthday Party’.
 
Gruffalo Linguistic Analysis
Gruffalo Linguistic AnalysisGruffalo Linguistic Analysis
Gruffalo Linguistic Analysis
 
Jamaican creole
Jamaican creoleJamaican creole
Jamaican creole
 
A Level English Language Exam Prep from AQA 2011
A Level English Language Exam Prep from AQA 2011A Level English Language Exam Prep from AQA 2011
A Level English Language Exam Prep from AQA 2011
 
Language skills for higher level responses WJEC
Language skills for higher level responses WJEC Language skills for higher level responses WJEC
Language skills for higher level responses WJEC
 
Jamaican Creole
Jamaican CreoleJamaican Creole
Jamaican Creole
 
Language analysis essay writing
Language analysis essay writingLanguage analysis essay writing
Language analysis essay writing
 
The gruffalo
The gruffaloThe gruffalo
The gruffalo
 
L2 propaganda and linguistic devices
L2 propaganda and linguistic devicesL2 propaganda and linguistic devices
L2 propaganda and linguistic devices
 

Ähnlich wie Lemon-aid: using Lemon to aid quantitative historical linguistic analysis

Bourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesBourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesASIS&T
 
Linked Data and Services
Linked Data and ServicesLinked Data and Services
Linked Data and ServicesBarry Norton
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of InformationAdrian Paschke
 
The Linked Data Lifecycle
The Linked Data LifecycleThe Linked Data Lifecycle
The Linked Data Lifecyclegeoknow
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityunivTope Omitola
 
#opentourism - Linked Open Data Publishing and Discovery Workshop
#opentourism - Linked Open Data Publishing and Discovery Workshop#opentourism - Linked Open Data Publishing and Discovery Workshop
#opentourism - Linked Open Data Publishing and Discovery WorkshopRaf Buyle
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsDimitris Kontokostas
 
Semantic Web Adoption
Semantic Web AdoptionSemantic Web Adoption
Semantic Web Adoptionguest262aaa
 
Lider Reference Model ld4lt session March, 3rd, 2015
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015Sebastian Hellmann
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search ComponentMario Flecha
 
Linking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsLinking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsChristoph Lange
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalDustin Smith
 
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...cscpconf
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsDerek Kane
 
Online Presentation
Online PresentationOnline Presentation
Online Presentationnw13
 
Enabling Language Resources to Expose Translations as Linked Data on the Web
Enabling Language Resources to Expose Translations as Linked Data on the WebEnabling Language Resources to Expose Translations as Linked Data on the Web
Enabling Language Resources to Expose Translations as Linked Data on the WebJorge Gracia
 
Establishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBEstablishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBnw13
 
iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...
iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...
iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...andimou
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Stuart Chalk
 

Ähnlich wie Lemon-aid: using Lemon to aid quantitative historical linguistic analysis (20)

Bourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesBourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication Repositories
 
Linked Data and Services
Linked Data and ServicesLinked Data and Services
Linked Data and Services
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of Information
 
The Linked Data Lifecycle
The Linked Data LifecycleThe Linked Data Lifecycle
The Linked Data Lifecycle
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
#opentourism - Linked Open Data Publishing and Discovery Workshop
#opentourism - Linked Open Data Publishing and Discovery Workshop#opentourism - Linked Open Data Publishing and Discovery Workshop
#opentourism - Linked Open Data Publishing and Discovery Workshop
 
RDAP 033111
RDAP 033111RDAP 033111
RDAP 033111
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
Semantic Web Adoption
Semantic Web AdoptionSemantic Web Adoption
Semantic Web Adoption
 
Lider Reference Model ld4lt session March, 3rd, 2015
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
 
Linking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsLinking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process Descriptions
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTE...
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
 
Online Presentation
Online PresentationOnline Presentation
Online Presentation
 
Enabling Language Resources to Expose Translations as Linked Data on the Web
Enabling Language Resources to Expose Translations as Linked Data on the WebEnabling Language Resources to Expose Translations as Linked Data on the Web
Enabling Language Resources to Expose Translations as Linked Data on the Web
 
Establishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBEstablishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNB
 
iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...
iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...
iLastic: Linked Data Generation Workflow and User Interface for iMinds Schola...
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 

Mehr von mbruemmer

Dirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz ProjectDirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz Projectmbruemmer
 
Oscar Muñoz: Content Analytics for Media Agencies
Oscar Muñoz: Content Analytics for Media AgenciesOscar Muñoz: Content Analytics for Media Agencies
Oscar Muñoz: Content Analytics for Media Agenciesmbruemmer
 
Alessio Bosca: Linked Data for Content Analytics in CELI
Alessio Bosca: Linked Data for Content Analytics in CELIAlessio Bosca: Linked Data for Content Analytics in CELI
Alessio Bosca: Linked Data for Content Analytics in CELImbruemmer
 
Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...
Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...
Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...mbruemmer
 
Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data
Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked DataMark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data
Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Datambruemmer
 
Ilan Kernerman: Generating Multilingual Lexicographic Resources
Ilan Kernerman: Generating Multilingual Lexicographic ResourcesIlan Kernerman: Generating Multilingual Lexicographic Resources
Ilan Kernerman: Generating Multilingual Lexicographic Resourcesmbruemmer
 
Tatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content ManagementTatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content Managementmbruemmer
 

Mehr von mbruemmer (7)

Dirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz ProjectDirk Goldhahn: Introduction to the German Wortschatz Project
Dirk Goldhahn: Introduction to the German Wortschatz Project
 
Oscar Muñoz: Content Analytics for Media Agencies
Oscar Muñoz: Content Analytics for Media AgenciesOscar Muñoz: Content Analytics for Media Agencies
Oscar Muñoz: Content Analytics for Media Agencies
 
Alessio Bosca: Linked Data for Content Analytics in CELI
Alessio Bosca: Linked Data for Content Analytics in CELIAlessio Bosca: Linked Data for Content Analytics in CELI
Alessio Bosca: Linked Data for Content Analytics in CELI
 
Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...
Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...
Marc Egger: Text Analytics for Brand Research -Non-reactive Concept Mapping t...
 
Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data
Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked DataMark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data
Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data
 
Ilan Kernerman: Generating Multilingual Lexicographic Resources
Ilan Kernerman: Generating Multilingual Lexicographic ResourcesIlan Kernerman: Generating Multilingual Lexicographic Resources
Ilan Kernerman: Generating Multilingual Lexicographic Resources
 
Tatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content ManagementTatiana Gornostay: Language Meets Knowledge in Digital Content Management
Tatiana Gornostay: Language Meets Knowledge in Digital Content Management
 

Kürzlich hochgeladen

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 

Kürzlich hochgeladen (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 

Lemon-aid: using Lemon to aid quantitative historical linguistic analysis

  • 1. Data description & conversion Application and examples Lemon-aid: using lemon to aid quantitative historical linguistic analysis Steven Moran & Martin Br¨ummer University of Zurich & University of Leipzig 1 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 2. Data description & conversion Application and examples Goals Convert dictionary and wordlist data into the Lexicon Model for Ontologies, aka lemon Leverage Linked Data (LD) to combine disparate lexical resources (50+) from the QuantHistLing research unit Resulting LD resources provide researchers with: more linguistic data in the Linguistic Linked Open Data Cloud (LLOD) a translation graph to query across the underlying lexicons and dictionaries to extract semantically-aligned wordlists 2 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 3. Data description & conversion Application and examples Talk map Data description & conversion Ontological model Application and examples 3 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 4. Data description & conversion Application and examples Overview Data Conversion QuantHistLing project Quantitative Historical Linguistics (QuantHistLing) research unit aims to uncover and clarify phylogenetic relationships between native South American languages using quantitative methods http://quanthistling.info/ There are two main objectives of the project: Digitalization of lexical resources on South American languages and The development of computer-assisted methods and algorithms to quantitatively analyze the digitized data 4 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 5. Data description & conversion Application and examples Overview Data Conversion QuantHistLing data QuantHistLing aims to digitize around 500 works, most of which are currently only available in print and many of which are the only resources available for the languages that they describe http://quanthistling.info/index.php?id=resources 5 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 6. Data description & conversion Application and examples Overview Data Conversion QuantHistLing data Simple data output format that contains metadata (prefixed with “@”) and tab-delimited lexical output 6 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 7. Data description & conversion Application and examples Overview Data Conversion QuantHistLing data format The first row following the metadata contains the data header with the fields: QLCID, HEAD, HEAD DOCULECT, TRANSLATION, TRANSLATION DOCULECT They correspond respectively to the internal QLC unique identifier, the headword in the dictionary, the doculect of the headword (or in other words the language which this particular document describes), the translation for the given headword, and the doculect that the translation is given in For each resource a data dump with the same format is provided by the project 7 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 8. Data description & conversion Application and examples Overview Data Conversion Conversion from QHL-to-LD We convert the QLC data into Linked Data that conforms to the Lemon model with a simple Python script Lemon is an ontological model for modeling lexicons and machine-readable dictionaries for linking to the Semantic Web and the Linked Data cloud http://lemon-model.net/ Lemon developers also active in the W3C Ontology-Lexica Community Group Goal is to “develop models for the representation of lexica (and machine readable dictionaries) relative to ontologies” http://www.w3.org/community/ontolex/ 8 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 9. Data description & conversion Application and examples Overview Data Conversion Implementation of QHL data in Lemon 9 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 10. Data description & conversion Application and examples Overview Data Conversion Implementation of QHL data in Lemon 10 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 11. Data description & conversion Application and examples Overview Data Conversion Implementation of QHL data in Lemon 11 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 12. Data description & conversion Application and examples Overview Data Conversion Implementation of QHL data in Lemon 12 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 13. Data description & conversion Application and examples Overview Data Conversion Implementation of QHL data in Lemon 13 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 14. Data description & conversion Application and examples Overview Data Conversion Implementation of QHL data in Lemon According to the Lemon model, senses should link to an ontological entity like a DBpedia resource However, we only have the strings representing the word forms which is not enough data to reliably link to other knowledge bases we argue that it is linguistically correct to link the senses of the entries (instead of their word forms) to their respective translations so the ‘sense’ resources serve a purpose, even if they don’t link the meaning of the entry 14 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 15. Data description & conversion Application and examples Application Examples Conclusion Application A major goal in historical-comparative linguistics is the identification of cognates, i.e. sets of words in genealogically related languages that have been derived from a common word or root (e.g. English ‘is’, German ‘ist’, Latin ‘est’, from Indo-European ‘esti’). Modeling dictionaries and lexicons in a pivot ontology using overlaps in translations is one way to merge several resources into one RDF graph for querying and extracting semantically-aligned wordlists, which can then be used as input into computational historical linguistics tools such as LingPy (List and Moran, 2013). 15 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 16. Data description & conversion Application and examples Application Examples Conclusion Application As a first step, we have converted the QHL data into RDF and it is available online through a SPARQL endpoint. http://linked-data.org/sparql/ (preliminary) http://quanthistlist.info/lod/ (coming soon) a dump is available at http://linked-data.org/datasets/ Querying the combined dictionaries and lexicons is straightforward 16 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 17. Data description & conversion Application and examples Application Examples Conclusion Return all triples returns over 3.8 million triples 17 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 18. Data description & conversion Application and examples Application Examples Conclusion Pairs of languages in the translation graph that contain written forms for the lexical sense “casa” 18 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 19. Data description & conversion Application and examples Application Examples Conclusion Implementation of QHL data in Lemon 19 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 20. Data description & conversion Application and examples Application Examples Conclusion Languages in the translation graph that contain written forms for the lexical sense “casa” 66 language pairs marked entry shows word forms are not normalized and contain data that can be analyzed further 20 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 21. Data description & conversion Application and examples Application Examples Conclusion Queries Can be easily extended to incorporate entire wordlists, such as the Swadesh list (Swadesh, 1952) or Leipzig-Jakarta list (Tadmor et al., 2010) The combination of disparate data from many dictionaries and lexicons is a first step in a computational historical linguistics pipeline Results are given in the source documents’ orthographic representations They must be normalized into an interlingual pivot, such as the International Phonetic Alphabet, if phonetic or phonemic analysis is to be applied to the data Next step before producing phonetic alignments and cognate judgements based on metrics and algorithms for calculating lexical similarity 21 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 22. Data description & conversion Application and examples Application Examples Conclusion Conclusion From data being digitized and extracted from print resources, we are creating machine-readable lexicons that are both interoperable with each other (we link semantic senses using the Lemon ontology model) and with other linguistics sources We also interlink the resulting dictionary resources with other language resources in the LLOD via ISO639-3 codes 22 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 23. Data description & conversion Application and examples Application Examples Conclusion Future work - simplify! Build algorithms that identify semantically similar translation-pairs from terse translations Identify that doculect translations like “coarsely grind” “grind up, crush well” “grind lightly (chili pepper, millet for a quick snack)” “grind lightly (groundnuts) with stones” for different languages can be mapped to a simpler form such as “to crush/grind” for initial comparative analysis 23 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 24. Data description & conversion Application and examples Application Examples Conclusion Future work - annotate! Use NLP Interchange Format (Hellmann et al., 2012) to keep track of where information in the dictionaries comes from - or in other words, use NIF combined with Lemon to annotate the QHL data sources for provenance 24 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 25. Data description & conversion Application and examples Application Examples Conclusion Future work - link! Link to further resources that contain linguistic and non-linguistic information Typological data and geographic variables that may provide useful information for determining the genealogical and geographical relatedness of languages 25 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing
  • 26. Data description & conversion Application and examples Application Examples Conclusion Many thanks! Organiziers and participants of LDL-2013 QuantHistLing Research Unit - Univeristy of Marburg (Michael Cysouw, PI) University of Zurich, University of Marburg and University of Leipzig 26 Steven Moran & Martin Br¨ummer Lemon-aid: using lemon for QuantHistLing