Diachronic Analysis of the Italian Language exploiting Google Ngram
Hello!
Pierpaolo Basile
Annalina Caputo
Roberta Luisi
Giovanni Semeraro
Department of Computer Science
University of Bari Aldo Moro - Italy
Background
TRI
P. Basile, A. Caputo, G. Semeraro. Temporal random indexing: A system for analysing word meaning over time.
IJCoL vol. 1: Emerging Topics at the First Italian Conference on Computational Linguistics.
[Figure: a corpus with temporal information and a dictionary of random vectors are the input to Temporal Random Indexing, which outputs one WordSpace per time period]
▪ Several WordSpaces for
several time periods
▪ Word vectors are comparable
across WordSpaces
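The two points above can be illustrated with a minimal sketch of Random Indexing applied per time period. It assumes the usual construction: every vocabulary word gets one sparse random vector that is shared by all periods, and a word's vector in a given period is the sum of the random vectors of its neighbours in that period. The function names, the dimension, the window size and the toy corpus are all hypothetical, and this is not the code of the TRI tool itself.

```python
import random
from collections import defaultdict

def random_vector(dim, nonzero, rng):
    """Sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for i in rng.sample(range(dim), nonzero):
        vec[i] = rng.choice((-1, 1))
    return vec

def build_word_spaces(corpus_by_period, dim=50, nonzero=4, window=2, seed=0):
    """Return {period: {word: vector}} built from tokenised documents."""
    rng = random.Random(seed)
    # One random vector per word, shared by every period (assumed design):
    # this sharing is what keeps vectors comparable across WordSpaces.
    vocab = sorted({w for docs in corpus_by_period.values()
                    for doc in docs for w in doc})
    index = {w: random_vector(dim, nonzero, rng) for w in vocab}

    spaces = {}
    for period, docs in corpus_by_period.items():
        space = defaultdict(lambda: [0] * dim)
        for doc in docs:
            for i, word in enumerate(doc):
                lo, hi = max(0, i - window), min(len(doc), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # Add the neighbour's random vector to the word vector.
                    ctx, vec = index[doc[j]], space[word]
                    for d in range(dim):
                        vec[d] += ctx[d]
        spaces[period] = dict(space)
    return spaces

# Toy corpus (hypothetical): one tokenised document per period.
corpus = {
    1980: [["surf", "the", "wave"]],
    1990: [["surf", "the", "web"]],
}
spaces = build_word_spaces(corpus)
```

Because the random vectors are drawn once and reused, identical contexts in two periods produce identical word vectors, so any difference between periods reflects a difference in usage.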
Motivation 1
Detect meaning shift
Marty, in 2015 people will surf on the web!!!
Surf!?!?! On the web!?!?!?
surf the Net/Internet: to use the Internet
When was this meaning introduced?
Motivation 2
Large corpus
▪ Build a method for computing TRI relying on a very large corpus
▪ Google Ngram for the Italian language
▫ n-grams (up to five) extracted from Google Books
▫ over five million books spanning the years from 1500 to 2012
▪ covers several languages including Italian
Example record: N-gram "analysis is often described as", year 1991, 104 occurrences, 5 books
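A record like the one above can be read with a few lines of Python. This is a minimal sketch that assumes each line holds four tab-separated fields (n-gram, year, occurrences, books), matching the example shown; the names `NgramRecord`, `parse_record` and `decade` are hypothetical, and aligning the 10-year periods on calendar decades is also an assumption.

```python
from typing import NamedTuple

class NgramRecord(NamedTuple):
    ngram: str
    year: int
    occurrences: int
    books: int

def parse_record(line: str) -> NgramRecord:
    """Parse one tab-separated line: ngram, year, occurrences, books (assumed layout)."""
    ngram, year, occurrences, books = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(occurrences), int(books))

def decade(year: int) -> int:
    """Map a year to the first year of its 10-year period (assumed alignment)."""
    return year - year % 10

record = parse_record("analysis is often described as\t1991\t104\t5\n")
period = decade(record.year)
```

Grouping records by `decade(record.year)` gives one bucket of n-grams per time period, which is the input a per-period WordSpace needs.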
Methodology
1. Run TRI on the Italian Google Ngram
▫ build a WordSpace for each time period (10 years)
2. Provide a time series for each word
3. Search for significant changes in the time series
[Figure: cosine similarity (cossim) between two word vectors]
Time Series
Several time series Γ at the time interval k
▪ log frequency: word frequency in each time period k
▪ point-wise: cosine similarity (cossim) between word vectors across two time periods
▪ cumulative: cosine similarity (cossim) computed against a cumulative vector of the previous k-1 time periods
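The three series can be sketched as follows. This is only an illustration under stated assumptions: the point-wise series compares consecutive periods, the cumulative vector is the plain sum of the earlier vectors, and the log frequency uses log(count + 1) so that empty periods stay defined. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def log_frequency_series(counts):
    """Log of the word count in each period (the +1 smoothing is an assumption)."""
    return [math.log(c + 1) for c in counts]

def pointwise_series(vectors):
    """Similarity between the word vector in period k and the one in period k-1."""
    return [cosine(vectors[k - 1], vectors[k]) for k in range(1, len(vectors))]

def cumulative_series(vectors):
    """Similarity between the vector in period k and the sum of all earlier vectors."""
    series = []
    cumulative = list(vectors[0])
    for k in range(1, len(vectors)):
        series.append(cosine(cumulative, vectors[k]))
        cumulative = [a + b for a, b in zip(cumulative, vectors[k])]
    return series

# Toy word vectors for three periods (hypothetical values).
vectors = [[1, 0], [0, 1], [0, 1]]
point = pointwise_series(vectors)
cumul = cumulative_series(vectors)
```

Both similarity series have one value fewer than the number of periods, because the first period has nothing earlier to be compared with.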
Change point detection
▪ Mean shift of Γ pivoted at time period j
▪ Search for statistically significant mean shifts
▫ bootstrapping approach under the null hypothesis that there is no change in the meaning
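A minimal sketch of this step is given below. It assumes that the mean shift at pivot j is the mean of the values from j onward minus the mean of the values before j, and it stands in for the bootstrapping with random shuffles of the series, on the reasoning that under the null hypothesis the order of the values carries no information. The two-sided test, the 0.05 threshold and the rule for picking one pivot among the significant ones are assumptions, and the names are hypothetical.

```python
import random

def mean_shift(series, j):
    """Mean of series[j:] minus mean of series[:j]."""
    before, after = series[:j], series[j:]
    return sum(after) / len(after) - sum(before) / len(before)

def detect_change_point(series, samples=1000, alpha=0.05, seed=0):
    """Return the index of the most pronounced significant mean shift, or None."""
    rng = random.Random(seed)
    pivots = range(1, len(series))
    observed = {j: mean_shift(series, j) for j in pivots}
    exceed = {j: 0 for j in pivots}
    for _ in range(samples):
        # Shuffling simulates "no change in meaning": values are exchangeable.
        shuffled = list(series)
        rng.shuffle(shuffled)
        for j in pivots:
            if abs(mean_shift(shuffled, j)) >= abs(observed[j]):
                exceed[j] += 1
    # Empirical p-value per pivot: share of shuffles at least as extreme.
    significant = [j for j in pivots if exceed[j] / samples < alpha]
    if not significant:
        return None
    # Among significant pivots, keep the largest absolute shift (assumption).
    return max(significant, key=lambda j: abs(observed[j]))

# Toy similarity series (hypothetical): similarity drops after the sixth period.
series = [1.0] * 6 + [0.2] * 6
change_index = detect_change_point(series)
```

The returned index refers to a position in the series, so it still has to be mapped back to the time period that the position represents.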
Evaluation
Dataset
Build a benchmark for meaning shift detection for the Italian language
▪ extract a set of words by pooling the output of several system settings
▪ find the correct change points in a dictionary (Sabatini Coletti/Etimologico Zanichelli)
Evaluation
Results
Method       Accuracy
TRI_point    0.3086
TRI_cum      0.2963
TRR1_point   0.2716
log freq     0.2346
TRR2_point   0.1728
TRR1_cum     0.1605
TRR2_cum     0.1235
Accuracy: a prediction counts as correct when the year predicted by the system is equal to or greater than the year reported in the gold standard
TRR1 and TRR2 are variants of TRI
based on Reflective Random Indexing
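The accuracy rule stated above can be written down in a few lines. This sketch assumes that predictions and gold-standard years are stored as word-to-year mappings and that a word with no predicted change point counts as an error; the function name and the placeholder data are hypothetical and are not taken from the benchmark.

```python
def accuracy(predicted, gold):
    """Share of gold words whose predicted year is >= the gold-standard year."""
    if not gold:
        return 0.0
    correct = 0
    for word, gold_year in gold.items():
        year = predicted.get(word)
        # A missing prediction is treated as wrong (assumption).
        if year is not None and year >= gold_year:
            correct += 1
    return correct / len(gold)

# Placeholder data for illustration only.
gold = {"word_a": 1950, "word_b": 1980, "word_c": 1990}
predicted = {"word_a": 1960, "word_b": 1970}
score = accuracy(predicted, gold)
```

Note that the rule is one-sided: a prediction later than the gold year is accepted however late it is, while a prediction that is even one period too early is rejected.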
Conclusions and Future Work
▪ TRI with point-wise detection gives the best accuracy (0.3086)
▫ it outperforms the log-frequency baseline (0.2346)
▪ We provide a benchmark for the evaluation of meaning shifts for the Italian language
▪ Future work: extend the dataset and provide an evaluation for the English language
Thanks!!
Any questions?
pierpaolo.basile@gmail.com
https://github.com/pippokill/tri

Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 

Diachronic Analysis of the Italian Language exploiting Google Ngram

  • 1. Diachronic Analysis of the Italian Language exploiting Google Ngram
  • 2. Hello! Pierpaolo Basile Annalina Caputo Roberta Luisi Giovanni Semeraro Department of Computer Science University of Bari Aldo Moro - Italy
  • 3. Background: Temporal Random Indexing (TRI). P. Basile, A. Caputo, G. Semeraro. Temporal random indexing: A system for analysing word meaning over time. IJCoL vol. 1: Emerging Topics at the First Italian Conference on Computational Linguistics. [Diagram: Corpus with Temporal Information, Dictionary/Random Vectors, Temporal Random Indexing, five Word Spaces] ▪ Several WordSpaces for several time periods ▪ Word vectors are comparable across WordSpaces
  • 4. Motivation 1 Detect meaning shift Marty, in 2015 people will surf on the web!!!
  • 5. Motivation 1 Detect meaning shift Surf!?!?! On the web!?!?!?
  • 6. Motivation 1 Detect meaning shift Surf!?!?! On the web!?!?!? Dictionary definition: "surf the Net/Internet": to use the Internet. When was this meaning introduced?
  • 7. Motivation 2 Large corpus ▪ Build a method for computing TRI relying on a very large corpus ▪ Google Ngram for the Italian language ▫ n-grams (up to five) extracted from Google Books ▫ over five million books spanning the years from 1500 to 2012 ▪ covers several languages including Italian [Example entry: n-gram "analysis is often described as", year 1991, 104 occurrences, 5 books]
  • 8. Methodology 1. Run TRI on the Italian Google Ngram ▫ build a WordSpace for each time period (10 years) 2. Provide a time series for each word 3. Search for significant changes in the time series
  • 9. Time Series: several time series Γ at the time interval k ▪ log frequency: word frequency in each time period k ▪ point-wise: cosine similarity (cossim) between word vectors across two time periods ▪ cumulative: cosine similarity (cossim) computed against a cumulative vector of the previous k-1 time periods
  • 10. Change point detection ▪ Mean shift of Γ pivoted at time period j ▪ Search for statistically significant mean shifts ▫ bootstrapping approach under the null hypothesis that there is no change in the meaning
  • 11. Evaluation Dataset: build a benchmark for meaning shift detection for the Italian language ▪ extract a set of words by pooling the output of several system settings ▪ find the correct change points in a dictionary (Sabatini Coletti/Etimologico Zanichelli)
  • 12. Evaluation Results
       Method       Accuracy
       TRIpoint     0.3086
       TRIcum       0.2963
       TRR1point    0.2716
       log freq     0.2346
       TRR2point    0.1728
       TRR1cum      0.1605
       TRR2cum      0.1235
       Accuracy: a prediction counts as correct when the year predicted by the system is equal to or greater than the year reported in the gold standard.
       TRR1 and TRR2 are variants of TRI based on Reflective Random Indexing.
  • 13. Conclusions and Future Work ▪ The TRI method with point-wise detection gives the best results (accuracy 0.3086) ▫ it outperforms the baseline based on log frequency (0.2346) ▪ We provide a benchmark for the evaluation of meaning shifts for the Italian language ▪ Future work: extend the dataset and provide an evaluation for the English language
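
The time series and change point steps on slides 9 and 10 can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes each word already has one vector per time period, taken from comparable WordSpaces. The function names, the two-sided permutation test, the 0.05 threshold and the rule for choosing among significant pivots are all assumptions made for the example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def pointwise_series(vectors):
    """Similarity between the word vectors of consecutive time periods."""
    vectors = np.asarray(vectors, dtype=float)
    return np.array([cosine(vectors[k - 1], vectors[k])
                     for k in range(1, len(vectors))])

def cumulative_series(vectors):
    """Similarity between the vector at period k and the sum of all earlier ones."""
    vectors = np.asarray(vectors, dtype=float)
    cum = np.cumsum(vectors, axis=0)
    return np.array([cosine(cum[k - 1], vectors[k])
                     for k in range(1, len(vectors))])

def mean_shift(series):
    """Mean after the pivot minus mean before it, for pivots j = 1..n-1."""
    return np.array([series[j:].mean() - series[:j].mean()
                     for j in range(1, len(series))])

def detect_change_point(series, n_boot=1000, alpha=0.05, seed=0):
    """Return (pivot, p_value) for the strongest significant shift, or None.

    Pivot j splits the series into series[:j] and series[j:].
    Assumption: the null distribution is built by randomly permuting the
    series, and the test is two-sided on the absolute mean shift.
    """
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.abs(mean_shift(series))
    boot = np.abs([mean_shift(rng.permutation(series)) for _ in range(n_boot)])
    # Share of permuted series whose shift at each pivot is at least as large.
    p_values = (1 + (boot >= observed).sum(axis=0)) / (n_boot + 1)
    significant = np.flatnonzero(p_values < alpha)
    if significant.size == 0:
        return None
    best = significant[np.argmax(observed[significant])]
    return int(best) + 1, float(p_values[best])
```

To use it, build a series with `pointwise_series` or `cumulative_series` from a word's per-period vectors and pass it to `detect_change_point`. The returned pivot is an index into the series, so mapping it back to a decade depends on how the periods were defined.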