SlideShare ist ein Scribd-Unternehmen logo
1 von 13
Downloaden Sie, um offline zu lesen
What we talk about when
we talk about concepts
Applying distributional semantics
on Dutch historical newspapers to
trace conceptual change
Pim Huijnen - Utrecht University
AIUCD Rome, 26 January 2017
Tracing Concepts over time in Dutch
Newspaper Discourse (1950-1990) using
Word Embeddings
Tom Kenter (University of Amsterdam)
Melvin Wevers (Utrecht University)
Carlos Martinez-Ortiz (NL eScience Center)
Joris van Eijnatten (Utrecht University)
Jaap Verheul (Utrecht University)
Task
Trace concepts (ideas, topics) without
sticking to particular words
Approach
Multi-dimensional word-
vector space using
Google’s word2vec (word
embeddings)
Concept represented as a
network of closely related
words based on distance
Weighting based on
frequency + sum distance
expand to
semantic graph
with
semantic space
for time t+1
vocabulary at time t
prune
t = t + 1
1950 1970 1990
Data: >600.000 digitized newspaper issues from the
Dutch National Library 1950-1990
W2v models of 10 year slices with a sliding window (9
year overlap)
One or more words as entry-points into concept,
concept-as-network used to search subsequent slice
Evaluation based on human annotation / domain
knowledge
"Efficiency"
Observation 1: Seed word not necessarily most representative
“Marxist”, minimum concept similarity 0.6, 2 year interval, forward track direction
Is this "tracing concepts?"
Observation 2: No optimal settings to avoid “concept drift"
>>> tc.trackClouds3(dModels, ['gastarbeider', 'gastarbeiders', 'immigranten'], fMinDist=.65,
bSumOfDistances=True, bBackwards=True)
1981_1990: immigranten (1.34), gastarbeiders (1.34), gastarbeider (1.00), vluchtelingen (0.33), emigranten (0.29)
1980_1989: immigranten (1.89), vluchtelingen (1.32), gastarbeiders (1.30), emigranten (1.27), gastarbeider (1.00), afghanen (0.35),
vietnamezen (0.34), tamils (0.33), asielzoekers (0.27)
1979_1988: vluchtelingen (1.93), vietnamezen (1.64), immigranten (1.63), gastarbeiders (1.32), asielzoekers (1.32), emigranten
(1.30), afghanen (1.30), tamils (1.27), gastarbeider (1.00), cambodjanen (0.89)
1978_1987: vluchtelingen (2.30), cambodjanen (1.88), vietnamezen (1.86), asielzoekers (1.65), tamils (1.61), immigranten (1.59),
afghanen (1.58), gastarbeiders (1.33), emigranten (1.26), gastarbeider (1.00)
1977_1986: asielzoekers (1.68), afghanen (1.65), cambodjanen (1.61), vietnamezen (1.59), tamils (1.35), vluchtelingen (1.33),
gastarbeiders (1.33), immigranten (1.33), emigranten (1.00), gastarbeider (1.00)
[…]
1957_1966: vietkong (2.39), regeringstroepen (2.38), vietcong (2.30), guerrillastrijders (2.18), rebellen (2.13), viëtcong (1.52),
zuidvietnamezen (1.32), vietnamezen (1.32), opstandelingen (1.22), guerillastrijders (1.12)
1956_1965: opstandelingen (2.85), rebellen (2.85), vietcong (2.62), regeringstroepen (2.59), guerrillastrijders (2.19), vietkong (2.18),
guerillastrijders (2.09), viëtcong (1.49), vietminh (1.31), vrijheidsstrijders (1.27)
1955_1964: guerillastrijders (2.83), guerrillastrijders (2.56), vietkong (2.33), opstandelingen (2.31), rebellen (2.28), regeringstroepen
(2.07), vietcong (1.35), vrijheidsstrijders (1.34), vietminh (1.32), viëtcong (1.00)
1954_1963: guerillastrijders (1.90), regeringstroepen (1.79), vietcong (1.67), rebellen (1.67), guerrillastrijders (1.60), vietkong (1.35),
opstandelingen (1.31), vrijheidsstrijders (1.00), vietminh (1.00), viëtcong (1.00)
Is this "tracing concepts?"
Observation 3: Are we looking at changes in “Dutch
language” or in what newspapers happen to write about?
Is this "tracing concepts?"
“Roken” (“To smoke”)
20 most similar words 1974-1983
Very interesting but also highly exploratory:
no singular theory of concepts /
conceptual change for every kind of data
So no absolute guarantee of avoiding
concept drift based on word embeddings
alone
Conclusion
Know your data
Build flexibility (and transparency) into
technical setup
Iterate between close and distant
Follow-up: testing of different kinds of
data, conceptual theories on the basis of
historical use cases
Conclusion
Do it yourself
Find our code / how-to-manual /data
models on:
https://github.com/NLeSC/ShiCo
Thank you!
www.pimhuijnen.com
p.huijnen@uu.nl

Weitere ähnliche Inhalte

Kürzlich hochgeladen

Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 

Kürzlich hochgeladen (20)

Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 

Empfohlen

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Empfohlen (20)

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 

What we talk about when we talk about concepts

  • 1. What we talk about when we talk about concepts Applying distributional semantics on Dutch historical newspapers to trace conceptual change Pim Huijnen - Utrecht University AIUCD Rome, 26 January 2017
  • 2. Tracing Concepts over time in Dutch Newspaper Discourse (1950-1990) using Word Embeddings Tom Kenter (University of Amsterdam) Melvin Wevers (Utrecht University) Carlos Martinez-Ortiz (NL eScience Center) Joris van Eijnatten (Utrecht University) Jaap Verheul (Utrecht University)
  • 3. Task Trace concepts (ideas, topics) without sticking to particular words
  • 4. Approach Multi-dimensional word- vector space using Google’s word2vec (word embeddings) Concept represented as a network of closely related words based on distance Weighting based on frequency + sum distance expand to semantic graph with semantic space for time t+1 vocabulary at time t prune t = t + 1
  • 5. 1950 1970 1990 Data: >600.000 digitized newspaper issues from the Dutch National Library 1950-1990 W2v models of 10 year slices with a sliding window (9 year overlap) One or more words as entry-points into concept, concept-as-network used to search subsequent slice Evaluation based on human annotation / domain knowledge
  • 7. Observation 1: Seed word not necessarily most representative “Marxist”, minimum concept similarity 0.6, 2 year interval, forward track direction Is this "tracing concepts?"
  • 8. Observation 2: No optimal settings to avoid “concept drift" >>> tc.trackClouds3(dModels, ['gastarbeider', 'gastarbeiders', 'immigranten'], fMinDist=.65, bSumOfDistances=True, bBackwards=True) 1981_1990: immigranten (1.34), gastarbeiders (1.34), gastarbeider (1.00), vluchtelingen (0.33), emigranten (0.29) 1980_1989: immigranten (1.89), vluchtelingen (1.32), gastarbeiders (1.30), emigranten (1.27), gastarbeider (1.00), afghanen (0.35), vietnamezen (0.34), tamils (0.33), asielzoekers (0.27) 1979_1988: vluchtelingen (1.93), vietnamezen (1.64), immigranten (1.63), gastarbeiders (1.32), asielzoekers (1.32), emigranten (1.30), afghanen (1.30), tamils (1.27), gastarbeider (1.00), cambodjanen (0.89) 1978_1987: vluchtelingen (2.30), cambodjanen (1.88), vietnamezen (1.86), asielzoekers (1.65), tamils (1.61), immigranten (1.59), afghanen (1.58), gastarbeiders (1.33), emigranten (1.26), gastarbeider (1.00) 1977_1986: asielzoekers (1.68), afghanen (1.65), cambodjanen (1.61), vietnamezen (1.59), tamils (1.35), vluchtelingen (1.33), gastarbeiders (1.33), immigranten (1.33), emigranten (1.00), gastarbeider (1.00) […] 1957_1966: vietkong (2.39), regeringstroepen (2.38), vietcong (2.30), guerrillastrijders (2.18), rebellen (2.13), viëtcong (1.52), zuidvietnamezen (1.32), vietnamezen (1.32), opstandelingen (1.22), guerillastrijders (1.12) 1956_1965: opstandelingen (2.85), rebellen (2.85), vietcong (2.62), regeringstroepen (2.59), guerrillastrijders (2.19), vietkong (2.18), guerillastrijders (2.09), viëtcong (1.49), vietminh (1.31), vrijheidsstrijders (1.27) 1955_1964: guerillastrijders (2.83), guerrillastrijders (2.56), vietkong (2.33), opstandelingen (2.31), rebellen (2.28), regeringstroepen (2.07), vietcong (1.35), vrijheidsstrijders (1.34), vietminh (1.32), viëtcong (1.00) 1954_1963: guerillastrijders (1.90), regeringstroepen (1.79), vietcong (1.67), rebellen (1.67), guerrillastrijders (1.60), vietkong (1.35), opstandelingen (1.31), vrijheidsstrijders (1.00), vietminh (1.00), viëtcong (1.00) Is this "tracing concepts?"
  • 9. Observation 3: Are we looking at changes in “Dutch language” or in what newspapers happen to write about? Is this "tracing concepts?" “Roken” (“To smoke”) 20 most similar words 1974-1983
  • 10. Very interesting but also highly exploratory: no singular theory of concepts / conceptual change for every kind of data So no absolute guarantee of avoiding concept drift based on word embeddings alone Conclusion
  • 11. Know your data Build flexibility (and transparency) into technical setup Iterate between close and distant Follow-up: testing of different kinds of data, conceptual theories on the basis of historical use cases Conclusion
  • 12. Do it yourself Find our code / how-to-manual /data models on: https://github.com/NLeSC/ShiCo