Can we track the geographic origin of
surnames based on bibliographic data?
Nicolas Robinson-Garcia, Ed Noyons & Rodrigo Costas
15th INTERNATIONAL CONFERENCE
ON SCIENTOMETRICS & INFORMETRICS
29 June – 3 July, 2015,
Bogazici University, Istanbul, Turkey
EC3metrics spin-off
CWTS, Leiden University
Agenda
o Background
o Bibliographic data
o Method 1. Kullback-Leibler divergence
o Method 2. Concentration Index
o The ‘golden list’
o Next or previous steps
Background
“the use of surnames in human population biology dates back to
1875, when George Darwin used frequency of occurrences of the
same surname in married couples to study in-breeding”
Kissin, 2011
WHAT IS IN A SURNAME?
o Proxy for genetic/ethnic origin
-> Epidemiology, Biomedical research
o Proxy for country origin
-> Demographic studies, migratory movements
Background
… in the field of bibliometrics
o The representation of Jewish surnames in biomedical journals and US patents
Kissin, 2011; Kissin & Bradley, 2013
o Relation between ethnic mix collaboration and citation impact
Freeman & Huang, 2014
Background
HOW CAN WE DETERMINE THE GEOGRAPHIC ORIGIN OF
SURNAMES?
METHODS
o Manually curated lists
o Probability and Bayesian methods
o Clustering techniques
DATA SOURCES
o National census
o Dispersion of sources
o Lack of international coverage
Bibliographic data
o Scientific databases as international surnames data sources
› Regional restrictions
› Temporal restrictions
o Establishing ‘trusted’ linkages between surnames and countries
› Reprint address
› First author-First address
› One country publications
› Author-address linkages (2008)
Some figures:
-> 1,568,052 distinct surnames assigned to 119 countries
-> France 8.8%; Germany 8.0%; Russia 7.1%; Spain 4.9%
Assumptions
HYPOTHESIS 1
A surname should be assigned to the country where it occurs with the highest frequency.
HYPOTHESIS 2
A surname should be assigned to the country where it shows the greatest concentration.
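The two hypotheses can point to different countries for the same surname. The sketch below illustrates this with invented toy counts. The function names, the country labels and the reading of "concentration" as the surname's share among all authors of a country are assumptions made for illustration, not the definitions used in the study.

```python
from typing import Dict


def assign_by_frequency(counts: Dict[str, int]) -> str:
    """Hypothesis 1 (as read here): pick the country with the most occurrences."""
    return max(counts, key=lambda country: counts[country])


def assign_by_concentration(counts: Dict[str, int],
                            authors_per_country: Dict[str, int]) -> str:
    """Hypothesis 2 (assumed reading): pick the country where the surname
    makes up the largest share of all author records."""
    shares = {c: counts[c] / authors_per_country[c] for c in counts}
    return max(shares, key=lambda country: shares[country])


# Invented toy numbers: occurrences of one surname in a large and a small country.
surname_counts = {"COUNTRY_A": 900, "COUNTRY_B": 120}
authors_per_country = {"COUNTRY_A": 100_000, "COUNTRY_B": 5_000}

print(assign_by_frequency(surname_counts))
print(assign_by_concentration(surname_counts, authors_per_country))
```

With these toy numbers the frequency rule selects the large country and the concentration rule selects the small one, because 120 of 5,000 records is a larger share than 900 of 100,000.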
Method 1. Kullback-Leibler
OPERATIONALIZATION
A surname will be assigned to a country if 1) the surname has its highest frequency in that country, and 2) there are “certain levels of assurance”.
METHOD 1
Kullback-Leibler divergence indicates the (dis)similarity between the global distribution of a surname and its distribution in each country.
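As a rough illustration of Method 1, the sketch below computes the Kullback-Leibler divergence D(P||Q) = sum_i p_i log(p_i / q_i) between a surname's distribution over countries (P) and a baseline distribution of all author records over countries (Q). The choice of baseline, the `threshold` parameter standing in for the unspecified “levels of assurance”, and all function names are assumptions, not the procedure used by the authors.

```python
import math
from typing import Dict, List, Optional


def normalize(counts: List[float]) -> List[float]:
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """D(P||Q) with natural log; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def assign_kl(surname_counts: Dict[str, int],
              baseline_counts: Dict[str, int],
              threshold: float) -> Optional[str]:
    """Assign the most frequent country, but only if the surname's
    distribution diverges enough from the baseline (hypothetical rule)."""
    countries = sorted(baseline_counts)
    p = normalize([surname_counts.get(c, 0) for c in countries])
    q = normalize([baseline_counts[c] for c in countries])
    if kl_divergence(p, q) < threshold:
        return None  # too evenly spread to trust an assignment
    return max(surname_counts, key=lambda c: surname_counts[c])


# Invented toy data: two countries of equal size.
baseline = {"COUNTRY_A": 50, "COUNTRY_B": 50}
print(assign_kl({"COUNTRY_A": 10}, baseline, threshold=0.5))
print(assign_kl({"COUNTRY_A": 5, "COUNTRY_B": 5}, baseline, threshold=0.5))
```

A surname found only in one of two equally sized countries has a divergence of log 2 and is assigned. A surname split evenly has a divergence of 0 and is left unassigned, which is one way the method could end up with less than full coverage.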
Method 2. Gini Index
OPERATIONALIZATION
A surname will be assigned to the country with the highest concentration of that surname.
METHOD 2
The Gini Index is an inequality indicator already employed for other purposes in bibliometrics. It expresses, on a scale from 0 to 1, the concentration of a surname in a country.
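For reference, a minimal sketch of the standard Gini coefficient of a list of non-negative values is given below. How the study applies the index to surname and country counts is not detailed on the slide, so the function only shows the generic inequality measure, and its name and the toy inputs are illustrative.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Invented toy data: occurrences of a surname across four countries.
print(gini([5, 5, 5, 5]))   # evenly spread
print(gini([0, 0, 0, 10]))  # found in a single country
```

An even spread gives 0. A surname found in only one of n countries gives (n - 1) / n, which approaches 1 as the number of countries grows.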
Kullback-Leibler vs. Gini index
Top 10 countries with the highest number of surnames assigned

Method 1. Kullback-Leibler
Country        No. surnames
FRANCE         138349
GERMANY        112445
RUSSIA         111716
SPAIN          83529
USA            76219
ITALY          69637
ENGLAND        63885
JAPAN          56345
CANADA         49775
NETHERLANDS    41306

Method 2. Gini index
Country        No. surnames
USA            310739
FRANCE         117938
GERMANY        111375
RUSSIA         94369
ITALY          65699
JAPAN          52399
ENGLAND        47521
CANADA         46146
POLAND         44087
INDIA          42897
Kullback-Leibler vs. Gini index
Examples of surnames and the country assigned by each method

Surname     Method 1. Kullback-Leibler    Method 2. Gini index
CLINTON     USA                           USA
EGGHE       BELGIUM                       BELGIUM
GARFIELD    USA                           USA
HERRERA     SPAIN                         CUBA
GARCIA      SPAIN                         CUBA
EINSTEIN    USA                           ISRAEL
NOYONS      NETHERLANDS                   NETHERLANDS
PEREIRA     BRAZIL                        PORTUGAL
The ‘golden list’
Validating the methods proposed
SEARCHING A ‘GOLDEN LIST’ TO VALIDATE THE RESULTS
o Coverage
o Criteria
› Language
› Ethnicity
› Historical origin
o Reliance and double assignments
The ‘golden list’
Validating the methods proposed
In search for a ‘golden list’ of surnames assigned to countries/languages/ethnicities
http://en.wikipedia.org/wiki/Category:Surnames_by_language

Unified country    Languages
Denmark            Danish
England            Celtic; Anglo-Cornish; English; Scottish; Irish
Finland            Finnish
France             Breton; French
Germany            German
Greece             Greek
Iceland            Icelandic
Italy              Italian
Japan              Japanese
Netherlands        Afrikaans; Dutch
Portugal           Portuguese
Spain              Basque; Catalan; Galician;
The ‘golden list’

               METHOD 1                   METHOD 2
Countries      % coverage    % correct    % coverage    % correct
DENMARK        91.1%         68.75%       100%          60.16%
ENGLAND        28.8%         80.97%       100%          58.56%
FINLAND        99.11%        94.62%       100%          91.96%
FRANCE         88.08%        68.28%       100%          50.54%
GERMANY        52.24%        69.00%       100%          43.78%
GREECE         84.12%        78.32%       100%          78.57%
ICELAND        100.00%       65.52%       100%          100.00%
ITALY          87.65%        86.97%       100%          64.77%
JAPAN          98.74%        98.95%       100%          91.39%
NETHERLANDS    88.11%        60.96%       100%          41.67%
PORTUGAL       98.54%        92.59%       100%          91.91%
SPAIN          93.18%        48.74%       100%          54.74%
Total          73.22%        79.03%       100%          61.29%
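The two columns reported per method can be read as follows, although the slide does not spell out the definitions. In this sketch, coverage is assumed to be the share of golden-list surnames that received any country, and correctness the share of those covered surnames whose country matches the golden list. The function name and the toy data are illustrative.

```python
from typing import Dict, Optional, Tuple


def validate(assigned: Dict[str, Optional[str]],
             golden: Dict[str, str]) -> Tuple[float, float]:
    """Return (coverage, correct) as fractions between 0 and 1.

    coverage: golden-list surnames with an assignment / all golden-list surnames
    correct:  matching assignments / covered surnames (assumed denominator)
    """
    if not golden:
        return 0.0, 0.0
    covered = [s for s in golden if assigned.get(s) is not None]
    coverage = len(covered) / len(golden)
    matches = sum(1 for s in covered if assigned[s] == golden[s])
    correct = matches / len(covered) if covered else 0.0
    return coverage, correct


# Invented toy data: four golden-list surnames, two of them assigned by a method.
golden_list = {"s1": "X", "s2": "Y", "s3": "X", "s4": "Y"}
method_output = {"s1": "X", "s2": "X", "s3": None}

coverage, correct = validate(method_output, golden_list)
print(f"coverage={coverage:.0%} correct={correct:.0%}")
```

Under this reading, a method that always assigns a country reaches 100% coverage by construction, so the two methods differ mainly in the % correct column.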
Next or previous steps
o Is the Web of Science a good sample of the world
population?
› Country census crossed with the WoS
o Time frames and migratory movements
› Apply methods to different periods
o Validation and comparison with other techniques
› Bayesian, probability, clustering
o Multiple assignments of countries (e.g., Lee, Santos)
Thank you! elrobin@ugr.es
