SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Mining personal names in the
‘Big Data’ to map Diasporas
Who are they, where are they and what are they doing?
Connecting, Communicating and Networking with Diasporas
4-6 May 2016 - Dublin Castle - Ireland
Elian CARSENAT, NamSor
Funded by the
European Union
2
#RMM4Dublin
NamSor sorts Names
3
 Personal names are meaningful : we use sociolinguistics to
extract their semantics and deliver actionable intelligence.
 Names reflect cultural Identity
 NamSor data mining software
recognizes the linguistic or cultural
origin of names in any alphabet /
language, with fine grain and high
accuracy.
Mining 3M twitter names to map Diasporas
Who are they, where are they and what are they doing?
4
Source: Twitter
Source: Twitter
Visualization : CartoDB
Data Mining: NamSor
Flow view –
who travels
where?
5
Source Target Type Id Onoma Weight
United Kingdom France Directed 16 Great Britain 37
Spain France Directed 55 Spain 14
United States France Directed 75 Great Britain 12
Turkey France Directed 79 Turkey 11
Brazil France Directed 87 Portugal 10
United Kingdom France Directed 112 Ireland 9
Italy France Directed 152 Italy 7
Switzerland France Directed 226 France 5
Belgium France Directed 247 France 5
United Kingdom France Directed 258 France 5
Mexico France Directed 287 Spain 4
Ireland France Directed 317 Great Britain 4
United Kingdom France Directed 333 Italy 4
United States France Directed 375 France 4
Source: Twitter
Visualization : Gephi
Data Mining: NamSor
Flow view –
who travels
where?
6
Source: Twitter
Visualization : Gephi
Data Mining: NamSor
“Incredible India” – 1.2 Billion People
Indian onomastics by State/Union Territory
7
Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI,
KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU, ARABIC
Applications to a global Airline’s customer intelligence
8
Example: Indian Diaspora / Non Resident Indians (NRI)
based in the United States
‘It applies indeed to 93% of our customers: when
NamSor recognizes an Indian name, the client has
travelled to India in the past.’
At state level : ~50%
Finer grain segmentation using names brings insights into
diasporas’ travel patterns visiting family and friends in their
home country, as well as their specific needs.
Mapping Talents in Cancer Research
(in collaboration with French INSERM)
9
Thomson Reuters WebOfScience
(6 countries, 250k scientists, 50k papers)
“Analysts uncovered amazing patterns in the way scientists’ names
correlate with whom they publish, and who they cite in their papers
- not just in case of a particular country, but globally. Tania
Vichnevskaia of the French National Institute for Health (INSERM)
presented the paper ‘Applying onomastics to scientometrics‘ at IREG
International symposium 2015 organised by University of Maribor
and Shanghai Jiao Tong University. The paper was prepared jointly
with NamSor, a private start-up company specialized in mapping
international Diasporas.”
Source: WoS; Data Mining: INSERM with NamSor
10
Source: WoS; Data Mining: INSERM with NamSor
Mapping Talents in Cancer Research
(in collaboration with French INSERM)
Cancer Research in Poland and Slovenia
Examining the ‘brain drain’
11
In the Polish Corpus, we look at co-
authors with Polish names, affiliated
abroad. Top countries:
1. USA
2. Great-Britain
3. Germany
In the Slovenian Corpus, we look at co-
authors with Slovenian names,
affiliated abroad. Top countries:
1. Great-Britain
2. USA
3. Germany
Source: WoS; Data Mining: INSERM with NamSor
Tunisie
Marocains Résidant à l'Étranger (MRE)
Répartition parmi les principales Universités au Canada
13
Canadian Science Policy Conference - CSPC2015
Boston geo-demographics 1/2
14
Boston geo-demographics 2/2
15
Analysing patent data
16
Founder Bio
17
Elian CARSENAT, a computer scientist trained at ENSIIE/INRIA, started his career
at JP Morgan in Paris in 1997. He later worked as consultant and managed
business & IT projects in London, Paris, Moscow and Shanghai.
In 2012, Elian created NamSor, a piece of sociolinguistics software to mine the
'Big Data' and better understand international flows of money, ideas and
people. NamSor helps answer the perennial question all countries ask about
their diasporas – who are they, where are they and what are they doing.
NamSor has been used to attract Foreign Direct Investments (FDI), to build-up
international collaboration within scientific communities, to attract and
facilitate Diaspora investment in Start-ups...
as well as other use cases.
http://fr.linkedin.com/in/eliancarsenat/en
Thank you!
Elian CARSENAT
elian.carsenat@namsor.com
Phone : +33 6 52 77 99 07
www.namsor.com
18
Juillet 2013, Ambassade de Lituanie à Paris

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (17)

Gold mining process
Gold mining processGold mining process
Gold mining process
 
Topic 2: Mining
Topic 2: MiningTopic 2: Mining
Topic 2: Mining
 
Philippine Mining Act
Philippine Mining ActPhilippine Mining Act
Philippine Mining Act
 
Mining law
Mining lawMining law
Mining law
 
Environmental policy
Environmental policyEnvironmental policy
Environmental policy
 
PHILIPPINE BIODIVERSITY: Ecological Roles, Uses, and Conservation Status
PHILIPPINE BIODIVERSITY: Ecological Roles, Uses, and Conservation StatusPHILIPPINE BIODIVERSITY: Ecological Roles, Uses, and Conservation Status
PHILIPPINE BIODIVERSITY: Ecological Roles, Uses, and Conservation Status
 
The Republic of Ireland Pw pt Presentation
The Republic of Ireland Pw pt PresentationThe Republic of Ireland Pw pt Presentation
The Republic of Ireland Pw pt Presentation
 
Environmental Treaties, Laws and Policies
Environmental Treaties, Laws and PoliciesEnvironmental Treaties, Laws and Policies
Environmental Treaties, Laws and Policies
 
Brazil
BrazilBrazil
Brazil
 
ENVIRONMENTAL ETHICS
ENVIRONMENTAL ETHICSENVIRONMENTAL ETHICS
ENVIRONMENTAL ETHICS
 
Germany powerpoint
Germany powerpointGermany powerpoint
Germany powerpoint
 
Environmental Laws
Environmental LawsEnvironmental Laws
Environmental Laws
 
Environmental ethics
Environmental ethicsEnvironmental ethics
Environmental ethics
 
State of the Philippine Environment
State of the Philippine EnvironmentState of the Philippine Environment
State of the Philippine Environment
 
Powerpoint Germany
Powerpoint GermanyPowerpoint Germany
Powerpoint Germany
 
German Culture
German CultureGerman Culture
German Culture
 
Ethics
EthicsEthics
Ethics
 

Ähnlich wie Mining names in the big data to map diasporas - NamSor

Diasporas Digital Développement
Diasporas Digital DéveloppementDiasporas Digital Développement
Diasporas Digital DéveloppementElian CARSENAT
 
Bigdataforesight
BigdataforesightBigdataforesight
Bigdataforesightsuresh sood
 
HomeComing for Develoment in Africa
HomeComing for Develoment in AfricaHomeComing for Develoment in Africa
HomeComing for Develoment in AfricaElian CARSENAT
 
Text mining names in ‘Big Data’ to recognize migration trends
Text mining names in ‘Big Data’ to recognize migration trendsText mining names in ‘Big Data’ to recognize migration trends
Text mining names in ‘Big Data’ to recognize migration trendsElian CARSENAT
 
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...Lewis Shepherd
 
Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity ToolsetUsing Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity ToolsetDr Muhammad Adnan
 
Communications practices for livestock genetics for Africa
Communications practices for livestock genetics for AfricaCommunications practices for livestock genetics for Africa
Communications practices for livestock genetics for AfricaILRI
 
Future Social Media Research
Future Social Media ResearchFuture Social Media Research
Future Social Media ResearchPulsar
 
1000 Images About Cause And Effect Essay On Pinter
1000 Images About Cause And Effect Essay On Pinter1000 Images About Cause And Effect Essay On Pinter
1000 Images About Cause And Effect Essay On PinterJennifer Nulton
 
How DNA and Criminal Profiling Solve Crimes
How DNA and Criminal Profiling Solve CrimesHow DNA and Criminal Profiling Solve Crimes
How DNA and Criminal Profiling Solve CrimesSam Brandt
 
Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...
Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...
Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...Future Processing
 
Culture Mapping Food & Beverage Trends 2018
Culture Mapping Food & Beverage Trends 2018Culture Mapping Food & Beverage Trends 2018
Culture Mapping Food & Beverage Trends 2018Tim Stock
 
IAOS 2018 - Official statistics and Indigenous People – the state of play and...
IAOS 2018 - Official statistics and Indigenous People – the state of play and...IAOS 2018 - Official statistics and Indigenous People – the state of play and...
IAOS 2018 - Official statistics and Indigenous People – the state of play and...StatsCommunications
 
ECSA, the ECSA principles, and the ECSA Characteristics of Citizen Science
ECSA, the ECSA principles, and the ECSA Characteristics of Citizen ScienceECSA, the ECSA principles, and the ECSA Characteristics of Citizen Science
ECSA, the ECSA principles, and the ECSA Characteristics of Citizen ScienceMargaret Gold
 
Artificial Intelligence for Goods: Cases and Tools
Artificial Intelligence for Goods: Cases and ToolsArtificial Intelligence for Goods: Cases and Tools
Artificial Intelligence for Goods: Cases and ToolsOleksandr Krakovetskyi
 
Enabling information interoperability with identifiers (L. Haak)
 Enabling information interoperability with identifiers (L. Haak) Enabling information interoperability with identifiers (L. Haak)
Enabling information interoperability with identifiers (L. Haak)ORCID, Inc
 

Ähnlich wie Mining names in the big data to map diasporas - NamSor (20)

NamSor for GEOINT
NamSor for GEOINTNamSor for GEOINT
NamSor for GEOINT
 
Diasporas Digital Développement
Diasporas Digital DéveloppementDiasporas Digital Développement
Diasporas Digital Développement
 
Bigdataforesight
BigdataforesightBigdataforesight
Bigdataforesight
 
HomeComing for Develoment in Africa
HomeComing for Develoment in AfricaHomeComing for Develoment in Africa
HomeComing for Develoment in Africa
 
Text mining names in ‘Big Data’ to recognize migration trends
Text mining names in ‘Big Data’ to recognize migration trendsText mining names in ‘Big Data’ to recognize migration trends
Text mining names in ‘Big Data’ to recognize migration trends
 
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
 
Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity ToolsetUsing Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset
 
Communications practices for livestock genetics for Africa
Communications practices for livestock genetics for AfricaCommunications practices for livestock genetics for Africa
Communications practices for livestock genetics for Africa
 
Future Social Media Research
Future Social Media ResearchFuture Social Media Research
Future Social Media Research
 
1000 Images About Cause And Effect Essay On Pinter
1000 Images About Cause And Effect Essay On Pinter1000 Images About Cause And Effect Essay On Pinter
1000 Images About Cause And Effect Essay On Pinter
 
How DNA and Criminal Profiling Solve Crimes
How DNA and Criminal Profiling Solve CrimesHow DNA and Criminal Profiling Solve Crimes
How DNA and Criminal Profiling Solve Crimes
 
Conoscenza finals 2016
Conoscenza finals 2016Conoscenza finals 2016
Conoscenza finals 2016
 
Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...
Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...
Coś zupełnie offline: badania etnograficzne są kluczem do skutecznego zaangaż...
 
Culture Mapping Food & Beverage Trends 2018
Culture Mapping Food & Beverage Trends 2018Culture Mapping Food & Beverage Trends 2018
Culture Mapping Food & Beverage Trends 2018
 
IAOS 2018 - Official statistics and Indigenous People – the state of play and...
IAOS 2018 - Official statistics and Indigenous People – the state of play and...IAOS 2018 - Official statistics and Indigenous People – the state of play and...
IAOS 2018 - Official statistics and Indigenous People – the state of play and...
 
ECSA, the ECSA principles, and the ECSA Characteristics of Citizen Science
ECSA, the ECSA principles, and the ECSA Characteristics of Citizen ScienceECSA, the ECSA principles, and the ECSA Characteristics of Citizen Science
ECSA, the ECSA principles, and the ECSA Characteristics of Citizen Science
 
Artificial Intelligence for Goods: Cases and Tools
Artificial Intelligence for Goods: Cases and ToolsArtificial Intelligence for Goods: Cases and Tools
Artificial Intelligence for Goods: Cases and Tools
 
Enabling information interoperability with identifiers (L. Haak)
 Enabling information interoperability with identifiers (L. Haak) Enabling information interoperability with identifiers (L. Haak)
Enabling information interoperability with identifiers (L. Haak)
 
Spark
SparkSpark
Spark
 
Our digital traces and how they can be missuseed
Our digital traces and how they can be missuseedOur digital traces and how they can be missuseed
Our digital traces and how they can be missuseed
 

Mehr von ICMPD

Current state of migration in the Mediterranean - Nov 2016 by OECD
Current state of migration in the Mediterranean - Nov 2016 by OECDCurrent state of migration in the Mediterranean - Nov 2016 by OECD
Current state of migration in the Mediterranean - Nov 2016 by OECDICMPD
 
Prospects for International Migration Governance
Prospects for International Migration GovernanceProspects for International Migration Governance
Prospects for International Migration GovernanceICMPD
 
Migration Governance Framework & its applications by IOM
Migration Governance Framework & its applications by IOMMigration Governance Framework & its applications by IOM
Migration Governance Framework & its applications by IOMICMPD
 
25 Points d'action pour l'engagement des diasporas
25 Points d'action pour l'engagement des diasporas25 Points d'action pour l'engagement des diasporas
25 Points d'action pour l'engagement des diasporasICMPD
 
Trafficking in persons in Syria and the neighbouring countries
Trafficking in persons in Syria and the neighbouring countriesTrafficking in persons in Syria and the neighbouring countries
Trafficking in persons in Syria and the neighbouring countriesICMPD
 
EUROMED Migration IV: Approaches & possibilities for future collaboration
EUROMED Migration IV: Approaches & possibilities for future collaborationEUROMED Migration IV: Approaches & possibilities for future collaboration
EUROMED Migration IV: Approaches & possibilities for future collaborationICMPD
 

Mehr von ICMPD (6)

Current state of migration in the Mediterranean - Nov 2016 by OECD
Current state of migration in the Mediterranean - Nov 2016 by OECDCurrent state of migration in the Mediterranean - Nov 2016 by OECD
Current state of migration in the Mediterranean - Nov 2016 by OECD
 
Prospects for International Migration Governance
Prospects for International Migration GovernanceProspects for International Migration Governance
Prospects for International Migration Governance
 
Migration Governance Framework & its applications by IOM
Migration Governance Framework & its applications by IOMMigration Governance Framework & its applications by IOM
Migration Governance Framework & its applications by IOM
 
25 Points d'action pour l'engagement des diasporas
25 Points d'action pour l'engagement des diasporas25 Points d'action pour l'engagement des diasporas
25 Points d'action pour l'engagement des diasporas
 
Trafficking in persons in Syria and the neighbouring countries
Trafficking in persons in Syria and the neighbouring countriesTrafficking in persons in Syria and the neighbouring countries
Trafficking in persons in Syria and the neighbouring countries
 
EUROMED Migration IV: Approaches & possibilities for future collaboration
EUROMED Migration IV: Approaches & possibilities for future collaborationEUROMED Migration IV: Approaches & possibilities for future collaboration
EUROMED Migration IV: Approaches & possibilities for future collaboration
 

Kürzlich hochgeladen

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Kürzlich hochgeladen (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Mining names in the big data to map diasporas - NamSor

  • 1. Mining personal names in the ‘Big Data’ to map Diasporas Who are they, where are they and what are they doing? Connecting, Communicating and Networking with Diasporas 4-6 May 2016 - Dublin Castle - Ireland Elian CARSENAT, NamSor Funded by the European Union
  • 3. NamSor sorts Names 3  Personal names are meaningful : we use sociolinguistics to extract their semantics and deliver actionable intelligence.  Names reflect cultural Identity  NamSor data mining software recognizes the linguistic or cultural origin of names in any alphabet / language, with fine grain and high accuracy.
  • 4. Mining 3M twitter names to map Diasporas Who are they, where are they and what are they doing? 4 Source: Twitter Source: Twitter Visualization : CartoDB Data Mining: NamSor
  • 5. Flow view – who travels where? 5 Source Target Type Id Onoma Weight United Kingdom France Directed 16 Great Britain 37 Spain France Directed 55 Spain 14 United States France Directed 75 Great Britain 12 Turkey France Directed 79 Turkey 11 Brazil France Directed 87 Portugal 10 United Kingdom France Directed 112 Ireland 9 Italy France Directed 152 Italy 7 Switzerland France Directed 226 France 5 Belgium France Directed 247 France 5 United Kingdom France Directed 258 France 5 Mexico France Directed 287 Spain 4 Ireland France Directed 317 Great Britain 4 United Kingdom France Directed 333 Italy 4 United States France Directed 375 France 4 Source: Twitter Visualization : Gephi Data Mining: NamSor
  • 6. Flow view – who travels where? 6 Source: Twitter Visualization : Gephi Data Mining: NamSor
  • 7. “Incredible India” – 1.2 Billion People Indian onomastics by State/Union Territory 7 Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI, KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU, ARABIC
  • 8. Applications to a global Airline’s customer intelligence 8 Example: Indian Diaspora / Non Resident Indians (NRI) based in the United States ‘It applies indeed to 93% of our customers: when NamSor recognizes an Indian name, the client has travelled to India in the past.’ At state level : ~50% Finer grain segmentation using names brings insights into diasporas’ travel patterns visiting family and friends in their home country, as well as their specific needs.
  • 9. Mapping Talents in Cancer Research (in collaboration with French INSERM) 9 Thomson Reuters WebOfScience (6 countries, 250k scientists, 50k papers) “Analysts uncovered amazing patterns in the way scientists’ names correlate with whom they publish, and who they cite in their papers - not just in case of a particular country, but globally. Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics‘ at IREG International symposium 2015 organised by University of Maribor and Shanghai Jiao Tong University. The paper was prepared jointly with NamSor, a private start-up company specialized in mapping international Diasporas.” Source: WoS; Data Mining: INSERM with NamSor
  • 10. 10 Source: WoS; Data Mining: INSERM with NamSor Mapping Talents in Cancer Research (in collaboration with French INSERM)
  • 11. Cancer Research in Poland and Slovenia Examining the ‘brain drain’ 11 In the Polish Corpus, we look at co- authors with Polish names, affiliated abroad. Top countries: 1. USA 2. Great-Britain 3. Germany In the Slovenian Corpus, we look at co- authors with Slovenian names, affiliated abroad. Top countries: 1. Great-Britain 2. USA 3. Germany Source: WoS; Data Mining: INSERM with NamSor
  • 13. Marocains Résidant à l'Étranger (MRE) Répartition parmi les principales Universités au Canada 13 Canadian Science Policy Conference - CSPC2015
  • 17. Founder Bio 17 Elian CARSENAT, a computer scientist trained at ENSIIE/INRIA, started his career at JP Morgan in Paris in 1997. He later worked as consultant and managed business & IT projects in London, Paris, Moscow and Shanghai. In 2012, Elian created NamSor, a piece of sociolinguistics software to mine the 'Big Data' and better understand international flows of money, ideas and people. NamSor helps answer the perennial question all countries ask about their diasporas – who are they, where are they and what are they doing. NamSor has been used to attract Foreign Direct Investments (FDI), to build-up international collaboration within scientific communities, to attract and facilitate Diaspora investment in Start-ups... as well as other use cases. http://fr.linkedin.com/in/eliancarsenat/en
  • 18. Thank you! Elian CARSENAT elian.carsenat@namsor.com Phone : +33 6 52 77 99 07 www.namsor.com 18 Juillet 2013, Ambassade de Lituanie à Paris