Using Digital Traces for User Profiling: the Uncertainty 
of Identity Toolset 
Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul 
Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 
1 Department of Geography, University College London 
2 School of Computer Science, University of Birmingham 
3 School of Engineering and Mathematical Sciences, City University London 
Web: www.uncertaintyofidentity.com
Introduction 
• Recent years have witnessed rapid growth in the use of 
online services 
• Online shopping, bank transactions, social networking services 
• Issues related to cyber-crime, identity fraud, and hacking 
• This project aims to combine real and virtual world 
datasets to better understand the identity of individuals 
• Identities 
• Real world (Name: Forename & Surname) 
• Virtual world (Email addresses, Social media accounts etc)
Introduction 
• This paper presents a framework for the identification and 
profiling of individuals from their 
• Social media accounts 
• E-mail addresses 
• Twitter Geographic Profiler 
• Maps ethno-cultural communities of a person’s friends 
• E-mail Address Profiler 
• Uses a database of family names to extract probable identities from 
E-mail addresses 
• Could have potential applications in targeted marketing and 
online fraud detection
Outline 
• Onomap 
• A Name (Forename and Surname) classification system 
• Twitter Geographic Profiler 
• Extracting identities of Twitter users 
• Mapping them to probable ethnic origins 
• E-mail Address Profiler 
• Extracting identities from E-mail addresses 
• Geographic distribution
Onomap classification 
• A name reflects a person’s ethnic, linguistic, and cultural identity 
• A network of forename-surname pairs was created using 
data from 26 different countries 
• www.onomap.org 
[Figure: forename-surname network for the name Pablo Mateos, linking forenames (Pablo, Juan, Rosa, Marta, ...) with surnames (Mateos, Garcia, Pérez, Sánchez, Rodríguez, ...)]
Onomap Classification
Onomap Classification 
• ONOMAP (www.onomap.org) for forename – surname pairs 
Kevin Hodge (English) 
Pablo Mateos (Spanish) 
… 
… 
… 
…
Twitter Geographic Profiler
Twitter Geographic Profiler 
• Given an individual’s Twitter username or ID, the tool 
• Extracts information about the individual’s friends 
• Extracts the forename-surname pairs of the friends 
• Maps the forename-surname pairs to Onomap 
• Builds an ethno-cultural profile of the person’s friends 
• Maps the geographic distribution
Data available through the Twitter API 
• User ID 
• User Creation Date 
• Followers 
• Friends 
• Language 
• Location 
• Name 
• Screen Name or User Name 
• Time Zone 
• Geo Enabled 
• Latitude 
• Longitude 
• Tweet date and time 
• Tweet text
Twitter: getting the ids and usernames 
• Given a Twitter username of a person, we use the Twitter 
API to get the list of friends’ ids 
– A max of 15 requests every 15 minutes is allowed 
– Each query can get up to 5000 ids 
– Generally enough to download all the ids 
• Using the ids, we fetch the name associated with each id 
– Limited to 180 requests every 15 min 
– Returns a single string from which we need to extract the forename 
and surname tokens 
– Not necessarily a valid forename + surname! 
• E.g., “University of Birmingham”, “John1965”, “What is Love”, 
“Mystic_mind”
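As a rough illustration of what these limits imply, the sketch below estimates how many 15-minute windows each step needs for a given number of friends. The three limits are the ones quoted above; the number of names returned per lookup request (`names_per_request`) is not stated on the slide and is an assumption exposed as a parameter.

```python
import math

# Limits quoted on the slide (per 15-minute window).
ID_REQUESTS_PER_WINDOW = 15
IDS_PER_REQUEST = 5000
NAME_REQUESTS_PER_WINDOW = 180


def windows_needed(n_friends: int, names_per_request: int = 100) -> dict:
    """Estimate requests and 15-minute windows for both steps.

    names_per_request is an ASSUMPTION, not a figure from the slides.
    """
    id_requests = math.ceil(n_friends / IDS_PER_REQUEST)
    name_requests = math.ceil(n_friends / names_per_request)
    return {
        "id_requests": id_requests,
        "id_windows": math.ceil(id_requests / ID_REQUESTS_PER_WINDOW),
        "name_requests": name_requests,
        "name_windows": math.ceil(name_requests / NAME_REQUESTS_PER_WINDOW),
    }


if __name__ == "__main__":
    print(windows_needed(2000))
```

Under these assumptions a user with a few thousand friends fits in a single window for both steps, which is consistent with the remark that the limits are generally enough to download all the ids.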
Twitter: getting forename-surname pairs 
• The name field was divided into tokens 
• Forenames and surnames were detected by matching the 
string tokens against the database of forename-surname 
pairs from 26 countries 
• Users were discarded 
– where the tokens did not match a valid forename and 
surname
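The matching step can be sketched as follows. This is a minimal illustration, not the project's implementation: the two sets are tiny hypothetical stand-ins for the forename and surname database, and the tokenisation rule (runs of letters, lower-cased) is an assumption.

```python
import re

# HYPOTHETICAL miniature stand-ins for the forename/surname database.
FORENAMES = {"kevin", "pablo", "juan", "rosa", "marta"}
SURNAMES = {"hodge", "mateos", "garcia", "sanchez"}


def extract_pair(name_field: str):
    """Return (forename, surname), or None if the user should be discarded."""
    # Split the name field into runs of letters; digits, underscores and
    # punctuation act as separators (an assumed tokenisation rule).
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", name_field)]

    # First token that matches a known forename.
    f_idx = next((i for i, t in enumerate(tokens) if t in FORENAMES), None)
    if f_idx is None:
        return None

    # Last token, other than the forename token, that matches a known surname.
    s_idx = next(
        (i for i in range(len(tokens) - 1, -1, -1)
         if i != f_idx and tokens[i] in SURNAMES),
        None,
    )
    if s_idx is None:
        return None
    return tokens[f_idx], tokens[s_idx]


if __name__ == "__main__":
    for name in ["Pablo Mateos", "University of Birmingham", "John1965"]:
        print(repr(name), "->", extract_pair(name))
```

With this sample database, only "Pablo Mateos" yields a pair; the other two name fields are discarded.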
Onomap: from names to ethnicity 
• ONOMAP (www.onomap.org) was applied to forename – 
surname pairs 
Kevin Hodge (English) 
Pablo Mateos (Spanish) 
… 
… 
… 
…
Friends’ Ethnicity Histogram 
[Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the Twitter user's friends' ethno-cultural groups.] 
Once the entire list of friends' name + surname pairs has been parsed, we can 
easily estimate the distribution over the set of possible ethno-cultural groups of 
the Twitter user's friends. 
From the accompanying paper (visible in the screenshot): 
• With the list of (surname, forename) pairs to hand, we query Onomap to get the ethno-cultural classification associated with each (surname, forename) pair, and the SearchSurnameTopCountries method to get the list of the countries where an instance of a given surname was observed. 
• In this work we mark as invalid any string that is composed of a single token. If this is the case, we skip the profile of the corresponding friend. 
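Estimating the distribution over ethno-cultural groups amounts to counting the classifications of the parsed pairs and normalising. The sketch below assumes a hypothetical dictionary in place of an Onomap query; its two entries reuse the example classifications shown in the slides, and unclassified pairs are simply skipped.

```python
from collections import Counter

# HYPOTHETICAL stand-in for an Onomap query. The two labels are the
# examples given in the slides; a real lookup would replace this dict.
CLASSIFICATION = {
    ("kevin", "hodge"): "English",
    ("pablo", "mateos"): "Spanish",
}


def ethnicity_histogram(pairs):
    """Return {group: share of classified friends} for (forename, surname) pairs."""
    counts = Counter(CLASSIFICATION[p] for p in pairs if p in CLASSIFICATION)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


if __name__ == "__main__":
    friends = [("kevin", "hodge")] * 3 + [("pablo", "mateos"), ("unknown", "name")]
    print(ethnicity_histogram(friends))
```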
Friends’ Geographic Origins 
[Figure 2: Map showing the geographic origin of the Twitter user's friends’ surnames as assigned by our tool. Below the map the user is shown a list of the top 10 countries with the respective frequency.] 
From the accompanying paper (visible in the screenshot): if the string contains two or more tokens, we take the first one to be the forename and the last one to be the surname. 
Twitter Geographic Profiler 
• Potential applications include 
– Measuring the level of segregation/integration of a given individual 
(community) as the Shannon entropy of the (average) friends’ 
ethnicity histogram 
– Outlier detection: identifying uncommon behaviors, e.g., individuals 
that stand out in terms of the ethno-cultural groups they bond with 
• Limitations 
– Twitter data is very noisy 
– We need a better heuristic to extract forename + surname
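The entropy measure mentioned above is H = -Σ p_i log p_i over the shares p_i of each ethno-cultural group in the histogram. A minimal sketch follows; the base-2 logarithm (entropy in bits) is an assumption, since the slides do not state a base.

```python
import math


def shannon_entropy(histogram: dict) -> float:
    """Shannon entropy in bits of a histogram {group: count or share}.

    Values are normalised first, so raw counts and shares both work.
    """
    total = sum(histogram.values())
    if total <= 0:
        return 0.0
    probs = [v / total for v in histogram.values() if v > 0]
    return sum(-p * math.log2(p) for p in probs)


if __name__ == "__main__":
    print(shannon_entropy({"English": 10}))               # one group only
    print(shannon_entropy({"English": 5, "Spanish": 5}))  # two equal groups
```

A histogram concentrated in a single group gives an entropy of 0, while an even split across two groups gives 1 bit, so higher values indicate a more mixed set of friends.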
E-mail Address Profiler
E-mail address profiler 
• In many instances, an e-mail address encapsulates some 
kind of identity information 
– Forename or surname 
• This tool 
– Extracts identities of individuals from their e-mail addresses 
– Maps the geographical distribution of a Surname in the UK 
• The tool identifies a surname or forename as a substring of an 
e-mail address 
• The tool builds a suffix tree of an e-mail address and searches 
it for probable identities
An example suffix tree 
[Figure: suffix tree for the string aamalam$. The surname contained in this string is alam$, 
which appears at a leaf node.]
Surname matching algorithm 
• The surname matching algorithm constructs a suffix tree for an 
e-mail address. 
• It uses a database of surnames and forenames and matches 
them 
– against each substring in the suffix tree 
• A probable identity is a substring that matches a surname or 
forename in the database 
• We use a database of the 10,000 most common surnames 
in the UK
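The matching idea can be illustrated with a naive, uncompressed suffix trie built from nested dictionaries. This is a simplified stand-in for a true suffix tree (quadratic space, no edge compression), and the name set is a hypothetical sample rather than the 10,000-surname database.

```python
def build_suffix_trie(text: str) -> dict:
    """Build a naive suffix trie (nested dicts) for text + '$'."""
    text = text + "$"
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """True if pattern is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


# HYPOTHETICAL sample; names taken from examples in the slides.
SAMPLE_NAMES = {"alam", "singleton", "keay"}


def probable_identities(email: str, names=SAMPLE_NAMES) -> list:
    """Return the known names that occur in the local part of the address."""
    local_part = email.split("@", 1)[0].lower()
    trie = build_suffix_trie(local_part)
    return sorted(n for n in names if contains(trie, n))


if __name__ == "__main__":
    print(probable_identities("aamalam@example.com"))
    print(probable_identities("a.singleton@ucl.ac.uk"))
```

Every substring of the local part is a prefix of one of its suffixes, so a name is present exactly when it can be traced from the root of the trie.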
E-mail Address Profiler: geographic distribution 
• 2007 Electoral Register 
– Name and Address of every individual who is eligible to vote in 
the UK 
• Every postcode in the Electoral Register was converted 
to latitude/longitude values 
• The tool maps all the latitude/longitudes for a particular 
surname geographically 
• Onomap is used to identify the probable ethnic origin of 
a surname
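The mapping step reduces to joining register entries to coordinates by postcode and filtering by surname. The sketch below uses placeholder postcodes, coordinates, and register rows that are purely hypothetical; it does not reflect real Electoral Register data or its format.

```python
# HYPOTHETICAL placeholder data: not real postcodes, coordinates, or records.
POSTCODE_TO_LATLON = {
    "AA1 1AA": (51.5, -0.1),
    "BB2 2BB": (53.4, -2.9),
}
REGISTER = [
    ("singleton", "AA1 1AA"),
    ("keay", "BB2 2BB"),
    ("singleton", "BB2 2BB"),
    ("singleton", "ZZ9 9ZZ"),  # postcode with no known coordinates
]


def surname_points(surname, register=REGISTER, lookup=POSTCODE_TO_LATLON):
    """Return the (lat, lon) points to plot for one surname.

    Entries whose postcode cannot be geocoded are skipped.
    """
    wanted = surname.lower()
    return [lookup[pc] for name, pc in register if name == wanted and pc in lookup]


if __name__ == "__main__":
    print(surname_points("Singleton"))
```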
E-mail Address Profiler 
Email: a.singleton@ucl.ac.uk
Geographic distribution 
[Maps: geographic distribution of two example surnames, Singleton and Keay]
Conclusion 
• A toolkit for identity detection and profiling 
• Identification and profiling of ethno-cultural characteristics of 
individuals 
• From social media accounts and e-mail addresses 
• Future work will include 
• The extension of the Twitter Geographic Profiler to other social media 
services 
• The extension of the E-mail Address Profiler to process a large corpus of 
e-mail addresses 
• Study of privacy implications on social media services
Thanks for Listening 
Any Questions ?

Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset

  • 1. Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 1 Department of Geography, University College London 2 School of Computer Science, University of Birmingham 3 School of Engineering and Mathematical Sciences, City University London Web: www.uncertaintyofidentity.com
  • 2. Introduction • Past years have witnessed a rapid growth of the use of online services • Online shopping, bank transactions, social networking services • Issues related to cyber-crimes, identity frauds, and hacking • This project aims to combining real and virtual world datasets to better understand the identity of individuals • Identities • Real world (Name: Forename & Surname) • Virtual world (Email addresses, Social media accounts etc)
  • 3. Introduction • This paper presents a framework for the identification and profiling of individuals from their • Social media accounts • E-mail addresses • Twitter Geographic Profiler • Maps ethno-cultural communities of a person’s friends • E-mail Address Profiler • Used a database of family names to extract probably identities from E-mail addresses • Could have potential applications in targeted marketing and online fraud detection
  • 4. Outline • Onomap • A Name (Forename and Surname) classification system • Twitter Geographic Profiler • Extracting identities of Twitter users • Mapping them to probable ethnic origins • E-mail Address Profiler • Extracting identities from E-mail addresses • Geographic distribution
  • 5. Onomap classification • A name is a person’s ethnic, linguistic, and cultural identity • A network of Forename-Surname pairs was created by using Pablo Forenames Surnames Mateos Garcia Pérez ... Juan Rosa Marta ... Sánchez Rodríguez the data from 26 different countries • www.onomap.org Name: Pablo Mateos
  • 7. Onomap Classification • ONOMAP (www.onomap.org) for forename – surname pairs Kevin Hodge (English) Pablo Mateos (Spanish) … … … …
  • 9. Twitter Geographic Profiler • Given an individual’s Twitter Username or ID • Extracts the information of individual’s friends • Extracts the forename-surname pairs of the friends • Maps forename-surname pairs to Onomap • Builds an ethno-cultural profile person’s friends • Maps the geographic distribution
  • 10. Data available through the Twitter API • User ID • User Creation Date • Followers • Friends • Language • Location • Name • Screen Name or User Name • Time Zone • Geo Enabled • Latitude • Longitude • Tweet date and time • Tweet text
  • 11. Twitter: getting the ids and usernames • Given a Twitter username of a person, we use the Twitter API to get the list of friends’ ids – A max of 15 requests every 15 minutes is allowed – Each query can get up to 5000 ids – Generally enough to download all the ids • Using the ids, we fetch the name associated with each id – Limited to 180 requests every 15 min – Returns a single string from which we need to extract the forename and surname tokens – Not necessarily a valid forename + surname! • E.g., “University of Birmingham”, “John1965”, “What is Love”, “Mystic_mind”
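As a rough worked example of the limits quoted on this slide, the sketch below computes how many friend ids fit into one 15-minute window and how many windows a friend list of a given size would need. The constants are taken from the slide; the function name is illustrative and no Twitter API call is made.

```python
import math

# Limits as quoted on the slide (not queried from the API).
IDS_REQUESTS_PER_WINDOW = 15   # max requests per 15-minute window
IDS_PER_REQUEST = 5000         # max ids returned by one query
WINDOW_MINUTES = 15

# Upper bound on ids retrievable in a single window: 15 * 5000 = 75000.
IDS_PER_WINDOW = IDS_REQUESTS_PER_WINDOW * IDS_PER_REQUEST


def id_windows_needed(n_friends: int) -> int:
    """Number of 15-minute windows needed to download n_friends ids."""
    requests = math.ceil(n_friends / IDS_PER_REQUEST)
    return math.ceil(requests / IDS_REQUESTS_PER_WINDOW)
```

Under these limits a single window covers up to 75,000 friend ids, which is consistent with the slide's remark that the quota is generally enough to download all the ids.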
  • 12. Twitter: getting forename-surname pairs • Name field was divided into different tokens • Forenames and Surnames were detected by matching the string tokens against the database of forename-surname pairs from 26 countries • Users were discarded where tokens could not be matched against a valid forename and surname
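The token-matching step can be sketched as follows. This is a minimal illustration, not the project's implementation: the two name sets are tiny hypothetical stand-ins for the 26-country database, and taking the first and last tokens as the forename and surname candidates is a simplifying assumption.

```python
# Hypothetical miniature stand-ins for the forename/surname database.
FORENAMES = {"kevin", "pablo", "juan", "rosa", "marta"}
SURNAMES = {"hodge", "mateos", "garcia", "pérez", "sánchez", "rodríguez"}


def extract_pair(name_field):
    """Return (forename, surname) from a profile name string, or None.

    Single-token strings are rejected; otherwise the first token is the
    forename candidate and the last token is the surname candidate, and
    both must be present in the name database.
    """
    tokens = name_field.strip().split()
    if len(tokens) < 2:
        return None
    forename, surname = tokens[0], tokens[-1]
    if forename.lower() in FORENAMES and surname.lower() in SURNAMES:
        return (forename, surname)
    return None


if __name__ == "__main__":
    for raw in ["Pablo Mateos", "University of Birmingham", "John1965"]:
        print(repr(raw), "->", extract_pair(raw))
```

Strings such as “University of Birmingham” are dropped because neither candidate token is in the name sets, which mirrors the discard rule on the slide.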
  • 13. Onomap: from names to ethnicity • ONOMAP (www.onomap.org) was applied on forename – surname pairs Kevin Hodge (English) Pablo Mateos (Spanish) … … … …
  • 14. Friends’ Ethnicity Histogram • Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the Twitter user's friends' ethno-cultural groups. • Once the entire list of friends' name + surname pairs has been parsed, we can easily estimate the distribution over the set of possible ethno-cultural groups of the Twitter user's friends. • With the list of (surname, forename) pairs to hand, we query Onomap to get the ethno-cultural classification associated with each (surname, forename) pair, and the SearchSurnameTopCountries method to get the list of the countries where an instance of a given surname was observed.
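Estimating the distribution over ethno-cultural groups amounts to counting classifications and normalising. The sketch below assumes a caller-supplied `classify` function as a hypothetical stand-in for the Onomap query; the real Onomap interface is not shown in the slides and is not reproduced here.

```python
from collections import Counter


def ethnicity_histogram(pairs, classify):
    """Relative frequency of each ethno-cultural group among friends.

    pairs    -- iterable of (forename, surname) tuples
    classify -- hypothetical callable returning a group label or None
    """
    counts = Counter()
    for forename, surname in pairs:
        group = classify(forename, surname)
        if group is not None:          # skip unclassified names
            counts[group] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


if __name__ == "__main__":
    # Toy lookup using the two classifications shown on the slides.
    toy = {("Kevin", "Hodge"): "English", ("Pablo", "Mateos"): "Spanish"}
    friends = [("Kevin", "Hodge"), ("Pablo", "Mateos"), ("Pablo", "Mateos")]
    print(ethnicity_histogram(friends, lambda f, s: toy.get((f, s))))
```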
  • 15. Friends’ Geographic Origins • Figure 2: Map showing the geographical origin of the Twitter user's friends’ surnames as assigned by our tool. Below the map the user is shown a list of the top 10 countries with the respective frequency. • In this work we mark as invalid any string that is composed of a single token. If this is the case, we skip the profile of the corresponding friend. • If the string contains two or more tokens, we take the first one to be the forename and the last one to be the surname.
  • 16. Twitter Geographic Profiler • Potential applications include – Measuring the level of segregation/integration of a given individual (community) as the Shannon entropy of the (average) friends’ ethnicity histogram – Outlier detection: identifying uncommon behaviors, e.g., individuals that stand out in terms of the ethno-cultural groups they bond with • Limitations – Twitter data is very noisy – We need a better heuristic to extract forename + surname
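The entropy measure mentioned above can be sketched in a few lines. The logarithm base is not stated on the slide, so base 2 (entropy in bits) is an assumption here, and the group labels are illustrative.

```python
import math


def shannon_entropy(histogram):
    """Shannon entropy (in bits) of a friends' ethnicity histogram.

    histogram -- dict mapping group label to a non-negative count
    Returns 0.0 when all friends fall in one group or the histogram
    is empty; larger values indicate a more mixed set of friends.
    """
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in histogram.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(shannon_entropy({"English": 10}))
    print(shannon_entropy({"English": 5, "Spanish": 5}))
```

A user whose friends all share one group scores zero, while an even split across two groups scores one bit, so higher values correspond to a more integrated friend list under this measure.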
  • 18. E-mail address profiler • In many instances, an e-mail address encapsulates some kind of identity information – Forename or surname • This tool – Extracts identities of individuals from their e-mail addresses – Maps the geographical distribution of a Surname in the UK • The tool identifies a surname or forename as a substring of an e-mail address • The tool builds a suffix tree of an e-mail address and searches for probable identities
  • 19. An example suffix tree Suffix Tree for a name aamalam$. The surname for this name is alam$ and it has been shown at a leaf node
  • 20. Surname matching algorithm • Surname matching algorithm constructs a suffix tree for an email address. • Uses a database of surnames and forenames and matches them – with each substring of the suffix tree • A probable identity is the substring where a surname or forename matches with the substring • We use a database of the most common 10,000 surnames in the UK
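The matching idea can be illustrated with an uncompressed suffix trie, a simpler relative of the suffix tree named on the slide. This is a sketch under stated assumptions: it indexes only the part of the address before the `@`, omits the `$` terminator, uses quadratic space (acceptable for short addresses), and the name set passed in is a hypothetical stand-in for the 10,000-surname database.

```python
def build_suffix_trie(text):
    """Nested-dict trie containing every suffix of text."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """True if pattern is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def probable_identities(email, names):
    """Names occurring as substrings of the local part, longest first."""
    local_part = email.split("@", 1)[0].lower()
    trie = build_suffix_trie(local_part)
    hits = [n for n in names if contains(trie, n.lower())]
    return sorted(hits, key=lambda n: (-len(n), n))


if __name__ == "__main__":
    sample_names = {"singleton", "keay", "alam"}   # hypothetical sample
    print(probable_identities("a.singleton@ucl.ac.uk", sample_names))
```

Short names can match by accident inside longer strings, so ranking longer matches first is one plausible way to prioritise candidates; the slides do not say how the tool ranks them.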
  • 21. E-mail Address Profiler: geographic distribution • 2007 Electoral Register – Name and Address of every individual who is eligible to vote in the UK • Every postcode in the Electoral Register was converted to latitude/longitude values • The tool maps all the latitude/longitudes for a particular surname geographically • Onomap is used to identify the probable ethnic origin of a surname
  • 22. E-mail Address Profiler Email: a.singleton@ucl.ac.uk
  • 23. Geographic distribution Surname: Singleton Surname: Keay
  • 24. Conclusion • A toolkit for identity detection and profiling • Identification and profiling of ethno-cultural characteristics of individuals • From social media accounts and e-mail addresses • Future work will include • The extension of the Twitter Geographic Profiler to other social media services • The extension of the E-mail Address Profiler to process a large corpus of e-mail addresses • Study of privacy implications on social media services
  • 25. Thanks for Listening Any Questions ?