Using Digital Traces for User Profiling: the Uncertainty 
of Identity Toolset 
Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul 
Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 
1 Department of Geography, University College London 
2 School of Computer Science, University of Birmingham 
3 School of Engineering and Mathematical Sciences, City University London 
Web: www.uncertaintyofidentity.com
Introduction 
• Recent years have witnessed rapid growth in the use of online services 
• Online shopping, bank transactions, social networking services 
• Issues related to cyber-crime, identity fraud, and hacking 
• This project aims to combine real- and virtual-world datasets to better understand the identity of individuals 
• Identities 
• Real world (Name: Forename & Surname) 
• Virtual world (E-mail addresses, social media accounts, etc.)
Introduction 
• This paper presents a framework for the identification and 
profiling of individuals from their 
• Social media accounts 
• E-mail addresses 
• Twitter Geographic Profiler 
• Maps ethno-cultural communities of a person’s friends 
• E-mail Address Profiler 
• Uses a database of family names to extract probable identities from e-mail addresses 
• Potential applications in targeted marketing and online fraud detection
Outline 
• Onomap 
• A Name (Forename and Surname) classification system 
• Twitter Geographic Profiler 
• Extracting identities of Twitter users 
• Mapping them to probable ethnic origins 
• E-mail Address Profiler 
• Extracting identities from E-mail addresses 
• Geographic distribution
Onomap classification 
• A name reflects a person’s ethnic, linguistic, and cultural identity 
• A network of forename–surname pairs was created using data from 26 different countries 
• www.onomap.org 
[Figure: network linking forenames (Pablo, Juan, Rosa, Marta, ...) to surnames (Mateos, Garcia, Pérez, Sánchez, Rodríguez, ...); example name: Pablo Mateos]
Onomap Classification 
• ONOMAP (www.onomap.org) classifies forename–surname pairs, e.g. 
– Kevin Hodge (English) 
– Pablo Mateos (Spanish) 
– …
Twitter Geographic Profiler
Twitter Geographic Profiler 
• Given an individual’s Twitter Username or ID 
• Retrieves information about the individual’s friends 
• Extracts the friends’ forename–surname pairs 
• Classifies the forename–surname pairs with Onomap 
• Builds an ethno-cultural profile of the person’s friends 
• Maps the geographic distribution
Data available through the Twitter API 
• User ID 
• User Creation Date 
• Followers 
• Friends 
• Language 
• Location 
• Name 
• Screen Name or User Name 
• Time Zone 
• Geo Enabled 
• Latitude 
• Longitude 
• Tweet date and time 
• Tweet text
Twitter: getting the ids and usernames 
• Given a Twitter username of a person, we use the Twitter 
API to get the list of friends’ ids 
– A maximum of 15 requests per 15 minutes is allowed 
– Each request returns up to 5,000 ids 
– Generally enough to download all the ids 
• Using the ids, we fetch the name associated with each id 
– Limited to 180 requests per 15 minutes 
– Returns a single string from which the forename and surname tokens must be extracted 
– Not necessarily a valid forename + surname! 
• E.g., “University of Birmingham”, “John1965”, “What is Love”, “Mystic_mind”
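The rate limits above determine how long it takes to crawl one user's friends. The sketch below works through that arithmetic in Python under stated assumptions: the limits of 15 id requests and 180 name requests per 15-minute window and the 5,000 ids per request come from the slide, while `names_per_request` (how many ids a single name lookup covers) is a hypothetical parameter defaulting to 1, and requests within a window are assumed to be sent back-to-back.

```python
import math

WINDOW_MINUTES = 15          # both limits on the slide are per 15-minute window
IDS_PER_REQUEST = 5000       # friends' ids returned per request (from the slide)
ID_REQUESTS_PER_WINDOW = 15
NAME_REQUESTS_PER_WINDOW = 180


def windows_needed(n_items: int, items_per_request: int, requests_per_window: int) -> int:
    """Number of rate-limit windows needed to fetch n_items."""
    requests = math.ceil(n_items / items_per_request)
    return math.ceil(requests / requests_per_window)


def wait_minutes(windows: int) -> int:
    """The first window starts immediately; each additional window adds 15 minutes."""
    return max(windows - 1, 0) * WINDOW_MINUTES


def crawl_minutes(n_friends: int, names_per_request: int = 1) -> int:
    """Lower bound on minutes spent waiting to fetch all friends' ids, then their names.

    names_per_request is a hypothetical parameter; the slide does not state it.
    """
    id_windows = windows_needed(n_friends, IDS_PER_REQUEST, ID_REQUESTS_PER_WINDOW)
    name_windows = windows_needed(n_friends, names_per_request, NAME_REQUESTS_PER_WINDOW)
    return wait_minutes(id_windows) + wait_minutes(name_windows)
```

For a user who follows 2,000 accounts, a single id request suffices, but fetching names one id per request takes 12 windows, so `crawl_minutes(2000)` returns 165. If each name request could cover 100 ids, `crawl_minutes(2000, names_per_request=100)` would return 0.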
Twitter: getting forename-surname pairs 
• The name field was split into tokens 
• Forenames and surnames were detected by matching the tokens against the database of forename–surname pairs from 26 countries 
• Users were discarded 
– where the tokens did not match a valid forename and surname
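The tokenise-and-match step can be sketched in Python as below. This is an illustration, not the tool's code: `extract_pair`, `FORENAMES` and `SURNAMES` are hypothetical names, the tiny name sets stand in for the 26-country database, and the rule that the first token is the forename and the last token is the surname (with single-token names rejected) is an assumption of the sketch.

```python
from typing import Optional

# Hypothetical stand-ins for the 26-country forename/surname database (lower-cased).
FORENAMES = {"pablo", "kevin", "john", "juan"}
SURNAMES = {"mateos", "hodge", "garcia"}


def extract_pair(name: str, forenames: set[str], surnames: set[str]) -> Optional[tuple[str, str]]:
    """Return (forename, surname) from a Twitter name field, or None to discard the user.

    Assumes the first whitespace-separated token is the forename and the last
    token is the surname; both must appear in the reference name lists.
    """
    tokens = name.split()
    if len(tokens) < 2:  # single tokens such as "John1965" or "Mystic_mind" are invalid
        return None
    first, last = tokens[0], tokens[-1]
    if first.casefold() in forenames and last.casefold() in surnames:
        return first, last
    return None
```

All four problematic examples from the previous slide are rejected: "John1965" and "Mystic_mind" are single tokens, while "University" and "What" are not in the forename list. A middle initial is tolerated, so "Kevin J. Hodge" yields ("Kevin", "Hodge").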
Onomap: from names to ethnicity 
• ONOMAP (www.onomap.org) was applied to the forename–surname pairs 
– Kevin Hodge (English) 
– Pablo Mateos (Spanish) 
– …
Friends’ Ethnicity Histogram 
[Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the Twitter user's friends' ethno-cultural groups.]
Any name string composed of a single token is marked as invalid, and the profile of the corresponding friend is skipped. 
With the list of (surname, forename) pairs to hand, we query Onomap to get the ethno-cultural classification associated with each (surname, forename) pair, and the SearchSurnameTopCountries method to get the list of the countries where an instance of a given surname was observed. 
Once the entire list of friends' (forename, surname) pairs has been parsed, we can estimate the distribution over the set of possible ethno-cultural groups of the Twitter user's friends.
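Estimating the distribution amounts to counting group labels and normalising. A minimal Python sketch, assuming a `classify` callable that plays the role of the Onomap query; `toy_classify` and its surname-keyed table are hypothetical placeholders (Onomap works on forename–surname pairs, and its interface is not shown here), and friends whose names receive no classification are left out of the denominator.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for an Onomap query, keyed by surname for illustration only.
TOY_GROUPS = {"Hodge": "English", "Mateos": "Spanish", "Garcia": "Spanish"}


def toy_classify(forename: str, surname: str) -> Optional[str]:
    """Return an ethno-cultural group for the pair, or None if unclassified."""
    return TOY_GROUPS.get(surname)


def ethnicity_histogram(
    pairs: Iterable[tuple[str, str]],
    classify: Callable[[str, str], Optional[str]],
) -> dict[str, float]:
    """Fraction of classified friends falling in each ethno-cultural group."""
    counts: Counter = Counter()
    for forename, surname in pairs:
        group = classify(forename, surname)
        if group is not None:  # unclassified pairs do not count towards the total
            counts[group] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}
```

With the friends Kevin Hodge, Pablo Mateos, Juan Garcia and one unclassified name, the sketch returns one third English and two thirds Spanish.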
Friends’ Geographic Origins 
[Figure 2: Map showing the geographic origin of the Twitter user's friends’ surnames as assigned by our tool. Below the map the user is shown a list of the top 10 countries with the respective frequency.]
If the name string contains two or more tokens, we take the first one to be the forename and the last one to be the surname.
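The top-10 list under the map can be reproduced by counting countries across friends. The Python sketch below assumes that `countries_for` behaves like the SearchSurnameTopCountries method mentioned in the text, returning countries ordered from most to least likely; giving each friend a single vote for the top-ranked country is an assumption, since the slides do not say how multiple countries per surname are combined. `top_countries` and `TOY_COUNTRIES` are hypothetical, and the table is made up for illustration.

```python
from collections import Counter
from typing import Callable, Iterable

# Made-up stand-in for SearchSurnameTopCountries: countries ordered most likely first.
TOY_COUNTRIES = {
    "Mateos": ["Spain", "Mexico"],
    "Garcia": ["Spain", "Mexico"],
    "Hodge": ["United Kingdom", "United States"],
}


def top_countries(
    surnames: Iterable[str],
    countries_for: Callable[[str], list[str]],
    k: int = 10,
) -> list[tuple[str, int]]:
    """Assign each friend's surname to its top-ranked country; return the k most frequent."""
    counts: Counter = Counter()
    for surname in surnames:
        countries = countries_for(surname)
        if countries:  # surnames with no observed country are skipped
            counts[countries[0]] += 1
    return counts.most_common(k)
```

Dividing each count by the number of friends with a known country gives relative frequencies for display.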
Twitter Geographic Profiler 
• Potential applications include 
– Measure the level of segregation/integration of a given individual 
(community) as the Shannon entropy of the (average) friends’ 
ethnicity histogram 
– Outlier detection: identify uncommon behaviors, e.g., individuals that stand out in terms of the ethno-cultural groups they bond with 
• Limitations 
– Twitter data is very noisy 
– A better heuristic is needed to extract forename + surname pairs
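The proposed segregation/integration measure is the Shannon entropy H = -Σ_g p_g log2(p_g), taken over the groups g of a friends' ethnicity histogram. A short Python sketch follows; the base-2 logarithm, the simple group-wise mean used for a community's "average" histogram, and the function names are assumptions made for illustration.

```python
import math


def shannon_entropy(histogram: dict[str, float]) -> float:
    """Shannon entropy in bits of a group histogram (counts or fractions).

    0 bits: every friend falls in one group (strong concentration);
    log2(k) bits: friends spread evenly over k groups.
    """
    total = sum(histogram.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for value in histogram.values():
        if value > 0:
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


def average_histogram(histograms: list[dict[str, float]]) -> dict[str, float]:
    """Group-wise mean of several friends' histograms, e.g. over a community."""
    groups = {group for h in histograms for group in h}
    n = len(histograms)
    return {group: sum(h.get(group, 0.0) for h in histograms) / n for group in groups}
```

Low entropy indicates friendships concentrated in few groups; high entropy indicates friendships spread across many. An individual whose entropy lies far from that of comparable users is a candidate outlier.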
E-mail Address Profiler
E-mail address profiler 
• In many instances, an e-mail address encodes some identity information 
– Forename or surname 
• This tool 
– Extracts identities of individuals from their e-mail addresses 
– Maps the geographical distribution of a surname in the UK 
• The tool identifies a surname or forename as a substring of an e-mail address 
• The tool builds a suffix tree of an e-mail address and searches it for probable identities
An example suffix tree 
Suffix tree for the string aamalam$. The surname in this string is alam$, which is shown at a leaf node
Surname matching algorithm 
• The surname matching algorithm constructs a suffix tree for an e-mail address 
• It matches entries from a database of surnames and forenames 
– against each substring of the suffix tree 
• A probable identity is a substring that matches a surname or forename 
• We use a database of the 10,000 most common surnames in the UK
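The substring search can be sketched in Python with a naive suffix trie, in which every suffix of the e-mail's local part is inserted character by character; a name occurs in the address exactly when it spells a path from the root. This is a simpler, O(n²) stand-in for a compressed suffix tree (such as one built with Ukkonen's algorithm) and is adequate for short addresses. Stripping non-letters from the local part and the function names are assumptions of the sketch; the name set stands in for the 10,000-surname database.

```python
import re


def build_suffix_trie(text: str) -> dict:
    """Naive suffix trie: insert every suffix of text, one character per level."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """A pattern is a substring of the text iff it spells a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def probable_identities(email: str, names: set[str]) -> set[str]:
    """Names from the reference list that occur as substrings of the e-mail's local part."""
    local = email.split("@", 1)[0].lower()
    letters = re.sub(r"[^a-z]", "", local)  # drop dots, digits, underscores
    trie = build_suffix_trie(letters)
    return {name for name in names if contains(trie, name.lower())}
```

For example, `probable_identities("a.singleton@ucl.ac.uk", {"singleton", "keay", "alam"})` returns `{"singleton"}`. Because separators are removed, a name that straddles a dot could also match; this sketch does not guard against such false positives.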
E-mail Address Profiler: geographic distribution 
• 2007 Electoral Register 
– Name and address of every individual registered to vote in the UK 
• Every postcode in the Electoral Register was converted to latitude/longitude values 
• The tool maps all the latitude/longitude points for a given surname 
• Onomap is used to identify the probable ethnic origin of 
a surname
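Mapping a surname reduces to a join between register records and postcode coordinates. A minimal Python sketch, assuming the register is available as (surname, postcode) records and the postcode conversion as a dictionary from postcode to (latitude, longitude); the function name, record layout and toy postcodes such as "PC1" are hypothetical, and no real register data is reproduced.

```python
from typing import Iterable


def surname_locations(
    register: Iterable[tuple[str, str]],
    postcode_coords: dict[str, tuple[float, float]],
    surname: str,
) -> list[tuple[float, float]]:
    """Latitude/longitude of every register entry with the given surname.

    register yields (surname, postcode) records; postcode_coords maps a postcode
    to (lat, lon). Entries whose postcode has no coordinates are skipped.
    """
    target = surname.casefold()
    points = []
    for name, postcode in register:
        if name.casefold() == target and postcode in postcode_coords:
            points.append(postcode_coords[postcode])
    return points
```

The returned points can be handed to any mapping library; plotting is left out of the sketch.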
E-mail Address Profiler 
Email: a.singleton@ucl.ac.uk
Geographic distribution 
Surname: Singleton Surname: Keay
Conclusion 
• A toolkit for identity detection and profiling 
• Identification and profiling of ethno-cultural characteristics of 
individuals 
• From social media accounts and e-mail addresses 
• Future work will include 
• Extending the Twitter Geographic Profiler to other social media services 
• Extending the E-mail Address Profiler to process a large corpus of e-mail addresses 
• Studying the privacy implications of social media services
Thanks for Listening 
Any Questions?

Weitere ähnliche Inhalte

Ähnlich wie Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset

Linguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social MediaLinguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social MediaAseel Addawood
 
21 New Age Ways To Essa
21 New Age Ways To Essa21 New Age Ways To Essa
21 New Age Ways To EssaJulie Potts
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"Pete Burnap
 
Example Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssExample Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssLeonard Goudy
 
Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)Tin180 VietNam
 
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep LearningAn Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep LearningIRJET Journal
 
CSE5656 Complex Networks - Dunbar's Number
CSE5656   Complex Networks - Dunbar's NumberCSE5656   Complex Networks - Dunbar's Number
CSE5656 Complex Networks - Dunbar's NumberMarcello Tomasini
 
Measuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual ReferenceMeasuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual Referencekslovesbooks
 
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...Julie Roest
 
George Washington (Elementary) Writing Pape
George Washington (Elementary) Writing PapeGeorge Washington (Elementary) Writing Pape
George Washington (Elementary) Writing PapeEvelyn Donaldson
 
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...1crore projects
 
Data for the Humanities
Data for the HumanitiesData for the Humanities
Data for the Humanitieslibrarianrafia
 
Matthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxMatthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxreenarocky
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)David Graus
 
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docxReading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docxsodhi3
 
Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)Lora Aroyo
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applicationsVasileios Lampos
 

Ähnlich wie Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset (20)

Linguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social MediaLinguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social Media
 
21 New Age Ways To Essa
21 New Age Ways To Essa21 New Age Ways To Essa
21 New Age Ways To Essa
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
 
Example Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssExample Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free Ess
 
01 Network Data Collection
01 Network Data Collection01 Network Data Collection
01 Network Data Collection
 
Duke talk
Duke talkDuke talk
Duke talk
 
Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)
 
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep LearningAn Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
 
CSE5656 Complex Networks - Dunbar's Number
CSE5656   Complex Networks - Dunbar's NumberCSE5656   Complex Networks - Dunbar's Number
CSE5656 Complex Networks - Dunbar's Number
 
Measuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual ReferenceMeasuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual Reference
 
Our digital traces and how they can be missuseed
Our digital traces and how they can be missuseedOur digital traces and how they can be missuseed
Our digital traces and how they can be missuseed
 
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
 
George Washington (Elementary) Writing Pape
George Washington (Elementary) Writing PapeGeorge Washington (Elementary) Writing Pape
George Washington (Elementary) Writing Pape
 
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
 
Data for the Humanities
Data for the HumanitiesData for the Humanities
Data for the Humanities
 
Matthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxMatthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptx
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
 
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docxReading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
 
Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 

Mehr von Dr Muhammad Adnan

Spatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersSpatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersDr Muhammad Adnan
 
Analysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersAnalysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersDr Muhammad Adnan
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and VisualisationDr Muhammad Adnan
 
Geodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtodsGeodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtodsDr Muhammad Adnan
 
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...Dr Muhammad Adnan
 
Spatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identitySpatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identityDr Muhammad Adnan
 
Visualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsVisualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsDr Muhammad Adnan
 

Mehr von Dr Muhammad Adnan (8)

Spatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersSpatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter users
 
Analysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersAnalysing the digital traces of Social Media users
Analysing the digital traces of Social Media users
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and Visualisation
 
Geodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtodsGeodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtods
 
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
 
Spatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identitySpatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identity
 
Visualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsVisualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographics
 
Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
 

Kürzlich hochgeladen

Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 

Kürzlich hochgeladen (20)

Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 

Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset

  • 1. Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 1 Department of Geography, University College London 2 School of Computer Science, University of Birmingham 3 School of Engineering and Mathematical Sciences, City University London Web: www.uncertaintyofidentity.com
  • 2. Introduction • Past years have witnessed a rapid growth of the use of online services • Online shopping, bank transactions, social networking services • Issues related to cyber-crimes, identity frauds, and hacking • This project aims to combining real and virtual world datasets to better understand the identity of individuals • Identities • Real world (Name: Forename & Surname) • Virtual world (Email addresses, Social media accounts etc)
  • 3. Introduction • This paper presents a framework for the identification and profiling of individuals from their • Social media accounts • E-mail addresses • Twitter Geographic Profiler • Maps ethno-cultural communities of a person’s friends • E-mail Address Profiler • Used a database of family names to extract probably identities from E-mail addresses • Could have potential applications in targeted marketing and online fraud detection
  • 4. Outline • Onomap • A Name (Forename and Surname) classification system • Twitter Geographic Profiler • Extracting identities of Twitter users • Mapping them to probable ethnic origins • E-mail Address Profiler • Extracting identities from E-mail addresses • Geographic distribution
  • 5. Onomap classification • A name is a person’s ethnic, linguistic, and cultural identity • A network of Forename-Surname pairs was created by using Pablo Forenames Surnames Mateos Garcia Pérez ... Juan Rosa Marta ... Sánchez Rodríguez the data from 26 different countries • www.onomap.org Name: Pablo Mateos
  • 7. Onomap Classification • ONOMAP (www.onomap.org) classifies forename-surname pairs, e.g. Kevin Hodge (English), Pablo Mateos (Spanish), …
  • 9. Twitter Geographic Profiler • Given an individual’s Twitter username or ID, the tool: • Extracts information about the individual’s friends • Extracts the friends’ forename-surname pairs • Classifies the forename-surname pairs with Onomap • Builds an ethno-cultural profile of the person’s friends • Maps their geographic distribution
  • 10. Data available through the Twitter API • User ID • User Creation Date • Followers • Friends • Language • Location • Name • Screen Name or User Name • Time Zone • Geo Enabled • Latitude • Longitude • Tweet date and time • Tweet text
  • 11. Twitter: getting the ids and usernames • Given a person’s Twitter username, we use the Twitter API to get the list of the friends’ ids – A maximum of 15 requests every 15 minutes is allowed – Each query can return up to 5000 ids – Generally enough to download all the ids • Using the ids, we fetch the name associated with each id – Limited to 180 requests every 15 minutes – Returns a single string from which we need to extract the forename and surname tokens – Not necessarily a valid forename + surname! • E.g., “University of Birmingham”, “John1965”, “What is Love”, “Mystic_mind”
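The rate limits above determine how long a download takes. A minimal sketch of the arithmetic, assuming (as the slide implies) one id per name request and that requests are sent as fast as the limit allows, so only the windows before the last one must be waited out in full; if the real endpoint accepts several ids per call, `items_per_request` changes accordingly. The function name and the 1,200-friend figure are hypothetical.

```python
import math


def download_plan(n_items, items_per_request, requests_per_window, window_minutes=15):
    """Estimate (requests, rate-limit windows, minimum minutes spent waiting).

    Assumes requests are issued back to back, so every window except the
    last must be waited out in full before the next batch can start.
    """
    requests = math.ceil(n_items / items_per_request)
    windows = math.ceil(requests / requests_per_window)
    wait_minutes = max(windows - 1, 0) * window_minutes
    return requests, windows, wait_minutes


# Hypothetical user with 1,200 friends
print(download_plan(1200, 5000, 15))  # friend ids: (1, 1, 0)
print(download_plan(1200, 1, 180))    # names, one id per request: (1200, 7, 90)
```

The asymmetry explains the slide's remark: fetching ids is almost free, while resolving names dominates the running time.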
  • 12. Twitter: getting forename-surname pairs • The name field was split into tokens • Forenames and surnames were detected by matching the tokens against the database of forename-surname pairs from 26 countries • Users were discarded when their tokens did not match a valid forename and surname
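A minimal sketch of this filtering step, combining the database matching on this slide with the first/last-token rule described in the paper text (single-token strings are invalid; the first token is taken as forename, the last as surname). The regex tokenizer, the lower-casing, and the tiny name sets are assumptions for illustration, not the tool's actual implementation or databases.

```python
import re


def extract_name_pair(name_field, forenames, surnames):
    """Return a (forename, surname) pair, or None if the user is discarded.

    Heuristic (assumed): split on any run of non-letters (digits, underscores,
    punctuation, spaces), reject single-token strings, take the first token as
    forename and the last as surname, and keep the pair only if both appear in
    the (lower-cased) name databases.
    """
    tokens = [t.lower() for t in re.split(r"[\W\d_]+", name_field) if t]
    if len(tokens) < 2:
        return None
    forename, surname = tokens[0], tokens[-1]
    if forename in forenames and surname in surnames:
        return forename, surname
    return None


# Hypothetical, tiny stand-ins for the 26-country name database
FORENAMES = {"pablo", "kevin", "john"}
SURNAMES = {"mateos", "hodge"}
for raw in ["Pablo Mateos", "Kevin J. Hodge", "John1965", "University of Birmingham"]:
    print(raw, "->", extract_name_pair(raw, FORENAMES, SURNAMES))
```

"John1965" collapses to a single token and is rejected, while "University of Birmingham" splits into tokens that do not match the forename set.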
  • 13. Onomap: from names to ethnicity • ONOMAP (www.onomap.org) was applied to the forename-surname pairs, e.g. Kevin Hodge (English), Pablo Mateos (Spanish), …
  • 14. Friends’ Ethnicity Histogram • Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the ethno-cultural groups of the Twitter user’s friends. • The Twitter graph is directed, i.e., relationships are not necessarily reciprocated: each user has one list of the profiles they follow (friends) and one of their followers • In practice, users follow a limited number of profiles, which are then accessible even with the rate limitation in place • With the list of (surname, forename) pairs to hand, we query Onomap to get the ethno-cultural classification associated with each pair, and use the SearchSurnameTopCountries method to get the list of countries where a given surname was observed • Once the entire list of the friends’ forename + surname pairs has been parsed, we can estimate the distribution over the set of possible ethno-cultural groups of the Twitter user’s friends
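Once each friend has a group label, the histogram is a normalised count. A minimal sketch, assuming the Onomap results have already been collected as one label string per friend; the labels below are hypothetical, and the Onomap query itself is not shown.

```python
from collections import Counter


def ethnicity_histogram(classifications):
    """Turn one group label per friend into a probability distribution.

    Groups are returned from most to least frequent (ties keep the order in
    which they were first seen); an empty input yields an empty histogram.
    """
    counts = Counter(classifications)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.most_common()}


# Hypothetical labels standing in for per-friend Onomap classifications
labels = ["English", "English", "Spanish", "Irish"]
print(ethnicity_histogram(labels))
# {'English': 0.5, 'Spanish': 0.25, 'Irish': 0.25}
```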
  • 15. Friends’ Geographic Origins • Figure 2: Map showing the geographic origin of the Twitter user’s friends’ surnames as assigned by our tool. Below the map the user is shown a list of the top 10 countries with their respective frequencies. • Name parsing: any name string composed of a single token is marked as invalid, and the profile of the corresponding friend is skipped; if the string contains two or more tokens, we take the first to be the forename and the last to be the surname • By default a Twitter user’s profile is public; profiles set to private cannot be accessed by the tool • Limitations: Twitter data is noisy, which can considerably affect the computation; one source of noise is the heuristic used to extract the surname string
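The top-10 list below the map is a frequency ranking. A minimal sketch, assuming each friend has already been assigned a single origin country (for instance the first country returned for their surname; the slides do not say how one country is chosen from SearchSurnameTopCountries). The country values are hypothetical.

```python
from collections import Counter


def top_countries(friend_countries, k=10):
    """Return (country, count, share) for the k most frequent countries.

    friend_countries holds one assigned origin country per friend; shares are
    relative to all friends with an assigned country.
    """
    counts = Counter(friend_countries)
    total = sum(counts.values())
    return [(country, n, n / total) for country, n in counts.most_common(k)]


# Hypothetical assignments for five friends
print(top_countries(["UK", "Spain", "UK", "Ireland", "UK"], k=2))
# [('UK', 3, 0.6), ('Spain', 1, 0.2)]
```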
  • 16. Twitter Geographic Profiler • Potential applications include – Measuring the level of segregation/integration of a given individual (or community) as the Shannon entropy of the (average) friends’ ethnicity histogram – Outlier detection: identifying uncommon behaviours, e.g., individuals who stand out in terms of the ethno-cultural groups they bond with • Limitations – Twitter data is very noisy – We need a better heuristic to extract forename + surname
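The segregation/integration measure named above can be computed directly from a histogram. A minimal sketch, assuming entropy in bits (base 2; any base only rescales the value) and, for a community, a plain unweighted average of its members' normalised histograms. The histograms are hypothetical.

```python
import math


def normalise(histogram):
    """Scale counts to probabilities; an all-zero histogram becomes empty."""
    total = sum(histogram.values())
    return {g: v / total for g, v in histogram.items()} if total else {}


def shannon_entropy(histogram):
    """Shannon entropy in bits: H = -sum(p * log2 p) over non-empty groups.

    0 means every friend falls in one group (low integration); larger
    values mean friends are spread more evenly over more groups.
    """
    return -sum(p * math.log2(p) for p in normalise(histogram).values() if p > 0)


def average_histogram(histograms):
    """Unweighted mean of several normalised histograms (community level)."""
    normed = [normalise(h) for h in histograms]
    groups = {g for h in normed for g in h}
    return {g: sum(h.get(g, 0.0) for h in normed) / len(normed) for g in groups}


print(shannon_entropy({"English": 4}))                 # 0.0
print(shannon_entropy({"English": 1, "Spanish": 1}))  # 1.0
community = average_histogram([{"English": 2}, {"Spanish": 1, "English": 1}])
print(shannon_entropy(community))  # about 0.811
```

Comparing an individual's entropy with the community average is one simple way to flag the outliers mentioned on the slide.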
  • 18. E-mail address profiler • In many instances, an e-mail address encapsulates some identity information – a forename or surname • This tool – Extracts the identities of individuals from their e-mail addresses – Maps the geographical distribution of a surname in the UK • The tool identifies a surname or forename as a substring of an e-mail address • It builds a suffix tree of the e-mail address and searches it for probable identities
  • 19. An example suffix tree • Suffix tree for the string aamalam$. The surname contained in this string, alam$, is shown at a leaf node
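To make the example concrete, here is a minimal sketch using an uncompressed suffix trie (nested dicts) rather than a compacted suffix tree; the two index the same root-to-leaf paths, the trie simply keeps one node per character. The terminator `$` is unique, so every suffix, including alam$, ends at its own leaf. The helper names are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text as a path from the root (O(n^2) nodes)."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def walk(trie, s):
    """Follow s from the root; return the node reached, or None if s falls off."""
    node = trie
    for ch in s:
        node = node.get(ch)
        if node is None:
            return None
    return node


def contains(trie, s):
    """Every substring of the text is a prefix of some suffix, i.e. a root path."""
    return walk(trie, s) is not None


def ends_at_leaf(trie, s):
    """True if s is a complete suffix, i.e. its path ends at a leaf."""
    return walk(trie, s) == {}


trie = build_suffix_trie("aamalam$")
print(ends_at_leaf(trie, "alam$"))  # True: alam$ is a suffix ending at a leaf
print(contains(trie, "mala"))       # True
print(contains(trie, "lama"))       # False
```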
  • 20. Surname matching algorithm • The surname matching algorithm constructs a suffix tree for an e-mail address • It matches a database of surnames and forenames against each substring in the suffix tree • A probable identity is a substring that matches a surname or forename • We use a database of the 10,000 most common surnames in the UK
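A minimal sketch of the matching step, assuming that only the local part (before `@`) is searched and that matches shorter than three characters are ignored to limit noise; both are illustrative choices, not documented parameters of the tool. Substrings are enumerated directly as prefixes of each suffix, which is exactly the set a suffix tree indexes; for short e-mail local parts this quadratic scan is cheap, and a real suffix tree only matters at scale. The name sets are tiny hypothetical stand-ins for the 10,000-surname database.

```python
def probable_identities(email, surnames, forenames, min_len=3):
    """List (kind, name, start) for every known name found in the local part.

    Each suffix of the local part is scanned prefix by prefix, so every
    substring of length >= min_len is checked against both name sets.
    """
    local = email.split("@", 1)[0].lower()
    matches = []
    for i in range(len(local)):
        suffix = local[i:]
        for j in range(min_len, len(suffix) + 1):
            candidate = suffix[:j]
            if candidate in surnames:
                matches.append(("surname", candidate, i))
            if candidate in forenames:
                matches.append(("forename", candidate, i))
    return matches


# Hypothetical name sets
print(probable_identities("a.singleton@ucl.ac.uk", {"singleton", "single"}, {"anna"}))
# [('surname', 'single', 2), ('surname', 'singleton', 2)]
```

Several overlapping candidates can match, which is why the output is a set of probable identities rather than a single answer.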
  • 21. E-mail Address Profiler: geographic distribution • 2007 Electoral Register – Name and address of every individual eligible to vote in the UK • Every postcode in the Electoral Register was converted to latitude/longitude values • The tool maps all the latitude/longitude points for a given surname • Onomap is used to identify the probable ethnic origin of a surname
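A minimal sketch of preparing the points to be mapped, assuming the register has been reduced to (surname, postcode) records and that a postcode-to-coordinates lookup is available. The postcodes and coordinates below are hypothetical placeholders, not register data, and plotting is left to any mapping library.

```python
from collections import defaultdict


def normalise_postcode(postcode):
    """Canonical lookup form (assumed): upper case, spaces removed."""
    return postcode.replace(" ", "").upper()


def surname_points(register, postcode_coords):
    """Group geocoded register records by (upper-cased) surname.

    register: iterable of (surname, postcode) records.
    postcode_coords: postcode -> (latitude, longitude).
    Records whose postcode cannot be geocoded are skipped.
    """
    coords = {normalise_postcode(pc): ll for pc, ll in postcode_coords.items()}
    points = defaultdict(list)
    for surname, postcode in register:
        latlon = coords.get(normalise_postcode(postcode))
        if latlon is not None:
            points[surname.upper()].append(latlon)
    return dict(points)


# Hypothetical records and placeholder coordinates
register = [("Singleton", "AB1 2CD"), ("Keay", "EF3 4GH"), ("singleton", "ab12cd"), ("Singleton", "ZZ9 9ZZ")]
lookup = {"AB1 2CD": (51.5, -0.1), "EF3 4GH": (56.5, -3.0)}
print(surname_points(register, lookup)["SINGLETON"])
```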
  • 22. E-mail Address Profiler Email: a.singleton@ucl.ac.uk
  • 23. Geographic distribution Surname: Singleton Surname: Keay
  • 24. Conclusion • A toolkit for identity detection and profiling • Identification and profiling of the ethno-cultural characteristics of individuals • From social media accounts and e-mail addresses • Future work will include • Extending the Twitter Geographic Profiler to other social media services • Extending the E-mail Address Profiler to process a large corpus of e-mail addresses • Studying the privacy implications of social media services
  • 25. Thanks for Listening Any Questions ?