This presentation proposes methods for classifying Twitter data. Online social networks have grown rapidly around the world in recent years. Here we present an analysis of Twitter data that aims to identify aspects of cultural and ethnic identity.
1. Uncertainty of Identity: Classifying Twitter Data
Muhammad Adnan (and Prof. Paul Longley)
University College London
2. Uncertainty of Identity: Project Aims
• A combined project between UCL, City University, and University of Birmingham
• Combining real and virtual world datasets to better understand the identity of individuals
• Real world datasets (Surname data, socio-economic datasets)
• Virtual world datasets (Email addresses, Social media accounts)
My research interests
• Data mining
• Analysis of Twitter data
• Visualisation of the data
3. Twitter (www.twitter.com)
• Online social-networking and micro blogging service
• Launched in 2006; six years later, Twitter has 500 million active users
• Generates 350 million tweets daily
• One of the top 10 most visited websites on the internet
• Twitter API can be used to download live tweets
4. Data available from the Twitter API
• User Creation Date
• Followers
• Friends
• User ID
• Language
• Location
• Name
• Screen Name
• Time Zone
• Geo Enabled
• Latitude
• Longitude
• Tweet date and time
• Tweet text
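As an illustration of how the fields listed on this slide might be pulled out of a single downloaded tweet, here is a minimal Python sketch. It assumes the tweet arrives as a dictionary shaped like the Twitter REST API v1.1 JSON (a nested `user` object and a GeoJSON `coordinates` object); the names `TweetRecord` and `flatten_tweet` are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TweetRecord:
    """One row per tweet, holding the fields listed on the slide."""
    user_created_at: Optional[str]
    followers: Optional[int]
    friends: Optional[int]
    user_id: Optional[int]
    language: Optional[str]
    location: Optional[str]
    name: Optional[str]
    screen_name: Optional[str]
    time_zone: Optional[str]
    geo_enabled: Optional[bool]
    latitude: Optional[float]
    longitude: Optional[float]
    tweet_created_at: Optional[str]
    text: Optional[str]


def flatten_tweet(tweet: dict) -> TweetRecord:
    """Flatten a tweet dictionary (assumed v1.1-style layout) into a record."""
    user = tweet.get("user") or {}
    point = (tweet.get("coordinates") or {}).get("coordinates")
    # GeoJSON stores a point as [longitude, latitude], in that order.
    lon, lat = point if point else (None, None)
    return TweetRecord(
        user_created_at=user.get("created_at"),
        followers=user.get("followers_count"),
        friends=user.get("friends_count"),
        user_id=user.get("id"),
        language=user.get("lang"),
        location=user.get("location"),
        name=user.get("name"),
        screen_name=user.get("screen_name"),
        time_zone=user.get("time_zone"),
        geo_enabled=user.get("geo_enabled"),
        latitude=lat,
        longitude=lon,
        tweet_created_at=tweet.get("created_at"),
        text=tweet.get("text"),
    )
```

Missing keys simply become `None`, which matters here because most tweets carry no coordinates and many profiles leave location or time zone blank.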
8. Classifying Twitter Data to ethnic origins
• User Creation Date
• Followers
• Friends
• User ID
• Language
• Location
• Name
• Screen Name
• Time Zone
• Geo Enabled
• Latitude
• Longitude
• Tweet date and time
• Tweet text
9. Classifying Twitter Data to ethnic origins
• Some examples of NAME variations on Twitter
Real Names Fake Names
Kevin Hodge Castor 5.
Andre Alves WHAT IS LOVE?
Jose de Franco MysticMind
Carolina Thomas, Dr. KIRILL_aka_KID
Prof. Martha Del Val Vanessa
Fabíola Sanchez Fernandes Petuna
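The slides do not say how real names were separated from fake ones, so the following Python sketch is only one plausible heuristic, not the authors' method: strip common titles, then accept a display name only if it leaves at least two purely alphabetic tokens. The function name `split_forename_surname` and the `TITLES` set are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of titles to discard; extend as needed.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}

# A name token: letters only (any script), optionally joined by ' or -.
NAME_TOKEN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def split_forename_surname(display_name: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) for a plausible real name, else None."""
    # Names shouted in capitals (e.g. slogans) are treated as fake.
    if display_name.isupper():
        return None
    tokens = [t.strip(",.") for t in display_name.split()]
    tokens = [t for t in tokens if t and t.lower() not in TITLES]
    if len(tokens) < 2:
        return None
    if not all(NAME_TOKEN.fullmatch(t) for t in tokens):
        return None
    # First token is taken as the forename, the rest as the surname.
    return tokens[0], " ".join(tokens[1:])
```

On the examples above, `split_forename_surname("Carolina Thomas, Dr.")` gives `("Carolina", "Thomas")`, while `"MysticMind"`, `"KIRILL_aka_KID"` and `"WHAT IS LOVE?"` are rejected. The rule is coarse: any two-word alphabetic pseudonym would still pass, so a name dictionary would be needed to catch those.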
14. Classifying Twitter Data to ethnic origins
• Applied ONOMAP (www.onomap.org) to FORENAME + SURNAME pairs
Kevin Hodge (ENGLISH)
Andre de Franco (ITALIAN)
…
…
…
…
22. Which places are they talking about?
• Tweets containing ‘London’ in their text string
• Text-matching algorithms were applied to remove tweets that mention places
which are not London, e.g. London Road or London, Ontario
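The text-matching step on this slide could be sketched with regular expressions as below. This is a minimal Python illustration under stated assumptions, not the algorithm actually used: the exclusion patterns cover only the two examples named on the slide (plus the abbreviation "Rd", which is an addition), and the function name `mentions_london` is hypothetical.

```python
import re

# "London" as a whole word, in any letter case.
LONDON = re.compile(r"\blondon\b", re.IGNORECASE)

# Phrases that contain "London" but refer to somewhere else.
# Only the slide's two examples are covered; a real list would be longer.
NOT_LONDON = re.compile(
    r"\blondon\s+(?:road|rd)\b|\blondon\s*,\s*ontario\b",
    re.IGNORECASE,
)


def mentions_london(text: str) -> bool:
    """True if the text still mentions London after removing false matches."""
    remainder = NOT_LONDON.sub(" ", text)
    return LONDON.search(remainder) is not None
```

Removing the excluded phrases first, rather than rejecting the whole tweet, keeps a tweet such as "Moved from London, Ontario to London last year", which does mention the city.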
28. Conclusion
• Use of social media is increasing day by day
• Social-media datasets can give an insight into people’s
behaviour in virtual worlds
• Investigation of ethnic origins in other countries, to draw
inferences about migration trends in developed and developing
countries
• Future work will involve the investigation of Foursquare and
Facebook data
29. Thank you for listening
Any questions?
Web: http://www.uncertaintyofidentity.com
Email: m.adnan@ucl.ac.uk
Twitter: @gisandtech