Geodemographics: Open tools and methods

Dr. Muhammad Adnan
Department of Geography, University College London
Web: http://www.uncertaintyofidentity.com
Email: m.adnan@ucl.ac.uk
Twitter: @gisandtech
Lecture Outline
• Geodemographic Classification
• Problems with the Geodemographic Classifications
• Real-time bespoke Geodemographic Classifications
• GeodemCreator: software for creating Geodemographic
Classifications
• Social Media data for Geodemographics
Geodemographics
• “Analysis of people by where they live” or “locality
marketing”
(Sleight, 1993:3)

[Diagram: Person, Home Address, Area]
Steps in Creating a Geodemographic Classification
• Variable Selection
• Transformation of the Data
• Standardisation of the Data

• Clustering of the Data (k-means)
• Naming the clusters
Data – Census + Other
ONS Output Area Classification (2001 and 2011)
• Census data: 100%

Experian: Mosaic
• Census data: 54%

• Non-Census data: 46%

CACI: Acorn
• Census data: 30%

• Non-Census data: 70%
Standardising the data
• Z-Scores
– Widely used variable normalisation technique
– Can create outliers in the datasets
• Range Standardisation
– Standardises values to a range of 0 to 1
– Can erase interesting patterns in the data
• Principal Component Analysis (PCA)
– Reduces the dimensions of a data set
– Focuses on the part of the data set having maximum variance
– Can erase interesting patterns in the data
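The first two techniques above can be sketched in a few lines. This is a minimal illustration, not the method used by any particular classification: the variable `unemployment_pct` and its values are invented, and the z-score here assumes the population standard deviation.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def range_standardise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical variable: percentage of unemployed residents in five areas.
unemployment_pct = [2.0, 4.0, 4.0, 6.0, 14.0]
z = z_scores(unemployment_pct)
r = range_standardise(unemployment_pct)
```

With these invented values, the area at 14% keeps a large z-score relative to the others, while range standardisation pins it to 1 and compresses the remaining areas into the lower third of the scale.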
Segmentations are created by cluster analysis
[Table: one row per area (Area1, Area2, ..., Area6, ...), one column per variable (V1, V2, ..., V6, ...)]
[Figure: areas plotted on Variable 1 and Variable 2, grouped into Cluster 1, Cluster 2 and Cluster 3]
Output of Cluster Analysis
Areas    Cluster
Area1    1
Area2    1
Area3    2
Area4    1
Area5    3
Area6    3
…….
2001 OAC (around Greater London)
Naming the clusters
• 2011 OAC has 8 super groups
1. Rural Residents
2. Cosmopolitans
3. Ethnic Mix
4. Blue Collar Neighbourhoods
5. Multicultural Metropolitans
6. Suburbanites
7. Hard-Pressed Households
8. Urbanites
But geodemographic classifications have
some problems!
Does one size fit all?
• Most geodemographic classifications divide areas into a
specified number of categories
• 2011 OAC divides the Output Areas in the UK into 8 broad
categories
• Do these categories account for all the characteristics of
the population?
• Do we need to create bespoke small area classifications?
• Geodemographic categories only apply to a particular area
Closed Methods
• Commercial geodemographic classifications (e.g.
MOSAIC, ACORN) use closed methods
– Data sources used?
– Weighting of the variables?
– Data standardisation techniques employed?
– Clustering algorithm applied?
• We need open methods and clear documentation of the
geodemographic classifications
– 2001 OAC
– 2001 LOAC (London's Output Area Classification)
– 2011 OAC
Public Consultation
• Users of the classification cannot modify it or give
feedback
• Users should have the control to modify the classification
through their feedback
• UCL's E-Society Classification
[Figure: Public Consultation, Feedback]
Real time Geodemographics
Need for real time Geodemographics
• Current classifications are created using static data sources
• The rate and scale of current population change are making large
surveys (census) increasingly redundant
• Significant hidden value in transactional data
• Data is increasingly available in near real time
– E.g. ONS (Office for National Statistics) NESS API
• Social media data is available in real time
What are real time Geodemographics?
[Diagram with the stages: Specification; Real time feeds of data; Estimation; Online specification of inputs; Clustering; Testing; Visualisation]
Computational challenges
• Integration of large and possibly disparate databases
– E.g. NHS data; Census data
• Data normalisation and optimisation for fast transactions
• Minimising the computational time of clustering algorithms
(very important!)
• Common protocol
– XML (SOAP)
• Use of non-traditional data sources (Singleton, 2008)
– E.g. Flickr, Facebook, Twitter
Important Challenge: Selection of clustering
algorithm
• K-Means
• PAM (Partitioning Around Medoids)
• CLARA (Clustering Large Applications)
• GA (Genetic Algorithm)
k-means
• Widely used clustering algorithm for geodemographics
• Attempts to find cluster centroids by minimising the
within-cluster sum of squared distances
• Unstable, because the result depends on the initial seed assignment
• Sensitive to outliers in the data set
• Creating a geodemographic classification requires running
the algorithm multiple times
– 10,000 times (Singleton, 2008)
• Computationally expensive in a real time environment
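The points above (random seeds, repeated runs, keeping the best result) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the code behind any published classification: the function names, the toy `points` and the choice of 50 runs are all invented for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from random initial seeds; returns (labels, centroids, wss)."""
    centroids = rng.sample(points, k)  # the seed choice is what makes runs differ
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Within-cluster sum of squares: the quantity k-means tries to minimise.
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss


def kmeans_best_of(points, k, n_runs=50, seed=0):
    """Repeat k-means n_runs times and keep the run with the lowest WSS."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_runs)),
               key=lambda run: run[2])


# Toy data: three well-separated groups of standardised area values.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
          (10.0, 0.0), (10.1, 0.2), (10.2, 0.1)]
labels, centroids, wss = kmeans_best_of(points, k=3, n_runs=50)
```

The cost of the approach grows linearly with `n_runs`, which is why repeating the algorithm thousands of times is hard to reconcile with a real time setting.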
An example of bad clustering result (K-means)
Alternate Clustering Algorithms
• PAM (Partitioning around medoids)
• CLARA (Clustering Large Applications)
• GA (Genetic Algorithm)
Alternate Clustering Algorithms…
• PAM (Partitioning around medoids)
– Tries to minimise the sum of dissimilarities between the data
points and their cluster centres (medoids)
– Less sensitive to outliers than k-means
– Cannot handle larger data sets
– Produces better results than k-means for smaller data
sets
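A simplified sketch of the idea behind PAM is given below. It assumes random initial medoids and a greedy swap search, which is a reduced version of the full build and swap phases; the Manhattan dissimilarity and the toy `points` are choices made for the example only.

```python
import random


def manhattan(a, b):
    """Manhattan dissimilarity; any dissimilarity function could be used instead."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    """Sum of dissimilarities from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=manhattan, seed=0):
    """Simplified PAM: random initial medoids, then apply swaps while they help."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # medoids are indices of real points
    cost = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing medoid i with a non-medoid point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial, dist)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels, cost


# Toy data: two compact groups of four areas each.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9)]
medoids, labels, cost = pam(points, k=2)
```

Every candidate swap re-evaluates the cost over all points, so the work per pass grows quickly with the number of areas, which is consistent with the scaling limitation noted above.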
Alternate Clustering Algorithms…
• CLARA (Clustering Large Applications)
– Draws multiple samples of the data set, applies PAM to
each sample and returns the best result
– Can handle large data sets as it operates on samples rather than
on the full data set
– Could be a better choice for creating classifications on the
fly
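The sampling idea can be sketched as below. This is an illustration under assumptions rather than a reference implementation: the PAM step is a simplified swap search, and `n_samples`, `sample_size` and the synthetic `points` are invented values. Each sample's medoids are scored against the full data set, and the cheapest set is kept.

```python
import random


def dist(a, b):
    """Manhattan dissimilarity between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cost(points, medoids):
    """Sum of dissimilarities from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(sample, k, rng):
    """Simplified PAM on a small sample; medoids are returned as points."""
    medoids = rng.sample(sample, k)
    best = cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                c = cost(sample, trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return medoids


def clara(points, k, n_samples=5, sample_size=20, seed=0):
    """Run PAM on several random samples; keep the medoids that fit all points best."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        c = cost(points, medoids)  # scored on the full data set, not the sample
        if c < best_cost:
            best_medoids, best_cost = medoids, c
    return best_medoids, best_cost


# Synthetic data: 200 points around two invented centres.
data_rng = random.Random(42)
points = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
          + [(data_rng.gauss(20, 1), data_rng.gauss(20, 1)) for _ in range(100)])
medoids, total = clara(points, k=2)
```

The expensive swap search only ever sees `sample_size` points, so the running time depends mainly on the sample size and the number of samples rather than on the size of the full data set.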
Alternate Clustering Algorithms…
• GA (Genetic Algorithm)
– Inspired by models of biological evolution; produces
results through a breeding procedure
– Creates hierarchies of generations and then merges the
hierarchies into homogeneous groups having similar
characteristics
– Can be time consuming due to the creation of generation
hierarchies
Comparing computational efficiency (Z-scores)
OA (Output Area) level results

LSOA (Lower Super Output Area) level results

Ward level results
Algorithm Stability (w.r.t. Computational time)
[Figure, three panels, each plotting Time (s) on an axis from 0 to 4 against iteration number:]
• Running k-means on OA (Output Area) for 120 times on each iteration
• Running GA on OA (Output Area) for 120 times on each iteration
• Running CLARA on OA (Output Area) for 120 times on each iteration
Bespoke Real-time Geodemographics
[Diagram labels: Data; Realtime Measurement; Bespoke Requests. Processing steps:]
• Specify inputs and weights
• Data normalisation
• Clustering
• Visualisation
GeodemCreator: software for creating
Geodemographic Classifications in near
real-time
GeodemCreator
• Allows users to create Geodemographic Classifications
• Users have control over how a Geodemographic
Classification is created (open methods!)
Building a Geodemographic Classification
• Step-1: Choose a dataset
• Step-2: Check correlation of the variables
• Step-3: Select variables
• Step-4: Specify 'number of clusters' and 'spatial area'
• Step-5: Build the classification
• Output: cluster numbers
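Step 2 (checking the correlation of the variables) can be sketched as follows. This is not GeodemCreator's own code, which the slides do not show: the variable names, their values and the 0.8 threshold are assumptions made for the example. The idea is to flag strongly correlated pairs so that only one variable of each pair is carried into Step 3.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated(variables, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, xa), (name_b, xb) in combinations(variables.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical variables measured over five areas.
variables = {
    "pct_renting": [10, 20, 30, 40, 50],
    "pct_flats": [12, 22, 29, 41, 52],
    "pct_over_65": [30, 10, 25, 15, 20],
}
flagged = highly_correlated(variables)
```

Here only the invented pair `pct_renting` and `pct_flats` is flagged; keeping both would effectively count the same pattern twice in the clustering.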
Social Media data for Geodemographics
Why do we need Social Media data for Geodemographics?
• Traditional geodemographic classifications are based on
Census data
– Night time geography
• These classifications do not identify where the population is
during the day time
• We do not know about the social links between different
people
• A solution is to combine social media data with traditional data
sources
Geodemographics
• “Analysis of people by where they live” or “locality
marketing”

Social Media Geodemographics
• “Analysis of people by where they live, travel, and who
they communicate with”
Social Media Geodemographics
• Who: Ethnicity, gender, and age of social media users
• Where: Where social media conversations are happening
and who is leading them
– Intelligence about where people are located and what they are
doing
• When: What time of day conversations happen
Twitter (www.twitter.com)
• Online social-networking and micro blogging service
• Launched in 2006

• Users can send messages of 140 characters or less
• Approximately 200 million active users

• 350 million tweets daily
• In 2012, the UK and London ranked 4th and 3rd, respectively,
by number of posted tweets
Data available through the Twitter API
• User Creation Date
• Followers
• Friends
• User ID
• Language
• Location
• Name
• Screen Name
• Time Zone
• Geo Enabled
• Latitude
• Longitude
• Tweet date and time
• Tweet text
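As a rough illustration of how a subset of these fields might be held and used for the "where" and "when" questions, the sketch below defines a hypothetical `Tweet` record. It is not the Twitter API's own response format, no API call is made, and the sample records are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Tweet:
    """Hypothetical record holding a subset of the fields listed above."""
    user_id: int
    screen_name: str
    name: str
    language: str
    geo_enabled: bool
    latitude: Optional[float]
    longitude: Optional[float]
    created_at: datetime
    text: str


def geotagged(tweets):
    """Keep only tweets that carry usable coordinates ("where")."""
    return [t for t in tweets
            if t.geo_enabled and t.latitude is not None and t.longitude is not None]


def tweets_per_hour(tweets):
    """Count tweets by hour of day ("when")."""
    return Counter(t.created_at.hour for t in tweets)


# Invented sample records, for illustration only.
tweets = [
    Tweet(1, "user_a", "User A", "en", True, 51.50, -0.12,
          datetime(2013, 1, 1, 8, 15), "example text"),
    Tweet(2, "user_b", "User B", "en", False, None, None,
          datetime(2013, 1, 1, 8, 40), "example text"),
    Tweet(3, "user_c", "User C", "pt", True, 51.52, -0.10,
          datetime(2013, 1, 1, 18, 5), "example text"),
    Tweet(1, "user_a", "User A", "en", True, 51.51, -0.13,
          datetime(2013, 1, 1, 18, 30), "example text"),
]
located = geotagged(tweets)
by_hour = tweets_per_hour(located)
```

Filtering on coordinates first means the hourly counts describe only tweets that can be placed on a map, which is the subset that matters for area-based classification.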
Analysing Names on Twitter
• Some examples of NAME variations on Twitter
Real Names
Kevin Hodge
Andre Alves
Jose de Franco
Carolina Thomas, Dr.
Prof. Martha Del Val
Fabíola Sanchez Fernandes

Fake Names
Castor 5.
WHAT IS LOVE?
MysticMind
KIRILL_aka_KID
Vanessa
Petuna
Tweeting Activity by different Ethnic Groups
Genders of Twitter Users
Age distribution of Twitter Users vs 2011 Census
Summary
• Geodemographics is the analysis of people by where they live
• But generalised geodemographic classifications have some
problems
– We need bespoke classifications for smaller areas

• Real-time geodemographic classifications are a way to
create bespoke classifications
• Methods of creating current classifications are not open
– We need Open tools and Open methods for geodemographics

• Social media data can enrich geodemographic classifications

Weitere ähnliche Inhalte

Ähnlich wie Geodemographics: Open tools and mehtods

IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079ibankuk
 
Community Structure, Interaction and Evolution Analysis of Online Social Netw...
Community Structure, Interaction and Evolution Analysis of Online Social Netw...Community Structure, Interaction and Evolution Analysis of Online Social Netw...
Community Structure, Interaction and Evolution Analysis of Online Social Netw...Symeon Papadopoulos
 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiVijay Susheedran C G
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introductionNeeraj Tewari
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
Lessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleLessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleDomonkos Tikk
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolHenry Muccini
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENEWorkshop
 
Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a  Large Newspaper CorpusQuerylog-based Assessment of Retrievability Bias in a  Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a Large Newspaper CorpusMyriam Traub
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopExtremeEarth
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...jins0618
 
Assigning semantic labels to data sources
Assigning semantic labels to data sourcesAssigning semantic labels to data sources
Assigning semantic labels to data sourcesCraig Knoblock
 
2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...
2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...
2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...GIS in the Rockies
 

Ähnlich wie Geodemographics: Open tools and mehtods (20)

Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
 
Big data
Big dataBig data
Big data
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
 
Community Structure, Interaction and Evolution Analysis of Online Social Netw...
Community Structure, Interaction and Evolution Analysis of Online Social Netw...Community Structure, Interaction and Evolution Analysis of Online Social Netw...
Community Structure, Interaction and Evolution Analysis of Online Social Netw...
 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in Chennai
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introduction
 
John McGaughey - Towards integrated interpretation
John McGaughey - Towards integrated interpretationJohn McGaughey - Towards integrated interpretation
John McGaughey - Towards integrated interpretation
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
Lessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleLessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scale
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_school
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the Cloud
 
Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a  Large Newspaper CorpusQuerylog-based Assessment of Retrievability Bias in a  Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...
 
Assigning semantic labels to data sources
Assigning semantic labels to data sourcesAssigning semantic labels to data sources
Assigning semantic labels to data sources
 
2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...
2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...
2012 ASPRS Track, Modernized Method for Estimating Time-Weighted Urban Popula...
 
Genetic Algorithm
Genetic AlgorithmGenetic Algorithm
Genetic Algorithm
 

Mehr von Dr Muhammad Adnan

Spatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersSpatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersDr Muhammad Adnan
 
Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity ToolsetUsing Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity ToolsetDr Muhammad Adnan
 
Analysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersAnalysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersDr Muhammad Adnan
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and VisualisationDr Muhammad Adnan
 
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...Dr Muhammad Adnan
 
Uncertainty of Identity: Classifying Twitter Data
Uncertainty of Identity: Classifying Twitter DataUncertainty of Identity: Classifying Twitter Data
Uncertainty of Identity: Classifying Twitter DataDr Muhammad Adnan
 
Spatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identitySpatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identityDr Muhammad Adnan
 
Visualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsVisualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsDr Muhammad Adnan
 

Mehr von Dr Muhammad Adnan (9)

Spatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersSpatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter users
 
Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity ToolsetUsing Digital Traces for User Profiling: the Uncertainty of Identity Toolset
Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset
 
Analysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersAnalysing the digital traces of Social Media users
Analysing the digital traces of Social Media users
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and Visualisation
 
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
 
Uncertainty of Identity: Classifying Twitter Data
Uncertainty of Identity: Classifying Twitter DataUncertainty of Identity: Classifying Twitter Data
Uncertainty of Identity: Classifying Twitter Data
 
Spatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identitySpatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identity
 
Visualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsVisualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographics
 
Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
 

Kürzlich hochgeladen

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 

Kürzlich hochgeladen (20)

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 

Geodemographics: Open tools and mehtods

  • 1. Geodemographics: Open tools and methods Dr. Muhammad Adnan Department of Geography, University College London Web: http://www.uncertaintyofidentity.com Email: m.adnan@ucl.ac.uk Twitter: @gisandtech
  • 2. Lecture Outline • Geodemographic Classification • Problems with the Geodemographic Classifications • Real-time bespoke Geodemographic Classifications • GeodemCreator: A software for creating Geodemographic Classifications • Social Media data for Geodemographics
  • 3. Geodemographics • “Analysis of people by where they live” or “locality marketing” (Sleight, 1993:3) Person Home Address Area
  • 4. Steps in Creating a Geodemographic Classification • Variable Selection • Transformation of the Data • Standardisation of the Data • Clustering of the Data (k-means) • Naming the clusters
  • 5. Data – Census + Other ONS Output Area Classification (2001 and 2011) • Census data: 100% Experian: Mosaic • Census data: 54% • Non-Census data: 46% CACI: Accorn • Census data: 30% • Non-Census data: 70%
  • 6. Standardising the data • Z-Scores • Widely used variable normalisation technique • Can create outliers in the datasets • Range Standardisation • Standardise values between a range of 0-1 • Can erase interesting patterns in the data • Principal Component Analysis (PCA) • Reduces the dimensions of a data set • Focuses on the part of dataset having maximum variance • Can erase interesting patterns in the data
  • 7. Segmentations are created by cluster analysis [Table: areas (Area1, Area2, Area3, Area4, Area5, Area6, …) by variables V1 to V6, …] [Scatter plot: Variable 1 against Variable 2, with areas grouped into Cluster 1, Cluster 2 and Cluster 3]
  • 8. Output of Cluster Analysis • Areas and assigned cluster: Area1: 1; Area2: 1; Area3: 2; Area4: 1; Area5: 3; Area6: 3; … [Map: 2001 OAC (around Greater London)]
  • 9. Naming the clusters • 2011 OAC has 8 super groups 1. Rural Residents 2. Cosmopolitans 3. Ethnic Mix 4. Blue Collar Neighbourhoods 5. Multicultural Metropolitans 6. Suburbanites 7. Hard-Pressed Households 8. Urbanites
  • 10. But geodemographic classifications have some problems!
  • 11. Does one size fit all? • Most geodemographic classifications divide areas into a specified number of categories • 2011 OAC divides the Output Areas in the UK into 8 broad categories • Do these categories account for all the characteristics of the population? • We need to create bespoke small area classifications • Geodemographic categories only apply to a particular area
  • 12. Closed Methods • Commercial geodemographic classifications (e.g. MOSAIC, ACORN) use closed methods • Data sources used? • Weighting of the variables? • Data standardisation techniques employed? • Clustering algorithm applied? • We need open methods and clear documentation of the geodemographic classifications • 2001 OAC • 2001 LOAC (London's Output Area Classification) • 2011 OAC
  • 13. Public Consultation • Users of the classification cannot modify it or give feedback • Users should have the control to modify the classification through their feedback • UCL's E-Society Classification
  • 16. Need for real time Geodemographics • Current classifications are created using static data sources • Rate and scale of current population change is making large surveys (census) increasingly redundant • Significant hidden value in transactional data • Data is increasingly available in near real time, e.g. ONS (Office for National Statistics) NESS API • Social media data is available in real time
  • 17. What are real time Geodemographics? [Diagram labels: Specification; Real time feeds of data; Estimation; Online Specification of inputs; Clustering; Testing; Visualisation]
  • 18. Computational challenges • Integration of large and possibly disparate databases • E.g. NHS data; Census data • Data normalisation and optimization for fast transactions • Minimizing computational time of clustering algorithms (Very Important)! • Common protocol • XML (SOAP) • Use of non traditional data sources. (Singleton, 2008) • E.g. Flickr; Facebook, Twitter
  • 19. Important Challenge: Selection of clustering algorithm • K-Means • PAM (Partitioning Around Medoids) • CLARA (Clustering Large Applications) • GA (Genetic Algorithm)
  • 20. k-means • Widely used clustering algorithm for geodemographics • Attempts to find cluster centroids by minimising the within-cluster sum of squared distances. • K-means is unstable because its result depends on the initial seed assignment. • Sensitive to outliers in the data set. • Creating a Geodemographic classification requires running the algorithm multiple times. • 10,000 times (Singleton, 2008) • Computationally expensive in a real time environment.
  • 21. An example of bad clustering result (K-means)
  • 22. An example of bad clustering result (K-means)
  • 23. An example of bad clustering result (K-means)
  • 24. Alternate Clustering Algorithms • PAM (Partitioning around medoids) • CLARA (Clustering Large Applications) • GA (Genetic Algorithm)
  • 25. Alternate Clustering Algorithms… • PAM (Partitioning around medoids) • It tries to minimize the sum of dissimilarities of the data points to their cluster centers. • Less sensitive to outliers than K-means. • Cannot handle larger data sets. • Produces better results than k-means for smaller data sets.
  • 26. Alternate Clustering Algorithms… • CLARA (Clustering Large Applications) • It draws multiple samples of the dataset, applies PAM to each sample and returns the best result. • Can handle large data sets as it operates on samples rather than on actual data set. • Could be a better choice for creating classifications on the fly.
  • 27. Alternate Clustering Algorithms… • GA (Genetic Algorithm) • It is inspired by models of biological evolution. It produces results through a breeding procedure. • Creates hierarchies of generations and then merges the hierarchies into homogeneous groups having similar characteristics. • Can be time consuming due to the creation of generation hierarchies.
  • 28. Comparing computational efficiency (Z-scores) [Charts: OA (Output Area) level results; LSOA (Lower Super Output Area) level results; Ward level results]
  • 29. Algorithm Stability (w.r.t. Computational time) [Three charts of Time (s), on an axis from 0 to 4, against iterations 1 to 97: running k-means, GA and CLARA on OA (Output Area) for 120 times on each iteration]
  • 30. Bespoke Real-time Geodemographics [Diagram labels: Data; Realtime Measurement; Bespoke Requests] • Specify inputs and weights • Data normalisation • Clustering • Visualisation
  • 31. GeodemCreator: A software for creating Geodemographic Classifications in near real-time.
  • 32. GeodemCreator • Allows users to create Geodemographic Classifications • Users have the control of how a Geodemographic Classification is created (Open Methods !)
  • 33. Building a Geodemographic Classification • Step-1: Choose a dataset
  • 34. Building a Geodemographic Classification • Step-2: Check Correlation of the Variables
  • 35. Building a Geodemographic Classification • Step-3: Select variables
  • 36. Building a Geodemographic Classification • Step-4: Specify 'number of clusters' and 'spatial area' [Screenshot labels: Number of Clusters; Spatial Area]
  • 37. Building a Geodemographic Classification • Step-5: Build the Classification
  • 38. Building a Geodemographic Classification • Output – Cluster Numbers
  • 39. Building a Geodemographic Classification • Output
  • 40. Social Media data for Geodemographics
  • 41. Why do we need Social Media data for Geodemographics? • Traditional geodemographic classifications are based on Census data • Night time geography • These classifications do not identify where the population is during the day time • We do not know about the social links between different people • A solution is to combine Social Media data with traditional data sources
  • 42. Geodemographics • “Analysis of people by where they live” or “locality marketing” Social Media Geodemographics • “Analysis of people by where they live, travel, and who they communicate with”
  • 43. Social Media Geodemographics • Who: Ethnicity, Gender, and Age of social media users • Where: Where social media conversations are happening and who is leading them • Intelligence about where people are located and what they are doing • When: What time of day conversations happen
  • 44. Twitter (www.twitter.com) • Online social-networking and micro blogging service • Launched in 2006 • Users can send messages of 140 characters or less • Approximately 200 million active users • 350 million tweets daily • In 2012, UK and London were ranked 4th and 3rd, respectively, in terms of the number of posted tweets
  • 45. Data available through the Twitter API • User Creation Date • Followers • Friends • User ID • Language • Location • Name • Screen Name • Time Zone • Geo Enabled • Latitude • Longitude • Tweet date and time • Tweet text
  • 47. Analysing Names on Twitter • Some examples of NAME variations on Twitter • Real Names: Kevin Hodge; Andre Alves; Jose de Franco; Carolina Thomas, Dr.; Prof. Martha Del Val; Fabíola Sanchez Fernandes • Fake Names: Castor 5.; WHAT IS LOVE?; MysticMind; KIRILL_aka_KID; Vanessa Petuna
  • 48. Tweeting Activity by different Ethnic Groups
  • 50. Age distribution of Twitter Users vs 2011 Census
  • 51. Summary • Geodemographics is the analysis of people by where they live • But generalised geodemographic classifications have some problems – We need bespoke classifications for smaller areas • Real-time geodemographic classification is one way to create bespoke classifications • Methods of creating current classifications are not open – We need Open tools and Open methods for geodemographics • Social media data can be used for geodemographic classifications
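
The standardisation options on slide 6 and the repeated k-means runs described on slide 20 can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by GeodemCreator or the OAC: the function names, the six area values and the choice of 50 runs are assumptions made for the example (the slides cite 10,000 runs), and it relies on the standard library alone.

```python
import random
from statistics import mean, pstdev


def z_scores(column):
    """Standardise one variable to mean 0 and standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def range_standardise(column):
    """Rescale one variable to the range 0-1."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(rows, k, rng, max_iter=100):
    """One k-means run from random initial seeds; returns (labels, wcss)."""
    centroids = rng.sample(rows, k)  # initial seeds are k randomly chosen areas
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [mean(col) for col in zip(*members)]
    wcss = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, wcss


def best_kmeans(rows, k, runs=50, seed=0):
    """Repeat k-means and keep the run with the lowest within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((kmeans_once(rows, k, rng) for _ in range(runs)),
               key=lambda result: result[1])


# Hypothetical values for two variables in six areas (not taken from the slides).
areas = {
    "Area1": (10, 200), "Area2": (12, 210), "Area3": (50, 800),
    "Area4": (11, 190), "Area5": (90, 400), "Area6": (88, 420),
}
columns = zip(*areas.values())
standardised = [z_scores(list(col)) for col in columns]
rows = [list(r) for r in zip(*standardised)]

labels, wcss = best_kmeans(rows, k=3)
for name, label in zip(areas, labels):
    print(name, "-> cluster", label + 1)
```

Because k-means depends on its initial seeds, a single run can settle on a poor partition. Keeping the run with the lowest within-cluster sum of squares is one common way to reduce that risk, at the computational cost the slides highlight for real-time use.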