Social Maps for a City
Taha Kachwala
4408225
TU Delft
taha.kachwala21@gmail.com
ABSTRACT
There has been a large-scale migration to cities in many countries around the globe. Traditional ways of categorizing the citizens of a city by race, religion, age, or nationality no longer work. To manage a city effectively, governments need to know what kinds of communities live in their city, and in which areas, so that they can involve the relevant communities when taking specific decisions. This is possible if the people living in a city can be grouped effectively on the basis of their interests, background, knowledge, etc. This paper suggests two methods based on social media that can be used to group people in this way. Both methods eventually create Social Maps that visualize clusters of communities as well as how these communities are related to each other. These maps can help city as well as national governments to collectively improve the “Social Progress Index” of the nation.
1. INTRODUCTION
In the real world, a person can maintain roughly 150 stable relationships. This number is called the “Dunbar Number”; some people have more and some have fewer. Online, however, people may have many more relationships, perhaps a few thousand. The offline relationships that people have will roughly form an overlapping subset of the relationships they have online. Online relationships are more flexible, as they can operate globally and at any time of day. If we know the online relationships between people in a specific geographical area, we can therefore roughly estimate their real-world relationships as well. Homophily, the powerful tendency of people to organize themselves into groups of people similar to themselves, applies online and offline alike.
So if we accept that people do, in fact, have relationships that both shape and are shaped by their interactions, then it should be possible to measure these relationships with some level of fidelity. Social networks can offer information about these relationships, albeit with some biases. Relationships and interactions on social networks are already used extensively for recommendation systems, for suggesting people you may know, for security purposes, etc. Why not also use them to manage a city more effectively?
This paper suggests a methodology by which user-generated data on Twitter, Facebook, LinkedIn, Xing, and SoundCloud can be used to map users into communities based on their interests. The relationships between these communities, and how strongly they are connected to each other, are then determined; e.g. a ‘Tech geek’ community will be closely related to a ‘Web Developer’ community. These communities can further be used to obtain relevant information from the data generated by their members. The people of a city can thus be grouped by their social construct rather than by ‘race’, ‘religion’, ‘ethnicity’, etc., which have proven to be poor proxies for diversity. City governments and municipalities can then consult specific or related groups of people for valuable input on certain decisions.
The approach categorizes the users in a city into communities such as politics, tech geeks, radio and newspapers, sports, travel, religious ideologies, web developers and coders, bloggers, activists, age groups, etc., by applying user modelling to their social media profiles, and then creates relations between these groups of people.
If this data can be provided, governments can learn more about the social construct of the people living in a city: what they like to do and what they are able to do. This contributes to the development of diversity, which can be used to tackle some intractable problems of society in a new way, including urban challenges regarding the environment, transportation, buildings, etc.
2. RELATED WORK
2.1 Social Progress Index
GDP is usually used as a measure of a nation's development, and it has defined and shaped our lives for the last 80 years. The concept was introduced by Simon Kuznets in a report called “National Income, 1929-1932”. But in that first report, Kuznets himself delivered a warning: ‘The welfare of a nation can, therefore, scarcely be inferred from a measurement of national income’. In other words, GDP is a tool for measuring economic performance, but it is not a measure of our well-being.
Social Progress Index (SPI) is a new tool which helps measure the
social progress of people living in a city or country. It provides a
rich framework for measuring the multiple dimensions of social
progress, benchmarking success, and catalyzing greater human
well-being.
The Social Progress Index defines social progress as:
The capacity of a society to meet the basic human needs of its
citizens, establish the building blocks that allow citizens and
communities to enhance and sustain the quality of their lives, and
create the conditions for all individuals to reach their full
potential.
Figure 1 (source: [7]) details all the attributes that define the Social Progress Index. More information about the Social Progress Index is available from [8], and a TED talk about it from [9]. To summarize, Table 1 lists the top 20 countries by SPI together with their GDP per capita.
Table 1
Rank  Country          SPI     GDP per capita
1     New Zealand      88.24   25,857
2     Switzerland      88.19   39,293
3     Iceland          88.07   33,880
4     Netherlands      87.37   36,438
5     Norway           87.12   47,547
6     Sweden           87.08   34,945
7     Canada           86.95   35,936
8     Finland          86.91   31,610
9     Denmark          86.55   32,363
10    Australia        86.10   35,669
11    Austria          85.11   36,200
12    Germany          84.61   34,819
13    United Kingdom   84.56   32,671
14    Japan            84.21   31,425
15    Ireland          84.05   36,723
16    United States    82.77   45,336
17    Belgium          82.63   32,639
18    Slovenia         81.65   24,483
19    Estonia          81.28   18,927
20    France           81.11   29,819
As we can see, the United States ranks 16th according to SPI, even though its GDP per capita is the second highest in the table, after Norway. Conversely, New Zealand has a far lower GDP per capita than the US, but it is ranked first according to SPI. This suggests that a higher GDP does not automatically translate into greater social progress, and that New Zealand meets the social needs of its citizens better than the US does.
One of the main goals of this paper is to help governments increase their SPI by giving them a tool to make their citizens happier, instead of focusing only on GDP growth.
User information is spread across many platforms. With the help of the Social Web and Web 2.0, we can try to merge this user information from different platforms into one model. This paper therefore takes a cross-system collaborative approach, for which two types of models can be used:
● A centralized approach with standardized models that can aggregate the user information distributed over different platforms.
● A decentralized approach in which dedicated software components transfer user information from one application's representation into another's.
In this paper, we rely on the former, centralized approach. Within this approach, two main submodels exist. The first relies on standardized user models on which all involved applications must agree, using generalized ontologies such as the General User Modeling Ontology (GUMO) [1] or Friend-of-a-Friend (FOAF). For this paper, we rely on the second submodel, which builds meta-models that define how application-dependent user data corresponds to user data from another application. The advantage of this approach is that the applications need not use the same generic user model as in the first case. The meta-model also allows relationships between the data to be defined, so that the data can be aggregated. It is thus possible to take a set of user interests from Facebook and merge it with music-related interests from SoundCloud; the music-related interests will be a subset of the user's interests, but a more detailed and specific one [2].
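The meta-model idea can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the mechanism of [2]: the Facebook field names are borrowed from the permission names in Table 2, `music_interests` is a hypothetical SoundCloud field, the target attribute names (`interests`, `interests.music`, `current_city`) are invented for the example, and real API responses look different.

```python
# Hypothetical meta-model: for each platform, map its field names onto
# attributes of one shared Social Web User Model (SWUM).
META_MODEL = {
    "facebook": {"user_interests": "interests", "user_location": "current_city"},
    "soundcloud": {"music_interests": "interests.music"},  # hypothetical field name
}


def to_swum(platform, raw_profile):
    """Rename a platform profile's fields to SWUM attributes; unmapped fields are dropped."""
    mapping = META_MODEL[platform]
    return {mapping[key]: value for key, value in raw_profile.items() if key in mapping}


def aggregate(records):
    """Merge translated records into one user model.

    'interests.music' is treated as a more specific subset of 'interests',
    so music interests are stored separately and also added to 'interests'.
    """
    model = {"interests": set(), "interests.music": set()}
    for record in records:
        for attribute, value in record.items():
            if attribute in ("interests", "interests.music"):
                model[attribute].update(value)
                model["interests"].update(value)
            else:
                model[attribute] = value
    return model


facebook = to_swum("facebook", {"user_interests": ["politics", "music"],
                                "user_location": "Delft", "gender": "female"})
soundcloud = to_swum("soundcloud", {"music_interests": ["jazz", "techno"]})
user = aggregate([facebook, soundcloud])
# user["interests"] == {"politics", "music", "jazz", "techno"}
# user["interests.music"] == {"jazz", "techno"}; "gender" is not mapped, so it is dropped
```

Because each platform only needs its own entry in the mapping, a new platform can be added without changing the others, which is the advantage of the meta-model submodel over a shared generic model.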
This paper assumes that the system will be used by city governments and municipalities. The governments can therefore obtain the required set of permissions from their citizens in applications like Facebook, Twitter, LinkedIn, Xing, and SoundCloud. These permissions are chosen such that they do not violate the users' privacy.
3. SOCIAL MAPS USING SWUM
To build a successful Social Web User Model (SWUM) from various platforms, we first need to analyze what kind of data we can capture from different social platforms without invading the privacy of citizens. Owing to the widespread use of the OAuth protocol, many successful web platforms are ready to provide their own authorization and basic profile information to external applications. Facebook, Twitter, LinkedIn, etc. also provide more data about their users to external applications through their APIs. Table 2 lists the relevant information that we can obtain from these social platforms.
Table 2
Platform     Required user permissions
Facebook     id, name, gender, locale, user_friends (only friends living in the same city), email, user_actions (books, fitness, music, news), user_activities, user_interests, user_location, user_education_history
Twitter      Read tweets from the timeline; accounts the user follows
LinkedIn     Basic profile fields, language fields, skills fields, certification fields, education fields, position fields
Xing         Basic profile fields, professional_experience, active_email
SoundCloud   Music interests
These platforms were chosen because they provide relevant and useful information that can collectively be used to build a fairly complete user model for every citizen. Table 3 lists the semantics that we can obtain from each platform.
To create a Social Web User Model, we need to decide which types of information and which user model dimensions it should include, and which attributes of these dimensions it should support.
Table 3
Platform     Semantics obtained
Facebook     1. Generic information about the user
             2. The user's connections within the city (only users who also use the same platform)
             3. Daily activities of the user
             4. The user's interests
             5. Behavioral analysis
Twitter      Twitter can provide ample information about the user:
             1. Current activities of the user
             2. Political viewpoints
             3. Logical viewpoints
             4. Behavioral analysis
LinkedIn     Basic profile fields, language fields, skills fields, certification fields, education fields, position fields
Xing         Basic profile fields, professional_experience, active_email
SoundCloud   Music interests
3.1 User Model Dimensions
After collecting the information mentioned above, we need to model its dimensions so that we can create our user model. We use the following taxonomy of dimensions.
1. Personal Characteristics and Demographics: basic information such as age, gender, name, address, location, and contact information. This data can be collected from Facebook, LinkedIn, and Xing.
2. Interests: the hobbies and interests a user has, e.g. news, politics, gaming, online shopping, etc. Facebook and Twitter together can give us an accurate set of a user's interests.
3. Mental and Physical Well-being: individual characteristics such as physical limitations and health, or mental states such as stress and cognitive load. These can be derived from the user's Facebook page likes, the people the user follows on Twitter, the job profile from LinkedIn and Xing, and music interests from SoundCloud.
4. Knowledge: how active a user is in certain fields, together with their educational status, skills, etc. This information can be derived from the user's tweets, the universities they attended, their qualifications, and the people they follow on Twitter, as well as from business social networks like LinkedIn and Xing. It is dynamic and needs to be re-analyzed periodically, because knowledge of a topic can increase or decrease over time.
5. Individual Behaviour: one of the most important characteristics that define a user. This dimension has a direct impact on the previously specified dimensions and can also be used to infer information about them. Deriving user behaviour is complicated, as it is usually an implicit feature that is not available on a user's social profile. We discuss the analysis of this dimension in more detail in Section 3.6.
6. Context: in computer science, this term generally refers to “any information that can be used to characterize the situation of an entity” [3]. In user modelling, it focuses on the user's environment (location and time, the devices the user uses). Context is considered a very important area in user modelling [4], but it has very limited application for this research.
Therefore, to generate an effective user model for our application, it is important to cover the dimensions of Personal Characteristics and Demographics, Interests, Mental and Physical Well-being, Knowledge, and Individual Behaviour [6].
3.2 User Model Attributes
Having selected the required dimensions, we now need to define the attributes of these dimensions that our user model supports. Table 4 shows example attributes of the Personal Characteristics dimension.
Table 4
3.3 WordNet
Similar information about the same user is stored by many applications; e.g. Facebook, Twitter, and LinkedIn all store basic information about the same user, but under different names. Facebook uses the term ‘username’, while Twitter uses ‘handle’ to identify unique users. This heterogeneity of attribute names complicates aggregation using the meta-model strategy. To solve this problem, WordNet is used. WordNet defines word-sense relations between words; more details about using it for user modelling can be found in [5]. In short, if each word represents a user attribute, the relatedness between different attributes can be obtained through WordNet. Since we assumed earlier that users themselves grant our application permission to access their profiles on the different platforms, aggregating personal characteristics is not a problem. For aggregating interests and mental and physical well-being, however, WordNet is helpful.
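To illustrate how WordNet-style relatedness could resolve attribute name heterogeneity, the sketch below computes a path similarity, 1 / (length of the shortest path + 1), which is the measure behind WordNet's path similarity, over a tiny hand-written hypernym hierarchy. The hierarchy `TOY_HYPERNYMS` is a hypothetical stand-in for WordNet so that the example runs without downloading any corpus; a real system would query WordNet itself, for instance through NLTK's WordNet interface and its `path_similarity` method.

```python
from collections import deque

# Hypothetical, hand-written stand-in for WordNet: each term points to a more
# general (hypernym) concept. Real WordNet relations would be used instead.
TOY_HYPERNYMS = {
    "username": "identifier",
    "handle": "identifier",
    "screen_name": "identifier",
    "identifier": "name",
    "full_name": "name",
    "name": "attribute",
    "music": "interest",
    "sports": "interest",
    "interest": "attribute",
}


def _neighbours(term):
    """Terms one hypernym or hyponym link away from `term`."""
    parent = TOY_HYPERNYMS.get(term)
    children = {child for child, p in TOY_HYPERNYMS.items() if p == term}
    return children | ({parent} if parent else set())


def path_similarity(a, b):
    """1 / (shortest path length + 1) in the toy hierarchy; 0.0 if unconnected."""
    if a == b:
        return 1.0
    seen, queue = {a}, deque([(a, 0)])
    while queue:  # breadth-first search reaches b by a shortest path first
        term, dist = queue.popleft()
        for nxt in _neighbours(term):
            if nxt == b:
                return 1.0 / (dist + 2)  # the path to nxt has dist + 1 links
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def best_match(attribute, candidates):
    """Pick the candidate attribute name most related to `attribute`."""
    return max(candidates, key=lambda c: path_similarity(attribute, c))


match = best_match("username", ["music", "full_name", "handle"])  # 'handle'
```

Here ‘username’ and ‘handle’ share the parent ‘identifier’ (similarity 1/3), which beats ‘full_name’ (1/4) and ‘music’ (1/6), so the two attributes would be merged.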
3.4 Use Case: Profile Aggregation
The figure above shows an example of how user profiles from Facebook and LinkedIn can be aggregated to form our Social Web User Model. To derive our first dimension, Personal Characteristics, we need to merge the data we get from Facebook and LinkedIn. Name, email, contact information, educational qualifications, and current employment can be obtained from both Facebook and LinkedIn. However, the dimension is more reliable if we use professional data from LinkedIn, because on Facebook many people claim to study at ‘Hogwarts school of Wizardry’ and to work at ‘Mah Lyf, Mah Rulez’. One could argue that demographic information from LinkedIn should also be preferred over Facebook, but I find it more relevant to extract it from Facebook, because people update their basic information on Facebook more frequently than on LinkedIn (once they have found a good job, they rarely update their location, email, etc. there).
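The source-preference rule described above (professional fields from LinkedIn, basic demographic fields from Facebook) can be expressed as a per-attribute priority list. The sketch below is an illustration under assumptions: profiles are plain dictionaries, and the attribute names and the `SOURCE_PRIORITY` table are hypothetical rather than taken from either platform's API.

```python
# Hypothetical per-attribute source priority: professional fields prefer
# LinkedIn, basic/demographic fields prefer Facebook (as argued above).
SOURCE_PRIORITY = {
    "name": ["facebook", "linkedin"],
    "email": ["facebook", "linkedin"],
    "location": ["facebook", "linkedin"],
    "education": ["linkedin", "facebook"],
    "employment": ["linkedin", "facebook"],
}


def aggregate_profiles(profiles):
    """Merge per-platform profiles into one Personal Characteristics record.

    `profiles` maps a platform name to a dict of attribute -> value. For each
    attribute, the first platform in its priority list that has a non-empty
    value wins; attributes found on no platform are left out.
    """
    merged = {}
    for attribute, sources in SOURCE_PRIORITY.items():
        for source in sources:
            value = profiles.get(source, {}).get(attribute)
            if value:
                merged[attribute] = value
                break
    return merged


facebook = {"name": "A. User", "location": "Delft",
            "education": "Hogwarts school of Wizardry", "employment": "Mah Lyf, Mah Rulez"}
linkedin = {"name": "Anna User", "location": "Amsterdam",
            "education": "MSc, TU Delft", "employment": "Web developer"}
merged = aggregate_profiles({"facebook": facebook, "linkedin": linkedin})
# {'name': 'A. User', 'location': 'Delft', 'education': 'MSc, TU Delft', 'employment': 'Web developer'}
```

Falling back to the second source when the preferred one is empty means a user without a LinkedIn account still gets education and employment values from Facebook.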
Demographics
● Gender: string
  ○ Male: bool
  ○ Female: bool
● Birthdate: Date
● Language: string
● Education: string
  ○ High-school: bool
  ○ Bachelor's: bool
  ○ Master's: bool
  ○ PhD: bool
● Employment: string
  ○ Employed: bool
Contact Information
● Name: string
● Mobile number: string
● E-mail: string
● Places lived: list of locations
● Current city: Location
Location
● Country: string
● State: string
● City: string
● Address: string
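The attribute listing above can be written down as a typed record, for instance with Python dataclasses. This is a sketch under assumptions: the class and field names are our own, the boolean sub-attributes of Education and Employment are collapsed into a single value each, and the mobile number is stored as a string, since an integer would lose leading zeros and the "+" of international numbers.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Location:
    country: str = ""
    state: str = ""
    city: str = ""
    address: str = ""


@dataclass
class PersonalCharacteristics:
    # Demographics
    gender: str = ""
    birthdate: Optional[date] = None
    language: str = ""
    education: str = ""  # highest level, e.g. "high-school", "bachelor", "master", "phd"
    employed: bool = False
    # Contact information
    name: str = ""
    mobile_number: str = ""  # kept as text: an int would drop leading zeros and "+"
    email: str = ""
    places_lived: List[Location] = field(default_factory=list)  # fresh list per citizen
    current_city: Optional[Location] = None


citizen = PersonalCharacteristics(
    name="A. User",
    birthdate=date(1990, 5, 17),
    education="master",
    employed=True,
    mobile_number="+31 6 12345678",
    current_city=Location(country="Netherlands", city="Delft"),
)
```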
For our second dimension, ‘Interests’, Facebook provides the most concrete details: we can obtain information about user preferences from what the user likes and from the interests that Facebook has already extracted. Similarly, for ‘Knowledge’, we can extract data about work experience, past jobs, skills, etc. from LinkedIn.
3.5 Mapping Connections between different User Models
Once the Social Web User Model is ready for all or most of the citizens living in the city, we can start analyzing and mapping the connections between them. This can be done in two steps:
1. Finding relations between different dimensions of the user model.
2. Finding relations between the same dimensions of different users' models.
Let me explain both steps using an example. Assume that there are 10 male users living in Rotterdam, aged 20-30 years (Personal Characteristics and Demographics), who are all aerospace engineers (Knowledge) and are interested in sports and rock music (Interests). Now a new male user A, whose ‘Interests’ dimension is incomplete, also lives in Rotterdam, is an aerospace engineer, and is aged 20-30. Then there is a high probability that he is also interested in sports and rock music.
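The first step can be approximated by a simple majority vote among users who match the new user on the known dimensions. The sketch below is a stand-in for the probabilistic inference described in the text, not a method from this paper; the dictionary keys (`gender`, `city`, `age_group`, `profession`, `interests`) and the threshold `min_share` are assumptions.

```python
from collections import Counter


def infer_interests(new_user, users, keys=("gender", "city", "age_group", "profession"),
                    min_share=0.5):
    """Guess interests for `new_user` from users who match it on all `keys`.

    Returns every interest held by at least `min_share` of the matching users,
    or an empty set if nobody matches.
    """
    peers = [u for u in users if all(u.get(k) == new_user.get(k) for k in keys)]
    if not peers:
        return set()
    counts = Counter(interest for u in peers for interest in u["interests"])
    return {interest for interest, n in counts.items() if n / len(peers) >= min_share}


profile = {"gender": "male", "city": "Rotterdam", "age_group": "20-30",
           "profession": "aerospace engineer"}
users = [dict(profile, interests={"sports", "rock music"}) for _ in range(10)]
user_a = dict(profile, interests=set())  # 'Interests' dimension still empty
guess = infer_interests(user_a, users)   # == {"sports", "rock music"}
```

Requiring an exact match on every key is strict; with real data one would likely relax it, for example by weighting peers by how many dimensions they share with the new user.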
For the second step, assume that there are three groups of users, A, B, and C. Group A is interested in politics, sports, and music; group B in politics, sports, and the stock market; and group C in the stock market and Justin Bieber. Because politics and sports co-occur with both music and the stock market, these interests are closely related, while the stock market is only somewhat related to Justin Bieber. Groups A and B share two of their three interests, groups B and C share one, and groups A and C share none. We can therefore infer that groups A and B are closely related, B and C are somewhat related, and A and C are not related. This suggests that an interest in politics and sports is far removed from an interest in Justin Bieber.
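One simple way to quantify how related two groups are is the Jaccard similarity of their interest sets: shared interests divided by all interests of either group. The text does not name a measure, so this is an assumed choice; applied to the example groups, it reproduces the ordering described above.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared interests divided by all interests of either group (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


groups = {
    "A": {"politics", "sports", "music"},
    "B": {"politics", "sports", "stock market"},
    "C": {"stock market", "Justin Bieber"},
}
relatedness = {(x, y): jaccard(groups[x], groups[y]) for x, y in combinations(sorted(groups), 2)}
# {('A', 'B'): 0.5, ('A', 'C'): 0.0, ('B', 'C'): 0.25}
```

These scores can serve directly as edge weights between communities when the Social Map is drawn.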
Finally, we can map these relationships using force-directed graphs, which creates a Social Map. Such maps can be generated for every dimension of the Social Web User Model, so that each dimension, and how its attributes are related, can be studied in detail. Governments can use these maps when taking decisions in different scenarios: they can easily consult and gather opinions on specific situations from the communities that will be most affected, as well as from closely related communities.
3.6 Behavior Analysis
The kind of democracy offered by social media and the internet has led users to exhibit behaviours such as sharing, posting, liking, commenting, tweeting, following, and advertising on a daily basis. These behaviours can be categorized into individual and collective behaviours. Individual behaviour is exhibited by a single user, whereas collective behaviour is observed when a group of users behave together, e.g. users using the same hashtag on Twitter.
3.6.1 Individual Behavior Analysis
Individual behaviour can be of one of the following types:
1. User-User Behaviour: observed between two users, e.g. befriending or following another user.
2. User-Entity Behaviour: e.g. liking a post or posting a tweet on Twitter.
3. User-Community Behaviour: e.g. joining or leaving groups on Facebook or LinkedIn.
Irrespective of the type of behaviour, computational methods can be used to analyze it and find interesting patterns. To analyze individual behaviour, we can trace whom the user follows on Twitter over time and try to understand the underlying reasons for these follows. A machine learning program for this can be implemented using randomization tests or causality testing techniques, as described by Zafarani et al. [6].
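A randomization test in this spirit can be sketched as a one-sided permutation test. As a hypothetical example, suppose that for each account a user could have followed, we record how many of the user's friends already followed it, and ask whether the accounts the user did follow have more such friends than chance would explain (a hint of social influence). The statistic and the data are invented for illustration, and this generic permutation test is not necessarily the exact procedure the text refers to.

```python
import random
from statistics import mean


def permutation_test(followed, not_followed, n_permutations=2000, seed=0):
    """One-sided randomization test on the difference of means.

    Estimates how often randomly relabelling the pooled values produces a
    difference at least as large as the observed one. A small return value
    (p-value) means the observed difference is unlikely to be chance.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = mean(followed) - mean(not_followed)
    pooled = list(followed) + list(not_followed)
    k = len(followed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of the "followed" label
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: for each candidate account, how many of the user's
# friends already followed it before the user decided.
followed = [5, 7, 6, 8, 9, 7]
not_followed = [0, 1, 0, 2, 1, 0]
p_value = permutation_test(followed, not_followed)  # well below 0.05 here
```

A small p-value only shows an association; separating influence from homophily (friends simply sharing the same tastes) needs further tests of the kind the text mentions.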
3.6.2 Collective Behavior Analysis
Collective behaviour can be analyzed by treating it as the behaviour of individuals who act independently, i.e. by aggregating the results of individual behaviour analysis. More on behaviour analysis can be found in Zafarani et al. [6].
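Aggregating individual behaviour into collective behaviour can be illustrated with the hashtag example from Section 3.6: record which hashtags each user tweeted, invert that mapping, and keep the hashtags used by enough distinct users. The threshold `min_users` and the input format are assumptions for this sketch.

```python
from collections import defaultdict


def collective_hashtags(user_hashtags, min_users=3):
    """Turn individual hashtag use into collective behaviour.

    `user_hashtags` maps a user id to the hashtags that user tweeted.
    Returns hashtag -> set of users, keeping only hashtags used by at least
    `min_users` distinct users. Hashtags are compared case-insensitively.
    """
    users_by_tag = defaultdict(set)
    for user, tags in user_hashtags.items():
        for tag in tags:
            users_by_tag[tag.lower()].add(user)
    return {tag: users for tag, users in users_by_tag.items() if len(users) >= min_users}


individual = {
    1: {"#Oktoberfest", "#FCBayern"},
    2: {"#oktoberfest"},
    3: {"#Oktoberfest", "#MVG"},
    4: {"#FCBayern"},
}
collective = collective_hashtags(individual)  # {'#oktoberfest': {1, 2, 3}}
```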
This behavioural data is massive, expansive, and indicative of user preferences, interests, and opinions. These opinions are among the most important things a city government needs to know about its citizens. Opinions on different issues can vary collectively across communities of users. In a financial situation, opinions from a group of economists can be useful, while in a political situation, opinions from the politically interested community would be more relevant than others. This can help the government manage the welfare and well-being of its citizens and collectively increase the ‘Social Progress Index’ of the city.
3.7 Conclusion: Social Maps using the Social Web User Model
If we can obtain the above information from every citizen in the city, we can conveniently create a user model for every citizen. This data can then be used effectively to map relationships between users. We can find the clusters of communities living within the city, their job profiles and salaries, and eventually target the issues that really matter.
However, due to the NSA revelations and the fear of secret government surveillance programs, many people will be reluctant to provide the required permissions. Users need to be assured that they will not be targets of any such surveillance programs or censorship, and that the data is used for the sole purpose of managing the city. To study this issue, I conducted a short survey to find out how willing people are to provide their social data. The results are shown on the next page (source: http://www.pollican.com/result/7/What_information_are_you_willing_to_share_with_your_government).
From these survey results, I found that people are usually not willing to give Facebook data to governments; according to the survey, the only Facebook data they are willing to share is their email address. From LinkedIn and Xing, people are readily willing to share their skills, experience, and educational details, and most of them are also ready to share their job profile. Twitter has the most positive results, as people are willing to share both their Twitter streams and their follow lists. The only downside of Twitter is that not many people have Twitter accounts or tweet regularly. Nevertheless, this leads us to the next part of this paper: developing ‘Social Maps’ using Twitter data.
4. SOCIAL MAPS USING TWITTER DATA
In this section, we use Twitter data to generate social maps of a particular city and propose an algorithm for doing so. As an example, we use Munich as the target city. As this is for a city government, we assume that we already have every citizen's personal information, demographics, and Twitter handle. Twitter streams for Munich can be analyzed using the coordinates 48.1333° N, 11.5667° E. This approach is in line with [10].
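One way to restrict tweets to Munich is to keep only geotagged tweets within some radius of the coordinates above, using the haversine great-circle distance. This is a sketch under assumptions: the 15 km radius and the tweet dictionary key `lat_lon` (a latitude, longitude pair) are hypothetical. Note that the Twitter API's own `coordinates` field uses GeoJSON order, longitude first, so the order would need swapping.

```python
import math

MUNICH = (48.1333, 11.5667)  # latitude, longitude from the text
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_city(tweet, center=MUNICH, radius_km=15.0):
    """True if the tweet's hypothetical 'lat_lon' field lies within radius_km of center."""
    lat, lon = tweet["lat_lon"]
    return haversine_km(center[0], center[1], lat, lon) <= radius_km


tweets = [
    {"user": "@anna", "lat_lon": (48.1374, 11.5755)},  # near Marienplatz
    {"user": "@ben", "lat_lon": (52.5200, 13.4050)},   # Berlin
]
munich_tweets = [t for t in tweets if in_city(t)]  # only @anna's tweet remains
```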
4.1 Algorithm for gathering Data
Before starting the algorithm, create a database table with the following fields:
Field                  Type
uid                    int
Twitter Handle         string
Visited                bool
Relationships          int array
Cluster or Community   string
The Twitter Handle field holds the user's Twitter handle, and uid is a unique user id. Visited is set to true once the user has been visited, so that we do not run into an infinite recursive loop. Relationships holds an array of the uids of the users whom this user follows and who are within our dataset. Cluster or Community is the group to which the user belongs, e.g. music, geek, politics, sports, etc.
Algorithm:
1. Select a seed user from the collection of handles.
2. Create a FIFO list that stores the handles followed by the seed user that belong to our dataset. Determine the corresponding uids of these handles and store the array in the ‘Relationships’ column. Mark the current seed as visited (Visited = true).
3. Go through the Relationships list of the seed user. If a uid in that list has not been visited, go to step 2, recursively using the handle of the unvisited uid as the new seed.
4. Run through the table: if a uid has not been visited, set its handle as the seed and go to step 2; otherwise, stop.
Once this algorithm has completed, every uid in the database will have a ‘Relationships’ array listing the uids (within the dataset) that it follows; this array is empty for users who follow no one in the dataset.
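The algorithm above can be sketched in Python. Two deliberate changes: instead of recursion, the FIFO list is processed iteratively (a breadth-first traversal), which visits the same users without hitting Python's recursion limit on large follower graphs, and a user is marked visited when it is enqueued so that it is never queued twice. The `follows` dictionary is a hypothetical stand-in for data fetched from Twitter, and the table is kept in memory instead of in a database.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Row:
    """One row of the table described above."""
    uid: int
    handle: str
    visited: bool = False
    relationships: List[int] = field(default_factory=list)
    community: str = ""


def build_relationships(table: Dict[int, Row], follows: Dict[str, Set[str]]) -> Dict[int, Row]:
    """Fill in 'relationships' for every row by traversing the follow graph.

    `follows` maps a handle to the handles it follows (a stand-in for Twitter
    data); followed handles outside the table are ignored.
    """
    uid_of = {row.handle: row.uid for row in table.values()}
    for seed in table.values():            # steps 1 and 4: (re)start from an unvisited row
        if seed.visited:
            continue
        seed.visited = True
        queue = deque([seed.uid])          # step 2: FIFO list
        while queue:
            row = table[queue.popleft()]
            row.relationships = sorted(uid_of[h] for h in follows.get(row.handle, ()) if h in uid_of)
            for uid in row.relationships:  # step 3: continue with unvisited uids
                if not table[uid].visited:
                    table[uid].visited = True
                    queue.append(uid)
    return table


table = {uid: Row(uid, handle) for uid, handle in
         [(1, "@anna"), (2, "@ben"), (3, "@cem"), (4, "@dana")]}
follows = {"@anna": {"@ben", "@outsider"}, "@ben": {"@anna", "@cem"}, "@dana": {"@anna"}}
build_relationships(table, follows)
# relationships: 1 -> [2], 2 -> [1, 3], 3 -> [], 4 -> [1]
```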
4.2 Laying out the Network Graph
As we are primarily interested in homophily and clustering, we use a graph layout that can express communities of relationships: a force-directed graph layout algorithm [11]. In this approach, relationships act like springs, and each user node repels nearby nodes. The resulting graph has the following properties:
1. People with many relationships between them are arranged into tight clusters.
2. People with the fewest relationships between them appear at opposite edges of the graph.
3. People who have many relationships at both ends of the graph appear in the middle.
4. Clusters with few or no relationships between them appear very far apart on the graph.
To see why, consider 10 people. There can be at most n(n-1)/2 = 10 × 9 / 2 = 45 relationships between them. If all 45 exist, i.e. every person is related to every other person, the force-directed graph will be a perfectly symmetric ball. If instead these 10 people are split into 2 groups of 5, where every member is related to every other member of their own group but the two groups have no relationships with each other (they "hate each other"), the final force-directed graph will be 2 separate balls with no connection between them. Similarly, if we visualize the data of a city in this way, we will be able to measure the separateness of its communities.
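The two-group thought experiment can be reproduced with a small spring embedder in the style of Fruchterman and Reingold, one common force-directed algorithm: related nodes attract with force d²/k, all pairs of nodes repel with force k²/d, and a cooling "temperature" caps how far a node moves per iteration. The parameter values (300 iterations, cooling factor 0.98, the random seed) are arbitrary choices for this sketch, not tuned values.

```python
import math
import random


def _push(disp, pos, a, b, force):
    """Add a force along the line a-b: positive pushes a and b apart."""
    dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
    d = max(math.hypot(dx, dy), 1e-9)  # avoid division by zero
    f = force(d)
    disp[a][0] += dx / d * f
    disp[a][1] += dy / d * f
    disp[b][0] -= dx / d * f
    disp[b][1] -= dy / d * f


def force_directed_layout(nodes, edges, iterations=300, seed=1):
    """Fruchterman-Reingold-style layout; returns node -> [x, y]."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # ideal distance between related nodes
    temperature = 0.1                 # maximum move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):  # every pair repels: k^2 / d
            for b in nodes[i + 1:]:
                _push(disp, pos, a, b, lambda d: k * k / d)
        for a, b in edges:             # relationships act as springs: d^2 / k
            _push(disp, pos, a, b, lambda d: -d * d / k)
        for n in nodes:                # move, capped by the temperature
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
        temperature *= 0.98            # cool down so the layout settles
    return pos


n_people = 10
max_relationships = n_people * (n_people - 1) // 2  # 45
groups = [range(0, 5), range(5, 10)]                 # two groups with no ties between them
edges = [(a, b) for g in groups for a in g for b in g if a < b]
pos = force_directed_layout(range(n_people), edges)
```

In the resulting positions, members of each group should sit close together while the two groups drift apart, i.e. two separate balls. The all-pairs repulsion costs O(n²) per iteration, so a city-scale graph would need an approximation or a dedicated graph library.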
4.3 Detecting Communities and Adding color
Communities can be detected from the number of shared relationships or interests within a given subgroup. We can use the Louvain community detection algorithm [12], which iteratively determines communities within a larger network and assigns community membership to each user accordingly.
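The core of the Louvain method is a local-moving phase: each node is moved to the neighbouring community that most increases modularity, and this is repeated until no move helps. The sketch below implements only that first phase, for an unweighted, undirected graph; the full method then collapses each community into a single node and repeats. For real data, a library implementation (e.g. NetworkX's `louvain_communities`) would be the practical choice.

```python
from collections import Counter


def louvain_local_moving(adj):
    """First (local-moving) phase of the Louvain method.

    `adj` maps each node to the set of its neighbours (undirected, unweighted,
    no self-loops). Returns node -> community label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    community = {n: n for n in adj}                  # one community per node
    if m == 0:
        return community
    degree = {n: len(adj[n]) for n in adj}
    total = dict(degree)  # sum of degrees of the members of each community
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            old = community[node]
            links = Counter(community[nb] for nb in sorted(adj[node]))
            total[old] -= degree[node]  # take the node out of its community
            # Modularity gain of joining c, up to a constant factor: links into c
            # minus the number expected if edges were wired at random.
            best, best_gain = old, links[old] - total[old] * degree[node] / (2 * m)
            for c, n_links in links.items():
                gain = n_links - total[c] * degree[node] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            total[best] += degree[node]
            community[node] = best
            improved = improved or best != old
    return community


def modularity(adj, community):
    """Standard modularity Q of a partition of an unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    inside, deg = Counter(), Counter()
    for a in adj:
        deg[community[a]] += len(adj[a])
        inside[community[a]] += sum(0.5 for b in adj[a] if community[b] == community[a])
    return sum(inside[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)


# Two 4-person cliques joined by a single relationship (3-4).
adj = {n: set() for n in range(8)}
edges = [(a, b) for g in (range(4), range(4, 8)) for a in g for b in g if a < b] + [(3, 4)]
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
community = louvain_local_moving(adj)
# nodes 0-3 share one label and nodes 4-7 another; modularity is about 0.423
```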
Next, we can assign each community an arbitrary color. Some user nodes may be affiliated with multiple communities. These users can be assigned to the community with which they share the most relationships. Alternatively, they can be given a blend of the colors of all the communities they are affiliated with, which also shows the boundaries between different communities. For example, a person belonging to a group primarily concerned with politics (blue) and a group primarily concerned with music (yellow) may be represented by green.
Finally, we can plot these users on the map of Munich based on their locations. Since we already have the geolocation of each user, we can simply plot each user in their community color on the map. The force-directed graph and the geographical social map are shown in the following figures. Each dot in both maps represents a person, and the color of the dot represents the community. The geographical map is for representational purposes only.
4.4 Determining Community Interests
First, each user node can be given a size that depends on the number of relationships it shares with other nearby nodes: the more relationships, the larger the node. We can then determine the overall interests of a community by manually inspecting its nodes, starting with the largest. Typically, people organize themselves into groups such as sports, music, media, movies, politics, finance, news, arts, literature, engineering, cultures, etc.
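Node sizing by number of relationships can be sketched as a mapping from degree to marker size. The square-root scaling and the size range are assumptions (they keep highly connected hubs from dwarfing everyone else); the ranking at the end gives the order in which nodes would be inspected.

```python
import math


def node_sizes(adjacency, min_size=4.0, max_size=20.0):
    """Map each node's number of relationships (degree) to a marker size.

    Sizes grow with the square root of the degree, from min_size for
    isolated nodes up to max_size for the best-connected node.
    """
    degrees = {n: len(nbrs) for n, nbrs in adjacency.items()}
    top = max(degrees.values(), default=0)
    if top == 0:
        return {n: min_size for n in degrees}
    return {n: min_size + (max_size - min_size) * math.sqrt(d / top) for n, d in degrees.items()}


adjacency = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "loner": set()}
sizes = node_sizes(adjacency)
inspection_order = sorted(sizes, key=sizes.get, reverse=True)  # largest nodes first
```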
Finally, we can start to monitor traffic for each community using:
1. Hashtags
2. Shared links
3. Languages used
4. Devices and operating systems in use (desktop, mobile, Android, iOS, Windows, etc.)
5. Geographic coordinates
6. Age
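Monitoring traffic per community amounts to grouping tweets by the community of their author and counting. The sketch covers hashtags, shared links, and languages; the tweet dictionary keys (`user`, `hashtags`, `urls`, `lang`) and the "unknown" placeholder are hypothetical, not the Twitter API's schema.

```python
from collections import Counter, defaultdict


def community_traffic(tweets, community_of):
    """Count hashtags, shared links and languages per community.

    `tweets` are dicts with the hypothetical keys "user", "hashtags", "urls"
    and "lang"; `community_of` maps a user to a community label, e.g. from
    community detection. Tweets by users without a community are skipped.
    """
    stats = defaultdict(lambda: {"hashtags": Counter(), "links": Counter(), "languages": Counter()})
    for tweet in tweets:
        community = community_of.get(tweet["user"])
        if community is None:
            continue
        s = stats[community]
        s["hashtags"].update(tag.lower() for tag in tweet.get("hashtags", []))
        s["links"].update(tweet.get("urls", []))
        s["languages"][tweet.get("lang", "unknown")] += 1
    return dict(stats)


tweets = [
    {"user": 1, "hashtags": ["#FCBayern"], "urls": [], "lang": "de"},
    {"user": 2, "hashtags": ["#fcbayern", "#Oktoberfest"], "lang": "en"},
    {"user": 3, "hashtags": ["#Wahl"], "urls": ["https://example.org/poll"], "lang": "de"},
    {"user": 9, "hashtags": ["#spam"]},
]
community_of = {1: "sports", 2: "sports", 3: "politics"}
traffic = community_traffic(tweets, community_of)
# traffic["sports"]["hashtags"]["#fcbayern"] == 2; user 9 has no community and is skipped
```

Devices, coordinates, and age could be counted the same way once they are available as fields.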
5. CONCLUSION
Interactive and informative Social Maps can be generated using both the Social Web User Model and Twitter data. These Social Maps can effectively represent the clusters of communities living in different areas of a geographical region, and can be a boon for managing various activities in the city.
6. REFERENCES
[1] Heckmann, D., Schwarzkopf, E., Mori, J., Dengler, D., Kröner, A.: The user model and context ontology GUMO revisited for future Web 2.0 extensions. In: Proceedings of the Int. Workshop on Contexts and Ontologies: Representation and Reasoning. CEUR Workshop Proceedings, vol. 298. CEUR-WS.org (2007)
[2] Plumbaum, T., Wu, S., De Luca, E.W., Albayrak, S.: User Modeling for the Social Semantic Web.
[3] Dey, A.K.: Understanding and using context. Personal and Ubiquitous Computing 5, 4–7 (2001)
[4] Said, A., Berkovsky, S., De Luca, E.W.: Putting things in context: Challenge on context-aware movie recommendation. In: Proceedings of the Workshop on Context-Aware Movie Recommendation (CAMRa '10), pp. 2–6. ACM, New York, NY, USA (2010)
[5] Magnini, B., Strapparava, C.: Using WordNet to Improve User Modelling in a Web Document Recommender System. ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica, I-38050 Trento, Italy. http://multiwordnet.fbk.eu/paper/WordnetWumNAACL.pdf
[6] Zafarani, R., Abbasi, M.A., Liu, H.: Social Media Mining: An Introduction. Cambridge University Press (2014)
[7] http://www.socialprogressimperative.org/system/resources/W1siZiIsIjIwMTQvMDUvMjYvMTYvMzcvMDAvMjUzL1NvY2lhbF9Qcm9ncmVzc19JbmRleF8yMDE0X0V4ZWN1dGl2ZV9TdW1tYXJ5LnBkZiJdXQ/Social%20Progress%20Index%202014%20Executive%20Summary.pdf
[8] http://www.socialprogressimperative.org/data/spi#data_table/countries/spi/
[9] http://www.ted.com/talks/michael_green_what_the_social_progress_index_can_reveal_about_your_country/transcript?language=en#t-70266
[10] http://peoplemaps.org
[11] http://en.wikipedia.org/wiki/Force-directed_graph_drawing
[12] http://perso.uclouvain.be/vincent.blondel/research/louvain.html