1. place graphs
are the new
social graphs
Matt Biddulph
@mattb | matt@hackdiary.com
Every data scientist has their own favourite way of representing their data. For some people
it's Excel, and they think in rows and columns. For others it's matrices, and they use linear
algebra to interrogate their data. For me, it's graphs.
2. We're all pretty used to the idea that you can model human relationships in a social graph.
3. "Social network analysis
views social relationships in
terms of network theory
consisting of nodes and ties.
Nodes are the individual actors
within the networks, and ties
are the relationships between
the actors."
There's a pretty deep area of mathematical study called Social Network Analysis that goes
back at least 20 years. It tries to create insight by analysing the structure of social networks,
and usually doesn't incorporate any elements of culture or sociology in doing so.
4. Centrality
measures
It led to the creation of techniques like centrality measures, which try to find the nodes that are
most central to the network. These might be the kind of people on Twitter who have the
highest chance of being retweeted.
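One of the simplest centrality measures is in-degree centrality: count how many other nodes point at each node. The sketch below assumes a tiny made-up "who follows whom" edge list (the names and edges are hypothetical, not real Twitter data) and is only one of many ways to define centrality.

```python
from collections import Counter

# Hypothetical "who follows whom" edges: (follower, followed).
follows = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
]

def in_degree_centrality(edges):
    """Return, for each node, the fraction of the other nodes that point at it."""
    nodes = {node for edge in edges for node in edge}
    in_degree = Counter(followed for _, followed in edges)
    return {node: in_degree[node] / (len(nodes) - 1) for node in nodes}

centrality = in_degree_centrality(follows)
# The node with the highest score is the one most others follow.
most_central = max(centrality, key=centrality.get)
```

Here `most_central` is the account that the largest share of the others follow, which is a rough stand-in for "most likely to be retweeted".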
5. Community
detection
There are also community detection algorithms that try to find the most tightly-knit
subgraphs and cluster those nodes together. If you ran this over the network of people I
follow on Twitter, it might be able to pick out my work colleagues or the people I socialise
with face-to-face.
6. People you
may know
Sites like LinkedIn build almost-telepathic "people you may know" features by walking around
the graph starting at your node and looking for people that show up a lot in your
neighbourhood that you haven't connected with yet.
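A minimal sketch of that walk, assuming a small made-up friendship graph: step out to each friend, then to each of their friends, and count how often each not-yet-connected person turns up. The graph, the names and the `limit` parameter are all hypothetical; production systems are certainly more elaborate than this.

```python
from collections import Counter

# Hypothetical undirected friendship graph as an adjacency map.
friends = {
    "me": {"ann", "ben", "cat"},
    "ann": {"me", "ben", "dan"},
    "ben": {"me", "ann", "dan", "eve"},
    "cat": {"me", "dan"},
    "dan": {"ann", "ben", "cat"},
    "eve": {"ben"},
}

def people_you_may_know(graph, person, limit=3):
    """Rank non-friends by how many mutual friends they share with `person`."""
    counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in graph[person]:
                counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]

suggestions = people_you_may_know(friends, "me")
```

Each suggestion is a `(name, mutual_friend_count)` pair, so the person who shows up most often in the neighbourhood comes first.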
8. Belgium is a country in the northwest of Europe with some unusual cultural qualities. It's
sandwiched between the Netherlands and France. About half of the country speaks French,
and the other half speaks Dutch. It'd be very interesting to study the patterns of interactions
in this country.
9. Researchers at Louvain in Belgium were lucky enough to do a joint project with a Belgian
mobile phone company. They had access to anonymised records of 2.6 million phone calls:
the record of which phone called which number, and when.
http://arxiv.org/pdf/0802.2178v2
10. Belgian
phonecall
network
Fast unfolding of communities in large networks, Blondel et al. [2008]
They used these calls to construct a "call graph". They were able to develop a community-detection
algorithm that could detect the two separate clusters of Dutch and French speakers,
who mostly called only each other. The algorithm achieved this simply by analysing the
shape of the graph. It knew nothing about French, Dutch or phone calls.
http://arxiv.org/pdf/0803.0476
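The Blondel et al. method works by greedily maximising a score called modularity, which rewards partitions where edge weight falls inside communities more than chance would predict. The sketch below does not implement their algorithm; it only computes modularity for a given partition of a small made-up call graph (treated as undirected), to show that the score depends on nothing but the shape of the graph. All node names, weights and partitions are hypothetical.

```python
from collections import defaultdict

# Hypothetical weighted call graph: (phone_a, phone_b, number_of_calls).
calls = [
    ("a1", "a2", 5), ("a2", "a3", 4), ("a1", "a3", 6),  # one tight group
    ("b1", "b2", 5), ("b2", "b3", 4), ("b1", "b3", 6),  # another tight group
    ("a3", "b1", 1),                                    # a rare cross-group call
]

def modularity(edges, community_of):
    """Modularity of a partition: sum over communities of
    (internal weight / m) - (community degree / 2m)^2, where m is total edge weight."""
    total_weight = sum(weight for _, _, weight in edges)
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed node degrees of each community
    for u, v, weight in edges:
        degree[community_of[u]] += weight
        degree[community_of[v]] += weight
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += weight
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )

# Partition that follows the two tight groups, versus lumping everyone together.
two_groups = {n: n[0] for n in ("a1", "a2", "a3", "b1", "b2", "b3")}
one_group = {n: "all" for n in two_groups}

q_split = modularity(calls, two_groups)
q_single = modularity(calls, one_group)
```

The two-group partition scores well above the single-group one, so an optimiser searching for high modularity would be pulled towards the split without knowing what the groups mean.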
11. So let's take a step back and think about what other kinds of graph we could form, from what
kinds of data.
12. I work in location apps at Nokia, and so I naturally think of places. Wouldn't it be interesting
to study the connections between cities instead of people? For example, people probably fly
more often between NYC and LA than they do between NYC and New Jersey. We could redraw
the map based on closeness in the travel network.
13. I turned to the Hadoop cluster at Nokia and took a sample of several weeks of logs from our
routing servers. These are used every time someone uses our maps application to request a
driving route from one place to another. Every time someone drove from A to B, I made an
edge in a "place graph" from A to B.
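As a rough sketch of that step, assume the logs have already been parsed into `(origin, destination)` pairs. The real log format is not described here, so the records and the function name below are hypothetical. Counting repeated pairs gives each directed edge a weight:

```python
from collections import Counter

# Hypothetical, already-parsed routing requests: (origin_town, destination_town).
route_requests = [
    ("London", "Brighton"),
    ("London", "Oxford"),
    ("Brighton", "London"),
    ("London", "Brighton"),
    ("Paris", "Lyon"),
]

def build_place_graph(requests):
    """Count requests per directed (origin, destination) pair."""
    edge_weights = Counter()
    for origin, destination in requests:
        edge_weights[(origin, destination)] += 1
    return edge_weights

place_graph = build_place_graph(route_requests)
# Heaviest edges first: the most frequently requested routes.
strongest = place_graph.most_common(1)
```

On a Hadoop cluster the same counting would be spread across many machines, but the resulting structure is the same: a weighted edge list keyed by pairs of places.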
14. I ran the data through Gephi and asked it to cluster it based on the strength of connections
between towns. The result is a not-quite-geographic new map of the world, where two cities
are close to each other if people often drive between them.
15. Cluster labels on the map: UK; China; Korea, Japan, etc.; Spain; Most of Europe; India; Pakistan; Finland; Russia
As you'd expect, the UK is an island and so people don't drive in and out of it very often.
Spain and Portugal are not islands, but they appear separate because they're attached to the
rest of Europe by a very narrow neck of land. So people are much more likely to fly than drive
out of Spain.
16. How could we use this data in a practical application? Say I'm coming to New York to attend a
conference on big data. I could choose a hotel near the conference venue, but I'd rather see
more interesting parts of New York.
17. Where should
I stay?
If I've never been to New York before, I could ask a friend. I could tell them that I like
London's West End and San Francisco's downtown.
18. Times Square = Piccadilly Circus
New York London
If they know both towns, they'd probably tell me that Times Square is the Piccadilly Circus of
New York.
19. What is the Greenwich Village
of Tokyo?
... the Noe Valley of New York?
... the Shibuya of Los Angeles?
But if we delve into the place graph, we could answer much more interesting questions, and
create a "neighbourhood isomorphism" from city to city. People who like the Mission in SF
and Shoreditch in London could find out that Williamsburg is probably the best place for
them to stay in New York.