The majority of NoSQL meetups in London are hosted on meetup.com and, luckily for us, meetup.com has an API that allows us to extract all the corresponding data: groups, events, venues, members and RSVPs.
In this talk Mark will show how we can use R to gain quick insights into the data using tools like dplyr and ggplot2. We'll also do some social network analysis of the attendees of London's meetup scene using igraph.
Finally, we'll look at how we could bring all these insights together in a brand new Clojure front end for the meetup website.
9. Interesting questions to ask...
● What day of the week do people go to meetups?
● Where in London are NoSQL meetups held?
● Do people sign up for multiple meetups on the same day?
● Are there common members between groups?
● What topics are people most interested in?
● In which order do people join the NoSQL groups?
● Who are the most connected people on the NoSQL scene?
12. When do people go to meetups?
(g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-()
13. When do people go to meetups?
MATCH (g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-()
WHERE (event.time + event.utc_offset) < timestamp()
RETURN g.name,
event.time + event.utc_offset AS eventTime,
event.announced_at AS announcedAt,
event.name,
COUNT(*) AS rsvps
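To answer the day-of-week question, the rows returned by this query can be bucketed by weekday in R. A minimal base-R sketch, assuming `events` is a data frame holding the query's `eventTime` (epoch milliseconds, already shifted by `utc_offset`) and `rsvps` columns; the helper `day_of_week` and the toy rows are hypothetical, not meetup data.

```r
# Hypothetical helper: map epoch milliseconds (already shifted by the
# group's utc_offset) to a weekday factor, Monday first.
day_of_week = function(eventTime) {
  # The offset is already applied, so reading the value as UTC gives
  # the local wall-clock time of the event.
  t = as.POSIXlt(eventTime / 1000, origin = "1970-01-01", tz = "UTC")
  day_names = c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
  # POSIXlt$wday is 0 for Sunday; reorder the levels so Monday comes first
  factor(day_names[t$wday + 1], levels = day_names[c(2:7, 1)])
}

# Toy rows standing in for the query result (not real meetup data)
events = data.frame(eventTime = c(0, 86400000, 4 * 86400000),
                    rsvps = c(10, 5, 7))

# Total RSVPs per weekday; days without any events get 0
rsvps_by_day = tapply(events$rsvps, day_of_week(events$eventTime), sum,
                      default = 0)
rsvps_by_day
```

Using `$wday` rather than `weekdays()` keeps the grouping independent of the session's locale.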
22. Where do people go to meetups?
(g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-(),
(event)-[:HELD_AT]->(venue)
23. Where do people go to meetups?
MATCH (g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-(), (event)-[:HELD_AT]->(venue)
WHERE (event.time + event.utc_offset) < timestamp()
RETURN g.name,
event.time + event.utc_offset AS eventTime,
event.announced_at AS announcedAt,
event.name,
venue.name AS venue,
venue.lat AS lat,
venue.lon AS lon,
COUNT(*) AS rsvps
25. Where do people go to meetups?
library(dplyr)
# One row per event: count how many events each venue hosted
byVenue = events %>%
  count(lat, lon, venue) %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  rename(count = n)
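For comparison, the same aggregation can be written in base R with `aggregate()`, which makes clear what the dplyr pipeline is doing: group the event rows by venue and count them. A sketch with hypothetical toy rows standing in for the venue query result.

```r
# Toy rows standing in for the venue query result (not real data):
# one row per event, with the venue it was held at
events = data.frame(
  lat   = c(51.5, 51.5, 51.6),
  lon   = c(-0.08, -0.08, -0.07),
  venue = c("Venue A", "Venue A", "Venue B"))

# Count events per (lat, lon, venue) combination, like count(lat, lon, venue)
byVenue = aggregate(count ~ lat + lon + venue,
                    data = transform(events, count = 1), FUN = sum)

# Busiest venues first, like arrange(desc(n))
byVenue = byVenue[order(byVenue$count, decreasing = TRUE), ]
byVenue
```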
26. Where do people go to meetups?
## lat lon venue count
## 1 51.50256 -0.019379 Skyline Bar at CCT Venues Plus 1
## 2 51.53373 -0.122340 The Guardian 1
## 3 51.51289 -0.067163 Erlang Solutions 3
## 4 51.49146 -0.219424 Novotel - W6 8DR 1
## 5 51.49311 -0.146531 Google HQ 1
## 6 51.52655 -0.084219 Look Mum No Hands! 22
## 7 51.51976 -0.097270 Vibrant Media, 3rd Floor 1
## 8 51.52303 -0.085178 Mind Candy HQ 2
## 9 51.51786 -0.109260 ThoughtWorks UK Office 2
## 10 51.51575 -0.097978 BT Centre 1
27. Where do people go to meetups?
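library(ggmap)  # also attaches ggplot2
# Recent ggmap versions need a Google API key (register_google()) for Google map tiles
map = get_map(location = 'London', zoom = 12)
# One red point per venue, sized by the number of events held there
ggmap(map) +
  geom_point(aes(x = lon, y = lat, size = count),
             data = byVenue,
             col = "red",
             alpha = 0.8)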
34. Meetup Group Member Overlap
● Why would we want to know this?
○ Perhaps for joint meetups
○ Topics for future meetups
35. Extracting the data
MATCH (group1:Group), (group2:Group)
WHERE group1 <> group2
OPTIONAL MATCH p =
(group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
WITH group1, group2, COLLECT(p) AS paths
RETURN group1.name, group2.name,
LENGTH(paths) as commonMembers
ORDER BY group1.name, group2.name
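The query counts shared members for every ordered pair of groups. The same numbers can be cross-checked offline: multiplying the member-by-group incidence matrix by its own transpose gives every pairwise overlap at once. A base-R sketch; the `memberships` data frame and its toy rows are hypothetical.

```r
# Toy membership edges standing in for (member)-[:MEMBER_OF]->(group)
memberships = data.frame(
  member = c("m1", "m1", "m2", "m2", "m3"),
  group  = c("Neo4j", "MongoDB", "Neo4j", "MongoDB", "Neo4j"))

# Incidence matrix: one row per member, one column per group (1 = member)
incidence = unclass(table(memberships$member, memberships$group))

# crossprod(x) is t(x) %*% x: cell [g1, g2] counts the members shared by
# g1 and g2, and the diagonal holds each group's total membership
common = crossprod(incidence)
common
```

Unlike the Cypher query, which skips `group1 = group2`, the matrix keeps those cells on its diagonal, where they double as group sizes.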
37. Finding overlap as a percentage
MATCH (group1:Group), (group2:Group)
WHERE group1 <> group2
OPTIONAL MATCH (group1)<-[:MEMBER_OF]-(member)
WITH group1, group2, COLLECT(member) AS group1Members
WITH group1, group2, group1Members,
LENGTH(group1Members) AS numberOfGroup1Members
UNWIND group1Members AS member
OPTIONAL MATCH path = (member)-[:MEMBER_OF]->(group2)
WITH group1, group2, COLLECT(path) AS paths, numberOfGroup1Members
WITH group1, group2, LENGTH(paths) AS commonMembers, numberOfGroup1Members
RETURN group1.name, group2.name,
toInt(round(100.0 * commonMembers / numberOfGroup1Members)) AS percentage
ORDER BY group1.name, group2.name
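Because the denominator is the size of `group1`, the percentage is not symmetric: a small group can sit almost entirely inside a large one without the reverse being true. A base-R sketch of the same division, assuming a matrix of shared-member counts whose diagonal holds each group's size; the numbers are hypothetical.

```r
# Toy shared-member counts (not real data): off-diagonal cells are shared
# members, the diagonal holds each group's total membership
common = matrix(c(3, 2,
                  2, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Neo4j", "MongoDB"), c("Neo4j", "MongoDB")))

# Divide each row by group1's size (the row's diagonal entry), mirroring
# 100.0 * commonMembers / numberOfGroup1Members in the query
percentage = round(100 * sweep(common, 1, diag(common), "/"))
percentage
```

Here 2 of Neo4j's 3 members (67%) are in MongoDB, but only 2 of MongoDB's 4 members (50%) are in Neo4j.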
39. How many groups are people part of?
MATCH (p:MeetupProfile)-[:MEMBER_OF]->()
RETURN ID(p), COUNT(*) AS groups
ORDER BY groups DESC
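The query yields one row per profile with its group count. The histogram data is then a "count of counts": how many profiles belong to 1, 2, 3, ... groups. A base-R sketch with hypothetical toy membership edges.

```r
# Toy membership edges standing in for (p:MeetupProfile)-[:MEMBER_OF]->()
memberships = data.frame(
  profile = c(1, 1, 1, 2, 2, 3, 4),
  group   = c("A", "B", "C", "A", "B", "A", "C"))

# Number of groups per profile (the query's `groups` column)
groups_per_profile = table(memberships$profile)

# How many profiles belong to exactly 1, 2, 3, ... groups
distribution = table(as.vector(groups_per_profile))
distribution
```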
40. How many groups are people part of?
library(dplyr)
library(ggplot2)
# group_count: the query result, one row per profile with its number of groups
ggplot(aes(x = groups, y = n),
       data = group_count %>% count(groups)) +
  geom_bar(stat = "identity", fill = "dark blue") +
  scale_y_sqrt() +
  scale_x_continuous(
    breaks = round(seq(min(group_count$groups),
                       max(group_count$groups), by = 1), 1)) +
  ggtitle("Number of groups people are members of")
42. Who’s the most connected?
● i.e. the person who had the chance to meet the most people in the community
● Betweenness Centrality
● PageRank
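Betweenness centrality scores a person by how many shortest paths between other people pass through them. In the talk this is computed with igraph; to show what it measures without assuming igraph is installed, here is a base-R sketch of Brandes' algorithm for an unweighted, undirected graph. The adjacency-list input format and the name `betweenness_sketch` are hypothetical.

```r
# Sketch of Brandes' betweenness centrality for an unweighted, undirected
# graph given as a named adjacency list (adj[[v]] = neighbours of v).
betweenness_sketch = function(adj) {
  nodes = names(adj)
  cb = setNames(numeric(length(nodes)), nodes)
  for (s in nodes) {
    # Breadth-first search from s, counting shortest paths (sigma) and
    # remembering each node's predecessors on those paths
    sigma = setNames(numeric(length(nodes)), nodes)
    dist = setNames(rep(-1, length(nodes)), nodes)
    pred = setNames(vector("list", length(nodes)), nodes)
    sigma[s] = 1
    dist[s] = 0
    queue = s
    visited = character(0)
    while (length(queue) > 0) {
      v = queue[1]
      queue = queue[-1]
      visited = c(visited, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {
          dist[w] = dist[v] + 1
          queue = c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {
          sigma[w] = sigma[w] + sigma[v]
          pred[[w]] = c(pred[[w]], v)
        }
      }
    }
    # Walk back from the farthest nodes, accumulating dependencies
    delta = setNames(numeric(length(nodes)), nodes)
    for (w in rev(visited)) {
      for (v in pred[[w]]) {
        delta[v] = delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] = cb[w] + delta[w]
    }
  }
  # Each unordered pair was counted once from each endpoint
  cb / 2
}

path = list(a = "b", b = c("a", "c"), c = "b")
betweenness_sketch(path)  # a: 0, b: 1, c: 0
```

On a path a-b-c, only b lies between another pair, so it scores 1. On the real member graph, igraph's `betweenness()` is the practical choice.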
47. PageRank
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is.
The underlying assumption is that more important websites are likely to receive more links from other websites.
48. PageRank
PageRank works by counting the number and quality of links to a person to determine a rough estimate of how important the person is.
The underlying assumption is that more important people are likely to receive more links from other people.
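The "number and quality of links" idea can be made concrete with power iteration: each node repeatedly passes its score along its links, with a damping factor `d` for jumping to a random node instead. A base-R sketch under stated assumptions: the adjacency-list format and the name `page_rank_sketch` are hypothetical, and `d = 0.85` matches the common default (also igraph's `page_rank()` default). For an undirected people graph, list each edge in both directions.

```r
# Sketch of PageRank by power iteration on a directed graph given as a
# named adjacency list (adj[[v]] = nodes that v links to).
page_rank_sketch = function(adj, d = 0.85, tol = 1e-10, max_iter = 200) {
  nodes = names(adj)
  n = length(nodes)
  # Column-stochastic transition matrix: M[i, j] = 1 / outdegree(j)
  # when j links to i
  M = matrix(0, n, n, dimnames = list(nodes, nodes))
  for (j in nodes) {
    out = adj[[j]]
    if (length(out) > 0) {
      M[out, j] = 1 / length(out)
    } else {
      # Dangling node: spread its score evenly over every node
      M[, j] = 1 / n
    }
  }
  r = rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    # Follow a link with probability d, otherwise jump to a random node
    r_new = (1 - d) / n + d * as.vector(M %*% r)
    converged = sum(abs(r_new - r)) < tol
    r = r_new
    if (converged) break
  }
  setNames(r, nodes)
}

star = list(hub = c("a", "b", "c"), a = "hub", b = "hub", c = "hub")
pr = page_rank_sketch(star)
names(which.max(pr))  # "hub"
```

The scores always sum to 1, and the well-linked hub ends up with the highest rank even though every node has the same number of outgoing links per edge.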
50. Blending back into the graph
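library(RNeo4j)
# {param} syntax targets older Neo4j; Neo4j 4.0+ only accepts $param
query = "MATCH (p:MeetupProfile {id: {id}}) SET p.betweenness = {score}"
tx = newTransaction(graph)
for(i in seq_len(nrow(bwDf))) {
  if(i %% 1000 == 0) {
    # Commit the current batch and start a new transaction
    commit(tx)
    print(paste("Batch", i / 1000, "committed."))
    tx = newTransaction(graph)
  }
  id = bwDf[i, "id"]
  score = bwDf[i, "score"]
  appendCypher(tx, query, id = id, score = as.double(score))
}
# Commit the final, partial batch
commit(tx)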
51. Blending back into the graph
library(RNeo4j)
query = "MATCH (p:MeetupProfile {id: {id}}) SET p.pageRank = {score}"
tx = newTransaction(graph)
for(i in seq_len(nrow(prDf))) {
  if(i %% 1000 == 0) {
    # Commit the current batch and start a new transaction
    commit(tx)
    print(paste("Batch", i / 1000, "committed."))
    tx = newTransaction(graph)
  }
  # prDf's name column is passed as the profile id
  name = prDf[i, "name"]
  rank = prDf[i, "rank"]
  appendCypher(tx, query, id = name, score = as.double(rank))
}
# Commit the final, partial batch
commit(tx)
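Both loops commit every 1,000 rows so that no single transaction grows too large. The batching itself can be isolated and checked without a database. A base-R sketch: the helper `batch_indices` is hypothetical, and the RNeo4j calls in the trailing comment are the ones used in the loops above (not run here).

```r
# Hypothetical helper: split row indices 1..n into consecutive batches of
# at most batch_size rows, one batch per transaction
batch_indices = function(n, batch_size = 1000) {
  split(seq_len(n), ceiling(seq_len(n) / batch_size))
}

batches = batch_indices(2500)
lengths(batches)

# With RNeo4j and a live graph (not run here), each batch becomes one
# transaction:
# for (rows in batches) {
#   tx = newTransaction(graph)
#   for (i in rows) appendCypher(tx, query, id = prDf[i, "name"],
#                                score = as.double(prDf[i, "rank"]))
#   commit(tx)
# }
```

The loops above commit before appending row 1000, so their first batch holds 999 rows; splitting up front gives even batches and makes the batch boundaries explicit.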
52. Are they in the Neo4j group?
MATCH (p:MeetupProfile)
WITH p
ORDER BY p.pageRank DESC
LIMIT 20
// OPTIONAL MATCH keeps non-members, with m = null
OPTIONAL MATCH (p)-[m:MEMBER_OF]->(g:Group)
WHERE g.name = "Neo4j - London User Group"
RETURN p.name, p.id, p.pageRank, m IS NOT NULL AS isMember
ORDER BY p.pageRank DESC
53. Are they in the Neo4j group?
blended_data = cypher(graph, query)
55. Have they been to any events?
MATCH (p:MeetupProfile)
WITH p
ORDER BY p.pageRank DESC
LIMIT 20
OPTIONAL MATCH (p)-[m:MEMBER_OF]->(g:Group)
WHERE g.name = "Neo4j - London User Group"
WITH p, m IS NOT NULL AS isMember, g
// Collect the profile's 'yes' RSVPs to events hosted by the Neo4j group
OPTIONAL MATCH event = (p)-[:RSVPD]->({response: 'yes'})-[:TO]->()<-[:HOSTED_EVENT]-(g)
WITH p, isMember, COLLECT(event) AS events
RETURN p.name, p.id, p.pageRank, isMember, LENGTH(events) AS events
ORDER BY p.pageRank DESC
56. Have they been to any events?
blended_data = cypher(graph, query)
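The two queries rank profiles by `pageRank`, keep the top 20, then flag Neo4j group membership and count RSVPs to the group's events. The same logic on in-memory data, as a base-R sketch: the `profiles`, `neo4j_members` and `neo4j_rsvps` objects are hypothetical toy data, and the limit is 3 instead of 20.

```r
# Toy data (not real results): PageRank per profile, the Neo4j group's
# members, and one entry per 'yes' RSVP to a Neo4j group event
profiles = data.frame(id = c(1, 2, 3, 4), pageRank = c(0.9, 0.4, 2.1, 1.3))
neo4j_members = c(3, 4)
neo4j_rsvps = c(3, 3, 4)  # profile ids, one per RSVP

limit = 3
# ORDER BY p.pageRank DESC LIMIT 3
top = head(profiles[order(profiles$pageRank, decreasing = TRUE), ], limit)
# m IS NOT NULL AS isMember
top$isMember = top$id %in% neo4j_members
# LENGTH(events): RSVPs per top profile, 0 for profiles without any
top$events = as.vector(table(factor(neo4j_rsvps, levels = top$id)))
top
```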
57. Takeaways
● ggplot => visualisations with minimal code
● dplyr => easy data manipulation for people coming from other languages
● igraph => find the influencers in a network
● graphs => a flexible way of modelling data that allows querying across multiple dimensions