Visually Analyzing People with Graphs


771 views


Analyzing people is great -- we can just talk to them -- but hard: their answers are fuzzy. This talk walks through our analysis of a Twitter botnet and another of how programmers pick programming languages. Interestingly, we used interactive graph visualizations to unravel mysteries in both.

Published in: Data & Analytics

Visually Analyzing People with Graphs

  1. Visually Analyzing People. Leo Meyerovich (@LMeyerov), CEO
  2. Graphistry is: Supercharging visual analytics through GPU cloud streaming. (We tricky graphs.)
  3. CASE STUDY: TWITTER FRAUD. Naïve layouts on 1K+ node graphs give impenetrable hairballs. Layout: Gauss-Seidel force-directed graph, O(N^2) n-body, GPU. Node: Twitter account. Edge: friendship. Graph: friends and friends-of-friends of a bot that randomly messaged real people and retweeted them.
  4. Even on a small graph (77 nodes), smart design starts adding clarity.
  5. With smart layouts, fake account clusters pop out. Layout: ForceAtlas2, O(n log n) n-body, GPU. The spambot is an entry point to more bots… Obviously fake account names.
  6. A quiet small business that buys virtual game currency from gamers…
  7. Who somehow got exactly 1 message massively trended & advertised by Twitter
  8. It’s a “retweet laundering” botnet! Diagram labels: spammer, laundering accounts, bot retweet network. It tricks Twitter into targeting gamers to check out a cyberfraud site. They steal gamers’ money and identities.
  9. Relationships are hard to see without graphs with smart layouts & interactions. Next step: explore the time dimension. Ex: how do mobs launch from Twitter?
  10. THE SOCIOLOGY OF PROGRAMMING LANGUAGES. Leo A. Meyerovich, @lmeyerov, Graphistry
  11. http://hammerprinciple.com/therighttool
  12. ~14,000 developers
  13. Goal: Rank Beliefs. Fastest? C > Java > JavaScript > Pascal. Safest? Java > Pascal > JavaScript > C. Programmers won’t agree on ranking.
  14. Idea: Chess Ranking
  15. Let’s run a competition for the friendliest language! (Glicko2) Each survey response is a game match: (1) Person A says Python beats C in friendliness; (2) Person A says Java beats C in friendliness; (3) Person B says C beats APL in friendliness; …
  16. Score points set by a bookie. Every language starts with rank 1000. (1) “Person A: Python friendlier than C” → Python’s rank goes up. (2) “Person B: Python friendlier than C” → Python already > C, so a less valuable win. (3) “Person C: Haskell friendlier than Python”. Problem: little is known about Haskell (“sparse”) → Haskell beat a high-rank language: big level increase! (Bayesian!)
  17. Many Tournaments = Correlation Matrix! Language x Belief
  18. Cluster (K-Means)
  19. Reduce Dimensionality: Pick fun languages & cluster centers
  20. Graphs are (Adjacency) Matrices
  21. Correlation Matrices are Fuzzy Graphs. [Figure: graph whose edges are each labeled 0.5.]
  22. Weak Edges Are Annoying!
  23. Filter: Only Show Strong Relationships
  24. Relationships are hard to see without graphs with smart layouts & interactions. Step 2 of analysis is correlate (step 1 is count). Correlations are relationships, so explore them as graphs!
  25. 200K Projects (2000-2010) [PLATEAU 2013]
  26. Popularity Across Niches. [Figure: two bar charts of popularity (y-axis) across 223 project categories (x-axis). Java panel: y-axis from 0% to 60%. Scheme panel: y-axis from 0% to 4%. Annotated categories: blogging, search, build tools.]
  27. Popularity vs. Niche: Dispersion. [Figure: scatter plot of popularity (log y-axis, 0.0001 to 1) against dispersion across niches, σ / μ (x-axis, 0 to 5). Labeled languages: Prolog, VBScript, Scheme, Fortran, PL/SQL, Assembly, C#, Java.]
  28. Language Use (survey). [Figure: proportion of projects for language (log y-axis, 0.01% to 100%) against language rank, decreasing (log x-axis, 1 to 100). Annotations: “Java: winner takes all”; “Long Tail”.] Design for niches and grow.
  29. Survey of 1,679 Developers. Extrinsic factors dominate! (on last project)
  30. FUTURE STEP: Now that we’ve counted things, let’s correlate them! Panels: Topics in Free-form Responses; Answer Correlations.
  31. Relationships are hard to see without graphs with smart layouts & interactions. Step 2 of analysis is correlate (step 1 is count). Correlations are relationships, so explore them as graphs! Powerful because correlations are everywhere: raw features, inferred topics, …
  32. We’re Hiring Designers! (And contact us if you have interesting graphs.) info@graphistry.com, Twitter: @LMeyerov
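
Slides 15 and 16 treat each survey response as a match and update ratings from a starting value of 1000. The talk names Glicko2; the sketch below deliberately swaps in the simpler Elo update to show only the core idea that an expected win earns fewer points than an upset. The K-factor of 32 and the 400-point scale are conventional Elo defaults assumed here, not values from the slides, and Elo does not model the per-language uncertainty ("sparse", "Bayesian") that Glicko-style systems track.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_match(ratings, winner, loser, k=32.0, start=1000.0):
    """Apply one survey response 'winner beats loser'; return points moved.

    k and start are assumed defaults; unseen languages begin at `start`.
    """
    r_w = ratings.get(winner, start)
    r_l = ratings.get(loser, start)
    # The less expected the win, the more points change hands.
    points = k * (1.0 - expected_score(r_w, r_l))
    ratings[winner] = r_w + points
    ratings[loser] = r_l - points
    return points


# The three responses from slide 16, as (winner, loser) pairs.
responses = [("Python", "C"), ("Python", "C"), ("Haskell", "Python")]
ratings = {}
gains = [play_match(ratings, w, l) for w, l in responses]

for language, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{language}: {rating:.1f}")
```

Here the second Python-over-C response moves fewer points than the first, and Haskell's win over the now higher-rated Python moves the most, matching the ordering the slide describes.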
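
Slides 20 to 23 read a correlation matrix as a weighted ("fuzzy") graph and then hide weak edges. A minimal sketch of that filter follows; the labels, the matrix values, the 0.6 threshold, and the choice to compare absolute values (so strong negative correlations survive) are all illustrative assumptions, not data or settings from the talk.

```python
def strong_edges(labels, corr, threshold=0.5):
    """Return (a, b, weight) for each pair with |correlation| >= threshold.

    `corr` is a symmetric square matrix, so only the upper triangle is read.
    """
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            weight = corr[i][j]
            if abs(weight) >= threshold:
                edges.append((labels[i], labels[j], weight))
    return edges


# Hypothetical beliefs and correlations, for illustration only.
labels = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.9, 0.1, -0.7],
    [0.9, 1.0, 0.2, 0.05],
    [0.1, 0.2, 1.0, 0.3],
    [-0.7, 0.05, 0.3, 1.0],
]

edges = strong_edges(labels, corr, threshold=0.6)
for a, b, weight in edges:
    print(f"{a} -- {b}: {weight:+.2f}")
```

Of the six possible pairs, only A-B and A-D pass the threshold, which is the kind of pruning that keeps a dense correlation graph readable.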
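
Slide 27 plots popularity against dispersion across niches, defined as σ / μ (the coefficient of variation). The sketch below computes that ratio for a language's share of projects in each category. The share values are made up for illustration, and using the population standard deviation is an assumption, since the slide does not say which estimator was used.

```python
from statistics import mean, pstdev


def dispersion(shares):
    """Coefficient of variation (sigma / mu) of per-category shares."""
    mu = mean(shares)
    if mu == 0:
        raise ValueError("dispersion is undefined when the mean share is 0")
    return pstdev(shares) / mu


# Hypothetical per-category shares, for illustration only.
generalist = [0.30, 0.28, 0.32, 0.30]  # similar share in every category
specialist = [0.00, 0.00, 0.04, 0.00]  # concentrated in a single category

print(f"generalist: {dispersion(generalist):.3f}")
print(f"specialist: {dispersion(specialist):.3f}")
```

A language used evenly across categories scores near 0, while one concentrated in a single niche scores much higher, which is the contrast the slide's x-axis captures.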
