1. The document discusses analyzing social media networks using NodeXL. It introduces social media, social networks, and social network analysis.
2. The tutorial section teaches how to use NodeXL to lay out networks, calculate metrics, and visualize networks. It allows the user to learn social network analysis through hands-on practice with NodeXL.
3. The document provides an introduction to analyzing social media networks and networks in general, followed by a practical NodeXL tutorial to help users learn and apply social network analysis.
Visualizing Knowledge for Discovery
1. Information Visualization for
Knowledge Discovery
Ben Shneiderman ben@cs.umd.edu @benbendc
Founding Director (1983-2000), Human-Computer Interaction Lab
Professor, Department of Computer Science
Member, Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
6. Obama Unveils “Big Data” Initiative (3/2012)
Big Data challenges:
• Developing scalable algorithms for processing imperfect data in distributed data stores
• Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
9. Information Visualization & Visual Analytics
• Visual bandwidth is enormous
• Human perceptual skills are remarkable
• Trend, cluster, gap, outlier...
• Color, size, shape, proximity...
• Three challenges
• Meaningful visual displays of massive data
• Interaction: widgets & window coordination
• Process models for discovery
1999 2004 2010
10. Business takes action
• General Dynamics buys MayaViz
• Agilent buys GeneSpring
• Google buys Gapminder
• Oracle buys Hyperion
• Microsoft buys Proclarity
• InfoBuilders buys Advizor Solutions
• SAP buys (Business Objects buys Xcelsius & Inxight & Crystal Reports)
• IBM buys (Cognos buys Celequest) & ILOG
• TIBCO buys Spotfire
18. Information Visualization: Data Types
SciViz
• 1-D Linear: Document Lens, SeeSoft, Info Mural
• 2-D Map: GIS, ArcView, PageMaker, Medical imagery
• 3-D World: CAD, Medical, Molecules, Architecture
InfoViz
• Multi-Var: Spotfire, Tableau, Qliktech, Visual Insight
• Temporal: LifeLines, TimeSearcher, Palantir, DataMontage
• Tree: Cone/Cam/Hyperbolic, SpaceTree, Treemap
• Network: Pajek, UCINet, NodeXL, Gephi, Tom Sawyer
infosthetics.com visualcomplexity.com eagereyes.org
flowingdata.com perceptualedge.com datakind.org
visual.ly Visualizing.org infovis.org
22. Temporal Data: TimeSearcher 1.3
• Time series
• Stocks
• Weather
• Genes
• User-specified
patterns
• Rapid search
23. Temporal Data: TimeSearcher 2.0
• Long Time series (>10,000 time points)
• Multiple variables
• Controlled precision in match
(Linear, offset, noise, amplitude)
26. LifeFlow: Aggregation Strategy
Temporal
Categorical Data
(4 records)
LifeLines2 format
Tree of Event
Sequences
LifeFlow Aggregation
www.cs.umd.edu/hcil/lifeflow
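The aggregation idea on this slide can be sketched in a few lines of Python. This is only an illustrative approximation, not LifeFlow's actual implementation: the records, event names, and the aggregate function are hypothetical, and times are assumed to be hours since arrival. Records that share the same leading sequence of events are merged into one tree node, which keeps a record count and the mean time gap from the previous event.

```python
from collections import defaultdict

# Hypothetical records: each is a time-ordered list of (event, hours since arrival).
records = [
    [("Arrival", 0.0), ("ER", 0.5), ("Exit", 3.0)],
    [("Arrival", 0.0), ("ER", 1.0), ("Exit", 5.0)],
    [("Arrival", 0.0), ("ER", 0.5), ("Floor", 2.5), ("Exit", 30.5)],
    [("Arrival", 0.0), ("ER", 1.0), ("ICU", 2.0), ("Floor", 50.0)],
]


def aggregate(records):
    """Map each event-sequence prefix to (record count, mean gap from previous event)."""
    gaps = defaultdict(list)
    for record in records:
        prefix = ()
        prev_time = None
        for event, time in record:
            prefix += (event,)
            # The first event of a record has no predecessor, so its gap is 0.
            gap = 0.0 if prev_time is None else time - prev_time
            gaps[prefix].append(gap)
            prev_time = time
    return {p: (len(g), sum(g) / len(g)) for p, g in gaps.items()}


tree = aggregate(records)
# Print the most common sequences first, one tree node per line.
for prefix, (count, mean_gap) in sorted(tree.items(), key=lambda kv: -kv[1][0]):
    print(" -> ".join(prefix), "| records:", count, "| mean gap (h):", round(mean_gap, 2))
```

Each dictionary key plays the role of one node in the tree of event sequences; the count would drive the bar height and the mean gap the horizontal spacing in a LifeFlow-style display.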
47. Treemap: WHC Emergency Room
(6304 patients in Jan 2006)
Group by Admissions/MF, size by service time, color by age
48. Treemap: WHC Emergency Room
(6304 patients in Jan 2006) (only those with service time >12 hours)
Group by Admissions/MF, size by service time, color by age
63. Twitter discussion of #GOP
Red: Republicans, anti-Obama,
mention Fox
Blue: Democrats, pro-Obama,
mention CNN
Green: non-affiliated
Node size is number of followers
Politico is major bridging group
72. Innovation Clusters: People, Locations, Companies
Figure labels: No Location, Philadelphia, Pittsburgh Metro, Pharmaceutical/Medical, Westinghouse Electric
Patent
Tech
Navy SBIR (federal)
PA DCED (state)
Related patent
2: Federal agency
3: Enterprise
5: Inventors
9: Universities
10: PA DCED
11/12: Phil/Pitt metro cnty
13-15: Semi-rural/rural cnty
17: Foreign countries
19: Other states
74. Interactive Methods to Reveal Patterns
• Filtering: Node & link attribute values or statistics
• Clustering: Cluster algorithmically by link connectivity
• Grouping: Group based on node attributes
• Motif Simplification: Common, meaningful structures replaced with simplified glyphs
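Two of the methods listed above, filtering by a node statistic and grouping by a node attribute, can be illustrated with a small pure-Python sketch. It does not use NodeXL or any network library; the edge list, the party attribute, and the function names are hypothetical and chosen only for illustration. The filter uses degree as the statistic and is a single pass over the original degrees, not an iterative k-core computation.

```python
from collections import defaultdict

# Hypothetical edge list and node attribute, for illustration only.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
party = {"ann": "red", "bob": "red", "cat": "blue", "dan": "blue", "eve": "green"}


def degrees(edges):
    """Count how many edges touch each node."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def filter_by_degree(edges, min_degree):
    """Filtering: keep edges whose endpoints both have degree >= min_degree."""
    deg = degrees(edges)
    return [(a, b) for a, b in edges if deg[a] >= min_degree and deg[b] >= min_degree]


def group_by_attribute(nodes, attribute):
    """Grouping: bucket nodes by the value of a node attribute."""
    groups = defaultdict(list)
    for node in nodes:
        groups[attribute[node]].append(node)
    return {value: sorted(members) for value, members in groups.items()}


kept = filter_by_degree(edges, min_degree=2)
groups = group_by_attribute(party.keys(), party)
print(kept)
print(groups)
```

Here the filter drops the edge to the least connected node, and the grouping step produces one bucket per attribute value, which a layout could then place in separate regions.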
87. Analyzing Social Media Networks with NodeXL
I. Getting Started with Analyzing Social Media Networks
1. Introduction to Social Media and Social Networks
2. Social Media: New Technologies of Collaboration
3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
4. Layout, Visual Design & Labeling
5. Calculating & Visualizing Network Metrics
6. Preparing Data & Filtering
7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
8. Email
9. Threaded Networks
10. Twitter
11. Facebook
12. WWW
13. Flickr
14. YouTube
15. Wiki Networks
www.elsevier.com/wps/find/bookdescription.cws_home/723354/description
88. Social Media Research Foundation
Researchers who want to
- create open tools
- generate & host open data
- support open scholarship
Map, measure & understand
social media
Support tool projects to
collect, analyze & visualize
social media data.
smrfoundation.org
93. Discovery Process: Systematic Yet Flexible
Preparation
• Own the problem & define the schedule
• Data cleaning & conditioning
• Handle missing & uncertain data
• Extract subsets & link to related information
Purposeful exploration – Hypothesis testing
• Range & distribution
• Relationships & correlations
• Clusters & gaps
• Outliers & anomalies
• Aggregation & summary
• Split & trellis
• Temporal comparisons & multiple views
• Statistics & forecasts
Situated decision making - Social context
• Annotation & marking
• Collaboration & coordination
• Decisions & presentations
94. UN Millennium Development Goals
To be achieved by 2015
• Eradicate extreme poverty and hunger
• Achieve universal primary education
• Promote gender equality and empower women
• Reduce child mortality
• Improve maternal health
• Combat HIV/AIDS, malaria and other diseases
• Ensure environmental sustainability
• Develop a global partnership for development
96. For More Information
• Visit the HCIL website for 700+ papers & info on videos
www.cs.umd.edu/hcil
• See Chapter 14 on Info Visualization
Shneiderman, B. and Plaisant, C., Designing the User Interface:
Strategies for Effective Human-Computer Interaction:
Fifth Edition (2010) www.awl.com/DTUI
• Edited Collections:
Card, S., Mackinlay, J., and Shneiderman, B. (1999)
Readings in Information Visualization: Using Vision to Think
Bederson, B. and Shneiderman, B. (2003)
The Craft of Information Visualization: Readings and Reflections
"The IN Cell Analyzer automated microscope was used to identify proteins influencing the division of human cells. After the images were analyzed, quantitative results were transferred to Spotfire DecisionSite. This screen revealed the previously unknown involvement of the retinol binding protein RBP1 in cell cycle control. (Stubbs S, & Thomas N. 2006 Methods in Enzymology; 414:1-21.) Retinol, a form of Vitamin A, plays a crucial role in vision and during embryonic development."
Using LifeFlow, 7,041 patients are aggregated into this visualization, and LifeFlow immediately reveals the most common pattern, which you could not do easily in SQL. You can easily notice this huge pattern, “Arrival -> ER -> Exit”, meaning patients who visited with minor injuries or simple conditions and left the hospital immediately after receiving their treatment. When you hover the mouse over a pattern, LifeFlow displays a tooltip that gives more information, such as the number of patients and other statistics, and also shows the distribution of the patients. As the horizontal gap represents time, you can see from the distribution that some patients left the hospital very quickly after visiting the emergency room while others stayed longer.
*optional The second most common pattern is “Arrival (Blue) -> ER (Pink) -> Floor (Green) -> Exit (Cyan)”, meaning patients who were admitted for observation and left the hospital once everything went well. You can also use the horizontal gap to compare these patients with the patients who exited from the emergency room. Comparing the gap from pink to cyan with the gap from pink to green, you can see that the gap from pink to green is smaller, so on average patients were transferred to Floor faster than they exited the hospital.
You have seen the two most common cases; now I will remove the common patterns so we can analyze the less frequent patterns.
After removing all the common cases, we have 344 patients left. These are mostly the patients who were admitted. There is a lot of information that I could explain from this visualization, but I will go straight to the case that our physician partners are most interested in. The mouse is pointing at this sequence, which represents the “bounce back” patients, meaning patients who were transferred from ICU to Floor because they seemed to get better but were then transferred back to the ICU. The physicians are interested in finding these patients to analyze what led to the wrong decisions.
*optional Another case is the step-ups, meaning patients whose level of care was escalated to a higher level. You can see from the visualization that there were patients who were transferred from ER to Floor (green) and then to ICU (red) or IMC (orange). The number of these patients and the average transfer time could be compared to the hospital standards to measure the quality of care.
Ben: This slide is optional. You can use it to show that when you click on the bounce-back patients, you can get the details of each patient in the LifeLines2 view.
Another interesting feature is that you can align by a particular event. For example, if you want to know what happened before and after the patients went to the ICU, you can align by ICU. The dashed line separates what happened before from what happened after. You can see that the ICU patients mostly came from the ER (pink), and most of them were transferred to Floor (green) after that. Unfortunately, some of them died after they were transferred to the ICU (black). From this visualization, you may notice a small pattern at the bottom. Let me zoom in.
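Aligning by an event amounts to re-expressing every timestamp relative to that event. The sketch below is a hedged illustration of this idea, not LifeFlow code: the align_by function, the sample record, and the choice to align on the first occurrence of the event are all assumptions made for the example.

```python
def align_by(record, anchor):
    """Shift a record's times so the first occurrence of `anchor` is at time 0.

    Returns None when the record never contains the anchor event.
    """
    anchor_times = [time for event, time in record if event == anchor]
    if not anchor_times:
        return None
    t0 = min(anchor_times)
    return [(event, time - t0) for event, time in record]


# Hypothetical patient record: (event, hours since arrival).
record = [("Arrival", 0.0), ("ER", 1.0), ("ICU", 5.0), ("Floor", 29.0)]
aligned = align_by(record, "ICU")
print(aligned)
```

After alignment, events with negative times happened before the ICU transfer and events with positive times happened after it, which corresponds to the two sides of the dashed line.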
This patient died before being transferred to the ICU, which is impossible, so this must be a data entry problem. But we might never notice it if the data stayed hidden in the database. LifeFlow supports this kind of analysis by giving an overview, showing common trends, and providing a summary of every sequence. You can use SQL to calculate the average for every transfer if you like, but in LifeFlow it is right there; you just need to move your mouse over. Showing every possible transfer pattern may also lead you to discover a surprising pattern.
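The kind of data entry problem described here can also be checked programmatically once you know to look for it. The following is a minimal sketch under stated assumptions: the patient identifiers, the event name "Death", and the events_after_terminal helper are hypothetical, and records are assumed to be lists of (event, hours since arrival).

```python
def events_after_terminal(record, terminal="Death"):
    """Return the events recorded later than the first terminal event, if any."""
    terminal_times = [time for event, time in record if event == terminal]
    if not terminal_times:
        return []
    first = min(terminal_times)
    return [(event, time) for event, time in record if time > first and event != terminal]


# Hypothetical records keyed by patient id.
patients = {
    "p1": [("Arrival", 0.0), ("ER", 1.0), ("ICU", 5.0), ("Death", 40.0)],
    "p2": [("Arrival", 0.0), ("ER", 1.0), ("Death", 3.0), ("ICU", 6.0)],
}

# Keep only the patients whose records contain an impossible ordering.
suspicious = {
    pid: later
    for pid, record in patients.items()
    if (later := events_after_terminal(record))
}
print(suspicious)
```

A check like this only finds the anomalies someone thought to test for, which is the point of the note above: the visualization surfaces the odd pattern first, and a targeted query can then confirm it.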
Live Demonstration
Aligning sales and marketing is essential for success. The graph on the left shows sales people linked to opportunities, including industry. The thicker the line, the higher the probability of closing the deal. The larger the dollar sign, the bigger the deal. Sullivan, Vazquez and Distefano are performing the best. The upper right shows the number of deals by stage in the sales cycle. The blue bubble chart shows potential revenue by marketing program and stage in the sales cycle. Search engine optimization and inbound links from Web sites have the biggest impact. Armed with this information, marketing managers can advertise to the financial services and manufacturing sectors through specific tactics, and sales managers can see the performance of the reps and the industries where they are successful.
Chapter 3, Figure 1 (page 6). A NodeXL social media network diagram of relationships among Twitter users mentioning the hashtag “#WIN09” used by attendees of a conference on Network Science at NYU in September 2009. Each user’s node is sized proportional to the number of tweets they have ever made to that date.
Figure 13.24. NodeXL network of Flickr users who comment on Marc_Smith’s photos (network depth 1.5; edge weight≥4).