Network Data Collection
http://www.soc.duke.edu/~jmoody77/SNH/SNH.html
Dropbox: http://tinyurl.com/duke-snh
1. Intro/Big Picture
1. How networks fit into social science generally
2. Connections & Positions
3. Types of network data
2.Collecting Data
A. Research Design
i. Relational Content
ii. Boundary Specification
iii. Network Samples
a. Local
b. Global
c. Link Tracing designs
B. Sources
i. Archive, Observation, Survey
ii. Survey
a. Name Generators
b. Delivery Mode
3. Data Accuracy
A. How accurate are network survey data?
B. Effect on measurement
C. What can we do about inaccurate or missing
data?
Outline
Social Network Data
Introduction
We live in a connected world:
“To speak of social life is to speak of the association between
people – their associating in work and in play, in love and in
war, to trade or to worship, to help or to hinder. It is in the
social relations men establish that their interests find
expression and their desires become realized.”
Peter M. Blau
Exchange and Power in Social Life, 1964
*1934, NYTimes. Moreno claims this work was covered in “all the major papers” but I can’t find any other clips…
Introduction
We live in a connected world:
"If we ever get to the point of charting a whole city or a whole nation, we would have … a picture
of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does
in space. Such an invisible structure underlies society and has its influence in determining the
conduct of society as a whole."
J.L. Moreno, New York Times, April 13, 1933
But scientists are starting to take networks seriously:
“Networks”
Introduction
“Networks”
“Obesity”
Introduction
But scientists are starting to take networks seriously: why?
Introduction
…and NSF is investing heavily in it.
High Schools as Networks
Introduction
Introduction
Countryside High School, by grade
Introduction
Countryside High School, by race
And yet, standard social science analysis methods do not take this space
into account.
“For the last thirty years, empirical social research has been
dominated by the sample survey. But as usually practiced, …, the
survey is a sociological meat grinder, tearing the individual from his
social context and guaranteeing that nobody in the study interacts
with anyone else in it.”
Allen Barton, 1968 (Quoted in Freeman 2004)
Moreover, the complexity of the relational world makes it impossible to
identify social connectivity using only our intuition.
Social Network Analysis (SNA) provides a set of tools to empirically
extend our theoretical intuition of the patterns that compose social
structure.
Introduction
Social network analysis is:
•a set of relational methods for systematically understanding
and identifying connections among actors. SNA
•is motivated by a structural intuition based on ties
linking social actors
•is grounded in systematic empirical data
•draws heavily on graphic imagery
•relies on the use of mathematical and/or computational
models.
•Social Network Analysis embodies a range of theories
relating types of observable social spaces and their relation
to individual and group behavior.
Introduction
Introduction
Key Questions
Social Network analysis lets us answer questions about social interdependence.
These include:
“Networks as Variables” approaches
•Are kids with smoking peers more likely to smoke themselves?
•Do unpopular kids get in more trouble than popular kids?
•Do central actors control resources?
“Networks as Structures” approaches
•What generates hierarchy in social relations?
•What network patterns spread diseases most quickly?
•How do role sets evolve out of consistent relational activity?
Both: Connectionist vs. Positional features of the network
We don’t want to draw this line too sharply: emergent role positions can
affect individual outcomes in a ‘variable’ way, and variable approaches
constrain relational activity.
Why do networks matter?
Two fundamental mechanisms: Problem space
Connectionist:
Positional:
Networks as pipes
Networks as roles
Networks
As Cause
Networks
As Result
Diffusion
Peer influence
Social Capital
“small worlds”
Social integration
Peer selection
Homophily
Network robustness
Popularity Effects
Role Behavior
Network Constraint
Group stability
Network ecology
“Structuration”
This rubric is organized around social mechanisms – the reasons why networks matter,
which ends up being loosely correlated with specific types of measures, analysis, and
data collection method.
Why do networks matter?
Two fundamental mechanisms: Connections
Connectionist network mechanisms : Networks matter because of the
things that flow through them. Networks as pipes.
C
P
X Y
The spread of any epidemic depends on the number of
secondary cases per infected case, known as the
reproductive rate (R0). R0 depends on the probability that
a contact will be infected over the duration of contact (b),
the likelihood of contact (c), and the duration of
infectiousness (D).
$R_0 = b \cdot c \cdot D$
For network transmission problems, the trick is specifying c,
which depends on the network.
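A minimal sketch of the arithmetic above, assuming R0 is the simple product of b, c, and D. The function name and all numeric values are hypothetical, chosen only to show how changing c moves R0 across the epidemic threshold of 1.

```python
def reproductive_rate(b: float, c: float, d: float) -> float:
    """R0 as the product of transmission probability (b),
    likelihood of contact (c), and duration of infectiousness (D)."""
    return b * c * d

# Hypothetical values: hold b and D fixed and vary c,
# the term that depends on the network.
b = 0.1    # probability a contact is infected
d = 10.0   # duration of infectiousness, in arbitrary time units
for c in (0.5, 1.0, 2.0):
    r0 = reproductive_rate(b, c, d)
    status = "can spread" if r0 > 1 else "dies out or stalls"
    print(f"c={c}: R0={r0:.2f} ({status})")
```

In a real network problem c is not a single number: it varies with each actor's position, which is why the network has to be measured.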
C
P
X Y
Why do networks matter?
Two fundamental mechanisms: Connections example
Isolated vision
Why do networks matter?
Two fundamental mechanisms: Connections example
C
P
X Y
Connected vision
Why do networks matter?
Two fundamental mechanisms: Connections example
C
P
X Y
Partner
Distribution
Component
Size/Shape
Emergent Connectivity in “low-degree” networks
C
P
X Y
Connections: Diffusion
Example: Small local changes can create cohesion
cascades
Based on work supported by R21-HD072810 (NICHD, Moody PI), R01 DA012831-05 (NIDA Morris, Martina PI)
Provides food for
Romantic Love
Bickers with
Why do networks matter?
Two fundamental mechanisms: Positions
Positional network mechanisms : Networks matter because of the way they
capture role behavior and social exchange. Networks as Roles.
C
P
X Y
Parent Parent
Child
Child
Child
Network Methods & Measures
The unit of interest in a network is the combined set of
actors and their relations.
We represent actors with points and relations with lines.
Actors are referred to variously as:
Nodes, vertices or points
Relations are referred to variously as:
Edges, Arcs, Lines, Ties
Example:
a
b
c e
d
Social Network Data
In general, a relation can be:
Binary or Valued
Directed or Undirected
a
b
c e
d
Undirected, binary Directed, binary
a
b
c e
d
a
b
c e
d
Undirected, Valued Directed, Valued
a
b
c e
d
1 3
4
21
Social Network Data
In general, a relation can be: (1) Binary or Valued (2) Directed or Undirected
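One way to see the four combinations is to store ties as (sender, receiver, value) triples and build an adjacency matrix from them. This is only a sketch: the node names follow the a to e example, but the specific ties and values are hypothetical, as is the helper name `adjacency`.

```python
# Nodes a to e as in the example sociogram; the ties listed here are hypothetical.
nodes = ["a", "b", "c", "d", "e"]
index = {name: i for i, name in enumerate(nodes)}

def adjacency(edges, directed=False):
    """Build an adjacency matrix from (sender, receiver, value) triples."""
    n = len(nodes)
    mat = [[0] * n for _ in range(n)]
    for sender, receiver, value in edges:
        i, j = index[sender], index[receiver]
        mat[i][j] = value
        if not directed:
            mat[j][i] = value  # undirected ties are stored symmetrically
    return mat

binary_edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("c", "e", 1)]
valued_edges = [("a", "b", 1), ("b", "c", 3), ("c", "d", 4), ("c", "e", 2)]

undirected_binary = adjacency(binary_edges, directed=False)
directed_valued = adjacency(valued_edges, directed=True)

for row in directed_valued:
    print(row)
```

Binary data are the special case where every value is 1. Undirected data are the special case where the matrix equals its transpose.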
Social Network Data
Basic Data Elements
The social process of interest will often determine what form your data take. Conceptually, almost
all of the techniques and measures we describe can be generalized across data format, but you may
have to do some of the coding work yourself….
a
b
c e
d
Directed,
Multiplex categorical edges
We can examine networks across multiple levels:
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to
(alters). Example: 1985 GSS module
- May include estimates of connections among alters
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of
contacts
- Something less than full account of connections among all pairs of
actors in the relevant population
- Example: CDC Contact tracing data for STDs
Social Network Data
Basic Data Elements: Levels of analysis
3) Complete or “Global” data
- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are set
-Example: Coauthorship data among all writers in the social
sciences, friendships among all students in a classroom
We can examine networks across multiple levels:
Social Network Data
Basic Data Elements: Levels of analysis
Ego-Net
Global-Net
Best Friend
Dyad
Primary
Group
Social Network Data
Basic Data Elements: Levels of analysis
2-step
Partial network
What information do you want to collect?
This is ultimately a theory question about how you think the social network matters
and what social or biological mechanisms matter for the outcome of interest. This
is driven by thinking through:
Health Outcome → Mechanism → Relation(s)
Examples:
Sometimes the relations are clear:
STD/HIV → Contagion-carrying contact → Sex, Drug sharing, etc.
Sometimes not so much:
Health Behavior → Information flow → Discussion networks
Health Behavior → Social Conformity Pressure → Admiration nets
Health Behavior → Opportunities → Unsupervised interaction
Research Design: new data collection
Social Network Data
What information do you want to collect?
Sometimes the outcome is deliberately unspecified, as when you are collecting data
for large common-use projects (GSS, Add Health, NHRS).
Then the design is effectively reversed: What relations capture the most (general?
comprehensive? efficacious? Reliable?) social mechanisms that will be of broad
interest?
Research Design: new data collection
Social Network Data
Relation(s) Respect
Contact
Information
Pressure
Substance Use
Suicidal Ideation
Treatment adherence
BMI
Disease
Excitement
Social mechanism ambiguity allows broad use, which favors relations that tend to be
general. This, of course, makes crisp causal associations more difficult.
What information do you want to collect?
Health Outcome → Mechanism → Relation(s)
Relations themselves are often multi-dimensional…do these matter for
your question?
- Perception vs. interaction?
“who do you like?” vs. “who do you talk with?”
- Intensity?
“How often …”, “how much…”
strong vs. weak
- Dynamics?
Starting & ending dates, everyday contact or sporadic?
Research Design: New data collection
Social Network Data
Boundary Specification
Network methods describe positions in relevant social fields, where flows of
particular goods are of interest. As such, boundaries are a fundamentally
theoretical question about what you think matters in the setting of interest.
In general, there are usually relevant social foci that bound the relevant social
field. We expect that social relations will be very clumpy. Consider the
example of friendship ties within and between a high-school and a Jr. high:
What is the theoretically relevant population?
Research Design: Boundary Specification
Social Network Data
What is the theoretically relevant population?
Local Global
“Realist”
(Boundary from actors’
Point of view)
Nominalist
(Boundary from researchers’
point of view)
Relations within a
particular setting (“School
friends” or “Physicians
serving this hospital”)
All relations relevant to
social action (“adolescent
peer network” or
“Community Health
Leaders” )
Everyone connected to
ego in the relevant manner
(all friends, all sex
partners)
Relations defined by a
name-generator, typically
limited in number (“5
closest friends”)
Research Design: Boundary Specification
Social Network Data
Networks are (generally) treated as bounded systems. What constitutes your bound?
Most of the time….these boundaries are porous
Add Health: while
students were given the
option to name friends in
the other school, they
rarely do. As such, the
school likely serves as a
strong substantive
boundary
What is the theoretically relevant population?
Research Design: Boundary Specification
Social Network Data
Boundaries are often defined theoretically by the relation, not the setting:
Research Design: Boundary Specification
Social Network Data
Physician patient-sharing networks:
Physicians who share (Medicare)
patients (within one hospital)
For all patients selected in Ohio….
Research Design: Boundary Specification
Social Network Data
In practice:
a) set a pragmatic bound that captures the bulk of theoretically relevant data
b) Collect data on boundary crossing.
a) You might ask “friends in this neighborhood” but also “Other close
friends?”
b) Don’t limit nominations to current setting, but only trace within the
bounds.
Good prior research, ethnography, informants, etc. should be used to identify
the bounds as best as possible, but these sorts of data allow one to at least
control for out-of-sample effects in models.
For adaptive sampling, such as link-trace designs, you might use a
capture/recapture rule to figure out if you’ve saturated your population. Once
you stop receiving new names…you’ve finished.
--but, if you jump to a new population…this can be hard to discern.
1. The level of analysis implies a perspective on sampling:
1. Local → random probability sampling
2. Adaptive → Link trace, RDS
3. Complete → Census
These are not as dissimilar as they may appear:
a) Local nets imply global connectivity:
a) Every ego-network is a sample from the population-level global
network, and thus should be consistent with a constrained range of
global networks.
b) If you have a clustered setting, many alters in a local network may
overlap, making partial connectivity information possible.
c) For attribute mixing (proportion of whites with black friends, low
BMI with high, users with non-users, etc.), ego-network data is
sufficient to draw global inference
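As a rough illustration of point (c), a mixing proportion can be computed directly from ego-network records. The records, the attribute labels, and the function name below are all hypothetical, and the sketch ignores survey weights, which a real probability sample would need.

```python
# Hypothetical ego-network records: ego's own status plus the
# status ego reports for each named alter.
ego_nets = [
    {"ego": "user", "alters": ["user", "non-user", "user"]},
    {"ego": "non-user", "alters": ["non-user", "non-user"]},
    {"ego": "user", "alters": ["user"]},
    {"ego": "non-user", "alters": ["user", "non-user", "non-user"]},
]

def cross_group_share(records, group):
    """Share of egos in `group` who name at least one alter outside `group`."""
    egos = [r for r in records if r["ego"] == group]
    if not egos:
        return float("nan")
    crossing = sum(any(a != group for a in r["alters"]) for r in egos)
    return crossing / len(egos)

print(cross_group_share(ego_nets, "user"))      # users with a non-user friend
print(cross_group_share(ego_nets, "non-user"))  # non-users with a user friend
```

No information on ties among alters or between egos is needed for this kind of estimate, which is why a local design is enough.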
Research Design: Network Sample
Social Network Data
Research Design: Network Sample
Social Network Data
Nominalist
(researcher pov)
Realist
(natural groups)
Local • Probability samples
• Clinical samples
• Extracted from
complete settings
• Family interviews
• Neighbors
• Workplace samples
Adaptive • Fixed diameter chain
from qualifying
seed(s)
• Unlimited diameter
chain on qualifying
relation
Complete • Census within a fixed
setting (hospital,
school, etc.)
• Only practical for
real groups (“Duke
Faculty” “Crip”).
Get list from
informant &
enumerate.
Data collection strategy
(The column distinction is squishy…)
Research Design: Network Sample
Social Network Data
1. Ego Network Sampling (analysis will be covered in separate session)
• Most similar to standard social survey:
• Easily sampled (as any other survey implementation)
• All information comes from the respondent, so very subject to personal projection.
• Ask ego to report on characteristics of alter
For k alters and q attributes → adds kq questions
e.g., 5 friends with 10 behaviors adds 50 questions to the survey!
• Ask ego to report on relations among alters.
For k alters and j relational features → j(k(k-1)/2) questions
e.g., 5 friends and 2 relational questions adds 20 questions: 2*((5*4)/2)
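The two question counts above are easy to tabulate when planning survey length. A small sketch follows; the function names are illustrative, not from any package.

```python
def alter_attribute_questions(k: int, q: int) -> int:
    """Questions added by asking q attributes about each of k alters."""
    return k * q

def alter_tie_questions(k: int, j: int) -> int:
    """Questions added by asking j relational items about each alter pair.
    There are k(k-1)/2 unordered pairs among k alters."""
    return j * (k * (k - 1) // 2)

print(alter_attribute_questions(5, 10))  # the 5-friends, 10-behaviors case
print(alter_tie_questions(5, 2))         # the 5-friends, 2-relations case

# Burden grows quickly with the number of alters allowed.
for k in (3, 5, 10):
    total = alter_attribute_questions(k, 10) + alter_tie_questions(k, 2)
    print(k, total)
```

The pair term grows with the square of k, so raising the nomination cap is much more expensive than adding one more attribute item.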
Respondent
Alter 1
Alter 4
Alter 2
Alter 3
2. Snowball and “link trace” designs
Ego-networks Complete Census
Link-Tracing Designs
Basic idea is to use “adaptive sampling” – start with (a) seed node(s), identify
the network partners, and then interview them.
Earliest “snowball” samples are of this type. Most recent work is “respondent
driven sampling” (RDS).
-- If done systematically, some inference elements are knowable. Else, you
have to try and disentangle the sampling process from the real structure
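The basic idea can be sketched as a breadth-first trace from the seeds. This is a simplified illustration, not RDS: everyone named is followed, there are no recruitment quotas or weights, and the "true" network stands in for what respondents would report in interviews. All names are hypothetical.

```python
from collections import deque

def link_trace(seeds, get_partners, max_waves):
    """Breadth-first link trace. People reached at wave `max_waves`
    are recorded as named but are not themselves interviewed."""
    wave_of = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    observed_ties = set()
    while queue:
        person = queue.popleft()
        if wave_of[person] >= max_waves:
            continue  # tracing limit reached for this chain
        for partner in get_partners(person):
            observed_ties.add((person, partner))
            if partner not in wave_of:
                wave_of[partner] = wave_of[person] + 1
                queue.append(partner)
    return wave_of, observed_ties

# Hypothetical true network, used in place of interview responses.
true_network = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s"],
    "c": ["a", "d"],
    "d": ["c"],
}
waves, ties = link_trace(["s"], lambda p: true_network.get(p, []), max_waves=2)
print(waves)
print(sorted(ties))
```

Here "c" is named at wave 2 but never interviewed, so the c to d tie and node "d" are unobserved. Keeping the wave of each respondent is what lets you later separate the sampling process from the structure.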
Research Design: Network Sample
Social Network Data
3. Global network samples: Population Census
• Key issue is to enumerate the population & collect relational
information on all.
– If dynamic, this can make implementation difficult
– Tends to force case-study style designs (highly clustered
settings)
– Contrast N of networks with N of respondents
– Because behavior is self-reported (rather than alter
reported), adding network questions to a census-based
survey is low cost.
• If you are doing a census anyway….then good to
add network questions. Prosper Peers followed this
strategy.
Research Design: Network Sample
Social Network Data
Network Data Sources: Secondary & archival data
Social Network Data
Extant direct network data
National Health and Social Life Survey
Americans’ Changing Life Study
Add Health
Prosper Peers
Archival Sources
Most common is two-mode data, records of people in groups or shared
activity
Examples:
Electronic Health Records
Hospital transfer records
Admission records
Group membership
collaboration
Key issue with any secondary or archival data is you have to take what you can get…
Survey Elements
a) Informed consent
a) It is important to let people know that their identities matter: network data are
confidential but (at least in the construction) not anonymous.
b) Name Generator Questions
a) General term for what relation you are trying to tap.
b) Many extant name generators out there…most evidence suggests that people are very
sensitive to the questions asked.
a) If you ask multiple relations, be clear whether it is OK to repeat names!
c) Response Format
a) Open List → number of lines suggests “right” answer
b) Check off/select → very simple on/off, might result in over-estimates
c) Limit choice → limiting choice limits degree, which affects *every* network statistic.
d) Rank/Rate → asking people to rank each other is difficult (and can backfire!)
e) If multiple name generators – grid or separate questions?
Network Data Sources: survey data
Social Network Data
If you use surveys to collect data, some general rules of thumb:
a) Network data collection can be time consuming.
If interests are in network-level structure effects, it is better to have breadth over depth.
Having detailed information on <50% of the sample will make it very difficult to draw
conclusions about the general network structure.
If interest is in detailed interpersonal information – social support for example – detailed
information on one or two key ties might be more important.
Survey time is the crucial resource: never enough to ask everything you want.
b) Question format:
• If you ask people to recall names (an open list format), fatigue will
result in under-reporting
• If you ask people to check off names from a full list, you can often get
over-reporting
c) It is common to limit people to ~5 nominations. This will bias network stats
for stars, but is sometimes the best choice to avoid fatigue.
Network Data Sources: survey data
Social Network Data
Local Network data:
• When using a survey, common to use an “ego-network module.”
• First part: “Name Generator” question to elicit a list of names
• Second part: Working through the list of names to get
information about each person named
• Third part: asking about relations among each person named.
GSS Name Generator:
“From time to time, most people discuss important matters with other people.
Looking back over the last six months -- who are the people with whom you
discussed matters important to you? Just tell me their first names or initials.”
Why this question?
•Only time for one question
•Normative pressure and influence likely travels through strong ties
•Similar to ‘best friend’ or other strong tie generators
•Note there are significant substantive problems with this name generator
Network Data Sources: survey data
Social Network Data
Local Network data:
The third part usually asks about relations among the alters. Do this
by looping over all possible combinations. If you are asking about a
symmetric relation, then you can limit your questions to the n(n-1)/2
cells of one triangle of the adjacency matrix:
1 2 3 4 5
1
2
3
4
5
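A minimal sketch of the looping step for a symmetric relation, using the standard library to enumerate one triangle of the matrix. The alter labels and the question wording are placeholders.

```python
from itertools import combinations

# Placeholder labels for the names elicited by the name generator.
alters = ["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"]

# combinations() yields each unordered pair once: n(n-1)/2 pairs.
pairs = list(combinations(alters, 2))
questions = [f"Are {a} and {b} total strangers?" for a, b in pairs]

print(len(pairs))
for text in questions[:3]:
    print(text)
```

For an asymmetric relation you would use `itertools.permutations(alters, 2)` instead, which doubles the number of questions to n(n-1).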
GSS: Please think about the relations between the people you just mentioned. Some of them may
be total strangers in the sense that they wouldn't recognize each other if they bumped into each
other on the street. Others may be especially close, as close or closer to each other as they are to
you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B.
Are they especially close? PROBE: As close or closer to each other as they are to you?
Network Data Sources: survey data
Social Network Data
Complete network surveys require
a process that lets you link answers
to respondents.
•You cannot have
anonymous surveys.
•Recall format:
•Need Id numbers & a
roster to link, or hand-
code names to find
matches
•Checklists
•Need a roster for people
to check through
Network Data Sources: survey data
Social Network Data
(1994)
Complete network surveys require a process that lets you link answers to respondents.
•Typically you have a number of data tradeoffs:
•Limited number of responses.
•Eases survey construction & coding, lowers density & degree, which affects
nearly every other system-level measure.
•Evidence that people try to fill all of the slots.
•Name check-off roster (names down a row or on screen, relations as check-
boxes).
•Easy in small settings or CADI, but encourages over-response.
•The “Amy Willis” Problem.
•Open recall list.
•Very difficult cognitively, requires an extra name-matching step in analysis.
•Still have to give slots in pen & paper, can be dynamic on-line.
Think carefully about what you want to learn from your survey items.
Network Data Sources: survey data
Social Network Data
Network Data Sources: survey data
Social Network Data
Check off or Open Ended?
Open ended require more of respondents…subject to
fatigue & size suggestion
Network Data Sources: survey data
Social Network Data
Check off or Open Ended?
Check off is simpler – particularly if yes/no – but also
subject to over-response.
Network Data Sources: survey data
Social Network Data
Ask respondent for yes/no decisions or quantitative assessment?
Yes/no are cognitively easier (therefore reliable, believable),
Yes/no *much* faster to administer
But yes/no provides no discrimination among levels –ratings provide
more nuance
•A series of binaries can replace one quant rating:
Instead of “How often do you see each person?”
1 = once a year; 2 = once a month; 3 = once a week; etc.
Use three questions (in this order):
Who do you see at least once a year?
Who do you see at least once a month?
Who do you see at least once a week?
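The three yes/no check-offs can be recombined into the single rating after data collection. A sketch under the assumption that responses are nested (anyone seen weekly is also checked as seen monthly and yearly); the names and responses are hypothetical.

```python
# Hypothetical check-off responses to the three nested questions.
seen_yearly = {"Ann", "Bob", "Cy", "Dee"}
seen_monthly = {"Ann", "Bob", "Cy"}
seen_weekly = {"Ann"}

def frequency_rating(name):
    """0 = less than yearly, 1 = at least yearly,
    2 = at least monthly, 3 = at least weekly."""
    return sum(name in answers
               for answers in (seen_yearly, seen_monthly, seen_weekly))

roster = ["Ann", "Bob", "Cy", "Dee", "Eve"]
ratings = {name: frequency_rating(name) for name in roster}
print(ratings)
```

If responses are not nested (someone checked weekly but not monthly), the count no longer maps cleanly to the scale, so those cases should be flagged rather than summed.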
Slide from Steve Borgatti: http://www.analytictech.com/mgt780/slides/survey.pdf
Network Data Sources: survey data
Social Network Data
Absolute:
“How often do you talk to _____, on average?”
–Need to do pre-testing to determine appropriate time scale
Danger of getting no variance
–Assumes a lot of respondents
Relative:
“How often do you speak to each person on the list below?”
Very infrequently, Somewhat infrequently, About average, Somewhat frequently, Very frequently
Assumes less of respondents; easier task
Is automatically normalized within respondent
Makes it harder to compare values across respondents
Slide from Steve Borgatti: http://www.analytictech.com/mgt780/slides/survey.pdf
Network Data Sources: survey data
Social Network Data
Survey Mode
Lots of ongoing research on best practices.
Focus on clear design, careful wording.
Pretest as much as you can afford
Key advantage of electronic survey is data processing on the
back-end.
Even with open-ended; no data entry.
See: https://www.une.edu/sites/default/files/Microsoft-Word-Guiding-Principles-for-
Mail-and-Internet-Surveys_8-3.pdf
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
In a well-known series of
studies, BKS compare recall
of communication with
records of communication,
and recall doesn’t do well…
• Killworth, P. D., Bernard, H. R. 1976. Informant accuracy in social network data. Hum. Organ. 35:269-86
• Bernard, H. R., Killworth, P. D. 1977. Informant accuracy in social network data, II. Hum. Commun. Res. 4:3-18
• Killworth, P. D., Bernard, H. R. 1979. Informant accuracy in social network data, III. A comparison of triadic structures in behavioral and cognitive data. Soc. Networks 2:19-46
• Bernard, H. R., Killworth, P. D., Sailer, L. 1980. Informant accuracy in social network data, IV. A comparison of clique-level structure in behavioral and cognitive data. Soc. Networks 2:191-218
• Bernard, H. R., Killworth, P. D., Sailer, L. 1982. Informant accuracy in social network data, V. Social Science Research 11:30-66
• The Problem of Informant Accuracy: The Validity of Retrospective Data. Annual Review of Anthropology 13:495-517 (October 1984). DOI: 10.1146/annurev.an.13.100184.002431
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
The BKS studies sparked a bunch of work on network survey reliability and the results
are mixed. Some general features:
a) Important relations are recalled
b) People bias toward “common” activities…
c) …that are relationally salient.
d) Behavior reports are more consistent than attitude reports
e) Strong survey, interviewer or instrument effects.
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
Assessing accuracy is difficult, because respondents report on relations over
the last 6 months (or year, depending on type), but may be interviewed at
different times.
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
Once we account for observation windows and question length, we find very
high concordance on dates of relations.
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
For ego-level ties that were not timed, we can ask if a t1 nomination is
retained: If I “ever did drugs” with you at t1, then I should also have reported
doing so at future data collections.
Very few relations are “recanted” (4.7% sex, 13.6% drug, 3% social).
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
Ego
A
B
Proportion of times a “matrix” tie is corroborated by a direct response?
Given: How often: A B
A B
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
Ego
A
B
Proportion of times a “matrix” tie is corroborated by a direct response?
Given: How often:A B
A B
Data Accuracy: Survey induced error
Social Network Data
How reliable are network data?
Why are the Colorado Springs data so much more reliable than the BKS data?
a) Very dedicated data collectors
b) No nomination limits on self-reports
c) Highly salient relations in a small community
• Interviewer effects
– Systematic variation in responses by interviewer (Paik
and Sanchagrin, 2013; Marsden, 2003)
• Design of the survey instrument (Lozar, Vehovar and Hlebec, 2004)
• Panel Conditioning (Lazarsfeld, 1940; Warren and Halpern-Manners, 2012)
– Rise of panels for basic social research (Keeter et al., 2015)
– Survey memory is short (Groves, 1986)
Data Accuracy: Survey induced error
Social Network Data
Data Accuracy: Survey induced error
Social Network Data
Source: Clergy Health Panel Survey 2008
Probability
Respondent Names 5
Confidants
Data Accuracy: Survey induced error
Social Network Data
Data Accuracy: Survey induced error
Social Network Data
Whatever method is used, data will always be incomplete. What are the
implications for analysis?
Example 1. Ego is a matchable person in the School
[Figure: True Network and Observed Network around Ego; node labels M, Out, Un]
Social Network Data
Effects of missing data
Example 2. Ego is not on the school roster
[Figure: True Network and Observed Network; node labels M, Un]
Social Network Data
Effects of missing data
Example 3:
Node population: 2-step neighborhood of Actor X
Relational population: Any connection among all nodes
[Figure: block matrix of observed relations. Rows and columns: F, 1-step (1.1 to 1.5), 2-step (2.1 to 2.8), 3-step (3.1 to 3.3). Block labels as extracted: Full, Full (0), Full (0) / Full, Full / Full, Full / F / F (0) / F (0) / Full (0), Unknown, UK / UK / Full (0)]
Social Network Data
Effects of missing data
Example 4
Node population: 2-step neighborhood of Actor X
Relational population: Trace, plus All connections among 1-step contacts
[Figure: block matrix of observed relations. Rows and columns: F, 1-step (1.1 to 1.5), 2-step (2.1 to 2.8), 3-step (3.1 to 3.3). Block labels as extracted: Full, Full (0), Full (0) / Full, Full / Full, Unknown / F / F (0) / F (0) / Full (0), Unknown, UK / UK / Full (0)]
Social Network Data
Effects of missing data
Example 5.
Node population: 2-step neighborhood of Actor X
Relational population: Only tracing contacts
[Figure: block matrix of observed relations. Rows and columns: F, 1-step (1.1 to 1.5), 2-step (2.1 to 2.8), 3-step (3.1 to 3.3). Block labels as extracted: Full, Full (0), Full (0) / Unknown, Full / Full, Unknown / F / F (0) / F (0) / Full (0), Unknown, UK / UK / Full (0)]
Social Network Data
Effects of missing data
Example 6
Node population: 2-step neighborhood from 3 focal actors
Relational population: All relations among actors
[Figure: block matrix of observed relations. Rows and columns: Focal, 1-Step, 2-Step, 3-Step. Block labels as extracted: Full, Full (0), Full (0) / Full, Full / Full, Full / Full / Full (0) / Full (0) / Full (0), Unknown, UK / UK / Full (0) / Full]
Social Network Data
Effects of missing data
Example 7.
Node population: 1-step neighborhood from 3 focal actors
Relational population: Only relations from focal nodes
[Figure: block matrix of observed relations. Rows and columns: Focal, 1-Step, 2-Step, 3-Step. Block labels as extracted: Full, Full (0), Full (0) / Unknown, Unknown / Unknown, Unknown / Full / Full (0) / Full (0) / Full (0), Unknown, UK / UK / Full (0) / Full]
Social Network Data
Effects of missing data
Social Network Data
Effects of missing data on measures Smith & Moody, 2014,
Smith, Morgan & Moody 2016
Identify the practical effect of missing data as a measurement error problem:
induce error and evaluate effect.
Randomly select nodes to delete, remove their edges & recalculate statistics of
interest.
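The delete-and-recalculate logic can be sketched in a few lines. This is not the authors' simulation code: the random graph is a stand-in for real network data, mean degree is a stand-in for the statistics of interest, and all names and parameter values are hypothetical.

```python
import random

def random_graph(n, p, rng):
    """Undirected random graph as a dict of neighbor sets (stand-in data)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def delete_nodes(adj, share_missing, rng):
    """Drop a random share of nodes and every edge touching them."""
    n_drop = int(round(share_missing * len(adj)))
    dropped = set(rng.sample(sorted(adj), n_drop))
    return {v: nbrs - dropped for v, nbrs in adj.items() if v not in dropped}

def mean_degree(adj):
    """Average number of ties per remaining node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

rng = random.Random(42)
true_net = random_graph(200, 0.05, rng)
for share in (0.1, 0.3, 0.5):
    observed = delete_nodes(true_net, share, rng)
    print(share, round(mean_degree(true_net), 2), round(mean_degree(observed), 2))
```

Repeating the deletion many times at each missingness level, and swapping in other statistics, gives a distribution of error for each measure.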
Social Network Data
Effects of missing data on measures Smith & Moody, 2014
Social Network Data
Effects of missing data on measures Smith & Moody, 2014
Centrality
Social Network Data
Effects of missing data on measures Smith & Moody, 2014
Homophily
Social Network Data
Effects of missing data on measures Smith, Morgan & Moody, 2016
We redid the simulation
study with non-random
missingness. The results
are (as expected) a bit
more complicated, but the
general trend is still
largely good (well, except
for centrality scores of
course!).
To make this more useful,
we constructed a network
bias calculator, which
allows researchers to
specify the amount and
form of missing data, to
see how it affects their
results.
http://www.soc.duke.edu/~jmoody77/missingdata/NetworkBiasCalculator_Jan262017.jar
Smith, Jeff, Jonathan Morgan and James Moody. 2016. “Network Sampling Coverage II: The Effect of Non-random
Missing Data on Network Measurement” Social Networks 48:78-99.
Social Network Data
Effects of missing data on measures Smith, Morgan & Moody, 2016
Here we assume:
Social Network Data
Effects of missing data on measures Smith, Morgan & Moody, 2016
And find that:
Centrality Topology Homophily
“We thus expect the correlation between the true and observed in-degree to be biased by 3.5%”
Social Network Data
Effects of missing data on measures
What to do about missing data?
Easy:
• Do nothing. If the associated error is small, ignore it. This is the default, though not a
particularly satisfying one.
Harder: Impute ties
• If the relation has known constraints, use those (symmetry, for example).
• If ties are clearly associated with something you observe, use that association to impute values.
• If you impute and can use a randomization routine, do so (akin to multiple
imputation routines).
• These approaches are all ad hoc.
Hardest:
• Model missingness with ERGM/Latent-network models.
• Build a model for tie formation on the observed data, include the structurally missing ties &
impute. Handcock & Gile have new routines for this.
• Computationally intensive…but analytically not difficult.
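As one illustration of the "known constraints" option, the sketch below fills in a non-respondent's outgoing ties from what respondents reported about them, under the assumption that the relation is symmetric. The matrix layout (`None` for unobserved cells) and the function name are hypothetical choices for this example, not part of any cited routine.

```python
def impute_by_symmetry(adj, nonrespondents):
    """Fill rows of non-respondents using ties reported by respondents.

    adj[i][j] is 1/0 if i reported a tie (or no tie) to j, and None when
    i did not respond. Assumes the relation is symmetric, so a report
    from j about i stands in for the missing report from i about j.
    """
    n = len(adj)
    missing = set(nonrespondents)
    out = [row[:] for row in adj]
    for i in missing:
        for j in range(n):
            if i == j:
                out[i][j] = 0            # no self-ties
            elif j in missing:
                out[i][j] = None         # neither side reported; stays unknown
            else:
                out[i][j] = adj[j][i]    # borrow the respondent's report
    return out
```

Ties between two non-respondents cannot be recovered this way and are left as `None`; those cells are the ones a model-based approach would have to handle.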
Summary:
Data collection design & missing data affect the information at hand to draw
conclusions about the system. Everything we do from now on is built on some
manipulation of the observed adjacency matrix, so we want to understand which conclusions are
valid and which are invalid because of systematic distortions in the data.
Statistical modeling tools hold promise. We can build models of networks that account
for missing data: we are able to “fix” the structural zeros in our models by treating them
as given. This then lets us infer to the world of all graphs with that same missing data
structure. These models are very new, and not widely available yet….
Social Network Data
Network Data Sources: Missing Data

01 Network Data Collection (2017)

  • 2. 1. Intro/Big Picture 1. How networks fit into social science generally 2. Connections & Positions 3. Types of network data 2.Collecting Data A. Research Design i. Relational Content ii. Boundary Specification iii. Network Samples a. Local b. Global c. Link Tracing designs B. Sources i. Archive, Observation, Survey ii. Survey a. Name Generators b. Delivery Mode 3. Data Accuracy A. How accurate are network survey data? B. Effect on measurement C. What can we do about inaccurate or missing data? Outline Social Network Data
  • 3. Introduction We live in a connected world: “To speak of social life is to speak of the association between people – their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized.” Peter M. Blau Exchange and Power in Social Life, 1964
  • 4. Introduction We live in a connected world: "If we ever get to the point of charting a whole city or a whole nation, we would have … a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole." J.L. Moreno, New York Times, April 13, 1933 (*1934, NYTimes. Moreno claims this work was covered in “all the major papers” but I can’t find any other clips…)
  • 5. But scientists are starting to take networks seriously: “Networks” Introduction
  • 6. “Networks” “Obesity” Introduction But scientists are starting to take networks seriously: why?
  • 7. Introduction …and NSF is investing heavily in it.
  • 8. High Schools as Networks Introduction
  • 11. And yet, standard social science analysis methods do not take this space into account. “For the last thirty years, empirical social research has been dominated by the sample survey. But as usually practiced, …, the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it.” Allen Barton, 1968 (Quoted in Freeman 2004) Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuition. Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that compose social structure. Introduction
  • 12. Social network analysis is: •a set of relational methods for systematically understanding and identifying connections among actors. SNA •is motivated by a structural intuition based on ties linking social actors •is grounded in systematic empirical data •draws heavily on graphic imagery •relies on the use of mathematical and/or computational models. •Social Network Analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior. Introduction
  • 13. Introduction Key Questions Social network analysis lets us answer questions about social interdependence. These include: “Networks as Variables” approaches •Are kids with smoking peers more likely to smoke themselves? •Do unpopular kids get in more trouble than popular kids? •Do central actors control resources? “Networks as Structures” approaches •What generates hierarchy in social relations? •What network patterns spread diseases most quickly? •How do role sets evolve out of consistent relational activity? Both: Connectionist vs. Positional features of the network We don’t want to draw this line too sharply: emergent role positions can affect individual outcomes in a ‘variable’ way, and variable approaches constrain relational activity.
  • 14. Why do networks matter? Two fundamental mechanisms: Problem space. Connectionist (networks as pipes): as cause: Diffusion, Peer influence, Social Capital, “small worlds”; as result: Social integration, Peer selection, Homophily, Network robustness. Positional (networks as roles): as cause: Popularity Effects, Role Behavior, Network Constraint; as result: Group stability, Network ecology, “Structuration”. This rubric is organized around social mechanisms – the reasons why networks matter, which ends up being loosely correlated with specific types of measures, analysis, and data collection method.
  • 15. Why do networks matter? Two fundamental mechanisms: Connections Connectionist network mechanisms: Networks matter because of the things that flow through them. Networks as pipes.
  • 16. Why do networks matter? Two fundamental mechanisms: Connections example The spread of any epidemic depends on the number of secondary cases per infected case, known as the reproductive rate (R0). R0 depends on the probability that a contact will be infected over the duration of contact (b), the likelihood of contact (c), and the duration of infectiousness (D): $R_0 = b\,c\,D$. For network transmission problems, the trick is specifying c, which depends on the network.
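To make the role of c concrete, here is a small sketch of the reproductive-rate product. Using the mean degree of an observed contact network as a stand-in for c is an assumption made purely for illustration (the slide only says that c depends on the network), and the function names and numbers are hypothetical.

```python
def mean_degree(edges, n_nodes):
    """Average number of contacts per node in an undirected edge list."""
    return 2 * len(edges) / n_nodes

def reproductive_rate(b, c, D):
    """R0 as the product of transmission probability b, contact term c,
    and duration of infectiousness D."""
    return b * c * D

# Hypothetical contact network on 4 people: a triangle plus one pendant tie.
contacts = [(0, 1), (1, 2), (0, 2), (2, 3)]
c_hat = mean_degree(contacts, n_nodes=4)           # assumed proxy for c
r0 = reproductive_rate(b=0.1, c=c_hat, D=10)
```

Because c enters multiplicatively, two populations with the same b and D can sit on opposite sides of the epidemic threshold (R0 = 1) purely because of how contacts are distributed.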
  • 17. Why do networks matter? Two fundamental mechanisms: Connections example. Isolated vision
  • 18. Why do networks matter? Two fundamental mechanisms: Connections example. Connected vision
  • 19. Connections: Diffusion Example: Small local changes can create cohesion cascades. Emergent Connectivity in “low-degree” networks [figure labels: Partner Distribution; Component Size/Shape]. Based on work supported by R21-HD072810 (NICHD, Moody PI), R01 DA012831-05 (NIDA Morris, Martina PI)
  • 20. Why do networks matter? Two fundamental mechanisms: Positions Positional network mechanisms: Networks matter because of the way they capture role behavior and social exchange. Networks as Roles. [Figure tie labels: Provides food for; Romantic Love; Bickers with]
  • 21. Why do networks matter? Two fundamental mechanisms: Positions Positional network mechanisms: Networks matter because of the way they capture role behavior and social exchange. Networks as Roles. [Figure node labels: Parent, Parent, Child, Child, Child; tie labels: Provides food for; Romantic Love; Bickers with]
  • 22. Social network analysis is: •a set of relational methods for systematically understanding and identifying connections among actors. SNA •is motivated by a structural intuition based on ties linking social actors •is grounded in systematic empirical data •draws heavily on graphic imagery •relies on the use of mathematical and/or computational models. •Social Network Analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior. Network Methods & Measures
  • 23. Social Network Data The units of interest in a network are the combined sets of actors and their relations. We represent actors with points and relations with lines. Actors are referred to variously as: Nodes, vertices or points. Relations are referred to variously as: Edges, Arcs, Lines, Ties. Example: [figure: five nodes labeled a to e]
  • 24. Social Network Data In general, a relation can be: Binary or Valued; Directed or Undirected. [Figure: the same five nodes (a to e) drawn four ways: Undirected, binary; Directed, binary; Undirected, Valued; Directed, Valued, with numeric edge values]
  • 25. Social Network Data Basic Data Elements In general, a relation can be: (1) Binary or Valued (2) Directed or Undirected. [Figure: nodes a to e; Directed, Multiplex categorical edges] The social process of interest will often determine what form your data take. Conceptually, almost all of the techniques and measures we describe can be generalized across data format, but you may have to do some of the coding work yourself….
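The four data forms named above (binary or valued, directed or undirected) can all be stored in one adjacency matrix. The sketch below is one way to do that coding work; the node labels follow the slide's a to e, while the edge list and values are made up for illustration.

```python
def adjacency_matrix(nodes, edges, directed=False, valued=False):
    """Build an adjacency matrix from (sender, receiver, weight) triples.

    directed=False mirrors every tie across the diagonal;
    valued=False collapses every weight to 1 (tie present).
    """
    index = {name: k for k, name in enumerate(nodes)}
    mat = [[0] * len(nodes) for _ in nodes]
    for sender, receiver, weight in edges:
        i, j = index[sender], index[receiver]
        value = weight if valued else 1
        mat[i][j] = value
        if not directed:
            mat[j][i] = value
    return mat

# Hypothetical ties among the five nodes used in the slides.
NODES = ["a", "b", "c", "d", "e"]
TIES = [("a", "b", 1), ("b", "c", 3), ("c", "d", 4), ("c", "e", 2)]

undirected_binary = adjacency_matrix(NODES, TIES)
directed_valued = adjacency_matrix(NODES, TIES, directed=True, valued=True)
```

Going from the richer form to the simpler one (valued to binary, directed to undirected) is a deliberate loss of information, so it is usually worth storing the data in the richest form collected.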
  • 26. Social Network Data Basic Data Elements: Levels of analysis We can examine networks across multiple levels: 1) Ego-network - Have data on a respondent (ego) and the people they are connected to (alters) - May include estimates of connections among alters - Example: 1985 GSS module 2) Partial network - Ego networks plus some amount of tracing to reach contacts of contacts - Something less than full account of connections among all pairs of actors in the relevant population - Example: CDC Contact tracing data for STDs
  • 27. Social Network Data Basic Data Elements: Levels of analysis We can examine networks across multiple levels: 3) Complete or “Global” data - Data on all actors within a particular (relevant) boundary - Never exactly complete (due to missing data), but boundaries are set - Example: Coauthorship data among all writers in the social sciences, friendships among all students in a classroom
  • 28. Social Network Data Basic Data Elements: Levels of analysis [Figure labels: Best Friend Dyad; Primary Group; Ego-Net; 2-step Partial network; Global-Net]
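The relationship between these levels can be sketched in code: given global data, an ego-net or a 2-step partial network is just the subgraph within a fixed number of steps of ego. The tie dictionary and function name below are hypothetical, and the sketch assumes undirected ties stored as neighbor sets.

```python
def ego_network(ties, ego, steps=1):
    """Nodes within `steps` of ego, plus all ties among those nodes.

    ties maps each node to the set of its neighbors (undirected).
    steps=1 gives the ego-net (including alter-alter ties);
    steps=2 gives a 2-step partial network.
    """
    reached = {ego}
    frontier = {ego}
    for _ in range(steps):
        # Everyone named by the current frontier who has not been reached yet.
        frontier = {alter for node in frontier
                    for alter in ties.get(node, set())} - reached
        reached |= frontier
    edges = {frozenset((u, v)) for u in reached
             for v in ties.get(u, set()) if v in reached}
    return reached, edges

# Hypothetical global network on five people.
TIES = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
```

In survey practice the direction is reversed: the ego-nets are what gets collected, and the global network is only partly recoverable from them.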
  • 29. Social Network Data Research Design: new data collection What information do you want to collect? This is ultimately a theory question about how you think the social network matters and what social or biological mechanisms matter for the outcome of interest. This is driven by thinking through: Health Outcome → Mechanism → Relation(s) Examples: Sometimes the relations are clear: STD/HIV → Contagion-carrying contact → Sex, Drug sharing, etc. Sometimes not so much: Health Behavior → Information flow → Discussion networks; Health Behavior → Social Conformity Pressure → Admiration nets; Health Behavior → Opportunities → Unsupervised interaction
  • 30. Social Network Data Research Design: new data collection What information do you want to collect? Sometimes the outcome is deliberately unspecified, as when you are collecting data for large common-use projects (GSS, Add Health, NHRS). Then the design is effectively reversed: What relations capture the most (general? comprehensive? efficacious? reliable?) social mechanisms that will be of broad interest? [Figure labels: Relation(s); Respect; Contact; Information; Pressure; Substance Use; Suicidal Ideation; Treatment adherence; BMI; Disease; Excitement] Social mechanism ambiguity allows broad use, which favors relations that tend to be general. This, of course, makes crisp causal associations more difficult.
  • 31. Social Network Data Research Design: New data collection What information do you want to collect? Health Outcome → Mechanism → Relation(s) Relations themselves are often multi-dimensional… do these matter for your question? - Perception vs. interaction? “who do you like?” vs. “who do you talk with?” - Intensity? “How often …”, “how much…” strong vs. weak - Dynamics? Starting & ending dates, everyday contact or sporadic?
  • 32. Boundary Specification Network methods describe positions in relevant social fields, where flows of particular goods are of interest. As such, boundaries are a fundamentally theoretical question about what you think matters in the setting of interest. In general, there are usually relevant social foci that bound the relevant social field. We expect that social relations will be very clumpy. Consider the example of friendship ties within and between a high-school and a Jr. high: What is the theoretically relevant population? Research Design: Boundary Specification Social Network Data
  • 33. Social Network Data Research Design: Boundary Specification What is the theoretically relevant population? Networks are (generally) treated as bounded systems; what constitutes your bound? “Realist” (boundary from actors’ point of view): Local: Everyone connected to ego in the relevant manner (all friends, all sex partners); Global: All relations relevant to social action (“adolescent peer network” or “Community Health Leaders”). Nominalist (boundary from researchers’ point of view): Local: Relations defined by a name-generator, typically limited in number (“5 closest friends”); Global: Relations within a particular setting (“School friends” or “Physicians serving this hospital”). Most of the time… these boundaries are porous
  • 34. Add Health: while students were given the option to name friends in the other school, they rarely did. As such, the school likely serves as a strong substantive boundary. What is the theoretically relevant population? Research Design: Boundary Specification Social Network Data
  • 35. Boundaries are often defined theoretically by the relation, not the setting: Research Design: Boundary Specification Social Network Data Physician patient-sharing networks: Physicians who share (Medicare) patients (within one hospital) For all patients selected in Ohio….
  • 36. Research Design: Boundary Specification Social Network Data In practice: a) Set a pragmatic bound that captures the bulk of theoretically relevant data. b) Collect data on boundary crossing: i) You might ask “friends in this neighborhood” but also “Other close friends?” ii) Don’t limit nominations to the current setting, but only trace within the bounds. Good prior research, ethnography, informants, etc. should be used to identify the bounds as well as possible, but these sorts of data allow one to at least control for out-of-sample effects in models. For adaptive sampling, such as link-trace designs, you might use a capture/recapture rule to figure out if you’ve saturated your population: once you stop receiving new names, you’ve finished. But if you jump to a new population, this can be hard to discern.
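The stopping rule on this slide can be sketched as a wave-by-wave trace that records every name but only follows names inside the bound, and stops once a wave produces no new names. This is a simple saturation check, not a formal capture/recapture estimator; the function name `trace_until_saturated` and the toy nomination data are illustrative assumptions, not part of the original slides.

```python
def trace_until_saturated(seeds, nominations, in_bounds):
    """Trace nominations wave by wave; stop when a wave adds no new in-bounds names.

    nominations: dict mapping each respondent to the list of people they name.
    in_bounds:   set of people inside the (pragmatic) study boundary.
    Returns the set of people reached and the count of new names per wave.
    """
    seen = set(seeds)
    wave = list(seeds)
    new_per_wave = []
    while wave:
        next_wave = []
        for person in wave:
            for alter in nominations.get(person, []):
                # Out-of-bounds names stay in the raw data but are not traced.
                if alter in in_bounds and alter not in seen:
                    seen.add(alter)
                    next_wave.append(alter)
        new_per_wave.append(len(next_wave))
        wave = next_wave
    return seen, new_per_wave


# Hypothetical toy data: "x" is a boundary-crossing nomination.
nominations = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d", "x"],
    "d": ["a"],
}
seen, new_per_wave = trace_until_saturated(
    ["a"], nominations, in_bounds={"a", "b", "c", "d"}
)
print(sorted(seen), new_per_wave)
```

A run of zeros at the end of `new_per_wave` is the "no new names" signal; as the slide warns, it cannot tell you whether an untraced bridge leads to a separate population.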
  • 37. The level of analysis implies a perspective on sampling: 1. Local → random probability sampling 2. Adaptive → Link trace, RDS 3. Complete → Census These are not as dissimilar as they may appear. Local nets imply global connectivity: a) Every ego-network is a sample from the population-level global network, and thus should be consistent with a constrained range of global networks. b) If you have a clustered setting, many alters in a local network may overlap, making partial connectivity information possible. c) For attribute mixing (proportion of whites with black friends, low BMI with high, users with non-users, etc.), ego-network data is sufficient to draw global inference. Research Design: Network Sample Social Network Data
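As a rough illustration of point (c), attribute mixing can be tabulated directly from ego-network reports. The sketch below assumes an equal-probability sample of egos (real survey data would need sampling weights) and uses a made-up record layout of `(ego_attribute, [alter_attributes])`; neither is specified in the slides.

```python
from collections import Counter


def mixing_proportions(ego_records):
    """Share of each ego group's reported ties that go to each alter group.

    ego_records: list of (ego_attribute, [alter_attribute, ...]) tuples.
    Returns a dict keyed by (ego_attribute, alter_attribute).
    """
    pair_counts = Counter()
    ego_totals = Counter()
    for ego_attr, alter_attrs in ego_records:
        for alter_attr in alter_attrs:
            pair_counts[(ego_attr, alter_attr)] += 1
            ego_totals[ego_attr] += 1
    return {pair: n / ego_totals[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical ego-network reports on substance use.
records = [
    ("user", ["user", "user", "non-user"]),
    ("non-user", ["non-user", "non-user"]),
    ("user", ["non-user"]),
]
mix = mixing_proportions(records)
print(mix[("user", "non-user")])
```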
  • 38. Research Design: Network Sample Social Network Data Nominalist (researcher pov) Realist (natural groups) Local • Probability samples • Clinical samples • Extracted from complete settings • Family interviews • Neighbors • Workplace samples Adaptive • Fixed diameter chain from qualifying seed(s) • Unlimited diameter chain on qualifying relation Complete • Census within a fixed setting (hospital, school, etc.) • Only practical for real groups (“Duke Faculty” “Crip”). Get list from informant & enumerate. Data collection strategy (The column distinction is squishy…)
  • 39. Research Design: Network Sample Social Network Data 1. Ego Network Sampling (analysis will be covered in separate session) • Most similar to standard social survey: • Easily sampled (as any other survey implementation) • All information comes from the respondent, so very subject to personal projection. • Ask ego to report on characteristics of alter. For k alters and q attributes → adding kq questions, i.e. 5 friends with 10 behaviors adds 50 questions to the survey! • Ask ego to report on relations amongst alters. For k alters and j relational features → j(k(k-1)/2) questions, i.e. 5 friends and 2 relation questions is 20 questions: 2*((5*4)/2) Respondent Alter 1 Alter 4 Alter 2 Alter 3
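The question-count arithmetic above is easy to check before fielding a survey. A minimal sketch, with the helper name `ego_module_items` chosen here for illustration:

```python
def ego_module_items(k, q, j):
    """Items added by an ego-network module.

    k: number of alters, q: attributes asked per alter,
    j: relational questions asked per alter pair (symmetric relation).
    Returns (alter attribute items, alter-alter items).
    """
    alter_items = k * q
    pair_items = j * (k * (k - 1) // 2)
    return alter_items, pair_items


# The slide's example: 5 friends, 10 behaviors, 2 relation questions.
print(ego_module_items(5, 10, 2))  # (50, 20)
```

Because the alter-alter term grows with k squared, raising the name limit from 5 to 10 more than quadruples that part of the module (10 pairs become 45).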
  • 40. 2. Snowball and “link trace” designs Ego-networks Complete Census Link-Tracing Designs Basic idea is to use “adaptive sampling” – start with (a) seed node(s), identify the network partners, and then interview them. Earliest “snowball” samples are of this type. Most recent work is “respondent driven sampling” (RDS). If done systematically, some inference elements are knowable. Otherwise, you have to try to disentangle the sampling process from the real structure. Research Design: Network Sample Social Network Data
  • 41. 3. Global network samples: Population Census • Key issue is to enumerate the population & collect relational information on all. – If dynamic, this can make implementation difficult – Tends to force case-study style designs (highly clustered settings) – Contrast N of networks with N of respondents – Because behavior is self-reported (rather than alter reported), adding network questions to a census-based survey is low cost. • If you are doing a census anyway….then good to add network questions. Prosper Peers followed this strategy. Research Design: Network Sample Social Network Data
  • 42. Network Data Sources: Secondary & archival data Social Network Data Extant direct network data National Health and Social Life Survey Americans’ Changing Life Study Add Health Prosper Peers Archival Sources Most common is two-mode data, records of people in groups or shared activity Examples: Electronic Health Records Hospital transfer records Admission records Group membership collaboration Key issue with any secondary or archival data is you have to take what you can get…
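Two-mode archival records (people in groups or shared activities) are usually turned into a person-by-person network by counting co-membership. The sketch below does this for a patient-sharing style example; the record layout and names such as `dr_a` are hypothetical, and it makes no claim about how any particular study builds its networks.

```python
from collections import Counter
from itertools import combinations


def project_two_mode(memberships):
    """Project group -> members records onto a one-mode co-membership network.

    memberships: dict mapping each group (event, patient, committee) to its members.
    Returns a Counter keyed by sorted (person, person) pairs, valued by shared groups.
    """
    shared = Counter()
    for members in memberships.values():
        # Each unordered pair within a group shares that group once.
        for a, b in combinations(sorted(set(members)), 2):
            shared[(a, b)] += 1
    return shared


# Hypothetical records: which physicians saw each patient.
records = {
    "patient_1": ["dr_a", "dr_b"],
    "patient_2": ["dr_a", "dr_b", "dr_c"],
    "patient_3": ["dr_c"],
}
ties = project_two_mode(records)
print(dict(ties))
```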
  • 43. Survey Elements a) Informed consent: It is important to let people know that their identities matter: network data are confidential but (at least in the construction) not anonymous. b) Name Generator Questions: i) General term for what relation you are trying to tap. ii) Many extant name generators out there…most evidence suggests that people are very sensitive to the questions asked. If you ask multiple relations, be clear whether it is OK to repeat names! c) Response Format: i) Open List → number of lines suggests “right” answer ii) Check off/select → very simple on/off, might result in over-estimates iii) Limit choice → limiting choice limits degree, which affects *every* network statistic. iv) Rank/Rate → asking people to rank each other is difficult (and can backfire!) v) If multiple name generators – grid or separate questions? Network Data Sources: survey data Social Network Data
  • 44. If you use surveys to collect data, some general rules of thumb: a) Network data collection can be time consuming. If interests are in network-level structure effects, it is better to have breadth over depth. Having detailed information on <50% of the sample will make it very difficult to draw conclusions about the general network structure. If interest is in detailed interpersonal information – social support for example – detailed information on one or two key ties might be more important. Survey time is the crucial resource: never enough to ask everything you want. b) Question format: • If you ask people to recall names (an open list format), fatigue will result in under-reporting • If you ask people to check off names from a full list, you can often get over-reporting c) It is common to limit people to ~5 nominations. This will bias network stats for stars, but is sometimes the best choice to avoid fatigue. Network Data Sources: survey data Social Network Data
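To see what a nomination cap does mechanically, one can truncate an uncapped nomination list and compare tie counts. This is a toy sketch with made-up data and a hypothetical helper `cap_nominations`; it assumes names are listed in the order recalled, so the cap drops the later ones.

```python
def cap_nominations(nominations, limit):
    """Keep only the first `limit` names each respondent listed."""
    return {ego: alters[:limit] for ego, alters in nominations.items()}


def tie_count(nominations):
    """Total number of directed nominations in the data."""
    return sum(len(alters) for alters in nominations.values())


# Hypothetical uncapped data: one high-degree "star" and two ordinary respondents.
nominations = {
    "star": ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"],
    "p1": ["star", "p2"],
    "p2": ["star"],
}
capped = cap_nominations(nominations, 5)
print(len(nominations["star"]), len(capped["star"]))
print(tie_count(nominations), tie_count(capped))
```

Only the high-degree respondent loses ties here, which is the sense in which a cap distorts statistics for stars while leaving low-degree respondents untouched.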
  • 45. Local Network data: • When using a survey, common to use an “ego-network module.” • First part: “Name Generator” question to elicit a list of names • Second part: Working through the list of names to get information about each person named • Third part: asking about relations among each person named. GSS Name Generator: “From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials.” Why this question? •Only time for one question •Normative pressure and influence likely travels through strong ties •Similar to ‘best friend’ or other strong tie generators •Note there are significant substantive problems with this name generator Network Data Sources: survey data Social Network Data
  • 46. Local Network data: The third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix (here, a 5 × 5 matrix with rows and columns 1 to 5). GSS: Please think about the relations between the people you just mentioned. Some of them may be total strangers in the sense that they wouldn't recognize each other if they bumped into each other on the street. Others may be especially close, as close or closer to each other as they are to you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B. Are they especially close? PROBE: As close or closer to each other as they are to you? Network Data Sources: survey data Social Network Data
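Looping over one triangle of the adjacency matrix amounts to enumerating unordered pairs of alters. A minimal sketch, where the prompt wording follows the GSS item quoted above and the function name is an assumption:

```python
from itertools import combinations


def alter_pair_prompts(alters):
    """One prompt per unordered alter pair (upper triangle of the matrix)."""
    return [
        f"Are {a} and {b} total strangers?"
        for a, b in combinations(alters, 2)
    ]


prompts = alter_pair_prompts(["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"])
print(len(prompts))  # 10, i.e. 5 * 4 / 2
print(prompts[0])
```

For an asymmetric relation, `itertools.permutations(alters, 2)` would cover both triangles, doubling the number of prompts.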
  • 47. Local Network data (continued). Network Data Sources: survey data Social Network Data
  • 48. Complete network surveys require a process that lets you link answers to respondents. •You cannot have anonymous surveys. •Recall format: •Need ID numbers & a roster to link, or hand-code names to find matches •Checklists •Need a roster for people to check through Network Data Sources: survey data Social Network Data (1994)
  • 49. Complete network surveys require a process that lets you link answers to respondents. •Typically you have a number of data tradeoffs: •Limited number of responses. •Eases survey construction & coding, lowers density & degree, which affects nearly every other system-level measure. •Evidence that people try to fill all of the slots. •Name check-off roster (names down a row or on screen, relations as check-boxes). •Easy in small settings or CADI, but encourages over-response. •The “Amy Willis” Problem. •Open recall list. •Very difficult cognitively, requires an extra name-matching step in analysis. •Still have to give slots in pen & paper, can be dynamic on-line. Think carefully about what you want to learn from your survey items. Network Data Sources: survey data Social Network Data
  • 50. Network Data Sources: survey data Social Network Data Check off or Open Ended? Open ended require more of respondents…subject to fatigue & size suggestion
  • 51. Network Data Sources: survey data Social Network Data Check off or Open Ended? Check off is simpler – particularly if yes/no – but also subject to over-response.
  • 52. Network Data Sources: survey data Social Network Data Ask respondent for yes/no decisions or quantitative assessment? Yes/no are cognitively easier (therefore reliable, believable), Yes/no *much* faster to administer But yes/no provides no discrimination among levels –ratings provide more nuance •A series of binaries can replace one quant rating: Instead of “How often do you see each person?” 1 = once a year; 2 = once a month; 3 = once a week; etc. Use three questions (in this order): Who do you see at least once a year? Who do you see at least once a month? Who do you see at least once a week? Slide from Steve Borgatti: http://www.analytictech.com/mgt780/slides/survey.pdf
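The three nested yes/no questions can be recombined into the single ordinal rating they replace. The sketch below uses the slide's coding (1 = once a year, 2 = once a month, 3 = once a week), adds 0 for "never" as an assumption, and flags inconsistent patterns such as "monthly yes, yearly no"; the function name is illustrative.

```python
def frequency_score(yearly, monthly, weekly):
    """Combine nested yes/no answers into one ordinal contact-frequency score.

    Returns 0 (never), 1 (at least yearly), 2 (at least monthly), 3 (at least weekly).
    Raises ValueError if a higher frequency is affirmed without the lower ones.
    """
    score = 0
    for level, said_yes in enumerate((yearly, monthly, weekly), start=1):
        if said_yes:
            if score != level - 1:
                raise ValueError("inconsistent answers: higher frequency without lower")
            score = level
    return score


print(frequency_score(True, True, False))  # 2
```

On a roster survey this would be applied once per named person, which keeps each individual decision binary while still recovering an ordinal tie strength.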
  • 53. Network Data Sources: survey data Social Network Data Absolute: “How often do you talk to _____, on average?” –Need to do pre-testing to determine appropriate time scale Danger of getting no variance –Assumes a lot of respondents Relative: “How often do you speak to each person on the list below?” Very infrequently, Somewhat infrequently, About average, Somewhat frequently, Very frequently Assumes less of respondents; easier task Is automatically normalized within respondent Makes it harder to compare values across respondents Slide from Steve Borgatti: http://www.analytictech.com/mgt780/slides/survey.pdf
  • 54. Network Data Sources: survey data Social Network Data Survey Mode Lots of ongoing research on best practices. Focus on clear design, careful wording. Pretest as much as you can afford. Key advantage of electronic survey is data processing on the back-end. Even with open-ended; no data entry. See: https://www.une.edu/sites/default/files/Microsoft-Word-Guiding-Principles-for-Mail-and-Internet-Surveys_8-3.pdf
  • 55. Data Accuracy: Survey induced error Social Network Data How reliable are network data? In a well-known series of studies, BKS (Bernard, Killworth and Sailer) compare recall of communication with records of communication, and recall doesn’t do well… • Killworth, P. D., Bernard, H. R. 1976. Informant accuracy in social network data. Hum. Organ. 35:269-86 • Bernard, H. R., Killworth, P. D. 1977. Informant accuracy in social network data, II. Hum. Commun. Res. 4:3-18 • Killworth, P. D., Bernard, H. R. 1979. Informant accuracy in social network data, III. A comparison of triadic structures in behavioral and cognitive data. Soc. Networks 2:19-46 • Bernard, H. R., Killworth, P. D., Sailer, L. 1980. Informant accuracy in social network data, IV. A comparison of clique-level structure in behavioral and cognitive data. Soc. Networks 2:191-218 • Bernard, H. R., Killworth, P. D., Sailer, L. 1982. Informant accuracy in social network data, V. Soc. Sci. Res. 11:30-66 • The Problem of Informant Accuracy: The Validity of Retrospective Data. Annual Review of Anthropology 13:495-517 (volume publication date October 1984). DOI: 10.1146/annurev.an.13.100184.002431
  • 56. Data Accuracy: Survey induced error Social Network Data How reliable are network data? The BKS studies sparked a bunch of work on network survey reliability and the results are mixed. Some general features: a) Important relations are recalled b) People bias toward “common” activities… c) …that are relationally salient. d) Behavior reports are more consistent than attitude reports e) Strong survey, interviewer or instrument effects.
  • 57. Data Accuracy: Survey induced error Social Network Data How reliable are network data?
  • 58. Data Accuracy: Survey induced error Social Network Data How reliable are network data? Assessing accuracy is difficult, because respondents report on relations over the last 6 months (or year, depending on type), but may be interviewed at different times.
  • 59. Data Accuracy: Survey induced error Social Network Data How reliable are network data? Once we account for observation windows and question length, we find very high concordance on dates of relations.
  • 60. Data Accuracy: Survey induced error Social Network Data How reliable are network data? For ego-level ties that were not timed, we can ask if a t1 nomination is retained: If I “ever did drugs” with you at t1, then I should also have reported doing so at future data collections. Very few relations are “recanted” (4.7% sex, 13.6% drug, 3% social).
  • 61. Data Accuracy: Survey induced error Social Network Data How reliable are network data? Ego A B Proportion of times a “matrix” tie is corroborated by a direct response? Given: How often: A B A B
  • 62. Data Accuracy: Survey induced error Social Network Data How reliable are network data? Ego A B Proportion of times a “matrix” tie is corroborated by a direct response? Given: How often:A B A B
  • 63. Data Accuracy: Survey induced error Social Network Data How reliable are network data? Why are the Colorado Springs data so much more reliable than the BKS data? a) Very dedicated data collectors b) No nomination limits on self-reports c) Highly salient relations in a small community
  • 64. • Interviewer effects – Systematic variation in responses by interviewer (Paik and Sanchagrin, 2013; Marsden, 2003) • Design of the survey instrument (Lozar, Vehovar and Hlebec, 2004) • Panel Conditioning (Lazarsfeld, 1940; Warren and Halpern-Manners, 2012) – Rise of panels for basic social research (Keeter et al., 2015) – Survey memory is short (Groves, 1986) Data Accuracy: Survey induced error Social Network Data
  • 65. Data Accuracy: Survey induced error Social Network Data
  • 66. Source: Clergy Health Panel Survey 2008 Probability Respondent Names 5 Confidants Data Accuracy: Survey induced error Social Network Data
  • 67. Data Accuracy: Survey induced error Social Network Data
  • 68. Whatever method is used, data will always be incomplete. What are the implications for analysis? Example 1. Ego is a matchable person in the School. [Figure: true network vs. observed network around Ego, with alters labeled M, Out, and Un.] Social Network Data Effects of missing data
  • 69. Example 2. Ego is not on the school roster. [Figure: true network vs. observed network, with nodes labeled M and Un.] Social Network Data Effects of missing data
  • 70. Example 3. Node population: 2-step neighborhood of Actor X. Relational population: Any connection among all nodes. [Figure: block adjacency matrix over focal (F), 1-step, 2-step, and 3-step nodes, with each block marked Full, Full (0), or Unknown.] Social Network Data Effects of missing data
  • 71. Example 4. Node population: 2-step neighborhood of Actor X. Relational population: Trace, plus all connections among 1-step contacts. [Figure: block adjacency matrix over focal (F), 1-step, 2-step, and 3-step nodes, with each block marked Full, Full (0), or Unknown.] Social Network Data Effects of missing data
  • 72. Example 5. Node population: 2-step neighborhood of Actor X. Relational population: Only tracing contacts. [Figure: block adjacency matrix over focal (F), 1-step, 2-step, and 3-step nodes, with each block marked Full, Full (0), or Unknown.] Social Network Data Effects of missing data
  • 73. Example 6. Node population: 2-step neighborhood from 3 focal actors. Relational population: All relations among actors. [Figure: block adjacency matrix over Focal, 1-Step, 2-Step, and 3-Step nodes, with each block marked Full, Full (0), or Unknown.] Social Network Data Effects of missing data
  • 74. Example 7. Node population: 1-step neighborhood from 3 focal actors. Relational population: Only relations from focal nodes. [Figure: block adjacency matrix over Focal, 1-Step, 2-Step, and 3-Step nodes, with each block marked Full, Full (0), or Unknown.] Social Network Data Effects of missing data
  • 75. Social Network Data Effects of missing data on measures Smith & Moody, 2014, Smith, Morgan & Moody 2016 Identify the practical effect of missing data as a measurement error problem: induce error and evaluate effect. Randomly select nodes to delete, remove their edges & recalculate statistics of interest.
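The "induce error and evaluate effect" logic can be sketched in a few lines: delete a random share of nodes, recompute a statistic on what remains, and compare it with the true value. The example below uses degree on a seeded random graph and only the standard library. It is a toy illustration of the general procedure, not a reproduction of the Smith & Moody simulations, and the graph size, tie probability, and function names are all assumptions.

```python
import random


def degrees(edges, nodes):
    """Degree of each node, counting only edges with both endpoints in `nodes`."""
    deg = {n: 0 for n in nodes}
    for a, b in edges:
        if a in deg and b in deg:
            deg[a] += 1
            deg[b] += 1
    return deg


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def degree_correlation_after_deletion(nodes, edges, missing_share, rng):
    """Correlate true and observed degree among nodes that survive random deletion."""
    n_drop = int(round(missing_share * len(nodes)))
    dropped = set(rng.sample(sorted(nodes), n_drop))
    kept = [n for n in nodes if n not in dropped]
    true_deg = degrees(edges, nodes)   # degree in the full network
    obs_deg = degrees(edges, kept)     # degree after removing dropped nodes and their edges
    return pearson([true_deg[n] for n in kept], [obs_deg[n] for n in kept])


# Hypothetical "true" network: 200 nodes, each pair tied with probability 0.05.
rng = random.Random(0)
nodes = list(range(200))
edges = [(a, b) for a in nodes for b in nodes if a < b and rng.random() < 0.05]

for share in (0.1, 0.3, 0.5):
    r = degree_correlation_after_deletion(nodes, edges, share, rng)
    print(share, round(r, 3))
```

In practice one would repeat each deletion level many times and average, and swap in whichever statistic (centrality, homophily, topology) matters for the analysis.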
  • 76. Social Network Data Effects of missing data on measures Smith & Moody, 2014
  • 77. Social Network Data Effects of missing data on measures Smith & Moody, 2014 Centrality
  • 78. Social Network Data Effects of missing data on measures Smith & Moody, 2014 Homophily
  • 79. Social Network Data Effects of missing data on measures Smith, Moody & Morgan, 2016 We redid the simulation study with non-random missingness. The results are (as expected) a bit more complicated, but the general trend is still largely good (well, except for centrality scores of course!). To make this more useful, we constructed a network bias calculator, which allows researchers to specify the amount and form of missing data, to see how it affects their results. http://www.soc.duke.edu/~jmoody77/missingdata/NetworkBiasCalculator_Jan262017.jar Smith, Jeff, Jonathan Morgan and James Moody. 2016. “Network Sampling Coverage II: The Effect of Non-random Missing Data on Network Measurement” Social Networks 48:78-99.
  • 80. Social Network Data Effects of missing data on measures Smith, Moody & Morgan, 2016 Here we assume:
  • 81. Social Network Data Effects of missing data on measures Smith, Moody & Morgan, 2016 And find that: Centrality Topology Homophily “We thus expect the correlation between the true and observed in-degree to be biased by 3.5%”
  • 82. Social Network Data Effects of missing data on measures What to do about missing data? Easy: • Do nothing. If the associated error is small, ignore it. This is the default, and not particularly satisfying. Harder: Impute ties • If the relation has known constraints, use those (symmetry, for example) • If there is a clear association, you can use it to impute values. • If imputing and you can use a randomization routine, do so (akin to multiple imputation routines) • All ad hoc. Hardest: • Model missingness with ERGM/Latent-network models. • Build a model for tie formation on the observed data, include the structurally missing ties & impute. Handcock & Gile have new routines for this. • Computationally intensive…but analytically not difficult.
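The simplest of the "harder" options, imputing from a known constraint, can be sketched for a relation assumed to be symmetric: if a respondent names a non-respondent, the reverse tie is filled in. The function name and toy data are illustrative assumptions; this is one ad hoc rule among those the slide lists, not a recommended default.

```python
def impute_by_symmetry(nominations, respondents):
    """Fill in ties for non-respondents, assuming the relation is symmetric.

    nominations: dict mapping each respondent to the people they named.
    respondents: set of people who actually completed the survey.
    Returns a dict of sets; non-respondents gain an imputed tie to anyone who named them.
    """
    imputed = {ego: set(alters) for ego, alters in nominations.items()}
    for ego, alters in nominations.items():
        for alter in alters:
            if alter not in respondents:
                # Non-respondent's outgoing ties are unobserved: mirror the incoming one.
                imputed.setdefault(alter, set()).add(ego)
    return imputed


# Hypothetical data: "c" was named but never answered the survey.
nominations = {"a": ["b", "c"], "b": ["a"]}
respondents = {"a", "b"}
filled = impute_by_symmetry(nominations, respondents)
print({ego: sorted(alters) for ego, alters in filled.items()})
```

Ties among two non-respondents remain unknown under this rule, which is where the model-based approaches on this slide come in.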
  • 83. Summary: Data collection design & missing data affect the information at hand to draw conclusions about the system. Everything we do from now on is built on some manipulation of the observed adjacency matrix, so we want to understand which conclusions are valid and which are invalid given systematic distortions in the data. Statistical modeling tools hold promise. We can build models of networks that account for missing data – we are able to “fix” the structural zeros in our models by treating them as given. This then lets us infer to the world of all graphs with that same missing data structure. These models are very new, and not widely available yet…. Social Network Data Network Data Sources: Missing Data