Networks
APAM E4990
Modeling Social Data
Jake Hofman
Columbia University
April 7, 2017
Jake Hofman (Columbia University) Networks April 7, 2017 1 / 16
History
∼1930s: Relationships as networks
Moreno (1933)
∼1960s: Random graph theory
$p > \frac{(1 + \epsilon) \ln n}{n}$
Erdős & Rényi (1959)
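As a rough illustration of the threshold above, the sketch below samples G(n, p) random graphs with p set below and above (1 + ε) ln n / n and checks whether each sample is connected. The function names (`connectivity_threshold`, `sample_gnp`, `is_connected`), the graph size, and the values of ε are illustrative assumptions, not part of the original slides.

```python
import math
import random
from collections import deque


def connectivity_threshold(n, eps):
    """Return (1 + eps) * ln(n) / n, the bound on p shown on the slide."""
    return (1 + eps) * math.log(n) / n


def sample_gnp(n, p, rng):
    """Sample an undirected G(n, p) graph as an adjacency list (dict of sets)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is linked independently with probability p
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """Breadth-first search from one node; connected if every node is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


rng = random.Random(0)  # fixed seed so runs are repeatable
n = 200                 # assumed graph size, small enough to run quickly
for eps in (-0.5, 0.5):  # one value below and one above the threshold
    p = connectivity_threshold(n, eps)
    trials = [is_connected(sample_gnp(n, p, rng)) for _ in range(10)]
    print(f"eps={eps:+.1f}  p={p:.4f}  connected in {sum(trials)}/10 trials")
```

For large n, samples drawn above the threshold should almost always be connected, while samples well below it usually contain isolated nodes.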
∼1970s: Clustering, weak ties
Granovetter (1973)
∼1970s: Cumulative advantage
have never been cited, about 10 percent have been cited once, about 9 percent twice, and so on, the percentages slowly decreasing, so that half of all papers will be cited eventually five times or more, and a quarter of all papers, ten
would prove so distinctive that they could be picked automatically by means of citation-index-production procedures and published as a single U.S. (or World) Journal of Really Important Papers.
[Figure: 100 old papers in field (40 papers not cited in year, 50 papers cited once, 10 cited more than once); 91 references; 10 miscellaneous from outside field]
Fig. 3. Idealized representation of the balance of papers and citations for a given
“almost closed” field in a single year. It is assumed that the field consists of 100
papers whose numbers have been growing exponentially at the normal rate. If we
assume that each of the seven new papers contains about 13 references to journal
papers and that about 11 percent of these 91 cited papers (or ten papers) are outside
the field, we find that 50 of the old papers are connected by one citation each to the
new papers (these links are not shown) and that 40 of the old papers are not cited
at all during the year. The seven new papers, then, are linked to ten of the old ones
by the complex network shown here.
relation, if one exists, is very small. Certainly, there is no strong tendency for review papers to be cited unusually often. If my conjecture is valid, it is worth noting that, since 10 percent of all papers contain no bibliographic references and another, presumably almost independent, 10 percent of all papers are never cited, it follows that there is a lower bound of 1 percent of all papers on the number of papers that are totally disconnected in a pure citation network and could be found only by topical indexing or similar methods; this is a very small class, and probably a most unimportant one.
The balance of references and citations in a single year indicates one very important attribute of the network (see Fig. 3). Although most papers produced in the year contain a near-average number of bibliographic references, half of these are references to about half of all the papers that have been published in previous years. The other half of the references tie these new papers to a quite small group of earlier ones, and generate a rather tight pattern of multiple relationships. Thus each group of new papers is “knitted” to a small, select part of the existing scientific literature but connected rather weakly and randomly to a much greater part. Since only a small part of the earlier literature is knitted together by the new year’s crop of papers, we may look upon this small part as a sort of growing tip or epidermal layer, an active research front. I believe it is the existence of a research front, in this sense, that distinguishes the sciences from the rest of scholarship, and, because of it, I propose that one of the major tasks of statistical analysis is to determine the mechanism that enables science to cumulate so much faster than nonscience that it produces a literature crisis.
An analysis of the distribution of publication dates of all papers cited in a single year (Fig. 4) sheds further light on the existence of such a research front. Taking [from Garfield (2)] data for 1961, the most numerous count
SCIENCE, VOL. 149
de Solla Price (1965, 1976)
∼1970s: Cumulative advantage
[Figure: distribution plot with axis ticks at 1, 10, and 100. The one legible text fragment reads: for n = 29,655 we have m = 0.53.]
Fig. 1. Number of papers with (a) exactly and (b) at least n citations in ¼, 1, and 5-year indexes.
Information Science, September-October 1976
de Solla Price (1965, 1976)
∼1990s: Small-world networks
Watts & Strogatz (1998)
∼1990s: Empirical structure and dynamics of networks
Newman, Barabási, Watts (2006)
∼2000s: Homophily, contagion, and all that
Figure 1: Community structure of political blogs (expanded set), shown using the GUESS visualization and analysis tool [2]. The colors reflect political orientation, red for conservative, and blue for liberal. Orange links go from liberal to conservative, and purple ones from conservative to liberal. The size of each blog reflects the number of other blogs that link to it.
Because of bloggers’ ability to identify and frame breaking news, many mainstream media sources keep a close eye on the best known political blogs. A number of mainstream news sources have started to discuss and even to host blogs.
neighborhoods of Atrios, a popular liberal blog, and Instapundit, a popular conservative blog. He found the Instapundit neighborhood to include many more blogs than the Atrios one, and observed no overlap in the URLs cited
Adamic & Glance (2005)
Types of networks
Networks are a useful abstraction for many different types of data
• Social networks (e.g., Facebook)
• Information networks (e.g., the Web)
• Activity networks (e.g., email)
• Biological networks (e.g., protein interactions)
• Geographical networks (e.g., roads)
Representations
There are many different levels of abstraction for representing
networks (e.g., directed, weighted, metadata, etc.)
Figure 2.1: Two graphs on the 4 nodes A, B, C, and D: (a) an undirected graph, and (b) a directed graph.
will be undirected unless noted otherwise.
Graphs as Models of Networks. Graphs are useful because they serve as mathematical
models of network structures. With this in mind, it is useful before going further to replace
the toy examples in Figure 2.1 with a real example. Figure 2.2 depicts the network structure
Relational Topic Models for Document Networks
[Figure: a citation network of numbered documents, with abstract excerpts shown for these example documents: "Irrelevant features and the subset selection problem"; "Learning with many irrelevant features"; "Evaluation and selection of biases in machine learning"; "Utilizing prior concepts for learning"; "Improving tactical plans with genetic algorithms"; "An evolutionary approach to learning in robots"; "Using a genetic algorithm to learn strategies for collision avoidance and local navigation"]
Figure 1: Example data appropriate for the relational topic model. Each document is represented as a bag of words and
linked to other documents via citation. The RTM defines a joint distribution over the words in each document and the
citation links between them.
The RTM is based on latent Dirichlet allocation (LDA) (Blei et al. 2003). LDA is a generative probabilistic model that uses a set of “topics,” distributions over a fixed vocabulary
Figure 2 illustrates the graphical model for this process for a single pair of documents. The full model, which is difficult to illustrate, contains the observed words from all D
Which network?
[Figure panels: All Friends; Maintained Relationships; One-way Communication; Mutual Communication]
Figure 3.8: Four different views of a Facebook user’s network neighborhood, showing the structure of links corresponding respectively to all declared friendships, maintained relationships, one-way communication, and reciprocal (i.e. mutual) communication. (Image from [281].)
Notice that these three categories are not mutually exclusive; indeed, the links classified as reciprocal communication always belong to the set of links classified as one-way communication.
Which network?
Figure 20.12: The pattern of e-mail communication among 436 employees of Hewlett Packard Research Lab is superimposed on the official organizational hierarchy, showing how network links span different social foci [6]. (Image from http://www-personal.umich.edu/~ladamic/img/hplabsemailhierarchy.jpg)
Social Foci and Social Distance. When we first discussed the Watts-Strogatz model in
Which network?
Figure 1: Topology of the largest components over various choices of threshold conditions for (a) a dataset
based on email server logs at a US university, and (b) the Enron email corpus. Significant changes in topology
are observed as the thresholding condition of the network is varied.
where alternative definitions are considered [15, 17], the purpose is exclusively to serve as a robustness check on the findings; thus the scope of possibilities is typically limited to within some range of the original choice of threshold. Most closely related to the current work are two recent studies using mobile phone data [27, 9]. In [27], the authors systematically deleted edges as a function of call frequency in order to investigate the connectivity of the network, and its impact
The emails contain encrypted IDs of the sender and recipient(s) of each email and the timestamp, but do not contain the content. The dataset also features several (anonymized) personal attributes, including status, gender, age, departmental affiliation, number of years in the community, dorm and home zipcode information for the students, as well as course affiliations for the students at each semester.
In order to focus on a population of users who use emails
WWW 2010 • Full Paper April 26-30 • Raleigh • NC • USA
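The effect of a threshold condition on network topology can be sketched as follows. This is a minimal illustration that assumes an edge exists when a pair has exchanged at least a given number of messages; the message log, the `build_edges` helper, and the threshold values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_edges(messages, min_count):
    """Return undirected edges for pairs that exchanged at least min_count messages.

    messages: iterable of (sender, recipient) pairs.
    """
    counts = Counter()
    for sender, recipient in messages:
        if sender != recipient:
            # Ignore direction: count messages per unordered pair.
            counts[frozenset((sender, recipient))] += 1
    return sorted(tuple(sorted(pair)) for pair, c in counts.items() if c >= min_count)


# Hypothetical message log of (sender, recipient) pairs.
messages = [
    ("a", "b"), ("b", "a"), ("a", "b"),
    ("a", "c"),
    ("c", "d"), ("d", "c"),
]

# Raising the threshold removes weaker ties and can split the network.
for threshold in (1, 2, 3):
    print(threshold, build_edges(messages, threshold))
```

Even in this toy log, moving the threshold from 1 to 2 disconnects the pair (a, b) from the pair (c, d), which is the kind of topology change the caption describes.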
Data structures
[ [0,1], [0,6], [0,8], [1,4], [1,6],
[1,9], [2,4], [2,6], [3,4], [3,5],
[3,8], [4,5], [4,9], [7,8], [7,9] ]
Simple for storage, but difficult
to compute with
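The edge list on this slide can be written directly as Python data. The sketch below, with an illustrative `neighbors` helper that is not part of the slides, shows one reason it is awkward to compute with: answering a question about a single node means scanning every edge.

```python
# Edge list from the slide: each pair [u, v] is one undirected edge.
edges = [[0, 1], [0, 6], [0, 8], [1, 4], [1, 6],
         [1, 9], [2, 4], [2, 6], [3, 4], [3, 5],
         [3, 8], [4, 5], [4, 9], [7, 8], [7, 9]]


def neighbors(edges, node):
    """Scan the whole edge list to collect one node's neighbors: O(|E|) per query."""
    result = []
    for u, v in edges:
        if u == node:
            result.append(v)
        elif v == node:
            result.append(u)
    return result


print(len(edges))           # number of edges stored
print(neighbors(edges, 4))  # every edge is inspected to answer this
```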
Data structures
Adjacency matrix
Quick to check edges, good for
linear algebra, often sparse
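A minimal sketch of these properties, assuming the 10-node edge list from the earlier Data structures slide. Plain nested lists stand in for a matrix here so the example has no dependencies; in practice a numerical or sparse-matrix library would typically be used. The helper names `adjacency_matrix` and `matmul` are illustrative.

```python
edges = [[0, 1], [0, 6], [0, 8], [1, 4], [1, 6],
         [1, 9], [2, 4], [2, 6], [3, 4], [3, 5],
         [3, 8], [4, 5], [4, 9], [7, 8], [7, 9]]
n = 10  # nodes are labeled 0..9 in the edge list


def adjacency_matrix(edges, n):
    """Build a symmetric n x n 0/1 matrix for an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A


def matmul(X, Y):
    """Plain-Python product of two square matrices of the same size."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


A = adjacency_matrix(edges, n)

# Quick edge check: a single lookup instead of a scan.
print(A[0][8] == 1)

# Linear algebra: entry (i, j) of A squared counts walks of length 2 from i to j,
# so its diagonal gives each node's degree.
A2 = matmul(A, A)
print([A2[i][i] for i in range(n)])

# Sparsity: fraction of entries that are nonzero.
print(sum(map(sum, A)) / (n * n))
```

This toy graph is fairly dense for its size; in large real networks the nonzero fraction is usually far smaller, which is why sparse storage matters.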
Data structures
Adjacency list
Good for graph traversal
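A minimal sketch of an adjacency list, again assuming the edge list from the earlier Data structures slide. Each node maps to the set of its neighbors, so a traversal only ever touches the edges of the node it is currently visiting. The names `adjacency_list` and `depth_first_order` are illustrative.

```python
edges = [[0, 1], [0, 6], [0, 8], [1, 4], [1, 6],
         [1, 9], [2, 4], [2, 6], [3, 4], [3, 5],
         [3, 8], [4, 5], [4, 9], [7, 8], [7, 9]]


def adjacency_list(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def depth_first_order(adj, start):
    """Return the nodes reachable from start in the order a depth-first traversal visits them."""
    seen, order, stack = set(), [], [start]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        # Push neighbors in reverse-sorted order so smaller labels are popped first.
        stack.extend(sorted(adj[u], reverse=True))
    return order


adj = adjacency_list(edges)
print(sorted(adj[4]))             # neighbors of node 4, found without scanning all edges
print(depth_first_order(adj, 0))  # traversal starting from node 0
```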
Describing networks
Descriptive statistics
• Degree: How many connections does a node have?
• Path length: What’s the shortest path between two nodes?
• Clustering: How many friends of friends are also friends?
• Components: How many disconnected parts does the network
have?
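As a concrete instance of the first statistic, the sketch below computes each node's degree and the resulting degree distribution for the edge list from the Data structures slide. The function names are illustrative assumptions.

```python
from collections import Counter

edges = [[0, 1], [0, 6], [0, 8], [1, 4], [1, 6],
         [1, 9], [2, 4], [2, 6], [3, 4], [3, 5],
         [3, 8], [4, 5], [4, 9], [7, 8], [7, 9]]


def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(deg):
    """Map each degree k to the number of nodes with exactly k connections."""
    return Counter(deg.values())


deg = degrees(edges)
print(sorted(deg.items()))                       # (node, degree) pairs
print(sorted(degree_distribution(deg).items()))  # (degree, number of nodes) pairs
```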
Algorithms for Descriptive statistics
• Degree: How many connections does a node have?
→ Degree distributions
• Path length: What’s the shortest path between two nodes?
→ Breadth first search
• Clustering: How many friends of friends are also friends?
→ Triangle counting
• Components: How many disconnected parts does the network
have?
→ Connected components
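The remaining three algorithms can be sketched on an adjacency list as follows. This is one straightforward way to write them, assuming an undirected graph stored as a dict of neighbor sets and using the edge list from the Data structures slide. The function names are illustrative, and the clustering measure used here is the global one (3 × triangles / connected triples), which is an assumption about which variant is meant.

```python
from collections import deque

edges = [[0, 1], [0, 6], [0, 8], [1, 4], [1, 6],
         [1, 9], [2, 4], [2, 6], [3, 4], [3, 5],
         [3, 8], [4, 5], [4, 9], [7, 8], [7, 9]]


def adjacency_list(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Breadth-first search: shortest path length in hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def count_triangles(adj):
    """Count each triangle u < v < w once via common neighbors of the edge (u, v)."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def global_clustering(adj):
    """Fraction of connected triples that are closed: 3 * triangles / triples."""
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    return 3 * count_triangles(adj) / triples if triples else 0.0


def connected_components(adj):
    """Group nodes into components by running BFS from each unvisited node."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


adj = adjacency_list(edges)
print(bfs_distances(adj, 0))           # hops from node 0 to every other node
print(count_triangles(adj))            # number of triangles
print(global_clustering(adj))          # share of friend-of-friend pairs that are friends
print(len(connected_components(adj)))  # number of disconnected parts
```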

 

Modeling Social Data, Lecture 10: Networks

  • 1. Networks APAM E4990 Modeling Social Data Jake Hofman Columbia University April 7, 2017 Jake Hofman (Columbia University) Networks April 7, 2017 1 / 16
  • 2. History
  • 3. ∼1930s: Relationships as networks. Moreno (1933)
  • 4. ∼1960s: Random graph theory. $p > \frac{(1 + \epsilon) \ln n}{n}$. Erdős & Rényi (1959)
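The inequality on this slide, p > (1 + ε) ln n / n, is the classic connectivity threshold for the Erdős-Rényi random graph G(n, p): if each pair of n nodes is linked independently with probability p, the graph is connected with high probability once p exceeds this value, and disconnected with high probability when p falls below (1 − ε) ln n / n. The following is a minimal simulation sketch of that behavior. The function names, the choice of n, and the trial counts are illustrative assumptions, not taken from the lecture.

```python
import math
import random


def sample_gnp(n, p, rng):
    """Sample an Erdős-Rényi graph G(n, p) as a dict of neighbor sets."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is linked independently
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """Check connectivity with a depth-first traversal from an arbitrary node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def fraction_connected(n, p, trials=20, seed=0):
    """Estimate the probability that G(n, p) is connected."""
    rng = random.Random(seed)
    hits = sum(is_connected(sample_gnp(n, p, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n = 100
    threshold = math.log(n) / n
    for factor in (0.5, 1.0, 1.5, 2.0):
        frac = fraction_connected(n, factor * threshold)
        print(f"p = {factor:.1f} * ln(n)/n: connected in {frac:.0%} of trials")
```

With small n the transition is gradual rather than sharp; the threshold statement is asymptotic in n.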
  • 5. ∼1970s: Clustering, weak ties. Granovetter (1973)
  • 6. ∼1970s: Clustering, weak ties. Granovetter (1973)
  • 7. ∼1970s: Cumulative advantage. de Solla Price (1965, 1976). Excerpt from de Solla Price (1965), Science, vol. 149:
    "... have never been cited, about 10 percent have been cited once, about 9 percent twice, and so on, the percentages slowly decreasing, so that half of all papers will be cited eventually five times or more, and a quarter of all papers, ten ..."
    "... would prove so distinctive that they could be picked automatically by means of citation-index-production procedures and published as a single U.S. (or World) Journal of Really Important Papers."
    [Figure: 100 old papers in the field, of which 40 are not cited in the year, 50 are cited once, and 10 are cited more than once; 91 references from the new papers, 10 of them miscellaneous references from outside the field.]
    "Fig. 3. Idealized representation of the balance of papers and citations for a given "almost closed" field in a single year. It is assumed that the field consists of 100 papers whose numbers have been growing exponentially at the normal rate. If we assume that each of the seven new papers contains about 13 references to journal papers and that about 11 percent of these 91 cited papers (or ten papers) are outside the field, we find that 50 of the old papers are connected by one citation each to the new papers (these links are not shown) and that 40 of the old papers are not cited at all during the year. The seven new papers, then, are linked to ten of the old ones by the complex network shown here."
    "... relation, if one exists, is very small. Certainly, there is no strong tendency for review papers to be cited unusually often. If my conjecture is valid, it is worth noting that, since 10 percent of all papers contain no bibliographic references and another, presumably almost independent, 10 percent of all papers are never cited, it follows that there is a lower bound of 1 percent of all papers on the number of papers that are totally disconnected in a pure citation network and could be found only by topical indexing or similar methods; this is a very small class, and probably a most unimportant one."
    "The balance of references and citations in a single year indicates one very important attribute of the network (see Fig. 3). Although most papers produced in the year contain a near-average number of bibliographic references, half of these are references to about half of all the papers that have been published in previous years. The other half of the references tie these new papers to a quite small group of earlier ones, and generate a rather tight pattern of multiple relationships. Thus each group of new papers is "knitted" to a small, select part of the existing scientific literature but connected rather weakly and randomly to a much greater part. Since only a small part of the earlier literature is knitted together by the new year's crop of papers, we may look upon this small part as a sort of growing tip or epidermal layer, an active research front. I believe it is the existence of a research front, in this sense, that distinguishes the sciences from the rest of scholarship, and, because of it, I propose that one of the major tasks of statistical analysis is to determine the mechanism that enables science to cumulate so much faster than nonscience that it produces a literature crisis. An analysis of the distribution of publication dates of all papers cited in a single year (Fig. 4) sheds further light on the existence of such a research front. Taking [from Garfield (2)] data for 1961, the most numerous count ..."
  • 8. ∼1970s: Cumulative advantage. de Solla Price (1965, 1976). [Figure from de Solla Price (1976), Journal of the American Society for Information Science, September-October 1976.] Fig. 1. Number of papers with (a) exactly and (b) at least n citations in ¼-, 1-, and 5-year indexes.
  • 9. ∼1990s: Small-world networks. Watts & Strogatz (1998)
  • 10. ∼1990s: Empirical structure and dynamics of networks. Newman, Barabási, Watts (2006)
  • 11. ∼2000s: Homophily, contagion, and all that. Adamic & Glance (2005)
    Figure 1: Community structure of political blogs (expanded set), shown using the GUESS visualization and analysis tool [2]. The colors reflect political orientation, red for conservative, and blue for liberal. Orange links go from liberal to conservative, and purple ones from conservative to liberal. The size of each blog reflects the number of other blogs that link to it.
    "Because of bloggers' ability to identify and frame breaking news, many mainstream media sources keep a close eye on the best known political blogs. A number of mainstream news sources have started to discuss and even to host blogs."
    "... neighborhoods of Atrios, a popular liberal blog, and Instapundit, a popular conservative blog. He found the Instapundit neighborhood to include many more blogs than the Atrios one, and observed no overlap in the URLs cited ..."
  • 12. Types of networks
  • 13. Types of networks. Networks are a useful abstraction for many different types of data: • Social networks (e.g., Facebook) • Information networks (e.g., the Web) • Activity networks (e.g., email) • Biological networks (e.g., protein interactions) • Geographical networks (e.g., roads)
  • 14. Representations. There are many different levels of abstraction for representing networks (e.g., directed, weighted, metadata, etc.)
    [Textbook page, Chapter 2: Graphs. Figure 2.1: Two graphs on 4 nodes (A, B, C, D): (a) an undirected graph, and (b) a directed graph.]
    "... will be undirected unless noted otherwise. Graphs as Models of Networks. Graphs are useful because they serve as mathematical models of network structures. With this in mind, it is useful before going further to replace the toy examples in Figure 2.1 with a real example. Figure 2.2 depicts the network structure ..."
  • 15. Representations. There are many different levels of abstraction for representing networks (e.g., directed, weighted, metadata, etc.) [Textbook page, Section 2.2: Paths and Connectivity.]
  • 16. Representations. There are many different levels of abstraction for representing networks (e.g., directed, weighted, metadata, etc.)
    [Figure from "Relational Topic Models for Document Networks": a citation network of numbered documents, several shown with their titles and the opening lines of their abstracts.]
    Figure 1: Example data appropriate for the relational topic model. Each document is represented as a bag of words and linked to other documents via citation. The RTM defines a joint distribution over the words in each document and the citation links between them.
    "The RTM is based on latent Dirichlet allocation (LDA) (Blei et al. 2003). LDA is a generative probabilistic model that uses a set of "topics," distributions over a fixed vocabulary ..."
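As a rough illustration of these levels of abstraction, the sketch below stores one small network as weighted, directed edges with metadata, then derives the plainer directed and undirected views from it. The node names, weights, and the `Edge` class are hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A directed, weighted edge with optional metadata (hypothetical schema)."""
    source: str
    target: str
    weight: float = 1.0
    metadata: dict = field(default_factory=dict)


# Made-up example data: who contacted whom, and how often.
edges = [
    Edge("A", "B", weight=3, metadata={"type": "email"}),
    Edge("B", "A", weight=1, metadata={"type": "email"}),
    Edge("B", "C", weight=2, metadata={"type": "email"}),
    Edge("D", "C", weight=5, metadata={"type": "email"}),
]


def as_directed(edges):
    """Drop weights and metadata, keep direction."""
    return {(e.source, e.target) for e in edges}


def as_undirected(edges):
    """Drop direction as well: each link becomes an unordered pair."""
    return {frozenset((e.source, e.target)) for e in edges}


directed = as_directed(edges)
undirected = as_undirected(edges)
print(len(edges), len(directed), len(undirected))
```

Each step down discards information: the two directed edges between A and B collapse into a single undirected link.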
  • 17. Which network?
    [Textbook page, Section 3.4: Tie Strength, Social Media, and Passive Engagement. Figure panels: All Friends, Maintained Relationships, One-way Communication, Mutual Communication.]
    Figure 3.8: Four different views of a Facebook user's network neighborhood, showing the structure of links corresponding respectively to all declared friendships, maintained relationships, one-way communication, and reciprocal (i.e. mutual) communication. (Image from [281].)
    "Notice that these three categories are not mutually exclusive; indeed, the links classified as reciprocal communication always belong to the set of links classified as one-way communication."
  • 18. Which network?
    [Textbook page, Chapter 20: The Small-World Phenomenon.]
    Figure 20.12: The pattern of e-mail communication among 436 employees of Hewlett Packard Research Lab is superimposed on the official organizational hierarchy, showing how network links span different social foci [6]. (Image from http://www-personal.umich.edu/~ladamic/img/hplabsemailhierarchy.jpg)
    "Social Foci and Social Distance. When we first discussed the Watts-Strogatz model in ..."
  • 19. Which network? (Excerpt from a WWW 2010 full paper, April 26-30, Raleigh, NC, USA.)
    Figure 1: Topology of the largest components over various choices of threshold conditions for (a) a dataset based on email server logs at a US university, and (b) the Enron email corpus. Significant changes in topology are observed as the thresholding condition of the network is varied.
    "... where alternative definitions are considered [15, 17], the purpose is exclusively to serve as a robustness check on the findings; thus the scope of possibilities is typically limited to within some range of the original choice of threshold. Most closely related to the current work are two recent studies using mobile phone data [27, 9]. In [27], the authors systematically deleted edges as a function of call frequency in order to investigate the connectivity of the network, and its impact ..."
    "The emails contain encrypted IDs of the sender and recipient(s) of each email and the timestamp, but do not contain the content. The dataset also features several (anonymized) personal attributes, including status, gender, age, departmental affiliation, number of years in the community, dorm and home zipcode information for the students, as well as course affiliations for the students at each semester. In order to focus on a population of users who use emails ..."
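The figure caption on this slide notes that the inferred network changes as the thresholding condition varies. A minimal sketch of that idea follows, using a made-up message log and assuming, purely for illustration, that a tie exists when a pair has exchanged at least `min_messages` messages in total. The names, counts, and the exact threshold rule are hypothetical and are not taken from the cited datasets.

```python
from collections import Counter

# Hypothetical (sender, recipient) message log.
messages = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"), ("ann", "bob"),
    ("bob", "cat"), ("cat", "bob"),
    ("cat", "dan"),
    ("ann", "dan"), ("dan", "ann"), ("ann", "dan"),
]


def threshold_network(messages, min_messages):
    """Keep an undirected edge for each pair exchanging at least min_messages."""
    counts = Counter(frozenset(pair) for pair in messages)
    return {pair for pair, count in counts.items() if count >= min_messages}


for t in (1, 2, 3, 4):
    edges = threshold_network(messages, t)
    print(t, sorted(tuple(sorted(e)) for e in edges))
```

Raising the threshold from 1 to 4 shrinks this toy network from four ties to one, so any downstream statistic depends on which definition of a tie is chosen.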
  • 20. Data structures. Edge list: [ [0,1], [0,6], [0,8], [1,4], [1,6], [1,9], [2,4], [2,6], [3,4], [3,5], [3,8], [4,5], [4,9], [7,8], [7,9] ]. Simple for storage, but difficult to compute with
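To see why a bare list of pairs is awkward to compute with, consider looking up a node's neighbors: every query has to scan all of the edges. A minimal sketch using the list from this slide (the function name is illustrative, and the edges are assumed to be undirected):

```python
# List of pairs from the slide: each [u, v] is treated as an undirected edge.
EDGES = [
    [0, 1], [0, 6], [0, 8], [1, 4], [1, 6], [1, 9], [2, 4], [2, 6],
    [3, 4], [3, 5], [3, 8], [4, 5], [4, 9], [7, 8], [7, 9],
]


def neighbors(edges, node):
    """Return the sorted neighbors of `node` by scanning every edge: O(m) per query."""
    found = set()
    for u, v in edges:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    return sorted(found)


print(neighbors(EDGES, 4))  # [1, 2, 3, 5, 9]
```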
  • 21. Data structures. Adjacency matrix: Quick to check edges, good for linear algebra, often sparse
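A minimal sketch of the adjacency-matrix representation for the ten-node example from the previous slide, written with plain Python lists so that it is self-contained. For real data a sparse-matrix library would be the more usual choice; that is a suggestion here, not something the slides specify. The example shows a constant-time edge check, degrees as row sums, how sparse the matrix is, and one linear-algebra use: counting triangles from the diagonal of A³.

```python
EDGES = [
    [0, 1], [0, 6], [0, 8], [1, 4], [1, 6], [1, 9], [2, 4], [2, 6],
    [3, 4], [3, 5], [3, 8], [4, 5], [4, 9], [7, 8], [7, 9],
]
N = 10  # nodes are labeled 0..9 in the example


def adjacency_matrix(edges, n):
    """Build a symmetric 0/1 matrix: A[u][v] == 1 iff u and v are linked."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A


def matmul(X, Y):
    """Plain-Python matrix product, adequate for small examples."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = adjacency_matrix(EDGES, N)

has_edge = A[1][4] == 1            # constant-time edge check
degrees = [sum(row) for row in A]  # row sums give node degrees
density = sum(degrees) / (N * N)   # fraction of nonzero entries

# Entry (i, i) of A^3 counts closed walks of length 3 starting at i; each
# triangle contributes 6 such walks (3 starting nodes x 2 directions).
A3 = matmul(matmul(A, A), A)
triangles = sum(A3[i][i] for i in range(N)) // 6

print(has_edge, degrees, density, triangles)
```

Only 30 of the 100 entries are nonzero here, and the fraction shrinks quickly for larger real-world networks, which is why sparse storage matters.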
  • 22. Data structures. Adjacency list: Good for graph traversal
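A minimal sketch of the adjacency-list representation, built from the ten-node example on the earlier data-structures slide. Each node maps to the set of its neighbors, so a traversal only touches the edges that actually exist. The helper names are illustrative, and depth-first traversal is used here simply as one example of a traversal.

```python
from collections import defaultdict

EDGES = [
    [0, 1], [0, 6], [0, 8], [1, 4], [1, 6], [1, 9], [2, 4], [2, 6],
    [3, 4], [3, 5], [3, 8], [4, 5], [4, 9], [7, 8], [7, 9],
]


def adjacency_list(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def reachable(adj, start):
    """Collect every node reachable from `start` with an iterative depth-first traversal."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:  # only iterates over u's actual neighbors
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


adj = adjacency_list(EDGES)
print(sorted(adj[4]))          # neighbors of node 4
print(len(reachable(adj, 0)))  # number of nodes reachable from node 0
```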
  • 23. Describing networks
  • 24. Descriptive statistics • Degree: How many connections does a node have? • Path length: What's the shortest path between two nodes? • Clustering: How many friends of friends are also friends? • Components: How many disconnected parts does the network have?
  • 25. Algorithms for descriptive statistics • Degree: How many connections does a node have? → Degree distributions • Path length: What's the shortest path between two nodes? → Breadth first search • Clustering: How many friends of friends are also friends? → Triangle counting • Components: How many disconnected parts does the network have? → Connected components
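The four algorithms named above can be sketched on the ten-node example from the data-structures slides. This is a minimal illustration under the assumption of an undirected, unweighted graph stored as an adjacency list; the function names are chosen here and are not from the lecture.

```python
from collections import Counter, deque

EDGES = [
    [0, 1], [0, 6], [0, 8], [1, 4], [1, 6], [1, 9], [2, 4], [2, 6],
    [3, 4], [3, 5], [3, 8], [4, 5], [4, 9], [7, 8], [7, 9],
]


def build_adj(edges):
    """Adjacency list: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Degree: count how many nodes have each degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def bfs_distances(adj, source):
    """Path length: hops from `source` to every reachable node, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def count_triangles(adj):
    """Clustering: count each triangle u < v < w once via common neighbors."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def connected_components(adj):
    """Components: run BFS from each node not yet assigned to a component."""
    seen = set()
    components = []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


adj = build_adj(EDGES)
print(sorted(degree_distribution(adj).items()))
print(bfs_distances(adj, 0)[5])
print(count_triangles(adj))
print(len(connected_components(adj)))
```

In this example, node 5 is three hops from node 0, the graph contains three triangles, and all ten nodes fall in a single component.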