6. Bonacich, P. (2004).
The Invasion of the Physicists. Social Networks 26(3): 285-288
Graph structure in the web
7. Introduction
Webometrics is broadly defined as the study of web-based
content (e.g., text, images, audio-visual objects, and
hyperlinks) with primarily quantitative indicators for
social science research goals and visualization techniques
derived from information science and social network
analysis.
8.
• Han Woo Park
- “hidden” and “relational” data about
lots of people as well as the few
individuals, or small groups
• Lev Manovich
- “surface” data about lots of people (i.e.,
statistical, mathematical or computational
techniques for analyzing data)
- “deep” data about the few individuals or small
groups (i.e., hermeneutics, participant
observation, thick description, semiotics, and
close reading)
9. First type of Webometrics
• Hyperlink Network Analysis
- Inter-linkage: who linked to whom matrix
- Co-inlink: a link to two different nodes from a third node
- Co-outlink: A link from two different nodes to a third node
Björneborn (2003)
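The three link types above can be sketched on a toy who-linked-to-whom matrix. This is only an illustration: the site names and the matrix values are made up, and the helper names `co_inlinks` and `co_outlinks` are hypothetical, not taken from Björneborn (2003).

```python
# Hypothetical sites; A[i][j] == 1 means site i links to site j
# (the "who linked to whom" inter-linkage matrix).
sites = ["a", "b", "c", "d"]
A = [
    [0, 1, 1, 0],  # a -> b, a -> c
    [0, 0, 1, 0],  # b -> c
    [0, 0, 0, 0],  # c links to nobody
    [0, 1, 1, 0],  # d -> b, d -> c
]


def co_inlinks(adj, i, j):
    """Count third nodes k that link to both i and j."""
    return sum(
        1 for k in range(len(adj))
        if k not in (i, j) and adj[k][i] and adj[k][j]
    )


def co_outlinks(adj, i, j):
    """Count third nodes k that both i and j link to."""
    return sum(
        1 for k in range(len(adj))
        if k not in (i, j) and adj[i][k] and adj[j][k]
    )


# b and c are both linked from a and from d.
b_c_co_inlinks = co_inlinks(A, 1, 2)
# a and d both link to b and to c.
a_d_co_outlinks = co_outlinks(A, 0, 3)
```

Co-inlink counts are often read as a sign that third parties see two sites as related, while co-outlink counts suggest that two sites share interests.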
10. My First SSCI Research: 44 Websites
• Categorically selected sites
• Financial sites were the most central
• Revenue sources: advertising & e.c.
• Common payment: credit card
11. The future of social relations
The social benefits of internet use will far outweigh
the negatives over the next decade. They say this is
because email, social networks, and other online tools
offer ‘low‐friction’ opportunities to create, enhance,
and rediscover social ties that make a difference in
people’s lives.
Some 85% agreed with the statement:
“In 2020, when I look at the big picture and consider my
personal friendships, marriage and other relationships, I see
that the internet has mostly been a positive force on my social
world. And this will only grow more true in the future.”
“There's no escaping people anymore, and I believe that will
yield better relationships.” (Jeff Jarvis)
12. M. Castells (2009), Communication Power
1) Networking power: the power over who and what is
included in the network. ‘Mass self-communication’ is the use of
new media for private messages that are able to reach the masses.
2) Network power: the power of the protocols of network
communication. In mass self-communication the diversity of
formats is the rule, and this amplifies the diffusion of
messages.
3) Networked power: the power of certain nodes over other
nodes inside the network. This is the managerial, agenda-setting,
editorial and decision-making power in the
organizations that own or operate networks.
4) Network-making power: the capacity to set up and program
a network, of multimedia or traditional mass communication,
by its owners and controllers.
13. Given that social media connect individuals in
dramatically different ways, research questions include
these:
What do people talk about?
Who can see what?
Who can reply to whom?
How long is content visible?
What can link to what?
Who can link to whom?
Webometrics and Hyperlink Network Analysis can be
particularly useful for answering these questions.
14. Big Data and Social Webometrics Network Analysis
Increasing data size in terms of the no. of nodes

Level         No. of nodes      →
Micro         ≦ 100 nodes       10K
Meso          ≦ 1,000 nodes     1,000K
Macro         ≦ 10,000 nodes    100,000K
Super-Macro   ≥ 10,000 nodes    ∽

Source: Park, Han Woo (2014)
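The size classes above can be sketched as a small lookup. The slide does not say what the figures after the arrows measure; reading them as the number of cells in an n × n who-linked-to-whom matrix is an assumption, although it does reproduce 10K, 1,000K and 100,000K. The function names are hypothetical, and a network of exactly 10,000 nodes is assigned to "Macro" here because the slide's boundaries overlap at that value.

```python
def size_class(n_nodes):
    """Map a node count to the size class named on the slide."""
    if n_nodes <= 100:
        return "Micro"
    if n_nodes <= 1000:
        return "Meso"
    if n_nodes <= 10000:  # assumption: 10,000 itself counts as Macro
        return "Macro"
    return "Super-Macro"


def matrix_cells(n_nodes):
    """Cells in an n x n adjacency matrix (assumed meaning of the arrow column)."""
    return n_nodes * n_nodes


# Upper bound of each bounded class and its matrix size in thousands (K).
for upper in (100, 1000, 10000):
    print(size_class(upper), matrix_cells(upper) // 1000, "K")
```

The quadratic growth is the practical point: a tenfold increase in nodes means a hundredfold increase in potential ties to store and analyse.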
15. “Those studies perpetuate the idea that linking
behaviour is not random, and that links are ‘socially
significant in some way’. In this perspective, links
have an ‘information side-effect’, they can be used
to understand other facts even though they were
not individually designed to do so: ‘information
side-effects are by-products of data intended for
one use which can be mined in order to understand
some tangential, and possibly larger scale,
phenomena’.”
16. Park and his colleagues were
extensively cited: 9 times!
• Barnett GA, Chung CJ and Park HW (2011) Uncovering transnational hyperlink patterns
and web-mediated contents: a new approach based on cracking .com domain. Social
Science Computer Review 29(3): 369–384.
• Hsu C and Park HW (2011) Sociology of hyperlink networks of Web 1.0, Web 2.0, and
Twitter: a case study of South Korea. Social Science Computer Review 29(3): 354–368.
• Park HW (2003) Hyperlink network analysis: a new method for the study of social
structure on the web. Connections 25(1): 49–61.
• Park HW (2010) Mapping the e-science landscape in South Korea using the
webometrics method. Journal of Computer-Mediated Communication 15(2): 211–229.
• Park HW and Jankowski NW (2008) A hyperlink network analysis of citizen blogs in
South Korean politics. Javnost: The Public 15(2): 5–16.
• Park HW and Thelwall M (2003) Hyperlink analyses of the World Wide Web: a review.
Journal of Computer-Mediated Communication 8(4).
• Park HW and Thelwall M (2008) Developing network indicators for ideological
landscapes from the political blogosphere in South Korea. Journal of Computer-
Mediated Communication 13(4): 856–879.
• Park HW, Kim C and Barnett GA (2004) Socio-communicational structure among political
actors on the web in South Korea. New Media & Society 6(3): 403–423.
• Park HW, Thelwall M and Kluver R (2005) Political hyperlinking in South Korea: technical
indicators of ideology and content. Sociological Research Online 12(3).
17. A comment from those who are
NOT doing a hyperlink analysis
• In a chapter of The Sage Handbook of
Online Research Methods edited by
Fielding et al. (2008), Horgan emphasizes
that ‘link analysis’ has become an active
research domain in examining social
behavior online.
19. 2nd type of Webometrics: Web Visibility
Web mention as an indicator of
online viral power and reputation
Presence or appearance of actors or
issues being discussed by the public
(Internet users) on the web.
Tracking web visibility is a powerful way
to gain insight into public reactions to
actors or issues.
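A web-mention count can be sketched as follows. This is a minimal stand-in that counts how many texts in a local collection mention each actor; real web visibility studies typically rely on search engine or platform data. The function name `mention_counts`, the sample texts and the actor names are all hypothetical.

```python
import re
from collections import Counter


def mention_counts(documents, actors):
    """Count the documents in which each actor's name appears as a whole word."""
    counts = Counter({actor: 0 for actor in actors})
    for text in documents:
        for actor in actors:
            pattern = r"\b" + re.escape(actor) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[actor] += 1
    return counts


# Made-up page texts standing in for collected web pages.
pages = [
    "Candidate Kim spoke today.",
    "kim and Lee debated on television.",
    "No politics in this post.",
]
visibility = mention_counts(pages, ["Kim", "Lee"])
```

Counting documents rather than raw occurrences keeps a single long page from dominating the indicator; either choice is defensible and should be reported.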
20. Construct validity of webometrics
data
Ackland, R. (2013). Web Social Science:
Concepts, Data and Tools for Social
Scientists in the Digital Age. Sage.
P. 16.
21. How to either empirically or theoretically
demonstrate the construct validity of web
data for social science research?
• By testing whether the online network displays structural signatures
that are consistent with those displayed by real-world actors.
– For example: Does Facebook friendship network data display
homophily on the basis of race, ethnicity, etc.?
• By testing whether variables constructed from web data are
correlated with other accepted measures of the construct.
– For example: If counts of inbound hyperlinks to academic project
websites are correlated with other characteristics of academic
teams (e.g. publications, industry connections) that are used as
proxies of academic authority or performance, then this is
evidence of the construct validity of hyperlink data in the context
of scientometrics.
• By showing that an actor's position in an online network
influences his or her performance or outcomes in a manner that
accords with what is found offline.
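The second test above, correlating a web-based variable with an accepted offline measure, can be sketched with a plain Pearson correlation. The numbers below are invented purely for illustration and are not results from Ackland (2013) or any study; `pearson` is a hypothetical helper written out so that no external library is assumed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented data: inbound hyperlinks to five project websites and the
# publication counts of the corresponding academic teams.
inlinks = [5, 12, 20, 33, 41]
publications = [2, 4, 7, 9, 14]

r = pearson(inlinks, publications)
```

A strong positive `r` on real data would count as one piece of evidence for construct validity; with skewed link counts a rank correlation may be the safer choice.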
24. WCU
WEBOMETRICS
INSTITUTE
INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS
Park, H. W. (2010). Mapping the e-science landscape in South Korea using the webometrics
method. Journal of Computer-Mediated Communication, Vol. 15, No. 2. 211 – 229
Two major strands exist in computational science (also called e-Science):
• Computational perspective based on the use of high performance computing
to facilitate high-speed processing of large volumes of digital data
• The networking perspective based on virtual collaboration through the Grid
e-Science in humanities and social sciences: a third alternative strand?
25. Computational Social Science (CSS)
A minor but growing approach to
the study of society
Focus on the methodological
perspective based on the use
of new digital tools to manage
the data deluge
26. Computational (Social) Science
Focus on the methodological
perspective based on the use of
new digital tools to manage the
data deluge.
Development of e-science
tools to automate the research
process.
Experimentation with new
types of data visualization.
27. Measuring information exposure in dynamic
and dependent networks (ExpoNet)
According to the OECD's Global Science Forum
2013 report, social scientists' inability to anticipate
the Arab Spring was partly due to a failure to
understand 'the new ways in which humans
communicate' via social media and the ways they
are exposed to information. And social media's
mixed record for predicting the results of recent UK
elections suggests better tools and a unified
methodology are needed to analyse and extract
political meaning from this new type of data.
• http://www.ncrm.ac.uk/research/ExpoNet/
28. Why Data Science?
Savage and Burrows (2007, p.
886) lament, “Fifty years ago,
academic social scientists might
be seen as occupying the apex
of the – generally limited – social
science research ‘apparatus’.
Now they occupy an increasingly
marginal position in the huge
research infrastructure”.
Bonacich, P. (2004).
The Invasion of the Physicists. Social Networks 26(3): 285-288
29. All models are wrong but some are useful
Emergence of data author on dataverse
30. Anderson's claims
Data is everything we need.
We don't have to settle for models.
Agnostic statistics.
Out with every theory of human behavior.
This approach to science (hypothesize, model,
test) is becoming obsolete.
Petabytes allow us to say: "Correlation is enough."
We can stop looking for models.
What can science learn from Google? E-Science.
31. Big data and the end of theory?
Does big data have the answers? Maybe some, but not all, says
Mark Graham
In 2008, Chris Anderson, then editor of Wired, wrote a
provocative piece titled The End of Theory. Anderson was
referring to the ways that computers, algorithms, and big data can
potentially generate more insightful, useful, accurate, or true
results than specialists or domain experts who traditionally craft
carefully targeted hypotheses and research strategies.
We may one day get to the point where sufficient quantities of big
data can be harvested to answer all of the social questions that
most concern us. I doubt it though. There will always be digital
divides; always be uneven data shadows; and always be biases in
how information and technology are used and produced.
And so we shouldn't forget the important role of specialists to
contextualize and offer insights into what our data do, and maybe
more importantly, don't tell us.
http://www.guardian.co.uk/news/datablog/2012/mar/09/big-data-theory
32. The Coming of Triple Divide?
There are three main gaps I’d like to emphasize
in the present and future of the Big Data research
community:
1) Developing/Transitional VS
Developed/Advanced countries,
2) Researchers in academia VS Researchers in
the commercial sector,
3) Researchers with computational skills VS
Less computational scholars.
33. No. of articles in each category of methods
by the developed/developing division

Method used          Developed          Developing         Mixed Region
                     Country/Region     Country/Region
                     N      %           N      %           N      %
Social-Informetics   114    74.51       30     83.33       9      52.94
Scientometrics       28     18.30       6      16.67       8      47.06
Webometrics          11     7.19        0      0           0      0
Total                153    100         36     100         17     100

Skoric, M. M. (2013, Online First). The implications of big data for developing
and transitional economies: Extending the Triple Helix?. Scientometrics.
40. This approach to science is attributed to the late Jim Gray,
one of the most influential computer scientists, at Microsoft.
41. Science published a special
issue (February 11, 2011) looking
broadly at increasingly data-driven
research efforts as a scientific
domain (Science staff, 2011).
Data Science is composed of interrelated
clusters of research tasks. For example, the
technologies on data collection, curation, and
access, and the unique skill sets have
increasingly been central to Data Science
(Science staff, 2011).
42. Phrase map of highly occurring keywords 1999-2005
Halevi, G., & Moed, H. F. (2012).
43. Phrase map of highly occurring keywords 2006-2012
Halevi, G., & Moed, H. F. (2012).
44. Park, H. W., & Leydesdorff, L. (2013 Work-In-Progress). Decomposing a Data-Driven Science Using a Scientometric Method.
However, Halevi and Moed (2012) and Rousseau (2012) are
based on descriptive statistics. Therefore, we intend to add
the network perspective in terms of both social
(co-authorship) and semantic networks.
Furthermore, we extend search queries to various
terminologies related to Data Science because the term
“big data” is regarded only as one among a list of policy
priority issues.
We show where the research system in Data Science is
“hot” in terms of international collaborations and
prevailing semantics.
46. Park, H. W., & Leydesdorff, L. (2013). Decomposing Social and Semantic Networks in
Emerging “Big Data” Research. Journal of Informetrics, 7(3), 756-765.
48. The Signal and the Noise:
Why Most Predictions Fail but Some Don't. Nate Silver
I do not go as far as a Popper in asserting that such
theories are therefore unscientific or that they lack any
value. However, the fact that the few theories we can
test have produced quite poor results suggests that
many of the ideas we haven’t tested are very wrong as
well. We are undoubtedly living with many delusions
that we do not even realize.
page 15
49. OECD (2012). OECD Technology Foresight Forum 2012 - Harnessing data as a new source of growth: Big
data analytics and policies. OECD Headquarters, Paris, France, 22 October 2012
51. Algorithmic management of socially shared
information: Facebook as a designed social system
Which features should be deployed?
[Ugander-Karrer-Backstrom-Kleinberg 2013]
Which discussions will be most active? [Backstrom-
Kleinberg-Lee-DanescuNiculescuMizil 2013]
Which memes will receive the most reshares?
[Cheng-Adamic-Dow-Kleinberg-Leskovec 2014]
Which links should be emphasized?
[Backstrom-Kleinberg 2014]
http://cdn.oreillystatic.com/en/assets/1/event/119/Computational%20Problems%20in%20Managing%20Social%20Information%20%20Presentation.pdf
52. Typical FB user writes 60-70% of comments to ≈ 15 people.
[Backstrom-Bakshy-Kleinberg-Lento-Rosenn 2011]
http://www.cs.cornell.edu/home/kleinber/icwsm11-attention.pdf
54. Economics in the age of big data
http://www.sciencemag.org/content/346/6210/1243089.full
58. A more recent development is the
establishment of journals that include the term “Data Science”
or a related term in their titles:
• Data Science Journal in 2002
• Journal of Data Science in 2003
• EPJ Data Science in 2012
• GigaScience (gigasciencejournal.com) in 2012
• Big Data & Society in 2015
59. Ying Huang • Jannik Schuehle • Alan L. Porter • Jan Youtie
68. The chart Tim Cook doesn’t want you to see
http://qz.com/122921/the-chart-tim-cook-doesnt-want-you-to-see/
70. Kim, G. H., Trimi, S., & Chung, J. H. (2014). Big-data applications in the government sector. Communications of the ACM, 57(3), 78-85
71. Kim, G. H., Trimi, S., & Chung, J. H. (2014). Big-data applications in the government sector. Communications of the ACM, 57(3), 78-85
72. Kim, G. H., Trimi, S., & Chung, J. H. (2014). Big-data applications in the government sector. Communications of the ACM, 57(3), 78-85
73. Yet there are still serious problems to overcome. A trenchant
critique of the big data field as it stands today came in
the form of six statements intended to temper unbridled
enthusiasm. [42] These six provocative statements are:
Big data change the definition of knowledge;
Claims to accuracy and objectivity are misleading;
More data are not always better data;
Taken out of context, big data loses its meaning;
Just because it is accessible, it does not make it ethical; and
(Limited) access to big data creates a new digital divide.
Rousseau (2012)
83. Kobayashi, T., & Boase, J. (2012). No Such Effect? The Implications of Measurement
Error in Self-Report Measures of Mobile Communication Use. Communication Methods
and Measures, 6, 1–18. DOI: 10.1080/19312458.2012.679243
84. N. A. Christakis, & J. H. Fowler (2009). Connected: The
Surprising Power of Our Social Networks and How They Shape
Our Lives.
NY Times
86. Christakis, N. A., & Fowler, J. H. (2014). Friendship and natural selection. Proceedings of the National Academy of Sciences, 111(3), 10796–10801.
https://www.youtube.com/watch?v=6vwg0dJY1NM
Friendship and natural selection
87. Creativity requires a moderately small world
Financial success of Broadway musicals 1945 to 1989
90. Using Big Data to Fight Range
Anxiety in Electric Vehicles
• The software acquires
data from five sources:
Google Maps (for route,
terrain, and traffic data),
Wunderground.com (for
weather), driver history
(through driving
behavior measurements),
vehicle manufacturers
(for vehicle modeling
data), and battery
manufacturers (for
battery modeling data).
http://spectrum.ieee.org/cars-that-think/transportation/sensors/using-big-data-to-fight-range-anxiety-in-electric-vehicles
111. O'Reilly
10 data trends on our radar for 2016
1. Metadata
2. Systems optimization via deep neural networks
For example, as shown in the screenshot below, a
search on Google for "let it be lyrics" returns the
lyrics of the classic Beatles song at the top of the
search results. But a search for "let it go lyrics"
doesn't return such an interface element, despite
the immense popularity of this Disney song and
the wide availability of its lyrics.
113. Help users ask good questions, rather than
attempt to answer bad ones.
You can see this in action on LinkedIn, where typing "micr" into a search box triggers
search suggestions like "Jobs at Microsoft" and "People who work at Microsoft":
114. Artificial Intelligence and Intelligence Augmentation:
Very Different Approaches Yield Very Different Results
“Artificial intelligence” is the
idea of a computer system
that, by reproducing human
cognition, allows that system
to function autonomously
and effectively in a given
domain. An AI system
demonstrates a kind
of intentionality—it initiates
action in its environment and
pursues goals
“Intelligence
augmentation,” on the other
hand, is the idea of a
computer system that
supplements and supports
human thinking, analysis, and
planning, leaving
the intentionality of a human
actor at the heart of the
human-computer interaction.
Because intelligence
augmentation focuses on the
interaction of humans and
computers, rather than on
computers alone, it is also
referred to as “HCI.”
http://www.financialsense.com/contributors/guild/artificial-intelligence-vs-intelligence-augmentation-debate
115. Twitter taught Microsoft’s AI chatbot to
be a racist asshole in less than a day
http://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
117. Prof. Han Woo PARK
Department of Media and Communication,
YeungNam University, Korea
hanpark@ynu.ac.kr
http://www.hanpark.net