SlideShare ist ein Scribd-Unternehmen logo
1 von 40
NE7012 SOCIAL
NETWORK ANATYSIS
PREPARED BY: A.RATHNADEVI A.V.C COLLEGE OF
ENGINEERING
UNIT 2-MODELING AND VISUALIZATION
UNIT II MODELING AND VISUALIZATION
Visualizing Online Social Networks - A Taxonomy of Visualizations - Graph Representation -
Centrality- Clustering - Node-Edge Diagrams - Visualizing Social Networks with Matrix-Based
Representations- Node-Link Diagrams - Hybrid Representations - Modelling and aggregating
social network data – Random Walks and their Applications –Use of Hadoop and Map Reduce -
Ontological representation of social individuals and relationships.
2.1 VISUALIZING ONLINE SOCIAL NETWORKS
 As the fast development of the Internet, many online social network services are created
to connect social relationships among people.
 Versatile visualization skills are employed to facilitate analyzing online social networks,
and more in-depth studies have been investigated to improve the presentation of online
social networks.
 When the attributes of sociality are concerned, online social networks can be classified
into different social communities. In addition, from the daily social activities of various
online social communities, social networks may evolve into different patterns of social
structure.
 Therefore, visualizations of online social networks were developed according to the
attributes of network sociality to present their network structure.
 In this section, many state-of-the-arts visualization applications are surveyed to present
the versatile social network visualizations on the Web.
 In the following, we first depict and analyze the visualization of online social networks
according to their attributes of sociality, including Web communities, email groups,
digital libraries, and Web 2.0 services.
 In the end of this section, we summarize online social network visualizations based on
different views of social relationships, e.g. user centric social relationships, content-
centric social relationships, and hybrid social relationships.
2.1.1 web communities
 After the six degrees of separation concept and Web services were becoming popular
in late 1990s, different social network services were created on the Web to help people
maintain their social relationships.
 The SixDegrees.com website was an early representative created on the basis of the Web
interaction model during 1997 and 2001.
 Since the start of SixDegrees.com, various social network websites and Web-based
dating services have been established to provide people more convenient ways to build
up their social relationships and communities.
 In addition, many social network websites are developed with interactive visualization
interfaces to facilitate people connecting their communities and maintaining social
relationships.
 In 2003, Club Nexus was established based on the friendship network data of Stanford
students and allowed them to explicitly list their friends by their profiles.
 For example, students registered on Club Nexus were provided with the profiles of their
year in school, major, residence, gender, personalities, hobbies and interests to facilitate
interacting with their online social networks.
 The Club Nexus community thus provided very rich profiles for its users to allow for
detailed social network analysis, including identifying activities and preferences that
determine the formation of friendship.
 In addition to listing actors with their profiles for social network analysis, a modern
visualization of social networks, Vizster, was contributed with customized techniques to
visualize social relationships and community structures in 2005.
 Vizster was developed based on node-edge network layouts for exploring connectivity
in large graph structures, supporting visual search and analysis, and automatically
identifying and visualizing community structures.
 Form the visualization design of Vizster, many advanced search and interactive
navigation techniques are developed on Vizster to facilitate the analysis of social
networks, such as highlighting, panning, zooming, and distortion techniques.
 From the previous visualization of Web communities, the visualization techniques are
mainly introduced to deal with the complex social relationships based on human-centric
or user-centric views.
 As the development of SemanticWeb, a project called FOAF (Friend-of-a-friend) was
proposed to visualize such human-centric social relationships based on Semantic Web
social metadata. With XML/RDF format, the FOAF relations can be explicitly defined
for further social network analysis and visualization.
 Recently, Microsoft Research Asia proposed a novel object-level search service, called
Entity Cube, to help people discover real-world entities, such as people, locations, and
organizations, and explore their social relationships.
2.1.2 Email groups
 Email service is one of the most popular applications that people often use to connect
each other and deliver messages in their daily lives.
 Personal online social networks are thus constructed through people’s daily social
interactions. For analyzing the social structures of the daily email activities, visualization
techniques are employed to explore different patterns.
 In 2004, Soylent was developed to study the social patterns and the temporal rhythms of
daily email activities. Through the Soylent visualization, mutual interactions between
different users and groups, and their everyday collaboration activities can be clearly
displayed.
 Some recurrent patterns discovered in the social networks, such as the onion pattern, the
nexus pattern, and the butterfly pattern, can thus suggest regular ways of understanding
their interactions.
 For example, the butterfly pattern, named for its two large “wings” surrounding a single
center, can be interpreted as a member of a design team also involving in a research team.
 Full network view of Soylent
 In addition to the network view of email visualization, different aspects of social
networks can be visualized for analysis.
 In 2005, two visual metaphors, Social Network Fragments (SNF) and Post History, were
employed to visualize the major two dimensions of email activities: people and time.
 The social relationships derived from the TO and CC lists in email archives are
highlighted in SNF, and the temporal rhythms of interactions between ego and individual
contacts over time are illustrated with a calendar metaphor in Post History, respectively.
 Post History presents social network visualization with a calendar panel on the left and a
contacts panel on the right. The email exchange activities with time progress can thus be
visualized in an interactive calendar-like interface.
 In 2006, an improved calendar-like visualization interface, called Themail, was
developed to help analyze email-based social networks with a chronological sequence
and the corresponding topics.
 Through the analysis of email content, the social relations and mutual interactions
between a user and her contacts can be clear presented in Themail.
2.1.3Digital libraries
 In digital libraries, social networks can be mainly analyzed from two aspects: authors and
writings.
2.1.3.1 Co-Authorship Networks
 On the aspect of authors, co-authorships can be mined from the existing publications
and organize the co-authorship networks.
 With the visualization of co-authorships, some characteristics, such as clustering
coefficient and average path length, can be analyzed in co-authorship networks.
 From the visualization of the co-authorship networks illustrated a small world graph can
be drawn with the connections of authors from different places in the world. In 2005,
social network analysis for co-authorship was in-depth studied in digital libraries.
 In addition to the node-edge representation, a matrix representation was used in the co-
authorship network to help analyze different co-authorship patterns.
 With the matrix representation, the interlaced problem of the node-edge representation
caused by a large amount of nodes and complex relations can be effectively improved.
2.1.3.2 Co-Citation Relations
 On contrary to the authorship view, social networks in digital libraries can be discovered
from the citations and co-citations among writings themselves.
 Since references are a crucial part of a document for readers to obtain information source,
Co-citation social networks can be formed through the continuously accumulated
publications.
 With proper visualization of co-citation networks, documents with high impacts or
similar citation patterns can be immediately identified, and the co-citation relationships
can be intuitively observed as well.
 In 2006, a novel visualization tool, called CircleView, was proposed to visualize
academic citation relations with interactive design and highlighted color.
 In CircleView, the summarized contents of the focus papers, such as references, detail
reference, and references of reference, are marked with different colors and highlighted
circles.
 In 2007, an interactive visualization tool was developed to present large co-citation
networks with latent visual cues and allows direct interaction with the visualized graphs.
 In interaction-based citation visualization system was proposed to visualize larger
datasets with scalable functionalities and complemented the citation visualization of
domain content analysis.
 In 2009, an innovative visualization technique, called FP-tree, was developed to present
co-citation network from a new perspective, namely, visualizing social networks based
on a paper-reference matrix instead of using a reference-reference matrix.
 The paper-reference matrix was transformed into an FP-tree visualization to analyze the
intellectual structure of two domains: Information Visualization and Sloan Digital Sky
Survey (SDSS).
2.1.4 web 2.0 services
 Since the concept of Web 2.0 was proposed in 2004, online social activities are becoming
more prosperous than before.
 Many Web 2.0 applications are popularly accessed by users to connect their social
networks, such as Twitter and Facebook.
 For example, Twitter provides users convenient functionalities to share the up-todate
status with their followers. Since the following and the followed relationships can be
quickly constructed, large social networks are created over the Twitter service, and many
visualization applications are developed to analyze Twitter social networks.
 Another example of Web 2.0 services that forms large social networks is Facebook.
 As Facebook was proposed at Harvard in 2004, its social community grows explosively
from U.S. to the rest of the world.
 Until today, Facebook has over 400 million active users worldwide, and large and
complex social networks come along with the explosive growth of Facebook
communities.
 Therefore, many interactive visualization applications and tools are developed to assist
users to access and analyze such complicated social networks.
 Nexus is a visualization application on Facebook communities to illustrate their large
network graphs. With the Nexus visualization, user relationships can be properly
displayed, and user names will pop up when mouse is moved over the nodes.
 However, when user relationships are complicated, the social relationships in the Nexus
visualization may become difficult to be recognized.
 In 2010, an advanced interactive visualization interface, called IRNet, was proposed
to further improve the shortcomings of Nexus and TouchGraph on visualizing Facebook
communities.
 IRNet visualizes interpersonal relationships in social networks with focus context
techniques to present both local and global views.
2.1.5 visualization of online social networks classification
 In addition, visualization of online social networks can be further categorized into three
types by their social relationships: user-centric visualization, content-centric
visualization, and hybrid visualization.
2.1.5.1user-centric visualization
 From the perspective of actors, visualizing user-centric social relationships can present
various characteristics of actors and helps explore different subjects and relationships
of interests.
 For example, visualizing user-centric social relationships can help people discover
individuals and communities that meet the following expectations:
1. actors or groups with similar/complementary features,
2. key actors or those with high social impacts, and
3. actors with popular interpersonal relationships or active social interactions.
 A great variety of online social networks are created and visualized according to the
above user-centric social relationships, such as Club Nexus, Vizster, People Entity Cube,
FOAF, SNF, and co-authorship networks.
 Such user-centric visualizations are widely utilized to help people access their social
networks and discover the social networks of their interests.
2.1.5.2 content-centric visualization
 As the boosted development and evolution of Internet applications, particularly in the era
of Web 2.0, an explosive amount of Web information was rapidly produced via various
social network communities.
 Therefore, from the view of visualizing content-centric social relationships, various kinds
of contents can be properly presented to facilitate people analyzing social networks.
 For example, at least the following three sorts of social network contents can displayed
with content-centric visualization:
1. the distribution of user opinions, including key opinions and controversial comments,
2. user opinions with high impacts toward their social communities, especially the effective-
impact period of time, and
3. the relations among different content groups.
 In according with content-centric social relationships, several visual applications for
online social networks are developed, such as visualizations for citation and cocitation
social networks.
2.1.5.3 Hybrid visualization
 Social activities among actors are generally more than one form and thus may consist of
different kinds of relationships and interactions.
 Therefore, besides the user-centric visualization and content-centric visualization
interfaces, still many visualization interfaces are designed according to hybrid
relationships.
 Hybrid visualization is to visualize social networks from different aspects of attributes,
e.g. people and contents. Particularly, online social activities, such as email and dating
services, usually include such elements.
 Therefore, these online social networks are displayed with hybrid visualization
techniques, such as PostHistory, Themail,TouchGraph, and IRNet.
2.2 SOCIAL NETWORK ANALYSIS ( A TAXONOMY)
 Social networks are built when actors belonging to different social groups are connected
to each other.
 In late 1800s and early 1900s, sociologists have been looking for social relationships,
groups, positions, and organizations in human activities.
 In early 1930s, Moreno brought in the concept of sociograms to represent social networks
and started to explore the social relations in a formal study.
 Until today,social network analysis is still a hot issue that attracts many concerns,
particularly for analyzing the online social networks.
2.3 GRAPH REPRESENTATION
 Many fundamental concepts and metrics in social network analysis are derived from
graph theory, because graph theory formally represents social networks with structural
properties.
Node degree:
 In graph theory, the degree of a node in a graph is the number of edges incident to the
node. If there are loops in the graph, the degree of a node will be counted twice.
 Therefore, the maximum number of unique edges in a graph can be obtained when the
loops are excluded. There are at most N*(N-1)/2edges for an undirected graph and at
most N*(N-1) edges for a directed graph, where N is the number of nodes.
Node density:
 The density of an undirected graph can be defined as (2*E)/N*(N-1), where E is the
number of edges. On the other hand, the density of a directed graph can be defined as
E/N*(N-1).
Path length:
 The path length is the number of edges in the sequence that a walk follows. In a path, all
nodes and edges appear only once in the sequence.
 Therefore, the path length can be defined as the distances between pairs of nodes in a
network graph, and average path length is the average of these distances between all pairs
of nodes.
Component size:
 When the component size is concerned, a connected graph needs to be discovered first
since the component size is counted by the number of connected nodes in a graph.
 A graph is connected if all pairs of nodes are reachable, and for each pair of two nodes,
one of them is reachable from the other.
 On the other hand, if a graph is not connected, the graph can be partitioned into several
connected subgraphs where each component size can be calculated by the number of
connected nodes in each subgraph.
2.4 CENTRALITY
 One of the key applications in social networks is to identify the most important or central
nodes in the network.
 The measure of centrality is thus used to give a rough indication of the social power of a
node based on how well they connect the network. HITS analyzes the important nodes
based on calculating Authorities (indegrees) and Hubs (out-degrees) , and PageRank
calculates node values based on out-degrees.
 In social network analysis, “Degree”, “Betweenness”, and “Closeness”centrality are three
most popularly adopted methods to measure the centrality of a social network.
2.4.1 Degree centrality:
 Degree centrality is defined as the number of edges incident upon a node, and thus it is
usually the first way to calculate the nodes that are most potential to determine other
nodes.
 For calculating degree centrality, the nodes that have direct connections to a large number
of nodes are considered. If the edges in a graph are directed, the in-degree centrality is
differentiated from the out-degree centrality.
2.4.2 Betweenness Centrality:
 In addition to degree centrality, betweeness centrality is another key metrics for
computing the extent to which a node lies between other nodes in the network.
 If a node is the only node that links two groups of nodes in the network, this node shall be
seen as an important node for keeping the social network together.
 Therefore, betweenness centrality is to measure the connectivity of the neighbors of
anode and to give a higher value for nodes which bridge clusters.
2.4.3 Closeness centrality:
 The measure of closeness centrality is to take into account how distant a node is to
the other nodes in the network.
 Hence, closeness centrality is to measure the order of magnitude that a node is near all
other nodes in a social network by calculating the mean shortest path for a node to all
other nodes in the graph.
2.5 CLUSTERING
 Many social networks contain subsets of nodes that are highly connected within the
subset and have relatively few connections to nodes outside the subset.
 The nodes in such subsets are likely to share some attributes and form their own
communities.
 Since the detection of these community structures is not trivial, how to efficiently and
effectively discover such community structures is important.
 Therefore ,the main measure described below is to help explore the grouping effects by
Clustering coefficient.
2.5.1 Clustering coefficient:
 A clustering coefficient is to measure the degrees of nodes to decide which nodes in a
graph tend to be clustered together.
 Thus, the clustering coefficient measure is to quantify how close its neighbors are to
being a complete graph.
 As the nodes grouped in the real-world social network tend to have relatively high
density of ties,the clustering coefficient is also utilized for small world analysis.
2.6 NODE EDGE DIAGRAM
 A node-edge diagram is an intuitive way to visualize social networks. With the node-edge
visualization, many network analysis tasks, such as component size calculation, centrality
analysis, and pattern sketching, can be better presented in a more straightforward manner.
 Many node-edge layouts have been presented to place the nodes in the graph for users to
clearly recognize the structure of the social network.
 For example, some layouts are suitable to display social networks in a moderate size, but
they are not suitable for showing larger networks.
 Three kinds of layouts, namely, random layout, force-directed layout, and tree layout, are
described to explain the node-edge diagrams.
2.6.1 Random Layout
 A random layout is to put the nodes at random geometric locations in the graph, and thus
it may not yield very clear visualization results, particularly when the number of nodes
immensely increases, e.g. more than thousands of nodes.
 However, since a random layout algorithm can efficiently draw the social network graph
in linear time, O(N), sometimes it can be usable to visualize very large network graphs.
 Therefore, random graphs have been proposed as a possible model to take into account
the structural characteristics of instances that appear in many practical applications.
2.6.2 Force-Directed Layout
 A force-directed layout is also known as a spring layout, which simulates the graph as a
virtual physical system.
 In a force-directed layout, the edges act as spring and the nodes act as repelling objects,
just like the Hooke’s law and the Coulomb’s law.
 Hence, there exists gravitational attraction or magnetic repulsion between each node
in the graph.
 Generally, an initial random layout will be yielded first, and then the force-directed
algorithms will run iteratively to adjust the positions of nodes until all graph nodes and
attractive forces between the adjacent nodes run to convergence.
 Since a force-directed layout may take hundreds of iterations to obtain a stable layout, the
running time is at least O.(N logN) or O.(E), where N is the number of nodes and E is the
number of edges.
2.6.3 Tree Layout
 A basic tree layout is to choose a node as the root of tree, and the nodes connected to the
root become children of the root node.
 Nodes that are at more levels away from the root become the grand-children of the root
and so on.
 A tree layout can display a more structural layout than graph layouts by considering more
contextual information.
 Because of the hierarchical nature of a tree layout, trees are more straightforward to grasp
human eye than general graphs.
 For a better visual presentation of domain specific information, more suitable variants of
the tree layout were proposed, such as hyperbolic tree layout and a radial tree layout.
 These tree visualizations utilize the idea of focus+context to better the visualization
effects with animation techniques and help users to obtain both global and local views of
a social network in a 2D display.
 In addition, an advanced visualization technique, called H3, further maps the hyperbolic
tree into a 3D space with a fisheye effect to better utilize the visualization space.
2.6.4 Matrix Representations
 Since a social network graph consists of nodes connected with edges, it can be
transformed into a simple Boolean matrix whose rows and columns represent the vertices
of the graph.
 Moreover, the Boolean values in the matrix can be further replaced with valued attributes
associated with the edges to provide more informative network visualizations.
 Since a matrix presentation can help minimize the occlusion problems caused by the
node-edge diagram, the matrix-based representation of graphs offers an alternative to the
traditional node-edge diagrams.
 With a matrix-based representation, clusters and associations among the nodes can also
be better discovered when the number of nodes increases.
 In 2006, an enhanced matrix-based representation, called MatrixExplorer,was developed
to visualize social networks with a Dual-Representation.
2.7 VISUALIZING SOCIAL NETWORKS WITH MATRIX-BASE REPRESENTATIONS
 Matrices or node-link diagrams both have advantages and drawbacks for visualizing
Social networks. In this section, we present the pros and cons of each representation and
propose a set of visualizations combining the best of both worlds.
2.7.1 Matrix or Node-Link Diagram?
 Matrix and node-link diagrams have different properties making them suitable
representations for different tasks and datasets.
The advantages of matrices:
1. Matrices provide powerful overview visualization since the time to create them is low
and since they are always readable. They constitute a good representation to initiate an
exploration.
2. Matrices do not suffer from node overlapping, if the task requires to always read the
actors’ labels, this representation is more appropriate.
3. Matrices do not suffer from link crossing each other; therefore they are a viable
alternative for dense networks.
4. Matrices show all possible pairs of vertices, they can highlight the lack of connection and
also the directedness of the connections. They are particularly appropriate for directed
and dense networks.
The advantages of node-link diagrams:
 These representations are familiar to a wide audience; they constitute a powerful
communication tool. In contrast, matrices require training and help decoding their
meaning for novice users.
 For small or sparse networks, Ghoniem et al. proved that node-link diagrams were more
effective than matrices.
 For a similar level of details, the space used by matrices is larger than the space to display
node-link diagrams. Therefore, for a compact representation, node-link diagrams are a
better choice.
 When the analysis requires to perform a number of path-related tasks (e.g. find the
shortest path from John to Mary), node-link diagrams are more appropriate.Ghoniem et
al. showed that such tasks were difficult to perform with matrices.
2.7.2 Matrix + Node-Link Diagrams
 To combine advantages of both representations and to support the visual exploration
of social networks, we designed MatrixExplorer.
 To conceive this system, we observed and discussed with a small group of social
scientists. We divided their analysis process in four main stages. For each, we describe
how matrices and node-link diagrams can be combined to achieve the best of both
worlds.
1. Initiate the exploration
2. Explore interactively and iteratively
3. Find a consensus in the data or validate an hypothesis
4. Present the findings
2.7.3 Node link diagram
 Jacob Moreno was the first pioneer of social network visualization. More than 70 years
ago, he published visual depictions of social friendship in schools, using these
visualizations to support his findings.
 The principle of node-link diagrams is to graphically represent actors of the network by
nodes and connections by links.
 Node-link diagrams are the most commonly used representation of graphs and networks.
It is well illustrated by Freeman in his survey and history of social network visualization.
 Node-link representations are widely used and familiar to a very large audience, making
them a powerful communication tool. However, their readability and the message they
convey greatly depends on the positions of their nodes.
2.8 HYBRID REPRESENTATION
 Providing both matrix and node-link diagrams to the user has a number of advantages but
also drawbacks. First, it requires a large amount of display space.
 At least two display monitors are required to comfortably use MatrixExplorer; a third one
is strongly recommended to display textual and detailed views.
 Secondly,we observed that switching from one representation to the other may induce
high cognitive load to the user and split attention is always tiresome.
 Indeed, a single node on the node-link diagram becomes both a row and column in the
matrix and a link, visually represented by a line, becomes a cell, i.e. a rectangle, in the
matrix.
 Switching representations between tasks require a few seconds of adjustment,
disorienting momentarily users.
 To minimize the display space required and limit the cognitive cost when switching
representations, we developed two hybrid representations: MatLink and NodeTrix.
2.8.1 Augmenting Matrices
 As Ghoniem et al. demonstrated in their study, matrices do not support well path
following tasks.
 For example, finding the shortest path between two given actors is far easier in a node-
link diagram, in which users can quickly investigate the multiple paths and assess what
the shortest path is.
 These tasks being very common in social network analysis, we proposed to create a
hybrid representation to overcome the problem in matrices: MatLink
 The principle of MatLink is to augment a standard matrix representation with links on its
borders.
 These links provides a dual encoding of the connections between actors and ease path-
following tasks since they use the visual representation of node-link diagrams.
2.8.2 Merging Matrix and Node-Link Diagram
 Node-link diagrams or matrices perform differently according to the types of visualized
networks.
 For example, node-link diagrams or hybrids Treemap+links are well suited to represent
tree-like networks. Conversely, for dense networks or bipartite networks, matrices are
better suited, maximizing the use of space and remove any link crossing.
 For small-world networks, however, the choice of representation is not so clear.
 When visualizing such network with a node-link diagram, the dense regions (e.g.
communities) suffer from link crossing and become difficult to read. However, when
using a matrix representation, the visualization is very sparse and requires a lot of
navigation for exploration.
2.9 MODELLING AND AGGREGATING SOCIAL NETWORK DATA
 There are two fundamental reasons for developing semantic-based representations of
social networks.
 Firstly, as we will demonstrate, maintaining the semantics of social network data is
crucial for aggregating social network information, especially in heterogeneous
environments where the individual sources of data are under diverse control.
 The benefits are easiest to realize in cases where the data are already available in an
electronic format, which is typically the case in the network analysis of online
communities but also in the intelligence community.
 Secondly, semantically representations can facilitate the exchange and reuse of case study
data in the academic field of Social Network Analysis.
 The possibilities for electronic data exchange has already revolutionized a number of
sciences with the most well-known examples of bio-informatics and genetics.
2.9.1 State-of-the-art in network data representation
 In the chapters before we have seen that the most common kind of social network data
can be modeled by a graph where the nodes represent individuals and the edges represent
binary social relationships.
 higher-arity relationships may be represented using hyper-edges, i.e. edges connecting
multiple nodes.
 The most commonly encountered formats are those used by the popular network analysis
packages Pajek and UCINET. These are text-based formats which have been designed in
a way so that they can be easily edited using simple text editors.
 Unfortunately, the two formats are incompatible. (UCINET has the ability to read and
write the .net format of Pajek, but not vice versa.) Further, researchers in the social
sciences often represent their data initially using Microsoft Excel spreadsheets, which can
be exported in the simple CSV (Comma Separated Values) format.
 The GraphML format represents an advancement over the previously mentioned formats
in terms of both interoperability and extensibility [BEL04, BEH+01].
 GraphML originates from the information visualization community where a shared
format greatly increases the usability of new visualization methods.
 GraphML is therefore based on XML with a schema defined in XML Schema.
 This has the advantage that GraphML files can be edited, stored, queried, transformed
etc. using generic XML tools.
 However, none of these formats support the aggregation and reuse of electronic Data.
2.9.2 Ontological representation of social individuals
 The Friend-of-a-Friend (FOAF) ontology that we use in our work is an OWL based
format for representing personal information and an individual’s social network.
 FOAF greatly surpasses graph description languages in expressivity by using the
powerful OWL vocabulary to characterize individuals.
 More importantly, however, using FOAF one can rely on the extendibility of the
RDF/OWL representation framework for enhancing the basic ontology with domain-
specific knowledge about identity.
 In terms of deployment, two studies give an insight of how much the individual terms of
the FOAF are used on the Web [PW04, DZFJ05].
 Both studies note that the majority of FOAF profiles on the Web are auto-generated by
community sites such as LiveJournal, Opera Communities5 and Tribe.net.
 As FOAF profiles are scattered across the Web it is difficult to estimate their number, but
even the number of manually maintained profiles is likely to be in the tens of thousands.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#label> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix example: <http://www.example.org/> .
example:Rembrandt rdf:type foaf:Person .
example:Saskia rdf:type foaf:Person .
example:Rembrandt foaf:name "Rembrandt" .
example:Rembrandt foaf:mbox <mailto:rembrandt@example.org>
. example:Rembrandt foaf:knows example:Saskia .
example:Saskia foaf:name "Saskia" .
Figure 4.3 A set of triples describing two persons represented in the Turtle language.
 FOAF started as experimentation with Semantic Web technology.
 The idea of FOAF was to provide a machine processable format for representing the kind
of information that made the original Web successful, namely the kind of personal
information described in homepages of individuals.
 Thus FOAF has a vocabulary for describing personal attribute information typically
found on homepages such as name and email address of the individual, projects, interests,
links to work and school homepage etc.
 FOAF profiles, however, can also contain a description of the individual’s friends using
the same vocabulary that is used to describe the individual himself.
 Even more importantly, FOAF profiles can be linked together to form networks of web-
based profiles.
 FOAF became the center point of interest in 2003 with the spread of Social Networking
Services such Friendster, Orkut, LinkedIn etc.
 Despite their early popularity, users have later discovered a number of drawbacks to
centralized social networking services.
 First, the information is under the control of the database owner who has an interest in
keeping the information bound to the site and is willing to protect the data through
technical and legal means.
 The profiles stored in these systems typically cannot be exported in machine processable
formats (or cannot be exported legally) and therefore the data cannot be transferred from
one system to the next.
 Second, centralized systems do not allow users to control the information they provide on
their own terms. Although Friendster follow-ups offer several levels of sharing (e.g.
public information vs. only for friends), users often still find out the hard way that their
information was used in ways that were not intended.
 These problems have been addressed with the use of Semantic Web technology.
2.9.3 Ontological representation of social relationships
 Ontological representations of social networks such as FOAF need to be extended with a
framework for modelling and characterizing social relationships for two principle
reasons:
1. To support the automated integration of social information on a semantically basis
2. To capture established concepts in Social Network Analysis.
 In this section we approach this task using the engineering method of decomposition.
 The key idea is that the representation of social relationships needs to be fine-grained
enough so that we can capture all the detail from the individual sources of information in
a way that these can be later recombined and taken as an evidence of a certain
relationship.
 Network analysis itself can be a help in that it has a rich vocabulary of characterizing
Social relationships.
 As these are the terms that are used by social scientists, they are prime candidates to be
included in any ontology.
 For illustration, we list below some of the most commonly discussed characteristics of
social relationships.
1) Sign
 A relationship can represent both positive and negative attitudes such as like or hate.
 The positive or negative charge of relationships is important on its own for the study of
balance within social networks, which is the subject of balance theory.
2) Strength
 The notion of tie strength was first introduced by Granovetter in his groundbreaking work
on the benefits of weak ties. Tie strength itself is a complex construct of several
characteristics of social relations.
 In her survey, Hite lists the following aspects of tie strength discussed in the literature:
Affect/ philos/passions , Frequency/frequent contact , Reciprocity, Trust/enforceable
trust , Complementarity, Accommodation/adaptation, Indebtedness/imbalance,
Collaboration, Transaction investments, Strong history, Fungible skills, Expectations,
Social capital, Bounded solidarity, Lower opportunistic behavior, Density, Maximize
relationship over org., Fine-grained information transfer, Problem solving, Duration,
Multiplexity, Diffusion, Facilitation, Personal involvement, Low formality (few
contracts), Connectedness.
3) Provenance
 A social relationship may be viewed differently by the individual participants of the
relationship, sometimes even to the degree that the tie is unreciprocated, i.e. perceived by
only one member of the dyad. Similarly, outsiders may provide different accounts of the
relationship, which is a well-known bias in SNA.
4) Relationship history
 Social relationships come into existence by some event (in the most generic,
philosophical sense) involving two individuals. (Such an event may not require personal
contact (e.g. online friendships), but it has to involve social interaction.
 From this event, social relationships begin a lifecycle of their own during which the
characteristics of the relationship may change through interaction or the lack of (see e.g.
Hite and Hesterly ).
5) Relationship roles:
 A social relationship may have a number of social roles associated with it, which we call
relationship roles.
 For example, in a student/professor relationship within a university setting there is one
individual playing the role of professor, while another individual is playing the role of a
student.
 Both the relationship and the roles may be limited in their interpretation and use to a
certain social context.
2.9.3.1 Conceptual model
 The importance of social relationships alone suggests that they should be treated as first-
class citizens.
 Alternatively, social relations could be represented as n-ary predicates; however, n-ary
relations are not supported directly by the RDF/OWL languages.
 There are several alternatives to n-ary relations in RDF/ OWL described in a document
created by the Semantic Web Best Practices and Deployment Working Group of the
W3C.
 In all cases dealing with n-ary relations we employ the technique that is known as
reification:we represent the relation as a class, whose instances are concrete relations
of that type.
 One may recall that RDF itself has a reified representation of statements: the rdf
:Statement object represents the class of statements.
 This class has three properties that correspond to the components of a statement, namely
rdf :subject, rdf :predicate, rdf :object. These properties are used to link the statement
instance to the resources involved in the statement.
Fig: An RDF representation of social relationships.
 This model has the advantage that it becomes more natural to represent parameter values
and restrictions on them.
 The disadvantage is that this solution is not compliant with OWL DL: declaring
properties ranging on properties and creating subclasses rdf :Property are not allowed in
this species of OWL.
An alternative RDF representation of social relationships.
 Further, there are two related characteristics of social relations that we would like to
capture in a more advanced representation.
 The first characteristic is the context dependence of social relations and the second is the
separation between an observable situation and its interpretation as a social relation
between two individuals.
 The representation of context and the separation of the level of state-of-affairs
(observations of objects and sequences of events) from the higher level of descriptions
(contexts) that can be used to interpret those state-of-affairs turns out to be common
problem in the representation of much of human knowledge.
 A solution proposed by is the Descriptions and Situations ontology design pattern that
provides a model of context and allows to clearly delineate these two layers of
representation.
Figure - The Descriptions and Situations ontology design pattern.
2.10 RANDOM WALK AND THEIR APPLICATION
 A Random Walk in synthesis:
 Given an indirected graph and a starting point, select a neighbour at random
 Move to the selected neighbour and repeat the same process till a termination condition is
verified
 The random sequence of points selected in this way is a random walk of the graph
Important parameters of random walk
 Access time or hitting time: Hij is the expected number of steps before node j is visited,
starting from node i
 Commute time: i  j  i: Hij + Hji
 Cover time: Starting from a node/distribution the expected number of steps to reach
every node.
Applications of Random Walks on Graphs
 Ranking Web Pages
 HITS on citation network
 Clustering using random walk
2.11 USE OF HADOOP AND MAP REDUCE
Map reduce
 Data-parallel programming model for clusters of commodity machines
 Pioneered by Google
- Processes 20 PB of data per day
 Popularized by open-source Hadoop project
- Used by Yahoo!, Facebook, Amazon, …
Map Reduce used for
 At Google:
1. Index building for Google Search
2. Article clustering for Google News
3. Statistical machine translation
 At Yahoo!:
1. Index building for Yahoo! Search
2. Spam detection for Yahoo! Mail
 At Facebook:
1. Data mining
2. Ad optimization
3. Spam detection
In research:
 Analyzing Wikipedia conflicts (PARC)
 Natural language processing (CMU)
 Bioinformatics (Maryland)
 Particle physics (Nebraska)
 Ocean climate simulation (Washington)
Map Reduce Goals
1. Scalability to large data volumes:
 Scan 100 TB on 1 node @ 50 MB/s = 24 days
 Scan on 1000-node cluster = 35 minutes
2. Cost-efficiency:
 Commodity nodes (cheap, but unreliable)
 Commodity network
 Automatic fault-tolerance (fewer admins)
 Easy to use (fewer programmers)
Typical Hadoop Cluster:
 40 nodes/rack, 1000-4000 nodes in cluster
 1 GBps bandwidth in rack, 8 GBps out of rack
 Node specs (Yahoo! terasort): 8 x 2.0 GHz cores, 8 GB RAM, 4 disks (= 4 TB?)
Challenges
 Cheap nodes fail, especially if you have many
- Mean time between failures for 1 node = 3 years
- MTBF for 1000 nodes = 1 day
- Solution: Build fault-tolerance into system
 Commodity network = low bandwidth
- Solution: Push computation to the data
 Programming distributed systems is hard
- Solution: Users write data-parallel “map” and “reduce” functions, system handles work
distribution and faults
Hadoop Components:
 Distributed file system (HDFS)
- Single namespace for entire cluster
- Replicates data 3x for fault-tolerance
 MapReduce framework
- Executes user jobs specified as “map” and “reduce” functions
- Manages work distribution & fault-tolerance
Hadoop Distributed File System:
 Files split into 128MB blocks
 Blocks replicated across several datanodes (usually 3)
 Namenode stores metadata (file names, locations, etc)
 Optimized for large files, sequential reads
 Files are append-only
MapReduce Programming Model:
 Data type: key-value records
 Map function:
 Reduce function:
Example: Word Count:
Word Count Execution:
An Optimization: The Combiner
 Local aggregation function for repeated keys produced by same map
• For associative ops. like sum, count, max
• Decreases size of intermediate data
Word Count with Combiner
MapReduce Execution Details :
• Mappers preferentially placed on same node or same rack as their input block
– Push computation to data, minimize network use
• Mappers save outputs to local disk before serving to reducers
– Allows having more reducers than nodes
– Allows recovery if a reducer crashes
Fault Tolerance in MapReduce:
 If a task crashes:
– Retry on another node
• OK for a map because it had no dependencies
• OK for reduce because map outputs are on disk
– If the same task repeatedly fails, fail the job or ignore that input block
 If a node crashes:
– Relaunch its current tasks on other nodes
– Relaunch any maps the node previously ran
• Necessary because their output files were lost along with the crashed node
 If a task is going slowly (straggler):
– Launch second copy of task on another node
– Take the output of whichever copy finishes first, and kill the other one

Weitere ähnliche Inhalte

Was ist angesagt?

Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithmsAlireza Andalib
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISrathnaarul
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networksFrancisco Restivo
 
Social Media Mining - Chapter 6 (Community Analysis)
Social Media Mining - Chapter 6 (Community Analysis)Social Media Mining - Chapter 6 (Community Analysis)
Social Media Mining - Chapter 6 (Community Analysis)SocialMediaMining
 
Social media mining PPT
Social media mining PPTSocial media mining PPT
Social media mining PPTChhavi Mathur
 
Social Media Mining - Chapter 8 (Influence and Homophily)
Social Media Mining - Chapter 8 (Influence and Homophily)Social Media Mining - Chapter 8 (Influence and Homophily)
Social Media Mining - Chapter 8 (Influence and Homophily)SocialMediaMining
 
Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)SocialMediaMining
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social MediaSymeon Papadopoulos
 
Data Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering AlgorithmData Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering Algorithmnishant24894
 
Social Media Mining - Chapter 3 (Network Measures)
Social Media Mining - Chapter 3 (Network Measures)Social Media Mining - Chapter 3 (Network Measures)
Social Media Mining - Chapter 3 (Network Measures)SocialMediaMining
 
Social Media Mining - Chapter 7 (Information Diffusion)
Social Media Mining - Chapter 7 (Information Diffusion)Social Media Mining - Chapter 7 (Information Diffusion)
Social Media Mining - Chapter 7 (Information Diffusion)SocialMediaMining
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Xiaohan Zeng
 
Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)SocialMediaMining
 
Social Network Analysis To Blog Based Online Communities
Social Network Analysis To Blog Based Online CommunitiesSocial Network Analysis To Blog Based Online Communities
Social Network Analysis To Blog Based Online Communitiessubby88
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisFred Stutzman
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Doug Needham
 
Social Network Analysis power point presentation
Social Network Analysis power point presentation Social Network Analysis power point presentation
Social Network Analysis power point presentation Ratnesh Shah
 
Social Media Mining: An Introduction
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An IntroductionAli Abbasi
 

Was ist angesagt? (20)

Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithms
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networks
 
Social Media Mining - Chapter 6 (Community Analysis)
Social Media Mining - Chapter 6 (Community Analysis)Social Media Mining - Chapter 6 (Community Analysis)
Social Media Mining - Chapter 6 (Community Analysis)
 
Social media mining PPT
Social media mining PPTSocial media mining PPT
Social media mining PPT
 
Social Media Mining - Chapter 8 (Influence and Homophily)
Social Media Mining - Chapter 8 (Influence and Homophily)Social Media Mining - Chapter 8 (Influence and Homophily)
Social Media Mining - Chapter 8 (Influence and Homophily)
 
Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social Media
 
Data Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering AlgorithmData Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering Algorithm
 
Ppt
PptPpt
Ppt
 
Social Media Mining - Chapter 3 (Network Measures)
Social Media Mining - Chapter 3 (Network Measures)Social Media Mining - Chapter 3 (Network Measures)
Social Media Mining - Chapter 3 (Network Measures)
 
Social Media Mining - Chapter 7 (Information Diffusion)
Social Media Mining - Chapter 7 (Information Diffusion)Social Media Mining - Chapter 7 (Information Diffusion)
Social Media Mining - Chapter 7 (Information Diffusion)
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
 
Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)
 
Social Network Analysis To Blog Based Online Communities
Social Network Analysis To Blog Based Online CommunitiesSocial Network Analysis To Blog Based Online Communities
Social Network Analysis To Blog Based Online Communities
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
 
Social Network Analysis power point presentation
Social Network Analysis power point presentation Social Network Analysis power point presentation
Social Network Analysis power point presentation
 
Social Media Mining: An Introduction
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An Introduction
 

Ähnlich wie NE7012 SOCIAL NETWORK ANALYSIS VISUALIZATION TECHNIQUES

Mining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social NetworksMining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social NetworksEditor IJCATR
 
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...inventionjournals
 
2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networksMarc Smith
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkEditor IJCATR
 
The Mathematics of Social Network Analysis: Metrics for Academic Social Networks
The Mathematics of Social Network Analysis: Metrics for Academic Social NetworksThe Mathematics of Social Network Analysis: Metrics for Academic Social Networks
The Mathematics of Social Network Analysis: Metrics for Academic Social NetworksEditor IJCATR
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeHarish Vaidyanathan
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic WebJohn Breslin
 
Scholars@Cornell: Visualizing the Scholarship Data
Scholars@Cornell: Visualizing the Scholarship DataScholars@Cornell: Visualizing the Scholarship Data
Scholars@Cornell: Visualizing the Scholarship DataMuhammad Javed
 
LEARNER CENTERED NETWORK MODELS: A SURVEY
LEARNER CENTERED NETWORK MODELS: A SURVEYLEARNER CENTERED NETWORK MODELS: A SURVEY
LEARNER CENTERED NETWORK MODELS: A SURVEYIJITE
 
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSAPPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSIJwest
 
UNITII -MODELING and AGGRIGATION-sss.docx
UNITII -MODELING  and AGGRIGATION-sss.docxUNITII -MODELING  and AGGRIGATION-sss.docx
UNITII -MODELING and AGGRIGATION-sss.docxsharwarisolapur
 
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSSTRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSijcsit
 
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...Rick Vogel
 
VIZ-VIVO: Towards Visualizations-driven Linked Data Navigation
VIZ-VIVO: Towards Visualizations-driven Linked Data NavigationVIZ-VIVO: Towards Visualizations-driven Linked Data Navigation
VIZ-VIVO: Towards Visualizations-driven Linked Data NavigationMuhammad Javed
 

Ähnlich wie NE7012 SOCIAL NETWORK ANALYSIS VISUALIZATION TECHNIQUES (20)

Q046049397
Q046049397Q046049397
Q046049397
 
Mining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social NetworksMining and Analyzing Academic Social Networks
Mining and Analyzing Academic Social Networks
 
B1803020915
B1803020915B1803020915
B1803020915
 
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
 
2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks
 
Sampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social NetworkSampling of User Behavior Using Online Social Network
Sampling of User Behavior Using Online Social Network
 
The Mathematics of Social Network Analysis: Metrics for Academic Social Networks
The Mathematics of Social Network Analysis: Metrics for Academic Social NetworksThe Mathematics of Social Network Analysis: Metrics for Academic Social Networks
The Mathematics of Social Network Analysis: Metrics for Academic Social Networks
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic Web
 
Scholars@Cornell: Visualizing the Scholarship Data
Scholars@Cornell: Visualizing the Scholarship DataScholars@Cornell: Visualizing the Scholarship Data
Scholars@Cornell: Visualizing the Scholarship Data
 
LEARNER CENTERED NETWORK MODELS: A SURVEY
LEARNER CENTERED NETWORK MODELS: A SURVEYLEARNER CENTERED NETWORK MODELS: A SURVEY
LEARNER CENTERED NETWORK MODELS: A SURVEY
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKSAPPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
 
UNITII -MODELING and AGGRIGATION-sss.docx
UNITII -MODELING  and AGGRIGATION-sss.docxUNITII -MODELING  and AGGRIGATION-sss.docx
UNITII -MODELING and AGGRIGATION-sss.docx
 
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSSTRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
 
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
 
Social network
Social networkSocial network
Social network
 
Citizens Speak Out:Public e-Engagement Experienceof Slovakia
Citizens Speak Out:Public e-Engagement Experienceof SlovakiaCitizens Speak Out:Public e-Engagement Experienceof Slovakia
Citizens Speak Out:Public e-Engagement Experienceof Slovakia
 
VIZ-VIVO: Towards Visualizations-driven Linked Data Navigation
VIZ-VIVO: Towards Visualizations-driven Linked Data NavigationVIZ-VIVO: Towards Visualizations-driven Linked Data Navigation
VIZ-VIVO: Towards Visualizations-driven Linked Data Navigation
 

Kürzlich hochgeladen

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Kürzlich hochgeladen (20)

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

NE7012 SOCIAL NETWORK ANALYSIS VISUALIZATION TECHNIQUES

  • 1. NE7012 SOCIAL NETWORK ANATYSIS PREPARED BY: A.RATHNADEVI A.V.C COLLEGE OF ENGINEERING UNIT 2-MODELING AND VISUALIZATION
  • 2. UNIT II MODELING AND VISUALIZATION Visualizing Online Social Networks - A Taxonomy of Visualizations - Graph Representation - Centrality- Clustering - Node-Edge Diagrams - Visualizing Social Networks with Matrix-Based Representations- Node-Link Diagrams - Hybrid Representations - Modelling and aggregating social network data – Random Walks and their Applications –Use of Hadoop and Map Reduce - Ontological representation of social individuals and relationships. 2.1 VISUALIZING ONLINE SOCIAL NETWORKS  As the fast development of the Internet, many online social network services are created to connect social relationships among people.  Versatile visualization skills are employed to facilitate analyzing online social networks, and more in-depth studies have been investigated to improve the presentation of online social networks.  When the attributes of sociality are concerned, online social networks can be classified into different social communities. In addition, from the daily social activities of various online social communities, social networks may evolve into different patterns of social structure.  Therefore, visualizations of online social networks were developed according to the attributes of network sociality to present their network structure.  In this section, many state-of-the-arts visualization applications are surveyed to present the versatile social network visualizations on the Web.  In the following, we first depict and analyze the visualization of online social networks according to their attributes of sociality, including Web communities, email groups, digital libraries, and Web 2.0 services.  In the end of this section, we summarize online social network visualizations based on different views of social relationships, e.g. user centric social relationships, content- centric social relationships, and hybrid social relationships. 2.1.1 web communities  After the six degrees of separation concept and Web services were becoming popular
  • 3. in late 1990s, different social network services were created on the Web to help people maintain their social relationships.  The SixDegrees.com website was an early representative created on the basis of the Web interaction model during 1997 and 2001.  Since the start of SixDegrees.com, various social network websites and Web-based dating services have been established to provide people more convenient ways to build up their social relationships and communities.  In addition, many social network websites are developed with interactive visualization interfaces to facilitate people connecting their communities and maintaining social relationships.  In 2003, Club Nexus was established based on the friendship network data of Stanford students and allowed them to explicitly list their friends by their profiles.  For example, students registered on Club Nexus were provided with the profiles of their year in school, major, residence, gender, personalities, hobbies and interests to facilitate interacting with their online social networks.  The Club Nexus community thus provided very rich profiles for its users to allow for detailed social network analysis, including identifying activities and preferences that determine the formation of friendship.  In addition to listing actors with their profiles for social network analysis, a modern visualization of social networks, Vizster, was contributed with customized techniques to visualize social relationships and community structures in 2005.  Vizster was developed based on node-edge network layouts for exploring connectivity in large graph structures, supporting visual search and analysis, and automatically identifying and visualizing community structures.  Form the visualization design of Vizster, many advanced search and interactive navigation techniques are developed on Vizster to facilitate the analysis of social networks, such as highlighting, panning, zooming, and distortion techniques.  From the previous visualization of Web communities, the visualization techniques are mainly introduced to deal with the complex social relationships based on human-centric or user-centric views.
  • 4.  As the development of SemanticWeb, a project called FOAF (Friend-of-a-friend) was proposed to visualize such human-centric social relationships based on Semantic Web social metadata. With XML/RDF format, the FOAF relations can be explicitly defined for further social network analysis and visualization.  Recently, Microsoft Research Asia proposed a novel object-level search service, called Entity Cube, to help people discover real-world entities, such as people, locations, and organizations, and explore their social relationships. 2.1.2 Email groups  Email service is one of the most popular applications that people often use to connect each other and deliver messages in their daily lives.  Personal online social networks are thus constructed through people’s daily social interactions. For analyzing the social structures of the daily email activities, visualization techniques are employed to explore different patterns.  In 2004, Soylent was developed to study the social patterns and the temporal rhythms of daily email activities. Through the Soylent visualization, mutual interactions between different users and groups, and their everyday collaboration activities can be clearly displayed.  Some recurrent patterns discovered in the social networks, such as the onion pattern, the nexus pattern, and the butterfly pattern, can thus suggest regular ways of understanding their interactions.  For example, the butterfly pattern, named for its two large “wings” surrounding a single center, can be interpreted as a member of a design team also involving in a research team.  Full network view of Soylent
  • 5.  In addition to the network view of email visualization, different aspects of social networks can be visualized for analysis.  In 2005, two visual metaphors, Social Network Fragments (SNF) and Post History, were employed to visualize the major two dimensions of email activities: people and time.  The social relationships derived from the TO and CC lists in email archives are highlighted in SNF, and the temporal rhythms of interactions between ego and individual contacts over time are illustrated with a calendar metaphor in Post History, respectively.  Post History presents social network visualization with a calendar panel on the left and a contacts panel on the right. The email exchange activities with time progress can thus be visualized in an interactive calendar-like interface.  In 2006, an improved calendar-like visualization interface, called Themail, was developed to help analyze email-based social networks with a chronological sequence and the corresponding topics.  Through the analysis of email content, the social relations and mutual interactions between a user and her contacts can be clear presented in Themail. 2.1.3Digital libraries
  • 6.  In digital libraries, social networks can be mainly analyzed from two aspects: authors and writings. 2.1.3.1 Co-Authorship Networks  On the aspect of authors, co-authorships can be mined from the existing publications and organize the co-authorship networks.  With the visualization of co-authorships, some characteristics, such as clustering coefficient and average path length, can be analyzed in co-authorship networks.  From the visualization of the co-authorship networks illustrated a small world graph can be drawn with the connections of authors from different places in the world. In 2005, social network analysis for co-authorship was in-depth studied in digital libraries.  In addition to the node-edge representation, a matrix representation was used in the co- authorship network to help analyze different co-authorship patterns.  With the matrix representation, the interlaced problem of the node-edge representation caused by a large amount of nodes and complex relations can be effectively improved. 2.1.3.2 Co-Citation Relations  On contrary to the authorship view, social networks in digital libraries can be discovered from the citations and co-citations among writings themselves.
  • 7.  Since references are a crucial part of a document for readers to obtain information source, Co-citation social networks can be formed through the continuously accumulated publications.  With proper visualization of co-citation networks, documents with high impacts or similar citation patterns can be immediately identified, and the co-citation relationships can be intuitively observed as well.  In 2006, a novel visualization tool, called CircleView, was proposed to visualize academic citation relations with interactive design and highlighted color.  In CircleView, the summarized contents of the focus papers, such as references, detail reference, and references of reference, are marked with different colors and highlighted circles.  In 2007, an interactive visualization tool was developed to present large co-citation networks with latent visual cues and allows direct interaction with the visualized graphs.
  • 8.  In interaction-based citation visualization system was proposed to visualize larger datasets with scalable functionalities and complemented the citation visualization of domain content analysis.  In 2009, an innovative visualization technique, called FP-tree, was developed to present co-citation network from a new perspective, namely, visualizing social networks based on a paper-reference matrix instead of using a reference-reference matrix.  The paper-reference matrix was transformed into an FP-tree visualization to analyze the intellectual structure of two domains: Information Visualization and Sloan Digital Sky Survey (SDSS). 2.1.4 web 2.0 services  Since the concept of Web 2.0 was proposed in 2004, online social activities are becoming more prosperous than before.  Many Web 2.0 applications are popularly accessed by users to connect their social networks, such as Twitter and Facebook.  For example, Twitter provides users convenient functionalities to share the up-todate status with their followers. Since the following and the followed relationships can be quickly constructed, large social networks are created over the Twitter service, and many visualization applications are developed to analyze Twitter social networks.  Another example of Web 2.0 services that forms large social networks is Facebook.  As Facebook was proposed at Harvard in 2004, its social community grows explosively from U.S. to the rest of the world.  Until today, Facebook has over 400 million active users worldwide, and large and complex social networks come along with the explosive growth of Facebook communities.  Therefore, many interactive visualization applications and tools are developed to assist users to access and analyze such complicated social networks.  Nexus is a visualization application on Facebook communities to illustrate their large network graphs. With the Nexus visualization, user relationships can be properly displayed, and user names will pop up when mouse is moved over the nodes.
  • 9.  However, when user relationships are complicated, the social relationships in the Nexus visualization may become difficult to be recognized.  In 2010, an advanced interactive visualization interface, called IRNet, was proposed to further improve the shortcomings of Nexus and TouchGraph on visualizing Facebook communities.  IRNet visualizes interpersonal relationships in social networks with focus context techniques to present both local and global views. 2.1.5 visualization of online social networks classification  In addition, visualization of online social networks can be further categorized into three types by their social relationships: user-centric visualization, content-centric visualization, and hybrid visualization. 2.1.5.1user-centric visualization  From the perspective of actors, visualizing user-centric social relationships can present various characteristics of actors and helps explore different subjects and relationships of interests.  For example, visualizing user-centric social relationships can help people discover individuals and communities that meet the following expectations: 1. actors or groups with similar/complementary features, 2. key actors or those with high social impacts, and 3. actors with popular interpersonal relationships or active social interactions.  A great variety of online social networks are created and visualized according to the above user-centric social relationships, such as Club Nexus, Vizster, People Entity Cube, FOAF, SNF, and co-authorship networks.  Such user-centric visualizations are widely utilized to help people access their social networks and discover the social networks of their interests.
  • 10. 2.1.5.2 content-centric visualization  As the boosted development and evolution of Internet applications, particularly in the era of Web 2.0, an explosive amount of Web information was rapidly produced via various social network communities.  Therefore, from the view of visualizing content-centric social relationships, various kinds of contents can be properly presented to facilitate people analyzing social networks.  For example, at least the following three sorts of social network contents can displayed with content-centric visualization: 1. the distribution of user opinions, including key opinions and controversial comments, 2. user opinions with high impacts toward their social communities, especially the effective- impact period of time, and 3. the relations among different content groups.  In according with content-centric social relationships, several visual applications for online social networks are developed, such as visualizations for citation and cocitation social networks. 2.1.5.3 Hybrid visualization  Social activities among actors are generally more than one form and thus may consist of different kinds of relationships and interactions.  Therefore, besides the user-centric visualization and content-centric visualization interfaces, still many visualization interfaces are designed according to hybrid relationships.  Hybrid visualization is to visualize social networks from different aspects of attributes, e.g. people and contents. Particularly, online social activities, such as email and dating services, usually include such elements.  Therefore, these online social networks are displayed with hybrid visualization techniques, such as PostHistory, Themail,TouchGraph, and IRNet. 2.2 SOCIAL NETWORK ANALYSIS ( A TAXONOMY)
  • 11.  Social networks are built when actors belonging to different social groups are connected to each other.  In late 1800s and early 1900s, sociologists have been looking for social relationships, groups, positions, and organizations in human activities.  In early 1930s, Moreno brought in the concept of sociograms to represent social networks and started to explore the social relations in a formal study.  Until today,social network analysis is still a hot issue that attracts many concerns, particularly for analyzing the online social networks. 2.3 GRAPH REPRESENTATION  Many fundamental concepts and metrics in social network analysis are derived from graph theory, because graph theory formally represents social networks with structural properties. Node degree:  In graph theory, the degree of a node in a graph is the number of edges incident to the node. If there are loops in the graph, the degree of a node will be counted twice.  Therefore, the maximum number of unique edges in a graph can be obtained when the loops are excluded. There are at most N*(N-1)/2edges for an undirected graph and at most N*(N-1) edges for a directed graph, where N is the number of nodes. Node density:  The density of an undirected graph can be defined as (2*E)/N*(N-1), where E is the number of edges. On the other hand, the density of a directed graph can be defined as E/N*(N-1). Path length:  The path length is the number of edges in the sequence that a walk follows. In a path, all nodes and edges appear only once in the sequence.  Therefore, the path length can be defined as the distances between pairs of nodes in a network graph, and average path length is the average of these distances between all pairs of nodes. Component size:
  • 12.  When the component size is concerned, a connected graph needs to be discovered first since the component size is counted by the number of connected nodes in a graph.  A graph is connected if all pairs of nodes are reachable, and for each pair of two nodes, one of them is reachable from the other.  On the other hand, if a graph is not connected, the graph can be partitioned into several connected subgraphs where each component size can be calculated by the number of connected nodes in each subgraph. 2.4 CENTRALITY  One of the key applications in social networks is to identify the most important or central nodes in the network.  The measure of centrality is thus used to give a rough indication of the social power of a node based on how well they connect the network. HITS analyzes the important nodes based on calculating Authorities (indegrees) and Hubs (out-degrees) , and PageRank calculates node values based on out-degrees.  In social network analysis, “Degree”, “Betweenness”, and “Closeness”centrality are three most popularly adopted methods to measure the centrality of a social network. 2.4.1 Degree centrality:  Degree centrality is defined as the number of edges incident upon a node, and thus it is usually the first way to calculate the nodes that are most potential to determine other nodes.  For calculating degree centrality, the nodes that have direct connections to a large number of nodes are considered. If the edges in a graph are directed, the in-degree centrality is differentiated from the out-degree centrality.
  • 13. 2.4.2 Betweenness Centrality:  In addition to degree centrality, betweeness centrality is another key metrics for computing the extent to which a node lies between other nodes in the network.  If a node is the only node that links two groups of nodes in the network, this node shall be seen as an important node for keeping the social network together.  Therefore, betweenness centrality is to measure the connectivity of the neighbors of anode and to give a higher value for nodes which bridge clusters. 2.4.3 Closeness centrality:  The measure of closeness centrality is to take into account how distant a node is to
  • 14. the other nodes in the network.  Hence, closeness centrality is to measure the order of magnitude that a node is near all other nodes in a social network by calculating the mean shortest path for a node to all other nodes in the graph. 2.5 CLUSTERING  Many social networks contain subsets of nodes that are highly connected within the subset and have relatively few connections to nodes outside the subset.  The nodes in such subsets are likely to share some attributes and form their own communities.  Since the detection of these community structures is not trivial, how to efficiently and effectively discover such community structures is important.  Therefore ,the main measure described below is to help explore the grouping effects by Clustering coefficient. 2.5.1 Clustering coefficient:  A clustering coefficient is to measure the degrees of nodes to decide which nodes in a graph tend to be clustered together.  Thus, the clustering coefficient measure is to quantify how close its neighbors are to being a complete graph.
  • 15.  As the nodes grouped in the real-world social network tend to have relatively high density of ties,the clustering coefficient is also utilized for small world analysis. 2.6 NODE EDGE DIAGRAM  A node-edge diagram is an intuitive way to visualize social networks. With the node-edge visualization, many network analysis tasks, such as component size calculation, centrality analysis, and pattern sketching, can be better presented in a more straightforward manner.  Many node-edge layouts have been presented to place the nodes in the graph for users to clearly recognize the structure of the social network.  For example, some layouts are suitable to display social networks in a moderate size, but they are not suitable for showing larger networks.  Three kinds of layouts, namely, random layout, force-directed layout, and tree layout, are described to explain the node-edge diagrams. 2.6.1 Random Layout  A random layout is to put the nodes at random geometric locations in the graph, and thus it may not yield very clear visualization results, particularly when the number of nodes immensely increases, e.g. more than thousands of nodes.  However, since a random layout algorithm can efficiently draw the social network graph in linear time, O(N), sometimes it can be usable to visualize very large network graphs.  Therefore, random graphs have been proposed as a possible model to take into account the structural characteristics of instances that appear in many practical applications.
  • 16. 2.6.2 Force-Directed Layout  A force-directed layout is also known as a spring layout, which simulates the graph as a virtual physical system.  In a force-directed layout, the edges act as spring and the nodes act as repelling objects, just like the Hooke’s law and the Coulomb’s law.  Hence, there exists gravitational attraction or magnetic repulsion between each node in the graph.  Generally, an initial random layout will be yielded first, and then the force-directed algorithms will run iteratively to adjust the positions of nodes until all graph nodes and attractive forces between the adjacent nodes run to convergence.  Since a force-directed layout may take hundreds of iterations to obtain a stable layout, the running time is at least O.(N logN) or O.(E), where N is the number of nodes and E is the number of edges.
  • 17. 2.6.3 Tree Layout  A basic tree layout is to choose a node as the root of tree, and the nodes connected to the root become children of the root node.  Nodes that are at more levels away from the root become the grand-children of the root and so on.  A tree layout can display a more structural layout than graph layouts by considering more contextual information.  Because of the hierarchical nature of a tree layout, trees are more straightforward to grasp human eye than general graphs.  For a better visual presentation of domain specific information, more suitable variants of the tree layout were proposed, such as hyperbolic tree layout and a radial tree layout.  These tree visualizations utilize the idea of focus+context to better the visualization effects with animation techniques and help users to obtain both global and local views of a social network in a 2D display.  In addition, an advanced visualization technique, called H3, further maps the hyperbolic tree into a 3D space with a fisheye effect to better utilize the visualization space.
  • 18. 2.6.4 Matrix Representations  Since a social network graph consists of nodes connected with edges, it can be transformed into a simple Boolean matrix whose rows and columns represent the vertices of the graph.  Moreover, the Boolean values in the matrix can be further replaced with valued attributes associated with the edges to provide more informative network visualizations.  Since a matrix presentation can help minimize the occlusion problems caused by the node-edge diagram, the matrix-based representation of graphs offers an alternative to the traditional node-edge diagrams.
  • 19.  With a matrix-based representation, clusters and associations among the nodes can also be better discovered when the number of nodes increases.  In 2006, an enhanced matrix-based representation, called MatrixExplorer,was developed to visualize social networks with a Dual-Representation. 2.7 VISUALIZING SOCIAL NETWORKS WITH MATRIX-BASE REPRESENTATIONS  Matrices or node-link diagrams both have advantages and drawbacks for visualizing Social networks. In this section, we present the pros and cons of each representation and propose a set of visualizations combining the best of both worlds. 2.7.1 Matrix or Node-Link Diagram?  Matrix and node-link diagrams have different properties making them suitable representations for different tasks and datasets. The advantages of matrices: 1. Matrices provide powerful overview visualization since the time to create them is low and since they are always readable. They constitute a good representation to initiate an exploration. 2. Matrices do not suffer from node overlapping, if the task requires to always read the actors’ labels, this representation is more appropriate. 3. Matrices do not suffer from link crossing each other; therefore they are a viable alternative for dense networks.
  • 20. 4. Matrices show all possible pairs of vertices, they can highlight the lack of connection and also the directedness of the connections. They are particularly appropriate for directed and dense networks. The advantages of node-link diagrams:  These representations are familiar to a wide audience; they constitute a powerful communication tool. In contrast, matrices require training and help decoding their meaning for novice users.  For small or sparse networks, Ghoniem et al. proved that node-link diagrams were more effective than matrices.  For a similar level of details, the space used by matrices is larger than the space to display node-link diagrams. Therefore, for a compact representation, node-link diagrams are a better choice.  When the analysis requires to perform a number of path-related tasks (e.g. find the shortest path from John to Mary), node-link diagrams are more appropriate.Ghoniem et al. showed that such tasks were difficult to perform with matrices. 2.7.2 Matrix + Node-Link Diagrams  To combine advantages of both representations and to support the visual exploration of social networks, we designed MatrixExplorer.  To conceive this system, we observed and discussed with a small group of social scientists. We divided their analysis process in four main stages. For each, we describe how matrices and node-link diagrams can be combined to achieve the best of both worlds. 1. Initiate the exploration 2. Explore interactively and iteratively 3. Find a consensus in the data or validate an hypothesis 4. Present the findings 2.7.3 Node link diagram  Jacob Moreno was the first pioneer of social network visualization. More than 70 years ago, he published visual depictions of social friendship in schools, using these visualizations to support his findings.
  • 21.  The principle of node-link diagrams is to graphically represent actors of the network by nodes and connections by links.  Node-link diagrams are the most commonly used representation of graphs and networks. It is well illustrated by Freeman in his survey and history of social network visualization.  Node-link representations are widely used and familiar to a very large audience, making them a powerful communication tool. However, their readability and the message they convey greatly depends on the positions of their nodes. 2.8 HYBRID REPRESENTATION  Providing both matrix and node-link diagrams to the user has a number of advantages but also drawbacks. First, it requires a large amount of display space.  At least two display monitors are required to comfortably use MatrixExplorer; a third one is strongly recommended to display textual and detailed views.  Secondly,we observed that switching from one representation to the other may induce high cognitive load to the user and split attention is always tiresome.  Indeed, a single node on the node-link diagram becomes both a row and column in the matrix and a link, visually represented by a line, becomes a cell, i.e. a rectangle, in the matrix.  Switching representations between tasks require a few seconds of adjustment, disorienting momentarily users.  To minimize the display space required and limit the cognitive cost when switching representations, we developed two hybrid representations: MatLink and NodeTrix.
  • 22. 2.8.1 Augmenting Matrices  As Ghoniem et al. demonstrated in their study, matrices do not support well path following tasks.  For example, finding the shortest path between two given actors is far easier in a node- link diagram, in which users can quickly investigate the multiple paths and assess what the shortest path is.  These tasks being very common in social network analysis, we proposed to create a hybrid representation to overcome the problem in matrices: MatLink  The principle of MatLink is to augment a standard matrix representation with links on its borders.  These links provides a dual encoding of the connections between actors and ease path- following tasks since they use the visual representation of node-link diagrams. 2.8.2 Merging Matrix and Node-Link Diagram  Node-link diagrams or matrices perform differently according to the types of visualized networks.  For example, node-link diagrams or hybrids Treemap+links are well suited to represent tree-like networks. Conversely, for dense networks or bipartite networks, matrices are better suited, maximizing the use of space and remove any link crossing.  For small-world networks, however, the choice of representation is not so clear.  When visualizing such network with a node-link diagram, the dense regions (e.g. communities) suffer from link crossing and become difficult to read. However, when using a matrix representation, the visualization is very sparse and requires a lot of navigation for exploration.
  • 23. 2.9 MODELLING AND AGGREGATING SOCIAL NETWORK DATA  There are two fundamental reasons for developing semantic-based representations of social networks.  Firstly, as we will demonstrate, maintaining the semantics of social network data is crucial for aggregating social network information, especially in heterogeneous environments where the individual sources of data are under diverse control.  The benefits are easiest to realize in cases where the data are already available in an electronic format, which is typically the case in the network analysis of online communities but also in the intelligence community.  Secondly, semantically representations can facilitate the exchange and reuse of case study data in the academic field of Social Network Analysis.  The possibilities for electronic data exchange has already revolutionized a number of sciences with the most well-known examples of bio-informatics and genetics. 2.9.1 State-of-the-art in network data representation  In the chapters before we have seen that the most common kind of social network data can be modeled by a graph where the nodes represent individuals and the edges represent binary social relationships.  higher-arity relationships may be represented using hyper-edges, i.e. edges connecting multiple nodes.
  • 24.  The most commonly encountered formats are those used by the popular network analysis packages Pajek and UCINET. These are text-based formats which have been designed in a way so that they can be easily edited using simple text editors.  Unfortunately, the two formats are incompatible. (UCINET has the ability to read and write the .net format of Pajek, but not vice versa.) Further, researchers in the social sciences often represent their data initially using Microsoft Excel spreadsheets, which can be exported in the simple CSV (Comma Separated Values) format.  The GraphML format represents an advancement over the previously mentioned formats in terms of both interoperability and extensibility [BEL04, BEH+01].
  • 25.  GraphML originates from the information visualization community where a shared format greatly increases the usability of new visualization methods.  GraphML is therefore based on XML with a schema defined in XML Schema.  This has the advantage that GraphML files can be edited, stored, queried, transformed etc. using generic XML tools.  However, none of these formats support the aggregation and reuse of electronic Data. 2.9.2 Ontological representation of social individuals  The Friend-of-a-Friend (FOAF) ontology that we use in our work is an OWL based format for representing personal information and an individual’s social network.  FOAF greatly surpasses graph description languages in expressivity by using the powerful OWL vocabulary to characterize individuals.  More importantly, however, using FOAF one can rely on the extendibility of the RDF/OWL representation framework for enhancing the basic ontology with domain- specific knowledge about identity.  In terms of deployment, two studies give an insight of how much the individual terms of the FOAF are used on the Web [PW04, DZFJ05].  Both studies note that the majority of FOAF profiles on the Web are auto-generated by community sites such as LiveJournal, Opera Communities5 and Tribe.net.  As FOAF profiles are scattered across the Web it is difficult to estimate their number, but even the number of manually maintained profiles is likely to be in the tens of thousands. @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#label> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix example: <http://www.example.org/> . example:Rembrandt rdf:type foaf:Person . example:Saskia rdf:type foaf:Person .
  • 26. example:Rembrandt foaf:name "Rembrandt" . example:Rembrandt foaf:mbox <mailto:rembrandt@example.org> . example:Rembrandt foaf:knows example:Saskia . example:Saskia foaf:name "Saskia" . Figure 4.3 A set of triples describing two persons represented in the Turtle language.  FOAF started as experimentation with Semantic Web technology.  The idea of FOAF was to provide a machine processable format for representing the kind of information that made the original Web successful, namely the kind of personal information described in homepages of individuals.  Thus FOAF has a vocabulary for describing personal attribute information typically found on homepages such as name and email address of the individual, projects, interests, links to work and school homepage etc.  FOAF profiles, however, can also contain a description of the individual’s friends using the same vocabulary that is used to describe the individual himself.  Even more importantly, FOAF profiles can be linked together to form networks of web- based profiles.
  • 27.  FOAF became the center point of interest in 2003 with the spread of Social Networking Services such Friendster, Orkut, LinkedIn etc.  Despite their early popularity, users have later discovered a number of drawbacks to centralized social networking services.  First, the information is under the control of the database owner who has an interest in keeping the information bound to the site and is willing to protect the data through technical and legal means.  The profiles stored in these systems typically cannot be exported in machine processable formats (or cannot be exported legally) and therefore the data cannot be transferred from one system to the next.  Second, centralized systems do not allow users to control the information they provide on their own terms. Although Friendster follow-ups offer several levels of sharing (e.g. public information vs. only for friends), users often still find out the hard way that their information was used in ways that were not intended.  These problems have been addressed with the use of Semantic Web technology.
  • 28. 2.9.3 Ontological representation of social relationships  Ontological representations of social networks such as FOAF need to be extended with a framework for modelling and characterizing social relationships for two principle reasons: 1. To support the automated integration of social information on a semantically basis 2. To capture established concepts in Social Network Analysis.  In this section we approach this task using the engineering method of decomposition.  The key idea is that the representation of social relationships needs to be fine-grained enough so that we can capture all the detail from the individual sources of information in a way that these can be later recombined and taken as an evidence of a certain relationship.
  • 29.  Network analysis itself can be a help in that it has a rich vocabulary of characterizing Social relationships.  As these are the terms that are used by social scientists, they are prime candidates to be included in any ontology.  For illustration, we list below some of the most commonly discussed characteristics of social relationships. 1) Sign  A relationship can represent both positive and negative attitudes such as like or hate.  The positive or negative charge of relationships is important on its own for the study of balance within social networks, which is the subject of balance theory. 2) Strength  The notion of tie strength was first introduced by Granovetter in his groundbreaking work on the benefits of weak ties. Tie strength itself is a complex construct of several characteristics of social relations.  In her survey, Hite lists the following aspects of tie strength discussed in the literature: Affect/ philos/passions , Frequency/frequent contact , Reciprocity, Trust/enforceable trust , Complementarity, Accommodation/adaptation, Indebtedness/imbalance, Collaboration, Transaction investments, Strong history, Fungible skills, Expectations, Social capital, Bounded solidarity, Lower opportunistic behavior, Density, Maximize relationship over org., Fine-grained information transfer, Problem solving, Duration, Multiplexity, Diffusion, Facilitation, Personal involvement, Low formality (few contracts), Connectedness. 3) Provenance  A social relationship may be viewed differently by the individual participants of the relationship, sometimes even to the degree that the tie is unreciprocated, i.e. perceived by only one member of the dyad. Similarly, outsiders may provide different accounts of the relationship, which is a well-known bias in SNA. 4) Relationship history  Social relationships come into existence by some event (in the most generic, philosophical sense) involving two individuals. (Such an event may not require personal contact (e.g. online friendships), but it has to involve social interaction.
  • 30.  From this event, social relationships begin a lifecycle of their own during which the characteristics of the relationship may change through interaction or the lack of (see e.g. Hite and Hesterly ). 5) Relationship roles:  A social relationship may have a number of social roles associated with it, which we call relationship roles.  For example, in a student/professor relationship within a university setting there is one individual playing the role of professor, while another individual is playing the role of a student.  Both the relationship and the roles may be limited in their interpretation and use to a certain social context. 2.9.3.1 Conceptual model  The importance of social relationships alone suggests that they should be treated as first- class citizens.  Alternatively, social relations could be represented as n-ary predicates; however, n-ary relations are not supported directly by the RDF/OWL languages.  There are several alternatives to n-ary relations in RDF/ OWL described in a document created by the Semantic Web Best Practices and Deployment Working Group of the W3C.  In all cases dealing with n-ary relations we employ the technique that is known as reification:we represent the relation as a class, whose instances are concrete relations of that type.  One may recall that RDF itself has a reified representation of statements: the rdf :Statement object represents the class of statements.  This class has three properties that correspond to the components of a statement, namely rdf :subject, rdf :predicate, rdf :object. These properties are used to link the statement instance to the resources involved in the statement.
  • 31. Fig: An RDF representation of social relationships.  This model has the advantage that it becomes more natural to represent parameter values and restrictions on them.  The disadvantage is that this solution is not compliant with OWL DL: declaring properties ranging on properties and creating subclasses rdf :Property are not allowed in this species of OWL.
  • 32. An alternative RDF representation of social relationships.  Further, there are two related characteristics of social relations that we would like to capture in a more advanced representation.  The first characteristic is the context dependence of social relations and the second is the separation between an observable situation and its interpretation as a social relation between two individuals.  The representation of context and the separation of the level of state-of-affairs (observations of objects and sequences of events) from the higher level of descriptions (contexts) that can be used to interpret those state-of-affairs turns out to be common problem in the representation of much of human knowledge.  A solution proposed by is the Descriptions and Situations ontology design pattern that provides a model of context and allows to clearly delineate these two layers of representation.
  • 33. Figure - The Descriptions and Situations ontology design pattern. 2.10 RANDOM WALK AND THEIR APPLICATION  A Random Walk in synthesis:  Given an indirected graph and a starting point, select a neighbour at random  Move to the selected neighbour and repeat the same process till a termination condition is verified  The random sequence of points selected in this way is a random walk of the graph
  • 34. Important parameters of random walk  Access time or hitting time: Hij is the expected number of steps before node j is visited, starting from node i  Commute time: i  j  i: Hij + Hji  Cover time: Starting from a node/distribution the expected number of steps to reach every node. Applications of Random Walks on Graphs  Ranking Web Pages  HITS on citation network  Clustering using random walk 2.11 USE OF HADOOP AND MAP REDUCE Map reduce  Data-parallel programming model for clusters of commodity machines  Pioneered by Google - Processes 20 PB of data per day  Popularized by open-source Hadoop project - Used by Yahoo!, Facebook, Amazon, … Map Reduce used for  At Google: 1. Index building for Google Search 2. Article clustering for Google News 3. Statistical machine translation  At Yahoo!: 1. Index building for Yahoo! Search 2. Spam detection for Yahoo! Mail  At Facebook:
  • 35. 1. Data mining 2. Ad optimization 3. Spam detection In research:  Analyzing Wikipedia conflicts (PARC)  Natural language processing (CMU)  Bioinformatics (Maryland)  Particle physics (Nebraska)  Ocean climate simulation (Washington) Map Reduce Goals 1. Scalability to large data volumes:  Scan 100 TB on 1 node @ 50 MB/s = 24 days  Scan on 1000-node cluster = 35 minutes 2. Cost-efficiency:  Commodity nodes (cheap, but unreliable)  Commodity network  Automatic fault-tolerance (fewer admins)  Easy to use (fewer programmers) Typical Hadoop Cluster:
  • 36.  40 nodes/rack, 1000-4000 nodes in cluster  1 GBps bandwidth in rack, 8 GBps out of rack  Node specs (Yahoo! terasort): 8 x 2.0 GHz cores, 8 GB RAM, 4 disks (= 4 TB?) Challenges  Cheap nodes fail, especially if you have many - Mean time between failures for 1 node = 3 years - MTBF for 1000 nodes = 1 day - Solution: Build fault-tolerance into system  Commodity network = low bandwidth - Solution: Push computation to the data  Programming distributed systems is hard - Solution: Users write data-parallel “map” and “reduce” functions, system handles work distribution and faults Hadoop Components:  Distributed file system (HDFS) - Single namespace for entire cluster - Replicates data 3x for fault-tolerance  MapReduce framework - Executes user jobs specified as “map” and “reduce” functions - Manages work distribution & fault-tolerance Hadoop Distributed File System:  Files split into 128MB blocks  Blocks replicated across several datanodes (usually 3)  Namenode stores metadata (file names, locations, etc)  Optimized for large files, sequential reads  Files are append-only
  • 37. MapReduce Programming Model:  Data type: key-value records  Map function:  Reduce function:
  • 38. Example: Word Count: Word Count Execution: An Optimization: The Combiner  Local aggregation function for repeated keys produced by same map • For associative ops. like sum, count, max • Decreases size of intermediate data Word Count with Combiner
  • 39. MapReduce Execution Details : • Mappers preferentially placed on same node or same rack as their input block – Push computation to data, minimize network use • Mappers save outputs to local disk before serving to reducers – Allows having more reducers than nodes – Allows recovery if a reducer crashes Fault Tolerance in MapReduce:  If a task crashes: – Retry on another node • OK for a map because it had no dependencies • OK for reduce because map outputs are on disk – If the same task repeatedly fails, fail the job or ignore that input block  If a node crashes: – Relaunch its current tasks on other nodes – Relaunch any maps the node previously ran
  • 40. • Necessary because their output files were lost along with the crashed node  If a task is going slowly (straggler): – Launch second copy of task on another node – Take the output of whichever copy finishes first, and kill the other one