Identifying Semantic Concepts
Selection of CNs
Data Collection
Preprocessing
Extraction of CN, LN, IWL, GN, page-size and access-log data
Calculation of REPk, REPv, REPt and REL
Analysis of data
Eric Tessenow,1 Mirko Kämpf,2 and Jan W. Kantelhardt 2
Abstract
Since the numbers of hypertext pages and hyperlinks in the WWW have
been growing continuously for more than 20 years, the problem of
finding relevant content has become increasingly important. We have
developed and evaluated techniques for a time-dependent characterization
of the global and local relevance of WWW pages based on
document length, number of links, and cross-correlations in user-access
time series. As a first application, we focus on content and user activity
in selected groups of Wikipedia articles, mainly because of data availability.
Our goal is to assign ranking values to a hypertext page
(node). The values shall cover static properties of the node and its
neighbourhood (context) as well as dynamic properties derived from its
page-view rates, which depend on underlying communication processes.
We show in several examples how this goal can be achieved.
1 Institute of Communications Studies, University of Leeds, LS2 9JT, Leeds, United Kingdom
2 Institut für Physik, Martin-Luther-Universität Halle-Wittenberg, 06099 Halle (Saale), Germany
Motivation
Since many aspects have to be taken into account in the analysis of
global social networks, it is challenging to compare data collections and
obtain results from their analysis. We therefore require a framework that is
robust and at the same time flexible, and that enables interdisciplinary
research, as scientists from different fields look at different parts of a data
set. Our work suggests a methodology for comparable measurements of
a node's relevance in local graphs defined by the node's local
neighbourhood, while considering local link structure, text volume, user
access activity and editorial activity.
This enables a qualitative and also an efficient quantitative analysis of
parts of a global social network without having to explore and analyze
the whole graph.
In order to identify and compare different communication processes
on multiple channels, one has to quantify the influence of
the environment in which an individual process is embedded,
e.g. for different topics and different regions on earth. We study
usage patterns and the embedding of content in one of the largest
public and open content networks, Wikipedia.
Information Flow in Correlation Networks
Outlook
Local Representation Indexes: REPk,v and REPa,e(t)
Data Sets & Processing
SOE
6.1
References
[1] Kämpf M., Tessenow E., Kantelhardt J.W., Context Sensitive and Time Resolved Relevance in Complex Networks. Unpublished (in preparation, 2014).
[2] Kämpf M., Tismer S., Kantelhardt J.W., Muchnik L., Fluctuations in Wikipedia access-rate and edit-event data. Physica A, 391: 6101-6111 (2012).
[3] Kämpf M., Kantelhardt J.W., Muchnik L., From time series to co-evolving functional networks: dynamics of the complex system ‘Wikipedia’. Proc. Europ. Conf. Complex Syst. (2012).
[4] Schreck B., Kämpf M., Kantelhardt J.W., Motzkau H., Comparing the usage of global and local Wikipedias with focus on Swedish Wikipedia. arXiv:1308.1776 (2013).
[5] Kämpf M., Kantelhardt J.W., Hadoop.TS: large-scale time-series processing. International Journal of Computer Applications (IJCA), 74: 17 (2013), DOI: 10.5120/12974-0233.
[6] Segev E., Mapping the international: Global and local salience and news-links between countries in popular news sites worldwide. Int. Journal for Internet Science, 5: 48-71 (2010).
Contact
We compare different media types, in
particular channels which push information to
consumers (TV news, radio news, Twitter and
Facebook communication), as opposed to pull media
like Wikipedia, forum or blackboard websites, from
which consumers pull data on demand.
We evaluate how properties of different
network types, e.g. social, content, and
communication networks, influence each
other and whether such couplings depend more on the
content or more on the way information is offered and
spread.
Finally, we are interested in the question:
To what extent and how can automated tools
influence the communication processes?
Relevance Indexes: RELv and RELa(t)
We measure characteristic static and dynamic properties of a Wikipedia page based on I.) its node degree k,
II.) its average text volume v, and III.) its access-rate or edit-rate time series (a(t), e(t)) in order to
determine and quantify the level of representation in a semantic or lingual context.
I.) Node degree
II.) Average text volume
III.) Time-dependent access rate a(t)
We measure the time-dependent or temporal
relevance of a Wikipedia page during a
time period for access rates (a,b) of the
central node CN (black), the group IWL
(green), the local neighbourhood (LN, blue)
and the global neighbourhood (GN, red).
a) Relevance Index: shows the level of
attraction of a topic, e.g. for a Wikipedia
page in one selected language.
It compares the user interest in pages in
the selected language with average values
for pages with the same content in all other
languages.
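The comparison described above can be illustrated with a minimal sketch. The exact definition of the index is not reproduced in this text, so the ratio used here (access count of the CN divided by the mean access count of the IWL pages in each time step) is only an assumption, and the function name `relevance_index` is hypothetical.

```python
import numpy as np

def relevance_index(cn_access, iwl_access):
    """Ratio of CN access counts to the mean IWL access counts per time step.

    cn_access:  1-D array, access counts a(t) of the central node (CN).
    iwl_access: 2-D array, one row per page in the IWL group (same topic,
                other languages), one column per time step.
    Assumed definition for illustration only, not the poster's formula.
    """
    cn = np.asarray(cn_access, dtype=float)
    iwl_mean = np.asarray(iwl_access, dtype=float).mean(axis=0)
    # Time steps without any IWL traffic yield NaN instead of a division error.
    return np.divide(cn, iwl_mean, out=np.full_like(cn, np.nan), where=iwl_mean > 0)

# Toy data: three time steps, two pages in the IWL group.
cn = [120, 80, 400]
iwl = [[10, 20, 0], [30, 20, 0]]
rel = relevance_index(cn, iwl)
```

Values above 1 would indicate that the page in the selected language attracts more accesses than the average page on the same topic in the other languages.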
Fig. 1: Definition of partial
data sets (local networks)
Fig. 2: Comparison of local network
structures with identical nodes based
on (a) direct links and (b) functional
link strengths derived from access
activity.
We calculate the time-dependent link strengths correlation by:
Fig. 3: Comparison of static representation indexes for two semantic concepts (data sets 1 and 2).
Fig. 4: Comparison of two local page networks with an assumed
higher global relevance (left) and with higher local relevance (right).
Fig. 5: Distribution of dynamic link strengths
for statically linked pages (a,d), for pages within
groups LN (blue) and GN (red) (b,e), and for
pages in different groups (c,f). Lines show
results for real data and are compared with
results from randomly shuffled data series
(filled areas).
Average values and maximum values of the
distribution function vary over time. Hence, we
cannot define a simple threshold to identify
relevant links. However, the distributions differ
significantly for real data and surrogate data in
(a,d,e).
In the presence of extreme events in access
time series (bottom row) we find a significant
increase in cross-correlation based link
strengths for page pairs in the local and global
neighbourhoods.
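The comparison of real and shuffled data can be sketched as follows. The link strength is taken here to be the zero-lag Pearson correlation coefficient of two access-rate series, which is an assumption, since the formula used for the figures is not reproduced in this text. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def link_strength(x, y):
    """Zero-lag Pearson correlation coefficient of two time series."""
    return float(np.corrcoef(x, y)[0, 1])

def surrogate_strengths(x, y, n_shuffles=200, seed=0):
    """Link strengths after randomly shuffling y, which destroys temporal order."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return np.array([link_strength(x, rng.permutation(y)) for _ in range(n_shuffles)])

# Synthetic example: two series sharing a burst ("extreme event") at t = 50..54.
rng = np.random.default_rng(1)
a = rng.poisson(20, size=100).astype(float)
b = rng.poisson(20, size=100).astype(float)
a[50:55] += 200
b[50:55] += 200

real = link_strength(a, b)
shuffled = surrogate_strengths(a, b)
# Fraction of surrogates that reach the real link strength (empirical p-value).
p_value = float(np.mean(shuffled >= real))
```

In this toy setting the shared burst dominates the correlation, so the real link strength lies far outside the range of the shuffled surrogates.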
RAW data set
large scale data management
Partial data set
preparation
Result data set
Communication
Process
Modelling and Analysis
of Complex Systems
Definition of data sets (Fig. 1)
a) Central node (CN), all directly linked nodes in the same language (local neighbourhood,
LN), all nodes regarding the same topic in other languages (linked by inter-wiki links, IWL),
and all nodes linked to nodes in the IWL group (global neighbourhood, GN).
b) The CN group and the IWL group are the core of the local network for one topic. Both
neighbourhoods, local (LN) and global (GN), form the hull of the local network.
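The group definitions above can be sketched in a few lines. The data layout (dictionaries of link sets keyed by `(language, title)` tuples), the restriction to outgoing links, and the choice to keep the groups disjoint are assumptions made for illustration. The function name `local_network` is hypothetical.

```python
def local_network(cn, page_links, interwiki_links):
    """Split the pages around a central node (CN) into the groups LN, IWL, GN.

    cn:              (language, title) tuple of the central node.
    page_links:      dict mapping a page to the set of pages it links to
                     within the same language edition.
    interwiki_links: dict mapping a page to the set of pages on the same
                     topic in other language editions.
    """
    ln = set(page_links.get(cn, set()))
    iwl = set(interwiki_links.get(cn, set()))
    gn = set()
    for page in iwl:
        gn |= page_links.get(page, set())
    # Assumption: keep groups disjoint, so core pages are not counted as neighbours.
    core = {cn} | iwl
    return {"CN": {cn}, "LN": ln - core, "IWL": iwl, "GN": gn - core - ln}

# Toy example with one central node and one inter-wiki link.
page_links = {
    ("de", "Berlin"): {("de", "Spree"), ("de", "Deutschland")},
    ("en", "Berlin"): {("en", "Germany"), ("en", "Spree")},
}
interwiki_links = {("de", "Berlin"): {("en", "Berlin")}}
groups = local_network(("de", "Berlin"), page_links, interwiki_links)
```

Here CN and IWL together form the core, and LN and GN form the hull of the local network.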
Data sets for preliminary results and method tests
We address three data sets with differently chosen CNs (Wikipedia pages):
(1) Four German cities (Berlin, Heidelberg, Bad Harzburg, Sulingen) and two British cities
(Oxford, Birmingham);
(2) ’United States of America’, ’Germany’, the ’President of the United States Barack
Obama’, and the ’Federal Chancellor Angela Merkel’ in German and English;
(3) Selected CNs with predominantly local and global relevance: Erfurt rampage and
Illuminati book – both already used in a previous study of the fluctuations in Wikipedia
access-rate time series [2] – and four times three pairs of CNs within the categories:
minorities, cities, politicians, and meals.
Comparison of static link network and dynamic correlation networks (Fig. 2)
a) Direct Wikipedia links between all nodes in the groups CN, LN, IWL, and GN.
b) Functional link strengths calculated from user access-rate time series.
Illuminati (book) Erfurt rampage
eric.tessenow@gmail.com, mirko.kaempf@gmail.com, jan.kantelhardt@physik.uni-halle.de
This work was supported by:
Acknowledgement
Global and local relevance seem to be characteristic properties of a
page. In (c) the local relevance decreases (blue dashed line) and in
(d) it is similar for all languages. To compare L.REL and G.REL we
show the cross-correlation for sliding windows of different sizes in (e)
and (f).
CN: Erfurt rampage (language: de)
Jan-Feb 2009
Mar-Apr 2009
Fig. 6: Time resolved average link strength for local
functional networks around two selected CNs (see Fig. 4).
Fig. 6a) shows a significant change in the average cross-
correlation for pages in group GN (area A). At the same time the
correlation in group LN drops significantly. In Fig. 6b) one can see
that a decreasing local correlation is not necessarily related to a
change in global correlations. This way one might be able to
distinguish between local and global relevance as well.
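A time-resolved average link strength of this kind can be sketched as the mean pairwise correlation of all series in a group, evaluated in sliding windows. The window length, the step size, the use of the zero-lag Pearson coefficient, and the function name are assumptions for illustration and are not taken from the poster.

```python
import numpy as np

def windowed_mean_correlation(series, window, step):
    """Average pairwise correlation of a group of series in sliding windows.

    series: 2-D array, one row per page, one column per time step.
    Returns one average value per window position.
    """
    series = np.asarray(series, dtype=float)
    n_pages, n_steps = series.shape
    upper = np.triu_indices(n_pages, k=1)  # count each page pair once
    means = []
    for start in range(0, n_steps - window + 1, step):
        corr = np.corrcoef(series[:, start:start + window])
        means.append(np.nanmean(corr[upper]))
    return np.array(means)

# Synthetic group of three pages: independent noise in the first half,
# a strong common signal in the second half.
rng = np.random.default_rng(0)
group = rng.normal(size=(3, 120))
common = rng.normal(size=120)
group[:, 60:] += 5 * common[60:]
avg = windowed_mean_correlation(group, window=30, step=30)
```

In this synthetic example the average correlation stays low in the first two windows and rises sharply once the common signal sets in, which mimics the kind of change marked as area A in Fig. 6a.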
We visualize the relevance of semantic concepts for specific regions, taking the natural density of
speakers and the topic-specific relevance of languages within a specific region into account.
Such a language-dependent visualization will help to distinctly identify global and local trends for a semantic concept
in a specific continent, country, or region based on public data sources and social communication and content networks
like Wikipedia. Facebook, Google+, Twitter or even internal systems used in global enterprises can also be analyzed
this way, even in a multilingual environment.
Fig. 7: Collaboration networks for pages regarding
the same topic in different languages (central large
violet nodes) show an inhomogeneous structure with
clusters of multiple sizes. Connections between
editor clusters are formed by automatic editing tools (robots).
Do such robots influence the spread of information?
Context Sensitive and Time Resolved Relevance
of Wikipedia Articles

DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"

  • 1. Identifying Semantic Concepts
Selection of CNs
Data Collection
Preprocessing
Extraction of CN, LN, IWL, GN, page-size and access log data
Calculation of REPk, REPv, REPt and REL
Analysis of data

Eric Tessenow,1 Mirko Kämpf,2 and Jan W. Kantelhardt 2

Abstract

Since the numbers of hypertext pages and hyperlinks in the WWW have been continuously growing for more than 20 years, the problem of finding relevant content has become increasingly important. We have developed and evaluated techniques for a time-dependent characterization of the global and local relevance of WWW pages based on document length, number of links, and cross-correlations in user-access time series. We focus on content and user activity in selected groups of Wikipedia articles as a first application, mainly because of data availability. Our goal is the assignment of ranking values to a hypertext page (node). The values shall cover static properties of the node and its neighbourhood (context) as well as dynamic properties derived from its page-view rates, which depend on underlying communication processes. We show in several examples how this goal can be achieved.

1 Institute of Communications Studies, University of Leeds, LS2 9JT, Leeds, United Kingdom
2 Institut für Physik, Martin-Luther-Universität Halle-Wittenberg, 06099 Halle (Saale), Germany

Motivation

Since many aspects have to be taken into account in the analysis of global social networks, it is challenging to compare data collections and obtain results from their analysis. We therefore require a framework that is robust and at the same time flexible, and that enables interdisciplinary research, as scientists from different fields look at different parts of a data set. Our work suggests a methodology for comparable measurements of a node's relevance in local graphs defined by the node's local neighbourhood, while considering local link structure, text volume, user access activity and editorial activity.
This enables a qualitative and also an efficient quantitative analysis of parts of a global social network without having to explore and analyze the whole graph. In order to identify and compare different communication processes on multiple channels, one has to quantify the influence of the environment in which an individual process is embedded. For example, for different topics and different regions on Earth, we study usage patterns and the embedding of content in one of the largest public and open content networks, Wikipedia.

[Poster section headings: Information Flow in Correlation Networks; Local Representation Indexes: REPk,v and REPa,e(t); Data Sets & Processing; Contact. SOE 6.1]

References

[1] Kämpf M., Tessenow E., Kantelhardt J.W., Context Sensitive and Time Resolved Relevance in Complex Networks. Unpublished (in preparation, 2014).
[2] Kämpf M., Tismer S., Kantelhardt J.W., Muchnik L., Fluctuations in Wikipedia access-rate and edit-event data. Physica A 391: 6101-6111 (2012).
[3] Kämpf M., Kantelhardt J.W., Muchnik L., From time series to co-evolving functional networks: dynamics of the complex system 'Wikipedia'. Proc. Europ. Conf. Complex Syst. (2012).
[4] Schreck B., Kämpf M., Kantelhardt J.W., Motzkau H., Comparing the usage of global and local Wikipedias with focus on Swedish Wikipedia. arXiv:1308.1776 (2013).
[5] Kämpf M., Kantelhardt J.W., Hadoop.TS: large-scale time-series processing. International Journal of Computer Applications (IJCA) 74: 17 (2013), DOI: 10.5120/12974-0233.
[6] Segev E., Mapping the international: Global and local salience and news-links between countries in popular news sites worldwide. Int. Journal for Internet Science 5: 48-71 (2010).

Outlook

We compare different media types, in particular channels that push information to consumers (TV news, radio news, Twitter and Facebook communication), as opposed to pull media such as Wikipedia, forum, or blackboard websites, from which users pull data on demand.
We evaluate how properties of different network types, e.g. social, content, and communication networks, influence each other, and whether such couplings depend more on the content or more on the way information is offered and spread. Finally, we are interested in the question: to what extent, and how, can automated tools influence the communication processes?

Relevance Indexes: RELv and RELa(t)

We measure characteristic static and dynamic properties of a Wikipedia page based on I.) node degree k, II.) average text volume v, and III.) its access-rate or edit-rate time series (a(t), e(t)) in order to determine and quantify the level of representation in a semantic or lingual context.

[Figure panels: I.) Node degree; II.) Average text volume; III.) Time-dependent access rate a(t)]

We measure the time-dependent or temporal relevance of a Wikipedia page during a time period for access rates (a,b) of the central node (CN, black), the group IWL (green), the local neighbourhood (LN, blue), and the global neighbourhood (GN, red).

a) Relevance Index: shows the level of attraction of a topic, e.g. for a Wikipedia page in one selected language. It compares the user interest in pages in the selected language with average values for pages with the same content in all other languages.

Fig. 1: Definition of partial data sets (local networks).

Fig. 2: Comparison of local network structures with identical nodes based on (a) direct links and (b) functional link strengths derived from access activity.

We calculate the time-dependent link-strength correlation by: [equation not preserved in this transcript]

Fig. 3: Comparison of static representation indexes for two semantic concepts (data sets 1 and 2).

Fig. 4: Comparison of two local page networks with an assumed higher global relevance (left) and with higher local relevance (right).

Fig. 5: Distribution of dynamic link strengths for statically linked pages (a,d), for pages within groups LN (blue) and GN (red) (b,e), and for pages in different groups (c,f).
Lines show results for real data and are compared with results from randomly shuffled data series (filled areas). Average values and maximum values of the distribution function vary over time. Hence, we cannot define a simple threshold to identify relevant links. However, the distributions differ significantly for real data and surrogate data in (a,d,e). In the presence of extreme events in access time series (bottom row) we find a significant increase in cross-correlation-based link strengths for page pairs in the local and global neighbourhoods.

[Processing diagram labels: RAW data set; large-scale data management; partial data set preparation; result data set; Communication Process Modelling and Analysis of Complex Systems]

Definition of data sets (Fig. 1)

a) Central node (CN), all directly linked nodes in the same language (local neighbourhood, LN), all nodes regarding the same topic in other languages (linked by inter-wiki links, IWL), and all nodes linked to nodes in the IWL group (global neighbourhood, GN).
b) The CN group and the IWL group are the core of the local network for one topic. Both neighbourhoods, local (LN) and global (GN), form the hull of the local network.

Data sets for preliminary results and method tests

We address three data sets with differently chosen CNs (Wikipedia pages):
(1) Four German cities (Berlin, Heidelberg, Bad Harzburg, Sulingen) and two British cities (Oxford, Birmingham);
(2) 'United States of America', 'Germany', the 'President of the United States Barack Obama', and the 'Federal Chancellor Angela Merkel' in German and English;
(3) Selected CNs with predominantly local and global relevance: Erfurt rampage and Illuminati book (both already used in a previous study of the fluctuations in Wikipedia access-rate time series [2]), and four times three pairs of CNs within the categories minorities, cities, politicians, and meals.

Comparison of static link network and dynamic correlation networks (Fig. 2)

a) Direct Wikipedia links between all nodes in the groups CN, LN, IWL, and GN.
b) Functional link strengths calculated from user access-rate time series.

[Figure panel labels: Illuminati (book); Erfurt rampage]

eric.tessenow@gmail.com, mirko.kaempf@gmail.com, jan.kantelhardt@physik.uni-halle.de

Acknowledgement

This work was supported by:

Global and local relevance seem to be a characteristic property of a page. In (c) the local relevance decreases (blue dashed line) and in (d) it is similar for all languages. To compare L.REL and G.REL we show the cross-correlation for sliding windows of different sizes in (e) and (f).

[Figure panel labels: CN: Erfurt rampage (language: de); Jan-Feb 2009; Mar-Apr 2009]

Fig. 6: Time-resolved average link strength for local functional networks around two selected CNs (see Fig. 4).

Fig. 6a) shows a significant change in the average cross-correlation for pages in group GN (area A). At the same time the correlation in group LN drops significantly. In Fig. 6b) one can see that a decreasing local correlation is not necessarily related to a change in global correlations. In this way one might be able to distinguish between local and global relevance as well.

We visualize the relevance of semantic concepts for specific regions, taking into account the natural density of speakers and the topic-specific relevance of languages within a specific region. Such a language-dependent visualization will help to distinguish global from local trends for a semantic concept in a specific continent, country, or region, based on public data sources and on social communication and content networks like Wikipedia. Facebook, Google+, Twitter, or even internal systems used in global enterprises can also be analyzed this way, even in a multilingual environment.

Fig. 7: Collaboration networks for pages regarding the same topic in different languages (central large violet nodes) show inhomogeneous structure with clusters of multiple sizes.
Connections between editor clusters are "automatic editing tools (robots)". Do such robots influence the spread of information?

Context Sensitive and Time Resolved Relevance of Wikipedia Articles
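
The functional link strengths of Fig. 2(b) and the comparison with shuffled surrogate data in Fig. 5 can be illustrated with a small sketch. The poster's own formula is not preserved in this transcript, so the code below assumes a zero-lag Pearson cross-correlation between two access-rate time series as the link strength. The function names (`pearson`, `sliding_link_strength`, `surrogate_link_strength`) and the sample access counts are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Zero-lag Pearson cross-correlation of two equally long series.

    Assumption: this stands in for the poster's link-strength formula,
    which is not available in the transcript.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series carries no link strength
    return cov / (var_x * var_y) ** 0.5


def sliding_link_strength(x, y, window, step=1):
    """Time-dependent link strength: one correlation value per window."""
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(0, len(x) - window + 1, step)
    ]


def surrogate_link_strength(x, y, rng):
    """Link strength after shuffling y, which destroys its temporal order."""
    shuffled = list(y)
    rng.shuffle(shuffled)
    return pearson(x, shuffled)


# Hypothetical access counts for two pages (not data from the poster).
page_a = [10, 12, 15, 40, 80, 30, 14, 11]
page_b = [5, 6, 8, 22, 41, 15, 7, 6]

rng = random.Random(0)  # fixed seed so the surrogates are reproducible
real = pearson(page_a, page_b)
windows = sliding_link_strength(page_a, page_b, window=4)
surrogates = [surrogate_link_strength(page_a, page_b, rng) for _ in range(200)]
```

Comparing `real` (or each entry of `windows`) with the spread of `surrogates` mirrors the idea of Fig. 5: a link is only interesting if its strength stands out from what randomly shuffled series produce. Since the text notes that no simple fixed threshold works, the surrogate distribution would have to be recomputed for each time window.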