Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems
Steffen Staab
staab@uni-koblenz.de
WeST: Web Science & Technologies
University of Koblenz-Landau, Germany
What do people want from the Web?
Web as storage
library
memory
Web as tool
search
transaction
Web as social medium
communication
cooperation
Web as mirror of self
identification
outreach
My Agenda in the Large
Web Content
Discovering patterns
Building tools
Understanding
Web Interaction
Monitoring
Exploiting
Guiding
Understanding
Web Evolution
Monitoring
Predicting
Guiding
Understanding
Modified Kneser-Ney Smoothing of n-grams
If a sequence is hard to observe,
then approximate it recursively by observing marginal frequencies
of
First recursion step:
Problem:
If the last word in the sequence is rare, the overall sequence will be rare,
and the approximation will be of low quality.
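The problem above can be illustrated with a minimal sketch. It is not modified Kneser-Ney itself (there are no discounts or continuation counts); it only assumes the usual back-off order, in which each recursion step drops the earliest word. The helper names and the toy corpus are made up for illustration.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count all n-grams of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_chain(ngram):
    """Yield the n-gram followed by its successively shorter suffixes.

    Each step drops the earliest word, so the last word is always kept.
    """
    for start in range(len(ngram)):
        yield tuple(ngram[start:])


# Toy corpus (an assumption for illustration only).
tokens = "the black cat sat on the mat and the black dog sat on the rug".split()
counts = ngram_counts(tokens, 3)

# "rug" is rare, so every step of the recursion is supported by little evidence.
query = ("the", "black", "rug")
chain = [(gram, counts[gram]) for gram in backoff_chain(query)]
for gram, count in chain:
    print(" ".join(gram), count)
```

Every entry in the chain still ends in the rare word, which is why shortening the context alone does not help in this case.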
Generalized Language Models [ACL14]
If sequence is too hard to observe,
then approximate based on marginal probabilities of
...
recursively.
Core idea of formal solution:
Recursively applicable, commutative skip operators
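A rough sketch of what a skip operator could look like, assuming that a skip replaces one context word by a wildcard and never touches the last word. The wildcard symbol `_` and the function names are hypothetical; the formal definition is in [ACL14].

```python
from itertools import combinations

SKIP = "_"  # hypothetical wildcard symbol marking a skipped word


def skip(ngram, position):
    """Return a copy of the n-gram with the word at `position` skipped."""
    words = list(ngram)
    words[position] = SKIP
    return tuple(words)


def skipped_contexts(ngram, num_skips):
    """All variants with `num_skips` context words skipped.

    The last word is the one being predicted and is never skipped.
    """
    context_positions = range(len(ngram) - 1)
    variants = []
    for positions in combinations(context_positions, num_skips):
        variant = tuple(ngram)
        for position in positions:
            variant = skip(variant, position)
        variants.append(variant)
    return variants


ngram = ("the", "wild", "black", "cat")
one_skip = skipped_contexts(ngram, 1)
for variant in one_skip:
    print(" ".join(variant))

# The operators commute: the order of application does not matter.
assert skip(skip(ngram, 0), 2) == skip(skip(ngram, 2), 0)
```

Because the operators commute, applying them recursively reaches the same skipped sequence regardless of the order in which words are skipped.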
Improvement of GLMs [ACL14]
Evaluation measure: Perplexity
Data set: English Wikipedia, different sample sizes
Relative improvement: 2.6% (most training data, smallest model) to
13.9% (least training data, largest model)
Perplexity (normalized)
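For reference, a small sketch of how perplexity and a relative improvement are typically computed. The probabilities and perplexity values below are made-up illustrations, not numbers from [ACL14].

```python
import math


def perplexity(probabilities):
    """Perplexity of a test sequence, given the model probability of each word.

    Defined as exp of the mean negative log probability; lower is better.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(-log_sum / len(probabilities))


def relative_improvement(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` perplexity."""
    return (baseline - improved) / baseline


# A model that assigns probability 0.25 to every word is as uncertain
# as a uniform choice among four words.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
print(uniform)

# Hypothetical perplexities of a baseline and an improved model.
gain = relative_improvement(200.0, 180.0)
print(gain)
```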
Outlook for Generalized Language Models
Correcting mistakes that all tools make
Lack of appropriate models
Other operators ("the wild black cat")
Delete: "the black cat"
Part-of-speech: "the adj adj cat"
Application: e.g. next word prediction
Other data structures
Tree-like data
Graph data
proposal for Google
current focus
Semantic Web
Related Work in Brief
Prediction feature f assigns a score to node pair (i, j)
f(i, j) > f(i, k) implies (i, j) to be ranked above (i, k)
• Link Prediction: edge likelier to be added
• Unlink Prediction: edge likelier to be removed
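The ranking rule can be sketched in a few lines. The scores below are invented for illustration, and `rank_pairs` is a hypothetical helper, not part of any library.

```python
def rank_pairs(pairs, f):
    """Sort node pairs by descending score, so that f(i, j) > f(i, k)
    places (i, j) above (i, k)."""
    return sorted(pairs, key=lambda pair: f(*pair), reverse=True)


# Toy scores for candidate pairs (assumed values).
toy_scores = {("a", "b"): 0.9, ("a", "c"): 0.2, ("a", "d"): 0.5}


def f(i, j):
    """Prediction feature: look up the toy score of the pair (i, j)."""
    return toy_scores[(i, j)]


ranking = rank_pairs(list(toy_scores), f)
print(ranking)
```

For link prediction the top of the ranking holds the edges considered likeliest to be added; for unlink prediction the same rule is applied to existing edges and the top holds the edges considered likeliest to be removed.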
Related Work in Brief
Static features
degree
common-neighbours
path3
local clustering coefficient/embeddedness
...
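A minimal sketch of how such static features might be computed on an undirected graph stored as a dictionary of neighbour sets. The toy graph is invented, and reading "path3" as the number of simple paths of length three between the two nodes is an assumption.

```python
from itertools import combinations


def degree(adj, node):
    """Number of neighbours of a node."""
    return len(adj[node])


def common_neighbours(adj, i, j):
    """Number of nodes adjacent to both i and j."""
    return len(adj[i] & adj[j])


def paths_of_length_3(adj, i, j):
    """Number of simple paths i - u - v - j (assumed meaning of 'path3')."""
    return sum(
        1
        for u in adj[i]
        for v in adj[j]
        if u != j and v != i and u != v and v in adj[u]
    )


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph: a square a-b-d-c-a with the diagonal b-c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

print(degree(adj, "a"), common_neighbours(adj, "a", "d"))
print(paths_of_length_3(adj, "a", "d"), local_clustering(adj, "b"))
```

Any of these functions can serve as the prediction feature f when applied to a node pair, for example the product of the two degrees or the number of common neighbours.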
Related Work in Brief
[Diagram: Additions, Removals, Training; Link Prediction Problem; Unlink Prediction Problem]
Markov assumption: history irrelevant
Advantage: general model
Disadvantage: general model
Idea: keep generality, improve prediction
Our Approach - 1
[Diagram: Additions, Removals, Training; Link Prediction Problem; Unlink Prediction Problem]
Markov assumption: history irrelevant
Hypothesis: temporal information generally improves prediction
Idea:
1. Nodes concerned
2. Neighbourhood
Evaluation & Discussion (excerpt)
Temporal link prediction is significantly better, but only slightly
Temporal unlink prediction is always significantly improved
Temporal preferential attachment performs best
[Figure labels: AUC, baseline, qualitative, quantitative, extrapolation]
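As a reminder of what the AUC measures, a small sketch that computes it directly from its pairwise definition. The scores are invented, and the quadratic loop is meant for clarity rather than for large evaluations.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve from its pairwise definition.

    Returns the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one; ties count half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical feature scores: positives are edges that really changed,
# negatives are edges that did not.
perfect = auc([0.9, 0.8], [0.1, 0.2])
uninformative = auc([0.5], [0.5])
print(perfect, uninformative)
```

A value of 0.5 corresponds to a random ranking, so a feature is only useful if its AUC lies clearly above that level.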
Outlook for Evolution of Networks
Temporal dynamics still underexplored
lack of datasets!
next experiments:
• Twitter followers
• Xing.de
Unlinks lead to link recommendation
new Wikipedia link (reorganization of Wikipedia pages!)
new job
new friend
References
[ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S.
Staab. A Generalized Language Model as the Combination of Skipped
n-grams and Modified Kneser Ney Smoothing. In: Proc. of the 52nd Annual
Meeting of the Association for Computational Linguistics (ACL 2014),
Baltimore, June 22-27, 2014.
[WSDM14] C. Kling, J. Kunegis, S. Sizov, S. Staab. Detecting Non-Gaussian
Geographical Topics in Tagged Photo Collections. In: Proc. of the 7th
ACM Conference on Web Search and Data Mining (WSDM 2014), New
York, US, February 24-28, 2014.
[ICWSM13] J. Preusse, J. Kunegis, M. Thimm, T. Gottron, S. Staab. Structural
Changes in Collaborative Knowledge Networks. In: Proc. of the
Seventh International AAAI Conference on Weblogs and Social
Media (ICWSM 2013), Boston, July 8-10, 2013.