Robust Expert Finding in Web-Based Community Information Systems
Ralf Klamma
Advanced Community Information Systems (ACIS), RWTH Aachen University, Germany
1. Lehrstuhl Informatik 5
(Information Systems)
Prof. Dr. M. Jarke
1
Learning
Layers
This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
klamma@dbis.rwth-aachen.de
The Future of Scientifically Founded
Databases on Experts
June 30th – July 02nd 2013 in Graz (Austria)
Advanced Community Information Systems (ACIS) Group @ RWTH Aachen
– Responsive Open Community Information Systems
– Community Visualization and Simulation
– Community Analytics
– Community Support
– Web Analytics
– Web Engineering
– Requirements Engineering
Agenda
– Web Community Information Systems
– Experts in Community Information Systems
– Robust Expert Identification
– Trust in Experts
– Conclusions & Outlook
A Brief History of Community Information Systems
[Figure labels: Digital Media Technology; Communities of Practice (Web 2.0); Business Processes; Meta Data; Media Traces]
– Semantic Web (XML, RDF, ontologies)
– Multimedia (XML, VRML, DC, MPEG)
– Organisational Memories (XML, HTML, XTM)
– Groupware / E-Learning (XML, LOM, XML-RPC)
– Workflows (XML, BPEL)
– Web Services (XML, WSDL, SOAP, UDDI)
– Social Software (XML, HTTP, RSS)
Klamma: Social Software and Community Information Systems, 2010
The Long Tail & Fragments
The Web is a scale-free, fragmented network
– The power law (Pareto-Distribution etc.)
– 95 % of users are located in the Long Tail (Communities)
– Trust- and passion-based cooperation
[Figure: structure of the Web with IN Continent, Central Core, OUT Continent, Islands, Tendrils, and Tunnels]
[Barabasi, 2002]
[Anderson, 2006]
Communities of Practice
Communities of practice (CoP) are groups of people
who share a concern or a passion for something they
do and who interact regularly to learn how to do it
better (Wenger, 1998)
Characterization of experts in CoP
– Shared competence in the domain
– Shared practice over time by interactions
– Expertise based on gaining and having reputation within the CoP
– Being an expert vs. being a layman, a newcomer, an amateur etc.
– Informal leadership
– Identity as an expert depends on the lifecycle of the communities
Expertise in highly dynamic, locally distributed multi-disciplinary
and heterogeneous communities?
International Communities
Communication / Cooperation?
Cultural heritage in Afghanistan
[Figure: actors and activities connected to a database (content input / request, content retrieval). Labels: Surveying/safeguarding, Sketch drawing, Photographing, Surveying/recording, GPS positioning, Experiences imparting, Administration, UNESCO, Teaching/presentation, Asia, ICOMOS, Standards defining, Research, RWTH Aachen, SPACH]
www.bamiyan-development.org
Experts in International Communities
In international communities many experts from different fields meet
– Intergenerational learning
– Interdisciplinary learning
New openness for amateur contributions
Methods, tools & CoP co-develop
– Expert role models needed
– Expert identification based on complex media traces
YouTell - A Web 2.0 Service for
Collaborative Storytelling
Collaborative storytelling
Web 2.0 Service
Story search and
“pro-sumption”
Tagging
Ranking/Feedback
Expert finding
Recommending
Klamma, Cao, Jarke: Storytelling on the Web 2.0 as a New Means of Creating Arts
Handbook of Multimedia for Digital Entertainment and Arts, Springer, 2009
Expert Finding – Computation of
Actual Knowledge
Data vector consists of
– Personal data vector
– Competences, skills, qualification profile
– Self-entered data
– Story data vector
– Visits of stories
– Involvement in projects
– Expert data vector
– Advice given
– Advice received
– Value computed from #Keywords, Date Decay, and Feedback
[Embedded slide outline: Motivation; PESE: Web 2.0 application for community-based storytelling; The PESE prototype; Evaluation of the prototype; Summary; Outlook]
Find the most appropriate expert
Data vector represents knowledge of the expert
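The slide names the ingredients of the knowledge value (#Keywords, Date Decay, Feedback) but the extracted text does not show how they are combined. The following is a minimal sketch under stated assumptions: a multiplicative combination, an exponential decay with a half-life, and feedback weights in [0, 1]. The names `Evidence`, `knowledge_value`, `rank_experts`, and `half_life_days` are hypothetical and not taken from the YouTell system.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One trace of expertise, e.g. a visited story or a piece of advice given."""
    keywords: list[str]
    age_days: float   # time since the trace was recorded
    feedback: float   # assumed feedback weight in [0, 1]


def knowledge_value(evidence: list[Evidence], query: set[str],
                    half_life_days: float = 180.0) -> float:
    """Assumed combination: keyword matches x date decay x feedback, summed over traces."""
    total = 0.0
    for item in evidence:
        matches = sum(1 for k in item.keywords if k in query)
        decay = 0.5 ** (item.age_days / half_life_days)  # assumed exponential decay
        total += matches * decay * item.feedback
    return total


def rank_experts(profiles: dict[str, list[Evidence]],
                 query: set[str]) -> list[tuple[str, float]]:
    """Sort users by their knowledge value for the queried keywords, best first."""
    scored = [(user, knowledge_value(ev, query)) for user, ev in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


profiles = {
    "alice": [Evidence(["buddha", "survey"], age_days=0, feedback=1.0)],
    "bob": [Evidence(["buddha"], age_days=180, feedback=1.0)],
}
ranked = rank_experts(profiles, {"buddha", "survey"})
```

With these inputs, a fresh trace matching two keywords outranks a half-life-old trace matching one, which is the behaviour a decayed data vector is meant to produce.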
Knowledge-Dependent
Learning Behaviour in Communities
Renzel, Cao, Lottko, Klamma: Collaborative Video Annotation for Multimedia Sharing between Experts and Amateurs,
WISMA 2010, Barcelona, Spain, May 19-20, 2010
Expert finding algorithm: Knowledge value of the community, sorted by keywords
Community behavior: Experts spend more time on the services
Experts prefer semantic tags, while amateurs frequently use “simple” tags
Community tags: Experts use more precise tags
Threats to Expert Finding
Compromising techniques
– Sybil attack [Douc 2002], reputation theft, whitewashing attack, etc.
– Compromising the input and the output of the expert identification algorithm
Example: Sybil attacks
– Fundamental problem in open collaborative Web systems
– A malicious user creates many fake accounts (Sybils) that all reference the user to boost the user's reputation (the attacker’s goal is a higher position in the rankings)
[Figure: honest region and Sybil region, connected by attack edges]
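To make the Sybil example concrete, here is a small sketch of how a ranking that simply counts endorsing accounts is manipulated. The ranking function and all account names are hypothetical; this is not the expert finding algorithm from the talk, only an illustration of the attack it has to withstand.

```python
from collections import Counter


def naive_ranking(endorsements):
    """Rank users by the number of distinct accounts that endorse them."""
    counts = Counter(target for _, target in set(endorsements))
    return [user for user, _ in counts.most_common()]


# (endorser, endorsed) pairs among honest users
honest = [("ann", "bob"), ("cem", "bob"), ("dia", "bob"),
          ("bob", "ann"), ("cem", "ann")]

# the attacker "eve" registers five fake accounts that all endorse her
sybils = [(f"sybil{i}", "eve") for i in range(5)]

before = naive_ranking(honest)
after = naive_ranking(honest + sybils)
```

Because creating accounts is free, the attacker overtakes every honest user without receiving a single endorsement from the honest region.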
HITS algorithm [Kleinberg 1999]
– Authorities and hubs
– Mutual reinforcement between web pages: “a better hub points to many good authorities, and a better authority is pointed to by many good hubs”
HITS: Expert Ranking Algorithm
– Users (hubs)
– Media (authorities)
– Mutual reinforcement between users and media files; the trust network is also considered
– Expert users tend to have many correctly evaluated media; correctly rated media are rated by trusted users of high expertise
[Figure: user set (hubs) and media set (authorities), linked by “Rate Media” and “Upload Media” edges, with a Web of Trust among users]
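The mutual reinforcement between hubs and authorities can be sketched as a short power iteration. This is plain HITS in the sense of Kleinberg, applied to user-to-media edges; the function name `hits`, the iteration count, and the example edges are assumptions made for illustration.

```python
import math


def hits(edges, iterations=50):
    """Plain HITS on (user, media) edges; returns (hub, authority) score dicts."""
    users = {u for u, _ in edges}
    media = {m for _, m in edges}
    hub = {u: 1.0 for u in users}
    auth = {m: 1.0 for m in media}
    for _ in range(iterations):
        # authority score: sum of the hub scores of the users pointing to the file
        auth = {m: 0.0 for m in media}
        for u, m in edges:
            auth[m] += hub[u]
        # hub score: sum of the authority scores of the files the user points to
        hub = {u: 0.0 for u in users}
        for u, m in edges:
            hub[u] += auth[m]
        # L2 normalization keeps the scores bounded
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return hub, auth


edges = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u3", "m3")]
hub, auth = hits(edges)
```

In this example `u1` ends up as the strongest hub because it points to both files of the densest component, and `m1` as the strongest authority because two hubs point to it.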
MHITS: Expert Ranking Algorithm
MHITS computation
– Trust in range [0, 1]
– Ratings: 0.5 for a fake vote, 1 for an authentic vote
Symbol Description
– Authority score
– Set of users pointing to media file m
– Hubness score
– Rating of user u for media file m
– Average trust of the directly connected users to user u
– Set of media files to which user u points
– Coefficient that weights the influence of the two terms, in range [0, 1]
Rashed, Balasoiu, Klamma: Robust Expert Ranking in Online Communities - Fighting Sybil Attacks. 8th IEEE International
Conference on Collaborative Computing: Networking, Applications and Worksharing. Pittsburgh, United States, 2012.
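The update equations of MHITS are not reproduced in the text of this slide, so the following is only one plausible reading of the symbol descriptions, not the published formula. It assumes that the authority score of a media file is the rating-weighted sum of the hubness of its raters, and that the hubness of a user mixes a rating-weighted authority sum with the average trust of the user's direct connections through a coefficient `alpha`. The names `mhits`, `avg_trust`, and `alpha`, and the max-normalization, are assumptions.

```python
def _scale_to_unit(scores):
    """Divide by the largest score so all values lie in [0, 1]."""
    top = max(scores.values(), default=0.0)
    if top > 0:
        for key in scores:
            scores[key] /= top


def mhits(ratings, avg_trust, alpha=0.5, iterations=50):
    """Trust-aware HITS variant (assumed form, see the lead-in).

    ratings:   {(user, media): rating}, e.g. 0.5 for a fake vote, 1 for an authentic vote
    avg_trust: {user: average trust of the directly connected users, in [0, 1]}
    """
    users = {u for u, _ in ratings}
    media = {m for _, m in ratings}
    hub = {u: 1.0 for u in users}
    auth = {m: 0.0 for m in media}
    for _ in range(iterations):
        auth = {m: 0.0 for m in media}
        for (u, m), r in ratings.items():
            auth[m] += r * hub[u]
        _scale_to_unit(auth)
        reinforcement = {u: 0.0 for u in users}
        for (u, m), r in ratings.items():
            reinforcement[u] += r * auth[m]
        _scale_to_unit(reinforcement)
        # assumed mix of the two terms, weighted by alpha
        hub = {u: alpha * reinforcement[u] + (1 - alpha) * avg_trust.get(u, 0.0)
               for u in users}
    return hub, auth


ratings = {("u1", "m1"): 1.0, ("u1", "m2"): 1.0, ("u2", "m1"): 1.0,
           ("s1", "m3"): 1.0, ("s2", "m3"): 1.0}
avg_trust = {"u1": 0.9, "u2": 0.8, "s1": 0.0, "s2": 0.0}
hub, auth = mhits(ratings, avg_trust)
```

In this toy input the two untrusted accounts `s1` and `s2` only reinforce each other through `m3`, so their hubness stays below that of the trusted users.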
Countermeasures Against Sybil Attack
                                   SybilGuard [Yu et al. 2006]   SybilLimit [Yu et al. 2008]   SumUp [Tran et al. 2009]
Protocol type                      Decentralized                 Decentralized                 Centralized
Accepted Sybils per attack edge
SumUp Method [Tran et al. 2009]
– Adaptive vote flow aggregation technique
– Assigns and adjusts link capacities in the trust graph to collect the votes
– Includes at most a few votes from Sybils
– Includes most votes from honest users
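The idea of collecting votes as a flow through capacity-limited trust links can be illustrated with a standard maximum-flow computation. This is a simplified sketch, not SumUp itself: capacities are fixed by hand rather than assigned adaptively, and the graph, the node names, and the helper `collect_votes` are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual[u][v] += c
    flow = 0
    while True:
        # breadth-first search for an augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # find the bottleneck along the path, then push that much flow
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def collect_votes(links, voters, collector):
    """Each voter injects one vote; votes travel along trust links to the collector."""
    capacity = {"__voters__": {v: 1 for v in voters}}
    for (u, v), c in links.items():
        capacity.setdefault(u, {})[v] = c
    return max_flow(capacity, "__voters__", collector)


# h1, h2 are honest; the attacker "a" reaches the honest region over one attack edge
links = {("h1", "c"): 1, ("h2", "c"): 2, ("a", "h2"): 1}
links.update({(f"s{i}", "a"): 1 for i in range(5)})

honest_only = collect_votes(links, ["h1", "h2"], "c")
with_sybils = collect_votes(links, ["h1", "h2"] + [f"s{i}" for i in range(5)], "c")
```

Five Sybil votes add only one collected vote here, because everything they cast must pass through the single attack edge of capacity 1.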
Integration of SumUp with MHITS
Input: Trust network
– Assign levels for trust network users
– Prune the trust network
– Capacity assignment for resulting trust network
– Collect votes
– Compute aggregate votes
– Detect bogus votes
– Assign negative history
– Delete bogus links and add back pruned links
These steps are applied for each media rated
Input: Rating network
– Call MHITS ranking algorithm
– Robust ranking results
[Figure: example trust network with levels L0 to L3, Cmax = 6, and the resulting values per link]
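The first steps of the pipeline (level assignment and capacity assignment) can be sketched as a breadth-first search followed by a ticket distribution. The details below are assumptions based on a reading of Tran et al., not a reproduction of the integration shown on the slide: the collector hands out Cmax tickets, every other user consumes one ticket and splits the rest evenly among links to the next level, and a link's capacity is taken as its ticket count plus one. The example network is hypothetical.

```python
from collections import deque


def assign_levels(trust, collector):
    """Breadth-first levels: the collector is level 0, its neighbours level 1, and so on."""
    level = {collector: 0}
    queue = deque([collector])
    while queue:
        u = queue.popleft()
        for v in trust.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def distribute_tickets(trust, collector, c_max):
    """Pass tickets from level to level; returns (levels, tickets per link)."""
    level = assign_levels(trust, collector)
    held = {collector: c_max}
    tickets = {}
    for u in sorted(level, key=level.get):        # process level by level
        available = held.get(u, 0)
        if u != collector:
            available = max(available - 1, 0)     # assumed: each user consumes one ticket
        children = [v for v in trust.get(u, []) if level[v] == level[u] + 1]
        for idx, v in enumerate(children):
            # even split; leftover tickets go to the first links (assumption)
            extra = 1 if idx < available % len(children) else 0
            share = available // len(children) + extra
            tickets[(u, v)] = share
            held[v] = held.get(v, 0) + share
    return level, tickets


trust = {"c": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"], "a1": ["x"]}
level, tickets = distribute_tickets(trust, "c", c_max=6)
capacity = {link: n + 1 for link, n in tickets.items()}  # assumed: tickets + 1
```

Links close to the collector receive high capacities, while links three levels away carry no tickets, which limits how many votes can arrive from distant parts of the network.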
Use Case:
Media Distribution Networks
[Figure: media logos, including THOMSON REUTERS and ZEITUNG FÜR DEUTSCHLAND]
Trust Management: Network
Construction Approach
[Figure: basic building block consisting of Sources, Mediator, and Consumers]
Renzel, Rashed, Klamma: Collaborative Fake Media Detection in a Trust-Aware Real-Time Distribution Network.
2nd Workshop on Semantic Multimedia Database Technologies 2010, pp. 17–28.
Trust Management: Authenticity Ratings and Trust
Aggregate for m to decide action regarding i
Trust update algorithm
Symbol Description
– Mediator (m)
– Information (media) item (i)
– Rating of source s towards media item i
– Trust level of mediator m towards source s
– Mediator publishes media item as x (fake, true)
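The aggregate and the trust update are not spelled out in the text of this slide, so the sketch below only illustrates the general pattern under loudly stated assumptions: ratings are 1.0 for "authentic" and 0.0 for "fake", the aggregate is a trust-weighted mean, the mediator publishes the item as true when the aggregate reaches a threshold of 0.5, and trust moves a fixed step towards 1 or 0 depending on whether a source agreed with the published decision. All function names and parameters are hypothetical.

```python
def aggregate(ratings, trust):
    """Trust-weighted mean of the source ratings for one media item (assumed form)."""
    weight = sum(trust.get(s, 0.0) for s in ratings)
    if weight == 0:
        return None  # no trusted source has rated the item
    return sum(trust.get(s, 0.0) * r for s, r in ratings.items()) / weight


def decide(ratings, trust, threshold=0.5):
    """Return 'true' or 'fake' for the item, or None if no decision is possible."""
    score = aggregate(ratings, trust)
    if score is None:
        return None
    return "true" if score >= threshold else "fake"


def update_trust(ratings, trust, decision, rate=0.1):
    """Move trust towards 1 for sources that agreed with the decision, else towards 0."""
    updated = dict(trust)
    for source, rating in ratings.items():
        agreed = (rating >= 0.5) == (decision == "true")
        target = 1.0 if agreed else 0.0
        current = trust.get(source, 0.0)
        updated[source] = current + rate * (target - current)
    return updated


trust = {"s1": 0.9, "s2": 0.8, "s3": 0.2}   # mediator's trust in each source
ratings = {"s1": 1.0, "s2": 1.0, "s3": 0.0}  # source ratings for one item
decision = decide(ratings, trust)
new_trust = update_trust(ratings, trust, decision)
```

A low-trust dissenting source barely affects the decision and loses further trust afterwards, while the agreeing sources gain a little.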
Trust Management: Trust Inference and Dynamicity of Trust
Trust inference TidalTrust [Golbeck 2005]
– Social network topology is used
– Use shortest paths
– Use weighted average of trust
– Accept ratings from only the highest rated neighbors
Modified TidalTrust algorithm
– Directed graph for XMPP network
– Stores assigned trust ratings
Extension to dynamic trust management [Gans 2008]
– Inclusion of temporal dimension and confidence
– Inclusion of distrust: not inverse of trust
[Figure: Mediator and Source, with the question “Trustworthy?”]
Rashed, Renzel, Klamma, Jarke: Community and trust-aware fake media detection.
Multimedia Tools and Applications, pp. 1–30, 2012
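The TidalTrust-style inference listed on this slide (shortest paths, weighted average, only the highest rated neighbours) can be sketched as follows. This is a simplified reading of the algorithm, not the modified XMPP version used by the authors: the threshold is taken as the strength of the strongest shortest path from the mediator to a node that rates the source directly, and the temporal dimension, confidence, and distrust are left out. Function names and the example graph are hypothetical.

```python
from collections import deque


def hops_to_sink(trust, sink):
    """Hop distance from every node to the sink along directed trust edges."""
    reverse = {}
    for u, neighbours in trust.items():
        for v in neighbours:
            reverse.setdefault(v, []).append(u)
    dist = {sink: 0}
    queue = deque([sink])
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def infer_trust(trust, source, sink):
    """Infer how much `source` should trust `sink`; None if nothing can be inferred."""
    dist = hops_to_sink(trust, sink)
    if source not in dist or source == sink:
        return None
    if dist[source] == 1:
        return trust[source][sink]  # a direct rating exists

    def closer(node):
        # neighbours that lie on a shortest path towards the sink
        return {j: t for j, t in trust.get(node, {}).items()
                if dist.get(j) == dist[node] - 1}

    # strength of the strongest shortest path from the source to each node
    strength = {source: float("inf")}
    frontier = [source]
    while dist[frontier[0]] > 1:
        reached = {}
        for u in frontier:
            for j, t in closer(u).items():
                strength[j] = max(strength.get(j, -1.0), min(strength[u], t))
                reached[j] = True
        frontier = list(reached)
    threshold = max(strength[n] for n in frontier)

    memo = {}

    def visit(node):
        if dist[node] == 1:
            return trust[node][sink]
        if node not in memo:
            usable = {j: t for j, t in closer(node).items() if t >= threshold}
            values = {j: visit(j) for j in usable}
            values = {j: v for j, v in values.items() if v is not None}
            weight = sum(usable[j] for j in values)
            memo[node] = (sum(usable[j] * values[j] for j in values) / weight
                          if weight > 0 else None)
        return memo[node]

    return visit(source)


# the mediator "m" has no direct rating for source "s" but trusts a and b
trust = {"m": {"a": 0.9, "b": 0.6}, "a": {"s": 0.8}, "b": {"s": 0.2}}
inferred = infer_trust(trust, "m", "s")
```

Here only the opinion of `a` is used, because the path through `b` is weaker than the strongest available path.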
Evaluation of Trust Management
Experiment design
– 16 participants rate the authenticity of 20 images (10 fake, 10 authentic)
– Evil group with 4 participants (instructed to vote contrarily)
– Good group with 12 participants
– Group membership kept secret
[Figure: progression of the average trust rating of the good group over time]
[Figure: progression of the average trust rating of the evil group over time]
Conclusions & Outlook
Experts in Community Information Systems
– Communities of Practice are natural resources for the development
of expertise
– International communities consist of heterogeneous experts with
different roles
– Community & expert identification are key processes on the Web
– Amateurs and deceivers are future challenges
Openness of future expert databases
– Robustness of established algorithms for expert identification
– Computation and spread of trust essential for decision making
– Near real-time support in modern mobile Web applications