SlideShare a Scribd company logo
1 of 23
STiki: An Anti-Vandalism Tool for Wikipedia using Spatio-Temporal Analysis of Revision Metadata A.G. West, S. Kannan, and I. Lee Wikimania `10 – July 10, 2010
STiki = Huggle++ 2 STiki = Huggle, but: CENTRALIZED: STiki is always scoring edits, in bot-like fashion. QUEUING: STiki uses 15+ ML-features to set presentation order (not a static rule set) ,[object Object],.
Outline/Summary 3 Vandalism detection methodology [6]  Wikipedia revision metadata (not the article or diff text) can be used to detect vandalism ML over simple features and aggregate reputation values for articles, editors, spatial groups thereof The STiki software tool Straightforward application of above technique Demonstration of the tool and functionality Alternative uses for the open-source code
Metadata 4 Wikipedia provides metadata via dumps/API:
5 Labeling Vandalism ROLLBACK is used to label edits as vandalism: Only true-rollback, no software-based ones Edit summaries used to locate (Native, Huggle, Twinkle, ClueBot) Bad ones= {OFF. EDITS}, others = {UNLABELED} Why rollback? Automated (v. manual) High-confidence Per case (vs. definition) Why do edits need labels?: ,[object Object]
 (2) Building block of reputation buildingPrevalence/Source of Rollbacks
Simple Features ,[object Object]
Spatial props: Appropriate wherever a size, distance, or membership function can be definedSIMPLE FEATURES * Discussion abbreviated to concentrate on aggregate ones 6
Edit Time, Day-of-Week 7 Use IP-geo-location  data to determine origin time-zone, adjust UTC timestamp Vandalism most prevalent during working hours/week: Kids are in school(?) Fun fact: Vandalism almost twice as prevalent on a Tuesday versus a Sunday Unlabeled Local time-of-day when edits made UnLbl Local day-of-week when edits made
Time-Since (TS)… 8 High-edit pages most often vandalized ≈2% of pages have 5+ OEs, yet these pages have 52% of all edits Other work [3] has shown these are also articles most visited ,[object Object]
“Registration”: time-stamp of first edit made by user
Sybil-attack to abuse benefits?,[object Object]
Huge contributors, but rarely vandalize,[object Object]
PreSTA Algorithm PreSTA [5]: Model for ST-rep: CORE IDEA: No entity specific data? Examine spatially-adjacent entities (homophily)  Rep(group) =   Σ time_decay (TSvandalism) size(group) Timestamps (TS) of vandalism incidents by group members A Alice Polish Europeans Grouping functions (spatial) define memberships Observations of misbehavior form feedback – and observ-ations are decayed (temporal) rep(A) rep(POL) rep(EUR) Higher-Order Reputation 11
Article Reputation 12 Intuitively some topics are contro-versial and likely targets for vandalism(or temporally so). 85% of OEs have non-zero rep (just 45% of random) UnLbl CDF of Article Reputation Articles w/most OEs
Category Reputation 13 ,[object Object]
Wiki provides cats./memberships – use only topical.
97% of OEs have non-zero reputation (85% in article case)Categories with most OEs Category: President Reputation: Barack Obama Presidents Article: Abraham Lincoln G.W. Bush …… Lawyers MAXIMUM(?) Category: Lawyer Feat. Value …… …… Example of Category Rep. Calculation
Editor Reputation 14 Straightforward use of the rep() function, one- editor groups UnLbl UnLbl CDF of Editor Reputation ,[object Object]
Mediocre performance. Meaningful correlation with other features, however.,[object Object]
16 Off-line Performance ,[object Object]
Use as an intelligent routing (IR) toolRecall: % total OEs 	classified correctly Precision: % of edits classified OE that are vandalism 50% @ 50%
STiki 17 STiki [4]: A real-time, on-Wikipediaimplementation of the technique

More Related Content

Viewers also liked

Viewers also liked (13)

Shimon hameiri the art of holography and photography
Shimon hameiri   the art of holography and photographyShimon hameiri   the art of holography and photography
Shimon hameiri the art of holography and photography
 
Search Engine Optimization
Search Engine OptimizationSearch Engine Optimization
Search Engine Optimization
 
Security Testing For Web Applications
Security Testing For Web ApplicationsSecurity Testing For Web Applications
Security Testing For Web Applications
 
Load Runner
Load RunnerLoad Runner
Load Runner
 
vandalism- research work
vandalism- research workvandalism- research work
vandalism- research work
 
Rest Console
Rest ConsoleRest Console
Rest Console
 
Selenium
SeleniumSelenium
Selenium
 
Havij
HavijHavij
Havij
 
Automation Testing
Automation TestingAutomation Testing
Automation Testing
 
What Are The Advantages and Disadvantages Of Studying And Working Together?
What Are The Advantages and Disadvantages Of Studying And Working Together?What Are The Advantages and Disadvantages Of Studying And Working Together?
What Are The Advantages and Disadvantages Of Studying And Working Together?
 
Web Services Testing
Web Services TestingWeb Services Testing
Web Services Testing
 
Karahasan sa paaralan
Karahasan sa paaralanKarahasan sa paaralan
Karahasan sa paaralan
 
Children’s rights power
Children’s rights powerChildren’s rights power
Children’s rights power
 

Similar to STiki: An Anti-vandalism Tool for Wikipedia using the Spatio-temporal Properties of Revision Metadata

Stuxnet redux. malware attribution & lessons learned
Stuxnet redux. malware attribution & lessons learnedStuxnet redux. malware attribution & lessons learned
Stuxnet redux. malware attribution & lessons learnedYury Chemerkin
 
Software rotting - DevOpsCon Berlin
Software rotting - DevOpsCon BerlinSoftware rotting - DevOpsCon Berlin
Software rotting - DevOpsCon BerlinGiulio Vian
 
18 applied architectures_part_2
18 applied architectures_part_218 applied architectures_part_2
18 applied architectures_part_2Majong DevJfu
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureTom Mens
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Tao Xie
 
L'impatto della sicurezza su DevOps
L'impatto della sicurezza su DevOpsL'impatto della sicurezza su DevOps
L'impatto della sicurezza su DevOpsGiulio Vian
 
Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...
Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...
Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...Roberto Casadei
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaborationJulien Pivotto
 
The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigmJonathan Challener
 
Thug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientThug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientAngelo Dell'Aera
 
OORPT Dynamic Analysis
OORPT Dynamic AnalysisOORPT Dynamic Analysis
OORPT Dynamic Analysislienhard
 
Visualization for Software Analytics
Visualization for Software AnalyticsVisualization for Software Analytics
Visualization for Software AnalyticsMargaret-Anne Storey
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)Tao Xie
 
Cytoscape CI Chapter 2
Cytoscape CI Chapter 2Cytoscape CI Chapter 2
Cytoscape CI Chapter 2bdemchak
 
Machine programming
Machine programmingMachine programming
Machine programmingDESMOND YUEN
 
Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석datasciencekorea
 

Similar to STiki: An Anti-vandalism Tool for Wikipedia using the Spatio-temporal Properties of Revision Metadata (20)

Stuxnet redux. malware attribution & lessons learned
Stuxnet redux. malware attribution & lessons learnedStuxnet redux. malware attribution & lessons learned
Stuxnet redux. malware attribution & lessons learned
 
Software rotting - DevOpsCon Berlin
Software rotting - DevOpsCon BerlinSoftware rotting - DevOpsCon Berlin
Software rotting - DevOpsCon Berlin
 
18 applied architectures_part_2
18 applied architectures_part_218 applied architectures_part_2
18 applied architectures_part_2
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the Future
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
 
L'impatto della sicurezza su DevOps
L'impatto della sicurezza su DevOpsL'impatto della sicurezza su DevOps
L'impatto della sicurezza su DevOps
 
Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...
Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...
Towards Automated Engineering for Collective Adaptive Systems: Vision and Res...
 
Of Changes and Their History
Of Changes and Their HistoryOf Changes and Their History
Of Changes and Their History
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaboration
 
The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigm
 
Thug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientThug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclient
 
OORPT Dynamic Analysis
OORPT Dynamic AnalysisOORPT Dynamic Analysis
OORPT Dynamic Analysis
 
Visualization for Software Analytics
Visualization for Software AnalyticsVisualization for Software Analytics
Visualization for Software Analytics
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
 
Cytoscape CI Chapter 2
Cytoscape CI Chapter 2Cytoscape CI Chapter 2
Cytoscape CI Chapter 2
 
Machine programming
Machine programmingMachine programming
Machine programming
 
Rakuten openstack
Rakuten openstackRakuten openstack
Rakuten openstack
 
Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

STiki: An Anti-vandalism Tool for Wikipedia using the Spatio-temporal Properties of Revision Metadata

  • 1. STiki: An Anti-Vandalism Tool for Wikipedia using Spatio-Temporal Analysis of Revision Metadata A.G. West, S. Kannan, and I. Lee Wikimania `10 – July 10, 2010
  • 2.
  • 3. Outline/Summary 3 Vandalism detection methodology [6] Wikipedia revision metadata (not the article or diff text) can be used to detect vandalism ML over simple features and aggregate reputation values for articles, editors, spatial groups thereof The STiki software tool Straightforward application of above technique Demonstration of the tool and functionality Alternative uses for the open-source code
  • 4. Metadata 4 Wikipedia provides metadata via dumps/API:
  • 5.
  • 6. (2) Building block of reputation buildingPrevalence/Source of Rollbacks
  • 7.
  • 8. Spatial props: Appropriate wherever a size, distance, or membership function can be definedSIMPLE FEATURES * Discussion abbreviated to concentrate on aggregate ones 6
  • 9. Edit Time, Day-of-Week 7 Use IP-geo-location data to determine origin time-zone, adjust UTC timestamp Vandalism most prevalent during working hours/week: Kids are in school(?) Fun fact: Vandalism almost twice as prevalent on a Tuesday versus a Sunday Unlabeled Local time-of-day when edits made UnLbl Local day-of-week when edits made
  • 10.
  • 11. “Registration”: time-stamp of first edit made by user
  • 12.
  • 13.
  • 14. PreSTA Algorithm PreSTA [5]: Model for ST-rep: CORE IDEA: No entity specific data? Examine spatially-adjacent entities (homophily) Rep(group) = Σ time_decay (TSvandalism) size(group) Timestamps (TS) of vandalism incidents by group members A Alice Polish Europeans Grouping functions (spatial) define memberships Observations of misbehavior form feedback – and observ-ations are decayed (temporal) rep(A) rep(POL) rep(EUR) Higher-Order Reputation 11
  • 15. Article Reputation 12 Intuitively some topics are contro-versial and likely targets for vandalism(or temporally so). 85% of OEs have non-zero rep (just 45% of random) UnLbl CDF of Article Reputation Articles w/most OEs
  • 16.
  • 17. Wiki provides cats./memberships – use only topical.
  • 18. 97% of OEs have non-zero reputation (85% in article case)Categories with most OEs Category: President Reputation: Barack Obama Presidents Article: Abraham Lincoln G.W. Bush …… Lawyers MAXIMUM(?) Category: Lawyer Feat. Value …… …… Example of Category Rep. Calculation
  • 19.
  • 20.
  • 21.
  • 22. Use as an intelligent routing (IR) toolRecall: % total OEs classified correctly Precision: % of edits classified OE that are vandalism 50% @ 50%
  • 23. STiki 17 STiki [4]: A real-time, on-Wikipediaimplementation of the technique
  • 24.
  • 25. Popped: GUI client shows likely vandalism first
  • 26.
  • 27. STiki Performance 20 Competition inhibits maximal performance Metric: Hit-rate (% of edits displayed that are vandalism) Offline analysis shows it could be 50%+ Competing (often autonomous) tools make it ≈10% STiki successes and use-cases Has reverted over 5000+ instances of vandalism May be more appropriate in less patrolled installations Any of Wikipedia’s foreign language editions Embedded vandalism: That escaping initial detection. Median age of STiki revert is 4.25 hours, 200× RBs. Further, average STiki revert had 210 views during active duration.
  • 28. Alternative Uses 21 All code is available [4] and open source (Java) Backend (server-side) re-use Large portion of MediaWiki API implemented (bots) Trivial to add new features (including NLP ones) Frontend (client-side) re-use Useful whenever edits require human inspection Offline inspection tool for corpus building Data re-use Incorporate vandalism score into more robust tools Willing to provide data to other researchers
  • 29. Crowd-sourcing 22 Shared queue := Pending changes trial Abuse of “pass” by an edit hoarding user Do ‘reviewers’ need to be reviewed? Where does it stop? Multi-layer verification checks to find anomalies Could reviewer reputations also be created? Threshold for queue access? Registered? Auto-confirmed? Or more? Cache-22: Use vs. perceived success More users = more vandalism found. But deep in queue, vandalism unlikely = User abandonment.
  • 30. References 23 [1] S. Hao, N.A. Syed, N. Feamster, A.G. Gray, and S. Krasser. Detecting spammers with SNARE: Spatiotemporal network-level automated reputation engine. In 18th USENIX Security Symposium, 2009 [2] M. Potthast, B. Stein, and R. Gerling. Automatic vandalism detection in Wikipedia. In Advances in Information Retrieval, 2008. [3] R. Priedhorsky, J. Chen, S.K. Lam, K. Achier, L. Terveen, and J. Riedl. Creating, destroying, and restoring value in Wikipedia. In GROUP `07, 2007. [4] A.G. West. STiki: A vandalism detection tool for Wikipedia. http://en.wikipedia.org/wiki/Wikipedia:STiki. Software, 2010. [5] A.G. West, A.J. Aviv, J. Chang, and I. Lee. Mitigating spam using spatio-temporal reputation. Technical report UPENN-MS-CIS-10-04, Feb. 2010. [6] A.G. West, S. Kannan, and I. Lee. Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata. In EUROSEC `10, April 2010.