Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Wwsss intro2016-final

643 Aufrufe

Veröffentlicht am

Introduction to Web Science, Lecture at the WSTNet Web Science Summer School, WWSSS16

Veröffentlicht in: Internet
  • Als Erste(r) kommentieren

Wwsss intro2016-final

  1. 1. Steffen Staab Web Science 1Institute for Web Science and Technologies · University of Koblenz-Landau, Germany Web and Internet Science Group · ECS · University of Southampton, UK & Introduction to Web Science Steffen Staab University of Southampton & Universität Koblenz-Landau
  2. 2. Steffen Staab Web Science 2 Welcome in Koblenz!
  3. 3. Steffen Staab Web Science 3 9.00 am 10.30 am 3.00 pm Thu Introduction to Web Science Noshir Contractor Tutorial Poster session Fri Internet and Law Nikolaus Forgo Tutorial Project work Sat Web and Politics Stéphane Bazan Tutorial Project work Mon Social Machines Jim Hendler Tutorial Project work Tue Web Entrepreneurship Simon Köhl Tutorial Project work Wed Computational Social Science Nuria Oliver Tutorial Project presentation Program Overview Details at http://wwsss16.webscience.org/schedule
  4. 4. Steffen Staab Web Science 4 • Work independently and self-organised • Get feedback from your advisor • Choose your working space: – D 238, D 239, A 308, B 005 on weekdays – E 412, E 413, E 414, A 308, B 005 on Saturday – or anywhere you like • Final presentation on Wednesday afternoon (10 minutes per team) Project work Details at http://wwsss16.webscience.org/schedule
  5. 5. Steffen Staab Web Science 5 Social Events Thu Cocktails at city beach 6.00 pm Meet at main campus entry Fri SommerUni party 8.00 pm Campus Sat Free time Old town, KaleidosKOp festival, fortress, beer gardens, ... Sun Excursion (few seats available!) 10.45 am Meet at main campus entry Quarter final France vs. Island 9.00 pm Campus Mon Free time Old town, fortress, beer gardens, ... Tue Barbecue 7 pm Campus SommerUni party 9 pm Campus Wed SommerUni barbecue 6 pm Campus
  6. 6. Steffen Staab Web Science 6 Thursday Noshir Contractor: Leveraging Web/Internet/Network Sciences (WINS) to address Grand Societal Challenges 9.00 am 10.30 am Steffen Staab and Jérôme Kunegis: Introduction to Web Science 3.00 pm Poster session 6.00 pm Cocktails at city beach meeting point: main campus entrance Breaks: • 10 am, • 12 am, • 2.30 pm
  7. 7. Steffen Staab Web Science 7 Friday Nikolaus Forgó: Privacy Law and the Web: A Story of Love and Hate? 9.00 am 10.30 am Rüdiger Grimm: Internet and Law 3.00 pm Supervised project work 8.00 pm SommerUni party At the campus – impossible to miss! Breaks: • 10 am, • 12 am, • 2.30 pm
  8. 8. Steffen Staab Web Science 8 Thanks to our sponsors!
  9. 9. Steffen Staab Web Science 9 Let‘s get started!
  10. 10. Steffen Staab Web Science 10 Produce Consume Cognition Emotion Behavior Socialisation Knowledge Observable Micro- interactions in the Web Apps Protocols Data & Information Governance WWW Observable Macro- effects in the Web Web Science
  11. 11. Steffen Staab Web Science 11 Tertium Datur or Where Dijkstra was wrong Computer Science Science about Computers Astronomy Telescope Science Web Science Science about the Web What is Web Science?
  12. 12. Steffen Staab Web Science 12 Web science is an emerging interdisciplinary field concerned with the study of large-scale socio-technical systems, such as the World Wide Web. It considers the relationship between people and technology, the ways that society and technology co-constitute one another and the impact of this co-constitution on broader society. Wikipedia, 2016-06-29 Definition of Web Science
  13. 13. Steffen Staab Web Science 13 Agenda • What is Web Science? • What is the Web? – Aspects of the Web at Large • How to investigate the Web? – Observing the Web – An example using the architecture: bias in the Web – How to model aspects of the Web • What is the past and the future of the Web?
  14. 14. Steffen Staab Web Science 14 I try to • classify, describe commonalities, find generalizations I will not • be 100% correct, 100% complete BUT • Please ask, suggest, complement,... WHENEVER you feel like it! DON‘T HESITATE ASKING! A Note
  15. 15. Steffen Staab Web Science 15 What is the Web?
  16. 16. Steffen Staab Web Science 16 What is the Web: Aspects of the Web at Large Architecture methodology: • Draw pictures in different dimensions • Pictures do not compete with, but complement each other Not only technical pictures!
  17. 17. Steffen Staab Web Science 17 The Web as a Device Software • Browsers – IE, Firefox, Chrome,... • Web Servers – Apache, Tomcat,.. • Content Management and Data Delivery – Wordpress, drupal, databases... • Search Engines • ... Standards • Uniform Resource Locator (URL) • HyperText Transfer Protocol (http) • HyperText Markup Language (html) • Domain name service • ...+ many more
  18. 18. Steffen Staab Web Science 18 The Web as Content For human consumpton (primarily) Text, Hypertext Images Video Audio Multimedia Interaction (Games...) Braille Mathematics For machine consumption (primarily) Metadata Data Ontologies
  19. 19. Steffen Staab Web Science 19 The Web as Content Free • Informational • Advertisements • Goods & services • Web 2.0 – Wikipedia – facebook Paid • Individual payments • Subscription • Micropayment Freemium
  20. 20. Steffen Staab Web Science 20 The Web and its Stakeholders People • Citizens • Customers • Leisure seekers • Workers • Software developers Internet providers • Landline • Mobile • Nested providers (internet cafe...) • Website hosts • Peer2peer networks (Bittorrent...) Platform operators • Shops • News • Web 2.0 • Payment • Advertisement networks • Trust centers Government • Police • Military • Secret service • Law • Citizen services • Administration • Politics
  21. 21. Steffen Staab Web Science 21 The Web as a Process Governing • Standards processes – W3C – Internet Engineering Task Force (IETF) – RFC • Domain name registration • Internet routing – E.g. „great Chinese firewall“ Regulation • Legal – copyright • where enforced – hate speech – ... • Private – E.g. Facebook, Instagram ... Pictures, nipple double standard
  22. 22. Steffen Staab Web Science 22 Observing the Web as a Medium and Mirror of (anti-)social Practices • (Self-)Expression • Dark Web – Crime – Gold farming – Violence – Pornography – Identity theft • Sex lifes – Fetishes – prostitution • Relationships – Breakups – Mobbing – Stalking – Advising – Counseling – Democracy
  23. 23. Steffen Staab Web Science 23 The Web as Medium & Mirror of NEW (?) (anti-)Social Practices Automation • Twitter bots • Online dating bots • Smart contracts („Code is law“) – E.g. DRM • Quantified everything – Physique – Psyche • Infer your Big 5 personality traits from your facebook profile • Threats to Internet Security & Safety – Hacking – Social engineering
  24. 24. Steffen Staab Web Science 24 Produce Consume Cognition Emotion Behavior Socialisation Knowledge Observable Micro- interactions in the Web Apps Protocols Data & Information Governance WWW Observable Macro- effects in the Web Web Science: Discipline or Transdisciplinary Endeavour?
  25. 25. Steffen Staab Web Science 25 How to investigate the Web?
  26. 26. Steffen Staab Web Science 26 Web Observatories Konect
  27. 27. Steffen Staab Web Science 27 Why to observe? • Understanding – Collecting – Describing – Analyzing – Modeling – Predicting – Repeating!
  28. 28. Steffen Staab Web Science 28 Challenges – Data Collection Issues Legal and/or Ethical • Crawling – May be disallowed by provider • Usage logging – Privacy of individuals • Even if it is allowed....
  29. 29. Steffen Staab Web Science 29 Challenges – Data Collection Issues • Crawling – What does it mean to crawl a heavily interactive site? – Incomplete data • Unreachability • Time outs
  30. 30. Steffen Staab Web Science 30 Challenges – Data Collection Issues • Crawling – What does it mean to crawl a heavily interactive site? – Incomplete data – Where to start? • We cannot observe everything! – Even just for data size! – What appear to be most fruitful starting points?
  31. 31. Steffen Staab Web Science 31 Challenges – Data Collection Issues • Crawling – What does it mean to crawl a heavily interactive site? – Incomplete data – Where to start? – Where to stop? • Each crawl is a view – Twitter » Tweet • URL • Web Page • Subweb » Followers • Followers‘ Followers • ...
  32. 32. Steffen Staab Web Science 32 Challenges – Data Collection Issues • Crawling – What does it mean to crawl a heavily interactive site? – Incomplete data – Where to start? – Where to stop? – Synchronous vs asynchronous • Strictly speaking: only asynchronous crawling possible – But in [Dellschaft&Staab] we targeted the construction of models for streams of tags
  33. 33. Steffen Staab Web Science 33 Challenges – Data Publishing Issues Legal and/or Ethical Example Issues • AOL query log • Netflix challenge • Delicious – http://www.tagora-project.eu/data/ • Twitter – Collecting, but no sharing • SocialSensor project
  34. 34. Steffen Staab Web Science 34 Challenges – Data Publishing Issues Technical/Modelling issues • Generic format, e.g. RDF • Format ready for digestion by a certain software, e.g. for Matlab processing • Openness to other data – E.g. references to DBPedia/Wikipedia • Accuracy of publishing – http://me.org showed „...“ – http://me.org showed „...“@2013-05-01:0900CEST – http://me.org showed „...“@2013-05-01:0900CEST called from IP 193.99.144.85 using browser...version...history...
  35. 35. Steffen Staab Web Science 35 Sharing Software • Software – For crawling or usage logging – Rather than sharing the data, share the code for observing • Example:  code for crawling Twitter in a certain way • Issues – Limited repeatability – Disturbance liability („Störerhaftung“) – at least in DE • If you provide source code for crawling, e.g., Facebook, even if you do not crawl FB, FB can sue you
  36. 36. Steffen Staab Web Science 36 More later by Jerome
  37. 37. Steffen Staab Web Science 37 Model the Web What is in the content? What is in the algorithm? What is in the Social machine? WebObservatory
  38. 38. Steffen Staab Web Science 38 Example Topic: Bias
  39. 39. Steffen Staab Web Science 39 Bias in the Device Accessibility • Impaired eyesight • Impaired use of mouse and keyboard • ... HTML5 semantics helps – but is not used much
  40. 40. Steffen Staab Web Science 40 Bias in the Device Accessibility • Impaired eyesight • Impaired use of mouse and keyboard • ... HTML5 semantics helps – but is not used much Haris Aslanidis
  41. 41. Steffen Staab Web Science 41 Bias in the Device Accessibility • Impaired eyesight • Impaired use of mouse and keyboard • ... HTML5 semantics helps – but is not used much http://west.uni- koblenz.de/en/research/projects/ mamem http://www.mamem.eu/mamem- makes-available-gazetheweb- browse-application/
  42. 42. Steffen Staab Web Science 42 Search engines • Categorizing people and animals – White vs black – http://www.nytimes.com/2016/06/26/opinion/sunday/artificia l-intelligences-white-guy-problem.html?_r=0 • Job advertisements – Well-paid job not offered to females Bias in the Software
  43. 43. Steffen Staab Web Science 43 Bias in Content/Data Credit Hire Sex Ethnic Zip Height ... ... + + + - - + + + - - correlated Data protection laws suggest not to process sensitive data attributes like „sex“ or „ethnic“
  44. 44. Steffen Staab Web Science 44
  45. 45. Steffen Staab Web Science 45 Bias in Content: Social Networks (Lerman et al 15)
  46. 46. Steffen Staab Web Science 46 fish, rice seafood, fish seafood, shrimp lobster, wine seafood, fish, salmon fish, salmon, wine rice, fish lobster, seafood, shrimp coffee coffee, wine coffee wine wine pizza, wine pizza, wine pasta, wine pasta, shrimp lobster, shrimp seafood, shrimp Tagged photos with geo-coordinates from Flickr
  47. 47. Steffen Staab Web Science 47 fish, rice seafood, fish seafood, shrimp lobster, wine seafood, fish, salmon fish, salmon, wine seafood, shrimp lobster, seafood, shrimp coffee coffee, wine coffee italian, wine wine pizza, wine italian, pizza, wine pasta, wine pasta, shrimp seafood fish lobster shrimp crab wine salmon wine pizza coffee italian pasta seafood, shrimp lobster, shrimp Bias in the Algorithm: Shape of Clusters
  48. 48. Steffen Staab Web Science 48 Evaluation: Anectodal, Perplexity, Gaming Gaming study: intrusion detection Precision 8 topics avg / median LGTA 0.60 / 0.58 Basic model 0.64 / 0.58 MGTM 0.78 / 0.75
  49. 49. Steffen Staab Web Science 49 • The Web reflects current and past discrimination! • Example: – Predict who will leave the company based on email features (ICWSM-16) • People with outsiders‘ vocabulary more prone to fail • Unresolved: – Do they fail because they are less adaptable or – Do they fail because the environment is hostile to outsider? Bias and Social Practices
  50. 50. Steffen Staab Web Science 50 Wikipedia • Efforts to counter bias Law • E.g. UK equality act • Protected characteristics: – Age, disability, gender, marriage, religion,... Bias and Processes
  51. 51. Steffen Staab Web Science 51 Biases in the Social Machine: The Case of Liquid Feedback
  52. 52. Steffen Staab Web Science 52 ...
  53. 53. Steffen Staab Web Science 53 Online Delegative Democracy CC-BY-SA Ilmari Karonen
  54. 54. Steffen Staab Web Science 54 Delegative Democracy • Between direct and representative democracy CC-BY-SA Ilmari Karonen
  55. 55. Steffen Staab Web Science 55 Delegative Democracy • Between direct and representative democracy • Voters can delegate their vote to other voters CC-BY-SA Ilmari Karonen
  56. 56. Steffen Staab Web Science 56 CC-BY-SA Ilmari Karonen
  57. 57. Steffen Staab Web Science 57 CC-BY-SA Ilmari Karonen
  58. 58. Steffen Staab Web Science 58 CC-BY-SA Ilmari Karonen Delegative Democracy • Between direct and representative democracy • Voters can delegate their vote to other voters • Delegations can be revoked at any time
  59. 59. Steffen Staab Web Science 59 CC-BY-SA Ilmari Karonen Delegative Democracy • Between direct and representative democracy • Voters can delegate their vote to other voters • Delegations can be revoked at any time • Votes are public!
  60. 60. Steffen Staab Web Science 60 Dataset: LiquidFeedback (German Pirate Party)
  61. 61. Steffen Staab Web Science 61 LiquidFeedback – Pirate Party • Observation: 08/2010 – 11/2013 • 13,836 Members • 14,964 Delegations • 499,009 Votes
  62. 62. Steffen Staab Web Science 62 LiquidFeedback – German Pirate Party • Users create initiatives, which are grouped by issues and belong to areas
  63. 63. Steffen Staab Web Science 63 LiquidFeedback – German Pirate Party • Users create initiatives, which are grouped by issues and belong to areas Area: Environmental issues Issue: CO2 output has to be reduced. Initiative: Subsidise wind turbines!
  64. 64. Steffen Staab Web Science 64 LiquidFeedback – German Pirate Party • Users create initiatives, which are grouped by issues and belong to areas Area: Environmental issues Issue: CO2 output has to be reduced. Initiative: Subsidise wind turbines! Areas: 22 Issues: 3,565 Initiatives: 6,517
  65. 65. Steffen Staab Web Science 65 LiquidFeedback – German Pirate Party • Users create initiatives, which are grouped by issues and belong to areas Delegations on global, initiative, issue and area level → “Back-delegations” possible
  66. 66. Steffen Staab Web Science 66 Dataset – First Impressions •
  67. 67. Steffen Staab Web Science 67 Dataset – First Impressions • Voting Weight
  68. 68. Steffen Staab Web Science 68 Dataset – Bias ? • 3,658 members > 10 votes 1,156 members > 100 votes 54 members > 1,000 votesMedian all: 8 votes Median delegating: 42 votes Median delegates: 64 votes
  69. 69. Steffen Staab Web Science 69 Delegation Network • Temporal analysis •
  70. 70. Steffen Staab Web Science 70 Delegation Network • Temporal analysis •
  71. 71. Steffen Staab Web Science 71 Delegation Network • Temporal analysis •
  72. 72. Steffen Staab Web Science 72 Delegation Network • Temporal analysis •
  73. 73. Steffen Staab Web Science 73 Investigation • What are the social processes? • How are data collected? • Which algorithms are used? • How is the algorithm used in system processes? • What are the system effects? Purposes of investigation • Social sciences: Observe bias in action! • Intervention: Change the overall system! • System sciences: Understand system limits! • Engineering: Build „proper“ technical system! Summary: Bias
  74. 74. Steffen Staab Web Science 74 Web Models
  75. 75. Steffen Staab Web Science 75 • Descriptive – Qualitative – Statistical • Predictive – Modeling deterministic regularities • Generative – Modeling non-deterministic principles • Liking a song • Creating a link Web Models
  76. 76. Steffen Staab Web Science 76 Descriptive Models Example: Bow Tie Structure of the Web
  77. 77. Steffen Staab Web Science 77 Bow-tie structure of the Web
  78. 78. Steffen Staab Web Science 78 Predictive Models Example: Link Prediction by Triangle Closing
  79. 79. Steffen Staab Web Science 79 Social Network Person Friendship
  80. 80. Steffen Staab Web Science 80 Recommender Systems Predict who I will add as friend next Standard algorithm: find friends-of-friends me
  81. 81. Steffen Staab Web Science 81 Friend of a Friend 1 2 4 5 6 3 Count the number of ways a person can be found as the friend of a friend.
  82. 82. Steffen Staab Web Science 82 Generative Models Example: Link Creation by Barabasi-Albert
  83. 83. Steffen Staab Web Science 83 • Many large networks are scale free – Matthew effect: rich get richer • Vote delegation! • The degree distribution has a power-law behavior for large k (far from a Poisson distribution) • Random graph theory and the Watts-Strogatz model cannot reproduce this feature General considerations
  84. 84. Steffen Staab Web Science 84 Scale free Same shape of hull, no matter which resolution
  85. 85. Steffen Staab Web Science 85 Scale-free of Web distribution Same shape of distribution: no matter which k
  86. 86. Steffen Staab Web Science 86 Two generic mechanisms common in many real networks: • Growth (www, research literature, ...) • Preferential attachment: attractiveness of popularity The two are necessary Barabasi-Albert model (1999) Barabási & Albert, Science 286, 509 (1999)
  87. 87. Steffen Staab Web Science 87 • t=0, m0 nodes • Each time step we add a new node with m ( m0) edges that link the new node to m different nodes already present in the system Growth
  88. 88. Steffen Staab Web Science 88 • When choosing the nodes to which the new connects, the probability that a new node will be connected to node i depends on the degree ki of node Preferential attachment ( ) i i j j k k k    Linear attachment (more general models) Sum over all existing nodes
  89. 89. Steffen Staab Web Science 89 Numerical simulations • Power-law P(k) k- SF=3 • The exponent does not depend on m (the only parameter of the model)
  90. 90. Steffen Staab Web Science 90 The Past and the Future Web
  91. 91. Steffen Staab Web Science 91 • 1945 Vannevar Bush, „As we may think“, Memex • 1962 Ted Nelson, Hypertext • 1965 Wide area network • 1968 Doug Engelbart, The mother of all demos • 1972 Public Arpanet, Email • 1974-82 Internet protocol TCP/IP • 1978 Consumer information services & Email • 1983 AOL, online service for games, communities... • 1984 Domain name service Pre-Web
  92. 92. Steffen Staab Web Science 92 The World Wide Web • 1989 Concept drafted by Tim Berners-Lee • 1993 National Center for SuperComputing Applications launched Mosaic X • 1994 First WWW conference • 1994 W3C started at MIT • Commercial websites began their proliferation • Followed by local school/club/family sites • The web exploded – 1994 – 3,2 million hosts and 3,000 websites – 1995 – 6,4 million hosts and 25,000 websites – 1997 – 19,5 million hosts and 1,2 million websites – January 2001 – 110 million hosts and 30 million websites
  93. 93. Steffen Staab Web Science 93 The World Wide Web – 1994/1995 Amazon – 1994/1995 Wiki – 1995 AltaVista Search Engine – 1995 Internet Explorer – 1997-2001 Browser wars – 1996-1998 XML recommendation – 1998 Google – 1999 First W3C recommendation on RDF (Semantic Web) – 2001 Dot.Com bubble bursts – 2001 Wikipedia – 2003/2004 Facebook – 2004 Flickr – 2005 YouTube
  94. 94. Steffen Staab Web Science 94 Concepts Example Applications Web of People Physical transport service (Uber (2009), Lyft), accommodation service (AirBnB, Couchsurfing), online dating service, 2009 Web of Things Smart city, ambient intelligence, personal and public health information, personal and public transport information 2007 Web of Services Cloud services, Digital transformation, Programmable Web (2005) 2005 Web of Data User generated content applications (Facebook, Wikipedia (2001) and Wikidata,…), Linked open gov data 2001 Web of Documents HTTP, HTML, XML, Browser (Mosaic 1993) 1993 Computer Networks/Internet Document delivery (internet 1982), VOIP, Streaming 1982
  95. 95. Steffen Staab Web Science 95 Internet of Things vs Web of Things IoT • Internet • P2P networks • Sensors Web of things • How to link? • What is „linking WoT“? • How to use by me? • How to discover? • How to search?
  96. 96. Steffen Staab Web Science 96 Web of People
  97. 97. Steffen Staab Web Science 97 http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif
  98. 98. Steffen Staab Web Science 98 Aaron Swartz † Co-author of RSS1.0 at the age of 14
  99. 99. Steffen Staab Web Science 99 http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif
  100. 100. Steffen Staab Web Science 100 Web of People • What is it? – Identification • Orcid • Facebook ID • Oauth 2.0 ? – Trust • Uber score • Airbnb score • Ebay score • Tinder score • Group score – Chinese social score – Credit rating score ?
  101. 101. Steffen Staab Web Science 101 Concepts Delivered technical capabilities Web of People Identification and rating by/of people Web of Things Identification, linking, aggregation, monitoring and controlling of things Web of Services Identification, composition and calling of services Web of Data Identification, linking and retrieval of data Web of Documents Identification, linking and retrieval of documents Computer Networks/Internet Identification of and communication between computers
  102. 102. Steffen Staab Web Science 102 Concepts Standards Web of People No mature standards yet Web of Things No mature standards yet Web of Services REST, JSON, JSONLD Web of Data RDF, SPARQL Web of Documents HTTP, HTML, XML, AJAX Computer networks/Internet Internet, TCP/IP, Optical fibre, 5G
  103. 103. Steffen Staab Web Science 103 Your thoughts?
  104. 104. Steffen Staab Web Science 104 Conclusions
  105. 105. Steffen Staab Web Science 105 What is in the data? What is in the algorithm? What is in the Social machine? WebObservatory
  106. 106. Steffen Staab Web Science 106 Telling the Story with Web Science What is in the Data? What is in the Algorithm? What is in the Social Machine? Story telling Under- standing Modelling
  107. 107. Steffen Staab Web Science 107 Web Accomplishments • Web of Services • Web of Data • Web of Documents • Computer networks/Internet Future in the making • Web of People • Web of Things How to identify and use? How to observe? How to resolve issues?
  108. 108. Steffen Staab Web Science 108 Thank you!

×