From Text and Data to Knowledge: Via
Semantic Wikis
The Social Semantic Web in the Small
Jesse Wang
The Bottleneck of AI is Knowledge Acquisition
Human Intelligence
Computer Intelligence
COMPUTER INTELLIGENCE IS IN THE CONNECTIONS
Connecting both Information and People
[Figure: timeline chart with two axes, "Connections between people" and "Connections between information"]
Time periods: 1980 - 1990, 1990 - 2000, 2000 - 2010, 2010 - 2020, 2020 - 2030
Eras: PC Era, Web 1.0, Web 2.0, Web 3.0, Web 4.0
Stages: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS
Technologies plotted: Email, Social Networking, Groupware, JavaScript, Weblogs, Databases, File Systems, HTTP, Keyword Search, USENET, Wikis, Websites, Directory Portals, RSS, Widgets, PCs, Office 2.0, XML, RDF, SPARQL, AJAX, FTP, IRC, SOAP, Mashups, File Servers, Social Media Sharing, Lightweight Collaboration, ATOM, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents, Java, SaaS, Flash, OWL, HTML, SGML, SQL, Gopher, P2P, Windows, MacOS, SWRL, OpenID, BBS, MMOs, VR
At Multiple Levels of Understanding
Signal entity (Words)
Signal form (Syntax)
Signal semantics (Concepts)
Categories (taxonomy)
Statements
Models
Decision-making
HOW DO WE CAPTURE IT ALL?
Or at least the semantics?
Two Paths for Semantics (>>KB Construction)
▪ “Bottom-Up”
– Add semantic metadata to pages and databases all over the Web
• Alternatively, train models to extract this information (machine-assisted)
– Every website becomes semantic
• except for those that are not tagged, not covered by trained models, or extracted with errors
▪ “Top-Down”
– Experts build models and rules for semantics
– Create services that provide this as an overlay to the non-semantic Web
– Every website becomes semantic
• except for those not covered
-- Alex Iskold
Five Approaches to Semantics
▪ Tagging
▪ Statistics
▪ Linguistics
▪ Semantic Web
▪ Artificial Intelligence
The Tagging Approach
▪ Pros
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
▪ Cons
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
▪ Examples: Technorati, Del.icio.us, Flickr, Wikipedia, YouTube
The Statistical Approach
▪ Pros:
– Pure mathematical algorithms
– Massively scalable with good training data
– Language independent
▪ Cons:
– No understanding of the content
– Hard to craft good queries
– Best for finding really popular things; not good at finding needles in haystacks
– Limited by data (esp. quality training data)
– Not great for sparse structured data with strong inherent semantics
▪ Examples: Google, Lucene, Autonomy, Farecast (Bing Travel)
The Linguistic Approach
▪ Pros:
– Almost-true language understanding
– Extracts knowledge from text
– Best for searching for particular facts or relationships
– More precise queries
▪ Cons:
– Computationally intensive
– Difficult to scale
– Lots of special-case and other errors
– Language-dependent
▪ Examples: Powerset, Hakia, Inxight, Attensity, and others…
The Semantic Web Approach
▪ Pros:
– More precise queries
– Smarter apps with less work
– Not as computationally intensive
– Share & link data between apps
– Works for both unstructured and structured data
▪ Cons:
– Lack of tools
– Difficult to scale
– Who makes all the metadata?
▪ Examples: Radar Networks, DBpedia Project, Metaweb (Freebase)
The Artificial Intelligence Approach
▪ Pros:
– Smart in narrow domains
– Answers questions intelligently
– Reasoning and learning
▪ Cons:
– Computationally intensive
– Difficult to scale
– Extremely hard to program
– Does not work well outside of narrow domains
– Training takes a lot of work
▪ Examples: Cycorp, AURA (Project Halo)
The Approaches Compared
[Figure: the five approaches plotted on two axes, "Make the software smarter" and "Make the data smarter": Statistics, Linguistics, Semantic Web, A.I., Tagging]
In Practice
[Figure labels: Tagging, Semantic Web, Statistics, Linguistics, Artificial Intelligence]
From Tagging to AI
[Figure axes: Data Structure, Intelligence]
The Semantic Web is a Key Enabler
▪ Moves the “intelligence” out of applications and into the data
▪ Data needs special structures
▪ Data becomes self-describing; the meaning of the data becomes part of the data
▪ Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it
▪ Data can be shared and linked more easily
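As a rough illustration of data that carries its own meaning, the sketch below stores facts as subject-predicate-object triples in plain Python. The entity and property names (`Berlin`, `capital_of`, and so on) and the helper `describe` are hypothetical, chosen only for this example; an actual Semantic Web system would typically use RDF and shared ontologies rather than tuples in a list.

```python
# Facts as (subject, predicate, object) triples. All names are illustrative.
triples = [
    ("Berlin", "type", "City"),
    ("Berlin", "capital_of", "Germany"),
    ("Germany", "type", "Country"),
]


def describe(subject, facts):
    """Collect everything the data itself says about one subject."""
    result = {}
    for s, p, o in facts:
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


print(describe("Berlin", triples))
```

Because each fact names its own relationship, a second application could reuse the same list without knowing how the first one modelled cities.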
The Semantic Web = Open Database Layer for the Web
[Figure labels: User Profiles, Web Content, Data Records, Apps & Services, Ads & Listings; Open Data Mappings, Open Data Records, Open Rules, Open Ontologies, Open Query Interfaces]
And The Web IS the Database!
[Figure labels: Application A, Application B]
BUT THERE IS STILL SOMETHING MISSING
In Every Part or Layer of the Semantic Web, We Need
Now a Complete Web
Crowd Wisdom to Best Map Human Knowledge for Humans
Clear Semantics for Machines to Understand Knowledge
Semantic Wikis: the Social Semantic Web in Action!
What is a Wiki? A Key Feature of Wikis is
This distinguishes wikis from other publication tools
Consensus in Wikis Comes from
▪ Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View
▪ Convention
– Users follow customs and conventions to engage with articles effectively
Software Support Makes Wikis Successful
▪ Trivial to edit by anyone
▪ Tracking of all changes, one-step rollback
▪ Every article has a “Talk” page for discussion
▪ Notification facility allows anyone to “watch” an article
▪ Sufficient security on pages; logins can be required
▪ A hierarchy of administrators, gardeners, and editors
▪ Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work and flag them for editors
Success of Wikis
[Figure: actual number of articles on en.wikipedia.org (thick blue line) compared with a Gompertz model that leads eventually to a maximum of about 4.4 million articles (thin green line)]
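For readers unfamiliar with the Gompertz model mentioned in the caption, here is a minimal sketch of the curve's general form, a * exp(-b * exp(-c * t)). Only the asymptote `a` (about 4.4 million articles) comes from the slide; the values of `b` and `c` and the time unit are placeholder assumptions, not the parameters fitted to Wikipedia's data.

```python
import math


def gompertz(t, a=4.4e6, b=5.0, c=0.5):
    """Gompertz growth curve: a * exp(-b * exp(-c * t)).

    a is the asymptote (about 4.4 million articles per the slide);
    b and c are hypothetical placeholders, not fitted values.
    """
    return a * math.exp(-b * math.exp(-c * t))


# Growth starts slowly, accelerates, then levels off toward the asymptote a.
for t in (0, 5, 10, 20, 40):
    print(t, round(gompertz(t)))
```

The curve never exceeds `a`, which is why such a model predicts a ceiling on the number of articles.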
Summary: What a Wiki Is Really About
Quick and Easy – No download
Layered Community Authoring
Interlinked Hierarchical Content
Revision Control
Notification
What is a Semantic Wiki?
▪ A wiki that has an underlying model of the knowledge described in its pages
▪ Allows users to make their knowledge explicit and formal
▪ Semantic Web compatible
Combining Human Knowledge and Data Structures
[Figure labels: Wikis for Metadata, Metadata for Wikis]
Basics of Semantic Wikis
▪ Still a wiki, with regular wiki features
– E.g. Category/Tags, Namespaces, Title, Versioning, ...
▪ Typed Content
– E.g. Page/Card, Date, Number, URL/Email, String, …
▪ Typed Links
– E.g. “capital_of”, “contains”, “born_in”…
▪ Querying Interface Support
– E.g. “[[Category:Person]] [[Age::<30]]”
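To make typed links and the example query more concrete, the sketch below parses `[[Property::value]]` annotations and `[[Category:Name]]` markers out of wiki text and answers a query equivalent to “[[Category:Person]] [[Age::<30]]”. The page contents, the regular expressions, and the helper names `parse` and `people_younger_than` are assumptions made for illustration; this is not how any particular semantic wiki engine implements querying.

```python
import re

# Hypothetical wiki pages. [[Property::value]] is a typed annotation and
# [[Category:Name]] assigns a category, mirroring the query syntax above.
pages = {
    "Alice": "[[Category:Person]] Alice is [[Age::28]] and was [[born_in::Paris]].",
    "Bob": "[[Category:Person]] Bob is [[Age::41]].",
    "Paris": "[[Category:City]] Paris is the [[capital_of::France]].",
}

ANNOTATION = re.compile(r"\[\[([^:\[\]]+)::([^\[\]]+)\]\]")
CATEGORY = re.compile(r"\[\[Category:([^\[\]]+)\]\]")


def parse(text):
    """Return (categories, properties) found in one page's wiki text."""
    categories = set(CATEGORY.findall(text))
    properties = {name: value for name, value in ANNOTATION.findall(text)}
    return categories, properties


def people_younger_than(limit):
    """Rough equivalent of the query [[Category:Person]] [[Age::<limit]]."""
    hits = []
    for title, text in pages.items():
        categories, properties = parse(text)
        if "Person" in categories and "Age" in properties:
            if int(properties["Age"]) < limit:
                hits.append(title)
    return sorted(hits)


print(people_younger_than(30))
```

The point of the sketch is that the annotations live inside ordinary page text, so authors keep writing prose while the wiki gains queryable structure.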
Advanced Semantic Wiki Features
▪ Semantic forms or templates
▪ Auto-completion based on semantics
▪ Powerful visualizations based on semantics/structures/types
▪ Rules and reasoning support
▪ Advanced search and queries (faceted search, SPARQL, etc.)
▪ Semantic notifications (personalized information filtering)
▪ Import and export of semantic data
▪ Data integration: identification, disambiguation, merging, trust, security/privacy, …
Characteristics of Semantic Wikis
What is the Promise of Semantic Wikis?
▪ Semantic Wikis facilitate consensus over data (knowledge)
▪ Combine low-expressivity data authorship with the best features of traditional wikis
▪ User-governed, user-maintained, user-defined
▪ Easy to use as an extension of text authoring
One Key Helpful Feature of Semantic Wikis
Semantic Wikis are “Schema-Last”
Databases require DBAs and schema design;
Semantic Wikis develop and maintain the schema in the wiki
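One way to picture “schema-last” is to derive the schema from whatever properties authors have already used, instead of declaring it up front. The sketch below does this for a handful of records; the function name `infer_schema` and the sample records are hypothetical and only illustrate the idea, not how a semantic wiki stores its schema.

```python
from collections import defaultdict


def infer_schema(records):
    """Derive property -> set of observed value type names from the data."""
    schema = defaultdict(set)
    for record in records:
        for prop, value in record.items():
            schema[prop].add(type(value).__name__)
    return dict(schema)


# Illustrative records; the second one introduces a new property without
# any prior schema change or migration step.
records = [
    {"name": "Alice", "age": 28},
    {"name": "Paris", "capital_of": "France"},
]

print(infer_schema(records))
```

Adding a new property is then just a matter of using it on a page; the schema view catches up afterwards and can be refined collaboratively in the wiki.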
Great Candidate for Knowledge Acquisition
▪ Combines both unstructured and semi-structured data
▪ High connectivity on both the information and social dimensions
▪ Collaboration with sophisticated software support
▪ Expected low cost of crowdsourcing
▪ Evolving category and template systems
▪ But…
BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition
Knowledge is represented MOSTLY in unstructured and semi-structured ways
• Plain text
• Templates
• Infoboxes
• Tables
• Section headers
• Links
• References
• Redirects
• …
Software/Feature Enhancements Are Needed
Quick and easy way to view and edit the schema
Machine assistance (NLP, auto-suggest…)
Better visualizations with structured data
More user layers for better KB construction
Better targeted (semantic) notifications
Best Bet for Knowledge Acquisition?
▪ Knowledge acquisition (K.A.) is a well-known Artificial Intelligence problem
– AI authoring is too expensive, too slow, not scalable
▪ Three Possible Solutions
– Automatic Machine Parsing (e.g. NELL, ReVerb)
• Quality (depth) not good enough for textbook sentences
• Error rates are too high
• Still need humans in the loop for training data
– Crowd-Sourced Authoring (e.g. Amazon Mechanical Turk, AMT)
• Biology and Knowledge Engineering expertise is difficult to get
• Mechanical Turk uses individuals, but the Knowledge Entry tasks appear to require coordination, judgment, discussion, and working together
– Social Authoring and Crowdsourcing with Intelligent Software Assistance
• Wikipedia showed this could work for text
• Semantic Wiki software R&D is needed to make it work for more structured knowledge
With All These Features…
Effective knowledge acquisition via Semantic Wikis
Combine the strengths of humans and machines
Connecting humans and machines
High quality at low cost
Conclusion: To Bridge Machine and Human Intelligence
To Dive Into Social Semantic Web
THANK YOU!
Credits: some slides are originally from the following people, with little or no modification:
Nova Spivack
Denny Vrandecic
Mark Greaves
Bao Jie

Semantic Wiki, Great Candidate for Knowledge Acquisition
