Semantic Wiki, Great Candidate for Knowledge Acquisition

From Text and Data to Knowledge: Via Semantic Wikis
The Social Semantic Web in the Small
Jesse Wang

The Bottleneck of AI is Knowledge Acquisition
Human Intelligence
Computer Intelligence

COMPUTER INTELLIGENCE IS IN THE CONNECTIONS

Connecting both Information and People
[Figure: timeline chart with two axes, "Connections between people" and "Connections between information", covering the PC Era, Web 1.0, Web 2.0, Web 3.0, and Web 4.0 over the decades 1980 - 1990 through 2020 - 2030.
Technologies plotted: PCs, Windows, MacOS, File Systems, File Servers, Databases, SQL, SGML, FTP, IRC, USENET, BBS, Gopher, Email, HTTP, HTML, Websites, Directory Portals, Keyword Search, Java, Javascript, Flash, XML, SOAP, P2P, Groupware, Wikis, Weblogs, RSS, ATOM, AJAX, Widgets, Mashups, Social Networking, Social Media Sharing, Lightweight Collaboration, Office 2.0, SaaS, OpenID, MMOs, RDF, OWL, SPARQL, SWRL, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents, VR, Web OS.
Other labels: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web.]

At Multiple Levels of Understanding
Signal entity (Words)
Signal form (Syntax)
Signal semantics (Concepts)
Categories (taxonomy)
Statements
Models
Decision-making

HOW DO WE CAPTURE ALL?
At least, the semantics?

Two Paths for Semantics (>>KB Construction)
▪ “Bottom-Up”
– Add semantic metadata to pages and databases all over the Web
• Alternatively, train models to extract this information (machine-assisted)
– Every website becomes semantic
• except for those that are untagged, outside the trained models, or tagged with errors
▪ “Top-Down”
– Experts build models and rules for semantics
– Create services that provide this as an overlay to the non-semantic Web
– Every website becomes semantic
• except for those not covered
-- Alex Iskold

Five Approaches to Semantics
▪ Tagging
▪ Statistics
▪ Linguistics
▪ Semantic Web
▪ Artificial Intelligence

The Tagging Approach
▪ Pros
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
▪ Cons
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
▪ Technorati
▪ Del.icio.us
▪ Flickr
▪ Wikipedia
▪ YouTube

The Statistical Approach
▪ Pros:
– Pure mathematical algorithms
– Massively scalable with good training data
– Language independent
▪ Cons:
– No understanding of the content
– Hard to craft good queries
– Best for finding really popular things, not good at finding needles in haystacks
– Limited by data (esp. quality training data)
– Not great for sparse structured data with strong inherent semantics
▪ Google
▪ Lucene
▪ Autonomy
▪ Farecast (Bing Travel)
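
As a minimal sketch of what "pure mathematical algorithms" with "no understanding of the content" can look like, the following ranks documents with TF-IDF weighting. TF-IDF is a common statistical technique chosen here for illustration; the slides do not name a specific algorithm, and the sample documents and the function name `tf_idf_scores` are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document as the sum of (term frequency * inverse document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term at all.
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue
            idf = math.log(n_docs / df)
            score += (counts[term] / len(tokens)) * idf
        scores.append(score)
    return scores

# Hypothetical sample documents.
docs = [
    "paris is the capital of france",
    "paris hilton visited the capital",
    "berlin is the capital of germany",
]
scores = tf_idf_scores("paris capital", docs)
best = docs[scores.index(max(scores))]
```

In this sample the shorter document about a person named Paris scores highest for the query, because the weighting only counts words and has no notion that the query is about a city.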

The Linguistic Approach
▪ Pros:
– Almost-true language understanding
– Extract knowledge from text
– Best for searching for particular facts or relationships
– More precise queries
▪ Cons:
– Computationally intensive
– Difficult to scale
– Lots of special-case and other errors
– Language-dependent
▪ Powerset
▪ Hakia
▪ Inxight, Attensity, and others…

The Semantic Web Approach
▪ Pros:
– More precise queries
– Smarter apps with less work
– Not as computationally intensive
– Share & link data between apps
– Works for both unstructured and structured data
▪ Cons:
– Lack of tools
– Who makes all the metadata?
▪ Radar Networks
▪ DBpedia Project
▪ Metaweb (Freebase)

The Artificial Intelligence Approach
▪ Pros:
– Smart in narrow domains
– Answer questions intelligently
– Reasoning and learning
▪ Cons:
– Computationally intensive
– Extremely hard to program
– Does not work well outside of narrow domains
– Training takes a lot of work
▪ Cycorp
▪ AURA (Project Halo)

The Approaches Compared
[Figure: the five approaches plotted on two axes, "Make the software smarter" and "Make the data smarter": Tagging, Statistics, Linguistics, Semantic Web, A.I.]

In Practice
Tagging
Semantic Web
Statistics
Linguistics
Artificial intelligence

From Tagging to AI
Data Structure
Intelligence

The Semantic Web is a Key Enabler
▪ Moves the “intelligence” out of applications, into the data
▪ Data needs special structures
▪ Data becomes self-describing; the meaning of the data becomes part of the data
▪ Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it
▪ Data can be shared and linked more easily
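
One way to picture self-describing data is a small set of subject-predicate-object triples in which the labels for the predicates are stored as data too. The sketch below assumes that representation; the triples, the values in them, and the function name `describe` are all hypothetical and for illustration only.

```python
# Hypothetical triples (subject, predicate, object); values are illustrative only.
TRIPLES = [
    ("Berlin", "type", "City"),
    ("Berlin", "capital_of", "Germany"),
    ("Berlin", "population", 3500000),
    # The data also describes its own predicates.
    ("type", "label", "is a"),
    ("capital_of", "label", "is the capital of"),
    ("population", "label", "has population"),
]

def describe(subject, triples):
    """Render every fact about a subject using labels found in the data itself."""
    labels = {s: o for s, p, o in triples if p == "label"}
    lines = []
    for s, p, o in triples:
        if s == subject and p != "label":
            # Fall back to the raw predicate name when no label is stored.
            lines.append(f"{s} {labels.get(p, p)} {o}")
    return lines

facts = describe("Berlin", TRIPLES)
```

The `describe` function contains no knowledge of cities or populations, so adding a new kind of fact requires only new triples, not new application code.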

The Semantic Web = Open Database Layer for the Web
User Profiles
Web Content
Data Records
Apps & Services
Ads & Listings
Open Data Mappings
Open Data Records
Open Rules
Open Ontologies
Open Query Interfaces

And The Web IS the Database!
Application A Application B

BUT THERE IS STILL SOMETHING MISSING

In Every Part or Layer of the Semantic Web, We Need

Crowd Wisdom To Best Map Human Knowledge for Humans

Clear Semantics for Machines to Understand Knowledge

Semantic Wikis: the Social Semantic Web in Action!
Semantic Wikis

What is a Wiki? A Key Feature of Wikis is
This distinguishes wikis from other publication tools

Consensus in Wikis Comes from
▪ Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View
▪ Convention
– Users follow customs and conventions to engage with articles effectively

Software Support Makes Wikis Successful
▪ Trivial to edit by anyone
▪ Tracking of all changes, one-step rollback
▪ Every article has a “Talk” page for discussion
▪ Notification facility allows anyone to “watch” an article
▪ Sufficient security on pages, logins can be required
▪ A hierarchy of administrators, gardeners, and editors
▪ Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work, and flag them for editors
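
The change tracking, one-step rollback, and "watch" notification features listed above can be sketched with a very small data structure. This is an illustrative model only, not how any particular wiki engine is implemented; the class `WikiPage` and its methods are hypothetical names.

```python
class WikiPage:
    """Toy page model: full revision history, one-step rollback, watchers."""

    def __init__(self, title, text=""):
        self.title = title
        self.revisions = [("system", text)]  # list of (editor, text), oldest first
        self.watchers = set()
        self.notifications = []  # (watcher, page title, editor) tuples

    @property
    def text(self):
        return self.revisions[-1][1]

    def watch(self, user):
        self.watchers.add(user)

    def edit(self, user, new_text):
        self.revisions.append((user, new_text))
        self._notify(user)

    def rollback(self, user):
        """Restore the previous text as a new revision, so no history is lost."""
        if len(self.revisions) > 1:
            self.revisions.append((user, self.revisions[-2][1]))
            self._notify(user)

    def _notify(self, editor):
        # Everyone watching the page except the editor is notified.
        for watcher in sorted(self.watchers - {editor}):
            self.notifications.append((watcher, self.title, editor))

page = WikiPage("Berlin", "Berlin is a city.")
page.watch("alice")
page.edit("vandal", "garbage")
page.rollback("alice")
```

Recording the rollback as a new revision, instead of deleting the bad one, keeps every change visible in the history.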

Success of Wikis
Figure: Actual number of articles on en.wikipedia.org (thick blue line) compared with a Gompertz model that leads eventually to a maximum of about 4.4 million articles (thin green line)
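
For reference, a Gompertz growth curve has the general form below. Only the asymptote of about 4.4 million articles comes from the caption; the shape parameters $b$ and $c$ are not given in the slides and are left symbolic here.

```latex
N(t) = a \, e^{-b \, e^{-c t}}, \qquad a \approx 4.4 \times 10^{6}, \quad b > 0, \quad c > 0, \qquad \lim_{t \to \infty} N(t) = a
```

Because the inner exponential decays to zero as $t$ grows, the article count $N(t)$ levels off at $a$, which is the maximum the caption refers to.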

Summary: What Wiki Is Really About
Quick and Easy – No download
Layered Community Authoring
Interlinked Hierarchical Content
Revision Control
Notification

What is a Semantic Wiki
▪ A wiki that has an underlying model of the knowledge described in its pages
▪ Allows users to make their knowledge explicit and formal
▪ Semantic Web compatible

Combining Human Knowledge and Data Structures
Wikis for Metadata
Metadata for Wikis

Basics of Semantic Wikis
▪ Still a wiki, with regular wiki features
– E.g. Category/Tags, Namespaces, Title, Versioning, ...
▪ Typed Content
– E.g. Page/Card, Date, Number, URL/Email, String, …
▪ Typed Links
– E.g. “capital_of”, “contains”, “born_in”…
▪ Querying Interface Support
– E.g. “[[Category:Person]] [[Age::<30]]”
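
To show how typed links and the query above fit together, here is a minimal sketch that parses `[[property::value]]` annotations and `[[Category:Name]]` tags from page text and evaluates a query written in the same notation. The notation follows the example on this slide; the sample pages, the helper names, and the limited set of comparison operators are assumptions for illustration, not the behavior of any specific wiki engine.

```python
import re

# [[Property::value]] is a typed annotation; [[Category:Name]] is a category tag.
ANNOTATION = re.compile(r"\[\[([^\[\]:]+)::([^\[\]]+)\]\]")
CATEGORY = re.compile(r"\[\[Category:([^\[\]]+)\]\]")

def parse_page(wikitext):
    """Extract categories and typed properties from wiki text."""
    props = {name.strip(): value.strip() for name, value in ANNOTATION.findall(wikitext)}
    cats = {name.strip() for name in CATEGORY.findall(wikitext)}
    return {"categories": cats, "properties": props}

def matches(value, condition):
    """Compare a stored value against a condition such as '<30', '>5', or 'Paris'."""
    if value is None:
        return False
    if condition.startswith("<"):
        return float(value) < float(condition[1:])
    if condition.startswith(">"):
        return float(value) > float(condition[1:])
    return value == condition

def run_query(query, pages):
    """Return the titles of pages satisfying every condition in the query."""
    wanted = parse_page(query)
    results = []
    for title, text in pages.items():
        page = parse_page(text)
        if not wanted["categories"] <= page["categories"]:
            continue
        if all(matches(page["properties"].get(prop), cond)
               for prop, cond in wanted["properties"].items()):
            results.append(title)
    return sorted(results)

# Hypothetical sample pages.
PAGES = {
    "Alice": "Alice was [[born_in::Paris]]. [[Age::28]] [[Category:Person]]",
    "Bob": "Bob was [[born_in::Rome]]. [[Age::41]] [[Category:Person]]",
    "Paris": "Paris is the [[capital_of::France]]. [[Category:City]]",
}
young_people = run_query("[[Category:Person]] [[Age::<30]]", PAGES)
```

The query reuses the page parser because a query in this notation is itself just a set of category tags and property conditions.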

Advanced Semantic Wiki Features
▪ Semantic forms or templates
▪ Auto-completion based on semantics
▪ Powerful visualizations based on semantics/structures/types
▪ Rules and reasoning support
▪ Advanced search and queries (faceted search, SPARQL, etc.)
▪ Semantic notifications (personalized information filtering)
▪ Import and Export of Semantic Data
▪ Data Integration: identification, disambiguation, merging, trust, security/privacy, …
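
As a small illustration of "rules and reasoning support", the sketch below applies a single rule, that a typed link is transitive, until no new facts can be derived. Treating `contains` as transitive is an assumption made for this example, and the facts and the function name `infer_transitive` are hypothetical.

```python
def infer_transitive(facts, predicate):
    """Apply the rule (a p b) and (b p c) => (a p c) until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in known if p == predicate]
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2 and (a, predicate, c) not in known:
                    known.add((a, predicate, c))
                    changed = True
    return known

# Hypothetical facts entered by wiki users as typed links.
FACTS = {
    ("Europe", "contains", "Germany"),
    ("Germany", "contains", "Bavaria"),
    ("Bavaria", "contains", "Munich"),
}
closure = infer_transitive(FACTS, "contains")
derived = closure - FACTS
```

Users state only the direct links; the three indirect ones, such as Europe containing Munich, are derived by the rule.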

Characteristics of Semantic Wikis

What is the Promise of Semantic Wikis?
▪ Semantic Wikis facilitate Consensus over Data (Knowledge)
▪ Combine low-expressivity data authorship with the best features of traditional wikis
▪ User-governed, user-maintained, user-defined
▪ Easy to use as an extension of text authoring

One Key Helpful Feature of Semantic Wikis
Semantic Wikis are “Schema-Last”
Databases require DBAs and schema design;
Semantic Wikis develop and maintain the schema in the wiki
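
A rough sketch of the "schema-last" idea: no schema is declared up front, and the properties and their apparent types are derived from whatever annotations users have already written. The `[[property::value]]` notation, the two-type guess, the sample pages, and the function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

ANNOTATION = re.compile(r"\[\[([^\[\]:]+)::([^\[\]]+)\]\]")

def guess_type(value):
    """Very rough type guess: anything that parses as a number is a Number."""
    try:
        float(value)
        return "Number"
    except ValueError:
        return "Page"

def infer_schema(pages):
    """Collect every property used in the wiki together with the value types seen."""
    schema = defaultdict(set)
    for text in pages.values():
        for name, value in ANNOTATION.findall(text):
            schema[name.strip()].add(guess_type(value.strip()))
    return {name: sorted(types) for name, types in schema.items()}

# Hypothetical pages; the second author introduces a new property on the fly.
PAGES = {
    "Alice": "[[born_in::Paris]] [[Age::28]]",
    "Bob": "[[born_in::Rome]] [[employer::Acme]]",
}
schema = infer_schema(PAGES)
```

When a user adds a new property such as `employer`, it simply shows up in the inferred schema, with no prior design step by a DBA.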

Great Candidate for Knowledge Acquisition
▪ Combining both unstructured and semi-structured data
▪ High connectivity on both information and social dimensions
▪ Collaboration with sophisticated software support
▪ Expected low cost for crowdsourcing
▪ Evolving category and template systems
▪ But…

BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition
Knowledge is represented MOSTLY in unstructured and semi-structured ways
• Plain text
• Templates
• Infoboxes
• Tables
• Section headers
• Links
• References
• Redirects
• …

Software/Feature Enhancements Are Needed
Quick and easy way to view and edit schema
Machine assistance (NLP, Auto-suggest…)
Better visualizations with structured data
More user layers for better KB construction
Better targeted (semantic) notifications

Best Bet for Knowledge Acquisition?
▪ Knowledge acquisition (K.A.) is the well-known Artificial Intelligence problem
– AI authoring is too expensive, too slow, not scalable
▪ Three Possible Solutions
– Automatic Machine Parsing (e.g. NELL, ReVerb)
• Quality (depth) not good enough for textbook sentences
• Error rates are too high
• Still need humans in the loop for training data
– Crowd-Sourced Authoring (e.g. Amazon Mechanical Turk, AMT)
• Biology and Knowledge Engineering expertise is difficult to get
• Mechanical Turk uses individuals, but the knowledge entry tasks appear to require coordination, judgment, discussion, and working together
– Social Authoring and Crowdsourcing with Intelligent Software Assistance
• Wikipedia showed this could work for text
• Semantic wiki software R&D is needed to make it work for more structured knowledge

With All These Features…
Effective knowledge acquisition via Semantic Wikis
Combine the strengths of humans and machines
Connecting humans and machines
High quality at low cost

Conclusion: To Bridge Machine and Human Intelligence

To Dive Into Social Semantic Web

THANK YOU!
Credits: some slides are originally from the following people, with little or no modifications:
Nova Spivack
Denny Vrandecic
Mark Greaves
Bao Jie
