This is a high-level pitch deck on knowledge acquisition (KA) that complements the text-based part. We have already decided that we need low-level KA based on textual entailment, while the high-level part, which involves more human computation, has been partially ignored up to the point of this presentation. This deck introduces the social semantic web and shows how it can help with our KA tasks.
4. Connecting both Information and People
[Figure: timeline chart plotting technologies along two axes, "Connections between people" and "Connections between information"]
Eras: PC Era, Web 1.0, Web 2.0, Web 3.0, Web 4.0
Decades: 1980 - 1990, 1990 - 2000, 2000 - 2010, 2010 - 2020, 2020 - 2030
Labels shown: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS, PCs, Windows, MacOS, File Systems, File Servers, Databases, SQL, SGML, HTML, HTTP, FTP, IRC, Gopher, USENET, BBS, Email, Groupware, Websites, Directory Portals, Keyword Search, Java, Javascript, Flash, XML, SOAP, P2P, RSS, ATOM, AJAX, Weblogs, Wikis, Widgets, Mashups, Social Networking, Social Media Sharing, Lightweight Collaboration, Office 2.0, SaaS, OpenID, MMOs, VR, RDF, OWL, SPARQL, SWRL, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents
5. At Multiple Levels of Understanding
Signal entity (Words)
Signal form (Syntax)
Signal semantics (Concepts)
Categories (taxonomy)
Statements
Models
Decision-making
6. HOW DO WE CAPTURE ALL?
At least, the semantics?
7. Two Paths for Semantics (>>KB Construction)
“Bottom-Up”
– Add semantic metadata to pages and databases all over the Web
• Alternatively train models to extract above info (machine-assisted)
– Every Website becomes semantic
• except for pages that are not tagged, not covered by a trained model, or handled with errors
“Top-Down”
– Experts build models and rules for semantics
– Create services that provide this as an overlay to the non-semantic Web
– Every website becomes semantic
• except for those not covered
-- Alex Iskold
8. Five Approaches to Semantics
Tagging
Statistics
Linguistics
Semantic Web
Artificial Intelligence
9. The Tagging Approach
Pros
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
Cons
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
Technorati
Del.icio.us
Flickr
Wikipedia
YouTube
10. The Statistical Approach
Pros:
– Pure mathematical algorithms
– Massively scalable with good training data
– Language independent
Cons:
– No understanding of the content
– Hard to craft good queries
– Best for finding really popular things; not good at finding needles in haystacks
– Limited by data (esp. quality training data)
– Not great for sparse structured data with strong inherent semantics
Google
Lucene
Autonomy
Farecast (Bing Travel)
11. The Linguistic Approach
Pros:
– Almost-true language understanding
– Extract knowledge from text
– Best for searching for particular facts or relationships
– More precise queries
Cons:
– Computationally intensive
– Difficult to scale
– Lots of special-case and other errors
– Language-dependent
Powerset
Hakia
Inxight, Attensity, and others…
12. The Semantic Web Approach
Pros:
– More precise queries
– Smarter apps with less work
– Not as computationally intensive
– Share & link data between apps
– Works for both unstructured and structured data
Cons:
– Lack of tools
– Difficult to scale
– Who makes all the metadata?
Radar Networks
DBpedia Project
Metaweb (Freebase)
13. The Artificial Intelligence Approach
Pros:
– Smart in narrow domains
– Answer questions intelligently
– Reasoning and learning
Cons:
– Computationally intensive
– Difficult to scale
– Extremely hard to program
– Does not work well outside of narrow domains
– Training takes a lot of work
Cycorp
AURA (Project Halo)
14. The Approaches Compared
Make the software smarter
Make the Data Smarter
Statistics
Linguistics
Semantic Web
A.I.
Tagging
17. The Semantic Web is a Key Enabler
Moves the “intelligence” out of applications, into the data
Data needs special structures
Data becomes self-describing; the meaning of the data becomes part of the data
Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it
Data can be shared and linked more easily
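A minimal sketch of what "self-describing" data could look like, assuming facts are stored as subject-predicate-object triples in plain Python tuples. The identifiers (`ex:Berlin`, `ex:capital_of`) and the helpers `describe` and `objects` are hypothetical and only illustrate the idea; no particular RDF library is implied.

```python
# Hypothetical triples: every fact is (subject, predicate, object), and the
# vocabulary itself (domain/range of a property) is stored as data too.
TRIPLES = [
    ("ex:Berlin", "rdf:type", "ex:City"),
    ("ex:Berlin", "ex:capital_of", "ex:Germany"),
    ("ex:Germany", "rdf:type", "ex:Country"),
    ("ex:capital_of", "rdfs:domain", "ex:City"),
    ("ex:capital_of", "rdfs:range", "ex:Country"),
]


def describe(subject, triples):
    """Collect everything the data says about one subject."""
    description = {}
    for s, p, o in triples:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


def objects(subject, predicate, triples):
    """Return all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    # Any app can inspect a resource without hard-coded schema knowledge.
    print(describe("ex:Berlin", TRIPLES))
    # The meaning of a property is looked up in the same data set.
    print(objects("ex:capital_of", "rdfs:range", TRIPLES))
```

Because the property definitions live alongside the facts, two applications sharing these triples need no separate agreement on what `ex:capital_of` connects.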
18. The Semantic Web = Open Database Layer for the Web
User Profiles
Web Content
Data Records
Apps & Services
Ads & Listings
Open Data Mappings
Open Data Records
Open Rules
Open Ontologies
Open Query Interfaces
19. And The Web IS the Database!
Application A Application B
27. What is a Wiki? A Key Feature of Wikis is
This distinguishes wikis from other publication tools
28. Consensus in Wikis Comes from
Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View Convention
– Users follow customs and conventions to engage with articles effectively
29. Software Support Makes Wikis Successful
Trivial to edit by anyone
Tracking of all changes, one-step rollback
Every article has a “Talk” page for discussion
Notification facility allows anyone to “watch” an article
Sufficient security on pages, logins can be required
A hierarchy of administrators, gardeners, and editors
Software Bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work, and flag them for editors
30. Success of Wikis
Actual number of articles on en.wikipedia.org (thick
blue line) compared with a Gompertz model that leads
eventually to a maximum of about 4.4 million articles
(thin green line)
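The caption above refers to a Gompertz model. As a rough sketch, the standard Gompertz curve is N(t) = K * exp(-b * exp(-c * t)), where K is the upper limit. Only K (about 4.4 million articles) comes from the slide; the values of `B` and `C` below are hypothetical placeholders, not the parameters fitted to Wikipedia's data.

```python
import math

K = 4.4e6  # asymptotic maximum from the slide caption (about 4.4 million articles)
B = 5.0    # hypothetical displacement parameter (assumption, not from the source)
C = 0.3    # hypothetical growth rate per year (assumption, not from the source)


def gompertz(t, k=K, b=B, c=C):
    """Gompertz curve: k * exp(-b * exp(-c * t)); approaches k as t grows."""
    return k * math.exp(-b * math.exp(-c * t))


if __name__ == "__main__":
    # Growth is slow at first, fastest in the middle, and flattens near K.
    for year in (0, 5, 10, 20, 40):
        print(year, round(gompertz(year)))
```

The curve never exceeds K, which is why such a model predicts a ceiling on the article count rather than unbounded growth.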
31. Summary: What Wiki Is Really About
Quick and Easy – No download
Layered Community Authoring
Interlinked Hierarchical Content
Revision Control
Notification
32. What is a Semantic Wiki
A wiki that has an underlying model of the
knowledge described in its pages.
To allow users to make their knowledge explicit and formal
Semantic Web Compatible
Semantic Wiki
34. Basics of Semantic Wikis
Still a wiki, with regular wiki features
– E.g. Category/Tags, Namespaces, Title, Versioning, ...
Typed Content
– E.g. Page/Card, Date, Number, URL/Email, String, …
Typed Links
– E.g. “capital_of”, “contains”, “born_in”…
Querying Interface Support
– E.g. “[[Category:Person]] [[Age::<30]]”
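The typed links and the query example above can be sketched with a toy in-memory wiki. This assumes the `[[property::value]]` and `[[Category:Name]]` annotation syntax shown on the slide; the pages, the helper names `parse_page`, `matches`, and `ask`, and the strict reading of `<` are all assumptions for illustration, not the behavior of any specific semantic wiki engine.

```python
import re

# Made-up pages: title -> wiki text with inline annotations.
PAGES = {
    "Alice": "[[Category:Person]] Alice is [[Age::27]] and was [[born_in::Berlin]].",
    "Bob": "[[Category:Person]] Bob is [[Age::41]].",
    "Berlin": "[[Category:City]] Berlin is the [[capital_of::Germany]].",
}

ANNOTATION = re.compile(r"\[\[([^\[\]:]+)::([^\[\]]+)\]\]")
CATEGORY = re.compile(r"\[\[Category:([^\[\]]+)\]\]")


def parse_page(text):
    """Extract (categories, {property: [values]}) from wiki text or a query."""
    categories = set(CATEGORY.findall(text))
    properties = {}
    for name, value in ANNOTATION.findall(text):
        properties.setdefault(name.strip(), []).append(value.strip())
    return categories, properties


def matches(condition, actual_values):
    """Support '<N' and '>N' numeric comparisons; otherwise exact match."""
    if condition[:1] in ("<", ">"):
        bound = float(condition[1:])
        for value in actual_values:
            try:
                number = float(value)
            except ValueError:
                continue  # non-numeric values cannot satisfy a comparison
            if (condition[0] == "<" and number < bound) or (
                condition[0] == ">" and number > bound
            ):
                return True
        return False
    return condition in actual_values


def ask(query, pages=PAGES):
    """Return sorted titles of pages satisfying every condition in the query."""
    wanted_categories, wanted_props = parse_page(query)
    results = []
    for title, text in pages.items():
        categories, props = parse_page(text)
        if not wanted_categories <= categories:
            continue
        if all(
            matches(condition, props.get(name, []))
            for name, conditions in wanted_props.items()
            for condition in conditions
        ):
            results.append(title)
    return sorted(results)


if __name__ == "__main__":
    print(ask("[[Category:Person]] [[Age::<30]]"))
    print(ask("[[capital_of::Germany]]"))
```

The query reuses the same annotation syntax as the page text, which is what lets authors write queries without learning a separate language.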
35. Advanced Semantic Wiki Features
Semantic forms or templates
Auto-completion based on semantics
Powerful visualizations based on semantics/structures/types
Rules and reasoning support
Advanced search and queries (faceted search, SPARQL, etc.)
Semantic notifications (personalized information filtering)
Import and Export of Semantic Data
Data Integration: identification, disambiguation, merging, trust, security/privacy, …
37. What is the Promise of Semantic Wikis?
Semantic Wikis facilitate Consensus over Data (Knowledge)
Combine low-expressivity data authorship with the best features of traditional wikis
User-governed, user-maintained, user-defined
Easy to use as an extension of text authoring
38. One Key Helpful Feature of Semantic Wikis
Semantic Wikis are “Schema-Last”
Databases require DBAs and schema design;
Semantic Wikis develop and maintain the schema in the wiki
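A small sketch of the "schema-last" idea, assuming pages carry free-form property annotations and the schema is derived from whatever users have entered so far. The page data and the function `infer_schema` are hypothetical; a real semantic wiki would keep this information in its own property and category pages.

```python
from collections import defaultdict

# Made-up annotated pages: title -> (categories, {property: value}).
PAGES = {
    "Alice": ({"Person"}, {"age": 27, "born_in": "Berlin"}),
    "Bob": ({"Person"}, {"age": 41}),
    "Berlin": ({"City"}, {"capital_of": "Germany"}),
}


def infer_schema(pages):
    """Derive category -> {property: [observed value types]} from the data."""
    schema = defaultdict(lambda: defaultdict(set))
    for categories, properties in pages.values():
        for category in categories:
            for name, value in properties.items():
                schema[category][name].add(type(value).__name__)
    return {
        category: {name: sorted(types) for name, types in props.items()}
        for category, props in schema.items()
    }


if __name__ == "__main__":
    print(infer_schema(PAGES))
    # A user adds a page with a new property; no up-front schema change is needed.
    PAGES["Carol"] = ({"Person"}, {"age": 35, "employer": "Example Corp"})
    print(infer_schema(PAGES)["Person"])
```

The schema here is an output of authoring rather than a prerequisite for it, which is the contrast with DBA-designed databases drawn on the slide.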
39. Great Candidate for Knowledge Acquisition
Combining both unstructured and semi-structured data
High connectivity on both information and social dimensions
Collaboration with sophisticated software support
Expected low-cost for crowd-sourcing
Evolving category and template systems
But…
40. BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition
Knowledge is represented MOSTLY in unstructured and semi-structured ways
• Plain text
• Templates
• Infoboxes
• Tables
• Section headers
• Links
• References
• Redirects
• …
41. Software/Feature Enhancements Are Needed
Quick and easy way to view and edit schema
Machine assistance (NLP, Auto-suggest…)
Better visualizations with structured data
More user layers for better KB construction
Better targeted (semantic) notifications
42. K.A. is a Well-Known Artificial Intelligence Problem
– AI authoring is too expensive, too slow, not scalable
Three Possible Solutions
– Automatic Machine Parsing (e.g. NELL, ReVerb)
• Quality (depth) not good enough for textbook sentences
• Error rates are too high
• Still need humans in the loop for training data
– Crowd Sourced Authoring (e.g. AMT)
• Biology and Knowledge Engineering expertise is difficult to get
• Mechanical Turk uses individuals, but the Knowledge Entry tasks appear to require coordination, judgment, discussion, and working together
– Social Authoring and Crowdsourcing with Intelligent Software Assistance
• Wikipedia showed this could work for text
• Semantic Wiki software R&D to make it work for more structured knowledge
Best Bet for Knowledge Acquisition?
43. With All These Features…
Effective knowledge acquisition via Semantic Wikis
Combine the strengths of humans and machines
Connecting humans and machines
High quality at low cost
46. THANK YOU!
Credits: some slides are originally from the following people, with little or no modification:
Nova Spivack
Denny Vrandecic
Mark Greaves
Bao Jie