The document discusses content analytics and smart content. It notes that content is exploding and becoming more social, mobile, and streaming. Content analytics uses technologies such as natural language processing, text mining, and machine learning to extract information and insights from large amounts of unstructured content. The goal is to move from simply finding documents to extracting knowledge and relationships, which helps drive virtuous cycles in findability. However, the technologies are still imperfect, so expectations need to be realistic.
4. Three Views of Content Analytics
Business Strategist: It’s about money, business models, advertising, and money.
End User: It’s about finding things, having fun, and getting stuff done.
Research Scientist: It’s about fast algorithms, massive scales, and machine learning.
26. Social Search Needs
• Relevance
– Filtering the document web
• Social Media Content
– Filtering the social web
• Trends / Group Insight
– Tapping Community Knowledge
• Answers
– Trusted Advisor Recommendation
• “Java” (coffee, island, or language?)
• “compliance”
• “What should I do in New York?”
• Where are my friends now?
• Why did power go out in Palo Alto?
• How does adoption work?
• (on FB update) anybody give their babies baby Benedryl for travel/jet lag? Want to hear from parents whether they have or not and how it went
27. Recommendations: “Personalized” to “Social”
(Axes: Complexity, Value)
1. Content or “Related item” Recommendations (items to item)
– Enables users to ‘browse sideways’ from any item
2. Personalized Recommendations (items to user)
– Enables 1:1 relevance based on user profile
3. Social Recommendations (users to users)
– Enables connections between like users
– Drives service stickiness
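The first two recommendation types on this slide can be illustrated with a minimal co-occurrence sketch. The user names, item names, and function names below are hypothetical and are not from the slides; counting shared history is only one of many ways to build such recommendations.

```python
from collections import defaultdict
from itertools import combinations

def related_items(histories):
    """Count how often two items appear in the same user's history."""
    co_counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts

def recommend_for_item(co_counts, item, k=3):
    """Items to item: let a user 'browse sideways' from any item."""
    neighbours = co_counts.get(item, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]

def recommend_for_user(co_counts, histories, user, k=3):
    """Items to user: score unseen items against the user's own history."""
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in co_counts.get(item, {}).items():
            if other not in seen:
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]

# Hypothetical viewing histories, invented for illustration.
histories = {
    "ann": ["search-intro", "nlp-basics", "taxonomy-guide"],
    "bob": ["search-intro", "nlp-basics"],
    "cat": ["nlp-basics", "sentiment-101"],
}
co_counts = related_items(histories)
print(recommend_for_item(co_counts, "search-intro"))
print(recommend_for_user(co_counts, histories, "bob"))
```

Social (users to users) recommendations would need a similarity measure between the histories themselves, which this sketch leaves out.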
31. The Long-Tail of Online Business
(Chart: query traffic split 70% / 30%; +70% Y/Y)
32. Virtuous Cycles in Findability
Tuned experience
Social behavior affects relevance
Socially driven feedback loop
People and expertise location are the key ‘lens’
Structure drives exploration
Aligned with taxonomy and tags
(Cycle: Refinement, Social, Relevance)
37. Grab-Bag of Related Technologies
• Problem – linguistic variations in concept expression
– Technology: natural language processing (NLP)
• Problem – huge numbers of documents that are the same or versions of the same
– Technologies: text mining, text analytics, normalizing & de-duping
• Problem – amount of content exceeds amount of human expertise to analyze & categorize
– Technologies: entity extraction, contextual analysis, auto-categorization
• Problem – understanding trends and relative values expressed in content
– Technology: sentiment analysis
• Problem – retrieving & federating contextually related and relevant content
– Technologies: all of the above
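As one illustration of the "normalizing & de-duping" item above, here is a minimal sketch that normalizes text, breaks it into word shingles, and drops documents whose Jaccard similarity to an already kept document crosses a threshold. The shingle size, the threshold, and the sample documents are assumptions chosen for the example, not values from the slides.

```python
import re

def normalize(text):
    """Lowercase and collapse punctuation so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def shingles(text, size=3):
    """Return the set of overlapping word n-grams for a document."""
    words = normalize(text).split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Set overlap in [0, 1]; 1.0 means identical shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dedupe(docs, threshold=0.6):
    """Keep a document only if it is not too similar to one already kept."""
    kept = []
    for doc in docs:
        sig = shingles(doc)
        if all(jaccard(sig, shingles(k)) < threshold for k in kept):
            kept.append(doc)
    return kept

# Hypothetical documents: the second differs from the first only in case
# and punctuation.
docs = [
    "Content analytics extracts insight from unstructured content.",
    "Content Analytics extracts insight from unstructured content!",
    "Sentiment analysis tracks trends expressed in content.",
]
print(dedupe(docs))
```

The pairwise comparison is quadratic in the number of kept documents; at the scales the slides describe, a hashing scheme such as MinHash would normally replace it.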
46. Solving the Knife problem
Objective: Automatically insert an advertisement that matches the content best.

Article:
Man Allegedly Attacked Wife With Knife
A Tyler man is awaiting arraignment this afternoon after allegedly attacking his wife with a knife, said Tyler police. The 41-year-old man will face aggravated assault and aggravated robbery charges, said Don Martin, the department's spokesman.
Officers took the man in custody near Garden Valley and Loop 323. He ran from his residence after "assaulting his wife with a knife and taking her purse at knifepoint," said information released by Martin. The woman refused medical treatment and did not appear to be seriously injured, the statement said.

Advertisement: Excellent Knives!!!

Mere frequency counting of keywords can lead to undesired results...
...understanding relationships between words can reveal the true topic of the document.
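The contrast on this slide can be sketched in a few lines. The ad inventory, the list of crime-related context words, and the window size are all hypothetical. Checking the words near each keyword is only a crude stand-in for the relationship analysis the slide refers to.

```python
import re
from collections import Counter

# Hypothetical ad inventory: ad text mapped to its trigger keywords.
ADS = {
    "Excellent Knives!!!": {"knife", "knives", "knifepoint"},
    "Home security systems": {"alarm", "burglary", "lock"},
}
# Hypothetical context words that signal a crime report, not a product page.
CRIME_CONTEXT = {"attacked", "attacking", "assault", "assaulting",
                 "police", "robbery", "charges"}

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def ad_by_frequency(text):
    """Pick the ad whose keywords occur most often (mere frequency counting)."""
    counts = Counter(tokens(text))
    scores = {ad: sum(counts[w] for w in kws) for ad, kws in ADS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def ad_with_context(text, window=6):
    """Suppress the ad when a trigger keyword appears near crime vocabulary."""
    words = tokens(text)
    ad = ad_by_frequency(text)
    if ad is None:
        return None
    for i, word in enumerate(words):
        if word in ADS[ad]:
            nearby = words[max(0, i - window): i + window + 1]
            if CRIME_CONTEXT.intersection(nearby):
                return None
    return ad

article = ("Man Allegedly Attacked Wife With Knife. A Tyler man is awaiting "
           "arraignment this afternoon after allegedly attacking his wife "
           "with a knife, said Tyler police.")
review = "This chef knife keeps its edge; the best kitchen knives I own."

print(ad_by_frequency(article))   # frequency alone still matches the knife ad
print(ad_with_context(article))   # the context check withholds it
print(ad_with_context(review))
```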
47.
           Actor  Director  Movie  TVShow  Adventure  Comedy  FaceImage
Actor      0      0.6       1      1       1          1       0.9
Director          0         1      1       1          1       0.3
Movie                       0      0.6     1          1       -1
TVShow                             0       1          1       -1
Adventure                                  0          0.14    -1
Comedy                                                0       -1
FaceImage                                                     0
48. Cyc Knowledge Base
(Ontology diagram; labels in reading order)
Thing; Intangible Thing; Individual; Temporal Thing; Spatial Thing; Partially Tangible Thing
Paths; Sets; Relations; Logic; Math
Human Artifacts; Social Relations, Culture; Human Anatomy & Physiology; Emotion; Perception; Belief; Human Behavior & Actions
Products; Devices; Conceptual Works; Vehicles; Buildings; Weapons; Mechanical & Electrical Devices; Software; Literature; Works of Art; Language
Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations; Governments; Geo-Politics; Business, Military Organizations
Law; Business & Commerce; Politics; Warfare; Professions; Occupations; Purchasing; Shopping; Travel; Communication; Transportation & Logistics
Social Activities; Everyday Living; Sports; Recreation; Entertainment
Artifacts; Movement; State Change; Dynamics; Materials; Parts; Statics; Physical Agents; Borders; Geometry
Events; Scripts; Spatial Paths; Actors; Actions; Plans; Goals; Time; Agents; Space; Physical Objects
Human Beings; Organization; Human Activities; Living Things; Social Behavior; Life Forms; Animals; Plants; Ecology
Natural Geography; Earth & Solar System; Political Geography; Weather
Diagram layers: General Knowledge about Various Domains; Specific data, facts, and observations
Cyc contains:
• 17,000 Predicates
• 400,000 Concepts
• 5,000,000 Assertions
Represented in:
• First Order Logic
• Higher Order Logic
• Modal Logic
• Context Logic
• Micro-theories
49. Machine Learning Techniques
(Flowchart: Create, Examples, Trainer, Model, Good enough? yes / no, Deploy)
“Let the occurrence of the term ‘is host to’ between a location and a substance increase the probability that this is a location x substance relation by 10%, because we have seen it more often in positive than in negative examples.”
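The train, check, and deploy loop on this slide can be sketched as follows. Note the swap: instead of the fixed 10% probability bump quoted on the slide, this sketch learns a smoothed log-odds weight per pattern from how often it occurs in positive versus negative examples. The example data, the 0.9 acceptance threshold, and all function names are assumptions made for illustration.

```python
import math

def train(examples):
    """Learn one log-odds weight per pattern, with add-one smoothing."""
    pos = [feats for feats, label in examples if label]
    neg = [feats for feats, label in examples if not label]
    patterns = set().union(*(feats for feats, _ in examples))
    weights = {}
    for pattern in patterns:
        p = (sum(pattern in f for f in pos) + 1) / (len(pos) + 2)
        n = (sum(pattern in f for f in neg) + 1) / (len(neg) + 2)
        weights[pattern] = math.log(p / n)  # > 0 if seen more in positives
    return weights

def predict(weights, feats):
    """Call it a location x substance relation if the summed weight is positive."""
    return sum(weights.get(f, 0.0) for f in feats) > 0

def accuracy(weights, examples):
    return sum(predict(weights, f) == label for f, label in examples) / len(examples)

# Hypothetical labelled examples: patterns found between a location and a
# substance, and whether the pair really is such a relation.
examples = [
    ({"is host to"}, True),
    ({"is host to", "contains"}, True),
    ({"contains"}, True),
    ({"is near"}, False),
    ({"is host to", "is near"}, False),
    ({"is near"}, False),
]
held_out = [({"is host to"}, True), ({"is near"}, False)]

model = train(examples)
# "Good enough?" step: deploy only if held-out accuracy clears the bar.
decision = "deploy" if accuracy(model, held_out) >= 0.9 else "collect more examples"
print(model, decision)
```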
50. Example: The Semantic Associative Search Method
(MMM: The Mathematical Model of Meaning)
A, B, C: image data vectors (retrieval candidate image data)
A: a sunny image
B: a silent image
C: a shady image
Semantic space: 2,000-dimensional space (presently)
In the semantic space: || A || = || B || = || C ||
Semantic projection onto a semantic subspace, with impression words as a context:
light, bright: || A || > || C || > || B ||
dark, black: || C || > || B || > || A ||
USP: 6,138,116; Yasushi Kiyoki, 2009
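The effect shown on this slide, equal norms in the full space but different norms after projection, can be reproduced with a toy example. This is a simplified sketch, not the patented method: it uses three hypothetical axes in place of the 2,000 dimensions, the vector values are invented, and the projection simply keeps the axes named by the context.

```python
import math

# Hypothetical axes standing in for the 2,000-dimensional semantic space.
AXES = ["light", "dark", "quiet"]

def unit_vector(light, dark):
    """Fill in the 'quiet' component so the vector has norm 1 in the full space."""
    quiet = math.sqrt(1.0 - light ** 2 - dark ** 2)
    return {"light": light, "dark": dark, "quiet": quiet}

# Invented image vectors: A sunny, B silent, C shady.
IMAGES = {
    "A": unit_vector(0.9, 0.1),
    "B": unit_vector(0.2, 0.4),
    "C": unit_vector(0.4, 0.8),
}

def norm(vec, axes=AXES):
    """Norm of the vector projected onto the subspace spanned by `axes`."""
    return math.sqrt(sum(vec[a] ** 2 for a in axes))

def rank(context_axes):
    """Order images by their norm in the subspace selected by the context."""
    return sorted(IMAGES, key=lambda name: norm(IMAGES[name], context_axes),
                  reverse=True)

print([round(norm(v), 6) for v in IMAGES.values()])  # equal in the full space
print(rank(["light"]))  # context: light, bright
print(rank(["dark"]))   # context: dark, black
```

With these invented values the two rankings match the orderings on the slide, which is the point of the illustration: the context changes which image is the best match without changing the stored vectors.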
52. Audience-specific search experiences
Context types: User context, Information context, Application context, Social context
Renee Lo, Engineering, Contoso Consulting: “What should I know about implementing ERP?”
Alan Brewer, Sales Manager, Contoso Consulting: “What should I know about selling ERP consulting?”
Context signals:
• Username & Group Memberships
• Location
• Languages
• Business Unit
• Department
• Team
• Time of Day
• Preferred Sites
• SharePoint Audiences
• Interests & Current Projects
• Context of Current Task
60. From Documents to Knowledge
(Axis: Value)
• Document Search
– Finds documents containing terms
• Relationship Extraction
– Finds relationships within documents
• Assertion Clustering
– Finds assertions and the evidence for them
• Profiling
– Summarizes different kinds of information
• Join
– Creates indirect correlations and connections