A conference report on SemTechBiz 2013 in San Francisco, from a data-mining and knowledge-management point of view. It covers several companies that either run automatic algorithms to extract data from cleverly discovered crowd-curated data sources, or use UI tools that offer an existing utility to entice users into helping mark up the data...
2. Summary
People found great use for external data to help
extract knowledge and build models
These valuable data are generated by crowds but
harvested by mining algorithms and/or UI tools
LOD to enrich attributes and synonyms (WalmartLabs)
NLP on recipes to build deep models (Whisk.com)
Webmaster tools to mark up content (Google)
3. SemTechBiz SF 13
SemTechBiz 2013 in San Francisco is still the largest conference in the
world on Semantic Web technologies
With many newcomers from various industries
An indicator of the technologies entering prime time
Has up to 7 parallel talks – broad coverage and interests
Now a 2nd tier conference in my humble opinion
Diluted across 3 events/locations per year: US West + US East + EU
Attendees: 1200 in 2011, 800 in 2012, 600 in 2013
Now missing elite researchers and/or top executives
More practical, real-world, business, startups, less academic
4. Context and Scope
This is a themed report on building knowledge bases
and/or semantic models
The theme title was decided post-conference because of the
obvious similarity among all relevant presentations
6. @WalmartLabs
• Color search and presentation: WordNet!
“Red Shirt”
• Intent? Linked Data can help, on related products too.
“Green Lantern”
• DVD or Halloween costume? Time/news is thy friend.
“Dark Knight”
7. External Data by @WalmartLabs
Vast amounts of external data sets: WordNet, DBpedia, LOD
cloud, Twitter stream, third-party prices (crawled), product
descriptions, user click streams (web logs)…
10. TipSense Technologies
A platform for pulling statistically significant
knowledge from unstructured semantic data sets
Transforming vast amounts of unstructured and
semi-structured content into a fully annotated
conceptual model.
Conceptual entity recognition
Contextualized content fingerprinting
Concepts/topic model, sentiment analysis
12. Whisk.com
Keynote: Understanding Recipes
UK startup Whisk.com (@nickholzherr) on collecting
recipe ingredients, enriching them with
semantics, recommending dishes, and helping users order
from stores.
Wrapper induction, NLP for data collection
Coping with missing info, noise, and vague data
Modeling flavor profiles and portion changes
Challenges and opportunities
Leftovers, geo-data, local shopping, coupons…
15. Understanding Intents
Entity, Relationship Mining
Built a database of millions of concepts
Shallow ontology modeling via entity and attribute
extraction/mining
Rich semantics (units, colors, patterns, cities…)
Concept propagation (tagging by training on user
weblogs)
19. Structured Data Markup
Not something entirely new: Rich Snippet
We experimented with it 2 years ago (extension of
Semantic Job Search proposal)
Supporting more types now
An ecosystem no one can afford to lose
Google leveraged the SEO utility to gain more
structured data (free labor)
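To make the markup idea concrete, here is a minimal sketch of the kind of structured data a webmaster might embed in a product page. It assumes the schema.org vocabulary serialized as JSON-LD, which is only one of several syntaxes (microdata and RDFa are others); the slides do not say which syntax was used. The helper name `product_jsonld` and all values are hypothetical.

```python
import json


def product_jsonld(name, price, currency, rating, review_count):
    """Build a schema.org Product description as a plain dict (illustrative helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",      # price as a decimal string
            "priceCurrency": currency,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }


# Hypothetical product values, for illustration only.
markup = product_jsonld("Red Shirt", 19.99, "USD", 4.5, 12)

# The block a webmaster would paste into the page's HTML.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(snippet)
```

The webmaster gains richer search listings; the search engine gains typed fields (name, price, rating) without having to extract them.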
20. Others
Gannett (News)
Use a combination of auto-tagging and rules to match news
articles with an evolving taxonomy (low-tech, but it works)
ISS (Intelligent Software Solutions)
Complex Event Processing (in “expressive” language)
Fuzzy pattern matching with Bayesian networks
Semantic Search and Automatic question answering
Google now answers (factoid questions)
E.g. “What did Steve Jobs die of?”, “What is the height of Mt.
Everest?”, “Who is the CEO of Apple?”
22. Query Interpretation
@SemTechBiz
“Red Shirt”
Shirt (Red)
Red ~= Crimson, scarlet, ruby, cherry, rose, …
T-shirt a Shirt?
@ProjectHalo
“Dead Duck”
Bird (dead)
Dead ~= not alive, gone, expired, killed, …
Beijing Duck a Duck?
Build structured queries from natural language
Disambiguation, query expansion
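As a rough illustration of the "Red Shirt" to Shirt (Red) step, the sketch below splits a short query into a head entity and a modifier, then expands both. The lexicons are tiny hand-written stand-ins (a real system would draw on resources such as WordNet or a product taxonomy), and `interpret`, `StructuredQuery`, and the dictionaries are hypothetical names, not anything presented at the conference.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, hand-written lexicons for illustration only.
SYNONYMS = {
    "red": ["crimson", "scarlet", "ruby", "cherry", "rose"],
    "dead": ["not alive", "gone", "expired", "killed"],
}
IS_A = {"t-shirt": "shirt", "beijing duck": "duck"}  # narrower term -> broader term
HEAD_NOUNS = {"shirt", "duck"}


@dataclass
class StructuredQuery:
    entity: str                                   # head noun, e.g. "shirt"
    attribute: Optional[str] = None               # modifier, e.g. "red"
    attribute_terms: List[str] = field(default_factory=list)
    entity_terms: List[str] = field(default_factory=list)


def interpret(query: str) -> StructuredQuery:
    """Turn a two-word style query into an entity/attribute pair with expansions."""
    tokens = query.lower().split()
    heads = [t for t in tokens if t in HEAD_NOUNS]
    if not heads:
        raise ValueError(f"no known entity in query: {query!r}")
    entity = heads[-1]
    modifiers = [t for t in tokens if t != entity]
    attribute = modifiers[0] if modifiers else None

    # Query expansion: the attribute plus its near-synonyms.
    attribute_terms = []
    if attribute is not None:
        attribute_terms = [attribute] + SYNONYMS.get(attribute, [])

    # Entity expansion: include narrower terms that are a kind of the entity.
    narrower = sorted(k for k, v in IS_A.items() if v == entity)
    entity_terms = [entity] + narrower
    return StructuredQuery(entity, attribute, attribute_terms, entity_terms)


q = interpret("Red Shirt")
print(q.entity, q.attribute, q.attribute_terms, q.entity_terms)
```

Whether "T-shirt" should count as a "Shirt" is a modeling decision; here it is encoded as one entry in `IS_A`, so changing the answer means changing the data, not the code.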
23. Intent & Process
@ SemTechBiz
“Eco-friendly gift for dad”
Need products as gifts
Related to “dad”, “father”
Expand “eco-friendly” to
closely related concepts
Weigh purchases/views
during special event
(Christmas, Father’s Day)*
@ Project Halo
“How do we feel the sense
of heat?”
Need sentences on feeling
Related to “heat/hot”
Expand “heat”, “sense” to
related concepts
Weigh on signal
transmission in neuron*
The Process of getting
something done
* Learned from past user activities
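One simple way to read "weigh purchases/views during special event" is to count what users bought in a window before past occurrences of the event and normalize the counts. The sketch below does exactly that and nothing more; the function name `event_weights`, the 14-day window, and the purchase log are all assumptions made for illustration, not the method used by any presenter.

```python
from collections import Counter
from datetime import date


def event_weights(purchases, event_day, window_days=14):
    """Share of purchases per product in the window leading up to an event.

    purchases: iterable of (product, purchase_date) pairs from past activity.
    Returns a dict mapping product -> fraction of in-window purchases.
    """
    in_window = Counter(
        product
        for product, day in purchases
        if 0 <= (event_day - day).days <= window_days
    )
    total = sum(in_window.values())
    if total == 0:
        return {}
    return {product: count / total for product, count in in_window.items()}


# Hypothetical purchase log.
log = [
    ("tie", date(2012, 6, 10)),
    ("tie", date(2012, 6, 15)),
    ("grill", date(2012, 6, 16)),
    ("tie", date(2012, 1, 5)),      # too early, outside the window
    ("scarf", date(2012, 12, 20)),  # after the event
]
fathers_day = date(2012, 6, 17)
weights = event_weights(log, fathers_day)
print(weights)
```

These weights could then boost candidates for a query such as “Eco-friendly gift for dad” when it arrives near the same event.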
24. Abstract Concept
Concrete Instances
@ SemTechBiz
“Eco-friendly” (gift)
Mine related product
review sites and blogs
~=
Organic, Recycled, Solar, Reclaimed, …
@ Project Halo
“Feeling” (heat)
Mine related biological
sites, books, tutorials
~=
Sense, Experience, Feel, Temperature Sensation, …
Build abstract concept, entity, instance
networks/graphs
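A concept/instance network of this kind can be sketched with two mappings: abstract concept to related terms, and term to concrete instances. The class below is a minimal, in-memory illustration under that assumption; `ConceptGraph` and the product names are hypothetical, and a production system would mine the relations from review sites and blogs instead of entering them by hand.

```python
from collections import defaultdict


class ConceptGraph:
    """Two-level graph: abstract concept -> related terms -> concrete instances."""

    def __init__(self):
        self.related = defaultdict(set)    # concept -> related terms
        self.instances = defaultdict(set)  # term -> concrete instances

    def relate(self, concept, *terms):
        """Record that a concept is approximately expressed by the given terms."""
        self.related[concept.lower()].update(t.lower() for t in terms)

    def tag(self, instance, *terms):
        """Attach a concrete instance (e.g. a product) to one or more terms."""
        for term in terms:
            self.instances[term.lower()].add(instance)

    def resolve(self, concept):
        """Return all instances reachable from the concept, sorted by name."""
        concept = concept.lower()
        terms = {concept} | self.related.get(concept, set())
        found = set()
        for term in terms:
            found |= self.instances.get(term, set())
        return sorted(found)


graph = ConceptGraph()
graph.relate("eco-friendly", "Organic", "Recycled", "Solar", "Reclaimed")
# Hypothetical products and tags.
graph.tag("solar phone charger", "solar")
graph.tag("recycled paper notebook", "recycled")
graph.tag("leather wallet", "leather")

print(graph.resolve("Eco-friendly"))
```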
25. Ranking Support
@ SemTechBiz2013
Products related to “Gift”
Recipes for “Sweet
Seafood”
Apps that are “Free, Pretty
and Fun”
@ Project Halo
Concepts related to “Feel”
Sentences on “Red
Producer”
Creatures that can be “both
a prey and a predator”
Scoring algorithm to return the
most relevant results
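The slides do not describe any particular scoring algorithm, so the following is only a baseline sketch: each item is scored by the summed weight of the query terms it matches, and the top results are returned. The function names, weights, and app catalog are hypothetical.

```python
def score(item_tags, query_weights):
    """Sum the weights of the query terms that appear among the item's tags."""
    return sum(w for term, w in query_weights.items() if term in item_tags)


def rank(items, query_weights, top_k=3):
    """Return up to top_k item names, best score first, ties broken by name."""
    scored = [(score(tags, query_weights), name) for name, tags in items.items()]
    scored = [(s, name) for s, name in scored if s > 0]  # drop non-matching items
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical app catalog for the query "Free, Pretty and Fun".
apps = {
    "app_a": {"free", "fun"},
    "app_b": {"free", "pretty", "fun"},
    "app_c": {"paid", "pretty"},
    "app_d": {"paid"},
}
query = {"free": 1.0, "pretty": 1.0, "fun": 1.0}

print(rank(apps, query))
```

Unequal weights (for example, learned from click data) slot into `query` without changing the ranking code.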
26. Modeling
@ SemTechBiz2013
“Flavor” model (Whisk)
“Special Occasion” learning
(BloomSearch)
“Cooking” process
(ingredients, portion, leftover, purchase…)
@ Project Halo
“Function” model in AURA
“Neural signal
transmission”
“Mitosis” event
(steps, components, temporal process, result…)
From Facts and Relations to
Causal and Deep Models
27. Crowd-sourcing
@ SemTechBiz2013
Use webmasters to
generate structured
markups
(Author, Category, Title, Price, Rating, …)
@ Project Halo
Use students to generate
metadata for
sentences, questions and
answers
(Relevance, UT, Type, Chapter, Exact/Various, …)
Crowd-sourcing works if the quantity is limited
and the work can be done cheaply
Google provides another utility (SEO incentives) to lure webmasters
Project Halo needs to figure out its game plan
28. Summary of Use of
(Big, Wild) Data
@SemTech
Parse vague user query into best
structured queries for databases
Understand user’s underlying
intent
Link abstract concepts to concrete
entities
Rank apps, products …
Deep, contextual models
(flavor, time and location…)
Use crowds directly for free
@ProjectHalo
Translate Find-A-Value and other
simple questions into complex IR
queries
Understand sentence’s purpose
Relate category/class to
instances
Rank answers, evidence…
Deep contextual models
(location, process, events…)
Need to leverage crowds cheaply