Utilizing Crowd-sourced
Data for Knowledge
Extraction
A Themed Report of SemTechBiz San Francisco 2013.06
Summary
People made great use of external data to help
extract knowledge and build models
These valuable data are generated by crowds but
harvested by mining algorithms and/or UI tools
LOD to enrich attributes and synonyms (WalmartLabs)
NLP on recipes to build deep models (Whisk.com)
Webmaster tools to mark up content (Google)
SemTechBiz SF 13
SemTechBiz 2013 in San Francisco is still the world's
largest conference on Semantic Web-related technologies
With many newcomers from various industries
An indicator of the technologies entering prime time
Has up to 7 parallel talks – broad coverage and interests
Now a 2nd tier conference in my humble opinion
Diluted across 3 events/locations per year: US West, US East, EU
Attendees: 1200 in 2011, 800 in 2012, 600 in 2013
Now missing elite researchers and/or top executives
More practical, real-world, business, startups, less academic
Context and Scope
This is a themed report on building knowledge bases
and/or semantic models
The theme title was decided post-conference due to the
obvious similarity among all relevant presentations
@WalmartLabs
Using heterogeneous data
Connect People and Products
@WalmartLabs
• Color search and presentation: WordNet!
“Red Shirt”
• Intent? Linked Data can help, on related products too.
“Green Lantern”
• DVD or Halloween costume? Time/news is thy friend.
“Dark Knight”
External Data by @WalmartLabs
Vast amount of external data sets: WordNet, DBpedia, LOD
cloud, Twitter stream, third-party prices (crawled), product
descriptions, user click streams (web logs)…
appcrawlr
TipSense Technologies
A platform for pulling statistically significant
knowledge from unstructured semantic data sets
Transforming vast amounts of unstructured and
semi-structured content into a fully annotated
conceptual model.
Conceptual entity recognition
Contextualized content fingerprinting
Concepts/topic model, sentiment analysis
Whisk.com
Keynote: Understanding Recipes
UK startup Whisk.com (@nickholzherr) on collecting
recipe ingredients, enriching them with
semantics, recommending dishes and helping order
from stores.
Wrapper induction, NLP for data collection
Coping with missing info, noise, vague data
Model flavor profiles, portion changing
Challenges and opportunities
Leftovers, geo-data, local shopping, coupons…
BloomReach.Search
Understanding Intents
Entity, Relationship Mining
Built database of millions of concepts
Shallow ontology modeling via entity and attribute
extraction/mining
Rich semantics (units, colors, patterns, cities…)
Concept propagation (tagging by training on user
weblogs)
Product Annotation
Network of Concepts
Google Webmaster
Tools: Markup
Structured Data
Structured Data Markup
Not something entirely new: Rich Snippet
We experimented with it 2 years ago (extension of
Semantic Job Search proposal)
Supporting more types now
An ecosystem no one can afford to lose
Google leveraged the SEO utility to gain more
structured data (free labor)
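As an illustration of the kind of markup a webmaster might publish, here is a minimal sketch that builds a schema.org Product description with title, price and rating. The slides do not say which syntax was used; JSON-LD is an assumption here (microdata or RDFa would serve equally), and the function name `product_markup` and the sample values are hypothetical.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def product_markup(title, price, currency, rating, review_count):
    """Return a JSON-LD script tag describing one product (illustrative only)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": title,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",      # price as a decimal string
            "priceCurrency": currency,    # ISO 4217 code, e.g. "USD"
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return SCRIPT_OPEN + json.dumps(data, indent=2) + SCRIPT_CLOSE


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(product_markup("Red Shirt", 19.99, "USD", 4.5, 12))
```

The webmaster adds a tag like this for better search presentation; the search engine receives clean, typed attributes in return.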
Others
Gannett (News)
Use a combination of auto-tagging and rules to match news
articles with an evolving taxonomy (low-tech, but it works)
ISS (Intelligent Software Solutions)
Complex Event Processing (in “expressive” language)
Fuzzy pattern matching with Bayesian networks
Semantic Search and Automatic question answering
Google now answers factoid questions
E.g. “What did Steve Jobs die of?”, “What is the height of Mt.
Everest?”, “Who is the CEO of Apple?”
Closely Related to
Knowledge Acquisition
Similar Underlying Use Cases, Datasets and
Technologies
Query Interpretation
@SemTechBiz
“Red Shirt”
Shirt (Red)
Red ~= Crimson, scarlet, ruby, cherry, rose, …
T-shirt a Shirt?
@ProjectHalo
“Dead Duck”
Bird (dead)
Dead ~= not alive, gone, expired, killed, …
Beijing Duck a Duck?
Build structured queries from natural language
Disambiguation, query expansion
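A minimal sketch of the “Red Shirt” to Shirt (Red) interpretation, with the color expanded to near-synonyms. The lexicons below are hypothetical toy data; a real system would presumably draw them from WordNet or Linked Data, and none of the names here come from the presentations.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy lexicons standing in for WordNet / Linked Data lookups.
COLOR_SYNONYMS = {
    "red": ["crimson", "scarlet", "ruby", "cherry", "rose"],
}
PRODUCT_TYPES = {"shirt", "t-shirt", "costume", "dvd"}


@dataclass
class StructuredQuery:
    product_type: Optional[str] = None
    colors: List[str] = field(default_factory=list)
    other_terms: List[str] = field(default_factory=list)


def interpret(query: str) -> StructuredQuery:
    """Split a free-text query into a product type, expanded colors and leftovers."""
    result = StructuredQuery()
    for token in query.lower().split():
        if token in COLOR_SYNONYMS:
            # Query expansion: keep the color and add its near-synonyms.
            result.colors.append(token)
            result.colors.extend(COLOR_SYNONYMS[token])
        elif token in PRODUCT_TYPES and result.product_type is None:
            result.product_type = token
        else:
            # Unrecognized tokens are kept for later disambiguation.
            result.other_terms.append(token)
    return result


if __name__ == "__main__":
    print(interpret("Red Shirt"))
```

Questions such as “T-shirt a Shirt?” would need a type hierarchy on top of this flat lookup, which the sketch leaves out.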
Intent & Process
@ SemTechBiz
“Eco-friendly gift for dad”
Need products as gifts
Related to “dad”, “father”
Expand “eco-friendly” to
closely related concepts
Weigh purchases/views
during special event
(Christmas, Father’s Day)*
@ Project Halo
“How do we feel the sense
of heat?”
Need sentences on feeling
Related to “heat/hot”
Expand “heat”, “sense” to
related concepts
Weigh on signal
transmission in neuron*
The Process of getting
something done
* Learned from past user activities
Abstract Concept
Concrete Instances
@ SemTechBiz
“Eco-friendly” (gift)
Mine related product
review sites and blogs
~=
Organic, Recycled, Solar, Reclaimed, …
@ Project Halo
“Feeling” (heat)
Mine related biological
sites, books, tutorials
~=
Sense, Experience, Feel, Temperature Sensation, …
Build abstract concept, entity, instance
networks/graphs
Ranking Support
@ SemTechBiz2013
Products related to “Gift”
Recipes for “Sweet
Seafood”
Apps that are “Free, Pretty
and Fun”
@ Project Halo
Concepts related to “Feel”
Sentences on “Red
Producer”
Creatures that can be “both
a prey and a predator”
Scoring algorithm to return the
most relevant results
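One simple way such a scoring step could look, as a sketch only: each item carries a set of concept tags, the query is a set of weighted concepts, and an item's score is the sum of the weights it matches. The talks did not describe their actual scoring functions; the weights, item names and tags below are made up.

```python
from typing import Dict, List, Set, Tuple


def score(item_tags: Set[str], query_concepts: Dict[str, float]) -> float:
    """Sum the weights of the query concepts that the item is tagged with."""
    return sum(w for concept, w in query_concepts.items() if concept in item_tags)


def rank(
    items: Dict[str, Set[str]],
    query_concepts: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k (name, score) pairs, best first, dropping zero scores."""
    scored = [(name, score(tags, query_concepts)) for name, tags in items.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical apps for the query "Free, Pretty and Fun".
    apps = {
        "app_a": {"free", "fun"},
        "app_b": {"pretty"},
        "app_c": {"paid"},
    }
    weights = {"free": 1.0, "pretty": 0.5, "fun": 0.5}
    print(rank(apps, weights))
```

In practice the weights would be learned from past user activity rather than set by hand.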
Modeling
@ SemTechBiz2013
“Flavor” model (Whisk)
“Special Occasion” learning
(BloomReach)
“Cooking” process
(ingredients, portion, leftover, purchase…)
@ Project Halo
“Function” model in AURA
“Neural signal
transmission”
“Mitosis” event
(steps, components, temporal process, result…)
From Facts and Relations to
Causal and Deep Models
Crowd-sourcing
@ SemTechBiz2013
Use webmasters to
generate structured
markups
(Author, Category, Title, Price, Rating, …)
@ Project Halo
Use students to generate
metadata for
sentences, questions and
answers
(Relevance, UT, Type, Chapter, Exact/Various, …)
Crowd-sourcing works if the work is limited in
quantity and can be done cheaply
Google provides other utility (incentives for SEO) to lure webmasters
Project Halo needs to figure out its game plan
Summary of Use of
(Big, Wild) Data
@SemTech
Parse vague user query into best
structured queries for databases
Understand user’s underlying
intent
Link abstract concepts to concrete
entities
Rank apps, products …
Deep, contextual models
(flavor, time and location…)
Use crowds directly for free
@ProjectHalo
Translate Find-A-Value and other
simple questions into complex IR
queries
Understand sentence’s purpose
Relate category/class to
instances
Rank answers, evidence…
Deep contextual models
(location, process, events…)
Need to leverage crowds cheaply
Many Different Data
Sources and Techniques
One Thing in Common
What Can We Learn?

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Datascienceindia article
Datascienceindia articleDatascienceindia article
Datascienceindia articleHimanshuPise1
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
 
Big data and data science overview
Big data and data science overviewBig data and data science overview
Big data and data science overviewColleen Farrelly
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...Edureka!
 
Total Data Industry Report
Total Data Industry ReportTotal Data Industry Report
Total Data Industry ReportRan Zhang
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Connected Data World
 
Dallas datascienceconference jasongeng-v3
Dallas datascienceconference jasongeng-v3Dallas datascienceconference jasongeng-v3
Dallas datascienceconference jasongeng-v3Haoran Du
 
“Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services” “Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services” diannepatricia
 
Data science team (new version)
Data science team (new version)Data science team (new version)
Data science team (new version)Omid Mogharian
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryDomino Data Lab
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataDomino Data Lab
 
Data science as a professional career
Data science as a professional careerData science as a professional career
Data science as a professional careerDavid Rostcheck
 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyPeter Kua
 
Loras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteLoras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteRich Clayton
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
Stanford DeepDive Framework
Stanford DeepDive FrameworkStanford DeepDive Framework
Stanford DeepDive FrameworkRan Zhang
 

Was ist angesagt? (20)

Building up a Data Science Team from Scratch
Building up a Data Science Team from ScratchBuilding up a Data Science Team from Scratch
Building up a Data Science Team from Scratch
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Datascienceindia article
Datascienceindia articleDatascienceindia article
Datascienceindia article
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Big data and data science overview
Big data and data science overviewBig data and data science overview
Big data and data science overview
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Total Data Industry Report
Total Data Industry ReportTotal Data Industry Report
Total Data Industry Report
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
 
Dallas datascienceconference jasongeng-v3
Dallas datascienceconference jasongeng-v3Dallas datascienceconference jasongeng-v3
Dallas datascienceconference jasongeng-v3
 
“Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services” “Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services”
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Data science team (new version)
Data science team (new version)Data science team (new version)
Data science team (new version)
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive Industry
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked Data
 
Data science as a professional career
Data science as a professional careerData science as a professional career
Data science as a professional career
 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi Periasamy
 
Loras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium KeynoteLoras College 2016 Business Analytics Symposium Keynote
Loras College 2016 Business Analytics Symposium Keynote
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
Stanford DeepDive Framework
Stanford DeepDive FrameworkStanford DeepDive Framework
Stanford DeepDive Framework
 

Andere mochten auch

туранбакыт+люди+транспорт
туранбакыт+люди+транспорттуранбакыт+люди+транспорт
туранбакыт+люди+транспортБакыт Туран
 
Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...
Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...
Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...Denis Evgrafov
 
Audio Assignment Client Pitch
Audio Assignment Client PitchAudio Assignment Client Pitch
Audio Assignment Client PitchChris Gavin
 
Lesson 2 Basicstructure
Lesson 2 BasicstructureLesson 2 Basicstructure
Lesson 2 BasicstructureRyan Chung
 
Brighton Ruby 2016 Recap
Brighton Ruby 2016 RecapBrighton Ruby 2016 Recap
Brighton Ruby 2016 RecapMatias Korhonen
 
Webinar slides: ClusterControl New Features Webinar
Webinar slides: ClusterControl New Features Webinar Webinar slides: ClusterControl New Features Webinar
Webinar slides: ClusterControl New Features Webinar Severalnines
 
Webinar slides: Managing MySQL Replication for High Availability
Webinar slides: Managing MySQL Replication for High AvailabilityWebinar slides: Managing MySQL Replication for High Availability
Webinar slides: Managing MySQL Replication for High AvailabilitySeveralnines
 
Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs
Temporal Action Localization in Untrimmed Videos via Multi Stage CNNsTemporal Action Localization in Untrimmed Videos via Multi Stage CNNs
Temporal Action Localization in Untrimmed Videos via Multi Stage CNNsUniversitat Politècnica de Catalunya
 
Marcapasos: Aspectos Prácticos
Marcapasos: Aspectos PrácticosMarcapasos: Aspectos Prácticos
Marcapasos: Aspectos PrácticosCardioTeca
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache SparkQuantUniversity
 
MNIST for ML beginners
MNIST for ML beginnersMNIST for ML beginners
MNIST for ML beginners홍배 김
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadRoelof Pieters
 

Andere mochten auch (15)

туранбакыт+люди+транспорт
туранбакыт+люди+транспорттуранбакыт+люди+транспорт
туранбакыт+люди+транспорт
 
Hour of Code
Hour of CodeHour of Code
Hour of Code
 
улпан 2
улпан 2улпан 2
улпан 2
 
Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...
Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...
Джентльменский набор гемов. Поддержка единого стиля кода. Доставка кода на се...
 
Audio Assignment Client Pitch
Audio Assignment Client PitchAudio Assignment Client Pitch
Audio Assignment Client Pitch
 
Lesson 2 Basicstructure
Lesson 2 BasicstructureLesson 2 Basicstructure
Lesson 2 Basicstructure
 
Brighton Ruby 2016 Recap
Brighton Ruby 2016 RecapBrighton Ruby 2016 Recap
Brighton Ruby 2016 Recap
 
Webinar slides: ClusterControl New Features Webinar
Webinar slides: ClusterControl New Features Webinar Webinar slides: ClusterControl New Features Webinar
Webinar slides: ClusterControl New Features Webinar
 
Webinar slides: Managing MySQL Replication for High Availability
Webinar slides: Managing MySQL Replication for High AvailabilityWebinar slides: Managing MySQL Replication for High Availability
Webinar slides: Managing MySQL Replication for High Availability
 
Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs
Temporal Action Localization in Untrimmed Videos via Multi Stage CNNsTemporal Action Localization in Untrimmed Videos via Multi Stage CNNs
Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs
 
Marcapasos: Aspectos Prácticos
Marcapasos: Aspectos PrácticosMarcapasos: Aspectos Prácticos
Marcapasos: Aspectos Prácticos
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
 
MNIST for ML beginners
MNIST for ML beginnersMNIST for ML beginners
MNIST for ML beginners
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking ahead
 
svaneke
svanekesvaneke
svaneke
 

Ähnlich wie Smart datamining semtechbiz 2013 report

Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social Tagging
Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social TaggingSocial Web 2.0 Class Week 8: Social Metadata, Ratings, Social Tagging
Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social TaggingShelly D. Farnham, Ph.D.
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignMarianne Sweeny
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorialThengo Kim
 
Recent Trends in Semantic Search Technologies
Recent Trends in Semantic Search TechnologiesRecent Trends in Semantic Search Technologies
Recent Trends in Semantic Search TechnologiesThanh Tran
 
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...e-ROSA
 
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Amit Sheth
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleAndy Petrella
 
Inforum 2007 Into The User environment
Inforum 2007 Into The User environmentInforum 2007 Into The User environment
Inforum 2007 Into The User environmentGuus van den Brekel
 
Intelligentcontent2009
Intelligentcontent2009Intelligentcontent2009
Intelligentcontent2009Salim Ismail
 
Academic SEO, or: How do I get my research to show up in search engines and d...
Academic SEO, or: How do I get my research to show up in search engines and d...Academic SEO, or: How do I get my research to show up in search engines and d...
Academic SEO, or: How do I get my research to show up in search engines and d...Open Knowledge Maps
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Amit Sheth
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic SearchPaul Wlodarczyk
 

Ähnlich wie Smart datamining semtechbiz 2013 report (20)

Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social Tagging
Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social TaggingSocial Web 2.0 Class Week 8: Social Metadata, Ratings, Social Tagging
Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social Tagging
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 
OpenML data@Sheffield
OpenML data@SheffieldOpenML data@Sheffield
OpenML data@Sheffield
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By Design
 
Libraries meet research 2.0
Libraries meet research 2.0Libraries meet research 2.0
Libraries meet research 2.0
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorial
 
Recent Trends in Semantic Search Technologies
Recent Trends in Semantic Search TechnologiesRecent Trends in Semantic Search Technologies
Recent Trends in Semantic Search Technologies
 
SLA Summer 2008
SLA Summer 2008SLA Summer 2008
SLA Summer 2008
 
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm...
 
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scale
 
Inforum 2007 Into The User environment
Inforum 2007 Into The User environmentInforum 2007 Into The User environment
Inforum 2007 Into The User environment
 
Intelligentcontent2009
Intelligentcontent2009Intelligentcontent2009
Intelligentcontent2009
 
Academic SEO, or: How do I get my research to show up in search engines and d...
Academic SEO, or: How do I get my research to show up in search engines and d...Academic SEO, or: How do I get my research to show up in search engines and d...
Academic SEO, or: How do I get my research to show up in search engines and d...
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
Gic2011 aula10-ingles
Gic2011 aula10-inglesGic2011 aula10-ingles
Gic2011 aula10-ingles
 
Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
Technology Trends
Technology TrendsTechnology Trends
Technology Trends
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
 

Mehr von Jesse Wang

Agile lean workshop
Agile lean workshopAgile lean workshop
Agile lean workshopJesse Wang
 
Big data analytic platform
Big data analytic platformBig data analytic platform
Big data analytic platformJesse Wang
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commonsJesse Wang
 
Hybrid system architecture overview
Hybrid system architecture overviewHybrid system architecture overview
Hybrid system architecture overviewJesse Wang
 
Experiment on Knowledge Acquisition
Experiment on Knowledge AcquisitionExperiment on Knowledge Acquisition
Experiment on Knowledge AcquisitionJesse Wang
 
Chinese New Year
Chinese New Year Chinese New Year
Chinese New Year Jesse Wang
 
SemTech 2012 Talk semantify office
SemTech 2012 Talk  semantify officeSemTech 2012 Talk  semantify office
SemTech 2012 Talk semantify officeJesse Wang
 
Building SMWCon Spring 2012 Site
Building SMWCon Spring 2012 SiteBuilding SMWCon Spring 2012 Site
Building SMWCon Spring 2012 SiteJesse Wang
 
SMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateSMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateJesse Wang
 
SMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome RemarksSMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome RemarksJesse Wang
 
Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)Jesse Wang
 
Msra talk smw+apps
Msra talk smw+appsMsra talk smw+apps
Msra talk smw+appsJesse Wang
 
Jist tutorial semantic wikis and applications
Jist tutorial   semantic wikis and applicationsJist tutorial   semantic wikis and applications
Jist tutorial semantic wikis and applicationsJesse Wang
 
Semantic Wiki Page Maker
Semantic Wiki Page MakerSemantic Wiki Page Maker
Semantic Wiki Page MakerJesse Wang
 
Facets of applied smw
Facets of applied smwFacets of applied smw
Facets of applied smwJesse Wang
 
Smwcon widget editor - first preview
Smwcon widget editor - first previewSmwcon widget editor - first preview
Smwcon widget editor - first previewJesse Wang
 
Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011Jesse Wang
 
Smwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawikiSmwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawikiJesse Wang
 
Semantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in ActionSemantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in ActionJesse Wang
 
Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Jesse Wang
 

Mehr von Jesse Wang (20)

Agile lean workshop
Agile lean workshopAgile lean workshop
Agile lean workshop
 
Big data analytic platform
Big data analytic platformBig data analytic platform
Big data analytic platform
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Hybrid system architecture overview
Hybrid system architecture overviewHybrid system architecture overview
Hybrid system architecture overview
 
Experiment on Knowledge Acquisition
Experiment on Knowledge AcquisitionExperiment on Knowledge Acquisition
Experiment on Knowledge Acquisition
 
Chinese New Year
Chinese New Year Chinese New Year
Chinese New Year
 
SemTech 2012 Talk semantify office
SemTech 2012 Talk  semantify officeSemTech 2012 Talk  semantify office
SemTech 2012 Talk semantify office
 
Building SMWCon Spring 2012 Site
Building SMWCon Spring 2012 SiteBuilding SMWCon Spring 2012 Site
Building SMWCon Spring 2012 Site
 
SMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateSMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev Update
 
SMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome RemarksSMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome Remarks
 
Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)
 
Msra talk smw+apps
Msra talk smw+appsMsra talk smw+apps
Msra talk smw+apps
 
Jist tutorial semantic wikis and applications
Jist tutorial   semantic wikis and applicationsJist tutorial   semantic wikis and applications
Jist tutorial semantic wikis and applications
 
Semantic Wiki Page Maker
Semantic Wiki Page MakerSemantic Wiki Page Maker
Semantic Wiki Page Maker
 
Facets of applied smw
Facets of applied smwFacets of applied smw
Facets of applied smw
 
Smwcon widget editor - first preview
Smwcon widget editor - first previewSmwcon widget editor - first preview
Smwcon widget editor - first preview
 
Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011
 
Smwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawikiSmwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawiki
 
Semantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in ActionSemantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in Action
 
Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action:
 

Smart datamining semtechbiz 2013 report

  • 1. Utilizing Crowd-sourced Data for Knowledge Extraction A Themed Report of SemTechBiz San Francisco 2013.06
  • 2. Summary: People found great use of external data to help extract knowledge and build models. These valuable data are generated by crowds but harvested by mining algorithms and/or UI tools: LOD to enrich attributes and synonyms (WalmartLabs); NLP on recipes to build deep models (Whisk.com); webmaster tools to mark up content (Google).
  • 3. SemTechBiz SF 13: SemTechBiz 2013 in San Francisco is still the largest conference in the world on semantic web related technologies. With many newcomers from various industries; an indicator of the technologies entering prime time; has up to 7 parallel talks (broad coverage and interests). Now a 2nd-tier conference in my humble opinion: diluted to 3 times/locations per year (US West + US East + EU); attendees: 1200 in 2011, 800 in 2012, 600 in 2013; now missing elite researchers and/or top executives; more practical, real-world, business, startups, less academic.
  • 4. Context and Scope: This is a themed report on building knowledge bases and/or semantic models. The theme title was decided post-conference due to the obvious similarity among all relevant presentations.
  • 6. @WalmartLabs • Color search and presentation: WordNet! “Red Shirt” • Intent? Linked Data can help, on related products too. “Green Lantern” • DVD or Halloween costume? Time/news is thy friend. “Dark Knight”
  • 7. External Data by @WalmartLabs: Vast amount of external data sets: WordNet, DBpedia, LOD cloud, Twitter stream, third-party prices (crawled), product descriptions, user click streams (web logs)…
  • 10. TipSense Technologies: A platform for pulling statistically significant knowledge from unstructured semantic data sets. Transforming vast amounts of unstructured and semi-structured content into a fully annotated conceptual model. Conceptual entity recognition; contextualized content fingerprinting; concepts/topic model, sentiment analysis.
  • 12. Whisk.com Keynote: Understanding Recipes. UK startup Whisk.com (@nickholzherr) on collecting recipe ingredients, enriching them with semantics, recommending dishes and helping order from stores. Wrapper induction and NLP for data collection; coping with missing info, noise and vague data; modeling flavor profiles and portion changes. Challenges and opportunities: leftovers, geo-data, local shopping, coupons…
  • 15. Understanding Intents: Entity, Relationship Mining. Built database of millions of concepts; shallow ontology modeling via entity and attribute extraction/mining; rich semantics (units, colors, patterns, cities…); concept propagation (tagging by training on user weblogs).
  • 19. Structured Data Markup: Not something entirely new: Rich Snippets. We experimented with it 2 years ago (extension of Semantic Job Search proposal). Supporting more types now. An ecosystem no one can afford to lose. Google leveraged the SEO utility to gain more structured data (free labor).
  • 20. Others: Gannett (News): use a combination of auto-tagging and rules to match news articles with an evolving taxonomy (low-tech, but works). ISS (Intelligent Software Solutions): Complex Event Processing (in “expressive” language); fuzzy matching with patterns with Bayesian Networks. Semantic Search and Automatic Question Answering: Google now answers factoid questions, e.g. “When did Steve Jobs die?”, “What is the height of Mt. Everest?”, “Who is the CEO of Apple?”
  • 21. Closely Related to Knowledge Acquisition: Similar Underlying Use Cases, Datasets and Technologies
  • 22. Query Interpretation. @SemTechBiz: “Red Shirt”: Shirt (Red); Red ~= crimson, scarlet, ruby, cherry, rose, …; T-shirt a Shirt? @ProjectHalo: “Dead Duck”: Bird (dead); Dead ~= not alive, gone, expired, killed, …; Beijing Duck a Duck? Build structured queries from natural language: disambiguation, query expansion.
  • 23. Intent & Process. @SemTechBiz: “Eco-friendly gift for dad”: need products as gifts; related to “dad”, “father”; expand “eco-friendly” to closely related concepts; weigh purchases/views during special events (Christmas, Father’s Day)*. @ProjectHalo: “How do we feel the sense of heat?”: need sentences on feeling; related to “heat/hot”; expand “heat”, “sense” to related concepts; weigh on signal transmission in neurons*. The process of getting something done. * Learned from past user activities.
  • 24. Abstract Concept, Concrete Instances. @SemTechBiz: “Eco-friendly” (gift): mine related product review sites and blogs; ~= Organic, Recycled, Solar, Reclaimed, … @ProjectHalo: “Feeling” (heat): mine related biological sites, books, tutorials; ~= Sense, Experience, Feel, Temperature Sensation, … Build abstract concept, entity, instance networks/graphs.
  • 25. Ranking Support. @SemTechBiz2013: products related to “Gift”; recipes for “Sweet Seafood”; apps that are “Free, Pretty and Fun”. @ProjectHalo: concepts related to “Feel”; sentences on “Red Producer”; creatures that can be “both a prey and a predator”. Scoring algorithm to return the most relevant results.
  • 26. Modeling. @SemTechBiz2013: “Flavor” model (Whisk); “Special Occasion” learning (BloomSearch); “Cooking” process (ingredients, portion, leftover, purchase…). @ProjectHalo: “Function” model in AURA; “Neural signal transmission”; “Mitosis” event (steps, components, temporal process, result…). From Facts and Relations to Causal and Deep Models.
  • 27. Crowd-sourcing. @SemTechBiz2013: use webmasters to generate structured markup (Author, Category, Title, Price, Rating, …). @ProjectHalo: use students to generate metadata for sentences, questions and answers (Relevance, UT, Type, Chapter, Exact/Various, …). Crowd-sourcing works if it has a limited quantity and can be done cheaply. Google provides other utility (incentives for SEO) to lure webmasters. Project Halo needs to figure out its game plan.
  • 28. Summary of Use of (Big, Wild) Data. @SemTech: parse vague user queries into the best structured queries for databases; understand the user’s underlying intent; link concept entities to concrete entities; rank apps, products…; deep, contextual models (flavor, time and location…); use crowds directly for free. @ProjectHalo: translate Find-A-Value and other simple questions into complex IR queries; understand a sentence’s purpose; relate category/class to instances; rank answers, evidence…; deep contextual models (location, process, events…); need to leverage crowds cheaply.
  • 29. Many Different Data Sources and Techniques
  • 30. One Thing in Common
  • 31. What Can We Learn?
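
The query interpretation pattern from slide 22 (“Red Shirt” read as a Shirt with the attribute Red, then expanded to near-synonyms) can be sketched roughly as follows. This is only an illustration: the synonym table and product vocabulary are hypothetical stand-ins for external sources such as WordNet, and the function name `interpret_query` is an assumption, not something taken from any of the presentations.

```python
# Hypothetical synonym table (assumption); the talks mention WordNet as a
# real-world source, but this tiny dictionary is for illustration only.
SYNONYMS = {
    "red": ["crimson", "scarlet", "ruby", "cherry", "rose"],
}

# Hypothetical product-type vocabulary (assumption).
PRODUCT_TYPES = {"shirt", "t-shirt", "dress"}


def interpret_query(text):
    """Split a free-text query into a product type and expanded attributes."""
    tokens = text.lower().split()
    # First token that names a known product type, or None if there is none.
    product = next((t for t in tokens if t in PRODUCT_TYPES), None)
    # Every remaining token is treated as an attribute of the product.
    attributes = [t for t in tokens if t != product]
    # Query expansion: each attribute maps to itself plus known synonyms.
    expanded = {a: [a] + SYNONYMS.get(a, []) for a in attributes}
    return {"type": product, "attributes": expanded}


result = interpret_query("Red Shirt")
print(result)
```

A real system would also need the disambiguation step the slide mentions (for example deciding whether a T-shirt counts as a Shirt), which this sketch leaves out.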