Zemanta Tech Talk at Audible
Audible Tech Talk
     23. April 2012




      Andraz Tori
    andraz@zemanta.com
          @andraz
Today's plan
• Short story of Zemanta
• The Zemanta technology
Where am I right now?
Wonders of modern communication
Ljubljana
Strip mine
• A system for Slovenian national television in 2006
• Closed captioning → a web page for each episode of each show
• Natural Language Processing, Information Retrieval...
Start-up? Why not?

Tour de Slovénie
Sales
Seedcamp

• First European program inspired by Y Combinator (2007)
• London-based
• 3 months, 50,000 EUR for 10% equity
Roller coaster
12. August          Deadline
20. August          Shortlist
23. August          Phone interview
24. August          Results

3. September        London week start
7. September        London week end
16. September       → London
3 months in London
Back to Ljubljana
And then ...

• Figuring out that the US is our target market
• Figuring out where in the US to be and who to have here
• Partnerships
• And naturally the business model
Technology
What do we do?
• Zemanta – Personal Writing Assistant
     - on your current platform
• While bloggers write, we suggest:
     - images
     - related articles
     - in-text links
     - tags
Some stats

• 80k bloggers monthly
• 1.3 million posts enhanced in 2011
How does it work?
• Natural Language Processing
• Big database of “meanings” (entities, concepts, topics)
• Word Sense Disambiguation
 • Linking out to Wikipedia, Freebase, …
 • Categorization, Named Entity Recognition


• Information Retrieval
 • Solr-based, using features from NLP
 • With some twists
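Word sense disambiguation picks, for each ambiguous phrase, one entry from the database of meanings. The toy sketch below is not Zemanta's method; it only illustrates the general idea under stated assumptions. The `Sense` class, the `disambiguate` and `coherence` helpers, the priors, and the related-entity sets are all hypothetical, made-up values. Each candidate sense is scored as a weighted mix of its prior (how often the phrase means that entity) and its coherence with the candidates of the other phrases in the text. For simplicity, coherence is measured against all candidate senses rather than only the chosen ones.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    entity: str                                 # hypothetical entity id, e.g. a Wikipedia title
    prior: float                                # assumed P(entity | phrase) from background knowledge
    related: set = field(default_factory=set)   # entities this sense links to


def coherence(sense, other_candidates):
    """Share of the other phrases with at least one candidate related to `sense`."""
    if not other_candidates:
        return 0.0
    hits = sum(
        1
        for senses in other_candidates
        if any(c.entity in sense.related or sense.entity in c.related for c in senses)
    )
    return hits / len(other_candidates)


def disambiguate(candidates, alpha=0.5):
    """Pick one sense per phrase by mixing prior and coherence; alpha is an assumed weight."""
    chosen = {}
    for phrase, senses in candidates.items():
        others = [s for p, s in candidates.items() if p != phrase]
        best = max(senses, key=lambda s: alpha * s.prior + (1 - alpha) * coherence(s, others))
        chosen[phrase] = best.entity
    return chosen


# Made-up candidates: on priors alone, "Jaguar" would be the car maker.
candidates = {
    "Jaguar": [
        Sense("Jaguar_Cars", 0.6, {"Land_Rover", "Coventry"}),
        Sense("Jaguar_(animal)", 0.4, {"Amazon_rainforest", "Big_cat"}),
    ],
    "Amazon": [
        Sense("Amazon.com", 0.7, {"Seattle"}),
        Sense("Amazon_rainforest", 0.3, {"Jaguar_(animal)", "Brazil"}),
    ],
}
print(disambiguate(candidates))
# {'Jaguar': 'Jaguar_(animal)', 'Amazon': 'Amazon_rainforest'}
```

With `alpha=1.0` the coherence term is ignored and the most common senses (`Jaguar_Cars`, `Amazon.com`) win. Coherence is what lets the two phrases pull each other toward the wildlife reading.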
[Diagram: Plain text (article) → Analysis → Semantic search → Content suggestions, drawing on Background knowledge and Indexed content]
“Text Understanding”
- Input is a meaningful chunk of text (not a keyword or a
phrase)
- Input is (semi-)English language
- Has to work across all domains in the open world:
music, celebrities, finance, entertainment, politics,
gardening, parenting, …
Background knowledge
- Data from Wikipedia, MusicBrainz, Freebase… and the
  world wild web
- Includes linguistic and semantic properties as well as
  unstructured data
- Stored in two forms:
  - the “original” custom-built triple store on top of MySQL
    (150 GB)
  - a processed, optimized 7 GB “memory-mapped
    dump”
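The slides do not describe the dump's format. To illustrate why a memory-mapped file suits fast lookups in a large, read-only dataset, the sketch below assumes a simple layout of its own: fixed-width records sorted by key. A lookup is then a binary search over the mapped bytes, and the operating system pages in only the parts of the file that are touched. The record widths, the helpers `write_dump` and `lookup`, the file name, and the sample data are all hypothetical.

```python
import mmap
import os
import tempfile

# Hypothetical fixed-width layout: each record is a NUL-padded UTF-8 key
# followed by a NUL-padded UTF-8 value, and records are sorted by key bytes.
KEY_WIDTH = 32
VALUE_WIDTH = 32
RECORD = KEY_WIDTH + VALUE_WIDTH


def write_dump(path, pairs):
    """Write (key, value) pairs as sorted fixed-width records."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs, key=lambda kv: kv[0].encode("utf-8")):
            k, v = key.encode("utf-8"), value.encode("utf-8")
            if len(k) > KEY_WIDTH or len(v) > VALUE_WIDTH:
                raise ValueError(f"record too long: {key!r}")
            f.write(k.ljust(KEY_WIDTH, b"\0") + v.ljust(VALUE_WIDTH, b"\0"))


def lookup(mm, key):
    """Binary-search the mapped records; only the pages touched are read from disk."""
    target = key.encode("utf-8").ljust(KEY_WIDTH, b"\0")
    lo, hi = 0, len(mm) // RECORD
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * RECORD
        current = mm[start:start + KEY_WIDTH]
        if current < target:
            lo = mid + 1
        elif current > target:
            hi = mid
        else:
            value = mm[start + KEY_WIDTH:start + RECORD]
            return value.rstrip(b"\0").decode("utf-8")
    return None


pairs = [("Ljubljana", "city"), ("Freebase", "knowledge base"), ("MusicBrainz", "music database")]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "background.dump")
    write_dump(path, pairs)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        found = lookup(mm, "Ljubljana")    # "city"
        missing = lookup(mm, "Slovenia")   # None
```

Sorting by UTF-8 bytes keeps the write order consistent with the byte comparisons in `lookup`, because NUL padding sorts before any real character. A real dump would likely use variable-length records with an offset index, but the access pattern would be the same.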
Analysis pipeline
- Named Entity Extraction
- Known phrases extraction (Aho-Corasick)
- Surface form features evaluation
- Statistical comparison to background knowledge
- Semantic coherence and hand-tuned heuristics
- etc.
Data source: triple store
Result: disambiguated entities
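The pipeline names the Aho-Corasick algorithm for extracting known phrases. It matches a whole dictionary of phrases against a text in a single pass. The minimal pure-Python sketch below shows the technique: it builds a trie of phrases, adds failure links in a breadth-first pass, and then scans the text once, reporting every occurrence with its character offsets. The function names and the phrase list are illustrative assumptions. A production extractor would also normalize case and check word boundaries, which this sketch does not do.

```python
from collections import deque


def build_automaton(phrases):
    """Build goto/fail/output tables for Aho-Corasick matching."""
    goto = [{}]      # goto[state][char] -> next state; state 0 is the root
    fail = [0]       # failure link for each state
    output = [[]]    # phrases recognized on reaching each state
    for phrase in phrases:
        state = 0
        for ch in phrase:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(phrase)
    # Breadth-first pass: each failure link points to the longest proper suffix
    # of the state's path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_phrases(text, automaton):
    """Scan text once; yield (start, end, phrase) for every match."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for phrase in output[state]:
            yield (i - len(phrase) + 1, i + 1, phrase)


automaton = build_automaton(["New York", "New York Times", "York", "Times"])
text = "Read the New York Times today"
matches = list(find_phrases(text, automaton))
# [(9, 17, 'New York'), (13, 17, 'York'), (9, 23, 'New York Times'), (18, 23, 'Times')]
```

Overlapping and nested matches are all reported, which is what a later disambiguation step needs: choosing between "New York" and "New York Times" is left to the statistical and coherence stages rather than to the matcher.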
Connecting content
• Indexing the blogosphere and mediasphere
• Solr-based index
 • Twist: complicated queries with 50 terms
• Filtering out spam is “fun”
• Probably the best “related content” in terms of accuracy
• Coming soon: social signal
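The slide mentions related-content queries of around 50 terms but does not show their form. One plausible way to express such a query, sketched here under our own assumptions, is to turn weighted terms from the analysis step into a boosted OR query for Solr's `/select` handler, parsed by eDisMax so that each clause is searched across several fields. The core name `articles`, the fields `entities` and `text`, the weights, and the helper names are hypothetical. The code only builds the request and makes no network call.

```python
from urllib.parse import urlencode


def escape_phrase(term):
    """Escape backslashes and double quotes so a term can sit inside a quoted phrase."""
    return term.replace("\\", "\\\\").replace('"', '\\"')


def build_related_query(weighted_terms, max_terms=50):
    """Turn (term, weight) pairs into a boosted OR query over the top max_terms terms."""
    top = sorted(weighted_terms, key=lambda tw: tw[1], reverse=True)[:max_terms]
    return " OR ".join(f'"{escape_phrase(term)}"^{weight:.2f}' for term, weight in top)


def related_params(weighted_terms, rows=10):
    """Request parameters for Solr's /select handler (field names are hypothetical)."""
    return {
        "q": build_related_query(weighted_terms),
        "defType": "edismax",       # eDisMax searches each clause across the qf fields
        "qf": "entities^2 text",    # assumed fields: entity matches count double
        "fl": "id,title,score",
        "rows": rows,
        "wt": "json",
    }


terms = [("Ljubljana", 1.0), ("Seedcamp", 0.8), ("start-up", 0.3)]
q = build_related_query(terms)
# q == '"Ljubljana"^1.00 OR "Seedcamp"^0.80 OR "start-up"^0.30'
url = "http://localhost:8983/solr/articles/select?" + urlencode(related_params(terms))
```

Capping the query at the strongest terms keeps it well under Solr's default limit on boolean clauses, while the per-term boosts let a long query still rank documents mostly by the article's central entities.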
But why just for bloggers?

 Let's open up the API!
Some API users
Back to reality.
Age of “smart”
Blog me up, Scotty!
      23. April 2012
Some takeaways
• Accelerators are good
• The world is getting flatter,
          but it will never be flat
• Start monetizing soon – to learn, not to earn
• Be where your market is
• Many markets left to innovate in
Thank you!
