Building Better Search for Wikipedia: How We Did It Using Amazon CloudSearch


                                                                       July 26, 2012


© 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Speakers




    Paul Nelson, CTO, Search Technologies
    Michael Bohlig, Marketing Manager, Amazon CloudSearch
    Jon Handler, Solutions Architect, Amazon CloudSearch
Housekeeping items
!   Polling questions
!   Q&A will be at the end
!   Recording and slides will be distributed and posted
     (Slideshare & YouTube)




Agenda

!        Amazon CloudSearch Overview
!        Data Acquisition – Getting the Files from Wikipedia
!        Data Processing – Clean-up and Preparation
!        Indexing
!        Queries and Relevancy Ranking
!        Building the UI
!        Final Results & Recommendations
!        Q&A

Amazon CloudSearch
!        Fully-managed, full-featured search service
!        Automatically scales for data & traffic
!        Handles both structured and unstructured data
!        Near real-time indexing
!        Up and running in less than 1 hour




Polling Question #1


What Are You Using For Search Today?




Introduction

   SEARCHING WIKIPEDIA

Why Wikipedia?
!   It’s awesome
!   Default Wikipedia search is pretty bad &
    everyone knows it
!   It’s publicly available data
!   It’s awesome




Why CloudSearch for Wikipedia?
!   It’s awesome
!   A great choice for a public search engine –
    it lives on the internet
!   First version up & running quickly
!   Automatically scales to required query volume
!   Rank expressions work great for Wikipedia relevancy
!   Easy Search Domain Creation = Easy system iteration


Let’s try it!


        http://wikipedia.searchtechnologies.com




Getting the Files from Wikipedia

   DATA ACQUISITION

Wikipedia Dump Files


       http://dumps.wikimedia.org/enwiki/latest/

!   Desired files have the pattern:
     enwiki-latest-pages-articles#.xml-*.bz2
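Selecting the article files from the dump listing can be sketched as a regular expression over file names. This is a minimal sketch, assuming the "#" in the slide's pattern stands for a numeric part number and "*" for any suffix; the class and method names are hypothetical, and fetching and parsing the HTML listing page is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: picks article dump files out of a directory listing.
final class DumpFileFilter {
    // Slide pattern enwiki-latest-pages-articles#.xml-*.bz2, assuming "#" = one or more digits
    // and "*" = any suffix.
    private static final Pattern ARTICLE_DUMP =
            Pattern.compile("enwiki-latest-pages-articles\\d+\\.xml-.*\\.bz2");

    static boolean isArticleDump(String fileName) {
        // matches() requires the whole name to fit, so a name ending in "-rss.xml" is rejected
        return ARTICLE_DUMP.matcher(fileName).matches();
    }

    static List<String> select(List<String> listing) {
        List<String> articleFiles = new ArrayList<>();
        for (String name : listing) {
            if (isArticleDump(name)) {
                articleFiles.add(name);
            }
        }
        return articleFiles;
    }
}
```

Because the pattern requires a part number, a single combined file such as enwiki-latest-pages-articles.xml.bz2 is not selected.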



Our Solution
    Wikipedia dump files → Content Processing Framework → Amazon CloudSearch
    Framework steps: Fetch Files Listing → Identify Article Files to Fetch → Open Stream → File Processing → Send to CloudSearch

Content Processing Framework Advantages
!   Process multiple files simultaneously
!   Fully Streaming
        •  Files are never downloaded to local disk
        •  From Wikipedia → Streaming Processor → CloudSearch
!   Very Fast (450 documents per second, end-to-end)
!   Integrated Connectors / Web Crawlers
        •  SharePoint, Documentum, Web Sites, RDBMS, RightNow,
           Confluence, Salesforce.com, etc.
!   Text extraction (from PDF, Office Docs, etc.)
        •  Using Apache Tika
!   Entity Extraction
        •  Names, places, companies, dates, phone numbers, zip codes, etc.
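The "multiple files simultaneously" and "fully streaming" points can be illustrated with the JDK's thread pools: each file is consumed from an InputStream as it arrives, and several files are handled in parallel. This is a minimal sketch, not the Aspire framework's API: ParallelStreamProcessor and countPages are hypothetical names, counting `<page>` lines stands in for real processing, and in practice each stream would be an HTTP stream wrapped in a BZip2 decompressor (for example from Apache Commons Compress), which is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

final class ParallelStreamProcessor {
    // Reads one stream line by line; nothing is written to local disk.
    // Counting "<page>" lines is a stand-in for the real per-page processing.
    static int countPages(InputStream in) {
        int pages = 0;
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().equals("<page>")) {
                    pages++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pages;
    }

    // Processes every source on a fixed-size pool and returns the total page count.
    static int processAll(List<Supplier<InputStream>> sources, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Supplier<InputStream> source : sources) {
                Callable<Integer> task = () -> countPages(source.get());
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a file failed to process", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing Suppliers rather than open streams means each connection is only opened when a worker thread picks up the file.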
Polling Question #2


Where is your data stored?




Preparing the Data for Search

   DATA PROCESSING

What Do Wikipedia Files Look Like?


                                         Sample Wikipedia Data




Data Processing: Basic Requirements
!   Decompression: BZip2 à UTF-8
!   Process each page as a separate CloudSearch
    document
       •  Multiple pages specified in a single XML file
!   Skip #REDIRECT pages
!   Compute document statistics
       •  Necessary for relevancy ranking
       •  Includes: Content size, title size, number of outbound links
       •  (FUTURE: Number of inbound links)
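These basic requirements can be sketched with the JDK's streaming XML parser (StAX): each `<page>` element becomes one record, #REDIRECT pages are skipped, and statistics are computed from the page text. This is a minimal sketch, assuming the dump's `<page>`/`<title>`/`<revision>`/`<text>` layout; PageStatsExtractor is hypothetical, sizes are measured on the raw wikitext (in Java chars) rather than on cleansed content, and outbound links are approximated by counting `[[` openers, which also counts category and file links.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PageStatsExtractor {
    // Per-page statistics later used by the rank expression.
    static final class PageStats {
        final String title;
        final int titleSize;
        final int contentSize;
        final int outboundLinks;

        PageStats(String title, int titleSize, int contentSize, int outboundLinks) {
            this.title = title;
            this.titleSize = titleSize;
            this.contentSize = contentSize;
            this.outboundLinks = outboundLinks;
        }
    }

    // Streams through the XML; only the current page's title and text are held in memory.
    static List<PageStats> extract(Reader xml) {
        List<PageStats> pages = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            String title = null;
            String text = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("page")) {
                        title = null;
                        text = null;
                    } else if (name.equals("title")) {
                        title = reader.getElementText();
                    } else if (name.equals("text")) {
                        text = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("page")
                        && title != null && text != null && !isRedirect(text)) {
                    pages.add(new PageStats(title, title.length(), text.length(), countLinks(text)));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed dump XML", e);
        }
        return pages;
    }

    // Redirect pages start with the case-insensitive #REDIRECT magic word.
    static boolean isRedirect(String text) {
        return text.trim().toUpperCase(Locale.ROOT).startsWith("#REDIRECT");
    }

    // Counts "[[" occurrences as a rough proxy for outbound wiki links.
    static int countLinks(String text) {
        int count = 0;
        for (int i = text.indexOf("[["); i >= 0; i = text.indexOf("[[", i + 2)) {
            count++;
        }
        return count;
    }
}
```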
Data Processing: Advanced Feature Support
!        Extract Categories
!        Extract Author (IP address or author name)
!        Extract Update Date
!        Extract Document Type
           •  Wikipedia “namespace”, based on the title prefix
!   Determine Disambiguation Pages
           •  Based on certain Wikipedia {{templates}}
           •  Template whitelist and blacklist
!   Produce Static Teaser
           •  (Before / After example shown on the slide)
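Most of these features come down to pattern matching over the wikitext. This is a minimal sketch, assuming categories appear as `[[Category:Name]]` links and disambiguation pages are flagged by `{{templates}}`; the namespace list, the template whitelist, and every name here are hypothetical placeholders (a blacklist would be a second set checked the same way), nested templates are not handled, and author and update-date extraction are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageMetadata {
    // [[Category:Name]] or [[Category:Name|sort key]]; group 1 is the category name.
    private static final Pattern CATEGORY =
            Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");
    // {{Template name|args}}; group 1 is the template name.
    private static final Pattern TEMPLATE =
            Pattern.compile("\\{\\{\\s*([^}|]+?)\\s*(?:\\|[^}]*)?\\}\\}");
    // Hypothetical subset of namespace prefixes.
    private static final Set<String> NAMESPACES = new HashSet<>(Arrays.asList(
            "Wikipedia", "Category", "Template", "File", "Portal", "Help"));
    // Hypothetical whitelist of disambiguation templates (lowercase).
    private static final Set<String> DISAMBIGUATION_TEMPLATES = new HashSet<>(Arrays.asList(
            "disambig", "disambiguation", "dab"));

    static List<String> categories(String wikitext) {
        List<String> result = new ArrayList<>();
        Matcher m = CATEGORY.matcher(wikitext);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    // Document type = namespace from the title prefix, otherwise "Article".
    static String documentType(String title) {
        int colon = title.indexOf(':');
        if (colon > 0 && NAMESPACES.contains(title.substring(0, colon))) {
            return title.substring(0, colon);
        }
        return "Article";
    }

    static boolean isDisambiguation(String wikitext) {
        Matcher m = TEMPLATE.matcher(wikitext);
        while (m.find()) {
            if (DISAMBIGUATION_TEMPLATES.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Static teaser: cut cleansed text at a word boundary and append an ellipsis.
    static String teaser(String cleanText, int maxChars) {
        if (cleanText.length() <= maxChars) {
            return cleanText;
        }
        int cut = cleanText.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars;
        }
        return cleanText.substring(0, cut) + "\u2026";
    }
}
```

A title such as "Star Wars: A New Hope" stays an Article because its prefix is not a known namespace, which is why the prefix is checked against a list rather than split on any colon.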
Sending Documents to CloudSearch

   INDEXING

CloudSearch: Document ID
!   @id = uniquely identifies every document in the index
       •  Must be made up of letters and digits (no spaces or punctuation)



 <batch>
   <add lang="en" version="5438086" id="wikipedia930503">
       . . . FIELDS GO HERE . . .
   </add>
 </batch>
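Deriving the ID from the numeric Wikipedia page ID, as the example ID wikipedia930503 appears to do, gives each page the same ID on every re-indexing run, so an update replaces the old document. This is a minimal sketch, assuming the page ID is available from the dump; DocIds is hypothetical, and its check is deliberately stricter (lowercase letters only) than the slide's "letters and digits" rule.

```java
import java.util.regex.Pattern;

final class DocIds {
    // Conservative check: lowercase ASCII letters and digits only.
    private static final Pattern VALID_ID = Pattern.compile("[a-z0-9]+");

    static boolean isValid(String id) {
        return VALID_ID.matcher(id).matches();
    }

    // Builds an ID such as "wikipedia930503" from a numeric page ID.
    static String forWikipediaPage(long pageId) {
        String id = "wikipedia" + pageId;
        if (!isValid(id)) {
            throw new IllegalArgumentException("invalid document id: " + id);
        }
        return id;
    }
}
```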




CloudSearch: Document Version
!   @version = identifies most recent document
       •  Integer number, must always increase
       •  Updates or deletes to same doc ID must have larger @version
       •  My Formula: (System.currentTimeMillis() - 1325394000000L)/1000, i.e. seconds since midnight, January 1, 2012 (US Eastern time)
!   Why does it exist?
       •  So that multiple processes can submit updates simultaneously
       •  Updates processed quickly are not overwritten by older updates
          processed slowly
 <batch>
   <add lang="en" version="5438086" id="wikipedia930503">
       . . . FIELDS GO HERE . . .
   </add>
 </batch>
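In Java the formula can be wrapped in a small helper; note that the epoch constant needs the L suffix, since it does not fit in an int. This is a minimal sketch: DocVersions is hypothetical, and the nextVersion guard is an addition not described in the talk. Because the formula has one-second resolution, two updates to the same document within the same second would otherwise share a version; the guard only helps within one JVM, so separate indexing processes still depend on the clock.

```java
import java.util.concurrent.atomic.AtomicLong;

final class DocVersions {
    // 1325394000000 ms = 2012-01-01T05:00:00Z, i.e. midnight US Eastern time.
    static final long EPOCH_2012_MILLIS = 1325394000000L;

    private static final AtomicLong lastIssued = new AtomicLong(0);

    // The slide's formula: whole seconds elapsed since the custom epoch.
    static long versionAt(long currentMillis) {
        return (currentMillis - EPOCH_2012_MILLIS) / 1000;
    }

    // Never returns the same value twice in this JVM, even within one second.
    static long nextVersion(long currentMillis) {
        long candidate = versionAt(currentMillis);
        return lastIssued.updateAndGet(previous -> Math.max(previous + 1, candidate));
    }

    static long nextVersion() {
        return nextVersion(System.currentTimeMillis());
    }
}
```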
CloudSearch Indexing Details
!        Format fields as CloudSearch SDF (Search Data Format)
!        Submit in batches to CloudSearch
!        Multiple open connections to CloudSearch
!        Run the indexer on an EC2 instance in the same zone as
         CloudSearch
           •  Several times better performance
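Batching can be sketched as packing serialized `<add>` elements into `<batch>` documents under a byte budget. This is a minimal sketch: SdfBatcher is hypothetical, and the maximum batch size is a parameter because the actual limit should be taken from the CloudSearch documentation for the API version in use. Sending batches over several open connections could reuse a thread pool and is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SdfBatcher {
    private static final String OPEN = "<batch>";
    private static final String CLOSE = "</batch>";

    // Groups <add> elements into <batch> strings whose UTF-8 size stays within maxBytes.
    static List<String> toBatches(List<String> addElements, int maxBytes) {
        int overhead = utf8Length(OPEN) + utf8Length(CLOSE);
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = overhead;
        for (String add : addElements) {
            int size = utf8Length(add);
            if (overhead + size > maxBytes) {
                throw new IllegalArgumentException("document larger than the batch limit");
            }
            if (current.length() > 0 && currentBytes + size > maxBytes) {
                batches.add(OPEN + current + CLOSE);  // flush the full batch
                current.setLength(0);
                currentBytes = overhead;
            }
            current.append(add);
            currentBytes += size;
        }
        if (current.length() > 0) {
            batches.add(OPEN + current + CLOSE);
        }
        return batches;
    }

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Sizes are measured in UTF-8 bytes rather than Java chars, since non-ASCII article text takes more bytes on the wire than its character count suggests.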




CloudSearch SDF for Indexing
<batch>
  <add lang="en" version="5438086" id="wikipedia930503">
    <field name="title">Terran Trade Authority</field>
    <field name="title_size">22</field>
    <field name="content">
The 'Terran Trade Authority' is a science-fiction setting originally presented in a collection of four
large illustrated science…
    </field>
    <field name="content_size">893</field>
    <field name="teaser"> The 'Terran Trade Authority' is a science-fiction setting originally presented
in a collection of four large illustrated science fiction books published between 1978 and…
    </field>
    <field name="url">http://en.wikipedia.org/wiki/Terran_Trade_Authority</field>
    <field name="type">Article</field>
    <field name="f_type">Article</field>
    <field name="year">2012</field>
    <field name="f_year">2012</field>
    <field name="year_month">2012/01</field>
    <field name="f_year_month">2012/01</field>
    <field name="categories">Science fiction book series</field>
    <field name="f_categories">Science fiction book series</field>
    <field name="author">76.173.50.22</field>
    <field name="f_author">76.173.50.22</field>
  </add>
</batch>
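A field map can be serialized into SDF like the sample above with plain string building, provided values are XML-escaped (Wikipedia text routinely contains & and <). This is a minimal sketch: SdfWriter is hypothetical, and writing selected fields a second time with an f_ prefix mirrors what the sample appears to do for facet fields (type/f_type, year/f_year); the slide does not state the reason for the duplication.

```java
import java.util.Map;
import java.util.Set;

final class SdfWriter {
    // Escapes the characters that are unsafe in XML text and double-quoted attributes.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds one <add> element; fields listed in facetFields are also written as "f_" + name.
    static String addElement(String id, long version, Map<String, String> fields, Set<String> facetFields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<add lang=\"en\" version=\"").append(version)
          .append("\" id=\"").append(escapeXml(id)).append("\">\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            appendField(sb, field.getKey(), field.getValue());
            if (facetFields.contains(field.getKey())) {
                appendField(sb, "f_" + field.getKey(), field.getValue());
            }
        }
        return sb.append("</add>").toString();
    }

    private static void appendField(StringBuilder sb, String name, String value) {
        sb.append("  <field name=\"").append(escapeXml(name)).append("\">")
          .append(escapeXml(value)).append("</field>\n");
    }
}
```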
End-to-End Indexing Dataflow
!   Process Latest Listing Pipeline
       •  Start → Fetch XHTML page from dumps.wikimedia.org/enwiki/latest/ → Extract URLs (Groovy script)
       •  Output: 27 URLs to 27 dump files
!   Process File Pipeline
       •  Open URL stream → compressed data stream → BZip2 decompress → decompressed stream → XML sub-job extractor
       •  Output: single <page> XML plus metadata
!   Process Page Pipeline
       •  Extract metadata and cleanse content (Groovy script) → cleansed metadata → XSL transform → post XML to Amazon CloudSearch
Providing good search results

   QUERIES AND RELEVANCY

Recommendation: Debug Interface
!   Useful tool for testing CloudSearch query behavior




                                                  Sample Debug Interface




Queries for Wikipedia
!   Uses simple “q” parameter for user query string
!   Selecting facets uses “bq” parameter
       •      For filtering on a facet value: bq=(field fieldname 'value')
       •      For excluding a facet value: bq=(not fieldname:'value')
       •      Can combine clauses with (and ...) and (or ...)
       •      Don’t forget to escape single quotes inside values
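Building these expressions in code keeps the quoting in one place. This is a minimal sketch, assuming the bq syntax shown above and that a single quote (and a backslash) inside a value is escaped with a backslash; FacetQuery is hypothetical, and the URL encoding uses URLEncoder.encode(String, Charset), which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class FacetQuery {
    // Wraps a value in single quotes, escaping backslashes first, then single quotes.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    static String include(String field, String value) {
        return "(field " + field + " " + quote(value) + ")";
    }

    static String exclude(String field, String value) {
        return "(not " + field + ":" + quote(value) + ")";
    }

    // Combines clauses with (and ...); a single clause is returned unchanged.
    static String allOf(List<String> clauses) {
        return clauses.size() == 1 ? clauses.get(0) : "(and " + String.join(" ", clauses) + ")";
    }

    // URL-encodes the user query (q) and the facet filter (bq) for the search request.
    static String queryString(String userQuery, String bq) {
        return "q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&bq=" + URLEncoder.encode(bq, StandardCharsets.UTF_8);
    }
}
```

For example, selecting the category "Children's books" yields (field f_categories 'Children\'s books'); without the backslash the embedded quote would end the string early.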




Relevancy Ranking
!   In CloudSearch, this is done with Rank Expressions
       •  Affect relevancy using document-quality data, such as:
                 •    Document Statistics
                 •    Ratings
                 •    Link Counting
                 •    Editorial Comments
                 •    Popularity
!   Expressions are very flexible
       •  Arithmetic operators, the ?: conditional, and math functions such as log10() are available

Relevancy Ranking for Wikipedia
                                            	
   Title                                   content_size     clog    cboost  title_size     tlog     tboost  text_relevance     FINAL
   Germany                                        65253    4.815    192.58           7    0.845    -12.676             572    751.90
   Outline of Germany                             14238    4.153    166.14          18    1.255    -18.829             601    748.30
   History of Germany                             74750    4.874    194.94          30    1.477    -22.157             574    746.78
   British Army Germany rugby union team           2201    3.343    133.70          37    1.568    -23.523             589    699.18
   New Germany                                      337    2.528    101.11          11    1.041    -15.621             598    683.48
   Embassy of Germany in Moscow                     516    2.713    108.51          28    1.447    -21.707             596    682.79

   clog = log10(content_size); cboost = clog*40.0; tlog = log10(title_size); tboost = -(tlog*15.0)
   FINAL = text_relevance + cboost + tboost

       RANK_EXPRESSION =
          text_relevance + log10(content_size)*40.0 - log10(title_size)*15.0
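Evaluating the expression outside CloudSearch is a quick way to sanity-check a table like the one above. This is a minimal sketch: WikipediaRank is hypothetical, and text_relevance has to be taken from CloudSearch's own scoring, since it cannot be recomputed locally.

```java
final class WikipediaRank {
    // text_relevance + log10(content_size)*40.0 - log10(title_size)*15.0
    // A size of 0 would give log10(0) = -Infinity in Java.
    static double score(double textRelevance, long contentSize, long titleSize) {
        return textRelevance + Math.log10(contentSize) * 40.0 - Math.log10(titleSize) * 15.0;
    }
}
```

For the Germany row, score(572, 65253, 7) evaluates to about 751.9, matching the FINAL column: long articles get a logarithmic boost, and long titles (which tend to be narrower sub-topics) are penalized.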

Relevancy Ranking for Wikipedia:
De-Weighting “Wikipedia:” Types
!   “Wikipedia:” docs not of general interest
       •  About the running and managing of Wikipedia
!   Often very large
       •  Skews the statistics

RANK_EXPRESSION (adjusted) =
   text_relevance
   + log10(content_size) * ( doc_boost == 1 ? 25.0:40.0 )
   - log10(title_size)*15.0
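The adjusted expression differs only in the content-size weight. This is a minimal sketch, assuming doc_boost is an integer field that the indexer sets to 1 for "Wikipedia:" pages (the slide does not show how it is populated); the class name is hypothetical.

```java
final class AdjustedWikipediaRank {
    // text_relevance + log10(content_size) * (doc_boost == 1 ? 25.0 : 40.0) - log10(title_size)*15.0
    static double score(double textRelevance, long contentSize, long titleSize, int docBoost) {
        double contentWeight = (docBoost == 1) ? 25.0 : 40.0;  // weaker size boost for "Wikipedia:" pages
        return textRelevance + Math.log10(contentSize) * contentWeight - Math.log10(titleSize) * 15.0;
    }
}
```

With doc_boost = 1, a page of 65,253 characters loses 15 × log10(65253), about 72.2 points, relative to an ordinary article with the same text relevance and title size.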
Adding the Sizzle

   BUILDING THE USER INTERFACE

Wikipedia Search UI Architecture
   Tomcat
   Twigkit
   CloudSearch Platform
   CloudSearch Java API → Amazon CloudSearch




UI Architecture
!   Tomcat
       •  Java application container
! Twigkit
       •  Graphical user interface templates
       •  Handles navigators, controller events, presentation
!   CloudSearch Platform
       •  API Translation Interface between Twigkit and CloudSearch API
!   CloudSearch Java API
       •  Manages all communications to/from CloudSearch
       •  Parameter construction / results parsing
Let’s Wrap It Up!

   SUMMARY

Summary – Problems & Solutions
!   Problem: Data Acquisition
       •  Solution: Content Processing Framework (Aspire)
!   Problem: Data Processing
       •  Solution: Content Processing Framework (Aspire)
!   Problem: Indexing
       •  Solution: CloudSearch SDF – Very easy to work with
!   Problem: Query
       •  Solution: CloudSearch Query Parameters & Rank Expressions
!   Problem: User Interface
       •  Solution: New CloudSearch Platform for Twigkit
Q&A

                                                Enter questions on your screen


Thank You


                                        For More Information:
                                                  http://aws.amazon.com/cloudsearch/

                    http://www.searchtechnologies.com/wikipedia-cloudsearch.html




© 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

More Related Content

What's hot

re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...
re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...
re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...Adrian Hornsby
 
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's GuidePardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's GuidePardot
 
Introduction to Artificial Intelligence on AWS
Introduction to Artificial Intelligence on AWSIntroduction to Artificial Intelligence on AWS
Introduction to Artificial Intelligence on AWSAmazon Web Services
 
Demystifying Machine Learning On AWS - AWS Summit Sydney 2018
Demystifying Machine Learning On AWS - AWS Summit Sydney 2018Demystifying Machine Learning On AWS - AWS Summit Sydney 2018
Demystifying Machine Learning On AWS - AWS Summit Sydney 2018Amazon Web Services
 
Introduction to Artificial Intelligence (AI) at Amazon
Introduction to Artificial Intelligence (AI) at Amazon Introduction to Artificial Intelligence (AI) at Amazon
Introduction to Artificial Intelligence (AI) at Amazon Amanda Mackay (she/her)
 
Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...
Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...
Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...Amazon Web Services
 
Add Intelligence to Applications with AWS ML: Machine Learning Workshops SF
Add Intelligence to Applications with AWS ML: Machine Learning Workshops SFAdd Intelligence to Applications with AWS ML: Machine Learning Workshops SF
Add Intelligence to Applications with AWS ML: Machine Learning Workshops SFAmazon Web Services
 
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...Amazon Web Services
 
AI & Deep Learning At Amazon - April 2017 AWS Online Tech Talks
AI & Deep Learning At Amazon - April 2017 AWS Online Tech TalksAI & Deep Learning At Amazon - April 2017 AWS Online Tech Talks
AI & Deep Learning At Amazon - April 2017 AWS Online Tech TalksAmazon Web Services
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch ServiceAmazon Web Services Japan
 
AI and Innovations on AWS
AI and Innovations on AWSAI and Innovations on AWS
AI and Innovations on AWSAdrian Hornsby
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAmazon Web Services
 
SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...
SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...
SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...Amazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 
Seo Bootcamp for Small Buisinesses
 Seo Bootcamp for Small Buisinesses Seo Bootcamp for Small Buisinesses
Seo Bootcamp for Small BuisinessesCharlie Kalech
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...TAUS - The Language Data Network
 
DAT203_Running MySQL Databases on AWS
DAT203_Running MySQL Databases on AWSDAT203_Running MySQL Databases on AWS
DAT203_Running MySQL Databases on AWSAmazon Web Services
 

What's hot (20)

re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...
re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...
re:Invent re:Cap - An overview of Artificial Intelligence and Machine Learnin...
 
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's GuidePardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
 
Deep Learning Summit (DLS01-4)
Deep Learning Summit (DLS01-4)Deep Learning Summit (DLS01-4)
Deep Learning Summit (DLS01-4)
 
Introduction to Artificial Intelligence on AWS
Introduction to Artificial Intelligence on AWSIntroduction to Artificial Intelligence on AWS
Introduction to Artificial Intelligence on AWS
 
Demystifying Machine Learning On AWS - AWS Summit Sydney 2018
Demystifying Machine Learning On AWS - AWS Summit Sydney 2018Demystifying Machine Learning On AWS - AWS Summit Sydney 2018
Demystifying Machine Learning On AWS - AWS Summit Sydney 2018
 
Introduction to Artificial Intelligence (AI) at Amazon
Introduction to Artificial Intelligence (AI) at Amazon Introduction to Artificial Intelligence (AI) at Amazon
Introduction to Artificial Intelligence (AI) at Amazon
 
Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...
Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...
Automate for Efficiency with Amazon Transcribe & Amazon Translate: Machine Le...
 
Add Intelligence to Applications with AWS ML: Machine Learning Workshops SF
Add Intelligence to Applications with AWS ML: Machine Learning Workshops SFAdd Intelligence to Applications with AWS ML: Machine Learning Workshops SF
Add Intelligence to Applications with AWS ML: Machine Learning Workshops SF
 
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
 
AI & Deep Learning At Amazon - April 2017 AWS Online Tech Talks
AI & Deep Learning At Amazon - April 2017 AWS Online Tech TalksAI & Deep Learning At Amazon - April 2017 AWS Online Tech Talks
AI & Deep Learning At Amazon - April 2017 AWS Online Tech Talks
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
 
The basics of seo
The basics of seoThe basics of seo
The basics of seo
 
An Introduction to Amazon AI
An Introduction to Amazon AIAn Introduction to Amazon AI
An Introduction to Amazon AI
 
AI and Innovations on AWS
AI and Innovations on AWSAI and Innovations on AWS
AI and Innovations on AWS
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
 
SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...
SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...
SEC303 Top 10 AWS Identity and Access Management Best Practices - AWS re:Inve...
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 
Seo Bootcamp for Small Buisinesses
 Seo Bootcamp for Small Buisinesses Seo Bootcamp for Small Buisinesses
Seo Bootcamp for Small Buisinesses
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Yu Gong, Adobe, 23 Ap...
 
DAT203_Running MySQL Databases on AWS
DAT203_Running MySQL Databases on AWSDAT203_Running MySQL Databases on AWS
DAT203_Running MySQL Databases on AWS
 

Building Better Search For Wikipedia: How We Did It Using Amazon CloudSearch - Webinar

  • 1. Building Better Search for Wikipedia: How We Did It Using Amazon CloudSearch
      July 26, 2012
      © 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Speakers
      ▪ Paul Nelson, CTO, Search Technologies
      ▪ Michael Bohlig, Marketing Manager, Amazon CloudSearch
      ▪ Jon Handler, Solutions Architect, Amazon CloudSearch
  • 3. Housekeeping items
      ▪ Polling questions
      ▪ Q&A will be at the end
      ▪ Recording and slides will be distributed and posted (SlideShare & YouTube)
  • 4. Agenda
      ▪ Amazon CloudSearch Overview
      ▪ Data Acquisition – Getting the Files from Wikipedia
      ▪ Data Processing – Clean-up and Preparation
      ▪ Indexing
      ▪ Queries and Relevancy Ranking
      ▪ Building the UI
      ▪ Final Results & Recommendations
      ▪ Q&A
  • 5. Amazon CloudSearch
      ▪ Fully managed, full-featured search service
      ▪ Automatically scales for data & traffic
      ▪ Handles both structured and unstructured data
      ▪ Near real-time indexing
      ▪ Up and running in less than 1 hour
  • 6. Polling Question #1: What are you using for search today?
  • 7. Introduction: Searching Wikipedia
  • 8. Why Wikipedia?
      ▪ It’s awesome
      ▪ The default Wikipedia search is pretty bad, and everyone knows it
      ▪ It’s publicly available data
      ▪ It’s awesome
  • 9. Why CloudSearch for Wikipedia?
      ▪ It’s awesome
      ▪ A great choice for a public search engine: it lives on the Internet
      ▪ First version up and running quickly
      ▪ Automatically scales to the required query volume
      ▪ Rank expressions work great for Wikipedia relevancy
      ▪ Easy search domain creation makes iterating on the system easy
  • 10. Let’s try it! http://wikipedia.searchtechnologies.com
  • 11. Data Acquisition: Getting the Files from Wikipedia
  • 12. Wikipedia Dump Files
      http://dumps.wikimedia.org/enwiki/latest/
      ▪ Desired files have the pattern: enwiki-latest-pages-articles#.xml-*.bz2
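The slide gives the file-name pattern only as a wildcard. As a minimal sketch, assuming "#" stands for a numeric part number and "*" for an arbitrary suffix (such as a page-ID range), a directory listing could be filtered like this; the class and method names are hypothetical, and the sample file names are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper that picks the article dump parts out of a directory listing.
final class DumpFileFilter {
    // Assumption: "#" in the slide's pattern is a numeric part number and "*"
    // is an arbitrary suffix (such as a page-ID range) before ".bz2".
    private static final Pattern ARTICLE_PART =
            Pattern.compile("enwiki-latest-pages-articles\\d+\\.xml-.*\\.bz2");

    static boolean isArticlePart(String fileName) {
        return ARTICLE_PART.matcher(fileName).matches();
    }

    static List<String> selectArticleParts(List<String> listing) {
        return listing.stream()
                .filter(DumpFileFilter::isArticlePart)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample names, for illustration only
        List<String> listing = List.of(
                "enwiki-latest-pages-articles1.xml-p1p41242.bz2",
                "enwiki-latest-pages-articles.xml.bz2",
                "enwiki-latest-pages-meta-history1.xml-p1p812.bz2");
        System.out.println(selectArticleParts(listing));
    }
}
```

Run as-is, only the first sample name survives the filter: the combined single-file dump and the history dump do not match the numbered-part pattern.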
  • 13. Our Solution
      [Diagram: Wikipedia dump files → Content Processing Framework (Fetch Files Listing → Identify Files to Fetch → Open File Stream → Article Processing → Send to CloudSearch) → Amazon CloudSearch]
  • 14. Content Processing Framework Advantages
      ▪ Process multiple files simultaneously
      ▪ Fully streaming
          • Files are never downloaded to local disk
          • From Wikipedia → streaming processor → CloudSearch
      ▪ Very fast (450 documents per second, end-to-end)
      ▪ Integrated connectors / web crawlers
          • SharePoint, Documentum, web sites, RDBMS, RightNow, Confluence, Salesforce.com, etc.
      ▪ Text extraction (from PDF, Office documents, etc.)
          • Using Apache Tika
      ▪ Entity extraction
          • Names, places, companies, dates, phone numbers, ZIP codes, etc.
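The "fully streaming" point can be illustrated with the JDK's StAX pull parser: pages are read one at a time from an InputStream and handed downstream as soon as each closing `</page>` tag is seen. This is a sketch, not the Content Processing Framework itself; the class names are hypothetical, and it assumes the input is already decompressed. In a real pipeline the HTTP stream would first be wrapped in a BZip2 decompressor (for example Apache Commons Compress's `BZip2CompressorInputStream`, which is not part of the JDK).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical streaming splitter: pulls <page> elements one at a time from any
// InputStream, so a dump file is never buffered whole or written to disk.
final class PageStreamer {
    static final class Page {
        final String title;
        final String text;
        Page(String title, String text) { this.title = title; this.text = text; }
    }

    static void streamPages(InputStream in, Consumer<Page> sink) {
        try {
            XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(in, "UTF-8");
            try {
                String title = null;
                String text = null;
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = r.getLocalName(); // ignores the dump's XML namespace
                        if (name.equals("page")) { title = null; text = null; }
                        else if (name.equals("title")) title = r.getElementText();
                        else if (name.equals("text")) text = r.getElementText();
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && r.getLocalName().equals("page")) {
                        // One complete page: hand it downstream immediately
                        sink.accept(new Page(title, text == null ? "" : text));
                    }
                }
            } finally {
                r.close(); // closes the parser, not the underlying stream
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed dump XML", e);
        }
    }

    // Convenience wrapper for small in-memory inputs
    static List<Page> readAll(String xml) {
        List<Page> pages = new ArrayList<>();
        streamPages(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), pages::add);
        return pages;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page><title>Alpha</title><revision><text>First</text></revision></page>"
                + "<page><title>Beta</title><revision><text>Second</text></revision></page></mediawiki>";
        for (Page p : readAll(xml)) System.out.println(p.title + ": " + p.text);
    }
}
```

Because the sink is a callback, several files could be streamed in parallel by running `streamPages` on separate threads, each feeding the same downstream indexer.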
  • 15. Polling Question #2: Where is your data stored?
  • 16. Data Processing: Preparing the Data for Search
  • 17. What Do Wikipedia Files Look Like?
      [Image: sample Wikipedia data]
  • 18. Data Processing: Basic Requirements
      ▪ Decompression: BZip2 → UTF-8
      ▪ Process each page as a separate CloudSearch document
          • Multiple pages are specified in a single XML file
      ▪ Skip #REDIRECT pages
      ▪ Compute document statistics
          • Necessary for relevancy ranking
          • Includes content size, title size, and number of outbound links
          • (FUTURE: number of inbound links)
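A minimal sketch of the redirect check and the per-page statistics listed above. The exact definitions the presenters used are not given, so several choices here are assumptions: sizes are measured in characters, and every `[[...]]` wiki link counts as outbound, including category and file links. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-page statistics that a rank expression could later use.
final class PageStats {
    // Assumption: every [[...]] wiki link counts as outbound, including
    // category and file links; a production version would filter those.
    private static final Pattern WIKI_LINK = Pattern.compile("\\[\\[[^\\[\\]]+\\]\\]");

    final int contentSize;   // characters of wikitext
    final int titleSize;     // characters of title (assumption: not words)
    final int outboundLinks;

    private PageStats(int contentSize, int titleSize, int outboundLinks) {
        this.contentSize = contentSize;
        this.titleSize = titleSize;
        this.outboundLinks = outboundLinks;
    }

    // Redirect pages start with #REDIRECT; matched case-insensitively here
    static boolean isRedirect(String wikitext) {
        String marker = "#REDIRECT";
        return wikitext.stripLeading().regionMatches(true, 0, marker, 0, marker.length());
    }

    static PageStats compute(String title, String wikitext) {
        Matcher m = WIKI_LINK.matcher(wikitext);
        int links = 0;
        while (m.find()) {
            links++;
        }
        return new PageStats(wikitext.length(), title.length(), links);
    }

    public static void main(String[] args) {
        String text = "'''Alpha''' is the first letter; see [[Greek alphabet]] and [[Beta|beta]].";
        if (!isRedirect(text)) {
            PageStats s = compute("Alpha", text);
            System.out.println(s.contentSize + " chars, " + s.outboundLinks + " links");
        }
    }
}
```

These numbers would then be sent as integer fields alongside the text, so that a rank expression can combine them with the text relevance score.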
  • 19. Data Processing: Advanced Feature Support
      - Extract Categories
      - Extract Author (IP address or author name)
      - Extract Update Date
      - Extract Document Type
          - Wikipedia “namespace”, based on the title prefix
      - Determine Disambiguation Pages
          - Based on certain Wikipedia {{templates}}
          - Template whitelist and blacklist
      - Produce Static Teaser
      [Figure: teaser text before and after processing]
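Three of these features (categories, document type from the namespace prefix, and disambiguation detection with a template whitelist and blacklist) can be sketched with regular expressions over the raw wikitext. The namespace list and both template lists below are hypothetical placeholders, since the slide does not name them. The rule that a blacklisted template vetoes a whitelisted one is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageFeatures {
    // [[Category:Name]] or [[Category:Name|sort key]]
    private static final Pattern CATEGORY =
            Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");
    // Captures the template name from {{name}} or {{name|args...}}
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{([^{}|]+)");
    // Hypothetical namespace prefixes recognized as document types.
    private static final Set<String> NAMESPACES =
            Set.of("Wikipedia", "Category", "Template", "File", "Portal", "Help");
    // Hypothetical lists; the slides do not give the real template names.
    private static final Set<String> DISAMBIG_WHITELIST = Set.of("disambig", "disambiguation");
    private static final Set<String> DISAMBIG_BLACKLIST = Set.of("not-a-disambig-example");

    static List<String> categories(String wikitext) {
        List<String> out = new ArrayList<>();
        Matcher m = CATEGORY.matcher(wikitext);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    /** "Wikipedia:Sandbox" -> "Wikipedia"; titles without a known prefix are "Article". */
    static String docType(String title) {
        int colon = title.indexOf(':');
        if (colon > 0 && NAMESPACES.contains(title.substring(0, colon))) {
            return title.substring(0, colon);
        }
        return "Article";
    }

    /** A whitelisted template marks a disambiguation page unless a blacklisted one vetoes it. */
    static boolean isDisambiguation(String wikitext) {
        boolean whitelisted = false;
        Matcher m = TEMPLATE.matcher(wikitext);
        while (m.find()) {
            String name = m.group(1).trim().toLowerCase(Locale.ROOT);
            if (DISAMBIG_BLACKLIST.contains(name)) {
                return false;
            }
            if (DISAMBIG_WHITELIST.contains(name)) {
                whitelisted = true;
            }
        }
        return whitelisted;
    }
}
```

Checking the prefix against a known list, rather than splitting on any colon, keeps ordinary titles such as "Blade Runner: The Final Cut" classified as articles.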
  • 20. Sending Documents to CloudSearch – INDEXING
  • 21. CloudSearch: Document ID
      - @id uniquely identifies every document in the index
          - Must be made up of letters and digits (no spaces or punctuation)
      <batch>
        <add lang="en" version="5438086" id="wikipedia930503">
          . . . FIELDS GO HERE . . .
        </add>
      </batch>
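An ID such as `wikipedia930503` is a source prefix followed by the Wikipedia page id. Below is a minimal sketch of building one. Lower-casing the prefix and keeping only ASCII letters and digits is an assumption that stays on the safe side of the "letters and digits only" rule. `DocIds` is a hypothetical name.

```java
import java.util.Locale;

final class DocIds {
    /** Builds an ID like "wikipedia930503" from a source name and a non-negative page id. */
    static String docId(String source, long pageId) {
        if (pageId < 0) {
            throw new IllegalArgumentException("page id must be non-negative");
        }
        // Drop anything that is not a lower-case ASCII letter or digit after lower-casing.
        String prefix = source.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
        return prefix + pageId;
    }
}
```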
  • 22. CloudSearch: Document Version
      - @version identifies the most recent version of a document
          - Integer number; must always increase
          - Updates or deletes to the same doc ID must have a larger @version
          - My Formula: (System.currentTimeMillis() - 1325394000000)/1000
      - Why does it exist?
          - So that multiple processes can submit updates simultaneously
          - Updates processed quickly are not overwritten by older updates processed slowly
      <batch>
        <add lang="en" version="5438086" id="wikipedia930503">
          . . . FIELDS GO HERE . . .
        </add>
      </batch>
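The formula on the slide counts seconds since a fixed epoch. 1325394000000 ms is 2012-01-01T05:00:00Z, which is midnight on 1 January 2012 in US Eastern time. Below is a sketch of that formula with the clock reading passed in, so it can be tested. One caveat follows from the integer division: two updates to the same document within the same second get the same version, so the second one would not have a strictly larger `@version`.

```java
final class DocVersions {
    // 2012-01-01T05:00:00Z (midnight US Eastern) in epoch milliseconds, as in the slide's formula.
    static final long EPOCH_MILLIS = 1_325_394_000_000L;

    /** Whole seconds elapsed since EPOCH_MILLIS; increases as long as the clock does. */
    static long versionAt(long epochMillis) {
        return (epochMillis - EPOCH_MILLIS) / 1000;
    }

    static long currentVersion() {
        return versionAt(System.currentTimeMillis());
    }
}
```

Because versions are plain seconds since 2012, independent indexer processes agree on ordering without coordinating, provided their clocks are reasonably in sync.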
  • 23. CloudSearch Indexing Details
      - Form fields into CloudSearch SDF
      - Submit to CloudSearch in batches
      - Keep multiple connections to CloudSearch open
      - Co-locate the indexer on an EC2 instance in the same zone as CloudSearch
          - Several times better performance
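"Submit in batches" means packing many `<add>` operations into one `<batch>` request while staying under the service's request-size limit. Below is a sketch that groups already-serialized operations by UTF-8 byte size. The limit is a parameter rather than a hard-coded number, so check the service documentation for the actual batch limit. `SdfBatcher` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SdfBatcher {
    private static final String OPEN = "<batch>";
    private static final String CLOSE = "</batch>";

    /** Groups serialized <add>/<delete> elements into <batch> documents of at most maxBytes. */
    static List<String> batches(List<String> ops, int maxBytes) {
        int overhead = bytes(OPEN) + bytes(CLOSE);
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int size = overhead;
        for (String op : ops) {
            int n = bytes(op);
            if (n + overhead > maxBytes) {
                throw new IllegalArgumentException("single document exceeds batch limit");
            }
            if (size + n > maxBytes) {
                // Adding this op would overflow: emit the current batch and start a new one.
                out.add(OPEN + current + CLOSE);
                current.setLength(0);
                size = overhead;
            }
            current.append(op);
            size += n;
        }
        if (current.length() > 0) {
            out.add(OPEN + current + CLOSE);
        }
        return out;
    }

    private static int bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Each resulting batch can then be posted over one of several open connections, in line with the slide's advice.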
  • 24. CloudSearch SDF for Indexing
      <batch>
        <add lang="en" version="5438086" id="wikipedia930503">
          <field name="title">Terran Trade Authority</field>
          <field name="title_size">22</field>
          <field name="content">
            The 'Terran Trade Authority' is a science-fiction setting originally presented in a collection of four large illustrated science…
          </field>
          <field name="content_size">893</field>
          <field name="teaser">
            The 'Terran Trade Authority' is a science-fiction setting originally presented in a collection of four large illustrated science fiction books published between 1978 and…
          </field>
          <field name="url">http://en.wikipedia.org/wiki/Terran_Trade_Authority</field>
          <field name="type">Article</field>
          <field name="f_type">Article</field>
          <field name="year">2012</field>
          <field name="f_year">2012</field>
          <field name="year_month">2012/01</field>
          <field name="f_year_month">2012/01</field>
          <field name="categories">Science fiction book series</field>
          <field name="f_categories">Science fiction book series</field>
          <field name="author">76.173.50.22</field>
          <field name="f_author">76.173.50.22</field>
        </add>
      </batch>
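Wikipedia text is full of characters such as `&` and `<`, so generating SDF by string concatenation needs XML escaping. Below is a minimal sketch of serializing one `<add>` element. It assumes that a multi-valued field is expressed by repeating its `<field>` element. It does not strip characters that are illegal in XML 1.0, which a real pipeline would also need to handle. `SdfWriter` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

final class SdfWriter {
    /** Escapes the five XML special characters for use in text and attribute values. */
    static String escape(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&' -> b.append("&amp;");
                case '<' -> b.append("&lt;");
                case '>' -> b.append("&gt;");
                case '"' -> b.append("&quot;");
                case '\'' -> b.append("&apos;");
                default -> b.append(c);
            }
        }
        return b.toString();
    }

    /** fields maps a field name to its values; each value becomes one <field> element. */
    static String add(String id, long version, String lang, Map<String, List<String>> fields) {
        StringBuilder b = new StringBuilder();
        b.append("<add lang=\"").append(escape(lang))
         .append("\" version=\"").append(version)
         .append("\" id=\"").append(escape(id)).append("\">");
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            for (String v : e.getValue()) {
                b.append("<field name=\"").append(escape(e.getKey())).append("\">")
                 .append(escape(v)).append("</field>");
            }
        }
        return b.append("</add>").toString();
    }
}
```

Passing a `LinkedHashMap` keeps the fields in insertion order. The caller would add both the searchable field and its `f_` facet twin (for example `categories` and `f_categories`), as the example above does.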
  • 25. End-to-End Indexing Dataflow
      [Diagram: three chained pipelines]
      - Process Latest Listing Pipeline: Start → Fetch dumps.wikimedia.org/enwiki/latest/ (XHTML page) → Extract URLs (Groovy script) → 27 URLs to 27 dump files
      - Process File Pipeline (one sub-job per URL): Open Stream → compressed data stream → BZip2 Decompress → decompressed stream → XML Sub-Job Extractor → single <page> XML plus metadata
      - Process Page Pipeline: Extract Metadata and Cleanse Content (Groovy script) → cleansed metadata → XSL Transform → Post XML to Amazon CloudSearch → End
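The "XML Sub-Job Extractor" step turns one large decompressed dump stream into one unit of work per `<page>`. Below is a sketch of that idea using the JDK's StAX pull parser, so that the whole file is never held in memory. It assumes the decompressed stream is already available. The BZip2 step could wrap the HTTP stream with a library decompressor such as Apache Commons Compress's `BZip2CompressorInputStream`, which is not shown here. `PageStreamer` is a hypothetical name.

```java
import java.io.InputStream;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PageStreamer {
    /** Calls onPage(title, wikitext) once per <page> and returns the number of pages seen. */
    static int streamPages(InputStream xml, BiConsumer<String, String> onPage)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // dumps need no DTD processing
        XMLStreamReader r = factory.createXMLStreamReader(xml);
        int pages = 0;
        String title = null;
        String text = null;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        case "page" -> { title = null; text = null; }
                        // getElementText() consumes the element up to its END_ELEMENT.
                        case "title" -> title = r.getElementText();
                        case "text" -> text = r.getElementText();
                        default -> { }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getLocalName().equals("page")) {
                    onPage.accept(title, text == null ? "" : text);
                    pages++;
                }
            }
        } finally {
            r.close();
        }
        return pages;
    }
}
```

Each `onPage` call corresponds to one "single <page> XML plus metadata" item in the diagram. Redirect filtering and metadata extraction would happen inside that callback.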
  • 26. Providing good search results – QUERIES AND RELEVANCY
  • 27. Recommendation: Debug Interface
      - Useful tool for testing CloudSearch query behavior
      [Figure: Sample Debug Interface]
  • 28. Queries for Wikipedia
      - Uses the simple "q" parameter for the user query string
      - Selecting facets uses the "bq" parameter
          - To filter on a facet value: bq=(field name 'value')
          - To exclude a facet value: bq=(not name:'value')
          - Clauses can be combined with and / or
          - Don't forget to escape single quotes
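The facet syntax above is easy to get wrong when values contain quotes, for example a category named "Rock 'n' roll". Below is a sketch of building `bq` clauses from the two forms on the slide. It escapes single quotes with a backslash, as the slide advises. Escaping backslashes as well is an assumption, added so that a value ending in `\` cannot break the quoting. `BooleanQueries` is a hypothetical helper.

```java
import java.util.List;

final class BooleanQueries {
    /** Wraps a value in single quotes, backslash-escaping backslashes and single quotes. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Facet selection, in the slide's form: (field NAME 'value'). */
    static String include(String field, String value) {
        return "(field " + field + " " + quote(value) + ")";
    }

    /** Facet exclusion, in the slide's form: (not NAME:'value'). */
    static String exclude(String field, String value) {
        return "(not " + field + ":" + quote(value) + ")";
    }

    /** Combines clauses with "and" or "or"; a single clause is returned unchanged. */
    static String combine(String op, List<String> clauses) {
        if (clauses.isEmpty()) {
            throw new IllegalArgumentException("no clauses");
        }
        if (clauses.size() == 1) {
            return clauses.get(0);
        }
        return "(" + op + " " + String.join(" ", clauses) + ")";
    }
}
```

For instance, selecting the `Article` type while excluding one category yields `(and (field f_type 'Article') (not f_categories:'Rock \'n\' roll'))`.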
  • 29. Relevancy Ranking
      - In CloudSearch, this is done with Rank Expressions
          - Affect relevancy using document-quality data, such as:
              - Document Statistics
              - Ratings
              - Link Counting
              - Editorial Comments
              - Popularity
      - Expressions are very flexible
          - All types of mathematical functions available
  • 30. Relevancy Ranking for Wikipedia
      Title                                 content_size   clog  cboost  title_size   tlog   tboost  text_relevance   FINAL
      Germany                                      65253  4.815  192.58           7  0.845  -12.676             572  751.90
      Outline of Germany                           14238  4.153  166.14          18  1.255  -18.829             601  748.30
      History of Germany                           74750  4.874  194.94          30  1.477  -22.157             574  746.78
      British Army Germany rugby union team         2201  3.343  133.70          37  1.568  -23.523             589  699.18
      New Germany                                    337  2.528  101.11          11  1.041  -15.621             598  683.48
      Embassy of Germany in Moscow                   516  2.713  108.51          28  1.447  -21.707             596  682.79
      (clog = log10(content_size), cboost = clog × 40.0, tlog = log10(title_size), tboost = -tlog × 15.0, FINAL = text_relevance + cboost + tboost)
      RANK_EXPRESSION = text_relevance + log10(content_size)*40.0 - log10(title_size)*15.0
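The rank expression is ordinary arithmetic, so it can be checked offline against the table. Below is a sketch that evaluates it in Java. In CloudSearch the expression itself is evaluated by the service; this code only reproduces the math. `RankExpression` is a hypothetical name.

```java
final class RankExpression {
    /** text_relevance + log10(content_size)*40.0 - log10(title_size)*15.0 */
    static double score(int textRelevance, int contentSize, int titleSize) {
        // Both sizes must be positive: log10(0) is -Infinity.
        return textRelevance + Math.log10(contentSize) * 40.0 - Math.log10(titleSize) * 15.0;
    }
}
```

`score(572, 65253, 7)` for "Germany" and `score(601, 14238, 18)` for "Outline of Germany" land on the FINAL column to within rounding. This shows how the long "Germany" article overtakes a page with higher raw text relevance.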
  • 31. Relevancy Ranking for Wikipedia: De-Weighting “Wikipedia:” Types
      - “Wikipedia:” docs are not of general interest
          - They are about the running and managing of Wikipedia
      - Often very large
          - Skews the statistics
      RANK_EXPRESSION (adjusted) = text_relevance + log10(content_size) * (doc_boost == 1 ? 25.0 : 40.0) - log10(title_size)*15.0
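The adjusted expression only helps if `doc_boost` is set at indexing time. Below is a sketch of both halves. It assumes that `doc_boost` is 1 exactly for titles in the "Wikipedia:" namespace, which the slide implies but does not spell out. `AdjustedRank` is a hypothetical name.

```java
final class AdjustedRank {
    /** Assumed indexing-time rule: flag "Wikipedia:" namespace pages with doc_boost = 1. */
    static int docBoost(String title) {
        return title.startsWith("Wikipedia:") ? 1 : 0;
    }

    /** text_relevance + log10(content_size) * (doc_boost == 1 ? 25.0 : 40.0) - log10(title_size)*15.0 */
    static double score(int textRelevance, int contentSize, int titleSize, int docBoost) {
        double contentWeight = docBoost == 1 ? 25.0 : 40.0;
        return textRelevance + Math.log10(contentSize) * contentWeight - Math.log10(titleSize) * 15.0;
    }
}
```

Lowering the content-size weight from 40 to 25 costs a flagged page 15 × log10(content_size) points. For very large project pages, that is exactly the advantage the size term would otherwise give them.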
  • 32. Adding the Sizzle – BUILDING THE USER INTERFACE
  • 33. Wikipedia Search UI Architecture
      [Diagram: Tomcat → Twigkit → CloudSearch Platform → CloudSearch Java API → CloudSearch]
  • 34. UI Architecture
      - Tomcat
          - Java application container
      - Twigkit
          - Graphical user interface templates
          - Handles navigators, controller events, presentation
      - CloudSearch Platform
          - API translation interface between Twigkit and the CloudSearch API
      - CloudSearch Java API
          - Manages all communications to/from CloudSearch
          - Parameter construction / results parsing
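The "parameter construction" half of the CloudSearch Java API can be pictured as turning the `q` and `bq` values from slide 28 into a URL-encoded GET request. Below is a sketch under stated assumptions. The caller supplies the full search endpoint, including any API-version path. The `start`/`size` paging parameter names are assumptions about the 2011-era API. `SearchRequest` is a hypothetical name, not the actual library.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SearchRequest {
    /** Builds searchEndpoint?q=...&bq=...&start=...&size=... with each value URL-encoded. */
    static String url(String searchEndpoint, String q, String bq, int start, int size) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        if (bq != null && !bq.isEmpty()) {
            params.put("bq", bq); // facet filters; omitted when no facet is selected
        }
        params.put("start", Integer.toString(start)); // assumed paging parameter
        params.put("size", Integer.toString(size));   // assumed page-size parameter
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return searchEndpoint + "?" + query;
    }
}
```

Encoding every value matters here because `bq` strings contain parentheses, spaces, and quotes.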
  • 35. Let’s Wrap It Up! – SUMMARY
  • 36. Summary – Problems & Solutions
      - Problem: Data Acquisition
          - Solution: Content Processing Framework (Aspire)
      - Problem: Data Processing
          - Solution: Content Processing Framework (Aspire)
      - Problem: Indexing
          - Solution: CloudSearch SDF – Very easy to work with
      - Problem: Query
          - Solution: CloudSearch Query Parameters & Rank Expressions
      - Problem: User Interface
          - Solution: New CloudSearch Platform for Twigkit
  • 37. Q&A: Enter questions on your screen
  • 38. Thank You
      For More Information:
      - http://aws.amazon.com/cloudsearch/
      - http://www.searchtechnologies.com/wikipedia-cloudsearch.html