Using Sphinx
for Search
Mike Lively
Slickdeals, LLC
What is Sphinx?
• A full-text search engine
• Quickly get high quality (relevant) results
• Designed to integrate well with SQL RDBMS
• Can work with any data source
• Can be queried using either an API or SQL
How do I know anything
about Sphinx?
• Manager of Software Architecture for
Slickdeals.net
• Alexa top 150 site (in the US)
• Have been working at improving our Sphinx
search engine for the last 2 months or so.
• Over 7 million searches a month directly through
the interface; many more happen indirectly.
When should I use Sphinx?
• Site / Product / Document searches
• Auto-suggest / Auto-Correct functionality
• Finding relevant and related items
Simple Architecture
• Often, search is offloaded
straight to the database
• Search goes to the backend
which performs queries on the
database
• Obviously very easy to
implement
Simple Architecture
• Simple “starts with” searches
on indexed fields can
sometimes work: `city` LIKE
‘Las%’
• Anything else forces a full scan,
and with MyISAM's table-level
locking a long scan blocks writes
to that table.
• MySQL is not a great or
flexible full text engine
• It can sometimes be adequate
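Why a “starts with” search can use an index while a “contains” search cannot is easy to see in a small sketch. Here a sorted Python list stands in for a B-tree index on `city`; the city names and both helper functions are made up for illustration, and the comparison is case-sensitive, unlike most default MySQL collations.

```python
import bisect

# Hypothetical data: a sorted list stands in for a B-tree index on `city`.
cities = sorted(["Las Cruces", "Las Vegas", "Los Angeles", "Salinas", "Dallas"])


def prefix_search(sorted_values, prefix):
    """Range scan: jump to the first candidate, stop at the first non-match."""
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break  # sorted order guarantees nothing later can match
        matches.append(value)
    return matches


def substring_search(values, needle):
    """Sorted order is no help here: every value has to be inspected."""
    return [value for value in values if needle in value]


prefix_hits = prefix_search(cities, "Las")       # like: city LIKE 'Las%'
substring_hits = substring_search(cities, "las")  # like: city LIKE '%las%'
print(prefix_hits)
print(substring_hits)
```

The prefix search touches only the neighbouring entries in the sorted list, while the substring search has to visit every value, which is what a full table scan amounts to.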
Sphinx Architecture
• Searchd is responsible for
receiving requests from
clients and executing the
searches against the sphinx
index.
• Indexer is responsible for
getting data into the sphinx
index.
• This separation allows
indexing and searching to be
scaled separately.
Sphinx Architecture
• Searchd has a binary protocol
for which there are several
clients available in multiple
languages.
• Searchd also speaks MySQL’s
binary network protocol (the
one used since MySQL 4.1), so
MySQL clients can connect to it
• Searchd is a daemon that
runs on your search servers
Sphinx Architecture
• Indexer is a shell program that
you can execute to build any
number of indexes.
• Can handle index rotation for
live indexing
Not So Quick Side Note
MySQL IS SLOWWWWWWWWWWWWW
(at text matches)
Still Not Quick Side Note
Indexes won’t help you…
Quicker Side Note
Full Text Search isn’t so bad
IF….
Sphinx Concepts
• Sphinx indexes “documents”
• Each document has a unique, unsigned, non-zero
integer ID (32-bit or 64-bit)
• Each document has one or more fields
• Each document has zero or more attributes
Indexes / Sources
• Sphinx indexes are created from one or more
sources.
• The source can be a database, XML, or TSV
stream.
• You can use multiple sources
• This is useful for maintaining updated indexes
• Also used to implement a sphinx cluster
Sphinx Fields
• Fields are what the full-text index is built from.
• When searching you can search against any number
of fields.
• You can assign different relevancy weights to different
fields.
• The original value of a field is never stored by Sphinx.
• You should always have at least one.
Sphinx Attributes
• Data that further describes the item being
indexed
• Can be returned as a part of the search
• Useful for filtering and sorting results
• These are not a part of the full text index.
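The split between documents, fields, and attributes can be sketched with a toy in-memory index. This is only an illustration of the concepts on these slides, not how Sphinx is implemented; the class `TinyIndex`, its methods, and the sample data are all hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy model of the document / field / attribute split (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> set of document IDs
        self.attributes = {}              # document ID -> attribute values

    def add(self, doc_id, fields, attributes):
        if not isinstance(doc_id, int) or doc_id <= 0:
            raise ValueError("document ID must be a non-zero unsigned integer")
        if doc_id in self.attributes:
            raise ValueError("document ID must be unique")
        # Fields are broken into keywords; the original text is not kept.
        for text in fields.values():
            for word in text.lower().split():
                self.postings[word].add(doc_id)
        # Attributes are stored as-is so they can be returned, filtered, sorted.
        self.attributes[doc_id] = dict(attributes)

    def search(self, word, where=None, order_by="id"):
        ids = self.postings.get(word.lower(), set())
        rows = [{"id": doc_id, **self.attributes[doc_id]} for doc_id in ids]
        if where is not None:
            rows = [row for row in rows if where(row)]
        return sorted(rows, key=lambda row: row[order_by])


index = TinyIndex()
index.add(1, {"title": "Unit testing in PHP"}, {"created": 1300000000})
index.add(2, {"title": "Testing search engines"}, {"created": 1200000000})
index.add(3, {"title": "Sphinx search"}, {"created": 1400000000})

hits = index.search("testing", order_by="created")
recent = index.search("search", where=lambda row: row["created"] > 1300000000)
print(hits)
print(recent)
```

Note that the results carry the ID and the attributes but never the title text, which matches the point that the original value of a field is not stored.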
MySQL Full Text Search
• You can get away with MyISAM tables or, as of
MySQL 5.6, InnoDB.
• You don’t care about morphology (think plurals)
• You don’t need anything but the most basic of
search operators
Creating An Index
• We are going to add an index that sources a
MySQL database.
• The data being sourced is a list of the titles of
Wikipedia posts.
Creating An Index
Indexer Configuration
• We are going to be peeking into a Sphinx
configuration file now.
• You can rebuild the config file by concatenating
each section into a single file.
• On my VM this file is located in
/usr/local/etc/sphinx.conf
Source Definition
Source Definition
Defines the connection information
Connection information
• Ideally, you should create a
separate account for sphinx
• You can also connect via Unix
socket
• I didn’t specify it here, but you
can also add a port.
Source Definition
The query that pulls data to populate the index
Source Index
• The index query MUST return
the id field as the first column
• Remember, the id needs to be
a unique, unsigned integer of
64 bits or fewer
• The query must be on a single
line unless you escape
newlines with backslashes.
• Notice that we converted the
timestamp into a Unix
timestamp. That is important:
Sphinx timestamp attributes
hold Unix timestamps.
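The slide's actual query is not part of this transcript, so the following is only a sketch of the two rules above: it turns a readable multi-line SQL string into a backslash-continued `sql_query` value, with the timestamp converted via MySQL's `UNIX_TIMESTAMP()`. The helper `to_conf_value` and the table and column names (`articles`, `created_at`) are assumptions for illustration.

```python
def to_conf_value(directive, sql):
    """Render multi-line SQL as a backslash-continued config directive."""
    lines = [line.strip() for line in sql.strip().splitlines() if line.strip()]
    # Every line except the last ends with " \" so the value continues.
    body = " \\\n    ".join(lines)
    return f"{directive} = {body}"


# Hypothetical table and columns; the id is the first column selected.
query = """
SELECT id, title, UNIX_TIMESTAMP(created_at) AS created_at
FROM articles
"""

conf_value = to_conf_value("sql_query", query)
print(conf_value)
```

The printed text can be pasted into the source section of the configuration file as-is.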
Source Definition
How data is stored in the index
Source Fields
• The first column in the query is
always the ID.
• You specify any columns that
are attributes.
• Remember, attributes are
values stored with the index
that can be used to filter and
sort.
• Any column besides the id that
is not declared as an attribute
is assumed to be a full-text
field (here, title)
Index Definition
Index Definition
• An Index includes one or
more sources.
• Each source gets its own
“source” line
• Multiple sources must all
define the same fields and
attributes.
• The ids need to be unique
across sources
Index Definition
• path is not actually a path, it’s
a filename with no extension.
• docinfo dictates if attributes
are stored in the index or
outside of the index.
• dict is not really important
now. It used to be either crc or
keywords; crc is now
deprecated.
• min_word_len is the minimum
length of words to index
Rest of the Index Configuration
It’s time to build the index
indexer <index name>
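If the build is scripted, the command line can be assembled along these lines. This is a sketch: `indexer_command` and the index name `wiki_titles` are hypothetical, while `--config` and `--rotate` are options of the `indexer` program (the latter is what makes rotation of a live index possible).

```python
import shlex


def indexer_command(index_name, config_path=None, rotate=False):
    """Build the argument list for one indexer run without executing it."""
    argv = ["indexer"]
    if config_path:
        argv += ["--config", config_path]
    if rotate:
        # Ask a running searchd to swap in the new index files when done.
        argv.append("--rotate")
    argv.append(index_name)
    return argv


cmd = indexer_command("wiki_titles", "/usr/local/etc/sphinx.conf", rotate=True)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the build on a machine where Sphinx is installed.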
Searching the Index
• searchd is the daemon that searches the index
• Binary protocol
OR
• MySQL-compatible too!
searchd config
Included in the same config file as the rest
Spinning up searchd
“I know MySQL” - Sphinx
MySQL Compatible
MySQL Compatible
• Tables == Indexes
• SHOW TABLES shows indexes.
• SELECT * FROM <index> works too.
Selecting from an index
Querying Indexes
• Default limit of 20 rows
• Notice the text fields are not
returned…
• They would be if we made
them attributes
(sql_field_string)
Querying Indexes
• The magic function in
SphinxQL is match()
• match() performs a full text
search against the entire
index…usually
• The ‘@field’ operator can
isolate which field is searched
on.
Querying Indexes
• You can query against
attributes
• You can sort results
• You can use the weight()
function to determine
relevancy.
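The query slides themselves are not in this transcript, so here is a hedged sketch of how such a query might be assembled: `MATCH()` with an optional `@field` operator, sorted by `WEIGHT()`. The helper `build_search`, the index name, and the alias `relevance` are assumptions; the `%s` placeholders follow the paramstyle of common Python MySQL drivers, which fill them in client-side.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_search(index, terms, field=None, limit=20):
    """Return (sql, params) for a relevance-sorted full-text query."""
    # Names cannot be bound as parameters, so validate them strictly.
    if not _IDENTIFIER.match(index):
        raise ValueError(f"unsafe index name: {index!r}")
    if field is not None and not _IDENTIFIER.match(field):
        raise ValueError(f"unsafe field name: {field!r}")
    # "@title testing" restricts the match to the title field.
    match_expr = f"@{field} {terms}" if field else terms
    sql = (
        f"SELECT id, WEIGHT() AS relevance FROM {index} "
        "WHERE MATCH(%s) ORDER BY relevance DESC LIMIT %s"
    )
    return sql, (match_expr, int(limit))


sql, params = build_search("wiki_titles", "testing", field="title", limit=100)
print(sql)
print(params)
```

The explicit `LIMIT` matters because of the default limit of 20 rows mentioned earlier.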
Querying Indexes
• The 25387283 title was more
relevant because it matched
on the term “testing”
Getting PHP into the mix
• All we need? PDO.
• We will build a basic search page
• Accepts a query, displays up to 100 matching
results by relevancy with the matching keywords
highlighted.
Pulling data from Sphinx
Fetching the data from MySQL
Adding the fancy yellow highlighting
The rest is pretty basic…
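The PHP code from these slides did not survive in the transcript. As a rough sketch of the last two steps, sketched in Python: put the database rows back into the relevance order that the search returned, then wrap matching keywords in `<mark>` (which most browsers render with a yellow background by default). Both function names and the sample rows are hypothetical.

```python
import html
import re


def order_rows(ranked_ids, rows):
    """Put database rows back into the relevance order the search returned."""
    by_id = {row["id"]: row for row in rows}
    return [by_id[doc_id] for doc_id in ranked_ids if doc_id in by_id]


def highlight(text, keywords):
    """HTML-escape text and wrap whole-word keyword matches in <mark>."""
    words = [re.escape(word) for word in keywords if word]
    if not words:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches and the match itself separately.
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


ranked_ids = [7, 3]  # IDs in relevance order, as the search would return them
rows = [
    {"id": 3, "title": "Load testing & tuning"},
    {"id": 7, "title": "Unit Testing"},
]
rendered = [highlight(row["title"], ["testing"]) for row in order_rows(ranked_ids, rows)]
print(rendered)
```

Escaping before inserting the tags keeps user-supplied titles from injecting markup into the results page.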
Cool things we would talk about
if I had like…3 more hours
• Auto-suggest, Auto-correct
• More on lemmatization and stemming
• Distributed Sphinx Clustering
• Delta indexes
• Real Time Indexes
• The plethora of operators you can use
• Ranged Queries
• ………
Additional Information
• The sphinx documentation is actually pretty
great
• http://sphinxsearch.com/docs/
• Slides are already on SlideShare
• Will link them to the meetup shortly
Questions?

FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Using Sphinx for Search in PHP

  • 1. Using Sphinx for Search Mike Lively Slickdeals, LLC
  • 2. What is Sphinx? • A full-text search engine • Quickly get high quality (relevant) results • Designed to integrate well with SQL RDBMS • Can work with any data source • Can be queried using either an API or SQL
  • 3. How do I know anything about Sphinx? • Manager of Software Architecture for Slickdeals.net • Alexa top 150 site (in the US) • Have been working at improving our Sphinx search engine for the last 2 months or so. • Over 7 Million searches a month directly through the interface, lots more happen indirectly.
  • 4. When should I use Sphinx? • Site / Product / Document searches • Auto-suggest / Auto-Correct functionality • Finding relevant and related items
  • 5. Simple Architecture • Often, search is offloaded straight to the database • Search goes to the backend which performs queries on the database • Obviously very easy to implement
  • 6. Simple Architecture • Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE 'Las%' • Anything else will lock your database for writes with MyISAM. • MySQL is not a great or flexible full-text engine • It can sometimes be adequate
  • 7. Sphinx Architecture • Searchd is responsible for receiving requests from clients and executing the searches against the sphinx index. • Indexer is responsible for getting data into the sphinx index. • This separation allows indexing and searching to be scaled separately.
  • 8. Sphinx Architecture • Searchd has a binary protocol for which there are several clients available in multiple languages. • Searchd is also compatible with MySQL’s binary protocol (the version used since MySQL 4.1) • Searchd is a daemon that runs on your search servers
  • 9. Sphinx Architecture • Indexer is a shell program that you can execute to build any number of indexes. • Can handle index rotation for live indexing
  • 10. Not So Quick Side Note MySQL IS SLOWWWWWWWWWWWWW (at text matches)
  • 11. Still Not Quick Side Note Indexes won’t help you…
  • 12. Quicker Side Note Full Text Search isn’t so bad IF….
  • 13. Sphinx Concepts • Sphinx indexes “documents” • Each document has a unique, unsigned, non-zero integer ID (either 32-bit or 64-bit) • Each document has one or more fields • Each document has zero or more attributes
  • 14. Indexes / Sources • Sphinx indexes are created from one or more sources. • A source can be a database, an XML stream, or a TSV stream. • You can use multiple sources • This is useful for maintaining updated indexes • Also used to implement a Sphinx cluster
  • 15. Sphinx Fields • Fields are what make up the full-text index. • When searching you can search against any number of fields. • You can assign different relevancy weights to different fields. • The original value of a field is never stored by Sphinx. • An index should always have at least one field.
  • 16. Sphinx Attributes • Data that further describes the item being indexed • Can be returned as a part of the search results • Useful for filtering and sorting results • These are not a part of the full-text index.
  • 17. MySQL Full Text Search • You can get away with MyISAM tables or, as of MySQL 5.6, InnoDB. • You don’t care about morphology (think plurals) • You don’t need anything but the most basic of search operators
  • 18. Creating An Index • We are going to add an index that sources a MySQL database. • The data being sourced is a list of the titles of Wikipedia posts.
  • 20. Indexer Configuration • We are going to be peeking into a Sphinx configuration file now. • You can rebuild the config file by concatenating each section into a single file. • On my VM this file is located at /usr/local/etc/sphinx.conf
  • 22. Source Definition Defines the connection information
  • 23. Connection information • Ideally, you should create a separate database account for Sphinx • You can also connect via a Unix socket • I didn’t specify it here, but you can also add a port.
  • 24. Source Definition The query that pulls data to populate the index
  • 25. Source Index • The index query MUST return the id field as the first column • Remember, the id needs to be a unique, unsigned integer of 64 bits or fewer • The query must be on a single line, unless you escape the newlines with backslashes. • Notice that we converted the timestamp into a Unix timestamp. That is important.
  • 26. Source Definition How data is stored in the index
  • 27. Source Fields • The first column in the query is always the ID. • You specify any columns that are attributes. • Remember, attributes are values stored in the index that can be used for filtering and sorting. • Any column besides the ID that is not declared as an attribute is assumed to be a full-text field (here, title)
  • 29. Index Definition • An index includes one or more sources. • Each source gets its own “source” line • Multiple sources must all define the same fields and attributes. • The IDs need to be unique across sources
  • 30. Index Definition • path is not actually a directory path; it is a file name with no extension. • docinfo dictates whether attributes are stored in the index or outside of the index. • dict is not really important now. It used to be either crc or keywords, but crc is now deprecated. • min_word_len is the minimum length of words to index
  • 31. Rest of the Index Configuration
  • 32. It’s time to build the index: indexer <index name>
  • 33. Searching the Index • searchd is the daemon that searches the index • Binary Protocol OR MySQL Compatible too!
  • 34. searchd config Included in the same config file as the rest
  • 38. MySQL Compatible • Tables == Indexes • SHOW TABLES shows the indexes. • SELECT * FROM <index> works too.
  • 40. Querying Indexes • Default limit of 20 rows • Notice the text fields are not returned… • They would be if we made them attributes (sql_field_string)
  • 41. Querying Indexes • The magic function in SphinxQL is match() • match() performs a full text search against the entire index…usually • The ‘@field’ operator can isolate which field is searched on.
  • 42. Querying Indexes • You can query against attributes • You can sort results • You can use the weight() function to determine relevancy.
  • 43. Querying Indexes • The 25387283 title was more relevant because it matched on the term “testing”
  • 44. Getting PHP into the mix • All we need? PDO. • We will build a basic search page • Accepts a query, displays up to 100 matching results by relevancy with the matching keywords highlighted.
  • 47. Fetching the data from MySQL
  • 48. Adding the fancy yellow highlighting
  • 49. The rest is pretty basic…
  • 50. Cool things we would talk about if I had like…3 more hours • Auto-suggest, Auto-correct • More on lemmatization and stemming • Distributed Sphinx Clustering • Delta indexes • Real Time Indexes • The plethora of operators you can use • Ranged Queries • ………
  • 51. Additional Information • The Sphinx documentation is actually pretty great • http://sphinxsearch.com/docs/ • Slides are already on SlideShare • Will link them to the meetup shortly
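
The code slides for the search page (slides 44 to 49) are not part of this transcript, so here is a rough sketch of the same flow: take a user query, turn it into a match() expression (optionally limited with the ‘@field’ operator), ask for up to 100 results ordered by weight(), and highlight the matching keywords. The talk itself uses PHP with PDO; Python is used here only for illustration. The index name `wikipedia`, the helper names, the set of escaped characters, and the `<mark>` tag are all assumptions, not the author's code.

```python
import html
import re
from typing import Iterable, Optional

# Hypothetical SphinxQL statement: "wikipedia" is an assumed index name.
# %s is the parameter placeholder style used by common Python MySQL drivers;
# the statement would be sent over searchd's MySQL-compatible listener.
SEARCH_SQL = (
    "SELECT id, WEIGHT() AS relevance FROM wikipedia "
    "WHERE MATCH(%s) ORDER BY relevance DESC LIMIT 100"
)

# Characters that carry meaning in Sphinx's extended query syntax.
# This list is a conservative assumption, not an official reference.
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=<])')


def escape_match(text: str) -> str:
    """Backslash-escape query-syntax characters in user input."""
    return _SPECIAL.sub(r"\\\1", text)


def build_match(user_query: str, field: Optional[str] = None) -> str:
    """Build the text passed to MATCH(), optionally limited to one field."""
    escaped = escape_match(user_query.strip())
    return f"@{field} {escaped}" if field else escaped


def highlight(text: str, keywords: Iterable[str]) -> str:
    """HTML-escape text and wrap whole-word keyword hits in <mark> tags."""
    words = [re.escape(k) for k in keywords if k]
    if not words:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for m in pattern.finditer(text):
        # Escape the text between hits, then the hit itself inside <mark>.
        parts.append(html.escape(text[last:m.start()]))
        parts.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

In use, `build_match(query, "title")` would be bound as the single parameter of `SEARCH_SQL`, and `highlight(title, query.split())` applied to each title fetched from MySQL by ID (text fields are not returned by Sphinx unless they are also stored as attributes). This plain keyword highlighter ignores morphology, so a stemmed match such as "tests" for "testing" would not be marked.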