Indexing Stuff &&
Things with Sphinx and Perl
Houston Perl Mongers
May 8th, 2014
Hosted by cPanel, Inc.
Brett Estrade <estrabd@gmail.com>
Sphinx
● full text search indexer and daemon
● indexer - builds indexes
● searchd - services search requests
● very easy to install and configure
Sphinx Data Sources
● Directly from MySQL (MariaDB), PostgreSQL
○ Indexing data from arbitrary SQL
○ Excellent for fast reading of expensive JOINs
● XMLPipe2
○ General intermediate data understood by Sphinx
Search Interface
● Native protocol (e.g., Sphinx::Search)
● Supports MySQL protocol (4.1)
○ Subset of SQL supported is called SphinxQL
Sample config (slide images, callouts only): indexer data; named index for searchd; searchd config
Client Example - Sphinx::Search
● search term - empty string returns “all”
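The client code from this slide did not survive extraction, so here is a minimal sketch of what such a client might look like. It assumes a searchd listening on localhost port 9312 and an index name given on the command line; the helper name `matching_ids` is hypothetical. Note that searchd caps the result set unless limits are raised, so "all" means all matches up to that cap.

```perl
use strict;
use warnings;

# Run a query and return the matching document IDs.
# $client is expected to be a Sphinx::Search object (or anything
# offering the same Query and GetLastError methods).
sub matching_ids {
    my ($client, $term, $index) = @_;
    my $results = $client->Query($term, $index)
        or die 'Sphinx query failed: ' . $client->GetLastError . "\n";
    return map { $_->{doc} } @{ $results->{matches} || [] };
}

# Only talk to a real searchd when an index name is given on the
# command line, e.g.: perl search.pl auctions
if (@ARGV) {
    require Sphinx::Search;
    my $sph = Sphinx::Search->new;
    $sph->SetServer('localhost', 9312);    # assumed host and port

    # An empty search term matches every document in the index.
    print "$_\n" for matching_ids($sph, '', $ARGV[0]);
}
```

Passing the client object into the helper keeps the query logic separate from the connection setup, which also makes it easy to exercise without a running daemon.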
Search Results
Some Common Use Cases
● Rebuild index from database regularly
● Incrementally add to existing index
● Query Sphinx for DB primary keys, make DB call for related rows
● Query Sphinx for wanted data (no DB at all) == my use case
Real Life Examples
1. Indexing MariaDB
2. Filtering on string using CRC32
3. Creating sources w/Sphinx::XML::Pipe2
4. Dynamic config w/Sphinx::Config::Builder
Indexing MariaDB ~2.25 Million Rows
● Use case - saving eBay auction data in DB
● Providing search interface to it
● Demo run of indexer
How to Filter on Strings
● Requires CRC32 hashing (strings to ints)
● When indexing, use MySQL’s CRC32 function
● Use Perl’s String::CRC32 to encode string,
○ then set filter
In the client, use Perl’s String::CRC32 to encode the string to the same integer
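A minimal sketch of the client side of this trick, assuming the index was built with something like `CRC32(seller) AS seller_crc` in its SQL query and that column declared as `sql_attr_uint`. The attribute name `seller_crc` and both helper names are hypothetical.

```perl
use strict;
use warnings;
use String::CRC32 qw(crc32);

# Hash a string to the unsigned 32-bit integer that a standard CRC32
# (such as MySQL's CRC32() function) yields for the same bytes.
sub string_filter_value {
    my ($string) = @_;
    return crc32($string);
}

# Restrict results to documents whose integer attribute equals the
# hash of $string. $client is expected to be a Sphinx::Search object.
sub filter_on_string {
    my ($client, $attr, $string) = @_;
    $client->SetFilter($attr, [ string_filter_value($string) ]);
    return $client;
}
```

A call such as `filter_on_string($sph, 'seller_crc', 'some_seller')` before `Query` would then limit matches to that seller. Because CRC32 can collide, two different strings may occasionally map to the same integer, so exact string checks belong on the caller's side if that matters.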
Transforming Things to XMLPipe2
● XMLPipe2 is Sphinx’s generic data format
● Extract/Transform scripts -> XMLPipe2
● use Sphinx::XML::Pipe2; #’nuff said
Sample XMLPipe2 File
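The sample file from this slide was an image, so the sketch below shows the general shape of an XMLPipe2 document instead. It writes the XML by hand with core Perl to make the format visible; it does not use the Sphinx::XML::Pipe2 API, which handles this for you. The field and attribute names and the listing data are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build an XMLPipe2 document set.
#   $fields: arrayref of full-text field names
#   $attrs:  arrayref of [name, type] pairs
#   $docs:   arrayref of hashrefs with an integer 'id' plus values
sub xmlpipe2 {
    my ($fields, $attrs, $docs) = @_;
    my $xml = qq{<?xml version="1.0" encoding="utf-8"?>\n};
    $xml .= "<sphinx:docset>\n<sphinx:schema>\n";
    $xml .= qq{<sphinx:field name="$_"/>\n} for @$fields;
    $xml .= qq{<sphinx:attr name="$_->[0]" type="$_->[1]"/>\n} for @$attrs;
    $xml .= "</sphinx:schema>\n";
    for my $doc (@$docs) {
        $xml .= qq{<sphinx:document id="$doc->{id}">\n};
        for my $name (@$fields, map { $_->[0] } @$attrs) {
            next unless defined $doc->{$name};
            $xml .= "<$name>" . xml_escape($doc->{$name}) . "</$name>\n";
        }
        $xml .= "</sphinx:document>\n";
    }
    return $xml . "</sphinx:docset>\n";
}

# Hypothetical listing data, for illustration only.
print xmlpipe2(
    ['title'],
    [ [ 'price_cents', 'int' ], [ 'seller_crc', 'int' ] ],
    [ { id => 1, title => 'Vintage camera & lens',
        price_cents => 4999, seller_crc => 3421780262 } ],
);
```

The schema block declares which child elements are full-text fields and which are attributes; each document then carries a unique integer id and one element per declared name.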
Sample XMLPipe2 Source Conf Entry
Example XMLPipe2 Use Case
● Monitor ephemera, e.g., active eBay listings
● Don’t want to use a database
● Many data partitions (i.e., indexes)
○ e.g., by store, by category, etc.
○ > 250 (yikes!)
● Data partitions change over time (slowly)
Dynamic Indexing of XMLPipe2 Stuff
● Fact - Sphinx partitions data by indexes
● Problem - each index uses its own data file
○ data as XMLPipe2
● Challenge - how to manage a changing set
of indexes?
Sphinx’s --config to the Rescue!
● Config files are typically static, right?
● Sphinx can run an executable config passed via --config (a file starting with a #! line) and read the generated config from its output
● indexer --config ./generate-config.pl --all
Sphinx::Config::Builder
● Module I created specifically for this case
○ uploaded to CPAN
● Why? No Sphinx config builders were a fit
● Module is low level and does what I need
○ i.e., dynamically builds an XMLPipe2-specific config
● A+ 100 Passing
○ http://cpantesters.org/distro/S/Sphinx-Config-Builder.html
Solution
● Expects XMLPipe2 data files to already exist
● Iterate over array of indexes to build
● Creates “source” entries for XMLPipe2 data
● Creates “index” entries for each “source”
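The steps above can be sketched as a small executable config script. This is an illustration under stated assumptions, not the author's script: it prints the config text by hand rather than using the Sphinx::Config::Builder API, and the directory paths, the `XMLPIPE2_DIR` environment variable and the `build_config` helper are all hypothetical. A real config would also need a `searchd` section before the daemon could serve these indexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return Sphinx config text with one xmlpipe2 "source" entry and one
# "index" entry per partition name. Paths are placeholders.
sub build_config {
    my ($names, $data_dir, $index_dir) = @_;
    my $conf = '';
    for my $name (@$names) {
        $conf .= <<"END";
source ${name}_src
{
    type            = xmlpipe2
    xmlpipe_command = cat $data_dir/$name.xml
}

index $name
{
    source = ${name}_src
    path   = $index_dir/$name
}

END
    }
    return $conf;
}

# Discover partitions from the XMLPipe2 files already on disk, so the
# set of indexes follows whatever data files currently exist.
my $data_dir = $ENV{XMLPIPE2_DIR} || '/var/data/xmlpipe2';
my @names = sort map { m{([^/]+)\.xml\z} ? $1 : () } glob("$data_dir/*.xml");
print build_config(\@names, $data_dir, '/var/lib/sphinx/data');
```

Saved as an executable file such as `generate-config.pl`, a script like this would be what `indexer --config ./generate-config.pl --all` runs: adding or removing an XMLPipe2 file changes the set of indexes on the next run without anyone editing a static config.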
Demo
Tip of the Iceberg
● Sphinx has tons of options and modes
● Tons of areas of application
● Many clients, simple interface
● Super easy to install and maintain
Thank You!
● http://sphinxsearch.com/
● cpan://Sphinx::Search
● cpan://Sphinx::Config::Builder
● http://houston.pm.org
