Full Text Search in MySQL 5.1
                     New Features and HowTo


                                 Alexander Rubin
                          Senior Consultant, MySQL AB




Copyright 2006 MySQL AB                        The World’s Most Popular Open Source Database   1
Full Text search

      • Natural and popular way to
        search for information
      • Easy to use: enter key words
        and get what you need




In this presentation

      • Improvements in FT Search in
        MySQL 5.1
      • How to speed up MySQL FT Search
      • How to search with error corrections
      • Benchmark results




Types of FT Search: Relevance




                      - MySQL FT: Default by relevance!

Types of FT Search: Boolean Search




                          - MySQL FT: No default sorting!

Types of FT Search: Phrase Search




Full Text Solutions
   Type                           Solution

   MySQL built-in                 Full Text Index (MyISAM only)
   MySQL integrated/external      Sphinx
   External                       Lucene, MnogoSearch
   “Hardware boxes”               Google box, “Fast” box

MySQL Full Text Index Features

      • Available only for MyISAM tables
      • Natural language search and Boolean
        search
      • ft_min_word_len: minimum indexed word
        length, 4 characters by default
      • Stopword list enabled by default
      • Frequency-based ranking
             – Distance between words is not counted



MySQL Full Text: Creating Full Text Index

  mysql> CREATE TABLE articles (
      ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
      ->   title VARCHAR(200),
      ->   body TEXT,
      ->   FULLTEXT (title,body)
      -> ) ENGINE=MyISAM;

MySQL Full Text: Natural Language mode

    mysql> SELECT * FROM articles
        -> WHERE MATCH (title,body)
        -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
    +-------------------+------------------------------------------+
    | title             | body                                     |
    +-------------------+------------------------------------------+
    | MySQL vs. YourSQL | In the following database comparison ... |
    | MySQL Tutorial    | DBMS stands for DataBase ...             |


                          In Natural Language Mode:
                          default sorting by relevance!
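
To make the relevance ordering visible, the score can be selected next to each row. This is a minimal sketch against the articles table created earlier; the column alias score is an assumption for illustration, and the mode keyword is omitted because natural language mode is the default.

```sql
-- Return the relevance score alongside each matching row.
-- The MATCH ... AGAINST expression appears twice, but the optimizer
-- recognizes identical calls and runs the full-text search only once.
SELECT id,
       title,
       MATCH (title, body) AGAINST ('database') AS score
FROM articles
WHERE MATCH (title, body) AGAINST ('database')
ORDER BY score DESC;
```

On a very small test table this query may return nothing: in natural language mode, MyISAM treats a word that occurs in half or more of the rows as too common to match.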

MySQL Full Text: Boolean mode

      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body)
          -> AGAINST ('+cat +dog' IN BOOLEAN MODE);


                     No default sorting in Boolean Mode!
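
Boolean mode expresses "and", "not", prefix and phrase matching through operators inside the search string rather than through keywords such as AND. The sketch below assumes the articles table from the earlier slide; the search words and the alias score are illustrative only. Three-letter words such as "cat" and "dog" are not indexed under the default ft_min_word_len of 4.

```sql
-- Boolean operators: + word must be present, - word must be absent,
-- * prefix match (word*), "..." exact phrase.
-- Rows come back unsorted unless ORDER BY is given; ordering by the
-- MATCH score gives relevance-style ranking when that is wanted.
SELECT id,
       title,
       MATCH (title, body)
         AGAINST ('+mysql +tutorial -oracle' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH (title, body)
        AGAINST ('+mysql +tutorial -oracle' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 10;
```

Sorting adds cost, so the ORDER BY should be left out when any matching rows will do.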

New MySQL 5.1 Full Text Features




New MySQL 5.1 Full Text Features
      • Faster Boolean search in MySQL 5.1
             – New “smart” index merge is implemented
               (forge.mysql.com/worklog/task.php?id=2535)

      • Custom plug-ins
             – Replacing the default parser

      • Better Unicode support
             – Full text index works more accurately with
               Unicode space and punctuation characters
               (forge.mysql.com/worklog/task.php?id=1386)

MySQL 5.0 vs 5.1 Benchmark
      • MySQL 5.1 Full Text search:
        500-1000% improvement in Boolean
        mode
             – Relevance-based search and phrase search
               were not improved in MySQL 5.1

      • Tested with (data and index):
             – CDDB (music database)
             – Author and title columns, 2 million CDs,
               VARCHAR(255)
             – CDDB is roughly comparable to amazon.com’s
               CD/books inventory

MySQL 5.1: 500-1000% improvement in Boolean mode




MySQL 5.1 Full Text: Custom “Plugins”
      • Replace the default parser to:
             – Apply special rules, such as stemming or a
               different way of splitting words
             – Pre-parse input, e.g. process PDF / HTML files

      • The same can be done for the query string
             – If stemming is used, search words also need
               to be stemmed for the search to work



Available Plugins Examples

      • Different plugins: search for “FullText” at
        forge.mysql.com/projects
      • Stemming
             – MnogoSearch has a stemming plugin
               (www.mnogosearch.com)
             – Porter stemming fulltext plugin
      • N-Gram parsers (Japanese language)
             – Simple n-gram fulltext plugin


Example: MnogoSearch Stemming

      • MnogoSearch includes a stemming plugin
        (www.mnogosearch.org/doc/msearch-udmstemmer.html)
      • Configure and install:
             – Configure (follow instructions)
             – mysql> INSTALL PLUGIN stemming SONAME
               'libmnogosearch.so';
             – CREATE TABLE my_table ( my_column TEXT,
               FULLTEXT(my_column) WITH PARSER
               stemming );
             – SELECT * FROM my_table WHERE MATCH (my_column)
               AGAINST('test' IN BOOLEAN MODE);

Example: MnogoSearch Stemming

      • Configuration: stemming.conf
             – MinWordLength 2
             – Spell en latin1 american.xlg
             – Affix en latin1 english.aff
      • Grab Ispell (not Aspell) dictionaries from
        http://lasr.cs.ucla.edu/geoff/ispell-dictionaries.html#English-dicts
      • Any change to stemming.conf requires a
        MySQL restart

Example: MnogoSearch Stemming

                Stemming adds overhead on insert/update
      mysql> INSERT INTO searchindex_stemmer
          -> SELECT * FROM enwiki.searchindex LIMIT 10000;
      Query OK, 10000 rows affected (44.03 sec)

      mysql> INSERT INTO searchindex
          -> SELECT * FROM enwiki.searchindex LIMIT 10000;
      Query OK, 10000 rows affected (21.80 sec)

Example: MnogoSearch Stemming
      mysql> SELECT count(*) FROM searchindex
          -> WHERE MATCH (si_text)
          -> AGAINST('color' IN BOOLEAN MODE);
      count(*): 861

      mysql> SELECT count(*) FROM searchindex_stemmer
          -> WHERE MATCH (si_text)
          -> AGAINST('color' IN BOOLEAN MODE);
      count(*): 1017

Other Planned Full Text Features

      • Search for “FullText” at forge.mysql.com
      • CTYPE table for Unicode character sets
        (WL#1386), complete
      • Enable fulltext search for non-MyISAM
        engines (WL#2559), assigned
      • Stemming for fulltext (WL#2423), assigned
      • Combined BTREE/FULLTEXT indexes
        (WL#828)
      • Many other features; YOU can vote for some

MySQL Full Text HowTo
                Tips and Tricks




DRBD and FullText Search

      [Diagram]
      • Active DRBD server (InnoDB tables)
             – Synchronous block replication to the passive
               DRBD server (InnoDB tables)
      • Normal replication slave: InnoDB tables
      • “FullText” slaves: MyISAM tables with FT indexes
             – Receive full text requests from the web/app
               server

Using MySQL 5.1 as a Slave
      [Diagram]
      • Master (MySQL 5.0)
      • Normal slave (MySQL 5.0)
      • Full text slave (MySQL 5.1)
      • Web/app servers send normal requests and full text
        requests separately; full text requests go to the
        full text slave

How To: Speed up MySQL FT Search

                          Fit index into memory!
             – Increase amount of RAM
             – Set key_buffer_size = <total size of full text
               index>. Max 4GB!
             – Preload FT indexes into the buffer
                   • Use additional key caches for FT indexes
                     (to work around the 4GB limit)
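
One way to estimate how large the key buffer needs to be is to look at the index sizes MySQL reports for the MyISAM tables. This is a rough sketch, assuming the tables live in the currently selected schema; the alias index_mb is illustrative. The reported index_length covers every index of a table, not only the FULLTEXT index, so it is best read as an upper bound.

```sql
-- Index size per MyISAM table in the current schema, largest first.
-- index_length is in bytes and includes all indexes of the table.
SELECT table_name,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND engine = 'MyISAM'
ORDER BY index_length DESC;
```

If the total exceeds what one key cache can hold, the largest tables are the natural candidates for a dedicated key cache.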



SpeedUp FullText:
                   Preload FT indexes into buffer

      mysql> SET GLOBAL ft_key.key_buffer_size = 4*1024*1024*1024;
      mysql> CACHE INDEX S1, S2, <some other tables here> IN ft_key;
      mysql> LOAD INDEX INTO CACHE S1, S2 …;

How To: Speed up MySQL FT Search

      • Manual partitioning
             – Partitioning will decrease index
               and table size
                   • Searches and updates will be faster
             – The application needs to change: there
               is no automatic partitioning


How To: Speed up MySQL FT Search

      • Set up a number of slaves for search
             – Decreases the number of queries for each
               box
             – Decreases CPU usage (sorting is
               expensive)
             – Each slave can have its own data
                   • Example: searches for the east coast go to
                     slaves 1-5, searches for the west coast to
                     slaves 6-10


FT Scale-Out with MySQL Replication
      [Diagram]
      • Web/app server sends write requests to the master
      • Master replicates to Slave 1, Slave 2, Slave 3
      • Web/app server sends full text requests to the
        slaves through a load balancer

Which queries are performance killers
        – ORDER BY / GROUP BY
                • Natural language mode: ordered by relevance
                • Boolean mode: no default sorting!
                • ORDER BY date is much slower than a query
                  with no ORDER BY
         – SQL_CALC_FOUND_ROWS
                • SELECT SQL_CALC_FOUND_ROWS * FROM T1 …
                  LIMIT 10
                • Requires computing the whole result set
         – Other conditions in the WHERE clause
                • MySQL can use either the FT index or other
                  indexes (only the FT index in natural
                  language mode)

Real World Example: Performance Killer

                          Why is it so slow?

      SELECT … FROM `ft`
      WHERE MATCH (`album`)
      AGAINST
      ('the way i am')



Real World Example: Performance Killer

                          Note the stopword list and
                                ft_min_word_len!

             my.cnf: ft_min_word_len = 1

             The - stopword
             Way - stopword
             I   - not a stopword
             Am  - stopword

           With the standard stopword list and ft_min_word_len = 1 in
           my.cnf, the query “the way i am” filters out every word
           except “i”.

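Both settings are read at server startup, and an existing FULLTEXT index keeps the words it was built with, so a change only takes effect after the index is rebuilt. A minimal sketch, assuming the `ft` table from the query above:

```sql
-- Inspect the current values. Both variables are set in my.cnf and
-- need a server restart to change.
SHOW VARIABLES LIKE 'ft_min_word_len';
SHOW VARIABLES LIKE 'ft_stopword_file';

-- After changing either setting and restarting the server, rebuild
-- the FULLTEXT indexes of every affected MyISAM table.
REPAIR TABLE ft QUICK;
```

Setting ft_stopword_file to an empty string in my.cnf disables stopword filtering altogether, which is one option when short, common words such as those in “the way i am” have to stay searchable.
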
How To: Search with error correction

      • Example: Music Search Engine
             – Search for music titles/actors
             – Need to correct users’ typos
                • Bob Dilane (user made typos) ->
                   – Bob Dylan (corrected)
      • Solution:
             – Use the soundex() MySQL function
                • Soundex = sounds similar
                mysql> select soundex('Dilane');
                D450
                mysql> select soundex('Dylan');
                D450

Search with error corrections

      • Implementation
             1. ALTER TABLE artists ADD
                art_name_sndex VARCHAR(80)
             2. UPDATE artists SET
                art_name_sndex =
                SOUNDEX(art_name)
             3. SELECT art_name FROM artists
                WHERE art_name_sndex =
                SOUNDEX('Bob Dilane') LIMIT 10
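
The three steps above leave two practical gaps: the lookup in step 3 scans the table unless the new column is indexed, and rows added later have no soundex value. One possible way to close both is sketched below; the index and trigger names are hypothetical.

```sql
-- Index the soundex column so the equality lookup can use it.
ALTER TABLE artists ADD INDEX idx_art_name_sndex (art_name_sndex);

-- Keep the column in sync for new and changed rows (MySQL 5.0+).
CREATE TRIGGER artists_sndex_bi BEFORE INSERT ON artists
FOR EACH ROW SET NEW.art_name_sndex = SOUNDEX(NEW.art_name);

CREATE TRIGGER artists_sndex_bu BEFORE UPDATE ON artists
FOR EACH ROW SET NEW.art_name_sndex = SOUNDEX(NEW.art_name);
```

Computing the value in the application before each write would work as well; triggers simply keep the rule in one place.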

Search with error corrections: Sorting
      • Popularity of the artist
             – SELECT art_name FROM artists WHERE
               art_name_sndex = SOUNDEX('Dilane')
               ORDER BY popularity LIMIT 10
      • Most similar matches first: order by
        Levenshtein distance
             – The Levenshtein distance between two strings
               = minimum number of operations needed to
               transform one string into the other
             – Levenshtein can be implemented as a stored
               function or UDF
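
As an illustration of the stored function route, here is a sketch of a Levenshtein function using the standard two-row dynamic programming scheme. It is not the presenter's implementation. The function name, the 255-character limit and the byte-per-cell row encoding are assumptions made for this example, and a UDF written in C would likely be faster on large result sets.

```sql
-- DELIMITER is a mysql client command, needed so the function body
-- can contain semicolons.
DELIMITER $$

CREATE FUNCTION levenshtein(s1 VARCHAR(255), s2 VARCHAR(255))
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE len1, len2, i, j, cost, best, cand INT;
  -- Each row of the distance matrix is stored as one byte per cell;
  -- distances never exceed 255 because inputs are at most 255 chars.
  DECLARE prev_row, cur_row VARBINARY(256);

  IF s1 IS NULL OR s2 IS NULL THEN
    RETURN NULL;
  END IF;

  SET len1 = CHAR_LENGTH(s1), len2 = CHAR_LENGTH(s2);
  IF len1 = 0 THEN RETURN len2; END IF;
  IF len2 = 0 THEN RETURN len1; END IF;

  -- Row 0: distance from the empty prefix of s1 to each prefix of s2.
  SET prev_row = CHAR(0), j = 1;
  WHILE j <= len2 DO
    SET prev_row = CONCAT(prev_row, CHAR(j));
    SET j = j + 1;
  END WHILE;

  SET i = 1;
  WHILE i <= len1 DO
    -- Cell 0 of row i: deleting the first i characters of s1.
    SET cur_row = CHAR(i), best = i, j = 1;
    WHILE j <= len2 DO
      -- Comparison follows the column collation (often case-insensitive).
      SET cost = IF(SUBSTRING(s1, i, 1) = SUBSTRING(s2, j, 1), 0, 1);
      -- best holds the cell to the left; +1 is an insertion.
      SET best = best + 1;
      -- Cell above (+1): a deletion.
      SET cand = ASCII(SUBSTRING(prev_row, j + 1, 1)) + 1;
      IF cand < best THEN SET best = cand; END IF;
      -- Cell above-left (+cost): a match or substitution.
      SET cand = ASCII(SUBSTRING(prev_row, j, 1)) + cost;
      IF cand < best THEN SET best = cand; END IF;
      SET cur_row = CONCAT(cur_row, CHAR(best));
      SET j = j + 1;
    END WHILE;
    SET prev_row = cur_row;
    SET i = i + 1;
  END WHILE;

  RETURN best;
END$$

DELIMITER ;

-- Usage: soundex narrows the candidates, Levenshtein orders them.
SELECT art_name
FROM artists
WHERE art_name_sndex = SOUNDEX('Bob Dilane')
ORDER BY levenshtein(art_name, 'Bob Dilane')
LIMIT 10;
```

Filtering by soundex first matters here: the function runs once per candidate row, so it should only see the short list the indexed lookup returns.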

Sphinx Search
      • Features
             – Open Source, http://www.sphinxsearch.com
             – Fast searches over large data sets
             – Designed for indexing database content
             – Supports multi-node clustering out of the box and
               multiple attributes (date, price, etc.)
             – Different sort modes (relevance, date, etc.)
             – Client available as a MySQL storage engine plugin
             – Fast index creation (up to 10M/sec)
      • Disadvantages:
             – External solution: needs to be integrated
             – No online index updates (the whole index has
               to be rebuilt)

How to configure Sphinx with MySQL

      • The Sphinx engine/plugin is not a full engine:
             – You still need to run the search daemon
               (searchd)

      • Need to compile MySQL from source with
        Sphinx to integrate it
        – MySQL 5.0: need to patch the source
          code
        – MySQL 5.1: no need to patch; copy the
          Sphinx plugin to the plugin dir

How to Integrate Sphinx with MySQL
      • Sphinx can be MySQL’s storage engine
      CREATE TABLE t1
         (id            INTEGER NOT NULL,
          weight        INTEGER NOT NULL,
          query         VARCHAR(3072) NOT NULL,
          group_id      INTEGER,
          INDEX(query)
         ) ENGINE=SPHINX
           CONNECTION="sphinx://localhost:3312/enwiki";

      SELECT * FROM enwiki.searchindex docs
      JOIN test.t1 ON (docs.si_page=t1.id)
      WHERE query="one document;mode=any"
        LIMIT 1;

Time for questions




                          Questions?
                          www.mysqlfulltextsearch.com




Copyright 2006 MySQL AB                        The World’s Most Popular Open Source Database   40

More Related Content

What's hot

Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Mysql server query path
Mysql server query pathMysql server query path
Mysql server query pathWenjie Wu
 
Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop UsersKathleen Ting
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
HiveServer2 for Apache Hive
HiveServer2 for Apache HiveHiveServer2 for Apache Hive
HiveServer2 for Apache HiveCarl Steinbach
 
ใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูล
ใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูลใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูล
ใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูลNaowarat Jaikaroon
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Lucidworks
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajpImpala 2.0 Update #impalajp
Impala 2.0 Update #impalajpCloudera Japan
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr Tommaso Teofili
 
Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016Duyhai Doan
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 

What's hot (20)

Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Mysql server query path
Mysql server query pathMysql server query path
Mysql server query path
 
MySQL 5.7 + JSON
MySQL 5.7 + JSONMySQL 5.7 + JSON
MySQL 5.7 + JSON
 
Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop Users
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
HiveServer2 for Apache Hive
HiveServer2 for Apache HiveHiveServer2 for Apache Hive
HiveServer2 for Apache Hive
 
ใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูล
ใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูลใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูล
ใบความรู้ที่ 1 เรื่องความรู้เบื้องต้นเกี่ยวกับฐานข้อมูล
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 
HBase lon meetup
HBase lon meetupHBase lon meetup
HBase lon meetup
 
Optimizing MySQL
Optimizing MySQLOptimizing MySQL
Optimizing MySQL
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajpImpala 2.0 Update #impalajp
Impala 2.0 Update #impalajp
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr
 
Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 

Viewers also liked

Optimizing InnoDB bufferpool usage
Optimizing InnoDB bufferpool usageOptimizing InnoDB bufferpool usage
Optimizing InnoDB bufferpool usageZarafa
 
Web tools to foster creativity
Web tools to foster creativityWeb tools to foster creativity
Web tools to foster creativityPaula Ledesma
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performancejkeriaki
 
Brushless DC Excitation Systems
Brushless DC Excitation SystemsBrushless DC Excitation Systems
Brushless DC Excitation SystemsSharmili Palli
 
Rumah Cantik dijual
Rumah Cantik dijualRumah Cantik dijual
Rumah Cantik dijualRaja Webinfo
 
Presentation Skills - Session One
Presentation Skills - Session OnePresentation Skills - Session One
Presentation Skills - Session OneKevin Smith
 
九方中文輸入法 特徵簡介
九方中文輸入法 特徵簡介九方中文輸入法 特徵簡介
九方中文輸入法 特徵簡介Warren Yip
 
From Collaboration To Social Intelligence
From Collaboration To Social IntelligenceFrom Collaboration To Social Intelligence
From Collaboration To Social Intelligencewww.panorama.com
 
Intext webdistilled elezioni 2011, aggiornamento del 5 Maggio
Intext webdistilled elezioni 2011, aggiornamento del 5 MaggioIntext webdistilled elezioni 2011, aggiornamento del 5 Maggio
Intext webdistilled elezioni 2011, aggiornamento del 5 MaggioMaurizio Pontremoli
 
Dialogbaseret Aftalestyring Powerpoint øKonomiudvalget 4. December 07
Dialogbaseret Aftalestyring   Powerpoint øKonomiudvalget 4. December 07Dialogbaseret Aftalestyring   Powerpoint øKonomiudvalget 4. December 07
Dialogbaseret Aftalestyring Powerpoint øKonomiudvalget 4. December 07ibsis
 
Presentacion I Cities 2009
Presentacion I Cities 2009Presentacion I Cities 2009
Presentacion I Cities 2009Fernando Martin
 
Enhance your microsoft bi stack to drive business user adoption slide share
Enhance your microsoft bi stack to drive business user adoption   slide shareEnhance your microsoft bi stack to drive business user adoption   slide share
Enhance your microsoft bi stack to drive business user adoption slide sharewww.panorama.com
 
Jordan Brain Gain First Meeting in Amman Report
Jordan Brain Gain First Meeting in Amman ReportJordan Brain Gain First Meeting in Amman Report
Jordan Brain Gain First Meeting in Amman ReportBayan Waleed Shadaideh
 
Ib learner profile
Ib learner profileIb learner profile
Ib learner profilekatyaSh
 
Direct Digital Marketing
Direct Digital MarketingDirect Digital Marketing
Direct Digital MarketingJim Hanika
 

Viewers also liked (20)

Optimizing InnoDB bufferpool usage
Optimizing InnoDB bufferpool usageOptimizing InnoDB bufferpool usage
Optimizing InnoDB bufferpool usage
 
Top 1m
Top 1mTop 1m
Top 1m
 
Web tools to foster creativity
Web tools to foster creativityWeb tools to foster creativity
Web tools to foster creativity
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performance
 
Brushless DC Excitation Systems
Brushless DC Excitation SystemsBrushless DC Excitation Systems
Brushless DC Excitation Systems
 
Accessible ux
Accessible uxAccessible ux
Accessible ux
 
Rumah Cantik dijual
Rumah Cantik dijualRumah Cantik dijual
Rumah Cantik dijual
 
Presentation Skills - Session One
Presentation Skills - Session OnePresentation Skills - Session One
Presentation Skills - Session One
 
九方中文輸入法 特徵簡介
九方中文輸入法 特徵簡介九方中文輸入法 特徵簡介
九方中文輸入法 特徵簡介
 
From Collaboration To Social Intelligence
From Collaboration To Social IntelligenceFrom Collaboration To Social Intelligence
From Collaboration To Social Intelligence
 
Intext webdistilled elezioni 2011, aggiornamento del 5 Maggio
Intext webdistilled elezioni 2011, aggiornamento del 5 MaggioIntext webdistilled elezioni 2011, aggiornamento del 5 Maggio
Intext webdistilled elezioni 2011, aggiornamento del 5 Maggio
 
Dialogbaseret Aftalestyring Powerpoint øKonomiudvalget 4. December 07
Dialogbaseret Aftalestyring   Powerpoint øKonomiudvalget 4. December 07Dialogbaseret Aftalestyring   Powerpoint øKonomiudvalget 4. December 07
Dialogbaseret Aftalestyring Powerpoint øKonomiudvalget 4. December 07
 
Term 4 Example
Term 4 ExampleTerm 4 Example
Term 4 Example
 
Presentacion I Cities 2009
Presentacion I Cities 2009Presentacion I Cities 2009
Presentacion I Cities 2009
 
Teds Eport
Teds EportTeds Eport
Teds Eport
 
Enhance your microsoft bi stack to drive business user adoption slide share
Enhance your microsoft bi stack to drive business user adoption   slide shareEnhance your microsoft bi stack to drive business user adoption   slide share
Enhance your microsoft bi stack to drive business user adoption slide share
 
Jordan Brain Gain First Meeting in Amman Report
Jordan Brain Gain First Meeting in Amman ReportJordan Brain Gain First Meeting in Amman Report
Jordan Brain Gain First Meeting in Amman Report
 
Ib learner profile
Ib learner profileIb learner profile
Ib learner profile
 
Unenclosable
UnenclosableUnenclosable
Unenclosable
 
Direct Digital Marketing
Direct Digital MarketingDirect Digital Marketing
Direct Digital Marketing
 

Similar to Mysql Fulltext Search 1

Mysql Fulltext Search
Mysql Fulltext SearchMysql Fulltext Search
Mysql Fulltext Searchjohnymas
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)referenceFromDual GmbH
 
MySQL Fulltext Search Tutorial
MySQL Fulltext Search TutorialMySQL Fulltext Search Tutorial
MySQL Fulltext Search TutorialZhaoyang Wang
 
Tagging and Folksonomy Schema Design for Scalability and Performance
Tagging and Folksonomy Schema Design for Scalability and PerformanceTagging and Folksonomy Schema Design for Scalability and Performance
Tagging and Folksonomy Schema Design for Scalability and PerformanceEduard Bondarenko
 
My sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlMy sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlsqlhjalp
 
DPC2007 MySQL Stored Routines for PHP Developers (Roland Bouman)
DPC2007 MySQL Stored Routines for PHP Developers (Roland Bouman)DPC2007 MySQL Stored Routines for PHP Developers (Roland Bouman)
DPC2007 MySQL Stored Routines for PHP Developers (Roland Bouman)dpc
 
MySQL For Oracle DBA's and Developers
MySQL For Oracle DBA's and DevelopersMySQL For Oracle DBA's and Developers
MySQL For Oracle DBA's and DevelopersRonald Bradford
 
Performance Tuning Best Practices
Performance Tuning Best PracticesPerformance Tuning Best Practices
Performance Tuning Best Practiceswebhostingguy
 
MySQL 5.6, news in 5.7 and our HA options
MySQL 5.6, news in 5.7 and our HA optionsMySQL 5.6, news in 5.7 and our HA options
MySQL 5.6, news in 5.7 and our HA optionsTed Wennmark
 
MySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMario Beck
 
Optimizing MySQL for Cascade Server
Optimizing MySQL for Cascade ServerOptimizing MySQL for Cascade Server
Optimizing MySQL for Cascade Serverhannonhill
 
MySQL Storage Engines - which do you use? TokuDB? MyRocks? InnoDB?
MySQL Storage Engines - which do you use? TokuDB? MyRocks? InnoDB?MySQL Storage Engines - which do you use? TokuDB? MyRocks? InnoDB?
MySQL Storage Engines - which do you use? TokuDB? MyRocks? InnoDB?Sveta Smirnova
 
MySQL Day Paris 2016 - State Of The Dolphin
MySQL Day Paris 2016 - State Of The DolphinMySQL Day Paris 2016 - State Of The Dolphin
MySQL Day Paris 2016 - State Of The DolphinOlivier DASINI
 
MySQL developing Store Procedure
MySQL developing Store ProcedureMySQL developing Store Procedure
MySQL developing Store ProcedureMarco Tusa
 
SQL Now! How Optiq brings the best of SQL to NoSQL data.
SQL Now! How Optiq brings the best of SQL to NoSQL data.SQL Now! How Optiq brings the best of SQL to NoSQL data.
SQL Now! How Optiq brings the best of SQL to NoSQL data.Julian Hyde
 
Free Software and the Future of Database Technology
Free Software and the Future of Database TechnologyFree Software and the Future of Database Technology
Free Software and the Future of Database Technologyelliando dias
 
Sql php-vibrant course-mumbai(1)
Sql php-vibrant course-mumbai(1)Sql php-vibrant course-mumbai(1)
Sql php-vibrant course-mumbai(1)vibrantuser
 
MySQL Document Store
MySQL Document StoreMySQL Document Store
MySQL Document StoreMario Beck
 

Mysql Fulltext Search 1

  • 1. Full Text Search in MySQL 5.1 New Features and HowTo Alexander Rubin Senior Consultant, MySQL AB Copyright 2006 MySQL AB The World’s Most Popular Open Source Database 1
  • 2. Full Text Search • A natural and popular way to search for information • Easy to use: enter keywords and get what you need
  • 3. In this presentation • Improvements in FT Search in MySQL 5.1 • How to speed up MySQL FT Search • How to search with error correction • Benchmark results
  • 4. Types of FT Search: Relevance – MySQL FT: sorted by relevance by default!
  • 5. Types of FT Search: Boolean Search – MySQL FT: no default sorting!
  • 6. Types of FT Search: Phrase Search
  • 7. Full Text Solutions (Type: Solution) • MySQL built-in: Full Text Index (MyISAM only) • MySQL integrated/external: Sphinx • External: Lucene, MnogoSearch • “Hardware boxes”: Google box, “Fast” box
  • 8. MySQL Full Text Index Features • Available only for MyISAM tables • Natural Language Search and Boolean search • ft_min_word_len: minimum indexed word length, 4 characters by default • Stopword list enabled by default • Frequency-based ranking – distance between words is not taken into account
  • 9. MySQL Full Text: Creating Full Text Index
      mysql> CREATE TABLE articles (
          ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
          ->   title VARCHAR(200),
          ->   body TEXT,
          ->   FULLTEXT (title,body)
          -> ) ENGINE=MyISAM;
  • 10. MySQL Full Text: Natural Language mode
      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body)
          -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
      +-------------------+------------------------------------------+
      | title             | body                                     |
      +-------------------+------------------------------------------+
      | MySQL vs. YourSQL | In the following database comparison ... |
      | MySQL Tutorial    | DBMS stands for DataBase ...             |
      +-------------------+------------------------------------------+
    In Natural Language Mode: default sorting by relevance!
  • 11. MySQL Full Text: Boolean mode
      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body)
          -> AGAINST ('+cat +dog' IN BOOLEAN MODE);
    Boolean mode has no AND keyword; a leading + marks a word as required. No default sorting in Boolean Mode!
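
Because Boolean mode returns rows in no particular order, one way to get relevance ordering back is to select the MATCH ... AGAINST expression as a column and sort on it explicitly. The following is a minimal sketch against the `articles` table from slide 9. The sample rows and the `score` alias are illustrative assumptions, and the search words are at least four characters long so that the default ft_min_word_len does not drop them.

```sql
-- Table definition repeated from slide 9 so the example is self-contained.
CREATE TABLE IF NOT EXISTS articles (
  id    INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body  TEXT,
  FULLTEXT (title, body)
) ENGINE=MyISAM;

-- Hypothetical sample rows (not from the slides' benchmark data).
INSERT INTO articles (title, body) VALUES
  ('MySQL Tutorial',    'DBMS stands for DataBase ...'),
  ('MySQL vs. YourSQL', 'In the following database comparison ...'),
  ('Optimizing MySQL',  'In this tutorial we will show ...');

-- '+database' is required; 'tutorial' is optional but raises the score
-- of rows that contain it. Selecting MATCH ... AGAINST returns the
-- relevance value, which is then used for an explicit sort.
SELECT id, title,
       MATCH (title, body) AGAINST ('+database tutorial' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH (title, body) AGAINST ('+database tutorial' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 10;
```

The explicit ORDER BY costs a sort, which is the trade-off the later "performance killers" slide warns about, so it is worth adding only when ranking matters.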
  • 12. New MySQL 5.1 Full Text Features
  • 13. New MySQL 5.1 Full Text Features • Faster Boolean search in MySQL 5.1 – a new “smart” index merge is implemented (forge.mysql.com/worklog/task.php?id=2535) • Custom plug-ins – replacing the default parser • Better Unicode support – the full text index works more accurately with Unicode space and punctuation characters (forge.mysql.com/worklog/task.php?id=1386)
  • 14. MySQL 5.0 vs 5.1 Benchmark • MySQL 5.1 Full Text search: 500-1000% improvement in Boolean mode – relevance-based search and phrase search were not improved in MySQL 5.1 • Tested with: – Data and index: CDDB (music database), author and title, 2 million CDs, varchar(255) – CDDB ~= amazon.com’s CD/books inventory
  • 15. MySQL 5.1: 500-1000% improvement in Boolean mode
  • 16. MySQL 5.1 Full Text: Custom “Plugins” • Replace the default parser in order to: – Apply special rules, such as stemming or a different way of splitting words – Pre-parse input, e.g. process PDF / HTML files • The same may be done for the query string – If stemming is used, search words also need to be stemmed for the search to work
  • 17. Available Plugin Examples • Different plugins: search for “FullText” at forge.mysql.com/projects • Stemming – MnogoSearch has a stemming plugin (www.mnogosearch.org) – Porter stemming fulltext plugin • N-gram parsers (Japanese language) – Simple n-gram fulltext plugin
  • 18. Example: MnogoSearch Stemming • MnogoSearch includes a stemming plugin (www.mnogosearch.org/doc/msearch-udmstemmer.html) • Configure and install:
      – Configure (follow the instructions)
      – mysql> INSTALL PLUGIN stemming SONAME 'libmnogosearch.so';
      – mysql> CREATE TABLE my_table ( my_column TEXT, FULLTEXT(my_column) WITH PARSER stemming );
      – mysql> SELECT * FROM my_table WHERE MATCH (my_column) AGAINST ('test' IN BOOLEAN MODE);
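
After INSTALL PLUGIN it can be worth confirming that the server registered the parser before any index is created with it. A small sketch follows. The plugin name `stemming` comes from the slide, and the statements are standard MySQL 5.1 syntax.

```sql
-- List all plugins known to the server; a full-text parser plugin
-- is reported with type FTPARSER.
SHOW PLUGINS;

-- The same information, filtered to the plugin installed above.
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_TYPE, PLUGIN_LIBRARY
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'stemming';

-- To remove the plugin later (left commented out on purpose):
-- UNINSTALL PLUGIN stemming;
```

If the SELECT returns no row, the INSTALL PLUGIN step did not take effect and CREATE TABLE ... WITH PARSER stemming can be expected to fail.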
  • 19. Example: MnogoSearch Stemming • Configuration: stemming.conf – MinWordLength 2 – Spell en latin1 american.xlg – Affix en latin1 english.aff • Grab Ispell (not Aspell) dictionaries from http://lasr.cs.ucla.edu/geoff/ispell-dictionaries.html#English-dicts • Any change in stemming.conf requires a MySQL restart
  • 20. Example: MnogoSearch Stemming • Stemming adds overhead on insert/update:
      mysql> INSERT INTO searchindex_stemmer SELECT * FROM enwiki.searchindex LIMIT 10000;
      Query OK, 10000 rows affected (44.03 sec)
      mysql> INSERT INTO searchindex SELECT * FROM enwiki.searchindex LIMIT 10000;
      Query OK, 10000 rows affected (21.80 sec)
  • 21. Example: MnogoSearch Stemming • The stemmed index matches more rows for the same query:
      mysql> SELECT count(*) FROM searchindex WHERE MATCH (si_text) AGAINST ('color' IN BOOLEAN MODE);
      count(*): 861
      mysql> SELECT count(*) FROM searchindex_stemmer WHERE MATCH (si_text) AGAINST ('color' IN BOOLEAN MODE);
      count(*): 1017
  • 22. Other Planned Full Text Features • Search for “FullText” at forge.mysql.com • CTYPE table for Unicode character sets (WL#1386), complete • Enable fulltext search for non-MyISAM engines (WL#2559), assigned • Stemming for fulltext (WL#2423), assigned • Combined BTREE/FULLTEXT indexes (WL#828) • Many other features; YOU can vote for some
  • 23. MySQL Full Text HowTo: Tips and Tricks
  • 24. DRBD and FullText Search [Diagram: an active DRBD server and a passive DRBD server, both with InnoDB tables, linked by synchronous block replication; normal replication feeds a slave with InnoDB tables and “FullText” slaves holding MyISAM tables with FT indexes; the web/app server sends its full text requests to the “FullText” slaves]
  • 25. Using MySQL 5.1 as a Slave [Diagram: Master (MySQL 5.0); Normal Slave (MySQL 5.0) serving a web/app server’s normal requests; Full Text Slave (MySQL 5.1) serving a web/app server’s full text requests]
  • 26. How To: Speed up MySQL FT Search • Fit the index into memory! – Increase the amount of RAM – Set key_buffer = <total size of full text index>. Max 4GB! – Preload FT indexes into the buffer • Use additional key caches for FT indexes (to work around the 4GB limit)
  • 27. SpeedUp FullText: Preload FT indexes into buffer
      mysql> SET GLOBAL ft_key.key_buffer_size = 4*1024*1024*1024;
      mysql> CACHE INDEX S1, S2, <some other tables here> IN ft_key;
      mysql> LOAD INDEX INTO CACHE S1, S2 …;
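
To choose a value for the dedicated key cache, it helps to know how large the MyISAM index files actually are. The query below is a sketch using information_schema. Note that INDEX_LENGTH covers all indexes of a table, not only the FULLTEXT one, so it should be read as an upper bound. The cache name `ft_key` is taken from the slide.

```sql
-- Index size per MyISAM table in the current database, largest first.
SELECT TABLE_NAME,
       ROUND(INDEX_LENGTH / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND ENGINE = 'MyISAM'
ORDER BY INDEX_LENGTH DESC;

-- Size currently assigned to the named key cache (0 if it was never set).
SELECT @@global.ft_key.key_buffer_size;
```

If the sum of the index sizes exceeds what one cache can hold, that is the case where the previous slide suggests splitting tables across several named key caches.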
  • 28. How To: Speed up MySQL FT Search • Manual partitioning – Partitioning decreases index and table size • Searches and updates will be faster – Requires application changes (there is no automatic partitioning)
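
A sketch of what manual partitioning could look like, assuming the data can be split by a natural key such as region. The table names `listings_east` and `listings_west`, their columns, and the search word are all hypothetical. The application decides which table to query before it runs the search.

```sql
-- One physical table per partition, each with its own smaller FT index.
CREATE TABLE listings_east (
  id    INT UNSIGNED NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body  TEXT,
  FULLTEXT (title, body)
) ENGINE=MyISAM;

-- LIKE copies the column and index definitions, including FULLTEXT.
CREATE TABLE listings_west LIKE listings_east;

-- A search scoped to one region touches only that region's index.
SELECT id, title
FROM listings_east
WHERE MATCH (title, body) AGAINST ('+apartment' IN BOOLEAN MODE)
LIMIT 10;

-- A search across all regions has to query every partition and merge.
(SELECT id, title FROM listings_east
  WHERE MATCH (title, body) AGAINST ('+apartment' IN BOOLEAN MODE))
UNION ALL
(SELECT id, title FROM listings_west
  WHERE MATCH (title, body) AGAINST ('+apartment' IN BOOLEAN MODE))
LIMIT 10;
```

The scheme pays off when most searches can be routed to a single table. Cross-partition searches like the UNION above give back part of the gain.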
  • 29. How To: Speed up MySQL FT Search • Set up a number of slaves for search – Decreases the number of queries for each box – Decreases CPU usage (sorting is expensive) – Each slave can have its own data • Example: searches for the east coast go to slaves 1-5, searches for the west coast go to slaves 6-10
  • 30. FT Scale-Out with MySQL Replication [Diagram: a web/app server sends write requests to the master, which replicates to Slave 1, Slave 2 and Slave 3; a web/app server sends full text requests to the slaves through a load balancer]
  • 31. Which queries are performance killers – ORDER BY / GROUP BY • Natural language mode: ordered by relevance • Boolean mode: no default sorting! • ORDER BY date is much slower than no ORDER BY – SQL_CALC_FOUND_ROWS • SELECT SQL_CALC_FOUND_ROWS … FROM T1 … LIMIT 10 • Requires the whole result set to be computed – Other conditions in the WHERE clause • MySQL can use either the FT index or other indexes (only the FT index in natural language mode)
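
When the application only needs to know whether a next page exists, one common workaround is to drop SQL_CALC_FOUND_ROWS and fetch one row more than the page size. This is an assumption about the use case, not something the slides prescribe, and the columns of `T1` are hypothetical.

```sql
-- Hypothetical table standing in for T1 from the slide.
CREATE TABLE IF NOT EXISTS T1 (
  id    INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  FULLTEXT (title)
) ENGINE=MyISAM;

-- Expensive: the server must find every matching row to report the total,
-- even though only 10 rows are returned.
SELECT SQL_CALC_FOUND_ROWS id, title
FROM T1
WHERE MATCH (title) AGAINST ('+database' IN BOOLEAN MODE)
LIMIT 10;
SELECT FOUND_ROWS();

-- Cheaper sketch: ask for page size + 1 rows. If 11 rows come back,
-- display the first 10 and render a "next page" link.
SELECT id, title
FROM T1
WHERE MATCH (title) AGAINST ('+database' IN BOOLEAN MODE)
LIMIT 11;
```

The second form cannot show an exact total, so it fits "next/previous" paging rather than numbered page lists.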
  • 32. Real World Example: Performance Killer • Why is it so slow?
      SELECT … FROM `ft` WHERE MATCH (`album`) AGAINST ('the way i am')
  • 33. Real World Example: Performance Killer • Note the stopword list and ft_min_word_len! – The: stopword – Way: stopword – I: not a stopword – Am: stopword • my.cnf: ft_min_word_len = 1 • With the standard stopword list and ft_min_word_len = 1 in my.cnf, the query “the way i am” filters out every word except “i”
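
The settings involved can be inspected from the client. The statements below are standard MySQL, and the table name `ft` is taken from the previous slide. After ft_min_word_len or the stopword file is changed in my.cnf and the server is restarted, existing MyISAM FULLTEXT indexes have to be rebuilt before the new settings apply to them.

```sql
-- Show the current full-text settings, including ft_min_word_len
-- and ft_stopword_file.
SHOW VARIABLES LIKE 'ft%';

-- Rebuild the MyISAM full-text index so it reflects the new settings.
REPAIR TABLE ft QUICK;
```

Skipping the rebuild leaves the index built under the old rules, so queries can return results that look inconsistent with the configuration.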
  • 34. How To: Search with error correction • Example: Music Search Engine – Search for music titles/artists – Need to correct users’ typos • Bob Dilane (user made typos) -> Bob Dylan (corrected) • Solution: – use the soundex() MySQL function • Soundex = sounds similar
      mysql> SELECT SOUNDEX('Dilane');
      D450
      mysql> SELECT SOUNDEX('Dylan');
      D450
  • 35. Search with error corrections • Implementation
      1. ALTER TABLE artists ADD art_name_sndex VARCHAR(80);
      2. UPDATE artists SET art_name_sndex = SOUNDEX(art_name);
      3. SELECT art_name FROM artists WHERE art_name_sndex = SOUNDEX('Bob Dilane') LIMIT 10;
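
Two practical additions to the steps above, offered as a sketch rather than as part of the original recipe: an index on the soundex column so the lookup does not scan the table, and triggers that keep the column current for new and changed rows. The `artists` table layout below is an assumption; only `art_name`, `art_name_sndex` and `popularity` appear in the slides. The index and trigger names are hypothetical.

```sql
-- Assumed layout; skipped if the table already exists.
CREATE TABLE IF NOT EXISTS artists (
  art_id         INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  art_name       VARCHAR(80),
  art_name_sndex VARCHAR(80),
  popularity     INT
) ENGINE=MyISAM;

-- Index the soundex column so the equality lookup can use it.
ALTER TABLE artists ADD INDEX idx_art_name_sndex (art_name_sndex);

-- Keep the column in sync automatically (triggers need MySQL 5.0 or later).
CREATE TRIGGER artists_sndex_bi BEFORE INSERT ON artists
FOR EACH ROW SET NEW.art_name_sndex = SOUNDEX(NEW.art_name);

CREATE TRIGGER artists_sndex_bu BEFORE UPDATE ON artists
FOR EACH ROW SET NEW.art_name_sndex = SOUNDEX(NEW.art_name);
```

With the triggers in place, the one-off UPDATE from step 2 is only needed for rows that existed before the column was added.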
  • 36. Search with error corrections: Sorting • By popularity of the artist
      – SELECT art_name FROM artists WHERE art_name_sndex = SOUNDEX('Dilane') ORDER BY popularity LIMIT 10;
    • Most similar matches first – order by Levenshtein distance – The Levenshtein distance between two strings = the minimum number of single-character insertions, deletions and substitutions needed to transform one string into the other – Levenshtein can be implemented as a stored function or UDF
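
A minimal sketch of the stored-function route, written for the mysql command-line client (DELIMITER is a client command). The function name `levenshtein`, the 100-character limit and the row encoding are choices made for this example and do not come from the slides. It keeps one row of the dynamic-programming table in a string, three digits per cell. Character comparison follows the column collation, so with the default collations it is case-insensitive.

```sql
DELIMITER //
CREATE FUNCTION levenshtein(s VARCHAR(100), t VARCHAR(100))
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE m INT DEFAULT CHAR_LENGTH(s);
  DECLARE n INT DEFAULT CHAR_LENGTH(t);
  DECLARE i INT DEFAULT 1;
  DECLARE j INT DEFAULT 1;
  DECLARE best INT;
  DECLARE d INT;
  DECLARE prev VARCHAR(303);  -- previous row, cells 0..n, 3 digits each
  DECLARE cur  VARCHAR(303);  -- row being built

  IF m = 0 THEN RETURN n; END IF;
  IF n = 0 THEN RETURN m; END IF;

  -- Row 0: turning an empty prefix of s into t[1..j] takes j insertions.
  SET prev = '000';
  WHILE j <= n DO
    SET prev = CONCAT(prev, LPAD(j, 3, '0'));
    SET j = j + 1;
  END WHILE;

  WHILE i <= m DO
    SET cur = LPAD(i, 3, '0');  -- cell 0: i deletions
    SET j = 1;
    WHILE j <= n DO
      -- substitution (or match): prev[j-1] + cost
      SET best = CAST(SUBSTRING(prev, 3 * (j - 1) + 1, 3) AS UNSIGNED)
                 + IF(SUBSTRING(s, i, 1) = SUBSTRING(t, j, 1), 0, 1);
      -- deletion: prev[j] + 1
      SET d = CAST(SUBSTRING(prev, 3 * j + 1, 3) AS UNSIGNED) + 1;
      IF d < best THEN SET best = d; END IF;
      -- insertion: cur[j-1] + 1
      SET d = CAST(SUBSTRING(cur, 3 * (j - 1) + 1, 3) AS UNSIGNED) + 1;
      IF d < best THEN SET best = d; END IF;
      SET cur = CONCAT(cur, LPAD(best, 3, '0'));
      SET j = j + 1;
    END WHILE;
    SET prev = cur;
    SET i = i + 1;
  END WHILE;

  -- The last cell of the last row is the distance.
  RETURN CAST(RIGHT(prev, 3) AS UNSIGNED);
END//
DELIMITER ;

-- Usage: narrow candidates with soundex first, then rank the few
-- survivors by edit distance to what the user typed.
SELECT art_name
FROM artists
WHERE art_name_sndex = SOUNDEX('Bob Dilane')
ORDER BY levenshtein(art_name, 'Bob Dilane')
LIMIT 10;
```

Running the function only on rows that already passed the soundex filter keeps the cost manageable, since a stored function of this kind is slow compared with a native UDF.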
  • 37. Sphinx Search • Features – Open source, http://www.sphinxsearch.com – Fast searches over large data sets – Designed for indexing database content – Supports multi-node clustering out of the box and multiple attributes (date, price, etc.) – Different sort modes (relevance, date, etc.) – Client available as a MySQL storage engine plugin – Fast index creation (up to 10 MB/sec) • Disadvantages: – External solution: needs to be integrated – No online index updates (the whole index has to be rebuilt)
  • 38. How to configure Sphinx with MySQL • The Sphinx engine/plugin is not a full engine: – you still need to run the “searcher” daemon • Need to compile MySQL source with Sphinx to integrate it – MySQL 5.0: need to patch the source code – MySQL 5.1: no need to patch, copy the Sphinx plugin to the plugin dir
  • 39. How to Integrate Sphinx with MySQL • Sphinx can be MySQL’s storage engine
      CREATE TABLE t1 (
        id INTEGER NOT NULL,
        weight INTEGER NOT NULL,
        query VARCHAR(3072) NOT NULL,
        group_id INTEGER,
        INDEX(query)
      ) ENGINE=SPHINX CONNECTION="sphinx://localhost:3312/enwiki";
      SELECT * FROM enwiki.searchindex docs JOIN test.t1 ON (docs.si_page=t1.id) WHERE query="one document;mode=any" LIMIT 1;
  • 40. Time for questions • Questions? www.mysqlfulltextsearch.com