Sphinx is a full-text search engine that Spil Games uses to provide fast and complex search across their databases and indexes. Some key ways Spil Games uses Sphinx include searching for games by title or URL, finding friends across their networks, and filtering search results based on browser capabilities. To ensure high availability, Spil Games implements distributed and mirrored Sphinx indexes across multiple nodes and uses load balancers. Benchmarking shows Sphinx significantly outperforms MySQL for certain search queries.
Percona Live London 2014: Serve out any page with an HA Sphinx environment
Art van Scheppingen
Head of Database Engineering
Overview
1. Who is Spil Games?
2. What is Sphinx Search?
3. Make Sphinx highly available
4. How does Spil Games use Sphinx?
5. Sphinx benchmarks
6. Questions?
Facts
• Game publisher and distributor
• Company founded in 2001
• 130+ employees
• 150M+ unique visitors per month
• Over 60M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real-time multiplayer games
• Mobile (HTML5) games
• 40+ MySQL clusters
• 65k queries per second
• 10 Sphinx servers
• 8k queries per second
Full text search in MySQL
• MyISAM / InnoDB (5.6.4 or higher)
  CREATE TABLE articles (
    id int(11) not null auto_increment,
    author varchar(40) not null,
    title varchar(50) not null,
    body text,
    PRIMARY KEY (id),
    FULLTEXT idx (title, body)
  ) ENGINE=InnoDB;
• Simple queries
  SELECT id, author FROM articles WHERE MATCH (title, body)
  AGAINST ('somephrase');
• Complex queries
  SELECT id, author, MATCH (title, body) AGAINST ('somephrase' IN
  BOOLEAN MODE) AS score FROM articles ORDER BY score DESC, id ASC;
• Drawbacks:
  • Slow response times
Alternatives to MySQL full text search
• PostgreSQL tsquery
• Elasticsearch
• Apache Lucene
• Sphinx Search
• Many other alternatives
Full text search in Sphinx
• Sphinx
  SELECT author FROM articles WHERE
  MATCH('@(title,body) database');
• Complex queries
  SELECT author FROM articles WHERE
  MATCH('@(title,body) database') ORDER BY
  WEIGHT() DESC, id ASC;
• Drawbacks:
  • Not a straightforward drop-in replacement
  • Specialized knowledge is needed
Sphinx is a full text search engine
• Consists of two components:
  • Indexer
    • Indexes (textual) data
  • Search daemon
    • Searches the indexes and returns matched items
• Three types of indexes:
  • Disk indexes
  • Real-time indexes
  • Distributed indexes
Disk indexes
• Comparable to archive tables
• The indexer rebuilds the full index from the source data
• Index is "written once"
• Only attribute values can be changed at run time
• Use indexer --rotate to load rebuilt indexes without restarting the search daemon
• Needs fewer resources (RAM/CPU)
• Not tied to a specific database engine; sources include:
  • MySQL
  • PostgreSQL
  • MSSQL
  • ODBC
  • XML/TSV pipes
Real-time indexes
• Comparable to normal tables
• Online indexes
• Will (eventually) be written to disk
• Indexes can be altered dynamically
  • INSERT/REPLACE/DELETE operations
• Consume more memory
• Changes are usually visible within milliseconds
  • Not guaranteed: updates sometimes stall for seconds
• A high update rate degrades performance
Distributed indexes
• Comparable to federated tables in MySQL
• Distribute the search over multiple nodes
  • Many smaller indexes
• Sends queries to all defined nodes/indexes
• Aggregates and merges the results
• The slowest node determines the response time
  • Setting timeouts can keep this lower
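The fan-out described above (send the query to every node, merge the partial results, and let a timeout bound the slowest node) can be sketched in Python. This is an illustration under assumptions, not how searchd is implemented: each node is modelled as a plain function returning (id, weight) pairs instead of a network call to a remote agent, and `distributed_search` is a hypothetical helper.

```python
import concurrent.futures as cf
import heapq
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical model of a node: takes the query, returns (doc_id, weight) pairs.
# In a real deployment this would be a network round trip to a searchd agent.
Node = Callable[[str], List[Tuple[int, int]]]


def distributed_search(nodes: Dict[str, Node], query: str,
                       timeout_s: float, limit: int) -> List[Tuple[int, int]]:
    """Query every node in parallel, keep only the answers that arrive
    within timeout_s, and merge them by weight (highest first)."""
    pool = cf.ThreadPoolExecutor(max_workers=len(nodes))
    futures = [pool.submit(node, query) for node in nodes.values()]
    done, _late = cf.wait(futures, timeout=timeout_s)
    # Do not wait for slow nodes: their results are simply dropped.
    pool.shutdown(wait=False, cancel_futures=True)
    partials = [
        sorted(f.result(), key=lambda r: -r[1])
        for f in done
        if f.exception() is None  # a failing node is treated like a slow one
    ]
    merged = heapq.merge(*partials, key=lambda r: -r[1])
    return list(merged)[:limit]


if __name__ == "__main__":
    def shard_a(q: str) -> List[Tuple[int, int]]:
        return [(1, 50), (2, 10)]

    def shard_b(q: str) -> List[Tuple[int, int]]:
        return [(3, 30)]

    def slow_shard(q: str) -> List[Tuple[int, int]]:
        time.sleep(0.5)  # simulates an overloaded node
        return [(4, 99)]

    print(distributed_search({"a": shard_a, "b": shard_b, "c": slow_shard},
                             "puzzle", timeout_s=0.2, limit=10))
```

With a 0.2 second timeout the slow shard's high-weight hit is lost. That is the trade-off on the slide: a timeout keeps the response time low at the cost of incomplete results.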
Indexing: attributes and fields
• Two types of data:
  • Fields
    • Textual data to be indexed
  • Attributes
    • Data to sort/filter upon
• Special: unique identifier
• Special: (last update) timestamp
• Example:
+-------+----------------+---------------+-----------------+
| id    | author         | title         | publishing_date |
+-------+----------------+---------------+-----------------+
| 12345 | Linus Torvalds | Just for fun  | 2002-06-04      |
+-------+----------------+---------------+-----------------+
Indexing: stopwords and stemmers
• Support for stopwords
  • Ignore common words like "and", "the" and "to"
  • Ignore specific words like "game" and "juego"
  • Stopwords still affect the keyword positions
• Language and characters
  • Morphology
    • Similar words
  • Lemmatization
    • Run/ran/running
  • Character folding
    • U+FF10..U+FF19->0..9
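Two points on this slide can be shown in a few lines of Python: character folding, and stopwords that are not indexed but still take up a keyword position. This is a toy tokenizer, not Sphinx's. It folds only the full-width digit range from the slide, splits on whitespace, and leaves stemming and lemmatization out.

```python
from typing import List, Set, Tuple

# Character folding as on the slide: full-width digits U+FF10..U+FF19 -> 0..9.
# A real charset_table folds many more ranges; this keeps only that example.
FOLD = {0xFF10 + i: ord("0") + i for i in range(10)}


def tokenize(text: str, stopwords: Set[str]) -> List[Tuple[int, str]]:
    """Return (position, keyword) pairs for the keywords that get indexed.

    Stopwords are dropped, but they still occupy a position, so the
    distance between the remaining keywords matches the original text."""
    words = text.translate(FOLD).lower().split()
    return [(pos, word) for pos, word in enumerate(words, start=1)
            if word not in stopwords]


if __name__ == "__main__":
    # "\uff12\uff10\uff11\uff14" is "2014" written with full-width digits
    print(tokenize("Rig the BMX \uff12\uff10\uff11\uff14", {"the", "and", "to"}))
```

"bmx" keeps position 3 even though "the" is not indexed. Because of this, phrase and proximity matching still count the stopword when they measure the distance between keywords.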
Searching: the interfaces
• The search daemon has three interfaces:
  • SphinxAPI: native Sphinx binary protocol
  • SphinxQL: MySQL protocol
  • SphinxSE: MySQL/MariaDB storage engine
• Example native (PHP):
  <?php
  $s = new SphinxClient;
  $s->setServer("localhost", 6712);   // SphinxAPI port in this setup
  $s->setMatchMode(SPH_MATCH_ANY);
  $s->setMaxQueryTime(3);             // maximum query time in milliseconds
  $result = $s->query("somephrase", "articles");
  var_dump($result);
  ?>
• Example SphinxQL:
  echo "SELECT author FROM articles WHERE MATCH('@(title,body)
  somephrase') ORDER BY WEIGHT() DESC, id ASC;" | mysql -h 127.0.0.1 -P 6713
  • -h 127.0.0.1 forces TCP: with host localhost the mysql client uses the Unix socket and ignores -P
Searching: Search daemon
• Supports various ranking algorithms:
  • None
  • Any
  • Phrase proximity
  • Okapi BM25 (probabilistic)
  • Wordcount
  • Many more
• User weighting
  • Boost fields with a multiplier (field_weights)
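The Sphinx documentation describes the wordcount ranker as follows: count the keyword occurrences per field, multiply each count by the field weight, and sum the results. A toy Python version of that description shows how user weighting boosts a field. It uses naive whitespace tokenization, assumes a default weight of 1 for fields without an explicit weight, and is not Sphinx's implementation.

```python
from typing import Dict, List


def wordcount_rank(doc: Dict[str, str], keywords: List[str],
                   field_weights: Dict[str, int]) -> int:
    """Per field: count keyword occurrences, multiply by the field weight,
    and sum over all fields. Fields without a weight default to 1."""
    wanted = {k.lower() for k in keywords}
    score = 0
    for field, text in doc.items():
        hits = sum(1 for token in text.lower().split() if token in wanted)
        score += hits * field_weights.get(field, 1)
    return score


if __name__ == "__main__":
    book = {"title": "High performance MySQL",
            "body": "MySQL replication and MySQL tuning"}
    # Same weights as OPTION field_weights=(title=10,body=3) in the SphinxQL example
    print(wordcount_rank(book, ["mysql"], {"title": 10, "body": 3}))  # 1*10 + 2*3 = 16
```

A single hit in the title (weight 10) outweighs three hits in the body (weight 3). This is the effect a field multiplier is meant to have.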
Returned data
mysql> SELECT title, id, publication_date FROM articles WHERE
MATCH('@(title,body) database') ORDER BY WEIGHT() DESC, publication_date ASC
LIMIT 0,5 OPTION field_weights=(title=10,body=3);
+-----------------------------+-------+------------------+
| title                       | id    | publication_date |
+-----------------------------+-------+------------------+
| MySQL Cookbook              | 75532 | 2014-07-01       |
| High performance MySQL      | 94325 | 2012-04-02       |
| MySQL Administrator's Bible | 63627 | 2009-05-11       |
| MySQL (4th Edition)         | 39922 | 2008-09-08       |
| MySQL in a nutshell         | 58793 | 2008-04-01       |
+-----------------------------+-------+------------------+
5 rows in set (0.01 sec)
Client side HA
• The application handles:
  • Connections
  • Failovers
  • Timeouts
• Distribution scheme:
  • Random
  • Round robin
  • Weighted
  • Be creative!
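The client-side HA responsibilities on this slide fit in one small class: pick a node by a distribution scheme, then fail over to the next node on connection errors or timeouts. This is a minimal sketch under assumptions. `NodePicker` is a hypothetical helper. The node names and weights are placeholders, and `send` stands in for whatever code the application already uses to run one query against one node.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


class NodePicker:
    """Client-side node selection with failover (hypothetical helper).

    `weights` maps node name -> relative weight; `send` runs a query on one node."""

    def __init__(self, weights: Dict[str, int], scheme: str = "round_robin",
                 rng: Optional[random.Random] = None) -> None:
        if scheme not in ("random", "round_robin", "weighted"):
            raise ValueError("unknown scheme: " + scheme)
        self.nodes: List[str] = list(weights)
        self.weights = weights
        self.scheme = scheme
        self.rng = rng or random.Random()
        self._next_start = itertools.cycle(range(len(self.nodes)))

    def order(self) -> List[str]:
        """All nodes, in the order they should be tried for one query."""
        if self.scheme == "random":
            return self.rng.sample(self.nodes, len(self.nodes))
        if self.scheme == "weighted":
            first = self.rng.choices(
                self.nodes, weights=[self.weights[n] for n in self.nodes])[0]
            return [first] + [n for n in self.nodes if n != first]
        start = next(self._next_start)  # round robin: rotate the start node
        return self.nodes[start:] + self.nodes[:start]

    def query(self, send: Callable[[str], T]) -> T:
        """Try nodes in order and fail over on connection errors or timeouts."""
        last_error: Optional[Exception] = None
        for node in self.order():
            try:
                return send(node)
            except (ConnectionError, TimeoutError) as exc:
                last_error = exc
        raise RuntimeError("all Sphinx nodes failed") from last_error
```

Only the first node is chosen by the scheme. All other nodes stay in the list as failover targets. Round robin keeps a counter per process, so with many application servers the rotation is only roughly even.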
Game search
• Started using Sphinx in 2009
  • Simple game search
  • Replaced our MySQL / MyISAM search
• Added search across multiple columns
  • Different weight per column
• Distributed mirrored indexes
  • Index rebuilds performed per node
  • Updates happen more frequently
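The slides only say that index rebuilds are performed per node on the mirrored setup. One way to do that is a sequential rollout: rebuild one mirror at a time so the other mirrors keep serving at full speed. The sketch below assumes ssh access to each node, and its hostnames are hypothetical. The indexer flags `--all`, `--rotate` and `--config` are standard indexer options.

```python
import shlex
import subprocess
from typing import List

# Indexer invocation as used elsewhere in the talk, plus --rotate so the
# search daemon swaps in the new index files without a restart.
INDEXER = "/usr/bin/indexer --all --rotate --config /etc/sphinx.conf"


def rebuild_commands(nodes: List[str]) -> List[List[str]]:
    """One ssh command per mirror node, meant to run strictly one after another."""
    return [["ssh", node, INDEXER] for node in nodes]


def rolling_rebuild(nodes: List[str], dry_run: bool = True) -> None:
    for cmd in rebuild_commands(nodes):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True aborts the rollout on the first failing node, so the
            # remaining mirrors keep serving their previous index.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    rolling_rebuild(["sphinx1.example.com", "sphinx2.example.com"])  # hypothetical hosts
```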
ROAR storage layer
• ROAR is a database abstraction layer
  • See the Percona Live Santa Clara 2014 presentation
• Sphinx complements MySQL and Couchbase
• Translate a title to a game page
  • Search URL parts to fetch the application id
• Translate keywords to lists of games
  • Search URL parts to fetch a list of application ids
  • Filter applications on portal and brand
  • Filter applications on browser capabilities
  • Sort on publishing date, popularity and rating
Translating a title to a game page
• Legacy:
  • URLs without identifiers
  • Each URL belongs to exactly one game
  • Sphinx does a fast lookup from an (existing) game URL to its id
• Example:
  http://www.agame.com/game/rig-bmx
  translates into application id 123456
• Future improvements:
  • Correct non-existing pages (404)
    http://www.agame.com/game/rig-bmxx
    with a redirect (301) to:
    http://www.agame.com/game/rig-bmx
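The slide lists 404 correction as future work and does not say how it would be done. A sketch of the idea, swapping in Python's `difflib` similarity matching as a stand-in technique: an exact slug match returns the game, a close enough slug gets a 301 redirect, and anything else stays a 404. The in-memory slug map and the 0.8 cutoff are hypothetical. In production the candidate lookup would presumably go to Sphinx instead.

```python
import difflib
from typing import Dict, Optional, Tuple


def resolve_game(slug: str, slug_to_appid: Dict[str, int],
                 cutoff: float = 0.8) -> Tuple[int, Optional[int], Optional[str]]:
    """Return (http_status, appid, canonical_slug) for a /game/<slug> URL.

    200: exact match, 301: redirect to the closest existing slug,
    404: nothing similar enough."""
    if slug in slug_to_appid:
        return 200, slug_to_appid[slug], slug
    close = difflib.get_close_matches(slug, list(slug_to_appid), n=1, cutoff=cutoff)
    if close:
        return 301, slug_to_appid[close[0]], close[0]
    return 404, None, None


if __name__ == "__main__":
    games = {"rig-bmx": 123456, "puzzle-pop": 2}  # hypothetical slug map
    print(resolve_game("rig-bmxx", games))  # (301, 123456, 'rig-bmx')
```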
Translating keywords to game listings
• Filter on URL parts
  • One or multiple
• Complex filtering on capabilities
  • Blacklist incompatible games (Flash/Unity)
Search on URL parts
• Example with 1 URL part:
  http://www.agame.com/games/puzzle
  sends this query to Sphinx:
  SELECT title, appid FROM game_index WHERE brandid=1 AND portalid=88 AND
  MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION
  max_matches=10000;
• Example with 2 URL parts:
  http://www.agame.com/games/puzzle/match-3
  sends this query to Sphinx:
  SELECT title, appid FROM game_index WHERE brandid=1 AND portalid=88 AND
  MATCH('@url "puzzle" && "match-3"') ORDER BY date_onsite desc LIMIT 0,10
  OPTION max_matches=10000;
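The mapping from a listing URL to a SphinxQL query can be sketched as a small builder that reproduces the two queries above. This is a hypothetical helper, not Spil Games' code. It assumes listing URLs always start with /games/. It also rejects any URL part outside a conservative slug alphabet, so nothing needs escaping inside the quoted Sphinx phrase or the SQL string literal.

```python
import re
from typing import List

# Conservative slug alphabet: parts that pass can be put inside a quoted
# Sphinx phrase and an SQL string literal without further escaping.
SLUG = re.compile(r"[a-z0-9-]+")


def url_parts(path: str) -> List[str]:
    """'/games/puzzle/match-3' -> ['puzzle', 'match-3']."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < 2 or parts[0] != "games":
        raise ValueError("not a game listing URL: " + path)
    return parts[1:]


def listing_query(path: str, brandid: int, portalid: int,
                  offset: int = 0, limit: int = 10) -> str:
    parts = url_parts(path)
    if not all(SLUG.fullmatch(p) for p in parts):
        raise ValueError("unsupported characters in URL: " + path)
    # Same shape as on the slide: one quoted phrase per URL part, joined with &&
    match = "@url " + " && ".join(f'"{p}"' for p in parts)
    return ("SELECT title, appid FROM game_index "
            f"WHERE brandid={int(brandid)} AND portalid={int(portalid)} "
            f"AND MATCH('{match}') ORDER BY date_onsite desc "
            f"LIMIT {int(offset)},{int(limit)} OPTION max_matches=10000;")


if __name__ == "__main__":
    print(listing_query("/games/puzzle/match-3", brandid=1, portalid=88))
```

Validating the input instead of escaping it keeps the sketch short. A production version would also need real escaping if it ever accepted characters outside the slug alphabet.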
Filter on browser capabilities
• Blacklisting is performed on a bitmask that encodes the capabilities
• Example normal desktop browser (no filter):
  http://www.agame.com/games/puzzle
  Opening the puzzle category on a desktop sends this query to Sphinx:
  SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter FROM
  game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND
  bitcheck = 0 ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000;
• Example Chrome on Android 4.4 (blacklist mask 11):
  http://www.agame.com/games/puzzle
  Opening the puzzle category on a Nexus 7 sends this query to Sphinx:
  SELECT title, appid, (bitmask1 & 11) AS bitcheck, (bitmask1 & 11) AS bitfilter
  FROM game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND
  bitcheck = 0 ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000;
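The bitmask test in these queries can be shown in Python. The browser's missing capabilities become a blacklist mask, and a game is playable when `(bitmask1 & mask) = 0`. The bit layout below is entirely hypothetical because the slides do not give the real one. It was chosen only so that the two example masks (0 for the desktop, 11 for the Nexus 7) come out as in the queries.

```python
# Hypothetical bit layout for bitmask1: one bit per technology a game needs.
# The slides do not give the real layout; these values only reproduce the
# example masks 0 and 11 from the queries above.
REQUIRES_FLASH = 1
REQUIRES_UNITY = 2
REQUIRES_WEBGL = 4
REQUIRES_JAVA = 8
ALL_REQUIREMENTS = REQUIRES_FLASH | REQUIRES_UNITY | REQUIRES_WEBGL | REQUIRES_JAVA


def blacklist_mask(supported: int) -> int:
    """Set a bit for every requirement the browser cannot satisfy."""
    return ALL_REQUIREMENTS & ~supported


def is_playable(game_bitmask: int, mask: int) -> bool:
    """Python equivalent of '(bitmask1 & mask) AS bitcheck ... bitcheck = 0'."""
    return (game_bitmask & mask) == 0


if __name__ == "__main__":
    desktop = blacklist_mask(ALL_REQUIREMENTS)  # supports everything
    android = blacklist_mask(REQUIRES_WEBGL)    # hypothetical: only WebGL
    print(desktop, android)  # 0 11
```

A mask of 0 turns the filter into a no-op, which is why the desktop query still carries `bitcheck = 0`. The application can build the same query text for every browser and change only the mask.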
What we encountered
• Real-time indexes decreased performance
• Make the indexing process "nicer" by pinning it to one CPU core (CPU 0):
  /bin/taskset 0x00000001 /usr/bin/indexer --all --config /etc/sphinx.conf
• Send statistics to Graphite
  http://engineering.spilgames.com/tamed-sphinx-search/
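The linked blog post covers the actual setup. As a generic sketch of the idea, the counters that SphinxQL `SHOW STATUS` returns as (Counter, Value) rows can be turned into Graphite's plaintext format, one `<metric.path> <value> <unix timestamp>` line per metric, and sent over TCP to carbon's plaintext listener (port 2003 by default). The metric prefix and the sample values are hypothetical. Fetching the rows from Sphinx is left out so the sketch needs no MySQL driver.

```python
import socket
import time
from typing import Iterable, List, Optional, Tuple


def graphite_lines(prefix: str, status_rows: Iterable[Tuple[str, str]],
                   timestamp: Optional[int] = None) -> List[str]:
    """Turn (counter, value) rows, e.g. from SphinxQL SHOW STATUS, into
    Graphite plaintext lines: '<metric.path> <value> <unix timestamp>'.
    Rows whose value is not numeric are skipped."""
    ts = int(time.time()) if timestamp is None else timestamp
    lines = []
    for counter, value in status_rows:
        value = value.strip()
        try:
            float(value)
        except ValueError:
            continue
        lines.append(f"{prefix}.{counter} {value} {ts}")
    return lines


def send_to_graphite(lines: List[str], host: str, port: int = 2003) -> None:
    """Send the lines to a carbon daemon's plaintext listener over TCP."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    rows = [("uptime", "3600"), ("queries", "42"), ("avg_query_wall", "0.012")]  # sample values
    for line in graphite_lines("sphinx.node1", rows, timestamp=1400000000):
        print(line)
```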
Sphinx Benchmark specifications
• Sysbench 0.5
  • Custom Lua scripts
  • Disabled caching
• OpenStack virtual machines:
  • Benchmark driver: 4 core CPU, 4GB memory
  • Sphinx nodes: 4 core CPU, 16GB memory
  • MySQL nodes: 4 core CPU, 16GB memory
• At least 3 runs per test
  • The average of the runs counts
  • Tests were repeated when outliers were found
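The aggregation rules above (at least three runs, average them, repeat when an outlier shows up) and the 95th percentile metric used in the charts can be sketched in Python. Two parts are assumptions rather than the benchmark's actual method. The percentile uses the simple nearest-rank definition, while sysbench computes its own approximation. The outlier rule is a hypothetical "more than 20% away from the mean".

```python
import statistics
from typing import List, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile response time (nearest-rank method)."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def aggregate_runs(run_results: List[float], max_dev: float = 0.2) -> float:
    """Average the per-run results; refuse to average if a run deviates more
    than max_dev (hypothetical 20%) from the mean, so the test gets repeated."""
    if len(run_results) < 3:
        raise ValueError("need at least 3 runs per test")
    mean = statistics.mean(run_results)
    outliers = [r for r in run_results if abs(r - mean) > max_dev * mean]
    if outliers:
        raise ValueError(f"outlier runs {outliers}: repeat the test")
    return mean


if __name__ == "__main__":
    runs = [p95(range(1, 101)), p95(range(3, 103)), p95(range(2, 102))]
    print(runs, aggregate_runs(runs))
```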
InnoDB vs Sphinx
• InnoDB discrete match
  SELECT l.url, gd.title, g.appid, bitmask1, date_onsite FROM games g LEFT
  JOIN game_capabilities gc ON g.appid=gc.app INNER JOIN game_cat c ON
  g.appid = c.appid AND g.portalid = c.portalid AND g.brandid = c.brandid
  INNER JOIN cat_data cd ON c.portalid = cd.portalid AND c.brandid =
  cd.brandid AND c.catname = cd.catname WHERE g.brandid=1 AND g.portalid=88
  AND cd.url='puzzle' ORDER BY date_onsite desc LIMIT 0,10;
• Sphinx single phrase
  SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS
  bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0
  AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION
  max_matches=10000;
MyISAM full text vs Sphinx 1
• MyISAM single match-against
  SELECT title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`)
  AGAINST('puzzle') AS score FROM game_index WHERE MATCH(`url`)
  AGAINST('puzzle') AND portalid=88 AND brandid=1 AND (bitmask1 & 0) = 0
  ORDER BY score DESC, date_onsite DESC LIMIT 0,10;
• Sphinx single phrase
  SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS
  bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0
  AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION
  max_matches=10000;
MyISAM full text vs Sphinx 1
[Chart: 95th percentile response time in ms (axis 0 to 2000) vs. threads (4, 8, 16, 24, 32, 48, 64); series: Sphinx single phrase, MyISAM single match-against]
MyISAM full text vs Sphinx 2
• MyISAM multiple match-against
  SELECT title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`)
  AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AS score FROM game_index WHERE
  MATCH(`url`) AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AND portalid=88 AND
  brandid=1 AND (bitmask1 & 0) = 0 ORDER BY score DESC, date_onsite DESC
  LIMIT 0,10;
• Sphinx multiple phrases
  SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS
  bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0
  AND MATCH('@url "puzzle" && "sudoku"') ORDER BY date_onsite desc LIMIT 0,10
  OPTION max_matches=10000;
MyISAM full text vs Sphinx 2
[Chart: 95th percentile response time in ms (axis 0 to 250) vs. threads (4, 8, 16, 24, 32, 48, 64); series: MyISAM multiple match-against, Sphinx multiple phrases]
MyISAM full text vs Sphinx 2
[Chart: 95th percentile response time in ms (axis 0 to 2000) vs. threads (4, 8, 16, 24, 32, 48, 64); series: Sphinx single phrase, MyISAM multiple match-against, Sphinx multiple phrases, MyISAM single match-against]
InnoDB vs MyISAM vs Sphinx
[Chart: 95th percentile response time in ms (axis 0 to 4000) vs. threads (4, 8, 16, 24, 32, 48, 64); series: Sphinx single phrase, InnoDB single match-against, MyISAM single match-against]
Sphinx HA solutions
• Sphinx on localhost
  • The application talks the MySQL protocol to Sphinx on localhost
  • One or two remote agent(s)
• Sphinx behind a load balancer
  • The load balancer proxies the MySQL protocol
Conclusion
• Sphinx Search is faster than MySQL full text search in these benchmarks
• Smaller result sets increase performance
  • Due to sorting by relevance
  • Smaller temporary tables
• InnoDB performs worse than MyISAM
• Sphinx agent mirroring performs better
  • Probably due to the Sphinx native protocol
• Load balancers seem to perform better
  • Probably due to dedicated (better) hardware
Thank you!
• This presentation can be found at:
  http://spil.com/pluk2014sphinx
• Sphinx Search:
  http://www.sphinxsearch.com
• Sending Sphinx Search metrics to Graphite:
  http://engineering.spilgames.com/tamed-sphinx-search/
• About the ROAR storage layer:
  http://spil.com/plsc2014storage
• If you wish to contact me:
  Email: art@spilgames.com
  Twitter: @banpei
  Blog: http://engineering.spilgames.com
  Twitter Spil Engineering: @spilengineering
The three main brands:
• Girls: aimed at girls aged 8 to 12
• Teens: aimed at boys and girls aged 10 to 15
• Family: basically mothers playing with their children
Strong domains localized in 19 different languages:
spielen.com, juegos.com, gamesgames.com, games.co.uk, oyunonya.com
All content is localized