3. Self-introduction
● Best-known work
● Calculating a bowling score in SQL
-- Assumes a table score(idx, pins): one row per throw, idx numbered from 1.
with recursive
-- s: each throw together with the next two throws
s(idx, pins1, pins2, pins3) as (
  select s1.idx, s1.pins, s2.pins, s3.pins
  from score s1
  left join score s2 on (s2.idx = s1.idx + 1)
  left join score s3 on (s3.idx = s1.idx + 2)
),
-- f: the first throw of each frame; a strike advances 1 throw, otherwise 2.
-- The frame counter stops the recursion at frame 10, so bonus throws in the
-- last frame do not start extra frames.
f(frame, idx, pins1, pins2, pins3) as (
  select 1, idx, pins1, pins2, pins3 from s where idx = 1
  union all
  select f.frame + 1, s.idx, s.pins1, s.pins2, s.pins3
  from s join f
    on (s.idx = f.idx + case when f.pins1 = 10 then 1 else 2 end)
  where f.frame < 10
),
sof(frame, idx, pins1, pins2, pins3, score_of_frame) as (
  select frame, idx
       , pins1, pins2, pins3
       , case when pins1 = 10 then pins1 + pins2 + pins3          -- strike
              when pins1 + pins2 = 10 then pins1 + pins2 + pins3  -- spare
              else pins1 + pins2
         end as score_of_frame
  from f
)
select frame
     , pins1
     , pins2
     , case frame when 10 then pins3
       else null end as pins3
     , score_of_frame
     , sum(score_of_frame) over w as total
from sof
window w as (order by idx);
09/06/20
10. Method 1: Functional index
● Creating a single index is enough to get fast full-text search
=> SELECT title FROM docs_en WHERE body ILIKE '%search%';
Create the index
=> CREATE INDEX docs_en_idx ON docs_en
USING GIN(to_tsvector('english', body));
CREATE INDEX
Search
=> SELECT title FROM docs_en WHERE
to_tsvector('english', body) @@ to_tsquery('english', 'search');
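The index above can only be considered when the query repeats the indexed expression, including the explicit 'english' configuration. A minimal sketch for checking this with EXPLAIN, assuming the docs_en table and docs_en_idx index from this slide; whether the planner actually chooses the index also depends on table size and statistics.

```sql
-- Same expression as in CREATE INDEX, so docs_en_idx is a candidate.
EXPLAIN
SELECT title FROM docs_en
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');

-- One-argument form: the result depends on default_text_search_config,
-- so the expression does not match the indexed one.
EXPLAIN
SELECT title FROM docs_en
WHERE to_tsvector(body) @@ to_tsquery('search');
```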
12. Method 2: Add a tsvector column
● Fast, especially when sorting by score
● The tsvector is not updated automatically when the body is updated
(a trigger or a batch job is needed)
● Uses more storage
=> ALTER TABLE docs_en ADD vec tsvector;
ALTER TABLE
=> UPDATE docs_en SET vec = to_tsvector('english', body);
UPDATE 936
=> CREATE INDEX docs_en_idx2 ON docs_en USING GIN(vec);
CREATE INDEX
Search
=> SELECT title FROM docs_en WHERE vec @@ to_tsquery('english', 'search');
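The two points above (keeping vec up to date, and sorting by score) can be sketched as follows. This assumes the docs_en table with the vec and body columns from this slide and uses the built-in tsvector_update_trigger function; the trigger name docs_en_vec_update is made up for illustration.

```sql
-- Keep vec in sync with body on every INSERT or UPDATE.
-- The trigger name docs_en_vec_update is hypothetical.
CREATE TRIGGER docs_en_vec_update
BEFORE INSERT OR UPDATE ON docs_en
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(vec, 'pg_catalog.english', body);

-- Sort matches by relevance, reading the stored tsvector
-- instead of recomputing to_tsvector for every row.
SELECT title, ts_rank(vec, q) AS rank
FROM docs_en, to_tsquery('english', 'search') AS q
WHERE vec @@ q
ORDER BY rank DESC
LIMIT 10;
```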
13. Differences between LIKE and full-text search
=> select count(*) from docs_en
   where body ilike '%html%';
 count
-------
   935
(1 row)
=> select count(*) from docs_en where to_tsvector('english', body)
   @@ to_tsquery('html');
 count
-------
    14
(1 row)
=> select count(*) from docs_en
   where body ilike '%query%';
 count
-------
   312
(1 row)
=> select count(*) from docs_en where to_tsvector('english', body)
   @@ to_tsquery('query');
 count
-------
   327
(1 row)
16. to_tsvector
● The parser splits the text into tokens of 23 types
a fat cat sat on a mat - it <b>ate</b> a fat rats

Word, all ASCII   XML tag   Space symbols
a                 <b>       ' '
fat               </b>      -
cat
sat
on
mat
it
ate
rats
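The token breakdown on this slide can be inspected directly. A minimal sketch, assuming the built-in 'default' parser: ts_parse returns each token with its type id, and joining against ts_token_type turns the id into a readable alias.

```sql
-- List every token the default parser produces, with its token type.
SELECT tt.alias, tt.description, p.token
FROM ts_parse('default',
              'a fat cat sat on a mat - it <b>ate</b> a fat rats') AS p
JOIN ts_token_type('default') AS tt ON tt.tokid = p.tokid;
```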
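17. to_tsvector
● Tokens are normalized according to their token type
    Stop words are removed
    Words are reduced to their stems

Word, all ASCII -> english_stem
a     -> (removed)
fat   -> fat
cat   -> cat
sat   -> sat
on    -> (removed)
mat   -> mat
it    -> (removed)
ate   -> ate
rats  -> rat

Space symbols (' ', -)  -> removed
XML tag (<b>, </b>)     -> removed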
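To follow the normalization step token by token, ts_debug shows which dictionaries were consulted for each token and which lexemes came out. A minimal sketch using the sample sentence from the previous slide and the built-in 'english' configuration:

```sql
-- One row per token: type alias, the token itself, the dictionaries
-- mapped to that token type, and the resulting lexemes.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english',
              'a fat cat sat on a mat - it <b>ate</b> a fat rats');

-- The final tsvector built from the same text.
SELECT to_tsvector('english',
                   'a fat cat sat on a mat - it <b>ate</b> a fat rats');
```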
18. token
=> select * from ts_token_type('default');
tokid | alias | description
-------+-----------------+------------------------------------------
1 | asciiword | Word, all ASCII
2 | word | Word, all letters
3 | numword | Word, letters and digits
4 | email | Email address
5 | url | URL
6 | host | Host
7 | sfloat | Scientific notation
8 | version | Version number
9 | hword_numpart | Hyphenated word part, letters and digits
10 | hword_part | Hyphenated word part, all letters
11 | hword_asciipart | Hyphenated word part, all ASCII
12 | blank | Space symbols
13 | tag | XML tag
14 | protocol | Protocol head
...
23 | entity | XML entity
(23 rows)
21. Dictionary
● Simple, Synonym, Ispell, Thesaurus, Snowball
● share/tsearch_data/
● Which dictionary is applied to which token type can be changed
(ALTER TEXT SEARCH CONFIG...)
(share/tsearch_data/english.stop)
i
me
my
myself
we
our
ours
ourselves
you
your
...
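A single dictionary can be tested on its own with ts_lexize, and the mapping from token type to dictionary can be changed on a copy of a built-in configuration. The following is only a sketch: the configuration name my_english is hypothetical, and the mapping chosen here (sending asciiword tokens to the simple dictionary instead of english_stem) is just one possible change.

```sql
-- Ask the Snowball stemmer how it normalizes one token.
SELECT ts_lexize('english_stem', 'rats');
-- 'you' is listed in english.stop, so it is treated as a stop word.
SELECT ts_lexize('english_stem', 'you');

-- my_english is a hypothetical name for a private copy of 'english'.
CREATE TEXT SEARCH CONFIGURATION my_english (COPY = pg_catalog.english);

-- Map the asciiword token type to the simple dictionary.
ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciiword WITH simple;

-- Compare the two configurations on the same input.
SELECT to_tsvector('english', 'a fat cat sat on a mat');
SELECT to_tsvector('my_english', 'a fat cat sat on a mat');
```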
22. Differences between LIKE and full-text search
=> select count(*) from docs_en
   where body ilike '%html%';
 count
-------
   935
(1 row)
=> select count(*) from docs_en where to_tsvector('english', body)
   @@ to_tsquery('html');
 count
-------
    14
(1 row)
=> select count(*) from docs_en
   where body ilike '%query%';
 count
-------
   312
(1 row)
=> select count(*) from docs_en where to_tsvector('english', body)
   @@ to_tsquery('query');
 count
-------
   327
(1 row)
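The different counts follow from the parser and dictionary steps described earlier: ILIKE matches raw substrings, including text inside tags, while full-text search compares normalized lexemes. A small sketch on literal strings (the sample sentences are made up for illustration) lets both effects be compared side by side:

```sql
-- Stemming: an inflected form contains no substring 'query',
-- but it can share a stem with the search term.
SELECT 'Two queries were run' ILIKE '%query%' AS like_match,
       to_tsvector('english', 'Two queries were run')
           @@ to_tsquery('english', 'query') AS fts_match;

-- Tags: 'html' appears only inside markup here.
SELECT '<html><body>hello</body></html>' ILIKE '%html%' AS like_match,
       to_tsvector('english', '<html><body>hello</body></html>')
           @@ to_tsquery('english', 'html') AS fts_match;
```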
27. ts_headline
● ts_headline([regconfig, ]text, tsquery[, text])
● Highlights the matching terms in the result
● StartSel, StopSel, and other options are configurable
=> select ts_headline('fat cat sat mat', to_tsquery('cat'));
ts_headline
------------------------
fat <b>cat</b> sat mat
(1 row)
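The optional last argument of ts_headline is an option string. A minimal sketch that replaces the default <b> markers via StartSel and StopSel; the <em> tags are an arbitrary choice for illustration.

```sql
-- Wrap matches in <em>...</em> instead of the default <b>...</b>.
SELECT ts_headline('english',
                   'fat cat sat mat',
                   to_tsquery('english', 'cat'),
                   'StartSel=<em>, StopSel=</em>');
```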
28. configuration
to_tsvector('english', body)
to_tsvector('simple', body)
to_tsvector(body)
● A configuration bundles the behavior of the functions used in full-text search
● If omitted, default_text_search_config is used
(it can be set in postgresql.conf or with the SET command)
=> \dF
List of text search configurations
Schema | Name | Description
------------+------------+---------------------------------------
pg_catalog | danish | configuration for danish language
pg_catalog | dutch | configuration for dutch language
pg_catalog | english | configuration for english language
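The effect of default_text_search_config can be checked within a session. A minimal sketch, assuming only the built-in 'simple' and 'english' configurations:

```sql
-- Show the configuration used when none is given explicitly.
SHOW default_text_search_config;

-- Change it for the current session only.
SET default_text_search_config = 'pg_catalog.simple';
SELECT get_current_ts_config();

-- The one-argument form follows the session default;
-- the two-argument form always uses the named configuration.
SELECT to_tsvector('The rats');
SELECT to_tsvector('english', 'The rats');
```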