12. SELECT text FROM phrases WHERE text like '%run%';
Can you run this to the post office for me?
I'm going for a run, want to come along?
Cross country running
I'm too drunk to drive.
I am running out of battery power.
Work is not like wolf - it won't run away.
13. SELECT text FROM phrases WHERE
vectors @@ 'run'::tsquery;
Can you run this to the post office for me?
Sorry I am running really late.
I'm going for a run, want to come along?
Cross country running
I am running out of battery power.
Work is not like wolf - it won't run away.
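The difference between the two result sets above can be reproduced outside the database. The sketch below is a toy illustration in Python, not how Postgres builds a tsvector. `PHRASES` is the union of the phrases shown on the two slides, and `toy_stem` is a hypothetical, deliberately crude stemmer that only strips "-ing"; it stands in for a real stemmer such as the Snowball-based one behind the Postgres `english` configuration.

```python
import re

# Union of the phrases shown on the two slides above.
PHRASES = [
    "Can you run this to the post office for me?",
    "Sorry I am running really late.",
    "I'm going for a run, want to come along?",
    "Cross country running",
    "I'm too drunk to drive.",
    "I am running out of battery power.",
    "Work is not like wolf - it won't run away.",
]


def substring_match(phrases, needle):
    """Like SQL LIKE '%needle%': matches anywhere, even inside another word."""
    return [p for p in phrases if needle in p.lower()]


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def toy_stem(word):
    """Hypothetical crude stemmer: strip '-ing' and undo a doubled consonant."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    return word


def stem_match(phrases, term):
    """Match phrases containing a whole token whose stem equals the term's stem."""
    target = toy_stem(term.lower())
    return [p for p in phrases if target in {toy_stem(t) for t in tokenize(p)}]


like_hits = substring_match(PHRASES, "run")
stem_hits = stem_match(PHRASES, "run")

for phrase in like_hits:
    marker = " " if phrase in stem_hits else "x"
    print(f"[{marker}] {phrase}")
```

Lines marked `x` are found by the substring search but rejected once matching is done on stemmed whole words. With this data that is only the "drunk" phrase.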
15. Tokenization and Stemming
Google App Engine / JRuby / Lucene
http://full-text-search.appspot.com
http://github.com/ultrasaurus/full-text-search-appengine
26. a about above after again against all am an and any are
aren't as at be because been before being below between
both but by can't cannot could couldn't did didn't do does
doesn't doing don't down during each few for from further had
hadn't has hasn't have haven't having he he'd he'll he's her
here here's hers herself him himself his how how's i i'd i'll i'm
i've if in into is isn't it it's its itself let's me more most mustn't
my myself no nor not of off on once only or other ought our
ours ourselves out over own same shan't she she'd she'll
she's should shouldn't so some such than that that's the their
theirs them themselves then there there's these they they'd
they'll they're they've this those through to too under until up
very was wasn't we we'd we'll we're we've were weren't what
what's when when's where where's which while who who's
whom why why's with won't would wouldn't you you'd you'll
you're you've your yours yourself yourselves
http://www.ranks.nl/resources/stopwords.html
54. Target Text                             Target Language   Source Language
We’re running out of daylight               en                ja
Could you run this?                         en                ja
Cross‐country running                       en                ja
I’m going for a run, want to come along?    en                ja
60. I’m going for a run, want to come along? en ja
ha shi ri ni iku ke do ittsho ni ki ma su ka?
Ikuko Kobayashi
2009‐11‐29 20:36:47 UTC
http://….16ec695a‐8fce‐4277‐bdd4.flv
61. I’m going for a run, want to come along? en ja
ha shi ri ni iku ke do ittsho ni ki ma su ka?
Ikuko Kobayashi
2009‐11‐29 20:36:47 UTC
http://….16ec695a‐8fce‐4277‐bdd4.flv
http://….Japanese_ikuko_kobayashi.jpg
Postgres: in-database “tsvector”, partial indexes, acts_as_tsearch

MySQL: FULLTEXT indices are fully indexed fields which support stopwords, boolean searches, and relevancy ratings: http://onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
Note: MySQL FULLTEXT requires the MyISAM storage engine (true of MySQL 5.5 and earlier; InnoDB gained FULLTEXT support in 5.6)
Comparison of MySQL vs. PostgreSQL: http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL

Solr/Lucene: separate index, language features: faceted search, similar documents (you may also like…)
Sphinx: typically installed on the same machine, directly accesses your database
Word boundaries must be inferred from context in Chinese, Japanese, Korean, and Thai.
CJK word boundaries are not handled in MySQL 5: http://blogs.sun.com/soapbox/entry/fulltext_and_asian_languages_with
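A small Python sketch of why this matters: splitting on whitespace works for English but returns a whole Japanese phrase as a single token. Overlapping character bigrams are one common fallback (Lucene's CJKAnalyzer indexes bigrams, for example); it is shown here as a swapped-in illustration, not as what any engine named in this deck does by default. The Japanese sample string is an assumption: my own rendering of the opening words of the romaji phrase from the earlier slide, not text taken from the talk.

```python
def whitespace_tokens(text):
    """Tokenize by splitting on whitespace, as a naive English tokenizer would."""
    return text.split()


def char_bigrams(text):
    """Return overlapping two-character tokens, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "going for a run"
japanese = "走りに行く"  # assumed sample: "go for a run"

print(whitespace_tokens(english))   # four separate words
print(whitespace_tokens(japanese))  # one undivided token
print(char_bigrams(japanese))       # overlapping pairs usable as index terms
```

Bigrams over-generate (some pairs straddle real word boundaries), so they trade precision for the ability to match at all without a dictionary or a morphological analyzer.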
Rethinking Full-Text Search for Multilingual Databases. Jeffrey Sorensen and Salim Roukos, IBM T. J. Watson Research Center, Yorktown Heights, New York. <sorenj|roukos>@us.ibm.com
Stop words can cause problems when using a search engine to search for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'.
http://en.wikipedia.org/wiki/Stop_words
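The problem is easy to see in a few lines of Python. This is a minimal sketch, assuming a simple regex tokenizer and a small subset of the stop word list shown earlier in the deck; `index_terms` is a hypothetical helper, not part of any search engine's API.

```python
import re

# Small subset of the stop word list shown earlier in the deck.
STOP_WORDS = {"a", "am", "for", "i", "of", "out", "that", "the", "to", "who"}


def index_terms(text):
    """Return the lowercase tokens that survive stop word removal."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


for name in ["The Who", "The The", "Take That"]:
    print(f"{name!r} -> {index_terms(name)}")

print(index_terms("I am running out of battery power."))
```

'The Who' and 'The The' leave no terms at all, so nothing is indexed and nothing can be searched for; 'Take That' collapses to the single term "take".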
Think of a blank canvas... don’t think about Solr or Sphinx; first think about what people are trying to find and what will help them most.
Maybe browse is more im