Adventures in Full Text Search
Sarah Allen
@ultrasaurus
class Article < ActiveRecord::Base
  acts_as_solr
end
Tokyo
Dystopia
Language
Relevance
Accuracy
 Speed
Text as Language
stemming
synonyms
stop words
word boundaries
SELECT text FROM phrases WHERE text like '%run%';

 Can you run this to the post office for me?
 I'm going for a run, want to come along?
 Cross country running
 I'm too drunk to drive.
 I am running out of battery power.
 Work is not like wolf - it won't run away.
SELECT text FROM phrases WHERE
            vectors @@ 'run'::tsquery;

 Can you run this to the post office for me?
 Sorry I am running really late.
 I'm going for a run, want to come along?
 Cross country running
 I am running out of battery power.
 Work is not like wolf - it won't run away.
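
The difference between the two queries above can be sketched in plain Ruby. This is only an illustration: `toy_stem` is a hypothetical, deliberately crude stand-in for a real stemmer, and the in-memory array stands in for the `phrases` table. Substring matching picks up "drunk" because it contains "run"; matching on stemmed tokens does not.

```ruby
PHRASES = [
  "Can you run this to the post office for me?",
  "I'm going for a run, want to come along?",
  "Cross country running",
  "I'm too drunk to drive.",
  "I am running out of battery power.",
  "Work is not like wolf - it won't run away."
]

# Roughly what LIKE '%run%' does: a case-insensitive substring test.
def substring_matches(phrases, term)
  phrases.select { |phrase| phrase.downcase.include?(term.downcase) }
end

# Hypothetical toy stemmer (NOT Snowball): strips a trailing
# "ning", "ing", or "s" so that "running" and "run" share a stem.
def toy_stem(word)
  word.sub(/(ning|ing|s)\z/, "")
end

# Split a phrase into lowercase word tokens and stem each one.
def stemmed_tokens(phrase)
  phrase.downcase.scan(/[a-z']+/).map { |word| toy_stem(word) }
end

# Match whole stemmed tokens instead of raw substrings.
def token_matches(phrases, term)
  stem = toy_stem(term.downcase)
  phrases.select { |phrase| stemmed_tokens(phrase).include?(stem) }
end

puts substring_matches(PHRASES, "run")
puts "--"
puts token_matches(PHRASES, "run")
```

A real stemmer handles far more cases than this regular expression; the point is only that the comparison happens on normalized word tokens rather than on characters.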
Tokenization and Stemming
Google App Engine / JRuby / Lucene

http://full-text-search.appspot.com

http://github.com/ultrasaurus/full-text-search-appengine
http://localhost:8080/_ah/admin/datastore?kind=Notes
./script/generate scaffold note
   content:string index:List -f --skip-migration

./script/generate dd_model note content:string index:List -f
class Note
 include DataMapper::Resource

 property :id,   Serial
 property :content, String,      :required => true, :length => 500
 property :index, List,       :required => true
 timestamps :at

end
java_import org.apache.lucene.analysis.snowball.SnowballAnalyzer
java_import java.io.StringReader

# Rebuild the term index before every validation.
before :valid?, :update_index

def update_index
 # Snowball analyzer for English: tokenizes, lowercases, and stems.
 analyzer = SnowballAnalyzer.new("English")
 s = StringReader.new(content)
 token_stream = analyzer.tokenStream(nil, s)

 # Collect the stemmed terms until the stream is exhausted.
 terms = []
 while (token = token_stream.next) do
   terms << token.term
 end
 self.index = terms
end
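
Once each note stores its stemmed terms, a search has to run the query through the same analysis before comparing. The sketch below shows that idea in plain Ruby, with no App Engine or Lucene involved. Everything in it is an assumption for illustration: `analyze` is a hypothetical toy replacement for the Snowball analyzer, and `Note` is a simple Struct rather than the DataMapper model.

```ruby
# In-memory stand-in for the Note model: content plus its term index.
Note = Struct.new(:content, :index)

# Hypothetical toy analyzer (NOT Snowball): lowercase, split into words,
# strip a trailing "ning", "ing", or "s".
def analyze(text)
  text.downcase.scan(/[a-z']+/).map { |word| word.sub(/(ning|ing|s)\z/, "") }
end

# Index the content at creation time, as the before hook does on validation.
def build_note(content)
  Note.new(content, analyze(content))
end

# Analyze the query the same way, then keep notes whose index
# contains every query term.
def search(notes, query)
  terms = analyze(query)
  notes.select { |note| (terms - note.index).empty? }
end

notes = [
  build_note("I love horses"),
  build_note("Horses are beautiful"),
  build_note("deer live in the woods")
]

puts search(notes, "horse").map(&:content)
```

Because both sides pass through `analyze`, a query for "horse" finds notes that only contain "horses" or "Horses".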
a about above after again against all am an and any are
    aren't as at be because been before being below between
   both but by can't cannot could couldn't did didn't do does
doesn't doing don't down during each few for from further had
   hadn't has hasn't have haven't having he he'd he'll he's her
 here here's hers herself him himself his how how's i i'd i'll i'm
i've if in into is isn't it it's its itself let's me more most mustn't
  my myself no nor not of off on once only or other ought our
    ours ourselves out over own same shan't she she'd she'll
 she's should shouldn't so some such than that that's the their
    theirs them themselves then there there's these they they'd
  they'll they're they've this those through to too under until up
 very was wasn't we we'd we'll we're we've were weren't what
   what's when when's where where's which while who who's
  whom why why's with won't would wouldn't you you'd you'll
             you're you've your yours yourself yourselves

           http://www.ranks.nl/resources/stopwords.html
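
As a rough sketch of how a stop word list is applied at indexing time, the Ruby below drops listed words before the remaining terms are stored. The `STOP_WORDS` set is a small hand-picked subset of the list above, and `index_terms` is a hypothetical helper, not part of any library. It also shows the well-known side effect: a name made up entirely of stop words, such as "The Who", leaves nothing to index.

```ruby
require "set"

# A small subset of the stop word list above, for illustration only.
STOP_WORDS = Set.new(%w[a an and are i in is of the to who you])

# Lowercase the text, split it into words, and drop the stop words.
def index_terms(text)
  text.downcase.scan(/[a-z']+/).reject { |word| STOP_WORDS.include?(word) }
end

p index_terms("deer live in the woods")
p index_terms("The Who")
```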
Word Boundaries

I love horses
Horses are beautiful
deer in the forest
deer live in the woods
You are an idiot.

Relevance
Accuracy
Speed
Write: Rails, Database, Hosted Search
Read: Rails, Database, Hosted Search
Target Text                                Target Language   Source Language
We’re running out of daylight              en                ja
Could you run this?                        en                ja
Cross-country running                      en                ja
I’m going for a run, want to come along?   en                ja

I’m going for a run, want to come along?   en   ja
ha shi ri ni iku ke do ittsho ni ki ma su ka?
Ikuko Kobayashi
2009-11-29 20:36:47 UTC
http://….16ec695a-8fce-4277-bdd4.flv
http://….Japanese_ikuko_kobayashi.jpg
class Page < ActiveRecord::Base
  acts_as_tsearch :fields => [ ... ]
end
Page.send :acts_as_tsearch, :fields => [:title]
PagePart.send :acts_as_tsearch, :fields =>
  [:content]
ProgramPropertyList.send :acts_as_tsearch,
  :fields => [:instructor, :program_desc,
              :program_detail, :resource]
@pages = Page.find_by_tsearch(@query)
class Phrase < ActiveRecord::Base
  acts_as_tsearch :fields => [:text]
end
Phrase.find_by_tsearch(term,
  :conditions => {:language_id =>
                   target_language.id})
When you think about search...
Questions?

Editor's notes

  1. Photo source: http://www.flickr.com/photos/9mohamed0/4268238013/sizes/o/
  4. Photo source: http://www.flickr.com/photos/zehfernando/3457455680/
  5. Photo source: http://www.flickr.com/photos/bevvell/4649795989/in/pool-97958286@N00
  6. Photo source: http://www.flickr.com/photos/caveman_92223/2763166886/
  7. Photo source: http://www.flickr.com/photos/lochaven/2588186224/
  8. Postgres: in-database “tsvector”, partial indexes, acts_as_tsearch.
     MySQL FULLTEXT indices are fully indexed fields which support stopwords, boolean searches, and relevancy ratings: http://onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
     Note: MySQL FULLTEXT requires the MyISAM storage engine.
     Comparison of MySQL vs. PostgreSQL: http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL
     Solr/Lucene: separate index; language features; faceted search; similar documents (you may also like…).
     Sphinx: typically installed on the same machine; directly accesses your database.
  11. Word boundaries are understood by context in Chinese, Japanese, Korean, and Thai.
      CJK word boundaries are not handled in MySQL 5: http://blogs.sun.com/soapbox/entry/fulltext_and_asian_languages_with
  14. Rethinking Full-Text Search for Multilingual Databases. Jeffrey Sorensen and Salim Roukos, IBM T. J. Watson Research Center, Yorktown Heights, New York. <sorenj|roukos>@us.ibm.com
  26. Stop words can cause problems when using a search engine to search for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'.
      http://en.wikipedia.org/wiki/Stop_words
  48. Photo source: http://www.flickr.com/photos/thatguyfromcchs08/2300190277/
  49. Photo source: http://www.flickr.com/photos/stuckincustoms/4443168109/sizes/l/
  76. Think of a blank canvas... don’t think about Solr or Sphinx; first think about what people are trying to find and what will help them most.
      Maybe browse is more im