Presentation on using Amazon CloudSearch with databases. What to use when? How can you use CloudSearch with a database? Tom Hill, Solutions Architect, Amazon CloudSearch
It's all about time. Who here is currently using search?
Yes, column-oriented databases can be relational. There are lots of ways to classify databases, as there are MANY ways to organize data. Technically, the program is a Database Management System (DBMS); the "database" is the data, not the program.
Case folding, stemming, stopword removal, synonyms (wizard/philosopher). Also accent normalization, UTF-8 normalization, etc. These are generally based on an inverted index, a data structure that is like the index at the back of a book. An inverted index is good for the types of queries that are common with text.
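To make the inverted index idea concrete, here is a minimal Python sketch of the text processing steps listed above feeding a word-to-documents index. It is an illustration only, not how CloudSearch is implemented: the stopword list, the synonym mapping, the toy plural-stripping stemmer, and all function names are assumptions made up for this example.

```python
import unicodedata
from collections import defaultdict

# Illustrative assumptions: a tiny stopword list and a one-entry synonym map.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in"}
SYNONYMS = {"philosopher": "wizard"}


def strip_accents(text):
    """Accent normalization: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word):
    """Toy stemmer: strips a trailing plural 's'. Real stemmers do far more."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def analyze(text):
    """Case folding, punctuation stripping, stopword removal, stemming, synonyms."""
    tokens = []
    for raw in strip_accents(text).lower().split():
        word = "".join(ch for ch in raw if ch.isalnum())
        if not word or word in STOPWORDS:
            continue
        word = stem(word)
        tokens.append(SYNONYMS.get(word, word))
    return tokens


def build_index(docs):
    """Inverted index: each term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "The Wizards of the Café",
    "d2": "A philosopher's stone",
    "d3": "Relational databases",
}
index = build_index(docs)
```

The query goes through the same analysis as the documents, which is why a search for "philosopher" finds the document about wizards, and "café" matches however the accent was typed.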
Designed to Search with words
I hate the term "denormalized". Things frequently come into the system as a document, then get "normalized" and put into a database. Then they get "denormalized" back into a document. Sometimes it is better to skip the middle step and put the document directly into CloudSearch.
Can talk about proximity
You're launching a new site, where do you start? Most people start with relational databases.
"handling words" To do anything LIKE what CloudSearch does, you'd have to make a table of words that map to documents that contain it. This is going to be rather inefficient in most relational databases.
"handling words" To do anything LIKE what CloudSearch does, you'd have to make a table of words that map to documents that contain it. This is going to be rather inefficient in most relational databases.
Relational databases are great at what they do. If you use a wrench as a wrench, it's great. But it doesn't make a very good hammer! You will frequently use only a relational database if people aren't doing free text search. You might use only a text search engine if all you do is search (e.g., a blog search). But this isn't common.
H. L. Mencken said, "For every problem, there is a solution that is clear, simple, obvious, and WRONG." The SQL "like" operator does a linear scan. It's like a database without an index. There's no relevance ranking, it doesn't support multiple words, etc. This is a non-starter.
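The problems with "like" are easy to see in a small experiment. The Python sketch below uses sqlite3 as a stand-in relational database (other engines will report their plans differently), with a hypothetical docs table and helper names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
# Even with an index on the column, a leading wildcard cannot seek into it.
conn.execute("CREATE INDEX docs_body ON docs (body)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [
        (1, "quick brown fox"),
        (2, "the fox is quick"),
        (3, "foxglove seeds"),
    ],
)

LIKE_SQL = "SELECT id FROM docs WHERE body LIKE ? ORDER BY id"


def like_search(phrase):
    """Substring match: the whole phrase must appear contiguously."""
    return [row[0] for row in conn.execute(LIKE_SQL, (f"%{phrase}%",))]


def query_plan(phrase):
    """Return the detail column of each step SQLite plans for the LIKE query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LIKE_SQL, (f"%{phrase}%",))
    return [row[-1] for row in rows]


plan = query_plan("fox")
fox_ids = like_search("fox")          # also matches "foxglove": substrings, not words
two_word_ids = like_search("quick fox")  # the words are not adjacent in any document
```

The plan reports a scan of every row, the single-word search picks up "foxglove" because it matches substrings rather than words, and the two-word search finds nothing even though two documents contain both words.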
Depending on a relational database to do text search is like depending on the join in a text search engine to do your relational work.
Hammers and scalpels are both good tools. But you don't want to confuse them. "If all you have is a hammer, every problem looks like a nail"
Now we've established that you want to use a text search database.
Amazon CloudSearch is a service that allows you to add text search to your application in a minimum amount of time.
Here's how it scales
Here's how CloudSearch works
What do you do with all of those features? You build something like SmugMug.
For deletes, if all you do is delete the record from the relational database, nothing is left to show that the record ever existed, so you won't know to delete it from CloudSearch.
Not a good model. What if one is offline for a while?
Simple, and works. But this relies on being able to detect all of your updates in just the database. You might need another table to keep track of things. In which case it looks like the next slide.
Now we record which records have changed (usually not their contents, just their ID and whether the change is an add or a delete). The contents are fetched by the CloudSearch loader.
You may delete data from one table and record the deletion in another table, so that it can be applied to CloudSearch.
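One way to keep such a change table is with database triggers. The Python sketch below uses sqlite3 as a stand-in database; the photos and search_changes tables, the trigger names, and the drain_changes helper are all hypothetical. The add and delete operations use the CloudSearch JSON batch shape (type, id, fields), with ids as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT);
    -- Change log: only the id and the kind of change, never the contents.
    CREATE TABLE search_changes (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        doc_id INTEGER NOT NULL,
        action TEXT NOT NULL CHECK (action IN ('add', 'delete'))
    );
    CREATE TRIGGER photos_ins AFTER INSERT ON photos BEGIN
        INSERT INTO search_changes (doc_id, action) VALUES (NEW.id, 'add');
    END;
    CREATE TRIGGER photos_upd AFTER UPDATE ON photos BEGIN
        INSERT INTO search_changes (doc_id, action) VALUES (NEW.id, 'add');
    END;
    CREATE TRIGGER photos_del AFTER DELETE ON photos BEGIN
        INSERT INTO search_changes (doc_id, action) VALUES (OLD.id, 'delete');
    END;
""")


def drain_changes():
    """Turn logged changes into add/delete operations, then clear the log."""
    rows = conn.execute(
        "SELECT seq, doc_id, action FROM search_changes ORDER BY seq"
    ).fetchall()
    latest = {}
    for _seq, doc_id, action in rows:
        latest[doc_id] = action  # the most recent change to a record wins
    ops = []
    for doc_id, action in latest.items():
        if action == "delete":
            ops.append({"type": "delete", "id": str(doc_id)})
            continue
        # The loader fetches current contents only for records being added.
        row = conn.execute(
            "SELECT title FROM photos WHERE id = ?", (doc_id,)
        ).fetchone()
        if row is not None:
            ops.append({"type": "add", "id": str(doc_id), "fields": {"title": row[0]}})
    if rows:
        conn.execute("DELETE FROM search_changes WHERE seq <= ?", (rows[-1][0],))
    return ops


conn.execute("INSERT INTO photos (id, title) VALUES (1, 'sunset')")
conn.execute("INSERT INTO photos (id, title) VALUES (2, 'beach')")
conn.execute("UPDATE photos SET title = 'sunrise' WHERE id = 1")
conn.execute("DELETE FROM photos WHERE id = 2")
first_pass = drain_changes()
second_pass = drain_changes()
```

Because the delete is logged before the row disappears, the loader still learns that document 2 has to be removed from the search index, even though the photos table no longer knows it existed.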
The Java SDK is actually only used for its JSON classes. You can get those classes from JSON.org as well, but then they might conflict with the AWS SDK, which you might want to use later.
This is stripped down, but it contains the essential items: we execute the SQL, we iterate through the result set, we build a document, and we build a batch. The batcher posts when it is full. Don't forget to call "flush". There is more code in the example for command line args than for actual work! Use "select as" to get the right field names.
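The example in the talk is Java; as a rough illustration of the same steps, here is a Python sketch using sqlite3 in place of JDBC. It is not the talk's code. The Batcher class, the load function, and the movies table are hypothetical, the post callable is a stub that collects batches instead of calling the CloudSearch document endpoint, and the 5 MB default batch limit is an assumption to check against the service limits.

```python
import json
import sqlite3


class Batcher:
    """Collects documents and posts them as one JSON array when the batch is full."""

    def __init__(self, post, max_bytes=5 * 1024 * 1024):
        self.post = post            # stub callable; a real one would do an HTTP POST
        self.max_bytes = max_bytes  # assumed limit, check the service documentation
        self.docs = []
        self.size = 2               # the enclosing "[" and "]"

    def add(self, doc):
        encoded = json.dumps(doc)
        needed = len(encoded.encode("utf-8")) + 1  # +1 for the separating comma
        if self.docs and self.size + needed > self.max_bytes:
            self.flush()
        self.docs.append(encoded)
        self.size += needed

    def flush(self):
        """Post whatever is pending. Must be called once after the last add."""
        if self.docs:
            self.post("[" + ",".join(self.docs) + "]")
        self.docs = []
        self.size = 2


def load(conn, sql, batcher):
    """Execute the SQL, build one document per row, and hand it to the batcher."""
    cursor = conn.execute(sql)
    names = [column[0] for column in cursor.description]
    for row in cursor:
        record = dict(zip(names, row))
        doc_id = str(record.pop("id"))
        batcher.add({"type": "add", "id": doc_id, "fields": record})
    batcher.flush()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, movie_title TEXT, release_year INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(1, "Alien", 1979), (2, "Brazil", 1985), (3, "Contact", 1997)],
)

# "SELECT ... AS" renames columns to the field names the search domain expects.
sql = (
    "SELECT movie_id AS id, movie_title AS title, release_year AS year "
    "FROM movies ORDER BY movie_id"
)
posted = []
load(conn, sql, Batcher(posted.append, max_bytes=150))
```

The tiny max_bytes value is only there to force more than one post in the demo. Without the final flush, the last partial batch would never be sent.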
The changes to the code to make this handle S3 are pretty trivial. You have to change the ResultSet loop to a