33. Cheap Aggregates
• It pays to know your data well
• Reduce values are stored inline with the view
b-tree
• Small values take very little space
• Nice built-in reduce functions
• Not just for user visible data
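A rough way to picture why stored reduce values make aggregates cheap: each inner node keeps the reduction of everything beneath it, so a range query reads raw values only at the edges of the range. The sketch below is a toy model, not CouchDB's implementation; `CHUNK`, `build_nodes`, and `range_sum` are names made up for illustration, and a flat list of chunks stands in for a real b-tree.

```python
from typing import List, Tuple

CHUNK = 4  # hypothetical fan-out; real b-tree nodes hold far more entries

Node = Tuple[int, int, int]  # (start index, end index, stored sum)


def build_nodes(values: List[int]) -> List[Node]:
    """Group leaf values into chunks and keep each chunk's sum next to it,
    imitating a reduce value stored inline in an inner node."""
    nodes = []
    for start in range(0, len(values), CHUNK):
        chunk = values[start:start + CHUNK]
        nodes.append((start, start + len(chunk), sum(chunk)))
    return nodes


def range_sum(values: List[int], nodes: List[Node], lo: int, hi: int) -> Tuple[int, int]:
    """Sum values[lo:hi]; return (total, number of leaf values actually read)."""
    total = 0
    touched = 0
    for start, end, reduced in nodes:
        if lo <= start and end <= hi:
            total += reduced  # node fully inside the range: reuse stored value
        else:
            a, b = max(lo, start), min(hi, end)
            if a < b:  # partial overlap: fall back to reading the leaves
                total += sum(values[a:b])
                touched += b - a
    return total, touched


values = list(range(1, 17))
nodes = build_nodes(values)
total, touched = range_sum(values, nodes, 2, 14)
```

Here twelve values fall in the range, yet only the four at the ragged edges are read individually; the rest come from stored sums.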
50. Manual Indexing
• Store an index as a document
• Good properties for mostly static indexing
• Cluster friendly
• Create custom constraints (uniqueness)
• Snapshot of a slow query for speed
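As a minimal sketch of "store an index as a document", the function below scans a set of documents once and produces a single index document mapping each tag to the ids that carry it. The document shape, the `index:tags` id, and the `tags` field are all assumptions made for illustration; nothing here is a CouchDB API.

```python
from collections import defaultdict
from typing import Dict, List


def build_tag_index(docs: List[dict]) -> dict:
    """Return one document that acts as a hand-built index from tag to doc ids."""
    by_tag: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        for tag in doc.get("tags", []):
            by_tag[tag].append(doc["_id"])
    return {
        "_id": "index:tags",  # hypothetical id for the index document
        "type": "manual_index",
        "entries": {tag: sorted(ids) for tag, ids in by_tag.items()},
    }


docs = [
    {"_id": "talk:1", "tags": ["couchdb", "erlang"]},
    {"_id": "talk:2", "tags": ["couchdb"]},
    {"_id": "talk:3"},
]
index_doc = build_tag_index(docs)
```

Reading the index is then a single key lookup for `index:tags`, which is why this suits mostly static data: the application has to rebuild or patch the document whenever the underlying data changes.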
51. GeoCouch
• R-tree based
• First-class Erlang
• improved with view engine refactor
• Can be abused for multi-dimensional queries
• more than just geo-data
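To make the "more than just geo-data" point concrete, here is a sketch of what a bounding-box query means when the two dimensions are not coordinates. It uses a plain linear scan, not an R-tree, and does not call GeoCouch; `bbox_query`, the argument order, and the price/rating data are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, str]  # (x, y, document id)


def bbox_query(points: Iterable[Point], west: float, south: float,
               east: float, north: float) -> List[str]:
    """Return ids of points inside the box (edges inclusive), by linear scan."""
    return [doc_id for x, y, doc_id in points
            if west <= x <= east and south <= y <= north]


# Two non-geographic dimensions: x is a price, y is a rating.
products = [
    (9.99, 4.5, "book"),
    (49.00, 3.0, "lamp"),
    (15.00, 4.8, "mug"),
    (12.50, 2.1, "pen"),
]
cheap_and_good = bbox_query(products, west=0, south=4.0, east=20, north=5.0)
```

A spatial index answers the same question without scanning every point, which is what makes it attractive for two-dimensional range queries of any kind.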
52. CouchDB Lucene
• Based on CouchDB Externals
• Limited to Couchbase Single Server
• Faceted queries
• Full-text indexing
53. Hybrid
• Application managed
• Allow stand alone service to work with
Couchbase cluster
• e.g. Solr, Redis, PostgreSQL
• Complex concurrency
• More moving parts
This presentation shares some tips on how I've gotten CouchDB to perform well for me in the past, as well as things to look forward to in the future.
"Advanced" is kind of a distraction. CouchDB is simple, so what you see here shouldn't be that different from basic queries.
Queries always end up being about data. All of our data is inside special purpose data structures. Our control of the query depends on understanding and controlling these structures.
Everything. Even when it's calculated live, in memory. Not all of these are created equal, however. Fortunately, CouchDB keeps it simple and presents one general structure for most use cases.
I won't cover B-trees in depth here. Wikipedia is a good start if you're wondering. Keep in mind that CouchDB has a specific incarnation that gives us some special properties.
A cornerstone of all databases, I/O will decide if your ideas fly or fail. Feeding the intense, networked, interactive software of today requires a serious study of I/O characteristics.
Throughput and latency tend to be the measurements of choice. Notice how big a jump RAM is. Imagine how many CPU cycles one HDD seek is.
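For a sense of scale, the arithmetic can be sketched as below. Every figure is an assumed round number chosen for illustration (a 3 GHz core, a 10 ms disk seek, a 100 ns RAM access), not a measurement from the talk; plug in numbers for your own hardware.

```python
CPU_HZ = 3_000_000_000    # assumed: 3 GHz core
HDD_SEEK_NS = 10_000_000  # assumed: 10 ms per disk seek
RAM_ACCESS_NS = 100       # assumed: 100 ns per RAM access

NS_PER_SECOND = 1_000_000_000


def cycles_for(duration_ns: int, hz: int = CPU_HZ) -> int:
    """CPU cycles that elapse during duration_ns, using integer arithmetic."""
    return duration_ns * hz // NS_PER_SECOND


seek_cycles = cycles_for(HDD_SEEK_NS)
ram_cycles = cycles_for(RAM_ACCESS_NS)
ratio = seek_cycles // ram_cycles
```

Under these assumptions a single seek costs tens of millions of cycles, several orders of magnitude more than a RAM access.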
So let's keep RAM in mind. Couchbase does make good use of RAM for documents in its clustered product, but that RAM is not available for queries.
Usually enough, but this should actually be measured. How? Well, let's look at what I call a "working set".
All of your data might exist somewhere on disk. That doesn't mean it can't have those disk pages cached in RAM. Keep it there. Try to keep data clustered on disk so you have better page cache and buffer cache efficiency.
What a working set is.
Control the working set by tuning your database design. This talk will focus on views for queries, but all of these points matter. Measure, because it had better add up or your performance will be painfully slow.
I always like to start talking about indexing by declaring that it's already there. We already have an automatic index. I call this the primary index, but that's just me.
Key-value, anyone? How do we make key-based access fast? How do we accelerate random access versus sequential access? It's all about data layout, and that equates to an index.
Key-value applies to CouchDB.
A nice property of this key index is that it provides a way to enforce uniqueness. I hear this question all the time: "How do I constrain fields of a document to a unique value?" The short answer is _id.
This leads beautifully to revision-based concurrency. Semantic keying is a good idea, even if it's not in your primary index, but why wait to build a view?
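The two ideas above, uniqueness through _id and revision-based concurrency, can be sketched with an in-memory stand-in. `TinyStore` and `Conflict` are invented for this example and only imitate the behavior; a real server reports the conflict as an HTTP 409, and its revision strings are not the simplified `N-sketch` form used here.

```python
class Conflict(Exception):
    """Stands in for the 409 Conflict a real server would return."""


class TinyStore:
    """Toy document store: one document per _id, updates must name the current _rev."""

    def __init__(self) -> None:
        self._docs = {}

    def put(self, doc: dict) -> str:
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            # Either the _id is already taken or the writer holds a stale revision.
            raise Conflict(doc["_id"])
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        stored = dict(doc, _rev=f"{generation}-sketch")  # simplified revision string
        self._docs[doc["_id"]] = stored
        return stored["_rev"]


store = TinyStore()
# Semantic key: the value that must be unique becomes the _id (hypothetical scheme).
first_rev = store.put({"_id": "user:alice@example.com", "name": "Alice"})

try:
    store.put({"_id": "user:alice@example.com", "name": "Impostor"})
    duplicate_rejected = False
except Conflict:
    duplicate_rejected = True

second_rev = store.put({"_id": "user:alice@example.com", "_rev": first_rev, "name": "Alice B."})
```

The same check that rejects a duplicate key also rejects a writer holding an out-of-date revision, which is why the one mechanism covers both uniqueness and concurrency.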
Finally, my favorite part of the primary document tree is that it's just one file. No duplication of information, so your overhead is nice and small. It's always fresh too, unlike views.
These are just a few ideas I've made up names for.
It's pretty obvious how this key design helps turn joins into a range query.
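One way to see a join become a range query: give a parent and its children keys that share a prefix, so they sit next to each other in sorted order. The slide with the actual key design is not part of these notes, so the layout below (post id, then 0 for the post or 1 for a comment, then comment id) is an assumed example, and a sorted Python list stands in for the view index.

```python
import bisect
import math

# Rows as (key, value); keys share the post id as a prefix (hypothetical layout).
rows = sorted(
    [
        (("post:2", 0, ""), {"title": "Second post"}),
        (("post:1", 1, "c2"), {"text": "Nice"}),
        (("post:1", 0, ""), {"title": "First post"}),
        (("post:1", 1, "c1"), {"text": "First!"}),
        (("post:2", 1, "c3"), {"text": "Agreed"}),
    ],
    key=lambda row: row[0],
)
keys = [key for key, _ in rows]


def post_with_comments(post_id: str):
    """Return the post row followed by its comment rows using one contiguous range."""
    lo = bisect.bisect_left(keys, (post_id,))             # sorts before every key with this prefix
    hi = bisect.bisect_right(keys, (post_id, math.inf))   # sorts after every key with this prefix
    return rows[lo:hi]


result = post_with_comments("post:1")
```

The post and its comments come back together, post first, from a single slice of the sorted keys, with no second lookup per comment.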
_rev can also be passed, but be careful as revisions can be pruned during compaction.
They don't cost much so it pays to have default reduce functions. It's all about knowing your data better.
When you have one big database, you pay every cost all at once. Compaction costs, for example, can be huge.
When you have many smaller databases, costs can be paid incrementally. Compaction, for example, carries much less overhead.
Key access is fast. Simple.
Merging queries means you might have cases with partial results.
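A small sketch of what "partial results" looks like in practice: the application merges sorted rows from several sources and has to notice when one of them did not answer. `merge_shards` and the convention of `None` for a source that failed or timed out are assumptions for illustration, not part of any CouchDB or Couchbase client.

```python
import heapq
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (key, document id)


def merge_shards(shard_rows: List[Optional[List[Row]]]) -> Tuple[List[Row], bool]:
    """Merge per-shard sorted rows into one sorted list.

    A shard that did not answer is passed as None. The second return value
    is True when at least one shard is missing, i.e. the result is partial.
    """
    answered = [rows for rows in shard_rows if rows is not None]
    merged = list(heapq.merge(*answered))
    partial = len(answered) < len(shard_rows)
    return merged, partial


shard_a = [("apple", "doc1"), ("cherry", "doc3")]
shard_b = [("banana", "doc2")]
shard_c = None  # this shard timed out

rows, partial = merge_shards([shard_a, shard_b, shard_c])
```

Returning the flag alongside the rows leaves the decision to the caller: show what is available, retry, or fail the request.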
It's still an option, especially if you need certain performance on a cluster.
Available as part of Couchbase Single/Mobile.
CouchDB, Couchbase Single only.
Good way to extend an existing cluster. Up to the application layer.