If you are thinking of trying out a NoSQL document database, there are many good options available to Microsoft-oriented developers. In this session, we’ll compare some of the more popular databases, including: CosmosDb, Couchbase, MongoDb, CouchDb, and RavenDb. We’ll look at the strengths and weaknesses of each system. Querying, scaling, usability, speed, deployment, support and flexibility will all be covered. This session will include a discussion about when NoSQL is right for your project and give you an idea of which technology to pursue for your use case.
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
1. 5 Popular Choices for
NoSQL on a Microsoft
Platform
Matthew Groves @mgroves
2. AGENDA
01/ What is NoSQL?
02/ Popular NoSQL Choices
03/ Evaluation Criteria
04/ Details
05/ The End
3. Where am I?
• Tulsa Tech Fest
• https://grouplings.com/TulsaTechFest
• https://twitter.com/TulsaTechFest
4. Who am I?
• Matthew D. Groves
• Developer Advocate for Couchbase
• @mgroves on Twitter
• Podcast and blog: http://crosscuttingconcerns.com
• "I am not an expert, but I am an enthusiast." –Alan Stevens
@natelovett
39. Usability
• Built in Futon / Fauxton
• Scaling / Replication work
• Cluster Setup Wizard (2.x)
• There is no "official" .NET SDK
• .NET SDKs are REST wrappers
71. Frequently Asked Questions
1. How is Couchbase different than Mongo?
2. I'M N1QL RIIIICCCKK!!!!!
3. How tall are you? Do you play basketball?
4. What is the Couchbase licensing situation?
5. Is Couchbase a Managed Cloud Service?
6. me@mgroves.com
74. Licensing
Couchbase Server Community
• Open source (Apache 2)
• Binary release is one release behind Enterprise
• Free to use in dev/test/qa/prod
• Forum support only
Couchbase Server Enterprise
• Mostly open source (Apache 2)
• Some features exclusive: https://www.couchbase.com/products/editions
• Free to use in dev/test/qa
• Need commercial license for prod
• Paid support provided
Not going to get too technical today
This is just an overview
The goal is to show you what's out there, give you an idea of what might appeal to you
This is just the start of your journey
I didn't draw this picture; it was drawn by Nate Lovett
Anything I drew, you can tell because it won't look nearly as good as this
NoSQL is a big umbrella term that encompasses a lot of databases
These are 4 of the most popular models, but they aren't the only ones
You can see some of these databases are in multiple categories,
That's because they support multiple models
I'm going to be focusing mainly on document databases today
They are similar to key/value databases: a lot of key-based operations
Often there's a server-side querying mechanism that takes advantage of the data being in a known format
Like JSON, XML, etc
So this is why you hear the term "NoSQL", because there is no SQL involved in interacting with data
At its simplest, think of a document database
As a key/value store, where the value is in a known format
You write code where you start with a key, and you ask the database to return the document
That corresponds to that key.
And the same with creating/updating
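To make that concrete, here's a minimal sketch in Python of the key/value-with-documents idea. This is a toy in-memory store, not any real database SDK: the `DocumentStore` class, its `get`/`upsert` methods, and the `user::42` key are all made up for illustration.

```python
import json


class DocumentStore:
    """Toy in-memory key/value store whose values are JSON documents (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def upsert(self, key, document):
        # Create or replace the document stored under this key.
        # Serializing to JSON mimics storing the value in a known format.
        self._docs[key] = json.dumps(document)

    def get(self, key):
        # Start with a key, get back the matching document (or None).
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)


store = DocumentStore()
store.upsert("user::42", {"type": "user", "name": "Matt"})

# Update: fetch the document by key, change it, write it back under the same key.
doc = store.get("user::42")
doc["city"] = "Tulsa"
store.upsert("user::42", doc)
```

The real databases add persistence, replication, and querying on top, but the basic get-by-key and write-by-key shape looks a lot like this.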
Why were these databases created in the first place?
Flexibility: schemas are hard to manage
Availability: if a machine goes down (on purpose or not), we still want to be able to serve customers
Speed: the faster we can get the data to a screen, the more likely they are to purchase, less likely to give up, etc
Scaling: demand fluctuates, and we need to deal with peaks and valleys
When you start from scratch saying 'we need a database that can do X,Y,Z' one of the tradeoffs
That allows for all these things is throwing tables and schemas and SQL out the door
There's a lot of joking about "web scale" and "/dev/null" and so on. But yes these databases
Do need to actually store and be able to retrieve data reliably. Those are table stakes, if you ask me.
Sometimes you need tables, constraints, enforced schemas; but you might be surprised at how often you don't.
That's a quick rundown of the why/how/history of NoSQL. Hopefully you're in the mindset tonight of
"Okay, it might be a useful tool to have in my box. What software should I be looking at specifically? What are the next steps?"
In this session, I'll be focusing on 5 databases that fit these basic criteria
1 – document databases only, I'm narrowing the focus a bit
2- runs on Microsoft; this session is meant for windows / .net / Microsoft / azure developers
3 – popular by some measure, this is pretty arbitrary
Let's talk about popularity for a second
Db-engines is a site that puts out this monthly ranking
It measures a database's popularity by: search engine results, Google Trends, Stack Overflow, jobs, LinkedIn, Twitter
Mongodb is the most popular nosql database by far
Some notable dbs I'm not covering are Cassandra, which is a columnar database, not document
I've omitted Redis, because it's a key/value store, and it's also more of an in-memory cache than it is a database
I've also omitted elasticsearch, solr, which I don't really think of as databases, but I guess they are
I'm omitting DynamoDB because it's AWS exclusive
It's not unheard of for a .NET app to use dynamodb, but it's a bit out of the ordinary
I'm omitting memcached… explain origins of couchbase via memcached and couchdb
Maybe also mention Cloudant
CosmosDb is up to #30 (it was 37 the last time I presented)
I think it's pretty interesting, but it is azure exclusive.
Firebase is mobile focused. I'm not going to focus on mobile too much, but Couchbase has a mobile implementation too; and CouchDb has a "reimplementation" in JavaScript called PouchDb, which is geared toward mobile development.
I'm omitting marklogic, which is a commercial nosql database
Omitting Hazelcast, because it's key/value store
Omitting Riak, it doesn't run on windows and it's key/value store
RethinkDB has a bit of a cult following
Omitting Aerospike because they are key value
OrientDb is written in Java, it's multi-model including graph and document
Realm is mobile focused. I'm not going to focus on mobile too much
Starting to get low on the list
Cloudant is based on CouchDb and BigCouch
RavenDB is low on the list and getting lower, even though it's been around for quite some time.
I wouldn't normally bring it up in a discussion of "popularity", except that I've noticed anecdotally that .NET devs seem to be a lot more aware of it than the rest of the world (for obvious reasons)
So I'm doing a bit of cherry picking here
So these are the 5 that I've picked for this evaluation
I'm definitely biased towards my employer
This is what I want to do with this session
But I'm going to try to be as fair as possible and say as many nice things about each of these as I can
If you want to find mean things that people have said, there is no shortage of that on the internet
Each of these tools has been created by smart people and is used by great companies to do great things
There is no perfect software. Every database is a sum total of flaws, tradeoffs, decisions, and preferences
If you have a question like "why should I use couchbase instead of mongo", I can begrudgingly answer in the general, but in reality the answer depends on so many factors
I'm not going to cover these tools in detail.
If you have a question like "what's the best way to index a date field in couchbase to optimize for time series data", this is not the right session.
But, I'm happy to talk about anything after the session is over.
Ward Cunningham's Law
"the best way to get the right answer on the internet is not to ask a question; ... it's to post the wrong answer."
No way am I an expert in any of these databases, even couchbase
First criterion is querying
All these databases are typically going to provide ways to get or mutate one document at a time
But what else do they have beyond that for querying data?
The more options the better? Or too many options? Match the option with the use case?
NoSQL databases were created in a post-web world
Where huge numbers of people are using a web site or a mobile app
And thus one of the things these databases provide is scalability
How well do they scale, how difficult is it to scale? How many steps and decisions are involved?
Peer-to-peer is easiest
master-slave and replica sets being more difficult
This may not be a big deal to you, if the database otherwise does what you want it to and does it well
We are tech people, we can tinker for a while
But I think making software as easy as possible to get started makes our lives much better
So I'm going to call out features that I think make it a more pleasant experience
"Make common things easy, rare things possible"
Keep in mind the tradeoff here: you may be sacrificing some level of tweaking for ease of use
And you may be sacrificing security sometimes if you aren't paying close enough attention
Many nosql databases make a lot of claims about speed
I'm wary of benchmarks
So I'm going to focus on architectural decisions that affect speed
It's generally bigger, complex data where speed becomes a problem
When in doubt do your own benchmarks
Since we're focusing on Microsoft, obviously this is going to lean towards Windows and Azure
But it's nice to know what other options are out there, in case you need to go
In a different direction, or use different infrastructure for whatever reason
I don't have an icon for it, but I'll also mention if they have a Kubernetes operator or not
Pure technology is not the only thing that matters
"The biggest challenges in adopting #NoSQL are usually human rather than technical" - @JudahGabriel
Who is doing the support? What are the licenses?
Is it open source, does it have a big community? Responsive community?
Is it going to be around a year from now?
Are they innovating, adding new features I want? Features I may want in the future?
I do not like this query syntax. It's limited in terms of joins, unions, etc
Text search is present, but it's limited compared to elasticsearch, etc
.NET SDK follows .NET idioms pretty well, there's a bit of weirdness having to convert objects to BsonDocuments, but it's not that bad
Linq provider built into Mongo .NET SDK, but limited due to the underlying query capabilities
Scaling is possible, you set up multiple types of nodes, configure sharding
Replication is a master/slave setup, meaning that a single member of the cluster is the master and is the only one allowed to modify data
And that includes between data centers
Decision fatigue
Insecure by default: anonymous admin access
Indexing is important for querying
Mongo has an in-memory option and an on-disk option
So that's a basic tradeoff you can make
Windows / Mac / Linux
AWS / Azure / Google support (it is VMs)
Docker
DBaaS – lot of managed partners like Mongolab, MongoSoup, etc
MongoDB Atlas is mongo's own managed DBaaS
Just announced a kubernetes operator for 4.0
Licensing! It's AGPL, and some enterprises don't like that. In fact, they list it as a possible weakness/threat in their S-1
I haven't been able to spend much time with mongodb 4.0
So some of the things I'm saying about it may be out of date, or no longer true
Two big things they've announced: ACID Transaction support and a mobile database
CouchDB
Mongo-inspired query syntax, based on Cloudant Query (this is in version 2.x, though)
MapReduce, which is great for performance, but it is JavaScript and doesn't accommodate ad hoc queries
Mango is "mongo inspired"; I don't know if it's mongo compatible or meant to be mongo compatible
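If MapReduce views are new to you, here's a rough sketch of the idea in Python. Real CouchDB views are JavaScript functions stored in design documents and indexed incrementally; this toy version only shows the shape of map-then-reduce. The sample documents and the `map_doc` and `build_view` names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents of mixed types, as you'd find in one database.
docs = [
    {"_id": "1", "type": "order", "customer": "alice", "total": 20},
    {"_id": "2", "type": "order", "customer": "bob", "total": 15},
    {"_id": "3", "type": "order", "customer": "alice", "total": 5},
    {"_id": "4", "type": "customer", "name": "alice"},
]


def map_doc(doc):
    """Map step: emit (key, value) pairs only for the documents this view cares about."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_view(documents):
    """Run the map step over every document, then reduce (sum) the values per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: sum(values) for key, values in sorted(grouped.items())}


view = build_view(docs)
```

The view has to be defined ahead of time, which is why it performs well but doesn't help with ad hoc questions: a new question means writing and building a new view.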
You can set up a cluster relatively easily
But you need to run a proxy in front of it, like HAProxy (couchdb recommends)
For sharding, you need to configure the number of shards per database
Replication and conflict management is something that couchdb is good at
When removing a node from a cluster you have to make sure to move shards away
So there is some manual work involved in managing scale
Futon is the web console in 1.x
Fauxton is the web console in 2.x
Insecure by default: anonymous admin access
CouchDb's design assumes that caching will be handled by the operating system, by the browser, by a proxy you set up
Not by couchdb itself
Windows / Mac / Linux
AWS / Azure / Google support (VMs)
Docker (there isn't an official 2.x on docker hub yet)
DBaaS – Cloudant
No kubernetes operator that I'm aware of
Cloudant is compatible
Couchbase Lite and Sync Gateway are "compatible" with couchdb (version 2 of Couchbase Lite will probably change that)
But interop between them is not supported
I really like N1QL, it's kinda what attracted me to couchbase in the first place
I already know how to write SQL, so I can apply that to a NoSQL database
Linq2Couchbase is a linq provider that generates N1QL (it's not officially supported yet)
Multi-master
Security: with 4.x it was *mostly* secure by default
You need to set up a password, but you can create buckets without passwords
With 5.x it is completely secure by default
Explain memcached and couchdb
Windows / Mac / Linux
AWS / Azure / Google support (it is VMs)
Docker
no DBaaS, managed database (yet)
Kubernetes operator currently in public beta, will be RTM in a few months
Cosmos DB
There is a SQL language for CosmosDB but it is very limited
No cross-document joins (JOIN only works within a single document), no GROUP BY, no insert/update/delete
Cosmos has sprocs, triggers, udfs, but you have to write them in ECMAScript 2015 (JavaScript)
Azure handles the scaling for you
You set request units per second or per minute and cosmos db will scale to handle that
Cosmosdb has 5 consistency options, so you can explicitly trade off between strong consistency and eventual consistency,
"guarantee" is Microsoft's wording
This is azure only
You can run the emulator on windows, the emulator is not meant for production
But I'm sure some joker is going to try it
The emulator is also available in docker so hypothetically you could deploy that anywhere
No kubernetes operator, not sure if it needs one or if it's possible or what
Microsoft support only, of course
This is not open source (which used to be implied with Microsoft, but I feel like I have to say that now)
In Raven 3.x each database is an independent entity. You can set up replication and cooperation, but
Oren: "There is a lot of work that you need to do on all the nodes" which "can grow very tedious"
Oren on 4.x: "You can bring in additional nodes without having to update any configuration"
4.X is in release candidate now
I'm not convinced it's the right database for large scales, but it has some interesting functionality
Ravendb is on docker, but they also publish a handy Powershell script
For getting up and running with docker
Which I think is a nice usability touch
Secure by default, but you can turn on anonymous admin access
I think the auto-indexing feature is really intriguing
Basically, it will create indexes as you need them
And keep them around until they aren't used anymore
Sounds great in theory, I've heard in practice that it doesn't quite work out as well as it sounds
And you still end up needing to create indexes declaratively
I've not seen any bold claims or benchmarks showing raven blowing everyone away in speed
Mostly they claim to be a "safe by default" database, so maybe that's the tradeoff they're making
They've announced performance improvements in 4.x
Raven is built on .NET, some people will claim this hinders performance
Just as people will claim about databases built on the JVM
There may be some truth to this, databases often use low-level operations
.NET Core?
"RavenDB High Performance " by Brian Ritchie
Raven 4.x runs on Linux
Raven can run on aws/azure in VMs
Raven runs on Windows, and 4.x runs in Docker
RavenHQ is a hosted ravendb provider
No kubernetes operator that I'm aware of
It's agpl
Bizspark discounts
Marten is not a database
it's a .NET library that stands between your application and postgresql
Postgresql has some really good json support
Marten leverages that to treat postgresql as a document database
Why just one?
Mobile? look at Couchbase, Sync Gateway, Couchbase Mobile, and maybe CouchDb and MongoDb
Querying? Look at Couchbase, CosmosDb, Raven, Marten
Cost? Mongo, CouchDb
Ops/DevOps proficient? Mongo, CouchDb, Raven
Ops/DevOps deficient? CosmosDb, Couchbase, Marten, MongoDb Atlas
Hobby? Mongo, CouchDb, Couchbase
Resume? Go with the popular one.
Speed? Maybe not Raven
Transactions? Raven, Marten, Mongo or maybe Cosmos
Security? Be careful with some that aren't secure by default
Integrations? This is huge. Databases generally don't just sit behind one app. They need to integrate with other software. Maybe stick to relational, unless a connector exists, or you are using microservices, soa, or clean architecture
All I ask is that you give Couchbase a chance
Free download
You can also take it for a free test drive on the major cloud providers
Also, this is something new for me this year, please go to this URL to enter to win a $100 gift card. It is literally a 1 question survey and it helps me out a lot.
This is my family
My enormous head barely fits in the picture
Open source apache license for community edition, enterprise edition on a faster release schedule, some advanced features, and support license.
Couchbase is software you can run in the cloud on a VM or in your own data center. CosmosDb is a managed cloud service, but there is an emulator you can run locally.
If you want to play with N1QL (SQL for JSON) you don't even have to install Couchbase first
You can do it in browser
CosmosDb has one of these too
Main points of diff: Architecture & Features
Architecture
Memory first: integrated cache, you don't need to put redis on top of couchbase
Master-master: easier scaling, better scaling
Auto-sharding: we call them vBuckets, you don't have to come up with a sharding scheme, it's done by crc32
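Here's a simplified Python sketch of what hash-based auto-sharding looks like. It is not Couchbase's actual implementation: the vBucket count of 1024, the plain modulo, the round-robin vBucket-to-node map, and the node names are all assumptions for illustration. The point is only that the key's CRC32 hash decides where the document lives, so you never design a sharding scheme yourself.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vBucket count for this sketch


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash the document key with CRC32 and fold it into a vBucket id."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, num_vbuckets=NUM_VBUCKETS):
    """Assign every vBucket to a node, round-robin (hypothetical layout)."""
    return [nodes[i % len(nodes)] for i in range(num_vbuckets)]


def node_for(key, vbucket_map):
    """Look up which node owns the vBucket that this key hashes to."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]


nodes = ["node-a", "node-b", "node-c"]  # made-up node names
vbucket_map = build_vbucket_map(nodes)
owner = node_for("user::42", vbucket_map)
```

Because the key-to-vBucket step never changes, adding or removing a node only means reassigning vBuckets in the map and moving their data, not rehashing every key.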
Features
N1QL: SQL, mongo has a more limited query language and it's not SQL-like
Full Text Search: Using the bleve search engine, language aware FTS capabilities built in
**this may not apply after Mongo 4.0 release** Mobile & sync: Mongo has nothing like the offline-first and sync capabilities couchbase offers
Mongo DOES have a DbaaS cloud provider
Everything I've shown you today is available in Community edition
The only N1QL feature I can think of not in Community is INFER
You probably don't need the Enterprise features unless you are an enterprise developer.
Enterprise feature examples:
Graphical explain plan
Schema inference
Index replicas
Rack zone
XDCR advanced features
Full RBAC
Unlimited query concurrency
Ephemeral buckets
MDS
We announced "Couchbase Managed Cloud"
This is not a "sign-up for a free tier", it's more like a white-glove service to host Couchbase on the cloud provider(s) of your choice
Couchbase IS in the Azure and AWS marketplaces, and there are some wizards to make config easy, but it runs on your VMs. This is one step beyond that.
A plain old DBaaS is on the horizon, but not currently available.