2. • Introduction to NoSQL
• Need & options of NoSQL Solution
• Getting Started with Couchbase
• Administration of Couchbase
• Considerations
• Best Practices
• Case Study
• New Features
• Assessment
Agenda
4. • Database Architect & Consultant
• 15 years IT experience
• Well versed with RDBMS solution products - SQL Server, Oracle, DB2.
• NoSQL Enthusiast & follower.
• Reachable at sachinkkansal@gmail.com
• https://couchbaseblog.wordpress.com
Sachin Kansal
5. Quick Check of NoSQL technology
Discussion & simple assessment
6. • Is it free?
• Is it secure?
• Is it only a buzzword, or is anyone really using it?
• Which technologies does it support?
• Can it manage my current work setup?
• Does NoSQL require separate hardware to work?
• Is it actually required?
• Any others…?
Discussion
7. Introduction to NoSQL:
What It Is And Why You Need It
8. History:
• The term NoSQL was coined by Carlo Strozzi in 1998.
• He used this term to name his open-source, lightweight database, which did not have an SQL interface.
• In early 2009, when Last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are non-relational, distributed, and do not conform to atomicity, consistency, isolation, and durability - the four defining features of traditional relational database systems.
• In the same year, at the "no:sql(east)" conference held in Atlanta, USA, NoSQL was widely discussed and debated.
• From then on, discussion and practice of NoSQL gained momentum, and NoSQL saw unprecedented growth.
• NoSQL refers to non-relational database management systems, which differ from traditional relational database management systems in some significant ways.
• They are designed for distributed data stores with very large-scale data storage needs (for example, Google or Facebook, which collect terabits of data every day for their users).
• These types of data stores may not require a fixed schema, avoid join operations, and typically scale horizontally.
NoSQL
9. NoSQL – Not Only SQL
Evolution of NoSQL
• RDBMSs ruled for three decades.
• Emerging in the 1970s and early 1980s, relational databases offered a searchable mechanism for persisting complex data with minimal use of storage space.
• Optimized to run on single machines.
• Schema-based approach to modeling data.
• Expensive hardware.
• Dramatic changes in usage & lower cost.
• Result: increased complexity in application and database design, often resulting in inferior performance.
What It Is And Why You Need It
10. Dimensions that mattered
• The number of concurrent users skyrocketed as applications increasingly became accessible via the web (and later
on mobile devices).
• The amount of data collected and processed soared as it became easier and increasingly valuable to capture all
kinds of data.
• The amount of unstructured or semi-structured data exploded and its use became integral to the value and
richness of applications.
• Google, Amazon, Facebook, and LinkedIn were among the first companies to discover the serious limitations
of relational database technology for supporting these new application requirements.
• Commercial alternatives didn’t exist, so they invented new data management approaches themselves.
• Open source NoSQL database projects formed to leverage the work of the pioneers, and commercial companies associated with these projects soon followed.
What It Is And Why You Need It
11. Four interrelated megatrends are driving the adoption of NoSQL technology.
• Big Users
• The Internet of Things.
• Big Data
• Cloud
Why do you need it?
12. Big users
Support global users 24x7, year-round
• A newly launched app can go viral, growing from
zero to a million users overnight — literally.
• Some users are active frequently, while others use
an app a few times, never to return.
• Seasonal swings like those around festivals /
holidays create spikes for short periods.
• New product releases or promotions can spawn
dramatically higher application usage
• The large numbers of users combined with the
dynamic nature of usage patterns is driving the
need for more easily scalable database technology.
13. The amount of machine-generated data is increasing with the proliferation of digital
telemetry.
• There are 14 billion things connected to the Internet.
• They’re in factories, farms, hospitals, and warehouses.
• They’re in homes: appliances, gaming consoles, and more.
• They’re cars.
• They’re mobile phones and tablets.
• They receive environment, location, movement, temperature, weather data, and more
from 50 billion sensors.
Estimation
• By 2020, 32 billion things will be connected to the Internet.
• By 2020, 10% of data will be generated by embedded systems.
• By 2020, 20% of target rich data will be generated by embedded systems.
The Internet of Things
14. Big Data
Data is becoming easier to capture and access through third parties such as Facebook.
Personal user information, geolocation data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured.
Example: Flight Information
16. • Better application development productivity through a more flexible data model.
• Greater ability to scale dynamically to support more users and data.
• Improved performance to satisfy the expectations of users wanting highly responsive applications and to allow more complex processing of data.
• To address the 3 V's and CAP, and to overcome RDBMS limitations.
Need for NoSQL
17. RDBMS | NoSQL
Structured and organized data | Stands for Not Only SQL
Structured Query Language (SQL) | No declarative query language
Data and its relationships are stored in separate tables | No predefined schema
Data Manipulation Language, Data Definition Language | Key-value pair storage, column store, document store, graph databases
Tight consistency | Eventual consistency rather than the ACID properties
ACID | Unstructured and unpredictable data
  | CAP theorem / BASE
  | Prioritizes high performance, high availability and scalability
RDBMS vs NoSQL
18. Relational and NoSQL data models are very different.
• The relational model takes data and separates it into many interrelated tables.
• Each table contains rows and columns where a row might contain lots of information
about a person and each column might contain a value for a specific attribute
associated with that person, like their age.
• Tables reference each other through foreign keys that are stored in columns as well.
• The relational model minimizes the amount of storage space required, because each
piece of data is only stored in one place
Data Model - RDBMS
19. NoSQL databases have a very different model.
For example, a document oriented NoSQL database takes the data you want to store
and aggregates it into documents using the JSON format.
Each JSON document can be thought of as an object to be used by your application.
A JSON document might, for example, take all the data stored in a row that spans 20
tables of a relational database and aggregate it into a single document/object.
Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, the ease of efficiently distributing the resulting documents, and the read and write performance improvements make it an easy trade-off for web-based applications.
Data Model – NoSQL
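To make the aggregation idea concrete, here is a minimal sketch (in Python, with invented field names and values) of data that a relational model would spread across customer, address and order tables, collapsed into the single JSON-style document a document database such as Couchbase would store:

    import json

    # Hypothetical customer profile that a relational model would normalize into
    # several tables (customers, addresses, orders), aggregated into one document.
    customer_doc = {
        "type": "customer",
        "name": "Robin",
        "age": 34,
        "address": {"city": "Pune", "zip": "411001"},       # would be an addresses-table row
        "orders": [                                          # would be rows in an orders table
            {"order_id": 1001, "total": 49.99},
            {"order_id": 1002, "total": 15.00},
        ],
    }

    # The whole object is stored and retrieved as one value under one key.
    print(json.dumps(customer_doc, indent=2))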
21. In 2000, Eric Brewer, a computer scientist from the University of California, Berkeley,
proposed the following conjecture:
• Consistency: All components of the system see the same data.
• This means that the data in the database remains consistent after the execution of an operation.
For example after an update operation all clients see the same data.
• Availability: All requests to the system receive a response, whether success or failure.
• This means that the system is always on (service guarantee availability), no downtime
• Partition tolerance: The system continues to function even if some components fail or some message
traffic is lost.
• This means that the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
CAP Theorem
22. • Theoretically, it is impossible to fulfill all 3 requirements.
• CAP provides the basic requirements for a distributed system: follow 2 of the 3 requirements.
• Therefore, all current NoSQL databases follow different combinations of the C, A, P from the CAP theorem.
Here is the brief description of three combinations CA, CP, AP :
• CA - Single site cluster, therefore all nodes are always in contact. When a partition
occurs, the system blocks.
• CP - Some data may not be accessible, but the rest is still consistent/accurate.
• AP - System is still available under partitioning, but some of the data returned may be
inaccurate.
CAP 2
24. The BASE acronym was defined by Eric Brewer (the author of the CAP theorem).
A BASE system gives up on consistency.
• Basically Available indicates that the system does guarantee availability, in terms of
the CAP theorem.
• Soft state indicates that the state of the system may change over time, even without
input. This is because of the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time,
given that the system doesn't receive input during that time.
The BASE
25. There are four general types (most common categories) of NoSQL databases.
(There is not a single solution which is better than all the others; however, there are some databases that are better at solving specific problems.)
• Key-value stores
• Column-oriented
• Graph
• Document oriented
NoSQL Categories / Models
28. • Key-value stores are the most basic type of NoSQL database.
• Designed to handle huge amounts of data.
• Based on Amazon’s Dynamo paper.
• Key-value stores allow developers to store schema-less data.
• In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string, JSON, a BLOB (basic large object), etc.
• A key may be a string, hash, list, set, or sorted set, and values are stored against these keys.
• For example, a key-value pair might consist of a key like "Name" that is associated with a value like "Robin".
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
• Key-value stores follow the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
• Key-value stores work well for shopping cart contents, or individual values like color schemes, a landing page URI, or a default account number.
Key-value stores
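As a toy illustration of the key-value model (a plain in-process dictionary stands in for a real key-value store; the keys and values are invented), the following sketch stores a colour scheme and a shopping cart against simple string keys:

    # A dict standing in for a key-value store: opaque values looked up by a unique key.
    kv_store = {}

    # Values can be simple strings or whole JSON-like blobs.
    kv_store["user:42:color_scheme"] = "dark"
    kv_store["user:42:cart"] = {"items": [{"sku": "A100", "qty": 2}], "currency": "INR"}

    # Reads are always by key; there is no query language over the values themselves.
    print(kv_store["user:42:cart"])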
30. • Column-oriented databases primarily work on columns, and every column is treated individually.
• Values of a single column are stored contiguously.
• A column store keeps data in column-specific files.
• In column stores, query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for compression.
• Column stores can improve the performance of queries because they can access specific column data directly.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Works well for data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.
• Examples: BigTable, Cassandra, SimpleDB, etc.
Column-oriented
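A small illustrative sketch of the column layout (invented data): each column's values are stored contiguously, so an aggregate such as SUM touches only the one column it needs:

    # Row-oriented layout: each record is stored together.
    rows = [
        {"id": 1, "name": "Robin", "amount": 120.0},
        {"id": 2, "name": "Asha", "amount": 75.5},
    ]

    # Column-oriented layout: each column's values are stored contiguously
    # (conceptually, one file per column), which also compresses well.
    columns = {
        "id": [1, 2],
        "name": ["Robin", "Asha"],
        "amount": [120.0, 75.5],
    }

    # An aggregation reads only the 'amount' column, not whole rows.
    print(sum(columns["amount"]))  # 195.5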
32. • A graph database stores data in a graph.
• It is capable of elegantly representing any kind of data in a highly accessible way.
• A graph database is a collection of nodes and edges
• Each node represents an entity (such as a student or business) and each edge
represents a connection or relationship between two nodes.
• Every node and edge is defined by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
• Index for lookups.
Graph databases
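A minimal sketch of the node-and-edge idea (the entities and relationships are invented): each node keeps its adjacent nodes, so following a relationship is a constant-cost local hop:

    # Nodes keyed by a unique identifier, each holding its list of adjacent node ids.
    nodes = {
        "student:1": {"label": "Student", "name": "Robin", "adjacent": ["course:db101"]},
        "course:db101": {"label": "Course", "title": "Databases", "adjacent": ["student:1"]},
    }

    # Edges carry the relationship between two nodes.
    edges = [("student:1", "ENROLLED_IN", "course:db101")]

    # A local hop: follow the adjacency list of one node.
    for neighbour in nodes["student:1"]["adjacent"]:
        print(nodes[neighbour]["title"])  # Databases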
40. 1. Bigness
2. Massive write performance
3. Fast key-value access
4. Flexible schema and flexible datatypes
5. Schema migration
6. Write availability
7. Easier maintainability, administration and operations
8. No single point of failure
9. Generally available parallel computing
10. Programmer ease of use
11. Use the right data model for the right problem
12. Avoid hitting the wall
13. Distributed systems support
14. Tunable CAP tradeoffs
Specific use cases: see "Specific use cases.docx"
General Use Cases
42. Currently there are 150 (OMG!!) flavors in the market.
(Complete list at http://nosql-database.org/)
Flavors
44. • Vertical Scaling vs Horizontal Scaling
• Example: Residential tower & Expressway
Storage Architecture
Vertical
• Can essentially resize your server with no change to your code.
• It is the ability to increase the capacity of existing hardware or software by adding resources.
• Limited by the fact that you can only get as big as the size of the server.
Horizontal
• Affords the ability to scale wider to deal with traffic.
• It is the ability to connect multiple hardware or software entities, such as servers, so that they work as a single logical unit.
• This kind of scale takes time & effort to design & implement.
45. Getting Started With Couchbase Server
47. Couchbase Server Core Principles
48. Push-button elasticity
• Add or remove multiple servers simultaneously with the push of a button
• Efficient data rebalancing without requiring application changes
Zero-downtime maintenance
• Add or remove servers, upgrade software, and perform any maintenance tasks in a live cluster
• No application downtime required
• No application performance degradation
Data replication with auto-failover
• Maintain multiple copies of your data within the cluster for high-availability
• User configurable replication count
• User configurable failover policy to ensure data availability in the face of hardware failure
Key Couchbase Server characteristics and capabilities
49. Enterprise class monitoring and administration
• Deeply instrumented monitoring with rich administration GUI
• Dynamic system monitoring charts
• Backup and restore capability
• RESTful management API
• Easy interface to external monitoring and management systems
• Easy to automate deployment to the cloud
Couchbase Server is simple, fast, elastic, and reliable
Key Couchbase Server characteristics and capabilities
50. Simple. Everything about Couchbase Server is easy: getting, installing, managing, expanding and
using it. As a document database, there is no need to create and manage schemas; and never a need
to normalize, shard or tune the database. Build applications faster, keep them running reliably and
easily adapt them to changing business requirements.
Fast. Couchbase Server is screamingly, predictably fast. It is the lowest latency, highest throughput
NoSQL database technology available. Read and write data with consistently low latency and
sustained high throughput across the scaling spectrum. Get the performance you need at lower cost
Elastic. By automatically distributing data and I/O across commodity servers or virtual machines,
Couchbase Server makes it easy to match the optimal quantity of resources to the changing needs of
an application. Quickly grow a cluster from 1 node to 25 nodes to 100 nodes or shrink a cluster to
sustain application performance, while precisely matching cost to demand.
Key Couchbase Server characteristics and capabilities
51. Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) NoSQL document-oriented database.
Recent release: 3.0.2 – 15 Dec 2014
Written in C++, Erlang, C
History of CouchBase
52. • Couchbase Server comes in two different editions:
• Enterprise Edition (EE)
• The latest stable version of Couchbase, which includes all the bug fixes and has passed a rigorous QA process.
• It is free for use with any number of nodes for testing and development purposes, and with up to 2 nodes for production.
• An annual support plan purchase is required with this edition.
• Community Edition (CE)
• The CE lags behind the EE by about one release cycle and does not include all the latest fixes, or commercial support.
• It is open source and entirely free for use in testing and in production (for the brave only, though).
• This edition is largely meant for enthusiasts and non-critical systems.
Selecting a Couchbase Server Edition
53. Recommended
• Quad-core for key-value store, 64-bit CPU running at 3GHz.
• Six cores if XDCR (Cross Data Center Replication) and views are used.
• 16GB RAM (physical).
• Block-based storage device (hard disk, SSD, EBS, iSCSI). Network filesystems such as CIFS and
NFS are not supported.
Minimum specification
• Dual-core CPU running at 2GHz for key-value store.
• 4GB RAM (physical).
Storage requirements
• 1GB for application logging.
• Disk space of at least twice the physical RAM, for persistence of information.
Resource requirements
54. Platform Version 32 / 64 bit Supported Recommended
Red Hat Enterprise Linux 5 64 bit Developer and Production RHEL 5.8
Red Hat Enterprise Linux 6 64 bit Developer and Production RHEL 6.3
CentOS 5 64 bit Developer and Production CentOS 5.8
CentOS 6 64 bit Developer and Production CentOS 6.3
Amazon Linux 2013.03 64 bit Developer and Production
Ubuntu Linux 10.04 64 bit Developer and Production
Ubuntu Linux 12.04 64 bit Developer and Production Ubuntu 12.04
Debian Linux 7 64 bit Developer and Production Debian 7.0
Windows 2012 R2 SP1 64 bit Developer and Production
Windows 2008 R2 with SP1 64 bit Developer and Production Windows 2008
Windows 8 32 and 64 bit Developer only
Windows 7 32 and 64 bit Developer only
Mac OS 10.7 64 bit Developer only
Mac OS 10.8 64 bit Developer only Mac OS 10.8
Supported platforms
Couchbase clusters on mixed platforms are not supported.
55. • Couchbase Server uses specific network ports for communication between server components
and with the clients accessing the data stored in the Couchbase cluster.
• The listed ports must be available on the host for Couchbase Server to run and operate correctly.
• Couchbase Server configures these ports automatically, but you must verify that the firewall and IP
tables configuration allow communication on the specified ports for each usage type.
• Ports used for different types of communication with Couchbase Server:
1. Node to node These ports are used by Couchbase Server for communication between all nodes within the
cluster. These ports must be open to enable nodes to communicate with each other.
2. Node to client These ports are used for communication between client applications (smart clients, SDKs, or client-side proxies) and the nodes within the cluster. These ports must be open for clients to access the data.
3. Cluster administration These ports are used for Couchbase administration including the REST API, command-
line clients, and web browsers.
4. XDCR These ports are used for XDCR (Cross Data Center Replication) communication between all nodes in both
the source and destination clusters.
Network Ports
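To verify that the firewall and IP tables allow communication on these ports, a quick TCP reachability check such as the following sketch can help (the hostname is a placeholder, and this only tests that a connection can be opened, not that Couchbase itself is healthy):

    import socket

    def port_open(host, port, timeout=2):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Placeholder host; 8091 is the Web Administration/REST port, 11210 the data port.
    for port in (8091, 8092, 11210, 11211):
        print(port, "open" if port_open("couchbase-node.example.com", port) else "closed")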
56. Port Description Node to node Node to client Cluster admin XDCR v1 XDCR v2
8091 Web Administration Port Yes Yes Yes Yes Yes
8092 Couchbase API Port Yes Yes No Yes Yes
11207 Internal/External Bucket Port for SSL No Yes No No No
11209 Internal Bucket Port Yes No No No No
11210 Internal/External Bucket Port Yes Yes No No Yes
11211 Client interface (proxy) No Yes No No No
11214 Incoming SSL Proxy No No No No Yes
11215 Internal Outgoing SSL Proxy No No No No Yes
18091 Internal REST HTTPS for SSL No Yes Yes No Yes
18092 Internal CAPI HTTPS for SSL No Yes No No Yes
4369 Erlang Port Mapper ( epmd ) Yes No No No No
21100 to 21299 (inclusive) Node data exchange
Network ports
57. Port Used by
Port 8091 Used by the Couchbase Web Console for REST/HTTP traffic.
Port 8092 Used to access views, run queries, and update design documents.
Port 11207 Used by smart client libraries to access data nodes using SSL.
Port 11210 Used by smart client libraries or Moxi to directly connect to the data nodes.
Port 11211 Used by pre-existing Couchbase and memcached (non-smart) client libraries.
Ports 11214 and 11215 Used for SSL XDCR data encryption.
Port 18091 Used by the Couchbase Web Console for REST/HTTP traffic with SSL.
Port 18092 Used to access views, run queries, and update design documents with SSL.
All other Ports Used for other Couchbase Server communications.
Network Ports
• Port 11213 is an internal port used on the local host for memcached and compaction.
• This port is not used for communication between nodes in a cluster.
• For firewall purposes, you do not need to take port 11213 into consideration. However, if a service is listening on this port,
Couchbase Server does not start correctly.
58. • Implement the same operating system on all machines within each discrete cluster.
• Mixed clusters and mixed XDCR deployments are not supported due to incompatibility
caused by differences in the number of shards between platforms.
Deployment Consideration
60. • We will be using Windows in our session.
• Installation package choices:
• Through the wizard
• Unattended / silent
• No anti-virus software running
• Administrator privileges
• Couchbase Server uses the Microsoft C++ redistributable package, which is automatically downloaded during installation.
• If the redistributable is already in use by another application, installation can fail; close the application using it before installing.
Installation - Windows
61. • Hands-on (HO): Start installing Couchbase Server 3.0.2 by running the installer package.
• Take a snapshot of each screen to explore the relevance of the parameters after installation.
• If the Windows installer hangs on the Computing Space Requirements screen, there is an issue with the setup or installation environment, such as other running applications.
• Stop any other browsers and applications that were running when you started installing Couchbase Server.
• Kill the installation process and uninstall the failed setup.
• Delete or rename the temp location under C:\Users\[logonuser]\AppData\Temp
• Reboot and try again.
Installation - Wizard
62. An unattended installation uses a script to install Couchbase Server.
Steps:
1. Record your installation settings during a wizard installation. These settings are saved to a file, which is used to silently install other nodes of the same version.
Open a command terminal or PowerShell and start the installation executable with the /r command-line option: couchbase_server_version.exe /r /f1your_file_name.iss
Provide your installation options when prompted. The wizard completes the server installation and provides a file with your recorded options at C:\Windows\your_file_name.iss. (Accept an increase in MaxUserPort.)
2. Copy the your_file_name.iss file into the same directory as the installer executable.
Run the installer from the command line using the /s option:
> couchbase_server_version.exe /s -f1your_file_name.iss
3. To repeat this process on multiple machines, copy the installation package and the your_file_name.iss file to the same directory on each machine.
Installation - Unattended
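As a small sketch of scripting step 2 from another tool, the silent run could be driven from Python as shown below; the installer name and the .iss path are the placeholders used in the steps above, and this assumes the response file has already been recorded:

    import subprocess

    installer = r"couchbase_server_version.exe"       # placeholder installer name
    response_file = r"C:\install\your_file_name.iss"  # recorded during the wizard step

    # Run the installer silently with the recorded settings; raises if the install fails.
    subprocess.run([installer, "/s", f"-f1{response_file}"], check=True)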
63. Un-Install
• To uninstall Couchbase Server on a Windows system, you must have Administrator or
Power User privileges.
• Go to Control Panel and remove it from the Add/Remove Programs options.
Upgrade
• The installation wizard will upgrade your server installation using the same installation
location.
Uninstall & Upgrade
65. Configuring Couchbase Server
• Open the administration web console and configure the server.
• In a browser, open http://<server>:8091, where <server> is the machine on which you have installed Couchbase.
• After that, there are several simple screens to configure the server; however, each option is highly significant, and we will explore them in the coming slides.
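Because the console on port 8091 is backed by the REST management API, you can also confirm that a newly configured node is up programmatically. A minimal sketch (host and credentials are placeholders; the /pools/default endpoint returns cluster and node details in this generation of Couchbase Server, but verify it against your release):

    import requests

    # Placeholders: point these at the node you just installed.
    base = "http://couchbase-node.example.com:8091"
    auth = ("Administrator", "password")

    # /pools/default returns cluster-wide details, including per-node status.
    resp = requests.get(f"{base}/pools/default", auth=auth, timeout=5)
    resp.raise_for_status()
    for node in resp.json().get("nodes", []):
        print(node.get("hostname"), node.get("status"))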
67. Configuring Couchbase Server, step 1
• The Databases Path field is the location where Couchbase will store its
persisted data.
• The Indices Path field is where Couchbase will keep the indices created by
views.
• Both locations refer only to the current server node.
• Placing the index data on a different physical disk than the document data is
likely to result in better performance, especially if you will be using many
views or creating views on the fly.
• In a Couchbase cluster, every node must have the same amount of RAM
allocated.
• The RAM quota you set when starting a new cluster will be inherited by every
node that joins the cluster in the future. It is possible to change the server
RAM quota later through the command-line administration tools.
• Used on demand & normally set to 60% of total RAM
69. Configuring Couchbase Server, step 3
• The memcached bucket type hides unsupported configuration options, such as replicas and read-write concurrency.
• The memory size is the amount of RAM that will be allocated for this bucket on
every node in the cluster.
• This is the amount of RAM that will be allocated on every node, not the
total amount that will be split between all nodes.
• Couchbase buckets can replicate data across multiple nodes in the cluster.
With replication enabled, all data will be copied up to three times to different
nodes. If a node fails, Couchbase will make one of the replica copies available
for use.
• The number of replicas setting refers to copies of data. For example, setting it to 3 will result in a total of four instances of your data in the cluster, which also requires a minimum of four nodes.
• Enabling index replication has the effect of increasing traffic between nodes, but also means that the indices will not need to be rebuilt in the event of node failure.
• The disk read-write concurrency setting controls the number of threads that will perform disk I/O operations for this bucket.
74. Database Architecture
• Each Couchbase Server node
has two major components:
the Cluster Manager and the
Data Manager
• Applications use the Client
Software Development Kits
(SDKs) to communicate with
both of these components.
76. • A Couchbase Server cluster consists of between 1 and 1024 nodes, with each node
running exactly one instance of the Couchbase Server software.
• The data is partitioned and distributed between the nodes in the cluster.
• This means that each node holds some of the data and is responsible for some of the
storing and processing load.
• Distributing data this way is often referred to as sharding, with each partition referred
to as a shard
CouchBase Server
78. Data Manager. The data manager does the work of storing and retrieving data in
response to data operation requests from applications.
• It exposes two “memcapable” ports to the network – one port supports non-vBucket-
aware memcached client libraries (pre-memcapable 2.0 API), which are proxied if
required.
• The other port expects to communicate with vBucket-aware clients (memcapable 2.0+
API). The majority of code in the Data Manager is C and C++.
Data Manager
79. The Couchbase Server data manager listens for requests on two TCP ports - the port
numbers are configurable
Port 11211 – The traditional memcached port number processes requests from clients
supporting version 1.0 of the memcapable API specification. These clients rely on a
consistent hashing algorithm to map keys directly to servers in a variable-length server
list. Most memcached clients today support memcapable 1.0, though memcapable 2.0
clients for the most popular platforms are being introduced (e.g., spymemcached for
Java, enyim for .NET, fauna for Ruby, libmemcached for C and other languages that wrap
this client library).
• Port 11210 – a port directly accessible to clients implementing version 2.0 of the memcapable API. These clients are "vBucket aware," using a hashing algorithm to map keys to one of a fixed number of "vBuckets".
TCP Ports for Data Manager
81. The cluster manager supervises the configuration and behavior of all nodes in a
Couchbase Server cluster. Cluster management code runs on every node in the cluster,
but one node (the one holding a global singleton) is elected to perform aggregation,
consensus building and cross-node control decisions at any point in time.
• The Couchbase Server cluster manager monitors health and coordinates data
manager behavior on each node
• configures and supervises inter-node behavior (e.g. replication streams and
rebalancing operations)
• Provides aggregation and consensus functions for the cluster (e.g. global singleton
election)
• Provides a RESTful cluster management API.
• The cluster manager is built atop Erlang/OTP, a proven environment for building and operating robust, fault-tolerant distributed applications.
Cluster Manager
82. There are four primary subsystems that operate on each node.
1. Heartbeat. A watchdog process periodically communicates with the currently elected
cluster leader (the node with the global singleton) to provide Couchbase Server
health updates.
2. Process monitor. This subsystem monitors execution of the local data manager,
restarting failed processes as required and contributing status information to the
heartbeat module.
3. Configuration Manager. Each Couchbase Server node has a configuration – a
vBucket map, active replication streams, a target rebalance map, etc. The
configuration manager receives, processes and monitors local configuration, in
concert with a cluster-wide configuration distribution system.
4. Global Singleton Supervisor. In a Couchbase Server cluster, one node is elected
leader. If the leader dies, a new leader is elected. The Global Singleton Supervisor is
responsible for electing a cluster leader and supervising “per-cluster” processes if the
local node is the current leader.
Per node configuration management and monitoring functions
83. In addition to the per-node functions which are always executing at each node in a Couchbase Server cluster, there is a set of functions which are active only on one node in the cluster at any point in time.
Possession of a global singleton data structure indicates to a node that it should execute
these functions:
1. Rebalance Orchestrator. The rebalance orchestrator calculates, distributes and provides cluster-wide supervision
of a rebalance operation. When a rebalance operation is initiated, it calculates a target vBucket map based on the
current pending set of servers to be added and removed from the cluster; distributes commands to individual nodes
to build a network of vBucket migration streams; and monitors migration completion events, updating and
distributing the current vBucket map as migrations complete
2. Node Health Monitor. The node health monitor (also known as The Doctor) receives heartbeat updates from
individual nodes in the cluster, updating configuration and raising alerts as required.
3. vBucket state and replication manager. Responsible for establishing and monitoring the current network of replication streams.
Per cluster functions
85. Data flow in a Couchbase Server environment
Between application and Couchbase Server
86. Data Flow - Within the Couchbase Server cluster
1. The set request arrives at the Couchbase Server listener/receiver.
2. Couchbase Server immediately replicates the data
to replica servers – the number of replica copies is
user defined. Upon arrival at replica servers, the
data is persisted.
3. The data is cached in main memory.
4. The data is queued for persistence and de-
duplicated if a write is already pending. Once the
pending write is pulled from the queue, the value is
retrieved from cache and written to disk (or SSD).
5. A set acknowledgment is returned to the application (see the sketch below).
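A purely illustrative model of steps 3 to 5 (an in-memory cache plus a de-duplicated disk-write queue); it ignores replication and networking and is not how the server is implemented internally:

    # In-memory cache and a persistence queue that de-duplicates pending writes.
    cache = {}
    disk = {}
    pending = []                       # keys queued for persistence, in arrival order

    def set_item(key, value):
        cache[key] = value             # step 3: cache the value in main memory
        if key not in pending:         # step 4: de-duplicate if a write is already pending
            pending.append(key)
        return "acknowledged"          # step 5: acknowledgment returns to the application

    def flush_one():
        if pending:
            key = pending.pop(0)       # pull the pending write from the queue
            disk[key] = cache[key]     # value is read from cache and written to disk

    set_item("doc::1", {"a": 1})
    set_item("doc::1", {"a": 2})       # the second write reuses the queued entry
    flush_one()
    print(disk["doc::1"])              # {'a': 2}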
95. RAM, CPU and IO Guidelines
RAM:
• All metadata for all documents (64 bytes + key length)
• Document values (NRU-ejected if RAM quota used > 90%)
• Also leave RAM for the OS: [filesystem cache >> views]
CPU:
• Document indexing
• Monitoring
• XDCR
• Recommended: minimum 4 cores + 1 core per design document + 1 core per XDCR-replicated bucket
Disk IO:
• Persisted documents
• All indexes for design documents/views
• Append-only disk format & compaction
• Performance: multiple EBS volumes, high IOPS, RAID 0 on Amazon
96. Key Concepts
Architecture – Building Blocks
97. Architecture – Building Blocks
Key Concepts
Node • Cluster • Cluster Manager • Caching layer • vBuckets • Buckets • RAM quotas • Tunable memory • Disk storage • Shared thread pool • Disk I/O priority • TAP • Expiration • Server warmup • Replicas and replication • Database Change Protocol • Proxy (Moxi)
98. Couchbase Server can be used either in a standalone configuration, or in a cluster
configuration where multiple Couchbase Servers are connected together to provide a
single, distributed, data store.
Couchbase Server or node
• A single instance of the Couchbase Server software running on a machine, whether a physical machine, virtual
machine or other environment.
• All instances of Couchbase Server are identical, provide the same functionality, interfaces and systems, and
consist of the same components.
• All nodes within Couchbase Server are created equally.
• No Master :There is no hierarchy or topology, and no single node is a ‘master’ of the rest of the cluster.
• Each node is responsible only for the data it stores and the requests made to it by clients.
• Range: 1 to 1024 nodes
Node
99. Cluster
• A cluster is a collection of one or more instances of Couchbase Server that are configured as a logical
cluster.
• All nodes within the cluster are identical and provide the same functionality.
• Each node is capable of managing the cluster and each node can provide aggregate statistics and
operational information about the cluster.
• User data is stored across the entire cluster through the vBucket system.
• Clusters operate in a completely horizontal fashion.
• To increase the size of a cluster, add another node.
• There are no parent/child relationships or hierarchical structures involved. This means that Couchbase
Server scales linearly, both in terms of increasing the storage capacity and performance and scalability.
100. • The Cluster Manager is responsible for node and cluster management. Every node within a
Couchbase cluster includes the Cluster Manager component.
• Access to the Cluster Manager is provided through the administration interface on a dedicated
network port and through dedicated network ports for client access.
• Additional ports are configured for inter-node communication.
• The data is partitioned and distributed between the nodes in the cluster.
• Distributing data this way is often referred to as sharding, with each partition referred to as a shard
The Cluster Manager is responsible for the following within a cluster:
• Cluster management
• Node administration
• Node monitoring
• Statistics gathering and aggregation
• Run-time logging
• Multi-tenancy
• Security for administrative and client access
• Client proxy service to redirect requests
Cluster Manager
101. • The Rack Awareness feature permits logical groupings of servers on a cluster where each server group physically
belongs to a rack or Availability Zone.
• To use and enable Rack Awareness, all servers in a cluster must be upgraded to Couchbase Server Enterprise
Edition and minimally, version 2.5.
• By design, Couchbase Server evenly distributes data of active and replica vBuckets across the cluster for cluster
performance and redundancy purposes.
• With Rack Awareness, server partitions are laid out so the replica partitions for servers in one server group are
distributed in servers for a second group and vice versa.
• If one of the servers becomes unavailable or if an entire rack goes down, data is retained since the replicas are
available on the second server group.
• Replica vBuckets are evenly distributed from one server group to another server group to provide redundancy and
data availability.
• The rebalance operation also evenly distributes the replica vBuckets from one server group to another server group
across the cluster. If an imbalance occurs where there is an unequal number of servers in one server group, the
rebalance operation performs a "best effort" of evenly distributing the replica vBuckets across the cluster.
Rack Awareness
102. Distribution of vBuckets and replica vBuckets
Distribution with an additional server / Distribution with an unavailable server
• If the cluster becomes imbalanced, add servers to balance the cluster. For optimal Rack Awareness functionality, a
balanced cluster is recommended.
• If there is only one server or only one server group, default behavior is automatically implemented, that is, Rack
Awareness functionality is disabled.
103. • Couchbase Server provides data management services using buckets.
• Buckets are isolated virtual containers for data.
• A bucket is a logical grouping of physical resources within a cluster of Couchbase
Servers.
• Buckets provide a secure mechanism for organizing, managing, and analyzing data
storage resources. Two types of data buckets, memcached and couchbase, enable
you to store data either in-memory only or both in-memory and on disk (for added
reliability). During Couchbase Server set up, the type of bucket that you need for your
implementation is selected.
• Buckets can be used by multiple client applications across a cluster.
• Similar to databases in Microsoft SQL Server, or to schemas in Oracle.
• Typically, you would have separate buckets for separate applications.
• Couchbase supports two kinds of buckets: Couchbase and memcached.
Buckets
105. • Data is cached in memory and persisted to disk and can be dynamically rebalanced
between nodes in a cluster to distribute the load.
• Couchbase buckets can be configured to maintain between one and three replica
copies of the data, which provides redundancy in the event of node failure. Because
each copy must reside on a different node, replication requires at least one node per
replica, plus one for the active instance of data.
• Couchbase-type buckets provide a highly-available and dynamically reconfigurable
distributed data store, survive node failures, and allow cluster reconfiguration while
continuing to service requests. Couchbase-type buckets provide the following core
capabilities:
107. Default bucket
• The default bucket is a Couchbase bucket that always resides on port 11211 and is a
non-SASL authenticating bucket.
• When Couchbase Server is first installed this bucket is automatically set up during
installation.
• This bucket can be removed after installation and can also be re-added later, but when re-adding a bucket named "default", the bucket must be placed on port 11211 and must be a non-SASL authenticating bucket.
• A bucket not named default cannot reside on port 11211 if it is a non-SASL bucket.
• The default bucket can be reached with a vBucket aware smart client, an ASCII client
or a binary client that doesn’t use SASL authentication.
bucket interface types – 1/3
108. Non-SASL buckets
Non-SASL buckets can be placed on any available port with the exception of port 11211 if
the bucket is not named “default”. Only one Non-SASL bucket can placed on any
individual port. These buckets can be reached with a vBucket aware smart client, an
ASCII client or a binary client that doesn’t use SASL authentication.
SASL buckets
• SASL authenticating Couchbase buckets can only be placed on port 11211, and each bucket is differentiated by its name and password.
• A SASL bucket cannot be placed on any port other than 11211.
• These buckets can be reached with either a vBucket aware smart client or a binary client that has SASL support. These buckets cannot be reached with ASCII clients.
bucket interface types – 2/3/3
109. GENERIC NOTE
• Smart clients discover changes in the cluster using the Couchbase Management REST API.
• Buckets can be used to isolate individual applications to provide multi-tenancy or to isolate data
types in the cache to enhance performance and visibility.
• Couchbase Server permits you to configure different ports to access different buckets, and
provides the option to access isolated buckets using either the binary protocol with SASL
authentication or the ASCII protocol with no authentication
• Quotas for RAM and disk usage are configurable per bucket so that resource usage can be
managed across the cluster.
bucket
110. • A vBucket is defined as the owner of a subset of the key space of a Couchbase cluster. These vBuckets are used to distribute information effectively across a cluster.
• The vBucket system is used both for distributing data and for supporting replicas
(copies of bucket data) on more than one node.
• vBuckets are not a user-accessible component, but they are a critical component of
Couchbase Server and are vital to the availability support
• Every document ID belongs to a vBucket.
vBuckets
111. • Documents in a bucket are further subdivided into virtual buckets (vBuckets) by their key.
• Each vBucket owns a subset of all the possible keys, and documents are mapped to vBuckets
according to a hash of their key.
• Every vBucket, in turn, belongs to one of the nodes of the cluster.
• When a client needs to access a document, it first hashes the document key to find out which
vBucket owns that key.
• The client then checks the cluster map to find which node hosts the relevant vBucket.
• Lastly, the client connects directly to the node that stores the document to perform the get
operation.
Vbuckets - Functioning
112. vBucket
• The client first hashes the key to calculate the vBucket
which owns KEY. In this example, the hash resolves to
vBucket 8 (vB8).
• By examining the vBucket map, the client determines
Server C hosts vB8.
• The client sends the GET operation directly to Server C
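A simplified sketch of that lookup; the CRC32-based hash and the sample vBucket map are illustrative only (real clients receive the map from the cluster, and the exact hashing is internal to the SDK):

    import zlib

    NUM_VBUCKETS = 1024

    # Illustrative cluster map: vBucket id -> node that currently owns it.
    vbucket_map = {vb: ["Server A", "Server B", "Server C"][vb % 3]
                   for vb in range(NUM_VBUCKETS)}

    def locate(key):
        vb = zlib.crc32(key.encode()) % NUM_VBUCKETS   # hash the key to a vBucket
        return vbucket_map[vb]                         # look up the owning node

    # The client would send the GET for this key straight to the node returned here.
    print(locate("user::1234"))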
113. • This architecture permits Couchbase Server to cope with changes without using the
typical RDBMS sharding method.
• In addition, the architecture differs from the method used by memcached, which uses
client-side key hashes to determine the server from a defined list.
• The memcached method requires active management of the list of servers and
specific hashing algorithms such as Ketama to cope with changes to the topology.
vBucket
114. RAM is allocated to Couchbase Server in 2 configurable quantities: Server Quota and
Bucket Quota.
Server quota
• The Server Quota is the RAM that is allocated to the server when Couchbase Server
is first installed.
• This sets the limit of RAM allocated by Couchbase for caching data for all buckets and
is configured on a per-node basis.
• The Server Quota is initially configured when the first server in your cluster is set up, and the quota is identical on all nodes.
• Example: if you have 10 nodes and a 16GB Server Quota, there is 160GB RAM
available across the cluster. If you were to add two more nodes to the cluster, the new
nodes would need 16GB of free RAM, and the aggregate RAM available in the cluster
would be 192GB.
RAM quotas
115. Bucket quota
• The Bucket Quota is the amount of RAM allocated to an individual bucket for caching data.
• Bucket Quotas are configured on a per-node basis, and are allocated out of the RAM defined by the Server Quota.
• Example: if you create a new bucket with a Bucket Quota of 1GB, in a 10 node cluster there would be an
aggregate bucket quota of 10GB across the cluster. Adding two nodes to the cluster would extend your aggregate
bucket quota to 12GB.
• Bucket Quota is used by the system to determine when data should be ejected from memory.
• Bucket Quotas are dynamically configurable, within the Server Quota limits, and enable individual control of the information cached in memory on a per-bucket basis. Therefore, buckets can be configured differently depending on your caching RAM allocation requirements.
• The Server Quota is also dynamically configurable, however, ensure that the cluster nodes have the available RAM
to support the chosen RAM quota configuration.
RAM Quota
116. • Couchbase Server includes a built-in caching layer which acts as a central part of the server and provides very
rapid reads and writes of data.
• Couchbase Server automatically manages the caching layer and coordinates with disk space to ensure
that enough cache space exists to maintain performance.
• Couchbase Server automatically places items that come into the caching layer into a disk queue so that it can write these items to disk.
• If the server determines that a cached item is infrequently used, it removes it from RAM to free space for other
items.
• Similarly the server retrieves infrequently-used items from disk and stores them into the caching layer when the
items are requested.
• In order to provide the most frequently-used data while maintaining high performance, Couchbase Server manages
a working set of your entire information.
• The working set is the data most frequently accessed and is kept in RAM for high performance.
Caching layer
117. • Couchbase automatically moves data from RAM to disk asynchronously, in the background, to keep frequently used
information in memory and less frequently used data on disk.
• Couchbase constantly monitors the information accessed by clients and decides how to keep the active data within the
caching layer.
• Data is ejected to disk from memory while the server continues to service active requests.
• During sequences of high writes to the database, clients are notified that the server is temporarily out of memory until enough
items have been ejected from memory to disk.
• The asynchronous nature and use of queues in this way enables reads and writes to be handled at a very fast rate, while
removing the typical load and performance spikes that would otherwise cause a traditional RDBMS to produce erratic
performance.
• When the server stores data on disk and a client requests the data, an individual document ID is sent and then the server
determines whether the information exists or not. Couchbase Server does this with metadata structures.
• The metadata holds information about each document in the database and this information is held in RAM. This means that
the server returns a ‘document ID not found’ response for an invalid document ID, returns the data from RAM, or returns the
data after being fetched from disk.
cont.. Caching layer
118. • Other database solutions read and write data from disk, which results in much slower
performance.
• One approach used by other database solutions is to install and manage a caching
layer as a separate component which works with a database.
• This approach has drawbacks because of the significant custom code and effort due
to the burden of managing the caching layer and the data transfers between the
caching layer and database.
Cont.. Caching Layer
119. • Couchbase Server mainly stores and retrieves information for clients using RAM. At
the same time, Couchbase Server eventually stores all data to disk to provide a higher
level of reliability.
• It writes data to the caching layer and puts the data into a disk write queue to be
persisted to disk.
• Disk persistence enables you to perform backup and restore operations and to grow datasets larger than the built-in caching layer.
• This disk storage process is called eventual persistence since the server does not
block a client while it writes to disk.
• If a node fails and all data in the caching layer is lost, the items are recovered from
disk. When the server identifies an item that needs to be loaded from disk, because it
is not in active memory, the process is handled by a background process that
processes the load queue and reads the information back from disk and into memory.
Disk storage
120. • Multi-threaded readers and writers provide multiple processes to simultaneously read
and write data on disk. Simultaneous reads and writes increase disk speed and
improve the read rate from disk.
• Multiple readers and writers are supported to persist data on disk.
• When server nodes are upgraded, the multiple readers and writers setting is
implemented with bucket restart and warmup. In this case, install the new node, add it
to the cluster, and edit the existing bucket setting for readers and writers
• After rebalancing the cluster, the new node performs reads and writes with multiple
readers and writers and the data bucket does not restart or go through a warmup.
• The multi-threaded engine includes additional synchronization among threads that are
accessing the same data cache to avoid conflicts. To maintain performance while
avoiding conflicts over data, Couchbase Server uses a form of locking between
threads and thread allocation among vBuckets with static partitioning.
Disk Storage -- Multiple readers and writers
121. • When Couchbase Server creates multiple reader and writer threads, the server assesses a range
of vBuckets for each thread and assigns each thread exclusively to certain vBuckets.
• With this static thread coordination, the server schedules threads so that only a single reader and
single writer thread can access the same vBucket at any given time.
• Example: 6 pre-allocated threads & 2 data Buckets.
• Each thread has the range of vBuckets that is
statically partitioned for read and write access.
Disk Storage -- Multiple readers and writers
122. • Couchbase Server never deletes entire items from disk unless a client explicitly
deletes the item from the database or the expiration value for the item is reached.
• The ejection mechanism removes an item from RAM, while keeping a copy of the key
and metadata for that document in RAM and also keeping copy of that document on
disk.
Document deletion
123. • Tombstones are records of expired or deleted items that include item keys and metadata.
• Couchbase Server and other distributed databases maintain tombstones in order to provide
eventual consistency between nodes and between clusters.
• Couchbase Server stores the key plus several bytes of metadata per deleted item in two structures
per node. With millions of mutations, the space taken up by tombstones can grow quickly. This is
especially the case if there are a large number of deletions or expired documents.
• The Metadata Purge Interval sets how frequently a node permanently purges metadata on deleted
and expired items.
• The Metadata Purge Interval setting runs as part of auto-compaction. This helps reduce the storage requirement, making it roughly 3x lower than before, and also frees up space much faster.
Tombstone purging
124. • A shared thread pool is a collection of threads which are shared across multiple
buckets.
• A thread pool is a collection of threads used to perform similar jobs. Each server node
has a thread pool that is shared across multiple buckets. Shared thread pool
optimizes dispatch tasks by decoupling buckets from thread allocation.
• Threads are spawned at initial startup of a server node instance and are based on the
number of CPU cores.
• With the shared thread pool associated with each node, threads and buckets are
decoupled.
• By decoupling threads from specific buckets, threads can run tasks for any bucket. Since the global thread pool permits bucket priority levels, a separate I/O queue is available with the reader and writer workers at every priority level. This provides improved task queueing.
• For example, when a thread is assigned to running a task from an I/O queue and a second task is requested, another thread is assigned to pick up the second task.
Shared thread pool
125. Circumstances describes how threads are scheduled to dispatch tasks:
• If all buckets have the same priority (default setting), each thread evenly round-robins
over all the task queues of the buckets.
• If buckets have different priorities, the threads spend an appropriate fraction of time (scheduling frequency) dispatching tasks from the queues of these buckets.
• If a bucket is being compacted, threads are not allocated to dispatch tasks for that
bucket.
• If all buckets are either empty or being serviced by other threads, the thread goes to
sleep.
Shared thread pool - Working
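A conceptual Python sketch of the weighted round-robin behaviour described above (bucket names, weights, and queue contents are invented for illustration; this is not the server's actual scheduler):

from collections import deque

# Higher weight roughly corresponds to a "high" disk I/O priority bucket,
# whose queue gets polled more often.
queues = {
    "orders":   {"weight": 8, "tasks": deque(["flush-0", "flush-1"])},
    "sessions": {"weight": 3, "tasks": deque(["flush-2"])},
}

def dispatch_once():
    """One scheduling pass: visit each bucket's queue proportionally to its weight."""
    did_work = False
    for name, q in queues.items():
        for _ in range(q["weight"]):
            if not q["tasks"]:
                break                      # empty queue: move on to the next bucket
            print(f"running {q['tasks'].popleft()} for bucket {name}")
            did_work = True
    return did_work                        # False ~ all queues empty, thread sleeps

while dispatch_once():
    pass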
126. • Disk I/O priority enables workload priorities to be set at the bucket level.
• The bucket disk I/O priority can be set as either high or low (Default – LOW).
• Bucket priority settings determine whether I/O tasks for a bucket are enqueued in the low- or high-priority task queues.
• Threads in the global pool poll the high priority task queues more often compared to the low priority
task queues.
• Bucket latency and I/O operations are impacted by the setting value.
• The priority can be configured during the initial setup and edited afterwards; editing it after setup results in a restart of the bucket and a reset of the client connections (see the sketch below).
Disk I/O priority
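A hedged sketch of changing a bucket's disk I/O priority by shelling out to couchbase-cli from Python (the bucket-edit subcommand and --bucket-priority flag follow the 3.x documentation; the bucket name and credentials are placeholders):

import subprocess

# Switch the "orders" bucket to high disk I/O priority (low is the default).
# Note: editing the priority after setup restarts the bucket and resets
# client connections.
subprocess.run(
    [
        "couchbase-cli", "bucket-edit",
        "-c", "localhost:8091",
        "-u", "Administrator", "-p", "password",
        "--bucket=orders",
        "--bucket-priority=high",
    ],
    check=True,
)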
127. • Tunable memory enables both value-only ejection and full metadata ejection from memory.
• The cache management approach for item ejection is implemented with value-only ejection and full metadata ejection:
Value-only ejection (the default) removes the data from cache but keeps all keys and metadata fields for non-resident items. When value ejection occurs, the item's value is reset.
Full metadata ejection removes all data, including keys, metadata, and key-values, from cache for non-resident items. Full metadata ejection reduces the RAM requirement for large buckets.
• Full ejection supports very large data footprints (a large number of datasets or items/keys) since the working sets in memory are smaller. The smaller working sets allow efficient cache management and reduced warmup times.
• For example, you might want to enable full metadata ejection on a bucket if you need to store huge amounts of data (terabytes or petabytes). Metadata ejection is configured at the bucket level.
Tunable Memory
128. • With 2.x, Couchbase Server cached all keys and metadata in memory and allowed ejection of values only. That is great for low-latency access to any part of your data.
• However, some workloads don't require low-latency access to all parts of the data and would rather have memory reserved for the 'hotter' parts of the working set.
• With 3.0, for large databases with a smaller active working set, you can turn on 'full ejection' and eject keys and metadata for parts of your data that are rarely accessed. Even if you consider keys and metadata small, with a large number of keys the memory they use can add up. With full ejection mode, you can use memory more effectively for caching larger parts of your working set.
• Enabling the option is easy and can be done per bucket in the admin console (or with the CLI, as sketched below). The change is transparent to apps, so nothing needs to be done on the application side to take advantage of the setting.
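A hedged sketch of switching a bucket to full (metadata) ejection with couchbase-cli from Python (the --bucket-eviction-policy flag follows the 3.x documentation; the bucket name and credentials are placeholders, and changing the policy restarts the bucket):

import subprocess

# Enable full ejection on a large bucket; the default policy is valueOnly.
subprocess.run(
    [
        "couchbase-cli", "bucket-edit",
        "-c", "localhost:8091",
        "-u", "Administrator", "-p", "password",
        "--bucket=large_dataset",
        "--bucket-eviction-policy=fullEviction",
    ],
    check=True,
)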
129. • Working set management is the process of freeing up space in RAM while ensuring that the most-used items remain available there; ejection is the process of removing data from RAM to make room for frequently used items.
• Ejection is performed automatically by Couchbase Server.
• When Couchbase Server ejects information, it works in conjunction with the disk persistence system to ensure that the data in RAM has been persisted to disk and can safely be retrieved back into RAM if the item is requested.
• In addition to the memory quota for the caching layer, there are two watermarks the engine uses to determine when it is necessary to start persisting more data to disk. These are mem_low_wat and mem_high_wat.
Tunable Memory - Working set management and ejection
130. • As the caching layer becomes full of data, eventually the mem_low_wat is passed. At
this time, no action is taken.
• As data continues to load, it eventually reaches mem_high_wat. At this point, a
background job is scheduled to ensure items are migrated to disk and that memory is
available for other Couchbase Server items.
• This job runs until measured memory reaches mem_low_wat.
• If the rate of incoming items is faster than the migration of items to disk, the system
can return errors indicating there is not enough space.
• This continues until there is available memory.
• The process of removing data from the caching layer to make way for actively used information is called ejection; it is controlled automatically through thresholds set on each configured bucket in the Couchbase Server cluster (see the sketch below).
Tunable Memory
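A conceptual Python sketch of how the two watermarks drive ejection (the quota, thresholds, and eject_persisted_item callback are invented for illustration; the real engine works per bucket and only ejects items already persisted to disk):

BUCKET_QUOTA_MB = 1000
MEM_LOW_WAT = 0.75 * BUCKET_QUOTA_MB    # stop ejecting once usage drops below this
MEM_HIGH_WAT = 0.85 * BUCKET_QUOTA_MB   # start ejecting once usage passes this

def on_memory_used(used_mb, eject_persisted_item):
    """Eject already-persisted items until usage falls back to the low watermark."""
    if used_mb < MEM_HIGH_WAT:
        return used_mb                  # below the high watermark: no action taken
    while used_mb > MEM_LOW_WAT:
        freed = eject_persisted_item()  # frees memory for an item safely on disk
        if freed == 0:
            raise MemoryError("temporary out-of-memory: retry the operation")
        used_mb -= freed
    return used_mb

# Demo with a fake ejector that frees 10 MB per call:
print(on_memory_used(900, lambda: 10))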
131. Inside view of the server's behaviour, from the white paper: Tunnable Memory - Working set.docx
More details in the PPT from Couchbase: Tunable Memory.pptx
Web: http://www.couchbase.com/nosql-resources/blog/all-new-30-full-ejection-tuning-memory-large-databases
Tunable Memory
132. TTL
• Each document stored in the database has an optional expiration value (TTL, time to live) that is used to automatically delete items.
• The expiration option can be used for data that has a limited life and could be
automatically deleted.
• TTL is specified in seconds, or as Unix epoch time.
• The default is no expiration.
• Typical uses for an expiration value include web session data where the actively
stored information needs to be removed from the system once the user activity has
stopped.
• With an expiration value, the data times out and is removed from the system without
being explicitly deleted.
• This frees up RAM and disk for more active data.
Expiration
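A hedged example of setting a TTL with the Couchbase Python SDK 2.x (the connection string, bucket, and key are placeholders; newer SDK generations expose expiry through different arguments):

from couchbase.bucket import Bucket

cb = Bucket("couchbase://localhost/sessions")

# Session document expires 30 minutes (1800 seconds) after the last write.
cb.upsert("session::alice", {"user": "alice", "cart": []}, ttl=1800)

# Expirations longer than 30 days are given as absolute Unix epoch time
# instead of a relative number of seconds.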
133. • When Couchbase Server is restarted, or when it is started after a restore from backup, the server goes through a warm-up process.
• The warm-up loads data from disk into RAM, making the data available to clients.
• The warmup process must complete before clients can be serviced.
• Depending on the size and configuration of your system, and the amount of data that
you have stored, the warmup may take some time to load all of the stored data into
memory.
• Use cbstats to get information about server warmup, including the status of warmup and whether warmup is enabled (see the sketch below).
• ep_warmup_thread - Indicates whether the warmup completed or is still running.
Returns “running” or “complete”.
• ep_warmup_state - Indicates the current progress of the warmup
Server warmup
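A hedged sketch of reading the warmup stats from Python by invoking cbstats (the host:11210 address, the "warmup" stat group, and the -b bucket flag follow the 3.x documentation; adjust paths and credentials for your installation):

import subprocess

# Ask the data service for the warmup stat group of the "default" bucket.
out = subprocess.check_output(
    ["cbstats", "localhost:11210", "warmup", "-b", "default"],
    text=True,
)

# Parse "key: value" lines into a dict and report the two key indicators.
stats = {}
for line in out.splitlines():
    if ":" in line:
        key, _, value = line.partition(":")
        stats[key.strip()] = value.strip()

print("warmup thread:", stats.get("ep_warmup_thread"))   # "running" or "complete"
print("warmup state: ", stats.get("ep_warmup_state"))    # current warmup phase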
134. • Replicas are copies of data that are placed on another node in the cluster.
• A copy of data from one bucket, known as the source, is copied to a destination, which is also referred to as the replica, or replica vBucket.
• Replica node: the node that contains the replica vBucket. Source node: the node containing the original data to be replicated.
• Distribution of replica data is handled in the same way as data at a source node: portions of the replica data are distributed around the cluster to prevent a single point of failure.
• After Couchbase has stored replica data at a destination node, the data is also placed in a queue to be persisted on disk at that destination node.
• When replication is performed between two Couchbase clusters, it is called cross datacenter replication (XDCR).
Replicas and replication
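A hedged sketch of creating a bucket with one replica copy per document via couchbase-cli from Python (the bucket-create flags follow the 3.x documentation; names, RAM quota, and credentials are placeholders):

import subprocess

# Create a Couchbase bucket whose documents each get one replica copy;
# replica vBuckets are then spread across the other nodes automatically.
subprocess.run(
    [
        "couchbase-cli", "bucket-create",
        "-c", "localhost:8091",
        "-u", "Administrator", "-p", "password",
        "--bucket=orders",
        "--bucket-type=couchbase",
        "--bucket-ramsize=512",      # RAM quota in MB
        "--bucket-replica=1",        # 0-3 replica copies per document
    ],
    check=True,
)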
135. • The TAP protocol is an internal part of the Couchbase Server system and is used in a
number of different areas to exchange data throughout the system.
• TAP provides a stream of data of the changes that are occurring within the system.
• TAP is used during replication, to copy data between vBuckets used for replicas.
• It is also used during the rebalance procedure to move data between vBuckets and redistribute the information across the system.
TAP
136. • DCP is an innovative protocol that drives data sync for Couchbase Server v3.0.
• The Database Change Protocol (DCP) is a streaming protocol that significantly reduces latency for view updates.
• Increase data sync efficiency with massive data footprints
• Remove slower Disk-IO from the data sync path
• Improve latencies – replication for data durability
• In future, will provide a programmable data sync protocol for external stores outside Couchbase Server
• With DCP, changes made to documents in memory are immediately streamed to be indexed without first being written to disk.
• This provides faster view consistency and therefore fresher data.
• DCP reduces latency for cross data center replication (XDCR).
• Data is replicated memory-to-memory from the source cluster to the destination cluster before being written to disk
on the source cluster.
Database Change Protocol -- DCP
138. Since DCP is mostly internal communication within Couchbase Server, its manageability is very limited and it is beyond the scope of direct administration.
However, to help in understanding the internals, its building blocks and the various components are listed in this doc:
DCP.docx
For details of internal functioning and its benefits here is the PPT from Couchbase
DCP Deep Dive.pptx
DCP
140. • Couchbase Server is designed to distribute data across multiple nodes, which adds
complexity to storage and retrieval.
• To allow applications to save and retrieve data, Couchbase provides a set of
language-specific client libraries, also called Couchbase Client SDKs.
Currently, Couchbase provides officially supported SDKs for seven languages and runtimes (at the time of writing: Java, .NET, Node.js, PHP, Python, Ruby, and C).
SDK
SDKs for several additional languages, including Clojure, Go, Perl, and Erlang, are maintained by the community as open-source projects.
141. An up-to-date list can be found at the official Couchbase website:
www.couchbase.com/communities/all-client-libraries
http://www.couchbase.com/open-source
• These SDKs are used to communicate with the Couchbase Server cluster.
• The SDKs usually communicate directly with the nodes storing relevant data within the
cluster, and then perform data operations such as insertion and retrieval.
• The Couchbase clients are aware of all the nodes in the cluster, their current state,
and what documents are stored in each node.
SDK
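A hedged example of basic key-value operations with the Couchbase Python SDK 2.x (the connection string, bucket, and document key are placeholders):

from couchbase.bucket import Bucket

# The SDK learns the cluster map, locates the node owning the document's
# vBucket, and talks to that node directly.
cb = Bucket("couchbase://localhost/default")

cb.upsert("user::1001", {"name": "Sachin", "role": "architect"})   # insert or update
result = cb.get("user::1001")                                       # key-based retrieval
print(result.value["name"])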
149. For drag-and-drop, UI-driven ETL there is a connector for Talend:
http://www.couchbase.com/couchbase-server/connectors/talend
For simple reporting operations, Pentaho is really easy to use and is completely UI-driven:
http://www.pentaho.com
With Elasticsearch, Kibana can also be utilized for some really powerful reporting:
http://www.elasticsearch.org
For deployment we work with Cloudsoft (Brooklyn):
http://www.cloudsoftcorp.com/partner/brooklyn/
We also work with CumuLogic:
http://www.cumulogic.com
Both of these platforms work well for deployment and scalability. We have a library for Puppet for use with Vagrant:
https://github.com/couchbaselabs/vagrants
There's an excellent cookbook for Chef available in the Supermarket:
https://community.opscode.com/cookbooks/couchbase/versions/1.1.0
The Console allows you to perform all management tasks on this system. These can also be done from the command line or the REST API.
Additional Tools