Couchbase
Sachin Kansal
www.DataConcur.com
sachinkkansal@gmail.com
• Introduction to NoSQL
• Need & options of NoSQL Solution
• Getting Started with Couchbase
• Administration of Couchbase
• Considerations
• Best Practices
• Case Study
• New Features
• Assessment
Agenda
INTRODUCTIONS
• Database Architect & Consultant
• 15 years IT experience
• Well versed with RDBMS solution products - SQL Server, Oracle, DB2.
• NoSQL Enthusiast & follower.
• Reachable at sachinkkansal@gmail.com
• https://couchbaseblog.wordpress.com
Sachin Kansal
Quick Check of NoSQL technology
Discussion & simple assessment
• Is it free?
• Is it secure?
• Is it only a buzzword, or is anyone really using it?
• Which technologies does it support?
• Can it manage my current work setup?
• Does NoSQL require separate hardware to work?
• Is it actually required?
• Any others?
Discussion
Introduction On NoSQL :
What It Is And Why You Need It
History :
• The term NoSQL was coined by Carlo Strozzi in 1998.
• He used the term to name his open-source, lightweight database, which did not have an SQL interface.
• In early 2009, when last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a
Rackspace employee, reused the term to refer to databases that are non-relational, distributed, and do not conform
to atomicity, consistency, isolation, and durability (ACID), the four defining features of traditional relational database systems.
• In the same year, NoSQL was discussed and debated at length at the "no:sql(east)" conference held in Atlanta, USA.
• Discussion and practice of NoSQL then gained momentum, and NoSQL saw unprecedented growth.
• NoSQL is a non-relational database management system, different from traditional relational
database management systems in some significant ways.
• It is designed for distributed data stores with very large-scale data storage needs (for example,
Google or Facebook, which collect terabits of data every day for their users).
• This type of data store may not require a fixed schema, avoids join operations, and typically scales
horizontally.
NoSQL
NoSQL – Not Only SQL
Evolution of NoSQL
• RDBMS ruled for three decades.
• Emerging in the 1970s and early 1980s, relational databases offered a searchable
mechanism for persisting complex data with minimal use of storage space.
• Optimized to run on single machines.
• Schema-based approach to modeling data.
• Expensive hardware.
• Then came dramatic changes in usage and lower costs.
• Result: increased complexity in application and database design, often with
inferior performance.
What It Is And Why You Need It
Dimensions that mattered
• The number of concurrent users skyrocketed as applications increasingly became accessible via the web (and later
on mobile devices).
• The amount of data collected and processed soared as it became easier and increasingly valuable to capture all
kinds of data.
• The amount of unstructured or semi-structured data exploded and its use became integral to the value and
richness of applications.
• Google, Amazon, Facebook, and LinkedIn were among the first companies to discover the serious limitations
of relational database technology for supporting these new application requirements.
• Commercial alternatives didn’t exist, so they invented new data management approaches themselves.
• Open source NoSQL database projects formed to leverage the work of the pioneers, and commercial companies
associated with these projects soon followed
What It Is And Why You Need It
Four interrelated megatrends are driving the adoption of NoSQL technology.
• Big Users
• The Internet of Things.
• Big Data
• Cloud
Why do you need it ?
Big users
Support global users 24x7, all year round
• A newly launched app can go viral, growing from
zero to a million users overnight — literally.
• Some users are active frequently, while others use
an app a few times, never to return.
• Seasonal swings like those around festivals /
holidays create spikes for short periods.
• New product releases or promotions can spawn
dramatically higher application usage
• The large number of users, combined with the
dynamic nature of usage patterns, is driving the
need for more easily scalable database technology.
The amount of machine-generated data is increasing with the proliferation of digital
telemetry.
• There are 14 billion things connected to the Internet.
• They’re in factories, farms, hospitals, and warehouses.
• They’re in homes: appliances, gaming consoles, and more.
• They’re cars.
• They’re mobile phones and tablets.
• They receive environment, location, movement, temperature, weather data, and more
from 50 billion sensors.
Estimation
• By 2020, 32 billion things will be connected to the Internet.
• By 2020, 10% of data will be generated by embedded systems.
• By 2020, 20% of target rich data will be generated by embedded systems.
The Internet of Things
Big Data
Data is becoming easier to capture and access through third parties such as Facebook.
Personal user information, geolocation data, social graphs, user-generated content, machine logging data,
and sensor-generated data are just a few examples of the ever-expanding array of data being captured.
Example: Flight Information
SaaS / PaaS
Cloud
• Better application development productivity through a more flexible data model.
• Greater ability to scale dynamically to support more users and data.
• Improved performance to satisfy expectations of users wanting highly responsive
applications and to allow more complex processing of data.
• To address the 3 V's (volume, velocity, variety) and CAP, and to overcome RDBMS limitations.
Need for NoSQL
RDBMS | NoSQL
Structured and organized data | Stands for Not Only SQL
Structured query language (SQL) | No declarative query language
Data and its relationships are stored in separate tables | No predefined schema
Data Manipulation Language, Data Definition Language | Key-value pair storage, column store, document store, graph databases
Tight consistency | Eventual consistency rather than ACID properties
ACID | Unstructured and unpredictable data
- | CAP theorem / BASE
- | Prioritizes high performance, high availability and scalability
RDBMS vs NoSQL
Relational and NoSQL data models are very different.
• The relational model takes data and separates it into many interrelated tables.
• Each table contains rows and columns where a row might contain lots of information
about a person and each column might contain a value for a specific attribute
associated with that person, like their age.
• Tables reference each other through foreign keys that are stored in columns as well.
• The relational model minimizes the amount of storage space required, because each
piece of data is only stored in one place
• NoSQL Data Model
Data Model - RDBMS
NoSQL databases have a very different model.
For example, a document oriented NoSQL database takes the data you want to store
and aggregates it into documents using the JSON format.
Each JSON document can be thought of as an object to be used by your application.
A JSON document might, for example, take all the data stored in a row that spans 20
tables of a relational database and aggregate it into a single document/object.
Aggregating this information may lead to duplication of information, but since storage is
no longer cost prohibitive, the resulting data model flexibility, the ease of efficiently
distributing the resulting documents, and the read and write performance improvements make
it an easy trade-off for web-based applications.
Data Model – NoSQL
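As a rough illustration of the aggregation described on this slide, the sketch below folds rows from three relational-style tables into one JSON document. The table names, field names, and values are invented for the example and do not come from the deck.

```python
import json

# Hypothetical relational rows: each table holds one kind of fact,
# linked together by user_id foreign keys.
users = [{"user_id": 1, "name": "Robin", "age": 34}]
addresses = [{"user_id": 1, "city": "Pune", "type": "home"}]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.5},
]

def to_document(user_id):
    """Aggregate everything known about one user into a single document."""
    user = next(u for u in users if u["user_id"] == user_id)
    return {
        "type": "user",
        "name": user["name"],
        "age": user["age"],
        "addresses": [
            {"city": a["city"], "type": a["type"]}
            for a in addresses if a["user_id"] == user_id
        ],
        "orders": [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders if o["user_id"] == user_id
        ],
    }

doc = to_document(1)
print(json.dumps(doc, indent=2))
```

The application can now load or store the whole user in one operation, at the cost of duplicating any data that several documents share.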
Data Model Comparison
In 2000, Eric Brewer, a computer scientist from the University of California, Berkeley,
conjectured that a distributed system cannot guarantee all three of the following properties at the same time:
• Consistency: All components of the system see the same data.
• This means that the data in the database remains consistent after the execution of an operation.
For example, after an update operation all clients see the same data.
• Availability: All requests to the system receive a response, whether success or failure.
• This means that the system is always on (guaranteed service availability), with no downtime.
• Partition tolerance: The system continues to function even if some components fail or some message
traffic is lost.
• This means that the system continues to function even if the communication among the servers is
unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with
one another.
CAP Theorem
• In theory, it is impossible to fulfill all three requirements at once.
• CAP therefore sets the baseline that a distributed system can satisfy at most 2 of the 3
requirements.
• As a result, current NoSQL databases follow different combinations of C,
A, and P from the CAP theorem.
Here is a brief description of the three combinations CA, CP, and AP:
• CA - Single site cluster, therefore all nodes are always in contact. When a partition
occurs, the system blocks.
• CP - Some data may not be accessible, but the rest is still consistent/accurate.
• AP - System is still available under partitioning, but some of the data returned may be
inaccurate.
CAP 2
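The difference between the CP and AP combinations can be sketched with a toy model. This is only an illustration under simplifying assumptions (two replicas, a single value, a boolean flag standing in for the network partition); the class and function names are made up for the example.

```python
class Replica:
    def __init__(self, value):
        self.value = value

def read(replica, can_reach_peer, mode):
    """Serve a read during normal operation or a network partition.

    mode "CP": refuse to answer when the peer cannot be reached,
               because the local copy might be stale.
    mode "AP": always answer from the local copy, stale or not.
    """
    if mode == "CP" and not can_reach_peer:
        raise RuntimeError("unavailable: cannot confirm latest value")
    return replica.value

# Two replicas start in sync, then a partition separates them
# and a write lands on only one side.
a, b = Replica("v1"), Replica("v1")
a.value = "v2"  # write seen by replica a only

ap_result = read(b, can_reach_peer=False, mode="AP")  # answers with stale data
try:
    cp_result = read(b, can_reach_peer=False, mode="CP")
except RuntimeError as err:
    cp_result = str(err)                               # refuses to answer

print("AP read:", ap_result)
print("CP read:", cp_result)
```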
Current situation
The BASE acronym was defined by Eric Brewer, who also proposed the CAP theorem.
A BASE system gives up on strict consistency in favour of availability.
• Basically Available indicates that the system does guarantee availability, in terms of
the CAP theorem.
• Soft state indicates that the state of the system may change over time, even without
input. This is because of the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time,
provided that the system doesn't receive input during that time.
The BASE
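Soft state and eventual consistency can be illustrated with two in-memory replicas that reconcile by exchanging state. The last-write-wins merge rule and the logical timestamps below are assumptions chosen for brevity; real systems use a variety of reconciliation strategies.

```python
def merge(local, remote):
    """Last-write-wins merge: keep the entry with the newer timestamp."""
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

# Each replica maps key -> (value, logical timestamp).
replica_a = {"cart": (["book"], 1)}
replica_b = {"cart": (["book"], 1)}

# A write reaches only replica_a: the replicas now disagree (soft state).
replica_a["cart"] = (["book", "pen"], 2)
diverged = replica_a != replica_b

# With no further writes, one round of state exchange makes them converge.
replica_a, replica_b = merge(replica_a, replica_b), merge(replica_b, replica_a)
converged = replica_a == replica_b

print("diverged before sync:", diverged)
print("converged after sync:", converged)
```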
There are four general types (most common categories) of NoSQL databases.
(No single solution is better than all the others; however, some databases are better suited to
solving specific problems.)
• Key-value stores
• Column-oriented
• Graph
• Document oriented
NoSQL Categories / Models
• Key-value stores are the most basic type of NoSQL database.
• Designed to handle huge amounts of data.
• Based on Amazon’s Dynamo paper.
• Key-value stores allow developers to store schema-less data.
• In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string,
JSON, a BLOB (binary large object), etc.
• Depending on the store, values may also be structured types such as hashes, lists, sets, and sorted sets, stored against their keys.
• For example, a key-value pair might consist of a key like "Name" that is associated with a value like "Robin".
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
• Key-value stores typically follow the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
• Key-value stores work well for shopping cart contents, or individual values like color schemes, a landing
page URI, or a default account number.
Key-value stores
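A minimal in-memory sketch of the key-value model described on this slide. The class and its method names are illustrative only and do not correspond to any particular product's API.

```python
import json

class KeyValueStore:
    """In-memory key-value store: unique keys, opaque values."""

    def __init__(self):
        self._data = {}  # a hash table under the hood

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.set("Name", "Robin")                         # plain string value
store.set("cart:42", json.dumps(["book", "pen"]))  # JSON value
store.set("avatar:42", b"\x89PNG")                 # binary (BLOB) value

print(store.get("Name"))
print(json.loads(store.get("cart:42")))
```

The store never inspects the value, which is what makes the model schema-less: any structure is the application's concern.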
Key-value stores
• Column-oriented databases primarily work on columns, and every column is treated individually.
• Values of a single column are stored contiguously.
• Column stores keep data in column-specific files.
• In column stores, query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for compression.
• Column stores can improve query performance because they can access only the specific column data a query needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Works well for data warehouses and business intelligence, customer relationship management (CRM), library card
catalogs, etc.
• Examples: BigTable, Cassandra, SimpleDB, etc.
Column-oriented
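To show why aggregation queries benefit from a columnar layout, the sketch below stores the same records row-wise and column-wise. The records are invented sample data, and plain Python lists stand in for the per-column files.

```python
# The same three records in a row-oriented layout ...
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 50.0},
]

# ... and in a column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# SUM(amount) and AVG(amount) touch only the "amount" column,
# never the id or region values.
total = sum(columns["amount"])
average = total / len(columns["amount"])

print("SUM(amount) =", total)
print("AVG(amount) =", average)
```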
Column-oriented
• A graph database stores data in a graph.
• It is capable of elegantly representing any kind of data in a highly accessible way.
• A graph database is a collection of nodes and edges
• Each node represents an entity (such as a student or business) and each edge
represents a connection or relationship between two nodes.
• Every node and edge is defined by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
• Indexes are used for lookups.
Graph databases
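The node, edge, and adjacency ideas on this slide can be sketched with plain dictionaries. All identifiers, labels, and relationship names below are hypothetical.

```python
# Nodes and edges each carry a unique identifier.
nodes = {
    "n1": {"label": "student", "name": "Asha"},
    "n2": {"label": "student", "name": "Ravi"},
    "n3": {"label": "business", "name": "Acme"},
}
edges = {
    "e1": ("n1", "KNOWS", "n2"),
    "e2": ("n2", "WORKS_AT", "n3"),
}

# Adjacency index: each node knows its neighbours, so the cost of one hop
# depends on that node's own edges, not on the total size of the graph.
adjacent = {node_id: [] for node_id in nodes}
for edge_id, (src, rel, dst) in edges.items():
    adjacent[src].append((rel, dst))

def neighbours(node_id, rel=None):
    """Return ids of nodes one hop away, optionally filtered by relationship."""
    return [dst for r, dst in adjacent[node_id] if rel is None or r == rel]

# Two-hop traversal: where do the people Asha knows work?
employers = [
    nodes[company]["name"]
    for friend in neighbours("n1", "KNOWS")
    for company in neighbours(friend, "WORKS_AT")
]
print(employers)
```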
Graph databases
Comparison Summary
Scenarios: Where NoSQL can be used
Good Fit
Good Fit
1. Bigness
2. Massive write performance
3. Fast key-value access
4. Flexible schema and flexible datatypes
5. Schema migration
6. Write availability
7. Easier maintainability, administration and operations
8. No single point of failure
9. Generally available parallel computing
10. Programmer ease of use
11. Use the right data model for the right problem
12. Avoid hitting the wall
13. Distributed systems support
14. Tunable CAP tradeoffs
Specific Use Cases: see Specific use cases.docx
General Use Cases
Flavours of NoSQL
Currently there are 150 (OMG!!) flavors on the market
(Complete list at http://nosql-database.org/)
Flavors
The Storage Architecture
• Vertical Scaling vs Horizontal Scaling
• Example: Residential tower & Expressway
Storage Architecture
Vertical scaling
• Can essentially resize your server with no change to your code.
• It is the ability to increase the capacity of existing hardware or software by adding resources.
• Limited by the fact that you can only get as big as the size of the server.
Horizontal scaling
• Affords the ability to scale wider to deal with traffic.
• It is the ability to connect multiple hardware or software entities, such as servers, so that they
work as a single logical unit.
• This kind of scale takes time & effort to design & implement.
Getting Started With Couchbase Server
Evolution from memcached
Couchbase Server Core Principles
Push-button elasticity
• Add or remove multiple servers simultaneously with the push of a button
• Efficient data rebalancing without requiring application changes
Zero-downtime maintenance
• Add or remove servers, upgrade software, and perform any maintenance tasks in a live cluster
• No application downtime required
• No application performance degradation
Data replication with auto-failover
• Maintain multiple copies of your data within the cluster for high-availability
• User configurable replication count
• User configurable failover policy to ensure data availability in the face of hardware failure
Key Couchbase Server characteristics and capabilities
Enterprise class monitoring and administration
• Deeply instrumented monitoring with rich administration GUI
• Dynamic system monitoring charts
• Backup and restore capability
• RESTful management API
• Easy interface to external monitoring and management systems
• Easy to automate deployment to the cloud
Couchbase Server is simple, fast, elastic, and reliable
Key Couchbase Server characteristics and capabilities
Simple. Everything about Couchbase Server is easy: getting, installing, managing, expanding and
using it. As a document database, there is no need to create and manage schemas; and never a need
to normalize, shard or tune the database. Build applications faster, keep them running reliably and
easily adapt them to changing business requirements.
Fast. Couchbase Server is screamingly, predictably fast. It is the lowest latency, highest throughput
NoSQL database technology available. Read and write data with consistently low latency and
sustained high throughput across the scaling spectrum. Get the performance you need at lower cost
Elastic. By automatically distributing data and I/O across commodity servers or virtual machines,
Couchbase Server makes it easy to match the optimal quantity of resources to the changing needs of
an application. Quickly grow a cluster from 1 node to 25 nodes to 100 nodes or shrink a cluster to
sustain application performance, while precisely matching cost to demand.
Key Couchbase Server characteristics and capabilities
Couchbase Server, originally known as Membase, is an open source, distributed
(shared-nothing architecture) NoSQL document-oriented database.
Most recent release at the time of this presentation: 3.0.2 (15 Dec 2014)
Written in C++, Erlang, and C
History of CouchBase
• Couchbase Server comes in two different editions:
• Enterprise Edition (EE)
• The latest stable version of Couchbase, which includes all the bug fixes and has passed a rigorous QA
process.
• It is free for use with any number of nodes for testing and development purposes, and with up to 2
nodes for production.
• An annual support plan is available for purchase with this edition.
• Community Edition (CE)
• Lags behind the EE by about one release cycle and does not include all the latest fixes or
commercial support.
• It is open source and entirely free for use in testing and in production (for the brave only, though).
• This edition is largely meant for enthusiasts and non-critical systems.
Selecting a Couchbase Server Edition
Recommended
• Quad-core for key-value store, 64-bit CPU running at 3GHz.
• Six cores if XDCR (Cross Data Center Replication) and views are used.
• 16GB RAM (physical).
• Block-based storage device (hard disk, SSD, EBS, iSCSI). Network filesystems such as CIFS and
NFS are not supported.
Minimum specification
• Dual-core CPU running at 2GHz for key-value store.
• 4GB RAM (physical).
Storage requirements
• 1GB for application logging.
• Disk space of at least twice the physical RAM for persistence of information.
Resource requirements
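The storage rule of thumb on this slide reduces to simple arithmetic. The helper below is a sketch that assumes the 1GB logging allowance is added on top of the "twice the RAM" figure; the function name is made up for the example.

```python
def minimum_disk_gb(ram_gb, log_gb=1):
    """Rule of thumb from the slide: 2x physical RAM, plus 1 GB for logs.

    Treating the two figures as additive is an assumption of this sketch.
    """
    return 2 * ram_gb + log_gb

for label, ram_gb in [("minimum", 4), ("recommended", 16)]:
    print(f"{label}: {ram_gb} GB RAM -> at least {minimum_disk_gb(ram_gb)} GB disk")
```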
Platform | Version | 32 / 64 bit | Supported | Recommended version
Red Hat Enterprise Linux | 5 | 64 bit | Developer and Production | RHEL 5.8
Red Hat Enterprise Linux | 6 | 64 bit | Developer and Production | RHEL 6.3
CentOS | 5 | 64 bit | Developer and Production | CentOS 5.8
CentOS | 6 | 64 bit | Developer and Production | CentOS 6.3
Amazon Linux | 2013.03 | 64 bit | Developer and Production | -
Ubuntu Linux | 10.04 | 64 bit | Developer and Production | -
Ubuntu Linux | 12.04 | 64 bit | Developer and Production | Ubuntu 12.04
Debian Linux | 7 | 64 bit | Developer and Production | Debian 7.0
Windows | 2012 R2 SP1 | 64 bit | Developer and Production | -
Windows | 2008 R2 with SP1 | 64 bit | Developer and Production | Windows 2008
Windows | 8 | 32 and 64 bit | Developer only | -
Windows | 7 | 32 and 64 bit | Developer only | -
Mac OS | 10.7 | 64 bit | Developer only | -
Mac OS | 10.8 | 64 bit | Developer only | Mac OS 10.8
Supported platforms
Couchbase clusters on mixed platforms are not supported.
• Couchbase Server uses specific network ports for communication between server components
and with the clients accessing the data stored in the Couchbase cluster.
• The listed ports must be available on the host for Couchbase Server to run and operate correctly.
• Couchbase Server configures these ports automatically, but you must verify that the firewall and IP
tables configuration allow communication on the specified ports for each usage type.
• Ports used for different types of communication with Couchbase Server:
1. Node to node: These ports are used by Couchbase Server for communication between all nodes within the
cluster. These ports must be open to enable nodes to communicate with each other.
2. Node to client: These ports are used for communication between the cluster nodes and the clients
accessing the data. These ports must be open to enable clients to reach the nodes.
3. Cluster administration: These ports are used for Couchbase administration, including the REST API, command-line
clients, and web browsers.
4. XDCR: These ports are used for XDCR (Cross Data Center Replication) communication between all nodes in both
the source and destination clusters.
Network Ports
Port | Description | Node to node | Node to client | Cluster admin | XDCR v1 | XDCR v2
8091 | Web Administration Port | Yes | Yes | Yes | Yes | Yes
8092 | Couchbase API Port | Yes | Yes | No | Yes | Yes
11207 | Internal/External Bucket Port for SSL | No | Yes | No | No | No
11209 | Internal Bucket Port | Yes | No | No | No | No
11210 | Internal/External Bucket Port | Yes | Yes | No | No | Yes
11211 | Client interface (proxy) | No | Yes | No | No | No
11214 | Incoming SSL Proxy | No | No | No | No | Yes
11215 | Internal Outgoing SSL Proxy | No | No | No | No | Yes
18091 | Internal REST HTTPS for SSL | No | Yes | Yes | No | Yes
18092 | Internal CAPI HTTPS for SSL | No | Yes | No | No | Yes
4369 | Erlang Port Mapper (epmd) | Yes | No | No | No | No
21100 to 21299 (inclusive) | Node data exchange
Network ports
Port | Used by
Port 8091 | Used by the Couchbase Web Console for REST/HTTP traffic.
Port 8092 | Used to access views, run queries, and update design documents.
Port 11207 | Used by smart client libraries to access data nodes using SSL.
Port 11210 | Used by smart client libraries or Moxi to directly connect to the data nodes.
Port 11211 | Used by pre-existing Couchbase and memcached (non-smart) client libraries.
Ports 11214 and 11215 | Used for SSL XDCR data encryption.
Port 18091 | Used by the Couchbase Web Console for REST/HTTP traffic with SSL.
Port 18092 | Used to access views, run queries, and update design documents with SSL.
All other ports | Used for other Couchbase Server communications.
Network Ports
• Port 11213 is an internal port used on the local host for memcached and compaction.
• This port is not used for communication between nodes in a cluster.
• For firewall purposes, you do not need to take port 11213 into consideration. However, if another service is listening on this port,
Couchbase Server does not start correctly.
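One way to verify the firewall configuration is to attempt a TCP connection to each client-facing port. The sketch below uses only the Python standard library; the port list is copied from the tables above, while the host name in the usage note is a placeholder.

```python
import socket

# Client-facing ports taken from the port tables above.
CLIENT_PORTS = {
    8091: "Web Administration Port",
    8092: "Couchbase API Port",
    11210: "Internal/External Bucket Port",
    11211: "Client interface (proxy)",
}

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_ports(host, ports=CLIENT_PORTS):
    """Map each port number to True/False reachability."""
    return {port: port_open(host, port) for port in ports}

def report(host):
    """Print one line per port with its reachability status."""
    for port, ok in check_ports(host).items():
        status = "open" if ok else "blocked or not listening"
        print(f"{host}:{port:<6} {CLIENT_PORTS[port]:<32} {status}")
```

Calling `report("cb-node1.example.com")` (a hypothetical host) from a client machine shows which ports still need a firewall rule. A successful connection only proves that something is listening, not that it is Couchbase Server.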
• Use the same operating system on all machines within each discrete cluster.
• Mixed clusters and mixed XDCR deployments are not supported due to incompatibility
caused by differences in the number of shards between platforms.
Deployment Consideration
Installation & Configuration
• We will be using Windows in our session.
• Installation choices:
• Through the wizard
• Unattended / silent
• Prerequisites:
• No anti-virus software running
• Administrator privileges
• Couchbase Server uses the Microsoft C++ redistributable package, which is automatically
downloaded during installation.
• If the package is already in use by another application, installation can fail. Close any application using it beforehand.
Installation - Windows
• Hands-on (HO): Start installing Couchbase Server 3.0.2 by running the installer package.
• Take a screenshot of each screen so you can explore the relevance of the parameters after installation.
• If the Windows installer hangs on the Computing Space Requirements screen, there is an issue
with the setup or installation environment, such as other running applications. To recover:
• Stop any other browsers and applications that were running when you started installing Couchbase Server.
• Kill the installation process and uninstall the failed setup.
• Delete or rename the temp location under C:\Users\[logonuser]\AppData\Temp
• Reboot and try again.
Installation - Wizard
An unattended installation uses a script to install Couchbase Server.
Steps:
1. Record your installation settings during a wizard installation. These settings are saved to a file, which
is used to silently install other nodes of the same version.
• Open a Command Prompt or PowerShell window and start the installation executable with the /r command-line
option: couchbase_server_version.exe /r /f1your_file_name.iss
• Provide your installation options when prompted. The wizard completes the server installation and
provides a file with your recorded options at C:\Windows\your_file_name.iss. (Accept an increase in MaxUserPort.)
2. Copy the your_file_name.iss file into the same directory as the installer executable.
• Run the installer from the command line using the /s option:
> couchbase_server_version.exe /s -f1your_file_name.iss
3. To repeat this process on multiple machines, copy the installation package and the your_file_name.iss file to the
same directory on each machine.
Installation - Unattended
Un-Install
• To uninstall Couchbase Server on a Windows system, you must have Administrator or
Power User privileges.
• Go to Control Panel and remove Couchbase Server using Add/Remove Programs.
Upgrade
• The installation wizard will upgrade your server installation using the same installation
location.
Uninstall & Upgrade
Configuring Couchbase Server
Configuring Couchbase Server
• Open the administration web console and configure the server.
• In a browser, go to http://<server>:8091, where <server> is the machine on which you’ve installed
Couchbase.
• A short series of simple screens then walks you through configuring the server. Each
option is significant, so we will explore them in the coming slides.
Configuring Couchbase Server, step 1
• The Databases Path field is the location where Couchbase will store its
persisted data.
• The Indices Path field is where Couchbase will keep the indices created by
views.
• Both locations refer only to the current server node.
• Placing the index data on a different physical disk than the document data is
likely to result in better performance, especially if you will be using many
views or creating views on the fly.
• In a Couchbase cluster, every node must have the same amount of RAM
allocated.
• The RAM quota you set when starting a new cluster will be inherited by every
node that joins the cluster in the future. It is possible to change the server
RAM quota later through the command-line administration tools.
• RAM is used on demand; the quota is normally set to 60% of total RAM.
Configuring Couchbase Server, step 2
Configuring Couchbase Server, step 3
• Selecting the Memcached bucket type hides the unsupported configuration options, such
as replicas and read-write concurrency.
• The memory size is the amount of RAM that will be allocated for this bucket on
every node in the cluster.
• This is the amount of RAM that will be allocated on every node, not the
total amount that will be split between all nodes.
• Couchbase buckets can replicate data across multiple nodes in the cluster.
With replication enabled, all data will be copied up to three times to different
nodes. If a node fails, Couchbase will make one of the replica copies available
for use.
• The number of replicas setting refers to additional copies of the data. For example, setting it to 3
will result in a total of four instances of your data in the cluster, which also
requires a minimum of four nodes.
• Enabling index replication increases traffic between
nodes, but also means that the indices will not need to be rebuilt in the event of
node failure.
• The disk read-write concurrency setting controls the number of threads that
will perform disk I/O operations for this bucket.
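The per-node memory and replica rules above are easy to get wrong, so here is a small arithmetic sketch. The function and field names are invented for the example, and the "RAM for active data" figure assumes replica copies consume as much RAM as the active copy, which is a simplification.

```python
def bucket_footprint(per_node_quota_mb, nodes, replicas):
    """Summarise what a bucket's settings imply for the whole cluster."""
    if not 0 <= replicas <= 3:
        raise ValueError("replicas must be between 0 and 3")
    copies = replicas + 1                 # active copy + replica copies
    if nodes < copies:
        raise ValueError(f"{replicas} replica(s) need at least {copies} nodes")
    total_mb = per_node_quota_mb * nodes  # quota is per node, not split
    return {
        "copies_of_data": copies,
        "cluster_ram_mb": total_mb,
        # Assumption: every copy takes an equal share of the bucket's RAM.
        "ram_for_active_data_mb": total_mb // copies,
    }

print(bucket_footprint(per_node_quota_mb=1024, nodes=4, replicas=3))
```

With a 1024 MB per-node quota on four nodes, the bucket occupies 4096 MB across the cluster, but with three replicas only about a quarter of that holds unique data.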
Configuring Couchbase Server, step 4
Configuring Couchbase Server, step 5
All set Screen
Couchbase Architecture
Database Architecture
• Each Couchbase Server node has two major components: the Cluster Manager and the Data Manager.
• Applications use the Client Software Development Kits (SDKs) to communicate with both of these components.
Couchbase Server
• A Couchbase Server cluster consists of between 1 and 1024 nodes, with each node
running exactly one instance of the Couchbase Server software.
• The data is partitioned and distributed between the nodes in the cluster.
• This means that each node holds some of the data and is responsible for some of the
storing and processing load.
• Distributing data this way is often referred to as sharding, with each partition referred
to as a shard
CouchBase Server
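The general idea of sharding can be sketched by hashing each key to pick a node. This is a deliberately naive modulo scheme for illustration, not Couchbase's actual placement algorithm; the node names and key format are hypothetical.

```python
import zlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def node_for(key, nodes=NODES):
    """Pick the node responsible for a key from a stable hash of the key."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]

# Distribute 9000 sample keys and count how many land on each node.
keys = [f"user::{i}" for i in range(9000)]
load = Counter(node_for(k) for k in keys)

for node in NODES:
    print(node, load[node])
```

Each node ends up holding a share of the keys and therefore a share of the storage and processing load. A plain modulo scheme like this one reshuffles most keys whenever the node count changes, which is the problem a fixed set of partitions is meant to avoid.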
Data Manager
Data Manager. The data manager does the work of storing and retrieving data in
response to data operation requests from applications.
• It exposes two “memcapable” ports to the network – one port supports non-vBucket-
aware memcached client libraries (pre-memcapable 2.0 API), which are proxied if
required.
• The other port expects to communicate with vBucket-aware clients (memcapable 2.0+
API). The majority of code in the Data Manager is C and C++.
Data Manager
The Couchbase Server data manager listens for requests on two TCP ports; the port
numbers are configurable.
• Port 11211 – the traditional memcached port number, which processes requests from clients
supporting version 1.0 of the memcapable API specification. These clients rely on a
consistent hashing algorithm to map keys directly to servers in a variable-length server
list. Most memcached clients today support memcapable 1.0, though memcapable 2.0
clients for the most popular platforms are being introduced (e.g., spymemcached for
Java, enyim for .NET, fauna for Ruby, libmemcached for C and other languages that wrap
this client library).
• Port 11210 – a port directly accessible to clients implementing version 2.0 of the
memcapable API. These clients are “vBucket aware,” using a hashing algorithm to map
keys to one of a fixed number of “vBuckets” (
TCP Ports for Data Manager
Cluster Manager
The cluster manager supervises the configuration and behavior of all nodes in a
Couchbase Server cluster. Cluster management code runs on every node in the cluster,
but one node (the one holding a global singleton) is elected to perform aggregation,
consensus building and cross-node control decisions at any point in time.
• The Couchbase Server cluster manager monitors health and coordinates data
manager behavior on each node.
• Configures and supervises inter-node behavior (e.g. replication streams and
rebalancing operations).
• Provides aggregation and consensus functions for the cluster (e.g. global singleton
election).
• Provides a RESTful cluster management API.
• The cluster manager is built atop Erlang/OTP, a proven environment for building and
operating robust fault-tolerant distributed applications.
Cluster Manager
There are four primary subsystems that operate on each node.
1. Heartbeat. A watchdog process periodically communicates with the currently elected
cluster leader (the node with the global singleton) to provide Couchbase Server
health updates.
2. Process monitor. This subsystem monitors execution of the local data manager,
restarting failed processes as required and contributing status information to the
heartbeat module.
3. Configuration Manager. Each Couchbase Server node has a configuration – a
vBucket map, active replication streams, a target rebalance map, etc. The
configuration manager receives, processes and monitors local configuration, in
concert with a cluster-wide configuration distribution system.
4. Global Singleton Supervisor. In a Couchbase Server cluster, one node is elected
leader. If the leader dies, a new leader is elected. The Global Singleton Supervisor is
responsible for electing a cluster leader and supervising “per-cluster” processes if the
local node is the current leader.
Per node configuration management and monitoring functions
In addition to the per-node functions which are always executing at each node in a
Couchbase Server cluster, there is a set of functions which are active on only one node in
the cluster at any point in time.
Possession of a global singleton data structure indicates to a node that it should execute
these functions:
1. Rebalance Orchestrator. The rebalance orchestrator calculates, distributes and provides cluster-wide supervision
of a rebalance operation. When a rebalance operation is initiated, it calculates a target vBucket map based on the
current pending set of servers to be added and removed from the cluster; distributes commands to individual nodes
to build a network of vBucket migration streams; and monitors migration completion events, updating and
distributing the current vBucket map as migrations complete
2. Node Health Monitor. The node health monitor (also known as The Doctor) receives heartbeat updates from
individual nodes in the cluster, updating configuration and raising alerts as required.
3. vBucket state and replication manager. Responsible for establishing and monitoring the current network of
replication streams.
Per cluster functions
Interfacing And Interacting
Data flow in a Couchbase Server environment
Between application and Couchbase Server
Data Flow - Within the Couchbase Server cluster
1. The set arrives at the Couchbase Server listener-receiver.
2. Couchbase Server immediately replicates the data to replica servers; the number of
replica copies is user defined. Upon arrival at replica servers, the data is persisted.
3. The data is cached in main memory.
4. The data is queued for persistence and de-duplicated if a write is already pending.
Once the pending write is pulled from the queue, the value is retrieved from cache and
written to disk (or SSD).
5. The set acknowledgment is returned to the application.
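The write path in the steps above can be sketched in a few lines of Python. This is only an illustration, not Couchbase's implementation: the class name NodeSketch, its methods, and the dictionaries standing in for the cache, the replica servers and the disk are all assumptions made for the example.

```python
from collections import OrderedDict


class NodeSketch:
    """Toy model of one node's write path (hypothetical, for illustration only)."""

    def __init__(self, replicas):
        self.cache = {}                  # managed cache in main memory
        self.disk = {}                   # stand-in for persisted storage
        self.disk_queue = OrderedDict()  # keys pending persistence, de-duplicated
        self.replicas = replicas         # list of dicts standing in for replica servers

    def set(self, key, value):
        # Step 2: replicate to each configured replica server.
        for replica in self.replicas:
            replica[key] = value
        # Step 3: cache the value in main memory.
        self.cache[key] = value
        # Step 4: queue the key for persistence; a key that is already
        # pending is not queued a second time (de-duplication).
        self.disk_queue[key] = True
        # Step 5: acknowledge without waiting for the disk write.
        return "ACK"

    def flush(self):
        # Drain the queue: read the latest value from cache and write it to disk.
        writes = 0
        while self.disk_queue:
            key, _ = self.disk_queue.popitem(last=False)
            self.disk[key] = self.cache[key]
            writes += 1
        return writes
```

In this sketch, two sets to the same key followed by one flush() produce a single disk write carrying the latest value, which is the de-duplication described in step 4.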
Metadata and Documents
Retrieval Operations
Storage Operations
Consistency
Ejection, NRU, Cache Miss
Clients Connect Directly to Couchbase Nodes
Key Hash-Partitioning
[Figure: application servers, each with a MAP; 1024 partitions; 8 GB RAM; 3 IO workers]
ClientHashFunction("ABCXYZ@couchbase.com") => Partition[0..1023] {25}
ClusterMap[P(25)] => [x.x.x.x] => IP of Server Responsible for Partition 25
Horizontal Scale-Rebalance
RAM, CPU and IO Guidelines
RAM
• All Metadata for All Documents (64 bytes + Key Length)
• Document Values (NRU Ejected if RAM Quota Used > 90%)
• Also Leave RAM For OS: [Filesystem Cache >> Views]
CPU
• Document Indexing
• Monitoring
• XDCR
• Recommended: minimum 4 Cores + 1 core per design document + 1 core per XDCR replicated bucket
Disk IO
• Persisted Documents
• All Indexes for Design Documents/Views
• Append-Only Disk Format & Compaction
• Performance: Multiple EBS Volumes, High IOPS, RAID 0 on Amazon
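As a worked example of the guideline figures above (64 bytes of metadata plus the key length per document, and a minimum of 4 cores plus 1 per design document and 1 per XDCR replicated bucket), the arithmetic can be sketched as follows. The function names and the sample workload are assumptions for illustration only.

```python
METADATA_BYTES_PER_DOC = 64  # per-document overhead quoted in the guidelines


def metadata_ram_bytes(num_docs, avg_key_length):
    """RAM needed to keep metadata and keys for every document resident."""
    return num_docs * (METADATA_BYTES_PER_DOC + avg_key_length)


def recommended_cores(design_documents, xdcr_buckets):
    """Minimum 4 cores, plus 1 per design document and 1 per XDCR replicated bucket."""
    return 4 + design_documents + xdcr_buckets


# Hypothetical workload: 10 million documents with 36-byte keys,
# 2 design documents and 1 XDCR replicated bucket.
ram = metadata_ram_bytes(10_000_000, 36)
cores = recommended_cores(2, 1)
print(f"metadata RAM: {ram / 1e9:.1f} GB, cores: {cores}")
# prints "metadata RAM: 1.0 GB, cores: 7"
```

The metadata figure covers keys and metadata only; document values and the RAM left for the operating system come on top of it.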
Architecture – Building Blocks
Key Concepts
• Node
• Cluster
• Cluster Manager
• Caching layer
• vBuckets
• Buckets
• RAM quotas
• Tunable memory
• Disk storage
• Shared thread pool
• Disk I/O priority
• TAP
• Expiration
• Server warmup
• Replicas and replication
• Database Change Protocol
• Proxy (Moxi)
Couchbase Server can be used either in a standalone configuration, or in a cluster
configuration where multiple Couchbase Servers are connected together to provide a
single, distributed, data store.
Couchbase Server or node
• A single instance of the Couchbase Server software running on a machine, whether a physical machine, virtual
machine or other environment.
• All instances of Couchbase Server are identical, provide the same functionality, interfaces and systems, and
consist of the same components.
• All nodes within a Couchbase Server cluster are created equal.
• No master: there is no hierarchy or topology, and no single node is a ‘master’ of the rest of the cluster.
• Each node is responsible only for the data it stores and the requests made to it by clients.
• A cluster can range from 1 to 1024 nodes.
Node
Cluster
• A cluster is a collection of one or more instances of Couchbase Server that are configured as a logical
cluster.
• All nodes within the cluster are identical and provide the same functionality.
• Each node is capable of managing the cluster and each node can provide aggregate statistics and
operational information about the cluster.
• User data is stored across the entire cluster through the vBucket system.
• Clusters operate in a completely horizontal fashion.
• To increase the size of a cluster, add another node.
• There are no parent/child relationships or hierarchical structures involved. This means that Couchbase
Server scales linearly in terms of both storage capacity and performance.
• The Cluster Manager is responsible for node and cluster management. Every node within a
Couchbase cluster includes the Cluster Manager component.
• Access to the Cluster Manager is provided through the administration interface on a dedicated
network port and through dedicated network ports for client access.
• Additional ports are configured for inter-node communication.
• The data is partitioned and distributed between the nodes in the cluster.
• Distributing data this way is often referred to as sharding, with each partition referred to as a shard
The Cluster Manager is responsible for the following within a cluster:
• Cluster management
• Node administration
• Node monitoring
• Statistics gathering and aggregation
• Run-time logging
• Multi-tenancy
• Security for administrative and client access
• Client proxy service to redirect requests
Cluster Manager
• The Rack Awareness feature permits logical groupings of servers on a cluster where each server group physically
belongs to a rack or Availability Zone.
• To use Rack Awareness, all servers in a cluster must run Couchbase Server Enterprise
Edition, version 2.5 or later.
• By design, Couchbase Server evenly distributes data of active and replica vBuckets across the cluster for cluster
performance and redundancy purposes.
• With Rack Awareness, server partitions are laid out so that the replica partitions for servers in one server group are
distributed across servers in a second group, and vice versa.
• If one of the servers becomes unavailable or if an entire rack goes down, data is retained since the replicas are
available on the second server group.
• Replica vBuckets are evenly distributed from one server group to another server group to provide redundancy and
data availability.
• The rebalance operation also evenly distributes the replica vBuckets from one server group to another server group
across the cluster. If an imbalance occurs where there is an unequal number of servers in one server group, the
rebalance operation makes a best-effort attempt to distribute the replica vBuckets evenly across the cluster.
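The placement rule behind Rack Awareness, that a replica lives in a different server group from its active copy, can be sketched as a small filter. The function name, the group names and the server names are hypothetical; the real placement logic also balances replicas evenly, which this sketch does not attempt.

```python
def replica_candidates(active_server, server_groups):
    """Servers eligible to hold the replica: every server outside the active copy's group."""
    active_group = next(
        group for group, servers in server_groups.items() if active_server in servers
    )
    return [
        server
        for group, servers in server_groups.items()
        if group != active_group
        for server in servers
    ]


# Hypothetical two-rack cluster.
groups = {"rack1": ["s1", "s2"], "rack2": ["s3", "s4"]}
print(replica_candidates("s1", groups))  # prints ['s3', 's4']
```

If rack1 goes down entirely, every vBucket that was active there still has its replica on a rack2 server.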
Rack Awareness
Distribution of vBuckets and replica vBuckets
[Figures: distribution with additional server; distribution with unavailable server]
• If the cluster becomes imbalanced, add servers to balance the cluster. For optimal Rack Awareness functionality, a
balanced cluster is recommended.
• If there is only one server or only one server group, default behavior is automatically implemented, that is, Rack
Awareness functionality is disabled.
• Couchbase Server provides data management services using buckets.
• Buckets are isolated virtual containers for data.
• A bucket is a logical grouping of physical resources within a cluster of Couchbase
Servers.
• Buckets provide a secure mechanism for organizing, managing, and analyzing data
storage resources. Two types of data buckets, memcached and couchbase, enable
you to store data either in-memory only or both in-memory and on disk (for added
reliability). During Couchbase Server set up, the type of bucket that you need for your
implementation is selected.
• Buckets can be used by multiple client applications across a cluster.
• Buckets are similar to databases in Microsoft SQL Server, or to schemas in Oracle.
• Typically, you would have separate buckets for separate applications.
• Couchbase supports two kinds of buckets: Couchbase and memcached.
Buckets
Buckets Comparison
• Data is cached in memory and persisted to disk and can be dynamically rebalanced
between nodes in a cluster to distribute the load.
• Couchbase buckets can be configured to maintain between one and three replica
copies of the data, which provides redundancy in the event of node failure. Because
each copy must reside on a different node, replication requires at least one node per
replica, plus one for the active instance of data.
• Couchbase-type buckets provide a highly-available and dynamically reconfigurable
distributed data store, survive node failures, and allow cluster reconfiguration while
continuing to service requests. Couchbase-type buckets provide the following core
capabilities:
Buckets
Default bucket
• The default bucket is a Couchbase bucket that always resides on port 11211 and is a
non-SASL authenticating bucket.
• When Couchbase Server is first installed this bucket is automatically set up during
installation.
• This bucket can be removed after installation and can also be re-added later, but
when re-adding a bucket named “default”, the bucket must be placed on port 11211 and
must be a non-SASL authenticating bucket.
• A bucket not named default cannot reside on port 11211 if it is a non-SASL bucket.
• The default bucket can be reached with a vBucket aware smart client, an ASCII client
or a binary client that doesn’t use SASL authentication.
bucket interface types – 1/3
Non-SASL buckets
Non-SASL buckets can be placed on any available port, with the exception of port 11211 if
the bucket is not named “default”. Only one non-SASL bucket can be placed on any
individual port. These buckets can be reached with a vBucket aware smart client, an
ASCII client or a binary client that doesn’t use SASL authentication.
SASL buckets
• SASL authenticating Couchbase buckets can only be placed on port 11211 and each
bucket is differentiated by its name and password.
• SASL buckets cannot be placed on any port other than 11211.
• These buckets can be reached with either a vBucket aware smart client or a binary
client that has SASL support. These buckets cannot be reached with ASCII clients.
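The port rules for the three bucket interface types can be condensed into one check. This is a sketch of the rules as stated on these slides, not a Couchbase API; the function name and parameters are hypothetical, and it does not enforce the rule that only one non-SASL bucket may sit on a given port.

```python
MEMCACHED_PORT = 11211


def placement_allowed(name, port, sasl):
    """Return True if a bucket with this name, port and auth mode follows the slide rules."""
    if name == "default":
        # The default bucket always resides on 11211 and is non-SASL.
        return port == MEMCACHED_PORT and not sasl
    if sasl:
        # SASL buckets can only be placed on 11211.
        return port == MEMCACHED_PORT
    # Non-SASL buckets not named "default" may use any port except 11211.
    return port != MEMCACHED_PORT
```

For example, placement_allowed("sessions", 11222, sasl=False) is accepted, while the same bucket on port 11211 is rejected.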
bucket interface types – 2/3/3
GENERIC NOTE
• Smart clients discover changes in the cluster using the Couchbase Management REST API.
• Buckets can be used to isolate individual applications to provide multi-tenancy or to isolate data
types in the cache to enhance performance and visibility.
• Couchbase Server permits you to configure different ports to access different buckets, and
provides the option to access isolated buckets using either the binary protocol with SASL
authentication or the ASCII protocol with no authentication
• Quotas for RAM and disk usage are configurable per bucket so that resource usage can be
managed across the cluster.
bucket
• A vBucket is defined as the owner of a subset of the key space of a Couchbase
cluster. These vBuckets are used to distribute information effectively across a cluster.
• The vBucket system is used both for distributing data and for supporting replicas
(copies of bucket data) on more than one node.
• vBuckets are not a user-accessible component, but they are a critical component of
Couchbase Server and are vital to its availability support.
• Every document ID belongs to a vBucket.
vBuckets
• Documents in a bucket are further subdivided into virtual buckets (vBuckets) by their key.
• Each vBucket owns a subset of all the possible keys, and documents are mapped to vBuckets
according to a hash of their key.
• Every vBucket, in turn, belongs to one of the nodes of the cluster.
• When a client needs to access a document, it first hashes the document key to find out which
vBucket owns that key.
• The client then checks the cluster map to find which node hosts the relevant vBucket.
• Lastly, the client connects directly to the node that stores the document to perform the get
operation.
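The three client-side steps above (hash the key, consult the cluster map, contact the owning node) can be sketched as follows. CRC32 modulo 1024 is used here as an assumed stand-in for the client hash function, and the server names and the round-robin cluster map are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # fixed number of partitions in this sketch


def vbucket_for_key(key):
    """Hash a document key to a vBucket id (CRC32 modulo is an assumed stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def node_for_key(key, cluster_map):
    """Look up the node that hosts the vBucket owning this key."""
    return cluster_map[vbucket_for_key(key)]


# Hypothetical cluster map: vBuckets spread round-robin over three servers.
servers = ["server-a", "server-b", "server-c"]
cluster_map = [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]

# The client would now send its get operation directly to this node.
node = node_for_key("ABCXYZ@couchbase.com", cluster_map)
```

Because the key-to-vBucket step depends only on the key and the fixed vBucket count, every client that holds the same cluster map resolves a key to the same node.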
Vbuckets - Functioning
vBucket
• The client first hashes the key to calculate the vBucket
which owns KEY. In this example, the hash resolves to
vBucket 8 (vB8).
• By examining the vBucket map, the client determines
Server C hosts vB8.
• The client sends the GET operation directly to Server C
• This architecture permits Couchbase Server to cope with changes without using the
typical RDBMS sharding method.
• In addition, the architecture differs from the method used by memcached, which uses
client-side key hashes to determine the server from a defined list.
• The memcached method requires active management of the list of servers and
specific hashing algorithms such as Ketama to cope with changes to the topology.
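To make the contrast concrete, the sketch below counts how many keys change server when a fourth server joins a three-server setup. The memcached side uses naive modulo hashing over the server list (deliberately not Ketama, which exists to reduce exactly this churn); the vBucket side keeps the key-to-vBucket mapping fixed and only edits the map. The key names, the CRC32 hash and the rebalance pattern are assumptions for illustration.

```python
import zlib

NUM_VBUCKETS = 1024


def h(key):
    """CRC32 of the key; an assumed stand-in for the client hash function."""
    return zlib.crc32(key.encode("utf-8"))


keys = [f"user::{i}" for i in range(10_000)]

# memcached-style: hash straight onto the server list (naive modulo, not Ketama).
# Growing the list from 3 to 4 servers changes the target of many keys.
moved_modulo = sum(h(k) % 3 != h(k) % 4 for k in keys)

# vBucket-style: key -> vBucket never changes; only the vBucket map is edited.
map_before = [vb % 3 for vb in range(NUM_VBUCKETS)]
# Hypothetical rebalance: every fourth vBucket moves to the new server (index 3).
map_after = [3 if vb % 4 == 0 else owner for vb, owner in enumerate(map_before)]
moved_vbucket = sum(
    map_before[h(k) % NUM_VBUCKETS] != map_after[h(k) % NUM_VBUCKETS] for k in keys
)

print(f"modulo hashing moved {moved_modulo} keys; vBucket map moved {moved_vbucket}")
```

With the vBucket approach, only keys in the vBuckets that were handed to the new server change location, and clients learn about the change through the updated map rather than through a new server list.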
vBucket
RAM is allocated to Couchbase Server in 2 configurable quantities: Server Quota and
Bucket Quota.
Server quota
• The Server Quota is the RAM that is allocated to the server when Couchbase Server
is first installed.
• This sets the limit of RAM allocated by Couchbase for caching data for all buckets and
is configured on a per-node basis.
• The Server Quota is initially configured when the first server in your cluster is set up,
and the quota is identical on all nodes.
• Example: if you have 10 nodes and a 16GB Server Quota, there is 160GB RAM
available across the cluster. If you were to add two more nodes to the cluster, the new
nodes would need 16GB of free RAM, and the aggregate RAM available in the cluster
would be 192GB.
RAM quotas
Bucket quota
• The Bucket Quota is the amount of RAM allocated to an individual bucket for caching data.
• Bucket Quotas are configured on a per-node basis, and are allocated out of the RAM defined by the Server Quota.
• Example: if you create a new bucket with a Bucket Quota of 1GB, in a 10 node cluster there would be an
aggregate bucket quota of 10GB across the cluster. Adding two nodes to the cluster would extend your aggregate
bucket quota to 12GB.
• Bucket Quota is used by the system to determine when data should be ejected from memory.
• Bucket Quotas are dynamically configurable, within the Server Quota limits, and enable individual control of
information cached in memory on a per bucket basis. Therefore, buckets can be configured differently depending
on your caching RAM allocation requirements.
• The Server Quota is also dynamically configurable; however, ensure that the cluster nodes have the available RAM
to support the chosen RAM quota configuration.
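The quota examples on these slides come down to multiplying a per-node figure by the node count, plus a check that bucket quotas fit inside the Server Quota. The helper names below are hypothetical; the numbers are the ones used in the examples above.

```python
def aggregate_quota_gb(nodes, per_node_quota_gb):
    """Quotas are set per node, so the cluster-wide figure is nodes x quota."""
    return nodes * per_node_quota_gb


def fits_server_quota(bucket_quotas_gb, server_quota_gb):
    """Bucket quotas are carved out of the per-node Server Quota."""
    return sum(bucket_quotas_gb) <= server_quota_gb


# Server Quota example: 16 GB per node.
print(aggregate_quota_gb(10, 16))  # prints 160
print(aggregate_quota_gb(12, 16))  # prints 192

# Bucket Quota example: 1 GB per node.
print(aggregate_quota_gb(10, 1))   # prints 10
print(aggregate_quota_gb(12, 1))   # prints 12
```

A per-node check such as fits_server_quota([1, 4, 8], 16) is the kind of validation to run before raising a bucket quota on a live cluster.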
RAM Quota
• Couchbase Server includes a built-in caching layer which acts as a central part of the server and provides very
rapid reads and writes of data.
• Couchbase Server automatically manages the caching layer and coordinates with disk space to ensure
that enough cache space exists to maintain performance.
• Couchbase Server automatically places items that come into the caching layer into disk queue so that it can write
these items to disk.
• If the server determines that a cached item is infrequently used, it removes it from RAM to free space for other
items.
• Similarly the server retrieves infrequently-used items from disk and stores them into the caching layer when the
items are requested.
• In order to provide the most frequently-used data while maintaining high performance, Couchbase Server manages
a working set of your entire information.
• The working set is the data most frequently accessed and is kept in RAM for high performance.
Caching layer
• Couchbase automatically moves data from RAM to disk asynchronously, in the background, to keep frequently used
information in memory and less frequently used data on disk.
• Couchbase constantly monitors the information accessed by clients and decides how to keep the active data within the
caching layer.
• Data is ejected to disk from memory while the server continues to service active requests.
• During sequences of high writes to the database, clients are notified that the server is temporarily out of memory until enough
items have been ejected from memory to disk.
• The asynchronous nature and use of queues in this way enables reads and writes to be handled at a very fast rate, while
removing the typical load and performance spikes that would otherwise cause a traditional RDBMS to produce erratic
performance.
• When the server stores data on disk and a client requests the data, an individual document ID is sent and then the server
determines whether the information exists or not. Couchbase Server does this with metadata structures.
• The metadata holds information about each document in the database and this information is held in RAM. This means that
the server returns a ‘document ID not found’ response for an invalid document ID, returns the data from RAM, or returns the
data after being fetched from disk.
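The three possible outcomes of a read described above can be sketched as one function. It assumes that metadata for every document is held in RAM, as this slide describes; the function name and the dictionaries standing in for metadata, cached values and disk are hypothetical.

```python
def get(doc_id, metadata, ram_values, disk_values):
    """Return (source, value) for a document ID, consulting in-RAM metadata first."""
    if doc_id not in metadata:
        # All metadata is in RAM, so an invalid ID is rejected without disk access.
        return ("not_found", None)
    if doc_id in ram_values:
        return ("ram", ram_values[doc_id])
    # The value was ejected: fetch it from disk and put it back in the cache.
    value = disk_values[doc_id]
    ram_values[doc_id] = value
    return ("disk", value)
```

A second read of the same document is then served from RAM, since the first read brought the value back into the caching layer.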
cont.. Caching layer
• Other database solutions read and write data from disk, which results in much slower
performance.
• One approach used by other database solutions is to install and manage a caching
layer as a separate component which works with a database.
• This approach has drawbacks because of the significant custom code and effort due
to the burden of managing the caching layer and the data transfers between the
caching layer and database.
Cont.. Caching Layer
• Couchbase Server mainly stores and retrieves information for clients using RAM. At
the same time, Couchbase Server eventually stores all data to disk to provide a higher
level of reliability.
• It writes data to the caching layer and puts the data into a disk write queue to be
persisted to disk.
• Disk persistence enables you to perform backup and restore operations and to grow
datasets larger than the built-in caching layer.
• This disk storage process is called eventual persistence since the server does not
block a client while it writes to disk.
• If a node fails and all data in the caching layer is lost, the items are recovered from
disk. When the server identifies an item that needs to be loaded from disk, because it
is not in active memory, the process is handled by a background process that
processes the load queue and reads the information back from disk and into memory.
Disk storage
• Multi-threaded readers and writers provide multiple processes to simultaneously read
and write data on disk. Simultaneous reads and writes increase disk speed and
improve the read rate from disk.
• Multiple readers and writers are supported to persist data on disk.
• When server nodes are upgraded, the multiple readers and writers setting is
implemented with bucket restart and warmup. In this case, install the new node, add it
to the cluster, and edit the existing bucket setting for readers and writers
• After rebalancing the cluster, the new node performs reads and writes with multiple
readers and writers and the data bucket does not restart or go through a warmup.
• The multi-threaded engine includes additional synchronization among threads that are
accessing the same data cache to avoid conflicts. To maintain performance while
avoiding conflicts over data, Couchbase Server uses a form of locking between
threads and thread allocation among vBuckets with static partitioning.
Disk Storage -- Multiple readers and writers
• When Couchbase Server creates multiple reader and writer threads, the server assesses a range
of vBuckets for each thread and assigns each thread exclusively to certain vBuckets.
• With this static thread coordination, the server schedules threads so that only a single reader and
single writer thread can access the same vBucket at any given time.
• Example: 6 pre-allocated threads & 2 data Buckets.
• Each thread has the range of vBuckets that is
statically partitioned for read and write access.
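Static partitioning of vBuckets among threads can be sketched as a split of the vBucket ID space into one range per thread. The slides do not say how the ranges are chosen, so the contiguous split below, the function name and the thread count are assumptions for illustration.

```python
def partition_vbuckets(num_vbuckets, num_threads):
    """Split vBucket IDs into contiguous ranges, one range per thread."""
    base, extra = divmod(num_vbuckets, num_threads)
    ranges, start = [], 0
    for thread in range(num_threads):
        # The first `extra` threads take one additional vBucket each.
        size = base + (1 if thread < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Hypothetical example: 1024 vBuckets shared by 3 writer threads.
for thread, vbs in enumerate(partition_vbuckets(1024, 3)):
    print(f"writer {thread}: vBuckets {vbs.start}..{vbs.stop - 1}")
```

Because the ranges are disjoint, no two writer threads in this sketch can ever touch the same vBucket, which is the property the static coordination is meant to guarantee.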
Disk Storage -- Multiple readers and writers
• Couchbase Server never deletes entire items from disk unless a client explicitly
deletes the item from the database or the expiration value for the item is reached.
• The ejection mechanism removes an item from RAM, while keeping a copy of the key
and metadata for that document in RAM and also keeping a copy of that document on
disk.
Document deletion
• Tombstones are records of expired or deleted items that include item keys and metadata.
• Couchbase Server and other distributed databases maintain tombstones in order to provide
eventual consistency between nodes and between clusters.
• Couchbase Server stores the key plus several bytes of metadata per deleted item in two structures
per node. With millions of mutations, the space taken up by tombstones can grow quickly. This is
especially the case if there are a large number of deletions or expired documents.
• The Metadata Purge Interval sets how frequently a node permanently purges metadata on deleted
and expired items.
• The Metadata Purge Interval setting runs as part of auto-compaction. This reduces the
storage requirement roughly threefold compared with before and also frees up space much faster.
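The effect of the Metadata Purge Interval can be sketched as a filter over tombstones by age. The data layout (a dictionary from key to deletion timestamp), the function name and the three-day interval are assumptions for illustration, not Couchbase defaults or APIs.

```python
def purge_tombstones(tombstones, now, purge_interval_seconds):
    """Drop tombstones older than the purge interval; return the survivors."""
    return {
        key: deleted_at
        for key, deleted_at in tombstones.items()
        if now - deleted_at < purge_interval_seconds
    }


# Hypothetical tombstones: key -> time of deletion or expiry, in seconds.
DAY = 24 * 60 * 60
tombstones = {"old-doc": 0, "recent-doc": 5 * DAY}
remaining = purge_tombstones(tombstones, now=6 * DAY, purge_interval_seconds=3 * DAY)
print(sorted(remaining))  # prints ['recent-doc']
```

Tombstones younger than the interval are kept so that other nodes and clusters still have time to learn about the deletion.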
Tombstone purging
• A shared thread pool is a collection of threads which are shared across multiple
buckets.
• A thread pool is a collection of threads used to perform similar jobs. Each server node
has a thread pool that is shared across multiple buckets. Shared thread pool
optimizes dispatch tasks by decoupling buckets from thread allocation.
• Threads are spawned at initial startup of a server node instance and are based on the
number of CPU cores.
• With the shared thread pool associated with each node, threads and buckets are
decoupled.
• By decoupling threads from specific buckets, threads can run tasks for any bucket.
Since the global thread pool allows bucket priority levels, a separate I/O queue is
available with the reader and writer workers at every priority level. This provides
improved task queueing.
• For example, when a thread is assigned to run a task from an I/O queue and a
second task is requested, another thread is assigned to pick up the second task.
Shared thread pool
The following circumstances describe how threads are scheduled to dispatch tasks:
• If all buckets have the same priority (default setting), each thread evenly round-robins
over all the task queues of the buckets.
• If buckets have different priorities, the threads spend an appropriate fraction of time
(scheduling frequency) dispatching tasks from the queues of these buckets.
• If a bucket is being compacted, threads are not allocated to dispatch tasks for that
bucket.
• If all buckets are either empty or being serviced by other threads, the thread goes to
sleep.
Shared thread pool - Working
• Disk I/O priority enables workload priorities to be set at the bucket level.
• The bucket disk I/O priority can be set as either high or low (Default – LOW).
• Bucket priority settings determine whether I/O tasks for a bucket are enqueued in either low or high
priority task queues
• Threads in the global pool poll the high priority task queues more often compared to the low priority
task queues.
• Bucket latency and I/O operations are impacted by the setting value.
• The priority can be configured during initial setup and edited afterwards; editing it after setup results in
a restart of the bucket and a reset of the client connections.
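The idea that threads poll the high priority queues more often than the low priority ones can be sketched with a simple generator. The slides only say "more often", so the polling ratio, the function name and the task names here are assumptions.

```python
from collections import deque


def dispatch_order(high, low, high_polls_per_low=4):
    """Yield tasks, polling the high priority queue more often than the low one."""
    high, low = deque(high), deque(low)
    while high or low:
        # Poll the high priority queue several times...
        for _ in range(high_polls_per_low):
            if high:
                yield high.popleft()
        # ...then poll the low priority queue once.
        if low:
            yield low.popleft()


order = list(dispatch_order(["h1", "h2", "h3", "h4", "h5"], ["l1", "l2"], high_polls_per_low=2))
print(order)  # prints ['h1', 'h2', 'l1', 'h3', 'h4', 'l2', 'h5']
```

Low priority tasks still make progress in this scheme; they are simply visited less frequently.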
Disk I/O priority
• Tunable memory enables both value-only ejection and full metadata ejection from memory.
• The cache management approach for item ejection is implemented with value-only ejection and full
metadata ejection
Value-only ejection (the default) removes the data from cache but keeps all keys and metadata fields
for non-resident items. When value-only ejection occurs, the item's value is reset.
Full metadata ejection removes all data including keys, metadata, and key-values from cache for
non-resident items. Full metadata ejection reduces the RAM requirement for large buckets.
• Full metadata ejection supports very large data footprints (a large number of datasets or items/keys)
since the working sets in memory are smaller. The smaller working sets allow efficient cache
management and reduced warmup times.
• For example, you might want to enable full metadata ejection on a bucket if you need to store
huge amounts of data (for example, terabytes or petabytes). Metadata
ejection is configured at the bucket level.
Tunable Memory
• With 2.x, Couchbase cached all keys and metadata in memory and allowed ejection of values
only. That is great for low-latency access to any part of your data.
• However, some workloads don't require low-latency access to all parts of the data and
would rather have memory reserved for the 'hotter' parts of the working set.
• With 3.0, for large databases with a smaller active working set, you can turn on 'full
ejection' and eject keys and metadata for parts of your data that are rarely accessed.
Even if you consider keys and metadata small, with a large number of keys, the
memory used for keys and metadata can add up. With the full ejection mode, you can
more effectively use memory for caching larger parts of your working set.
• Enabling the option is easy and can be done per bucket in the admin console. The
change is transparent to apps, so nothing needs to be done on the
application side to take advantage of the setting.
• Working set management is the process that Couchbase Server performs to free up space in RAM while ensuring that
the most-used items remain available in RAM. Ejection is the process of removing data from RAM to provide room for
frequently used items.
• Ejection is performed automatically by Couchbase Server.
• When Couchbase Server ejects information, it works in conjunction with the disk persistence system to ensure that
data in RAM has been persisted to disk and can be safely retrieved back into RAM if the item is requested.
• In addition to memory quota for the caching layer, there are two watermarks the engine uses to determine when it
is necessary to start persisting more data to disk. These are mem_low_wat and mem_high_wat.
Tunable Memory - Working set management and ejection
• As the caching layer becomes full of data, eventually the mem_low_wat is passed. At
this time, no action is taken.
• As data continues to load, it eventually reaches mem_high_wat. At this point, a
background job is scheduled to ensure items are migrated to disk and that memory is
available for other Couchbase Server items.
• This job runs until measured memory reaches mem_low_wat.
• If the rate of incoming items is faster than the migration of items to disk, the system
can return errors indicating there is not enough space.
• This continues until there is available memory.
• The process of removing data from the caching layer to make way for the actively used
information is called ejection and is controlled automatically through thresholds set on
each configured bucket in the Couchbase Server cluster.
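The watermark behaviour described above can be modelled in a few lines. This is only a minimal sketch and not Couchbase's implementation: the class name CacheModel, its methods, the item sizes, and the oldest-first ejection order are assumptions made for illustration. Only the names mem_low_wat and mem_high_wat come from the slides.

```python
from collections import OrderedDict


class CacheModel:
    """Toy model of a caching layer with low and high watermarks."""

    def __init__(self, mem_low_wat, mem_high_wat):
        if mem_low_wat >= mem_high_wat:
            raise ValueError("mem_low_wat must be below mem_high_wat")
        self.mem_low_wat = mem_low_wat
        self.mem_high_wat = mem_high_wat
        self.items = OrderedDict()  # key -> size, oldest entry first
        self.used = 0
        self.ejected = []  # keys removed from RAM (assumed already on disk)

    def store(self, key, size):
        """Add an item, then eject if the high watermark has been reached."""
        if key in self.items:
            self.used -= self.items.pop(key)
        self.items[key] = size
        self.used += size
        if self.used >= self.mem_high_wat:
            self._eject_until_low_watermark()

    def _eject_until_low_watermark(self):
        """Eject the oldest items until usage is back at mem_low_wat."""
        while self.used > self.mem_low_wat and self.items:
            key, size = self.items.popitem(last=False)
            self.used -= size
            self.ejected.append(key)


cache = CacheModel(mem_low_wat=60, mem_high_wat=90)
for i in range(8):
    cache.store(f"doc{i}", 10)  # 80 units used: past low, below high
used_before = cache.used
cache.store("doc8", 10)         # 90 units: high watermark reached
print(used_before, cache.used, cache.ejected)
```

Passing mem_low_wat triggers nothing in this model; work starts only at mem_high_wat and stops once usage is back at mem_low_wat, which mirrors the sequence in the bullets.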
Tunable Memory
White paper on the server's internal behavior: Tunnable Memory - Working set.docx
More details in the PPT from Couchbase: Tunable Memory.pptx
WEB: http://www.couchbase.com/nosql-resources/blog/all-new-30-full-ejection-tuning-memory-large-databases
Tunable Memory
TTL
• Each document stored in the database has an optional expiration value (TTL, time to
live) that is used to automatically delete items.
• The expiration option can be used for data that has a limited life and could be
automatically deleted.
• TTL is specified in seconds, or as Unix epoch time.
• The default is no expiration.
• Typical uses for an expiration value include web session data where the actively
stored information needs to be removed from the system once the user activity has
stopped.
• With an expiration value, the data times out and is removed from the system without
being explicitly deleted.
• This frees up RAM and disk for more active data.
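As a sketch of how a TTL given "in seconds, or as Unix epoch time" can be resolved to an absolute expiry time: Couchbase follows the memcached convention, in which values of up to 30 days are treated as a relative number of seconds and larger values as absolute Unix timestamps. The function names resolve_expiry and is_expired are hypothetical helpers for illustration and are not part of any SDK.

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds


def resolve_expiry(ttl, now=None):
    """Return the absolute expiry time (Unix seconds), or None for no expiry."""
    if ttl < 0:
        raise ValueError("ttl must be non-negative")
    if now is None:
        now = int(time.time())
    if ttl == 0:
        return None       # default: the item never expires
    if ttl <= THIRTY_DAYS:
        return now + ttl  # relative offset in seconds
    return ttl            # already an absolute Unix timestamp


def is_expired(ttl, stored_at, now):
    """True if an item stored at `stored_at` with `ttl` has expired by `now`."""
    expiry = resolve_expiry(ttl, now=stored_at)
    return expiry is not None and now >= expiry
```

For a web session that should disappear 30 minutes after the last activity, the application would store the document with a TTL of 1800 and refresh it on each request.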
Expiration
• When Couchbase Server is restarted, or when it is started after a restore from backup,
the server goes through a warmup process.
• The warmup loads data from disk into RAM, making the data available to clients.
• The warmup process must complete before clients can be serviced.
• Depending on the size and configuration of your system, and the amount of data that
you have stored, the warmup may take some time to load all of the stored data into
memory.
• Use cbstats to get information about server warmup, including the status of warmup
and whether warmup is enabled.
• ep_warmup_thread - Indicates whether the warmup has completed or is still running.
Returns “running” or “complete”.
• ep_warmup_state - Indicates the current progress of the warmup.
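A monitoring script could check these two statistics before sending traffic to a node. The sketch below assumes the statistics arrive as text with one "name: value" pair per line; the sample text is hypothetical and was not captured from a real server, and the helper names parse_stats and warmup_complete are illustrative only.

```python
def parse_stats(text):
    """Parse 'name: value' lines into a dict, skipping lines without a colon."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def warmup_complete(stats):
    """True once ep_warmup_thread reports 'complete'."""
    return stats.get("ep_warmup_thread") == "complete"


# Hypothetical sample text, used only to exercise the parser.
sample = """
 ep_warmup_thread:   running
 ep_warmup_state:    loading data
"""

stats = parse_stats(sample)
print(stats["ep_warmup_state"], warmup_complete(stats))
```

A caller would typically poll until warmup_complete returns True, since clients cannot be serviced before the warmup finishes.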
Server warmup
• Replicas are copies of data that are provided on another node in a cluster.
• A copy of data from one bucket, known as the source, is copied to a destination, which
we also refer to as the replica, or replica vBucket.
• Replica node: the node that contains the replica vBucket.
• Source node: the node containing the original data to be replicated.
• Distribution of replica data is handled in the same way as data at a source node:
portions of replica data are distributed around the cluster to prevent a single point
of failure.
• After Couchbase has stored replica data at a destination node, the data is also
placed in a queue to be persisted to disk at that destination node.
• When replication is performed between two Couchbase clusters, it is called cross
datacenter replication (XDCR).
Replicas and replication
• The TAP protocol is an internal part of the Couchbase Server system and is used in a
number of different areas to exchange data throughout the system.
• TAP provides a stream of data of the changes that are occurring within the system.
• TAP is used during replication, to copy data between vBuckets used for replicas.
• It is also used during the rebalance procedure to move data between vBuckets and
redistribute the information across the system.
TAP
• DCP is an innovative protocol that drives data synchronization for Couchbase Server v3.0.
• The Database Change Protocol (DCP) is a streaming protocol that significantly reduces latency for view updates.
• Increases data sync efficiency with massive data footprints
• Removes slower disk I/O from the data sync path
• Improves latencies: replication for data durability
• In the future, will provide a programmable data sync protocol for external stores outside Couchbase Server
• With DCP, changes made to documents in memory are immediately streamed to be indexed without being written to
disk.
• This provides faster view consistency and therefore fresher data.
• DCP reduces latency for cross datacenter replication (XDCR).
• Data is replicated memory-to-memory from the source cluster to the destination cluster before being written to disk
on the source cluster.
Database Change Protocol -- DCP
Major differences between TAP and DCP:
• TAP guarantees you will see all mutations at least once, but doesn't guarantee any specific order.
• TAP doesn't have the ability to restart from anywhere.
• De-duplication of items means that we cannot tell when we have a consistent view of the database.
Tap vs. DCP
©2014 Couchbase, Inc. 137
                | TAP                                        | DCP
Ordering        | No ordering guaranteed                     | Ordered
Restart-ability | Not really                                 | Granular restart-ability
Consistency     | No snapshotting capabilities               | Snapshots give a consistent view of the DB
Performance     | No memory-based support for Views & XDCR   | All data synchronization components are memory based
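To make the ordering, restart-ability, and snapshot rows more concrete, here is a conceptual sketch of an ordered change log with sequence numbers. It is not the DCP wire protocol: the class ChangeLog, its methods, and the single-partition scope are assumptions chosen to illustrate the three properties.

```python
class ChangeLog:
    """Toy ordered mutation log for a single partition."""

    def __init__(self):
        self._entries = []  # (seqno, key, value) tuples, seqno ascending

    def mutate(self, key, value):
        """Record a change and return its sequence number."""
        seqno = len(self._entries) + 1
        self._entries.append((seqno, key, value))
        return seqno

    def stream(self, start_after=0):
        """Yield changes in order, resuming after the given sequence number."""
        for entry in self._entries:
            if entry[0] > start_after:
                yield entry

    def snapshot(self, up_to):
        """Return the key/value state as of sequence number `up_to`."""
        state = {}
        for seqno, key, value in self._entries:
            if seqno > up_to:
                break
            state[key] = value
        return state


log = ChangeLog()
log.mutate("a", 1)
log.mutate("b", 2)
last_seen = 2          # a consumer disconnects after seqno 2
log.mutate("a", 3)
log.mutate("c", 4)

resumed = list(log.stream(start_after=last_seen))
print(resumed)
print(log.snapshot(up_to=2), log.snapshot(up_to=4))
```

Because every change carries a sequence number, a consumer that remembers the last number it processed can resume exactly there, and a snapshot bounded by a sequence number is a consistent view at that point.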
Since DCP is mostly an internal communication mechanism within Couchbase Server, its manageability
is very limited and it is beyond the scope of direct administration.
However, for understanding the internals, the building blocks and various components are
listed in this doc:
DCP.docx
For details of its internal functioning and benefits, here is the PPT from Couchbase:
DCP Deep Dive.pptx
DCP
Couchbase Client Libraries / SDK
• Couchbase Server is designed to distribute data across multiple nodes, which adds
complexity to storage and retrieval.
• To allow applications to save and retrieve data, Couchbase provides a set of
language-specific client libraries, also called Couchbase Client SDKs.
Currently, Couchbase provides officially supported SDKs for the following seven
languages and runtimes
SDK
SDKs for several additional languages, including Clojure, Go, Perl, and Erlang, are
maintained by the community as open source projects.
An up-to-date list can be found at the official Couchbase website:
www.couchbase.com/communities/all-client-libraries
http://www.couchbase.com/open-source
• These SDKs are used to communicate with the Couchbase Server cluster.
• The SDKs usually communicate directly with the nodes storing relevant data within the
cluster, and then perform data operations such as insertion and retrieval.
• The Couchbase clients are aware of all the nodes in the cluster, their current state,
and what documents are stored in each node.
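One way to picture how a client knows where a document lives is a two-step lookup: hash the key to a vBucket, then look the vBucket up in a cluster map. The sketch below is simplified and makes several assumptions: the partition count of 1024, the plain modulo reduction of the CRC32 value, the node names, and the round-robin map are illustrative. A real SDK receives the map from the cluster and may compute the vBucket index differently.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Map a document key to a vBucket id via CRC32 (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_map(nodes, num_vbuckets=NUM_VBUCKETS):
    """Hypothetical cluster map: (active node, replica node) per vBucket."""
    count = len(nodes)
    if count < 2:
        raise ValueError("need at least two nodes to place a replica")
    return [
        (nodes[vb % count], nodes[(vb + 1) % count])
        for vb in range(num_vbuckets)
    ]


def locate(key, cluster_map):
    """Return (vbucket id, active node, replica node) for a key."""
    vb = vbucket_for(key, len(cluster_map))
    active, replica = cluster_map[vb]
    return vb, active, replica


nodes = ["node-a", "node-b", "node-c"]
cluster_map = build_map(nodes)
vb, active, replica = locate("user::1001", cluster_map)
print(vb, active, replica)
```

Since the lookup is purely local, the client can send each operation straight to the node that owns the key, and only the map needs to change when nodes are added or removed.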
SDK
Connecting to Couchbase
Connect to Couchbase
[Diagram: Client Connection Steps (Step 1, Step 2, Step 3)]
Service List
Client Connectivity Characteristics:
• Dynamic Distributed Services
• Dynamic Configuration Updates – no additional work from the developer
• Fault Tolerant/Durable Connectivity
Client Operations
All SDKs
Unified API - DML Methods, Add/Remove/Update
Adding and Removing Information (JSON Documents, K/V Binary)
• insert: Inserts a document or binary key/value. Fails if the item exists.
• upsert: Stores a document or binary key/value in the bucket, or updates it if the item exists.
• replace: Replaces a document or binary key/value in a bucket. Fails if the item doesn't exist.
• remove: Deletes an item from the bucket. Fails if the item doesn't exist.
• append/prepend: Appends or prepends in place to the value of a binary k/v item. Does NOT
work with documents.
• touch: Updates the TTL of a document.
• getAndTouch: Retrieves a document or binary key/value and updates the expiry of the item
at the same time.
• counter: Increments or decrements a key's numeric value.
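The success and failure rules in this list can be mimicked with a small in-memory stand-in. This is not an SDK and does not talk to a server: the class ToyBucket, the helper fails, and the choice to make counter fail on a missing key are assumptions of the sketch.

```python
class ToyBucket:
    """In-memory stand-in that mimics the success/failure rules listed above."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        if key in self._data:
            raise KeyError(f"{key!r} already exists")
        self._data[key] = value

    def upsert(self, key, value):
        self._data[key] = value  # creates or overwrites

    def replace(self, key, value):
        if key not in self._data:
            raise KeyError(f"{key!r} does not exist")
        self._data[key] = value

    def remove(self, key):
        if key not in self._data:
            raise KeyError(f"{key!r} does not exist")
        del self._data[key]

    def get(self, key):
        return self._data[key]

    def counter(self, key, delta):
        """Add delta to an existing numeric value and return the result."""
        if key not in self._data:
            raise KeyError(f"{key!r} does not exist")  # sketch assumption
        self._data[key] += delta
        return self._data[key]


def fails(operation, *args):
    """Return True if the operation raises KeyError."""
    try:
        operation(*args)
    except KeyError:
        return True
    return False


bucket = ToyBucket()
bucket.insert("greeting", "hello")                     # key is new
insert_again_fails = fails(bucket.insert, "greeting", "hi")
bucket.upsert("greeting", "hi")                        # overwrite is fine
replace_missing_fails = fails(bucket.replace, "absent", "x")
bucket.upsert("visits", 0)
visits = bucket.counter("visits", 5)
bucket.remove("greeting")
remove_again_fails = fails(bucket.remove, "greeting")
print(insert_again_fails, replace_missing_fails, visits, remove_again_fails)
```

The practical difference is intent: insert and replace let the application detect an unexpected state, while upsert writes unconditionally.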
Unified API - DML Methods, Retrieval
Retrieving Information (JSON Documents, K/V Binary)
• get: Retrieves a document or binary key/value.
• getAndLock: Locks the document or binary key/value on the server and retrieves it. When a
document is locked, its CAS changes and subsequent operations on the document (without
providing the current CAS) will fail until the lock is no longer held.
• getReplica: Gets a document or binary key/value from a replica server in your cluster.
• unlock: Unlocks a previously locked document or binary key/value within a bucket.
Unified API – DML, CAS Example
1. Two clients retrieve the same document "XYZ".
2. Client A retrieves it first.
3. Client B then retrieves XYZ. Both clients will have the same CAS value for document
XYZ.
4. Client B tries to perform an update to document XYZ. The update succeeds as the CAS
value was unchanged from when Client B initially retrieved the document. Once the
update succeeds, the CAS value for XYZ changes.
5. Client A then tries to perform an update on XYZ immediately after Client B. The update
will fail as Client A's CAS value is out of date. When Client B updated XYZ, the CAS
value changed.
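The five steps can be replayed against a toy store that checks a CAS value on replace. This is a minimal sketch of the optimistic-locking idea, not SDK code: the names CasStore and CasMismatch are hypothetical, and the CAS here is a simple counter, whereas a real server hands out opaque values.

```python
import itertools


class CasMismatch(Exception):
    """Raised when the supplied CAS value is out of date."""


class CasStore:
    """Toy store that changes a document's CAS on every successful write."""

    def __init__(self):
        self._docs = {}  # key -> (value, cas)
        self._next_cas = itertools.count(1)

    def upsert(self, key, value):
        cas = next(self._next_cas)
        self._docs[key] = (value, cas)
        return cas

    def get(self, key):
        """Return (value, cas) for the key."""
        return self._docs[key]

    def replace(self, key, value, cas):
        """Write only if `cas` still matches the stored CAS."""
        _, current = self._docs[key]
        if cas != current:
            raise CasMismatch(f"stale CAS for {key!r}")
        return self.upsert(key, value)


store = CasStore()
store.upsert("XYZ", {"count": 0})

_, cas_a = store.get("XYZ")  # steps 1-2: Client A reads
_, cas_b = store.get("XYZ")  # step 3: Client B reads, same CAS

store.replace("XYZ", {"count": 1}, cas_b)  # step 4: B succeeds, CAS changes

try:
    store.replace("XYZ", {"count": 2}, cas_a)  # step 5: A's CAS is stale
    a_failed = False
except CasMismatch:
    a_failed = True

# A common remedy: re-read to obtain the current CAS, then retry.
_, fresh_cas = store.get("XYZ")
store.replace("XYZ", {"count": 2}, fresh_cas)
print(cas_a == cas_b, a_failed, store.get("XYZ")[0])
```

No lock is held between the read and the write, so readers never block; the cost is that a losing writer must re-read and retry.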
For drag-and-drop, UI-driven ETL, there is a connector for Talend:
http://www.couchbase.com/couchbase-server/connectors/talend
For simple reporting operations, Pentaho is easy to use and is completely UI driven:
http://www.pentaho.com
With Elasticsearch, Kibana can also be used for powerful reporting:
http://www.elasticsearch.org
For deployment we work with Cloudsoft (Brooklyn):
http://www.cloudsoftcorp.com/partner/brooklyn/
We also work with CumuLogic:
http://www.cumulogic.com
Both of these platforms work well for deployment and scalability. We have a library for Puppet for use with Vagrant:
https://github.com/couchbaselabs/vagrants
There is also a cookbook for Chef available in the Supermarket:
https://community.opscode.com/cookbooks/couchbase/versions/1.1.0
The Console gives you access to all management features of the system. These can also be used from the command line
or the REST API.
Additional Tools
End Day 1

Weitere ähnliche Inhalte

Was ist angesagt?

To SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionTo SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionKrishnakumar S
 
Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical ApproachSlides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical ApproachDATAVERSITY
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesRob Winters
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoalarsgeorge
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
TechEvent Building a Data Lake
TechEvent Building a Data LakeTechEvent Building a Data Lake
TechEvent Building a Data LakeTrivadis
 
Domain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applicationsDomain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applicationsSpyros Livathinos
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil GamesRob Winters
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsQuontra Solutions
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectAkmal Chaudhri
 

Was ist angesagt? (20)

To SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionTo SQL or NoSQL, that is the question
To SQL or NoSQL, that is the question
 
Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical ApproachSlides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
 
NoSql Brownbag
NoSql BrownbagNoSql Brownbag
NoSql Brownbag
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
Data engineering
Data engineeringData engineering
Data engineering
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
TechEvent Building a Data Lake
TechEvent Building a Data LakeTechEvent Building a Data Lake
TechEvent Building a Data Lake
 
Domain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applicationsDomain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applications
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
 

Andere mochten auch

No sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNo sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNicholas Goodman
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
What its ‘real’ about my video
What its ‘real’ about my videoWhat its ‘real’ about my video
What its ‘real’ about my videosarahlambe
 
Ast simple-ppt-1230292562412121-1 - copy
Ast simple-ppt-1230292562412121-1 - copyAst simple-ppt-1230292562412121-1 - copy
Ast simple-ppt-1230292562412121-1 - copyBatgerel Batjargal
 
Blog optimization adding google+ author tag
Blog optimization   adding google+ author tagBlog optimization   adding google+ author tag
Blog optimization adding google+ author tagAndrea Berberich
 
12 chairs of modernism
12 chairs of modernism12 chairs of modernism
12 chairs of modernismDenis Masharov
 
Listening and memory
Listening and memoryListening and memory
Listening and memorydrmccreedy
 
Cara cepat memahami transfer
Cara cepat memahami transferCara cepat memahami transfer
Cara cepat memahami transferPramudjo211052
 
Presentation1
Presentation1Presentation1
Presentation1nooch33
 
Team 9, Nanjing. Presentation deck
Team 9, Nanjing. Presentation deckTeam 9, Nanjing. Presentation deck
Team 9, Nanjing. Presentation deckRam Reva
 
Collegio rotondi presentation
Collegio rotondi presentationCollegio rotondi presentation
Collegio rotondi presentationJose Duarte
 
EDEL 541 Baker Fall 2012
EDEL 541 Baker Fall 2012EDEL 541 Baker Fall 2012
EDEL 541 Baker Fall 2012staffordlibrary
 
Abercrombie & fitch co yang han
Abercrombie & fitch co yang hanAbercrombie & fitch co yang han
Abercrombie & fitch co yang hanhanyang87830
 
Changing faceelearningmulti device-world
Changing faceelearningmulti device-worldChanging faceelearningmulti device-world
Changing faceelearningmulti device-worldAllen Partridge
 
International Legal Research LexisNexis Academic
International Legal Research LexisNexis AcademicInternational Legal Research LexisNexis Academic
International Legal Research LexisNexis Academicstaffordlibrary
 

Andere mochten auch (20)

No sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNo sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architectures
 
Database management system
Database management system Database management system
Database management system
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Inspiration
InspirationInspiration
Inspiration
 
What its ‘real’ about my video
What its ‘real’ about my videoWhat its ‘real’ about my video
What its ‘real’ about my video
 
Ast simple-ppt-1230292562412121-1 - copy
Ast simple-ppt-1230292562412121-1 - copyAst simple-ppt-1230292562412121-1 - copy
Ast simple-ppt-1230292562412121-1 - copy
 
Blog optimization adding google+ author tag
Blog optimization   adding google+ author tagBlog optimization   adding google+ author tag
Blog optimization adding google+ author tag
 
Arshad
ArshadArshad
Arshad
 
12 chairs of modernism
12 chairs of modernism12 chairs of modernism
12 chairs of modernism
 
Listening and memory
Listening and memoryListening and memory
Listening and memory
 
Cara cepat memahami transfer
Cara cepat memahami transferCara cepat memahami transfer
Cara cepat memahami transfer
 
Presentation1
Presentation1Presentation1
Presentation1
 
Team 9, Nanjing. Presentation deck
Team 9, Nanjing. Presentation deckTeam 9, Nanjing. Presentation deck
Team 9, Nanjing. Presentation deck
 
Presentación ctap portugués
Presentación ctap portugués Presentación ctap portugués
Presentación ctap portugués
 
Collegio rotondi presentation
Collegio rotondi presentationCollegio rotondi presentation
Collegio rotondi presentation
 
EDEL 541 Baker Fall 2012
EDEL 541 Baker Fall 2012EDEL 541 Baker Fall 2012
EDEL 541 Baker Fall 2012
 
Abercrombie & fitch co yang han
Abercrombie & fitch co yang hanAbercrombie & fitch co yang han
Abercrombie & fitch co yang han
 
Changing faceelearningmulti device-world
Changing faceelearningmulti device-worldChanging faceelearningmulti device-world
Changing faceelearningmulti device-world
 
Navibank
NavibankNavibank
Navibank
 
International Legal Research LexisNexis Academic
International Legal Research LexisNexis AcademicInternational Legal Research LexisNexis Academic
International Legal Research LexisNexis Academic
 

Ähnlich wie Couchbase 3.0.2 d1

How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLDataStax
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
Introduction to NoSQL database technology
Introduction to NoSQL database technologyIntroduction to NoSQL database technology
Introduction to NoSQL database technologynicolausalex722
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
NoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessNoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessInfiniteGraph
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Vladi Vexler
 
Database Management Myths & Reality for the future
Database Management Myths & Reality for the futureDatabase Management Myths & Reality for the future
Database Management Myths & Reality for the futureA B M Moniruzzaman
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQLCrate.io
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageBethmi Gunasekara
 
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deckMySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deckVladi Vexler
 
Key Database Criteria for Cloud Applications
Key Database Criteria for Cloud ApplicationsKey Database Criteria for Cloud Applications
Key Database Criteria for Cloud ApplicationsNuoDB
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesMaynooth University
 
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...Amazon Web Services
 
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...PascalDesmarets1
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Introduction to no sql database
Introduction to no sql databaseIntroduction to no sql database
Introduction to no sql databaseHeman Hosainpana
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 

Ähnlich wie Couchbase 3.0.2 d1 (20)

How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQL
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
NoSQL and Couchbase
NoSQL and CouchbaseNoSQL and Couchbase
NoSQL and Couchbase
 
Introduction to NoSQL database technology
Introduction to NoSQL database technologyIntroduction to NoSQL database technology
Introduction to NoSQL database technology
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
NoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessNoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-less
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
 
Database Management Myths & Reality for the future
Database Management Myths & Reality for the futureDatabase Management Myths & Reality for the future
Database Management Myths & Reality for the future
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
UNIT-2.pptx
UNIT-2.pptxUNIT-2.pptx
UNIT-2.pptx
 
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deckMySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
 
Key Database Criteria for Cloud Applications
Key Database Criteria for Cloud ApplicationsKey Database Criteria for Cloud Applications
Key Database Criteria for Cloud Applications
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choices
 
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
Modernize Legacy and Enterprise Application Through Implementation of Cloud N...
 
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Introduction to no sql database
Introduction to no sql databaseIntroduction to no sql database
Introduction to no sql database
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 

Couchbase 3.0.2 d1

  • 2. Agenda
      • Introduction to NoSQL
      • Need for and options among NoSQL solutions
      • Getting started with Couchbase
      • Administration of Couchbase
      • Considerations
      • Best practices
      • Case study
      • New features
      • Assessment
  • 4. Sachin Kansal
      • Database Architect & Consultant
      • 15 years of IT experience
      • Well versed in RDBMS products: SQL Server, Oracle, DB2
      • NoSQL enthusiast and follower
      • Reachable at sachinkkansal@gmail.com
      • https://couchbaseblog.wordpress.com
  • 5. Quick check of NoSQL technology: discussion and simple assessment
  • 6. Discussion
      • Is it free?
      • Is it secure?
      • Is it only a buzzword, or is anyone really using it?
      • Which technologies does it support?
      • Can it manage my current work setup?
      • Does NoSQL require separate hardware to work?
      • Is it actually required?
      • Any others?
  • 7. Introduction to NoSQL: What It Is and Why You Need It
  • 8. NoSQL
      History:
      • The term NoSQL was coined by Carlo Strozzi in 1998.
      • He used the term to name his open-source, lightweight database, which did not have an SQL interface.
      • In early 2009, when last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are non-relational and distributed and that do not conform to atomicity, consistency, isolation, and durability (ACID), the four defining features of traditional relational database systems.
      • In the same year, NoSQL was discussed and debated at length at the "no:sql(east)" conference held in Atlanta, USA.
      • From then on, discussion and practice of NoSQL gained momentum, and NoSQL saw unprecedented growth.
      What it is:
      • NoSQL is a class of non-relational database management systems that differ from traditional relational database management systems in some significant ways.
      • It is designed for distributed data stores with very large-scale storage needs (for example Google or Facebook, which collect terabits of data every day for their users).
      • This type of data store may not require a fixed schema, avoids join operations, and typically scales horizontally.
  • 9. What It Is and Why You Need It. NoSQL: Not Only SQL. Evolution of NoSQL:
      • RDBMSs ruled for three decades.
      • Emerging in the 1970s and early 1980s, relational databases offered a searchable mechanism for persisting complex data with minimal use of storage space.
      • They were optimized to run on single machines.
      • They take a schema-based approach to modeling data.
      • Hardware was expensive.
      • Usage then changed dramatically and costs fell.
      • Result: increased complexity in application and database design, and often inferior performance.
  • 10. What It Is and Why You Need It. Dimensions that mattered:
      • The number of concurrent users skyrocketed as applications increasingly became accessible via the web (and later on mobile devices).
      • The amount of data collected and processed soared as it became easier and increasingly valuable to capture all kinds of data.
      • The amount of unstructured or semi-structured data exploded and its use became integral to the value and richness of applications.
      • Google, Amazon, Facebook, and LinkedIn were among the first companies to discover the serious limitations of relational database technology for supporting these new application requirements.
      • Commercial alternatives didn't exist, so they invented new data management approaches themselves.
      • Open source NoSQL database projects formed to leverage the work of the pioneers, and commercial companies associated with these projects soon followed.
  • 11. Why do you need it? Four interrelated megatrends are driving the adoption of NoSQL technology:
      • Big Users
      • The Internet of Things
      • Big Data
      • Cloud
  • 12. Big Users: support global users 24x7, all year round
      • A newly launched app can go viral, growing from zero to a million users overnight, literally.
      • Some users are active frequently, while others use an app a few times, never to return.
      • Seasonal swings like those around festivals and holidays create spikes for short periods.
      • New product releases or promotions can spawn dramatically higher application usage.
      • The large number of users combined with the dynamic nature of usage patterns is driving the need for more easily scalable database technology.
  • 13. The Internet of Things
      The amount of machine-generated data is increasing with the proliferation of digital telemetry.
      • There are 14 billion things connected to the Internet.
      • They're in factories, farms, hospitals, and warehouses.
      • They're in homes: appliances, gaming consoles, and more.
      • They're cars.
      • They're mobile phones and tablets.
      • They receive environment, location, movement, temperature, weather data, and more from 50 billion sensors.
      Estimation:
      • By 2020, 32 billion things will be connected to the Internet.
      • By 2020, 10% of data will be generated by embedded systems.
      • By 2020, 20% of target rich data will be generated by embedded systems.
  • 14. Big Data
      Data is becoming easier to capture and access through third parties such as Facebook. Personal user information, geolocation data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured.
      Example: flight information
  • 16. Need for NoSQL
      • Better application development productivity through a more flexible data model.
      • Greater ability to scale dynamically to support more users and data.
      • Improved performance to satisfy the expectations of users wanting highly responsive applications and to allow more complex processing of data.
      • To address the 3 V's (volume, velocity, variety) and CAP, and to overcome RDBMS limitations.
  • 17. RDBMS vs NoSQL
      RDBMS:
      • Structured and organized data
      • Structured query language (SQL)
      • Data and its relationships are stored in separate tables
      • Data Manipulation Language, Data Definition Language
      • Tight consistency
      • ACID
      NoSQL:
      • Stands for Not Only SQL
      • No declarative query language
      • No predefined schema
      • Key-value pair storage, column store, document store, graph databases
      • Eventual consistency rather than the ACID properties
      • Unstructured and unpredictable data
      • CAP theorem / BASE
      • Prioritizes high performance, high availability and scalability
  • 18. Data Model - RDBMS
      Relational and NoSQL data models are very different.
      • The relational model takes data and separates it into many interrelated tables.
      • Each table contains rows and columns, where a row might contain lots of information about a person and each column might contain a value for a specific attribute associated with that person, like their age.
      • Tables reference each other through foreign keys that are stored in columns as well.
      • The relational model minimizes the amount of storage space required, because each piece of data is only stored in one place.
      • NoSQL Data Model
  • 19. Data Model - NoSQL
      NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, the ease of efficiently distributing the resulting documents, and the read and write performance improvements make it an easy trade-off for web-based applications.
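The aggregation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three "tables", their column names, and the `to_document` helper are hypothetical and do not come from the deck.

```python
import json

# Hypothetical relational rows, one list per table (all names are illustrative).
users = [{"user_id": 1, "name": "Robin", "age": 31}]
addresses = [{"user_id": 1, "city": "Atlanta", "country": "USA"}]
orders = [
    {"order_id": 100, "user_id": 1, "item": "keyboard"},
    {"order_id": 101, "user_id": 1, "item": "monitor"},
]


def to_document(user_id):
    """Aggregate every row that belongs to one user into a single document."""
    user = next(u for u in users if u["user_id"] == user_id)
    return {
        "type": "user",
        "name": user["name"],
        "age": user["age"],
        # Related rows are embedded instead of being referenced by foreign key.
        "addresses": [
            {"city": a["city"], "country": a["country"]}
            for a in addresses
            if a["user_id"] == user_id
        ],
        "orders": [
            {"order_id": o["order_id"], "item": o["item"]}
            for o in orders
            if o["user_id"] == user_id
        ],
    }


doc = to_document(1)
print(json.dumps(doc, indent=2))
```

The application reads or writes the whole object in one operation, at the cost of repeating data that a normalized schema would store once.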
  • 20. Data Model Comparison
  • 21. CAP Theorem
      In 2000, Eric Brewer, a computer scientist from the University of California, Berkeley, proposed a conjecture about the following three properties:
      • Consistency: all components of the system see the same data.
          • The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
      • Availability: all requests to the system receive a response, whether success or failure.
          • The system is always on (guaranteed service availability), with no downtime.
      • Partition tolerance: the system continues to function even if some components fail or some message traffic is lost.
          • The system continues to function even when communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
  • 22. CAP Theorem (continued)
      • Theoretically, it is impossible to fulfill all 3 requirements at once.
      • CAP therefore asks a distributed system to satisfy 2 of the 3 requirements.
      • Current NoSQL databases follow different combinations of C, A, and P from the CAP theorem. A brief description of the three combinations CA, CP, and AP:
          • CA: single-site cluster, so all nodes are always in contact. When a partition occurs, the system blocks.
          • CP: some data may not be accessible, but the rest is still consistent/accurate.
          • AP: the system is still available under partitioning, but some of the data returned may be inaccurate.
  • 24. BASE
      The BASE acronym was defined by Eric Brewer (who also proposed CAP). A BASE system gives up on strict consistency.
      • Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
      • Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
      • Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
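A toy simulation can make "soft state" and "eventual consistency" concrete. The `Cluster` class below is a hypothetical sketch, not how any particular database replicates data: one replica accepts a write immediately and the others receive it only when `propagate()` runs.

```python
class Cluster:
    """Toy cluster: replica 0 accepts writes, the others catch up later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        # The write is acknowledged as soon as one replica has it (availability).
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        # A read may return stale data until propagation has happened.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # State changes here without any new client input (soft state).
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


cluster = Cluster(3)
cluster.write("color", "blue")
stale = cluster.read(2, "color")   # replica 2 has not seen the write yet
cluster.propagate()
fresh = cluster.read(2, "color")   # all replicas now agree
print(stale, fresh)
```

With no further writes, every replica converges to the same value, which is the guarantee the last bullet describes.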
  • 25. NoSQL Categories / Models
      There are four general types (the most common categories) of NoSQL databases. No single solution is better than all the others, but some databases are better suited to specific problems.
      • Key-value stores
      • Column-oriented
      • Graph
      • Document-oriented
  • 28. Key-value stores
      • Key-value stores are the most basic type of NoSQL database.
      • Designed to handle huge amounts of data.
      • Based on Amazon's Dynamo paper.
      • Key-value stores allow developers to store schema-less data.
      • In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
      • A key may be strings, hashes, lists, sets, or sorted sets, and values are stored against these keys.
      • For example, a key-value pair might consist of a key like "Name" that is associated with a value like "Robin".
      • Key-value stores can be used as collections, dictionaries, associative arrays, etc.
      • Key-value stores follow the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
      • Key-value stores work well for shopping cart contents, or for individual values like color schemes, a landing page URI, or a default account number.
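The hash-table behaviour described above can be sketched with an in-memory Python class. `KeyValueStore` and the key names are assumptions made for illustration; a real store adds persistence, distribution, and expiry.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store: unique keys, opaque values."""

    def __init__(self):
        self._table = {}

    def set(self, key, value):
        # Writing an existing key simply replaces its value.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.set("Name", "Robin")
# The store does not interpret values; here a shopping cart is kept as JSON text.
store.set("cart:42", json.dumps({"items": ["book", "pen"]}))

cart = json.loads(store.get("cart:42"))
print(store.get("Name"), cart["items"])
```

Because the store only looks up by key, there is no schema to manage, but also no way to query by the contents of a value.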
  • 30. Column-oriented
      • Column-oriented databases primarily work on columns, and every column is treated individually.
      • Values of a single column are stored contiguously.
      • A column store keeps data in column-specific files.
      • In column stores, query processors work on columns too.
      • All data within each column data file has the same type, which makes it ideal for compression.
      • Column stores can improve query performance because a query can access only the column data it needs.
      • High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
      • Used for data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.
      • Examples: BigTable, Cassandra, SimpleDB, etc.
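To see why aggregation benefits from a columnar layout, the sketch below pivots a few hypothetical rows into one list per column. The data and field names are invented for illustration; real column stores keep each column in its own compressed file.

```python
# Row layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]

# Column layout: the values of one column are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate touches only the column it needs, not whole records.
total = sum(columns["amount"])
average = total / len(columns["amount"])
highest = max(columns["amount"])

print(columns["amount"], total, highest)
```

Every value in `columns["amount"]` has the same type, which is also what makes per-column compression effective.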
  • 32. Graph databases
      • A graph database stores data in a graph.
      • It is capable of elegantly representing any kind of data in a highly accessible way.
      • A graph database is a collection of nodes and edges.
      • Each node represents an entity (such as a student or business) and each edge represents a connection or relationship between two nodes.
      • Every node and edge is defined by a unique identifier.
      • Each node knows its adjacent nodes.
      • As the number of nodes increases, the cost of a local step (or hop) remains the same.
      • Index for lookups.
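The node-and-edge model can be sketched with an adjacency list, where each node keeps its own list of neighbours. The `Graph` class and the sample entities are hypothetical; they illustrate the idea and are not the API of any graph database.

```python
from collections import deque


class Graph:
    """Tiny undirected property graph backed by adjacency lists."""

    def __init__(self):
        self.nodes = {}      # node id -> properties
        self.adjacent = {}   # node id -> list of (neighbour id, relationship)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.adjacent.setdefault(node_id, [])

    def add_edge(self, a, b, relationship):
        # Each node records its own neighbours, so a hop is a local lookup.
        self.adjacent[a].append((b, relationship))
        self.adjacent[b].append((a, relationship))

    def neighbours(self, node_id):
        return [other for other, _ in self.adjacent[node_id]]

    def hops(self, start, goal):
        """Breadth-first search; returns the hop count or None if unreachable."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, distance = queue.popleft()
            if node == goal:
                return distance
            for other in self.neighbours(node):
                if other not in seen:
                    seen.add(other)
                    queue.append((other, distance + 1))
        return None


graph = Graph()
graph.add_node("alice", kind="student")
graph.add_node("bob", kind="student")
graph.add_node("acme", kind="business")
graph.add_edge("alice", "bob", "friend_of")
graph.add_edge("bob", "acme", "works_at")

print(graph.neighbours("bob"), graph.hops("alice", "acme"))
```

Looking up a node's neighbours costs the same however many nodes the graph holds, which is the "local step" property the slide mentions.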
  • 36. Scenarios: where NoSQL can be used
  • 40. General Use Cases
      1. Bigness
      2. Massive write performance
      3. Fast key-value access
      4. Flexible schema and flexible datatypes
      5. Schema migration
      6. Write availability
      7. Easier maintainability, administration and operations
      8. No single point of failure
      9. Generally available parallel computing
      10. Programmer ease of use
      11. Use the right data model for the right problem
      12. Avoid hitting the wall
      13. Distributed systems support
      14. Tunable CAP tradeoffs
      Specific use cases: see "Specific use cases.docx"
  • 42. Flavors
      There are currently 150 (OMG!) NoSQL flavors on the market (complete list at http://nosql-database.org/).
  • 44. Storage Architecture
      • Vertical scaling vs horizontal scaling
      • Example: residential tower and expressway
      Vertical:
      • Can essentially resize your server with no change to your code.
      • It is the ability to increase the capacity of existing hardware or software by adding resources.
      • Limited by the fact that you can only get as big as the size of the server.
      Horizontal:
      • Affords the ability to scale wider to deal with traffic.
      • It is the ability to connect multiple hardware or software entities, such as servers, so that they work as a single logical unit.
      • This kind of scale takes time and effort to design and implement.
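One common way to make several servers behave as a single logical unit is to partition keys across them by hash. The sketch below is a deliberately simple illustration: the node names and the `node_for_key` helper are hypothetical, and the plain modulo scheme is not the mapping Couchbase itself uses.

```python
import zlib
from collections import Counter


def node_for_key(key, nodes):
    """Pick the node responsible for a key by hashing the key (illustrative only)."""
    digest = zlib.crc32(key.encode("utf-8"))
    return nodes[digest % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names
keys = [f"user::{n}" for n in range(1000)]

# Every client computes the same placement without asking a central coordinator.
placement = Counter(node_for_key(key, nodes) for key in keys)
print(placement)
```

With a plain modulo, adding a node remaps most keys. Production systems usually add a level of indirection between keys and servers to limit that movement, which is part of the design effort the last bullet refers to.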
  • 45. Getting Started With Couchbase Server
  • 46. Evolution from memcached
  • 47. Couchbase Server Core Principles
  • 48. Key Couchbase Server characteristics and capabilities
      Push-button elasticity
      • Add or remove multiple servers simultaneously with the push of a button
      • Efficient data rebalancing without requiring application changes
      Zero-downtime maintenance
      • Add or remove servers, upgrade software, and perform any maintenance tasks in a live cluster
      • No application downtime required
      • No application performance degradation
      Data replication with auto-failover
      • Maintain multiple copies of your data within the cluster for high availability
      • User-configurable replication count
      • User-configurable failover policy to ensure data availability in the face of hardware failure
  • 49. Key Couchbase Server characteristics and capabilities
      Enterprise-class monitoring and administration
      • Deeply instrumented monitoring with rich administration GUI
      • Dynamic system monitoring charts
      • Backup and restore capability
      • RESTful management API
      • Easy interface to external monitoring and management systems
      • Easy to automate deployment to the cloud
      Couchbase Server is simple, fast, elastic, and reliable.
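As a rough illustration of the RESTful management API, the sketch below prepares an authenticated request for cluster details. It assumes the administration port 8091 listed in the network ports slides and the `/pools/default` endpoint of the Couchbase REST API; the host name, the credentials, and the `cluster_info_request` helper are placeholders.

```python
import base64
import urllib.request


def cluster_info_request(host, user, password, port=8091):
    """Build (but do not send) a GET request for the cluster overview."""
    url = f"http://{host}:{port}/pools/default"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder host and credentials; replace with real values before sending.
request = cluster_info_request("localhost", "Administrator", "password")
print(request.full_url)

# To actually call a running cluster, something like:
#   with urllib.request.urlopen(request, timeout=5) as response:
#       print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable on a machine without a Couchbase node.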
  • 50. Key Couchbase Server characteristics and capabilities
      • Simple. Everything about Couchbase Server is easy: getting, installing, managing, expanding and using it. As a document database, there is no need to create and manage schemas, and never a need to normalize, shard or tune the database. Build applications faster, keep them running reliably and easily adapt them to changing business requirements.
      • Fast. Couchbase Server is screamingly, predictably fast. It is the lowest latency, highest throughput NoSQL database technology available. Read and write data with consistently low latency and sustained high throughput across the scaling spectrum. Get the performance you need at lower cost.
      • Elastic. By automatically distributing data and I/O across commodity servers or virtual machines, Couchbase Server makes it easy to match the optimal quantity of resources to the changing needs of an application. Quickly grow a cluster from 1 node to 25 nodes to 100 nodes, or shrink a cluster, to sustain application performance while precisely matching cost to demand.
  • 51. History of Couchbase
      • Couchbase Server, originally known as Membase, is an open source, distributed (shared-nothing architecture) NoSQL document database.
      • Recent release: 3.0.2, 15 Dec 2014
      • Written in C++, Erlang, and C
  • 52. Selecting a Couchbase Server Edition
      Couchbase Server comes in two different editions:
      • Enterprise Edition (EE)
          • The latest stable version of Couchbase, which includes all the bug fixes and has passed a rigorous QA process.
          • It is free for use with any number of nodes for testing and development purposes, and with up to 2 nodes for production.
          • An annual support plan for this edition must be purchased.
      • Community Edition (CE)
          • Lags behind the EE by about one release cycle and does not include all the latest fixes or commercial support.
          • Is open source and entirely free for use in testing and in production (for the brave only, though).
          • This edition is largely meant for enthusiasts and non-critical systems.
  • 53. Resource requirements
      Recommended:
      • Quad-core 64-bit CPU running at 3GHz for the key-value store.
      • Six cores if XDCR (Cross Data Center Replication) and views are used.
      • 16GB RAM (physical).
      • Block-based storage device (hard disk, SSD, EBS, iSCSI). Network filesystems such as CIFS and NFS are not supported.
      Minimum specification:
      • Dual-core CPU running at 2GHz for the key-value store.
      • 4GB RAM (physical).
      Storage requirements:
      • 1GB for application logging.
      • Disk space of at least twice the physical RAM for persistence of information.
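The storage rule of thumb above can be turned into a small calculation. Adding the logging space to the persistence space is an assumption made here for illustration; the slide lists the two figures separately, and `min_disk_gb` is a hypothetical helper.

```python
def min_disk_gb(ram_gb, logging_gb=1):
    """Lower bound on disk space: twice the physical RAM plus space for logs.

    Summing the two figures is an assumption; the slide states them separately.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return 2 * ram_gb + logging_gb


for label, ram in (("minimum", 4), ("recommended", 16)):
    print(f"{label}: {ram}GB RAM -> at least {min_disk_gb(ram)}GB disk")
```

This is a floor, not a sizing method: real capacity planning also depends on data volume, replica count, and compaction behaviour.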
  • 54. Supported platforms
      Platform | Version | 32 / 64 bit | Supported | Recommended
      Red Hat Enterprise Linux | 5 | 64 bit | Developer and Production | RHEL 5.8
      Red Hat Enterprise Linux | 6 | 64 bit | Developer and Production | RHEL 6.3
      CentOS | 5 | 64 bit | Developer and Production | CentOS 5.8
      CentOS | 6 | 64 bit | Developer and Production | CentOS 6.3
      Amazon Linux | 2013.03 | 64 bit | Developer and Production | -
      Ubuntu Linux | 10.04 | 64 bit | Developer and Production | -
      Ubuntu Linux | 12.04 | 64 bit | Developer and Production | Ubuntu 12.04
      Debian Linux | 7 | 64 bit | Developer and Production | Debian 7.0
      Windows | 2012 R2 SP1 | 64 bit | Developer and Production | -
      Windows | 2008 R2 with SP1 | 64 bit | Developer and Production | Windows 2008
      Windows | 8 | 32 and 64 bit | Developer only | -
      Windows | 7 | 32 and 64 bit | Developer only | -
      Mac OS | 10.7 | 64 bit | Developer only | -
      Mac OS | 10.8 | 64 bit | Developer only | Mac OS 10.8
      Couchbase clusters on mixed platforms are not supported.
  • 55. Network Ports
      • Couchbase Server uses specific network ports for communication between server components and with the clients accessing the data stored in the Couchbase cluster.
      • The listed ports must be available on the host for Couchbase Server to run and operate correctly.
      • Couchbase Server configures these ports automatically, but you must verify that the firewall and iptables configuration allow communication on the specified ports for each usage type.
      • Ports are used for four types of communication with Couchbase Server:
          1. Node to node: these ports are used by Couchbase Server for communication between all nodes within the cluster. They must be open to enable nodes to communicate with each other.
          2. Node to client: these ports are used for communication between client applications and the nodes in the cluster. They must be open for clients to reach the data.
          3. Cluster administration: these ports are used for Couchbase administration, including the REST API, command-line clients, and web browsers.
          4. XDCR: these ports are used for XDCR (Cross Data Center Replication) communication between all nodes in both the source and destination clusters.
  • 56. Network ports
      Port | Description | Node to node | Node to client | Cluster admin | XDCR v1 | XDCR v2
      8091 | Web Administration Port | Yes | Yes | Yes | Yes | Yes
      8092 | Couchbase API Port | Yes | Yes | No | Yes | Yes
      11207 | Internal/External Bucket Port for SSL | No | Yes | No | No | No
      11209 | Internal Bucket Port | Yes | No | No | No | No
      11210 | Internal/External Bucket Port | Yes | Yes | No | No | Yes
      11211 | Client interface (proxy) | No | Yes | No | No | No
      11214 | Incoming SSL Proxy | No | No | No | No | Yes
      11215 | Internal Outgoing SSL Proxy | No | No | No | No | Yes
      18091 | Internal REST HTTPS for SSL | No | Yes | Yes | No | Yes
      18092 | Internal CAPI HTTPS for SSL | No | Yes | No | No | Yes
      4369 | Erlang Port Mapper (epmd) | Yes | No | No | No | No
      21100 to 21299 (inclusive) | Node data exchange
  • 57. Network Ports
      Port | Used by
      Port 8091 | Used by the Couchbase Web Console for REST/HTTP traffic.
      Port 8092 | Used to access views, run queries, and update design documents.
      Port 11207 | Used by smart client libraries to access data nodes using SSL.
      Port 11210 | Used by smart client libraries or Moxi to directly connect to the data nodes.
      Port 11211 | Used by pre-existing Couchbase and memcached (non-smart) client libraries.
      Ports 11214 and 11215 | Used for SSL XDCR data encryption.
      Port 18091 | Used by the Couchbase Web Console for REST/HTTP traffic with SSL.
      Port 18092 | Used to access views, run queries, and update design documents with SSL.
      All other ports | Used for other Couchbase Server communications.
      • Port 11213 is an internal port used on the local host for memcached and compaction.
      • The port is not used for communication between nodes in a cluster.
      • For firewall purposes, you do not need to take port 11213 into consideration. However, if a service is listening on this port, Couchbase Server does not start correctly.
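Firewall rules for the ports above can be spot-checked from a client machine with a plain TCP connection attempt. The sketch below uses only Python's standard `socket` module; the port selection mirrors the node-to-client column of the table, while the helper names and the target host are assumptions.

```python
import socket

# Client-facing ports taken from the table above (non-SSL subset).
CLIENT_PORTS = {
    8091: "Web Administration Port",
    8092: "Couchbase API Port",
    11210: "Internal/External Bucket Port",
    11211: "Client interface (proxy)",
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports):
    """Map each port number to whether it accepted a connection."""
    return {port: is_port_open(host, port) for port in ports}


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; point this at a cluster node.
    for port, reachable in check_ports("127.0.0.1", CLIENT_PORTS).items():
        state = "open" if reachable else "closed or filtered"
        print(f"{port} ({CLIENT_PORTS[port]}): {state}")
```

A successful connection only shows that something is listening and the firewall allows it; it does not confirm that the listener is Couchbase Server.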
  • 58. • Implement the same operating system on all machines within each discreet cluster. • Mixed clusters and mixed XDCR deployments are not supported due to incompatibility caused by differences in the number of shards between platforms. Deployment Consideration sachinkkansal@gmail.com 58 w w w .D ataC oncur.com sachinkkansal@ gm ail.com
  • 60. Installation - Windows
    • We will be using Windows in our session.
    • Installation options:
      • Through the wizard
      • Unattended / silent
    • Prerequisites: no anti-virus software running, and Administrator privileges.
    • Couchbase Server uses the Microsoft C++ redistributable package, which is downloaded automatically during installation.
    • If the package is already in use by another application, installation can fail. Close that application first.
  • 61. Installation - Wizard
    • HO: start installing Couchbase Server 3.0.2 by running the installer package.
    • Take a snapshot of each screen so that the relevance of each parameter can be explored after installation.
    • If the Windows installer hangs on the Computing Space Requirements screen, there is an issue with the setup or installation environment, such as other running applications. To recover:
      • Stop any browsers and other applications that were running when you started installing Couchbase Server.
      • Kill the installation process and uninstall the failed setup.
      • Delete or rename the temp location under C:\Users\[logonuser]\AppData\Temp
      • Reboot and try again.
  • 62. Installation - Unattended
    An unattended installation uses a script to install Couchbase Server. Steps:
    1. Record your installation settings during a wizard installation. These settings are saved to a file, which is then used to silently install other nodes of the same version.
      • Open a command terminal or PowerShell and start the installation executable with the /r command-line option: couchbase_server_version.exe /r /f1your_file_name.iss
      • Provide your installation options when prompted. The wizard completes the server installation and writes a file with your recorded options to C:\Windows\your_file_name.iss. (Accept an increase in MaxUserPort.)
    2. Copy the your_file_name.iss file into the same directory as the installer executable.
      • Run the installer from the command line using the /s option: couchbase_server_version.exe /s -f1your_file_name.iss
    3. To repeat this process on multiple machines, copy the installation package and the your_file_name.iss file to the same directory on each machine.
  • 63. Uninstall & Upgrade
    Uninstall
    • To uninstall Couchbase Server on a Windows system, you must have Administrator or Power User privileges.
    • Go to Control Panel and remove Couchbase Server through the Add/Remove Programs option.
    Upgrade
    • The installation wizard upgrades your server installation in place, using the same installation location.
  • 64. Configuring Couchbase Server
  • 65. Configuring Couchbase Server
    • Open the administration web console and configure the server.
    • In a browser, go to http://<server>:8091, where <server> is the machine on which you have installed Couchbase.
    • A short series of simple screens then walks through the server configuration. Each option is significant, and we will explore them in the coming slides.
  • 67. Configuring Couchbase Server, step 1
    • The Databases Path field is the location where Couchbase stores its persisted data.
    • The Indices Path field is where Couchbase keeps the indices created by views.
    • Both locations refer only to the current server node.
    • Placing the index data on a different physical disk from the document data is likely to improve performance, especially if you use many views or create views on the fly.
    • In a Couchbase cluster, every node must have the same amount of RAM allocated.
    • The RAM quota you set when starting a new cluster is inherited by every node that joins the cluster in the future. The server RAM quota can be changed later through the command-line administration tools.
    • RAM is consumed on demand; the quota is normally set to about 60% of total RAM.
  • 68. Configuring Couchbase Server, step 2
  • 69. Configuring Couchbase Server, step 3
    • Selecting the Memcached bucket type hides the configuration options it does not support, such as replicas and read-write concurrency.
    • The memory size is the amount of RAM allocated for this bucket on every node in the cluster. It is a per-node amount, not a total that is split between all nodes.
    • Couchbase buckets can replicate data across multiple nodes in the cluster. With replication enabled, all data is copied up to three times to different nodes. If a node fails, Couchbase makes one of the replica copies available for use.
    • The number of replicas setting refers to additional copies of the data. For example, setting it to 3 results in a total of four instances of your data in the cluster, which also requires a minimum of four nodes.
    • Enabling index replication increases traffic between nodes, but means that the indices do not need to be rebuilt in the event of node failure.
    • The disk read-write concurrency setting controls the number of threads that perform disk IO operations for this bucket.
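The replica and memory-size arithmetic on this slide can be sketched in a few lines of Python. The function names are hypothetical; the rules (one node per copy, per-node memory size) come from the bullets above.

```python
def replica_requirements(num_replicas):
    """Total data copies and minimum node count for a replica setting."""
    if not 0 <= num_replicas <= 3:
        raise ValueError("the replica count must be between 0 and 3")
    total_copies = num_replicas + 1  # replicas plus the active copy
    return {"total_copies": total_copies, "min_nodes": total_copies}


def cluster_ram_for_bucket(per_node_memory_mb, num_nodes):
    """The bucket memory size is per node, so the cluster total scales with nodes."""
    return per_node_memory_mb * num_nodes
```

With three replicas the sketch reports four copies on at least four nodes, matching the example on the slide.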
  • 70. Configuring Couchbase Server, step 4
  • 71. Configuring Couchbase Server, step 5
  • 72. All set Screen
  • 74. Database Architecture
    • Each Couchbase Server node has two major components: the Cluster Manager and the Data Manager.
    • Applications use the Client Software Development Kits (SDKs) to communicate with both of these components.
  • 76. Couchbase Server
    • A Couchbase Server cluster consists of between 1 and 1024 nodes, with each node running exactly one instance of the Couchbase Server software.
    • The data is partitioned and distributed between the nodes in the cluster.
    • This means that each node holds some of the data and is responsible for some of the storing and processing load.
    • Distributing data this way is often referred to as sharding, with each partition referred to as a shard.
  • 78. Data Manager
    The data manager does the work of storing and retrieving data in response to data operation requests from applications.
    • It exposes two “memcapable” ports to the network. One port supports non-vBucket-aware memcached client libraries (pre-memcapable 2.0 API), which are proxied if required.
    • The other port expects to communicate with vBucket-aware clients (memcapable 2.0+ API).
    • The majority of the code in the Data Manager is C and C++.
  • 79. TCP Ports for Data Manager
    The Couchbase Server data manager listens for requests on two TCP ports; the port numbers are configurable.
    • Port 11211: the traditional memcached port number. It processes requests from clients supporting version 1.0 of the memcapable API specification. These clients rely on a consistent hashing algorithm to map keys directly to servers in a variable-length server list. Most memcached clients today support memcapable 1.0, though memcapable 2.0 clients for the most popular platforms are being introduced (e.g., spymemcached for Java, enyim for .NET, fauna for Ruby, libmemcached for C and other languages that wrap this client library).
    • Port 11210: a port directly accessible to clients implementing version 2.0 of the memcapable API. These clients are “vBucket aware,” using a hashing algorithm to map keys to one of a fixed number of “vBuckets”.
  • 81. Cluster Manager
    The cluster manager supervises the configuration and behavior of all nodes in a Couchbase Server cluster. Cluster management code runs on every node in the cluster, but at any point in time one node (the one holding a global singleton) is elected to perform aggregation, consensus building and cross-node control decisions. The cluster manager:
    • Monitors health and coordinates data manager behavior on each node.
    • Configures and supervises inter-node behavior (e.g. replication streams and rebalancing operations).
    • Provides aggregation and consensus functions for the cluster (e.g. global singleton election).
    • Provides a RESTful cluster management API.
    • Is built atop Erlang/OTP, a proven environment for building and operating robust, fault-tolerant distributed applications.
  • 82. Per node configuration management and monitoring functions
    There are four primary subsystems that operate on each node.
    1. Heartbeat. A watchdog process periodically communicates with the currently elected cluster leader (the node with the global singleton) to provide Couchbase Server health updates.
    2. Process monitor. This subsystem monitors execution of the local data manager, restarting failed processes as required and contributing status information to the heartbeat module.
    3. Configuration Manager. Each Couchbase Server node has a configuration: a vBucket map, active replication streams, a target rebalance map, etc. The configuration manager receives, processes and monitors local configuration, in concert with a cluster-wide configuration distribution system.
    4. Global Singleton Supervisor. In a Couchbase Server cluster, one node is elected leader. If the leader dies, a new leader is elected. The Global Singleton Supervisor is responsible for electing a cluster leader and for supervising “per-cluster” processes if the local node is the current leader.
  • 83. Per cluster functions
    In addition to the per-node functions, which are always executing on each node in a Couchbase Server cluster, there is a set of functions that are active on only one node in the cluster at any point in time. Possession of a global singleton data structure indicates to a node that it should execute these functions:
    1. Rebalance Orchestrator. The rebalance orchestrator calculates, distributes and provides cluster-wide supervision of a rebalance operation. When a rebalance operation is initiated, it calculates a target vBucket map based on the current pending set of servers to be added to and removed from the cluster; distributes commands to individual nodes to build a network of vBucket migration streams; and monitors migration completion events, updating and distributing the current vBucket map as migrations complete.
    2. Node Health Monitor. The node health monitor (also known as The Doctor) receives heartbeat updates from individual nodes in the cluster, updating configuration and raising alerts as required.
    3. vBucket state and replication manager. Responsible for establishing and monitoring the current network of replication streams.
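The rebalance orchestrator's first two steps (compute a target vBucket map, then derive the migrations) can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the round-robin placement is a stand-in assumption, not the algorithm Couchbase Server actually uses to build its target map.

```python
def target_vbucket_map(num_vbuckets, nodes):
    """Assign each vBucket to a node round-robin (illustrative placement only)."""
    return {vb: nodes[vb % len(nodes)] for vb in range(num_vbuckets)}


def plan_migrations(current_map, target_map):
    """List (vbucket, source_node, destination_node) for every vBucket that must move."""
    return [
        (vb, current_map[vb], target_map[vb])
        for vb in sorted(target_map)
        if current_map.get(vb) != target_map[vb]
    ]


# Usage: a two-node cluster gains a third node.
current = target_vbucket_map(8, ["A", "B"])
target = target_vbucket_map(8, ["A", "B", "C"])
moves = plan_migrations(current, target)
```

Each entry in `moves` corresponds to one vBucket migration stream that the orchestrator would set up and then monitor to completion.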
  • 85. Data flow in a Couchbase Server environment: between application and Couchbase Server
  • 86. Data Flow - Within the Couchbase Server cluster
    1. The set arrives at the Couchbase Server listener-receiver.
    2. Couchbase Server immediately replicates the data to replica servers; the number of replica copies is user defined. Upon arrival at the replica servers, the data is persisted.
    3. The data is cached in main memory.
    4. The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
    5. The set acknowledgment is returned to the application.
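Steps 3 to 5 of this flow (cache the value, queue it for persistence with de-duplication, acknowledge) can be modelled with a small Python sketch. The class `WriteQueueSketch` and its attributes are hypothetical, replication (step 2) is omitted, and a dictionary stands in for the disk.

```python
from collections import OrderedDict


class WriteQueueSketch:
    """Toy model of the cache plus disk write queue on one node."""

    def __init__(self):
        self.cache = {}               # in-memory values (step 3)
        self.pending = OrderedDict()  # keys awaiting persistence (step 4)
        self.disk = {}                # stand-in for disk or SSD

    def set(self, key, value):
        self.cache[key] = value
        # Queueing a key that is already pending does not add a second
        # entry: this is the de-duplication described in step 4.
        self.pending[key] = True
        return "ack"                  # step 5: acknowledged before the disk write

    def flush_one(self):
        """Pull one pending write and persist the latest cached value."""
        if not self.pending:
            return None
        key, _ = self.pending.popitem(last=False)
        self.disk[key] = self.cache[key]
        return key
```

Two quick sets on the same key leave one pending write, and the flush persists only the most recent value.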
  • 87. Metadata and Documents
  • 91. Ejection, NRU, Cache Miss
  • 92. Clients Connect Directly to Couchbase Nodes
  • 93. Key Hash-Partitioning
    [Diagram labels: Application Servers, each holding a cluster MAP; 1024 Partitions; 8 GB RAM; 3 IO Workers]
    • ClientHashFunction("ABCXYZ@couchbase.com") => Partition[0..1023] {25}
    • ClusterMap[P(25)] => [x.x.x.x] => IP of the server responsible for partition 25
  • 95. RAM, CPU and IO Guidelines
    • RAM
      • All metadata for all documents (64 bytes + key length)
      • Document values (NRU ejected if RAM quota used > 90%)
      • Also leave RAM for the OS: [Filesystem Cache >> Views]
    • CPU
      • Document indexing, monitoring, XDCR
      • Recommended: minimum 4 cores + 1 core per design document + 1 core per XDCR replicated bucket
    • Disk IO
      • Persisted documents
      • All indexes for design documents/views
      • Append-only disk format & compaction
      • Performance: multiple EBS volumes, high-IOPS RAID 0 on Amazon
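The two numeric guidelines on this slide translate directly into arithmetic. The Python sketch below uses hypothetical function names and deliberately ignores replica copies and document values, so it gives a lower bound rather than a full sizing.

```python
METADATA_OVERHEAD_BYTES = 64  # per-document figure quoted on the slide


def metadata_ram_bytes(num_docs, avg_key_length):
    """RAM needed to hold metadata for every document (64 bytes + key length each)."""
    return num_docs * (METADATA_OVERHEAD_BYTES + avg_key_length)


def recommended_cores(design_documents, xdcr_buckets):
    """Minimum 4 cores, plus 1 per design document and 1 per XDCR replicated bucket."""
    return 4 + design_documents + xdcr_buckets
```

For one million documents with 36-byte keys, the metadata alone takes about 100 MB of RAM before any document values are cached.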
  • 96. Key Concepts: Architecture - Building Blocks
  • 97. Architecture - Building Blocks: Key Concepts
    Node, Cluster, Cluster Manager, Caching layer, vBuckets, Buckets, RAM quotas, Tunable memory, Disk storage, Shared thread pool, Disk I/O priority, TAP, Expiration, Server warmup, Replicas and replication, Database Change Protocol, Proxy (Moxi)
  • 98. Node
    Couchbase Server can be used either in a standalone configuration, or in a cluster configuration where multiple Couchbase Servers are connected together to provide a single, distributed data store.
    Couchbase Server or node
    • A node is a single instance of the Couchbase Server software running on a machine, whether a physical machine, a virtual machine or another environment.
    • All instances of Couchbase Server are identical: they provide the same functionality, interfaces and systems, and consist of the same components.
    • All nodes within a Couchbase Server cluster are equal.
    • No master: there is no hierarchy or topology, and no single node is a ‘master’ of the rest of the cluster.
    • Each node is responsible only for the data it stores and the requests made to it by clients.
    • A cluster can contain between 1 and 1024 nodes.
  • 99. Cluster
    • A cluster is a collection of one or more instances of Couchbase Server that are configured as a logical cluster.
    • All nodes within the cluster are identical and provide the same functionality.
    • Each node is capable of managing the cluster, and each node can provide aggregate statistics and operational information about the cluster.
    • User data is stored across the entire cluster through the vBucket system.
    • Clusters operate in a completely horizontal fashion. To increase the size of a cluster, add another node.
    • There are no parent/child relationships or hierarchical structures involved. This means that Couchbase Server scales linearly in both storage capacity and performance.
  • 100. Cluster Manager
    • The Cluster Manager is responsible for node and cluster management. Every node within a Couchbase cluster includes the Cluster Manager component.
    • Access to the Cluster Manager is provided through the administration interface on a dedicated network port, and through dedicated network ports for client access.
    • Additional ports are configured for inter-node communication.
    • The data is partitioned and distributed between the nodes in the cluster. Distributing data this way is often referred to as sharding, with each partition referred to as a shard.
    The Cluster Manager is responsible for the following within a cluster:
    • Cluster management
    • Node administration
    • Node monitoring
    • Statistics gathering and aggregation
    • Run-time logging
    • Multi-tenancy
    • Security for administrative and client access
    • Client proxy service to redirect requests
  • 101. Rack Awareness
    • The Rack Awareness feature permits logical groupings of servers in a cluster, where each server group physically belongs to a rack or Availability Zone.
    • To enable Rack Awareness, all servers in the cluster must run Couchbase Server Enterprise Edition, version 2.5 or later.
    • By design, Couchbase Server evenly distributes active and replica vBuckets across the cluster for performance and redundancy.
    • With Rack Awareness, partitions are laid out so that the replica partitions for servers in one server group are placed on servers in a second group, and vice versa.
    • If one of the servers becomes unavailable, or if an entire rack goes down, data is retained because the replicas are available in the second server group.
    • Replica vBuckets are evenly distributed from one server group to another to provide redundancy and data availability.
    • The rebalance operation also evenly distributes the replica vBuckets from one server group to another across the cluster. If the server groups contain unequal numbers of servers, the rebalance operation makes a "best effort" at distributing the replica vBuckets evenly across the cluster.
  • 102. Distribution of vBuckets and replica vBuckets
    [Diagrams: distribution with an additional server; distribution with an unavailable server]
    • If the cluster becomes imbalanced, add servers to balance it. A balanced cluster is recommended for optimal Rack Awareness functionality.
    • If there is only one server or only one server group, the default behavior applies automatically; that is, Rack Awareness is disabled.
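The core Rack Awareness rule (a replica vBucket lives in a different server group from its active copy) can be sketched in Python. The function `place_replica` and the group layout are hypothetical, and spreading replicas by vBucket id is an assumption made for illustration, not the server's actual placement algorithm.

```python
def place_replica(vbucket_id, active_node, groups):
    """Pick a replica node from a server group other than the active node's group.

    groups maps a server-group name to its list of nodes. Returns None when
    no other group exists, the case in which Rack Awareness is disabled.
    """
    active_group = next(g for g, nodes in groups.items() if active_node in nodes)
    candidates = [n for g in sorted(groups) if g != active_group for n in groups[g]]
    if not candidates:
        return None
    # Spread replicas across the candidate nodes by vBucket id.
    return candidates[vbucket_id % len(candidates)]
```

With two racks of two nodes each, every replica lands on the opposite rack, so losing a whole rack still leaves one copy of each vBucket.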
  • 103. Buckets
    • Couchbase Server provides data management services using buckets.
    • Buckets are isolated virtual containers for data.
    • A bucket is a logical grouping of physical resources within a cluster of Couchbase Servers.
    • Buckets provide a secure mechanism for organizing, managing, and analyzing data storage resources.
    • Couchbase supports two kinds of buckets, memcached and Couchbase, which store data either in memory only or both in memory and on disk (for added reliability). The bucket type needed for your implementation is selected during Couchbase Server setup.
    • Buckets can be used by multiple client applications across a cluster.
    • Buckets are similar to databases in Microsoft SQL Server, or to schemas in Oracle.
    • Typically, you would have separate buckets for separate applications.
  • 105. Couchbase-type buckets: core capabilities
    • Data is cached in memory and persisted to disk, and can be dynamically rebalanced between nodes in a cluster to distribute the load.
    • Couchbase buckets can be configured to maintain between one and three replica copies of the data, which provides redundancy in the event of node failure. Because each copy must reside on a different node, replication requires at least one node per replica, plus one for the active instance of the data.
    • Couchbase-type buckets provide a highly available and dynamically reconfigurable distributed data store, survive node failures, and allow cluster reconfiguration while continuing to service requests.
  • 107. Bucket interface types - 1/3
    Default bucket
    • The default bucket is a Couchbase bucket that always resides on port 11211 and is a non-SASL authenticating bucket.
    • This bucket is set up automatically when Couchbase Server is first installed.
    • The bucket can be removed after installation and re-added later, but a re-added bucket named “default” must be placed on port 11211 and must be a non-SASL authenticating bucket.
    • A non-SASL bucket that is not named “default” cannot reside on port 11211.
    • The default bucket can be reached with a vBucket-aware smart client, an ASCII client, or a binary client that doesn’t use SASL authentication.
  • 108. Bucket interface types - 2/3, 3/3
    Non-SASL buckets
    • Non-SASL buckets can be placed on any available port, except that a bucket not named “default” cannot use port 11211.
    • Only one non-SASL bucket can be placed on any individual port.
    • These buckets can be reached with a vBucket-aware smart client, an ASCII client, or a binary client that doesn’t use SASL authentication.
    SASL buckets
    • SASL authenticating Couchbase buckets can only be placed on port 11211, and each bucket is differentiated by its name and password.
    • A SASL bucket cannot be placed on any port other than 11211.
    • These buckets can be reached with either a vBucket-aware smart client or a binary client that has SASL support. They cannot be reached with ASCII clients.
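The port rules for default, non-SASL and SASL buckets can be checked mechanically. The following Python sketch encodes them as stated on the two slides; the function names and the `(name, port, sasl)` tuple layout are hypothetical.

```python
DEFAULT_PORT = 11211


def validate_bucket(name, port, sasl):
    """Return the list of rule violations for one bucket definition."""
    problems = []
    if sasl and port != DEFAULT_PORT:
        problems.append("SASL buckets must use port 11211")
    if name == "default" and (sasl or port != DEFAULT_PORT):
        problems.append("the default bucket must be non-SASL on port 11211")
    if name != "default" and not sasl and port == DEFAULT_PORT:
        problems.append("a non-SASL bucket other than default cannot use port 11211")
    return problems


def find_port_conflicts(buckets):
    """Only one non-SASL bucket may occupy any individual port."""
    seen = {}
    conflicts = []
    for name, port, sasl in buckets:
        if sasl:
            continue  # SASL buckets share port 11211 and differ by name/password
        if port in seen:
            conflicts.append((seen[port], name, port))
        else:
            seen[port] = name
    return conflicts
```

An empty list from `validate_bucket` means the definition is consistent with the rules above.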
  • 109. Bucket - generic notes
    • Smart clients discover changes in the cluster using the Couchbase Management REST API.
    • Buckets can be used to isolate individual applications to provide multi-tenancy, or to isolate data types in the cache to enhance performance and visibility.
    • Couchbase Server permits you to configure different ports to access different buckets, and provides the option to access isolated buckets using either the binary protocol with SASL authentication or the ASCII protocol with no authentication.
    • Quotas for RAM and disk usage are configurable per bucket so that resource usage can be managed across the cluster.
  • 110. vBuckets
    • A vBucket is defined as the owner of a subset of the key space of a Couchbase cluster. vBuckets are used to distribute information effectively across a cluster.
    • The vBucket system is used both for distributing data and for supporting replicas (copies of bucket data) on more than one node.
    • vBuckets are not a user-accessible component, but they are a critical component of Couchbase Server and are vital to its availability support.
    • Every document ID belongs to a vBucket.
  • 111. vBuckets - Functioning
    • Documents in a bucket are further subdivided into virtual buckets (vBuckets) by their key.
    • Each vBucket owns a subset of all the possible keys, and documents are mapped to vBuckets according to a hash of their key.
    • Every vBucket, in turn, belongs to one of the nodes of the cluster.
    • When a client needs to access a document, it first hashes the document key to find out which vBucket owns that key.
    • The client then checks the cluster map to find which node hosts the relevant vBucket.
    • Lastly, the client connects directly to the node that stores the document to perform the get operation.
  • 112. vBucket
    • The client first hashes the key to calculate the vBucket that owns KEY. In this example, the hash resolves to vBucket 8 (vB8).
    • By examining the vBucket map, the client determines that Server C hosts vB8.
    • The client sends the GET operation directly to Server C.
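The lookup a smart client performs (hash the key, find the vBucket, consult the map, contact the node) can be sketched in Python. The use of plain `zlib.crc32` with a modulo is an assumption for illustration; real client libraries use their own CRC32-based variant, so the vBucket ids produced here will not match a live cluster. The names `vbucket_for_key`, `node_for_key` and the sample map are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # partition count shown on the key hash-partitioning slide


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a document key to a vBucket id (illustrative hash, see lead-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def node_for_key(key, vbucket_map, num_vbuckets=NUM_VBUCKETS):
    """Look up which server hosts the vBucket that owns this key."""
    return vbucket_map[vbucket_for_key(key, num_vbuckets)]


# Usage: a toy vBucket map spreading 1024 vBuckets over three servers.
servers = ["Server A", "Server B", "Server C"]
vbucket_map = [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]
target = node_for_key("ABCXYZ@couchbase.com", vbucket_map)
```

Because the number of vBuckets is fixed, adding a server changes only the map, not the hash; that is what lets clients keep working through topology changes.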
  • 113. vBucket
    • This architecture permits Couchbase Server to cope with topology changes without using the typical RDBMS sharding method.
    • The architecture also differs from the method used by memcached, which uses client-side key hashes to determine the server from a defined list.
    • The memcached method requires active management of the list of servers, and specific hashing algorithms such as Ketama, to cope with changes to the topology.
  • 114. RAM quotas
    RAM is allocated to Couchbase Server in two configurable quantities: the Server Quota and the Bucket Quota.
    Server quota
    • The Server Quota is the RAM that is allocated to the server when Couchbase Server is first installed.
    • It sets the limit of RAM allocated by Couchbase for caching data for all buckets, and is configured on a per-node basis.
    • The Server Quota is initially configured when the first server in your cluster is set up, and the quota is identical on all nodes.
    • Example: if you have 10 nodes and a 16GB Server Quota, there is 160GB of RAM available across the cluster. If you were to add two more nodes to the cluster, the new nodes would each need 16GB of free RAM, and the aggregate RAM available in the cluster would be 192GB.
  • 115. RAM Quota
    Bucket quota
    • The Bucket Quota is the amount of RAM allocated to an individual bucket for caching data.
    • Bucket Quotas are configured on a per-node basis and are allocated out of the RAM defined by the Server Quota.
    • Example: if you create a new bucket with a Bucket Quota of 1GB in a 10-node cluster, there is an aggregate bucket quota of 10GB across the cluster. Adding two nodes to the cluster extends the aggregate bucket quota to 12GB.
    • The Bucket Quota is used by the system to determine when data should be ejected from memory.
    • Bucket Quotas are dynamically configurable within the Server Quota limits, and enable individual control of the information cached in memory on a per-bucket basis. Buckets can therefore be configured differently depending on your caching RAM allocation requirements.
    • The Server Quota is also dynamically configurable; however, ensure that the cluster nodes have enough available RAM to support the chosen RAM quota configuration.
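The quota examples on the RAM quota slides reduce to simple multiplication, shown here as a small Python sketch. The function names are hypothetical; the figures in the usage lines are the ones quoted on the slides.

```python
def aggregate_quota_gb(per_node_quota_gb, num_nodes):
    """Quotas are per node, so the cluster-wide figure scales with node count."""
    return per_node_quota_gb * num_nodes


def bucket_quotas_fit(server_quota_gb, bucket_quotas_gb):
    """Bucket Quotas are carved out of the Server Quota on each node."""
    return sum(bucket_quotas_gb) <= server_quota_gb


# Usage with the slide's numbers.
server_total_10 = aggregate_quota_gb(16, 10)  # 16GB Server Quota, 10 nodes
server_total_12 = aggregate_quota_gb(16, 12)  # after adding two nodes
bucket_total_12 = aggregate_quota_gb(1, 12)   # 1GB Bucket Quota, 12 nodes
```

The second helper captures the constraint that the per-node Bucket Quotas together cannot exceed the per-node Server Quota.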
  • 116. Caching layer
    • Couchbase Server includes a built-in caching layer, which acts as a central part of the server and provides very rapid reads and writes of data.
    • Couchbase Server automatically manages the caching layer and coordinates with disk space to ensure that enough cache space exists to maintain performance.
    • Items that come into the caching layer are automatically placed into a disk queue so that they can be written to disk.
    • If the server determines that a cached item is infrequently used, it removes the item from RAM to free space for other items.
    • Similarly, when infrequently used items are requested, the server retrieves them from disk and stores them in the caching layer.
    • To serve the most frequently used data while maintaining high performance, Couchbase Server manages a working set of your entire information.
    • The working set is the data most frequently accessed, and it is kept in RAM for high performance.
  • 117. Caching layer, cont.
    • Couchbase automatically moves data from RAM to disk asynchronously, in the background, to keep frequently used information in memory and less frequently used data on disk.
    • Couchbase constantly monitors the information accessed by clients and decides how to keep the active data within the caching layer.
    • Data is ejected from memory to disk while the server continues to service active requests.
    • During sequences of heavy writes to the database, clients are notified that the server is temporarily out of memory until enough items have been ejected from memory to disk.
    • The asynchronous design and the use of queues enable reads and writes to be handled at a very fast rate, while removing the load and performance spikes that would cause a traditional RDBMS to produce erratic performance.
    • When a client requests data, it sends an individual document ID, and the server determines whether the document exists. Couchbase Server does this with metadata structures.
    • The metadata holds information about each document in the database and is held in RAM. This means that the server either returns a ‘document ID not found’ response for an invalid document ID, returns the data from RAM, or returns the data after fetching it from disk.
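The three possible outcomes of a read (not found, served from RAM, fetched from disk) follow from keeping all metadata in RAM. The Python sketch below models that; the class `CacheSketch` is hypothetical, and persistence is treated as immediate to keep the example short.

```python
class CacheSketch:
    """Toy model of metadata-in-RAM lookups on one node."""

    def __init__(self):
        self.metadata = set()  # keys of every document; always held in RAM
        self.ram = {}          # values currently cached
        self.disk = {}         # stand-in for persisted values

    def store(self, key, value):
        self.metadata.add(key)
        self.ram[key] = value
        self.disk[key] = value  # simplification: persisted straight away

    def eject(self, key):
        """Drop the value from RAM; metadata and the disk copy remain."""
        self.ram.pop(key, None)

    def get(self, key):
        if key not in self.metadata:
            return ("not_found", None)  # answered from RAM without touching disk
        if key in self.ram:
            return ("ram", self.ram[key])
        value = self.disk[key]          # cache miss: fetch from disk
        self.ram[key] = value           # and bring it back into the cache
        return ("disk", value)
```

A request for an unknown key never reaches the disk, which is the practical benefit of holding the metadata in memory.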
  • 118. Caching layer, cont.
    • Other database solutions read and write data from disk, which results in much slower performance.
    • One approach used by other database solutions is to install and manage a caching layer as a separate component that works alongside the database.
    • This approach has drawbacks: it requires significant custom code and effort to manage the caching layer and the data transfers between the caching layer and the database.
  • 119. Disk storage
    • Couchbase Server mainly stores and retrieves information for clients using RAM. At the same time, it eventually stores all data to disk to provide a higher level of reliability.
    • It writes data to the caching layer and puts the data into a disk write queue to be persisted to disk.
    • Disk persistence makes it possible to perform backup and restore operations and to grow datasets larger than the built-in caching layer.
    • This disk storage process is called eventual persistence, since the server does not block a client while it writes to disk.
    • If a node fails and all data in the caching layer is lost, the items are recovered from disk.
    • When the server identifies an item that needs to be loaded from disk because it is not in active memory, a background process works through the load queue and reads the information back from disk into memory.
  • 120. Disk Storage - Multiple readers and writers
    • Multi-threaded readers and writers allow multiple threads to read and write data on disk simultaneously. Simultaneous reads and writes increase disk throughput and improve the read rate from disk.
    • Multiple readers and writers are supported for persisting data to disk.
    • When server nodes are upgraded, the multiple readers and writers setting takes effect with a bucket restart and warmup. In this case, install the new node, add it to the cluster, and edit the existing bucket setting for readers and writers.
    • After the cluster is rebalanced, the new node performs reads and writes with multiple readers and writers, and the data bucket does not restart or go through a warmup.
    • The multi-threaded engine includes additional synchronization among threads that access the same data cache, to avoid conflicts. To maintain performance while avoiding conflicts over data, Couchbase Server uses a form of locking between threads, together with static partitioning of vBuckets among threads.
  • 121. Disk Storage - Multiple readers and writers
    • When Couchbase Server creates multiple reader and writer threads, the server determines a range of vBuckets for each thread and assigns each thread exclusively to those vBuckets.
    • With this static thread coordination, the server schedules threads so that only a single reader thread and a single writer thread can access the same vBucket at any given time.
    • Example: 6 pre-allocated threads and 2 data buckets.
    • Each thread has a range of vBuckets that is statically partitioned for read and write access.
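Static partitioning of vBuckets among threads can be illustrated with a short Python sketch. The function `assign_vbuckets` is hypothetical, and assigning by vBucket id modulo the thread count is an assumption; the slide only states that each thread exclusively owns a fixed range of vBuckets.

```python
def assign_vbuckets(num_vbuckets, num_threads):
    """Give every vBucket exactly one owner thread, fixed ahead of time."""
    assignment = {thread: [] for thread in range(num_threads)}
    for vb in range(num_vbuckets):
        assignment[vb % num_threads].append(vb)
    return assignment


# Usage: 1024 vBuckets shared among 3 writer threads.
writers = assign_vbuckets(1024, 3)
```

Because no vBucket appears in two lists, two writer threads can never touch the same vBucket, which is what removes the need for fine-grained locking on the disk path.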
  • 122. Document deletion
    • Couchbase Server never deletes entire items from disk unless a client explicitly deletes the item from the database or the expiration value for the item is reached.
    • The ejection mechanism removes an item from RAM while keeping a copy of the key and metadata for that document in RAM, and also keeping a copy of the document on disk.
  • 123. Tombstone purging
    • Tombstones are records of expired or deleted items that include item keys and metadata.
    • Couchbase Server and other distributed databases maintain tombstones in order to provide eventual consistency between nodes and between clusters.
    • Couchbase Server stores the key plus several bytes of metadata per deleted item in two structures per node. With millions of mutations, the space taken up by tombstones can grow quickly. This is especially the case if there are a large number of deleted or expired documents.
    • The Metadata Purge Interval sets how frequently a node permanently purges metadata on deleted and expired items.
    • The Metadata Purge Interval setting runs as part of auto-compaction. This reduces the storage requirement roughly threefold compared to before and also frees up space much faster.
  • 124. Shared thread pool
    • A thread pool is a collection of threads used to perform similar jobs. Each server node has a thread pool that is shared across multiple buckets. The shared thread pool optimizes task dispatch by decoupling buckets from thread allocation.
    • Threads are spawned at initial startup of a server node instance, and their number is based on the number of CPU cores.
    • Because threads are decoupled from specific buckets, any thread can run tasks for any bucket. Since the global thread pool supports bucket priority levels, a separate I/O queue is available with the reader and writer workers at every priority level. This provides improved task queueing.
    • For example, when a thread is running a task from an I/O queue and a second task is requested, another thread is assigned to pick up the second task.
  • 125. Shared thread pool - Working
    The following circumstances describe how threads are scheduled to dispatch tasks:
    • If all buckets have the same priority (the default setting), each thread round-robins evenly over the task queues of all buckets.
    • If buckets have different priorities, the threads spend an appropriate fraction of time (scheduling frequency) dispatching tasks from the queues of these buckets.
    • If a bucket is being compacted, threads are not allocated to dispatch tasks for that bucket.
    • If all buckets are either empty or being serviced by other threads, the thread goes to sleep.
  • 126. Disk I/O priority
    • Disk I/O priority enables workload priorities to be set at the bucket level.
    • The bucket disk I/O priority can be set to either high or low (default: low).
    • Bucket priority settings determine whether I/O tasks for a bucket are enqueued in the low or the high priority task queues.
    • Threads in the global pool poll the high priority task queues more often than the low priority task queues.
    • Bucket latency and I/O operations are affected by the setting.
    • The priority can be configured during initial setup and edited afterwards. Editing it after setup restarts the bucket and resets the client connections.
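A minimal sketch of priority-weighted polling, as described on the shared thread pool and disk I/O priority slides. The slides only say that high priority queues are polled "more often"; the 4:1 ratio and all names below are assumptions for illustration, not Couchbase internals.

```python
from collections import deque

HIGH_WEIGHT = 4  # assumption: polls per cycle for a high priority bucket
LOW_WEIGHT = 1   # assumption: polls per cycle for a low priority bucket


def build_poll_order(priorities):
    """priorities maps bucket name -> 'high' or 'low'; returns one polling cycle."""
    order = []
    for bucket, priority in priorities.items():
        weight = HIGH_WEIGHT if priority == "high" else LOW_WEIGHT
        order += [bucket] * weight
    return order


def dispatch(queues, priorities, max_polls):
    """Poll the bucket queues in weighted order; return tasks in execution order."""
    done = []
    order = build_poll_order(priorities)
    for i in range(max_polls):
        bucket = order[i % len(order)]
        if queues[bucket]:  # an empty queue is simply skipped
            done.append(queues[bucket].popleft())
    return done
```

When every bucket has the same priority, the polling cycle contains each bucket once, so the loop degenerates into the plain round-robin described for the default setting.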
  • 127. Tunable Memory
    • Tunable memory enables both value-only ejection and full metadata ejection from memory.
    • The cache management approach for item ejection is implemented with value-only ejection and full metadata ejection:
      Value-only ejection (the default) removes the data from the cache but keeps all keys and metadata fields for non-resident items. When value-only ejection occurs, the item's value is reset.
      Full metadata ejection removes all data, including keys, metadata, and key-values, from the cache for non-resident items. Full metadata ejection reduces the RAM requirement for large buckets.
    • Full ejection supports very large data footprints (a large number of datasets or items/keys) since the working sets in memory are smaller. The smaller working sets allow efficient cache management and reduced warmup times.
    • For example, you might want to enable full metadata ejection on a bucket if you need to store huge amounts of data (for example, terabytes or petabytes). Metadata ejection is configured at the bucket level.
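The difference between the two ejection modes can be shown with a toy model. The mode labels `"value_only"` and `"full"` and the class itself are hypothetical, chosen for this sketch; they are not the server's configuration values.

```python
class ToyBucket:
    """Toy model of value-only vs. full metadata ejection; not Couchbase code."""

    def __init__(self, ejection_mode="value_only"):
        self.ejection_mode = ejection_mode
        self.ram = {}   # key -> {"meta": ..., "value": ... or None}
        self.disk = {}  # key -> {"meta": ..., "value": ...}

    def store(self, key, value, meta):
        entry = {"meta": meta, "value": value}
        self.ram[key] = dict(entry)
        self.disk[key] = dict(entry)  # assume the item is already persisted

    def eject(self, key):
        if self.ejection_mode == "full":
            # Full metadata ejection: key, metadata and value all leave RAM.
            del self.ram[key]
        else:
            # Value-only ejection: key and metadata stay, the value is reset.
            self.ram[key]["value"] = None

    def key_in_ram(self, key):
        return key in self.ram
```

With value-only ejection the node can still answer "does this key exist?" from RAM; with full ejection that question may require a disk lookup, which is the price paid for the lower memory footprint.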
  • 128.
    • With 2.x, Couchbase Server cached all keys and metadata in memory and allowed ejection of values only. That is great for low-latency access to any part of your data.
    • However, some workloads don't require low-latency access to all parts of the data and would rather have memory reserved for the 'hotter' parts of the working set.
    • With 3.0, for large databases with a smaller active working set, you can turn on 'full ejection' and eject keys and metadata for parts of your data that are rarely accessed. Even if you consider keys and metadata small, with a large number of keys the memory used for keys and metadata can add up. With full ejection mode, you can use memory more effectively for caching larger parts of your working set.
    • Enabling the option is easy and can be done per bucket in the admin console. The change is transparent to apps, so nothing needs to be done on the application side to take advantage of the setting.
  • 129. Tunable Memory - Working set management and ejection
    • Working set management is the process that Couchbase Server performs to free up space in RAM and to ensure that the most-used items remain available in RAM. Ejection is the process of removing data from RAM to make room for frequently used items.
    • Ejection is performed automatically by Couchbase Server.
    • When Couchbase Server ejects information, it works in conjunction with the disk persistence system to ensure that data in RAM has been persisted to disk and can be safely retrieved back into RAM if the item is requested.
    • In addition to the memory quota for the caching layer, there are two watermarks the engine uses to determine when it is necessary to start persisting more data to disk. These are mem_low_wat and mem_high_wat.
  • 130. Tunable Memory
    • As the caching layer fills with data, mem_low_wat is eventually passed. At this point, no action is taken.
    • As data continues to load, memory use eventually reaches mem_high_wat. At this point, a background job is scheduled to ensure items are migrated to disk and that memory is available for other Couchbase Server items.
    • This job runs until measured memory drops to mem_low_wat.
    • If the rate of incoming items is faster than the migration of items to disk, the system can return errors indicating that there is not enough space.
    • This continues until memory becomes available.
    • The process of removing data from the caching layer to make way for actively used information is called ejection. It is controlled automatically through thresholds set on each configured bucket in the Couchbase Server cluster.
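The watermark behaviour above can be sketched as a small function. Only the names mem_low_wat and mem_high_wat come from the slides; the function, its arguments, and the idea of ejecting items one at a time in a given order are simplifying assumptions.

```python
def run_ejection(mem_used, item_sizes, mem_low_wat, mem_high_wat):
    """Return (memory used after the job, number of items ejected).

    item_sizes lists the sizes of ejectable items in the order in which this
    toy model would eject them (an assumption; the real selection differs).
    """
    if mem_used < mem_high_wat:
        # Passing mem_low_wat alone triggers nothing.
        return mem_used, 0
    ejected = 0
    for size in item_sizes:
        if mem_used <= mem_low_wat:
            break  # the job stops once memory is back at the low watermark
        mem_used -= size
        ejected += 1
    return mem_used, ejected
```

For example, with a low watermark of 60 and a high watermark of 85 (arbitrary units), a usage of 70 causes no ejection, while a usage of 90 ejects items until usage is back at 60.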
  • 131. Tunable Memory
    • Insider action by server, white paper: Tunnable Memory - Working set.docx
    • More details in the PPT from Couchbase: Tunable Memory.pptx
    • Web: http://www.couchbase.com/nosql-resources/blog/all-new-30-full-ejection-tuning-memory-large-databases
  • 132. Expiration (TTL)
    • Each document stored in the database has an optional expiration value (TTL, time to live) that is used to automatically delete items.
    • The expiration option can be used for data that has a limited life and can be deleted automatically.
    • TTL is specified in seconds, or as Unix epoch time.
    • The default is no expiration.
    • Typical uses for an expiration value include web session data, where the actively stored information needs to be removed from the system once user activity has stopped.
    • With an expiration value, the data times out and is removed from the system without being explicitly deleted.
    • This frees up RAM and disk for more active data.
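The slide says a TTL can be given either in seconds or as Unix epoch time but not how the two are told apart. The sketch below assumes the memcached-style convention, in which values up to 30 days are relative and larger values are absolute timestamps; treat that threshold as an assumption to verify against the documentation for your server version. The function names are hypothetical.

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds (assumed threshold)


def absolute_expiry(ttl, now=None):
    """Return the Unix time at which an item expires, or None for no expiry."""
    now = int(time.time()) if now is None else now
    if ttl == 0:
        return None        # default: no expiration
    if ttl <= THIRTY_DAYS:
        return now + ttl   # interpreted as seconds from now
    return ttl             # interpreted as an absolute Unix timestamp


def is_expired(ttl, stored_at, now):
    """True if an item stored at `stored_at` with this TTL has expired by `now`."""
    expiry = absolute_expiry(ttl, now=stored_at)
    return expiry is not None and now >= expiry
```

A web session stored with `ttl=1800`, for instance, would be treated as expired 30 minutes after it was written, without any explicit delete from the application.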
  • 133. Server warmup
    • When Couchbase Server is restarted, or when it is started after a restore from backup, the server goes through a warmup process.
    • The warmup loads data from disk into RAM, making the data available to clients.
    • The warmup process must complete before clients can be serviced.
    • Depending on the size and configuration of your system and the amount of data you have stored, the warmup may take some time to load all of the stored data into memory.
    • Use cbstats to get information about server warmup, including the status of warmup and whether warmup is enabled.
    • ep_warmup_thread - Indicates whether the warmup has completed or is still running. Returns "running" or "complete".
    • ep_warmup_state - Indicates the current progress of the warmup.
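A monitoring script could check the two statistics named above roughly as follows. This sketch assumes the statistics arrive as plain `name: value` text lines; the sample text used with it is illustrative, not captured cbstats output, and the function names are hypothetical.

```python
def parse_stats(text):
    """Parse 'name: value' lines into a dict, ignoring lines without a colon."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def warmup_complete(stats):
    """True once ep_warmup_thread reports 'complete' (value taken from the slide)."""
    return stats.get("ep_warmup_thread") == "complete"
```

A caller would typically poll until `warmup_complete(...)` returns True before sending client traffic to the node.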
  • 134. Replicas and replication
    • Replicas are copies of data that are provided on another node in a cluster.
    • Data from one bucket, known as the source, is copied to a destination, which is also referred to as the replica or replica vBucket.
    • Replica node: the node that contains the replica vBucket.
    • Source node: the node containing the original data to be replicated.
    • Distribution of replica data is handled in the same way as data at a source node.
    • Portions of replica data are distributed around the cluster to prevent a single point of failure.
    • After Couchbase has stored replica data at a destination node, the data is also placed in a queue to be persisted to disk at that destination node.
    • When replication is performed between two Couchbase clusters, it is called cross datacenter replication (XDCR).
  • 135. TAP
    • The TAP protocol is an internal part of the Couchbase Server system and is used in a number of different areas to exchange data throughout the system.
    • TAP provides a stream of the changes that are occurring within the system.
    • TAP is used during replication to copy data between the vBuckets used for replicas.
    • It is also used during the rebalance procedure to move data between vBuckets and redistribute the information across the system.
  • 136. Database Change Protocol -- DCP
    • DCP is an innovative protocol that drives data sync for Couchbase Server v3.0.
    • The Database Change Protocol (DCP) is a streaming protocol that significantly reduces latency for view updates.
    • It increases data sync efficiency with massive data footprints.
    • It removes slower disk I/O from the data sync path.
    • It improves latency for the replication that provides data durability.
    • In the future, it will provide a programmable data sync protocol for external stores outside Couchbase Server.
    • With DCP, changes made to documents in memory are immediately streamed to be indexed without being written to disk.
    • This provides faster view consistency and therefore fresher data.
    • DCP reduces latency for cross datacenter replication (XDCR).
    • Data is replicated memory-to-memory from the source cluster to the destination cluster before being written to disk on the source cluster.
  • 137. Tap vs. DCP (©2014 Couchbase, Inc.)
    Major differences between TAP and DCP:
    • TAP guarantees you will see all mutations at least once, but doesn't guarantee any specific order.
    • TAP doesn't have the ability to restart from anywhere.
    • De-duplication of items means that we cannot tell when we have a consistent view of the database.
    Comparison:
    • Ordering. TAP: no ordering guaranteed. DCP: ordered.
    • Restart-ability. TAP: not really. DCP: granular restart-ability.
    • Consistency. TAP: no snapshotting capabilities. DCP: snapshots give a consistent view of the DB.
    • Performance. TAP: no memory-based support for Views and XDCR. DCP: all data synchronization components are memory based.
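The ordering and restart-ability rows of the comparison can be illustrated with a toy change stream. This is a sketch of the general idea of sequence-numbered streaming only; the class is hypothetical and leaves out what real DCP adds on top (per-vBucket streams, snapshots, failover handling).

```python
class ToyChangeStream:
    """Toy ordered change log with resumable consumption; not the DCP protocol."""

    def __init__(self):
        self.log = []  # (seqno, key, value) tuples, seqno strictly increasing

    def mutate(self, key, value):
        seqno = len(self.log) + 1
        self.log.append((seqno, key, value))
        return seqno

    def stream_from(self, last_seen_seqno=0):
        """Yield, in order, every change with a seqno above last_seen_seqno."""
        for entry in self.log:
            if entry[0] > last_seen_seqno:
                yield entry
```

A consumer that remembers the last sequence number it processed can reconnect with `stream_from(last_seqno)` and continue where it stopped, rather than starting over from the beginning of the stream.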
  • 138. DCP
    • Since DCP is mostly an internal communication mechanism within Couchbase Server, it offers very little to manage and is beyond the scope of direct administration. For understanding the internals, however, the building blocks and the various components are listed in this document: DCP.docx
    • For details of its internal functioning and its benefits, see the PPT from Couchbase: DCP Deep Dive.pptx
  • 139. Couchbase Client Libraries / SDK
  • 140. SDK
    • Couchbase Server is designed to distribute data across multiple nodes, which adds complexity to storage and retrieval.
    • To allow applications to save and retrieve data, Couchbase provides a set of language-specific client libraries, also called Couchbase Client SDKs. Currently, Couchbase provides officially supported SDKs for seven languages and runtimes.
    • SDKs for several additional languages, including Clojure, Go, Perl, and Erlang, are maintained by the community as open source projects.
  • 141. SDK
    • An up-to-date list can be found on the official Couchbase website: www.couchbase.com/communities/all-client-libraries and http://www.couchbase.com/open-source
    • These SDKs are used to communicate with the Couchbase Server cluster.
    • The SDKs usually communicate directly with the nodes that store the relevant data within the cluster, and then perform data operations such as insertion and retrieval.
    • The Couchbase clients are aware of all the nodes in the cluster, their current state, and which documents are stored on each node.
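How a client can know which node holds a document without asking the cluster each time can be sketched as a key-to-vBucket-to-node lookup. The hashing details below (CRC32 of the key, the shift and mask, 1024 vBuckets) reflect a common description of Couchbase clients but are assumptions in this sketch, and the node names are made up.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: typical vBucket count per bucket


def vbucket_for_key(key):
    """Map a document key to a vBucket id (hashing details are assumed)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS


def node_for_key(key, vbucket_map):
    """vbucket_map[i] names the node that currently owns vBucket i."""
    return vbucket_map[vbucket_for_key(key)]


# Hypothetical three-node cluster with vBuckets spread round-robin.
NODES = ["node-a", "node-b", "node-c"]
VBUCKET_MAP = [NODES[i % len(NODES)] for i in range(NUM_VBUCKETS)]
```

Calling `node_for_key("user::42", VBUCKET_MAP)` always returns the same node for the same map. When the cluster is rebalanced, only the map changes; the key-to-vBucket step stays the same.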
  • 142. Connecting to Couchbase ©2014 Couchbase, Inc.
  • 143. Connect to Couchbase: Client Connection Steps (figure: Step 1, Step 2, Step 3)
  • 144. Service List
    Client connectivity characteristics:
    • Dynamic distributed services
    • Dynamic configuration updates, with no additional work from the developer
    • Fault-tolerant, durable connectivity
  • 146. Unified API - DML Methods, Add/Remove/Update
    Adding and removing information (JSON documents, K/V binary):
    • insert: Inserts a document or binary key/value. Fails if the item exists.
    • upsert: Stores a document or binary key/value in the bucket, or updates it if the document already exists.
    • replace: Replaces a document or binary key/value in a bucket. Fails if the item doesn't exist.
    • remove: Deletes an item from the bucket. Fails if the item doesn't exist.
    • append/prepend: Appends or prepends, in place, to the value of a binary k/v item. Does NOT work with documents.
    • touch: Updates the TTL of a document.
    • getAndTouch: Retrieves a document or binary key/value and updates the expiry of the item at the same time.
    • counter: Increments or decrements a key's numeric value.
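The success and failure rules listed above can be captured in a small in-memory model. This is not SDK code: `ToyKV` is a hypothetical class, `KeyError` stands in for whatever error a real SDK raises, and creating a missing counter with an initial value is an assumption. TTL-related methods are left out.

```python
class ToyKV:
    """In-memory model of the insert/upsert/replace/remove/counter semantics."""

    def __init__(self):
        self.items = {}

    def insert(self, key, value):
        if key in self.items:
            raise KeyError(f"{key!r} already exists")  # insert fails if present
        self.items[key] = value

    def upsert(self, key, value):
        self.items[key] = value  # creates or overwrites, never fails

    def replace(self, key, value):
        if key not in self.items:
            raise KeyError(f"{key!r} does not exist")  # replace fails if missing
        self.items[key] = value

    def remove(self, key):
        if key not in self.items:
            raise KeyError(f"{key!r} does not exist")  # remove fails if missing
        del self.items[key]

    def counter(self, key, delta, initial=0):
        # Assumption: a missing counter is created with `initial`.
        if key not in self.items:
            self.items[key] = initial
        else:
            self.items[key] += delta
        return self.items[key]
```

The choice between `insert`, `replace`, and `upsert` is therefore a choice about which pre-existing state counts as an error for the application.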
  • 147. Unified API - DML Methods, Retrieval
    Retrieving information (JSON documents, K/V binary):
    • get: Retrieves a document or binary key/value.
    • getAndLock: Locks the document or binary key/value on the server and retrieves it. When a document is locked, its CAS changes, and subsequent operations on the document (without providing the current CAS) fail until the lock is no longer held.
    • getReplica: Gets a document or binary key/value from a replica server in your cluster.
    • unlock: Unlocks a previously locked document or binary key/value within a bucket.
  • 148. Unified API - DML, CAS Example
    1. Two clients retrieve the same document "XYZ".
    2. Client A retrieves it first.
    3. Client B then retrieves XYZ. Both clients have the same CAS value for document XYZ.
    4. Client B tries to update document XYZ. The update succeeds because the CAS value is unchanged from when Client B initially retrieved the document. Once the update succeeds, the CAS value for XYZ changes.
    5. Client A then tries to update XYZ immediately after Client B. The update fails because Client A's CAS value is out of date: when Client B updated XYZ, the CAS value changed.
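The five steps can be replayed against a toy store that implements compare-and-swap. The class, the exception name, and the integer CAS counter are hypothetical stand-ins; a real SDK exposes the CAS value and the mismatch error in its own way.

```python
class CasMismatch(Exception):
    """Raised when the caller's CAS value is out of date."""


class ToyCasStore:
    """Toy key/value store with optimistic concurrency control via CAS."""

    def __init__(self):
        self._data = {}     # key -> (value, cas)
        self._next_cas = 1

    def _new_cas(self):
        cas = self._next_cas
        self._next_cas += 1
        return cas

    def upsert(self, key, value):
        cas = self._new_cas()
        self._data[key] = (value, cas)
        return cas

    def get(self, key):
        return self._data[key]  # returns (value, cas)

    def replace(self, key, value, cas):
        _, current_cas = self._data[key]
        if cas != current_cas:
            raise CasMismatch(key)  # someone else changed the document
        new_cas = self._new_cas()   # every successful update changes the CAS
        self._data[key] = (value, new_cas)
        return new_cas
```

In this model, Client A's usual recovery is to call `get` again, reapply its change to the fresh value, and retry `replace` with the new CAS.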
  • 149. Additional Tools
    • For drag-and-drop, UI-driven ETL there is a connector for Talend: http://www.couchbase.com/couchbase-server/connectors/talend
    • For simple reporting operations, Pentaho is really easy to use and is completely UI driven: http://www.pentaho.com
    • With Elasticsearch, Kibana can also be used for some really powerful reporting: http://www.elasticsearch.org
    • For deployment we work with Cloudsoft (Brooklyn): http://www.cloudsoftcorp.com/partner/brooklyn/
    • We also work with CumuLogic: http://www.cumulogic.com
    • Both of these platforms work well for deployment and scalability.
    • We have a library for Puppet for use with Vagrant: https://github.com/couchbaselabs/vagrants
    • There's an excellent cookbook for Chef available in the Supermarket: https://community.opscode.com/cookbooks/couchbase/versions/1.1.0
    • The Console allows you to perform all management functions on this system. These can also be done from the command line or the REST API.
  • 150. End Day 1