This paper briefly describes what NoSQL systems are and the factors motivating the recent
interest in them. Core concepts associated with the NoSQL data model are introduced and
compared with the relational model. The four major types of NoSQL systems are then
discussed, along with brief descriptions of some well-known data store implementations.
Included are short descriptions of how NoSQL systems are currently helping solve some of the
business problems of big data and search. Finally, the paper concludes by mentioning some
topics and issues that lie ahead for the NoSQL movement.
A beginners introduction to NoSQL
3. Terminology
Some of the terminology used in the paper is listed below. These terms will be useful to
understand and refer back to when they appear in this paper.
● NoSQL systems: are those systems that use a variety of data store types (databases)
● Hashing: is the process of mapping data of arbitrary length to a value of fixed length and is
used to uniquely identify documents
● Caching: is the process of storing recently used information so that it can be quickly
accessed at a later time
● Sharding: is a process of dividing a data set and distributing it across multiple servers
● Horizontal scaling: or scale out, is the process of joining multiple computers together to
provide more processing power
● Cluster: is composed of a set of processors called nodes grouped together in racks
● Key/Value: is the term used when identifying data (value) by an arbitrary name (key)
● Indexing: is a method for organizing data by one or more fields so that it can be found quickly
● Replication: is a term to describe the sharing of information between systems so as to
ensure its consistency and high availability
● JSON: or JavaScript Object Notation, is a universal internet data exchange format
● XML: or eXtensible Markup Language, is a universal internet data exchange format
● RDBMS: or Relational Database Management System, is a database management
system that stores data in the form of related tables
● CAP Theorem: or Consistency, Availability, and Partition tolerance theorem, is a
computing rule that states that a distributed system can provide at most two of the three
properties of Consistency, Availability, and Partition tolerance at any given time
● ACID: or Atomicity, Consistency, Isolation, Durability, are properties of transactional
control systems such as an RDBMS
● BASE: or Basically Available, Soft state, Eventual consistency, is the alternative to ACID,
and is used to describe systems, such as NoSQL, that focus on data availability more
than data consistency
● MapReduce: is a programming model for processing large amounts of unstructured
data in parallel on large clusters of commodity hardware
● Hadoop: is an open source software project that uses the MapReduce framework to
enable the distributed processing of large data sets
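To make the terms hashing, sharding, and key/value more concrete, the following minimal Python sketch hashes a key to a fixed-length value and uses that value to choose the server (shard) that stores the key. The function names document_id, shard_for, put, and get, the choice of SHA-256, and the three in-memory shards are assumptions made for illustration only; they are not taken from any particular NoSQL product.

```python
import hashlib


def document_id(content: str) -> str:
    """Map content of arbitrary length to a fixed-length (64 character) hex digest."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def shard_for(key: str, num_shards: int) -> int:
    """Choose a shard by hashing the key and reducing the hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical cluster: three servers, each modelled as an in-memory key/value map.
shards = [dict() for _ in range(3)]


def put(key: str, value) -> None:
    """Store the value on the shard that owns the key."""
    shards[shard_for(key, len(shards))][key] = value


def get(key: str):
    """Read the value from the shard that owns the key (None if absent)."""
    return shards[shard_for(key, len(shards))].get(key)


put("user:42", {"name": "Ann"})
print(get("user:42"))
print(len(document_id("a short document")))
```

Simple modulo placement moves most keys to a different shard when the number of shards changes, so this sketch only illustrates the idea of hash-based data distribution; other placement schemes, such as consistent hashing, exist to reduce that movement.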
Let's discuss some essential NoSQL concepts next.
4. NoSQL Concepts
Key concepts and architectural guidelines of NoSQL are:
● Simple Building Blocks
● Layered Architecture
● Hashing and Data Distribution
● Distributed Caching
● Sharding
Each of these concepts is described in more detail below.
4.1 Simple Building Blocks
NoSQL systems are created using modular and simple components that can be reassembled
to meet the needs of different applications [1]. For instance, a system could consist of several
simple functions: one that allows sharing of objects in memory, another that executes batch
jobs, and a third that is responsible for storing documents. The focus for each of the functions is
to provide efficient services that are frequently used to power a distributed service.
4.2 Layered Architecture
NoSQL systems make use of application tiers to simplify design, as do many other systems,
such as Relational Database Management Systems (RDBMSs). However, NoSQL
applications are distributed differently from RDBMSs. Both types of systems consist of the User
● They make use of replication to create backup copies of data in real time and deliver fast
read/write consistency.
● They let the database distribute queries evenly to data nodes, and then efficiently
combine the results together.
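The last point above, sending a query to every data node and then combining the partial results, can be sketched as follows. This is a minimal illustration under stated assumptions: the three in-memory node lists, the sample records, and the names query_node and scatter_gather are hypothetical, and a real system would send the query over the network rather than to local lists.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data nodes, each holding one part of the data set.
nodes = [
    [{"name": "ann", "age": 31}, {"name": "bob", "age": 45}],
    [{"name": "cid", "age": 27}],
    [{"name": "dee", "age": 52}, {"name": "eve", "age": 38}],
]


def query_node(records, min_age):
    """Run the query against the records held by a single node."""
    return [r for r in records if r["age"] >= min_age]


def scatter_gather(min_age):
    """Send the query to all nodes in parallel, then merge and sort the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda records: query_node(records, min_age), nodes)
        merged = [r for partial in partials for r in partial]
    return sorted(merged, key=lambda r: r["age"])


print([r["name"] for r in scatter_gather(35)])
```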
6.2 Search
Search involves finding an item of interest in a database when only partial information is available
about the item. NoSQL systems combine document store concepts with full-text indexing to
deliver high-quality search solutions [1]. Document stores keep data in a single hierarchical tree
and don’t shred elements into rows within tables. The retained structure can then be used to
locate a matched keyword exactly within the document. NoSQL solutions, when used together
with highly scalable processes such as MapReduce, can be used to create reverse indexes that
enable fast search.
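The idea of building a reverse index with a MapReduce-style process can be sketched in a few lines of Python. This is a single-process illustration only, not a distributed job: the function names map_phase, reduce_phase, and build_index and the three sample documents are assumptions made for this example. The map step emits one (word, location) pair per word, and the reduce step groups the pairs by word so that a keyword can be looked up directly.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit a (word, (doc_id, position)) pair for every word in one document."""
    for position, word in enumerate(text.lower().split()):
        yield word, (doc_id, position)


def reduce_phase(pairs):
    """Group the emitted pairs by word to form the reverse index."""
    index = defaultdict(list)
    for word, posting in pairs:
        index[word].append(posting)
    return dict(index)


def build_index(documents):
    """Run the map step over every document, then reduce all emitted pairs."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)


# Hypothetical sample documents.
documents = {
    "doc1": "NoSQL systems scale out",
    "doc2": "Document stores keep structure",
    "doc3": "NoSQL document stores",
}
index = build_index(documents)
print(index["nosql"])  # documents and word positions that contain "nosql"
```

Because each posting records the position of the word as well as the document, the index can point to where a keyword occurs inside a document, not only to which documents contain it.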
6.3 High Availability
The features that allow NoSQL systems to scale out and handle big data problems can also be
used to increase the availability of database servers. NoSQL architectures make effective use
of a couple of strategies to create high availability systems backed by clusters of multiple
systems [1]:
● By using a load balancer to direct traffic to the least busy node
● By using high-availability distributed file systems. One such high-availability system is
the Hadoop Distributed File System (HDFS).
Simplicity of design is another feature of NoSQL systems that promotes high availability.
Organizations benefit from the cost-effectiveness of operating high-availability NoSQL
systems that run on multiple processors.
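The first strategy in this section, directing traffic to the least busy node, can be illustrated with a short Python sketch. The class name LeastBusyBalancer, its methods, and the use of the number of active requests as the measure of how busy a node is are all assumptions made for illustration; real load balancers may use other measures such as response time or CPU load.

```python
class LeastBusyBalancer:
    """Route each request to the node with the fewest active requests."""

    def __init__(self, node_names):
        # Number of requests currently being handled by each node.
        self.active = {name: 0 for name in node_names}

    def acquire(self):
        """Pick the least busy node and count one more active request on it."""
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        """Mark one request on the node as finished."""
        self.active[node] -= 1

    def remove(self, node):
        """Take a failed node out of rotation so that traffic goes to the others."""
        self.active.pop(node, None)


balancer = LeastBusyBalancer(["node-a", "node-b", "node-c"])
first = balancer.acquire()
second = balancer.acquire()
balancer.release(first)
print(first, second, balancer.acquire())
```

Removing a failed node from rotation is what ties load balancing to availability: the remaining nodes in the cluster continue to receive and answer requests.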
7. Future and Beyond
NoSQL systems are relatively new and face a number of challenges. Some of these challenges,
and what is on the horizon for the further evolution of NoSQL systems, along with some other
viable persistence alternatives, are briefly described next.
7.1 Lack of trained Administrators and Developers
In today’s environment, most senior administrators and developers have extensive experience
writing code and managing relational databases. NoSQL systems are still uncharted territory
for many of them. There is a significant shortage of skilled professionals who can evaluate the
needs of enterprises to determine the type of NoSQL systems best suited for their business
purpose, and then also administer and develop the algorithms and programs to operate with their
choice of NoSQL system [8, 9].
7.2 Adoption Readiness
Many enterprises are reluctant to invest in commercial NoSQL technology due to the lack of
trained professionals to manage, optimize, and develop applications for them. Also, a few
enterprises still feel that the technology is not quite ready for prime time. Finally, there is a
need for the increased development of more big data applications that are ideally suited for the
use of NoSQL systems [10].
7.3 Support for Real-Time Analytics
Real-time analytics is the use of all available enterprise data and resources whenever they are
needed. It involves dynamic analysis and reporting, based on data entered into a system a
fraction of a second before the actual moment of use. Relational databases are ideally suited for
complex query and analysis; however, real-time analysis of operational data is better suited to
NoSQL systems. Today, business intelligence and analytics support with NoSQL databases is
still very new, but has started to grow rapidly [2].
7.4 Global Transaction Support
The Consistency, Availability, and Partition-tolerance (CAP) theorem states that a distributed
system cannot simultaneously provide all three properties, and at best can provide two of the
three. This is widely understood within the NoSQL community as justifying the choice of
NoSQL systems to favor high availability over consistency. Consistency of data is one of the
guarantees of transactional systems, and is a must for certain types of enterprise applications,
such as financial applications. The growing understanding that consistency of data is crucial for
many applications has many NoSQL system designers now favoring a return to support for
transactions with NoSQL [12], provided that this transaction support does not sacrifice the
advantages of NoSQL in scalability, fault tolerance, concurrency, and performance.
7.5 Other Storage Technologies
NoSQL systems have done a great deal to open up the world of databases. But they are still
only a part of the picture of choosing the right persistence option for the task at hand. There are
several other persistence options besides relational and NoSQL systems; a couple of them are
briefly described here:
● File Systems.
File systems are ubiquitous, and are widely used for storing personal productivity
documents. They are similar to key-value stores with a hierarchical key and provide little
control over concurrency. They offer no support for queries on their own, and work best
for a relatively small number of large files that can be processed in big chunks [2].