Oracle vs NoSQL – The good, the bad and the ugly

REMINDER
Check in on the COLLABORATE mobile app
Oracle vs. NoSQL
The good, the bad and the ugly
John Kanagaraj
Member of Technical Staff,
PayPal Database Engineering,
An eBay Inc. company

Housekeeping
■  Check the font sizes
▪  Can you read this at the back of the room?
▪  Can you read this at the back of the room?
▪  Just kidding!
■  Silence your Phones!
■  Q & A : Ask as we go along (and I will repeat the question)
▪  Keep it relevant to the slide at hand
▪  I might defer the question to a later slide if I believe it is
addressed later
▪  If it gets too long, I humbly request we deal with it after the
break or after the session
■  It is a long day, so if you nod off it is ok (hopefully no snoring!)

Agenda
■  Big Data – What it is, why should we care
■  NoSQL – What it is, and why do we need it
■  Concepts you need to understand
▪  CAP Theorem (and why it is important)
▪  Unstructured Data
▪  Sharding and Replication
▪  Data Modeling in the brave new world of NoSQL
■  Introduction to some popular NoSQL stores
■  A look into the (immediate) future: Moving forward

Not on the Agenda
■  Not a Tutorial on various NoSQL datastores
■  Not an Installation Guide
■  Not an Administration Manual
■  If you already know the CAP Theorem and NoSQL:
▪  I will be covering the basics (so you know!)
▪  We are all here to share and learn: Maybe I can learn from your
questions/inputs (time and context permitting)
▪  Let’s talk after the talk (or during the break)

Speaker Qualifications
■  Currently Database Engineer @ PayPal
■  Has been working with Oracle Databases and
UNIX for too many years ☺
■  Author and Technical editor
■  Frequent speaker at OOW, IOUG
COLLABORATE and regional OUGs
■  Oracle ACE
■  Contributing Editor, IOUG SELECT Journal
■  Loves to mentor new speakers and authors!
■  http://www.linkedin.com/in/johnkanagaraj

Big Data – The Why
■  2.5 quintillion bytes of data are generated every day
▪  (1 quintillion = 10^18 bytes): so that is ~2.3 billion GiB (2.5 billion GB)
▪  Humans (using devices) as well as Machines (IoT)
—  Location data emitted by your smart phone
—  “Web-scale” Webserver logs and interactions
—  Sensor data emitted by almost every networked device: E.g.
Cars’ fuel/pressure gauges, Personal fitness devices (wearables)
—  Multi-media sources: Security cameras, Face/Plate recognition
—  Data that matters to you: Medical, Scientific, Weather
▪  Lots of value in this data, but mostly untapped
▪  Most of this is never stored: Too big to store, but not too big
to understand ☺
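The arithmetic in the first bullet can be checked in a few lines: dividing by decimal gigabytes (10^9 bytes) gives 2.5 billion GB, and dividing by binary gibibytes (2^30 bytes) gives the ~2.3 figure, so the unit is billions rather than trillions. A minimal sketch:

```python
# Convert the daily Big Data volume quoted above into decimal GB and binary GiB.
BYTES_PER_DAY = 2.5 * 10**18          # 2.5 quintillion bytes (1 quintillion = 10**18)

gb_per_day = BYTES_PER_DAY / 10**9    # decimal gigabytes
gib_per_day = BYTES_PER_DAY / 2**30   # binary gibibytes

print(f"{gb_per_day:,.0f} GB per day")    # 2,500,000,000 GB per day
print(f"{gib_per_day:,.0f} GiB per day")  # 2,328,306,437 GiB per day
```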

■  Plummeting cost of technology
▪  Storage Cost/GB – 1980 : $437,500, 2013 : $0.05
▪  Computing Cost – Moore’s law
▪  Network transportation Cost – WiFi, BLE, etc.
■  What is driving this?
▪  Cheaper to store data than to delete/ignore it
▪  Minimal cost to generate, transport and store
▪  Ubiquity of network, storage and data generation
▪  Accelerating advances in science and technology
▪  Machine learning and intelligence are growing
Source for storage cost: http://www.statisticbrain.com/average-cost-of-hard-drive-storage/

Infographic:
http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Big Data Characteristics: 4 V’s + 1
■  Volume – Scale at which data is generated
▪  Cannot be stored using traditional methods
▪  Cannot be stored in a monolithic store
■  Variety – Different forms of data
▪  Big Data is usually not structured; structure not known in
advance; structure not controlled by consumer
▪  May not be in text form at all (e.g. binary data such as images or video)
■  Velocity – Data arrives in a continuous stream
▪  Multiple, varied sources produce data continuously
▪  Peaks and bursts unpredictable
▪  “Always on”: No down time for maintenance or re-orgs
▪  No “Known Users” – unpredictable, unknown patterns/scale

Big Data Characteristics: 4 V’s + 1
■  Veracity – Uncertainty: Data is not always accurate
▪  Multiplicity of sources creates convergence of truth
▪  Eventual consistency (versus immediate consistency)
■  Value – Immediacy and hidden relationships
▪  In many use cases, value of Big Data declines quickly
—  Traffic reports do not matter after 30 minutes
—  Routing resupply trucks is counterproductive after the fact
—  However, some historical value may be derived after the event
▪  Concept of “Near Line” data (neither fully online nor offline)
▪  Easy to miss hidden relationships
—  Most data sets are correlated to other data sets, implicitly or explicitly
—  Not easy to detect due to volume and variety
—  Mine data using various techniques (Data Science)

So how do we store this storm?
■  Big Data is impossible to store using an RDBMS
▪  Too big, too fast for RDBMS to ingest
▪  RDBMS needs “schema before write”
▪  Unknown structures = “schema during read”
■  So what is limiting RDBMS?
▪  ACID requirement drives “protection” mechanism
▪  Redo and Undo in Oracle provide ACID
▪  “Relational” imposes “schema before write”
▪  Easy to get “small bits”; hard to get “large pieces”

So how do we store this storm?
■  RDBMS’ are essentially ACID
▪  Atomic: A transaction either fully succeeds or fully fails
▪  Consistent: A transaction moves the database from one
consistent state to another
▪  Isolated: Transactions cannot interfere with each other
▪  Durable: Committed transactions persist even during failure
■  RDBMS Clusters = “Shared everything” for ACID
■  Atomicity in a distributed database: Two Phase commit
▪  Essential for splitting workload
▪  Reduction in availability though!
■  New concept! BASE (Basically Available, Soft state, Eventual
Consistency)
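To show why atomicity across databases costs availability, here is a toy, single-process sketch of two-phase commit. The Participant class and two_phase_commit function are hypothetical; a real coordinator (Oracle's distributed transactions included) also logs each phase durably and must survive its own failure, which this sketch ignores. A single unreachable participant is enough to abort the whole transaction:

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical resource manager, e.g. one database in a distributed transaction."""
    name: str
    available: bool = True
    state: str = "idle"
    pending: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)

    def prepare(self, changes: dict) -> bool:
        # Phase 1: vote YES only if reachable and able to stage the changes.
        if not self.available:
            return False
        self.pending = dict(changes)
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.data.update(self.pending)
        self.pending = {}
        self.state = "committed"

    def rollback(self) -> None:
        self.pending = {}
        self.state = "aborted"


def two_phase_commit(participants, changes: dict) -> bool:
    """Apply `changes` on every participant, or on none of them (atomicity)."""
    votes = [p.prepare(changes) for p in participants]   # Phase 1: prepare and vote
    if all(votes):
        for p in participants:                            # Phase 2: everyone commits
            p.commit()
        return True
    for p in participants:   # Phase 2: everyone aborts (a down node would learn this later)
        p.rollback()
    return False


db1, db2 = Participant("db1"), Participant("db2", available=False)
print(two_phase_commit([db1, db2], {"acct": 100}))   # False: one node down, nothing commits
print(db1.data)                                       # {}
```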

Confidential and Proprietary

A common scalability inhibitor
■  Heap table with one or more “right growing” indexes
−  Primary Key: Unique index on a NUMBER column
−  Key value generated from an Oracle Sequence (INCREMENT BY 1)
−  I.e. “monotonically” increasing ID value
−  High rate of inserts (> 5000 inserts/second) from multiple sessions
−  Multiple indexes, typically with leading date/time-series or mono-valued columns
−  E.g. Oracle E-Business Suite’s FND_CONCURRENT_REQUESTS
■  Here’s the Problem:
−  All INSERTing sessions need one particular index block in CURRent
mode (as well as one particular data block in CURRent mode)
−  Question: Would you use RAC to scale out this particular workload?

A quick deep dive
■  Here’s what happens to accommodate the INSERT
−  Assume the current value of the PK is 100, and the sequence increments by 1
−  Assume we have ‘N’ sessions simultaneously inserting into that table
−  Session 1 needs to update the Index block (add the Index entry for 100)
−  Session 2 wants the same block in CURRent mode (to add the entry for
101; it needs the same block because the entry fits in the same block)
−  Sessions 3… N also want the same block in CURRent mode at the very
same time (as all sessions will have “nearby” values for index entry)
−  Block level pins/unpins (+ lots of other work – Redo/Undo) required…
−  Same memory location (SGA buffer for Index block) accessed
−  Similar, smaller contention on the buffer for the Data block
−  Rate of work constrained by CPU speed and RAM access speeds
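The hot-block effect can be sketched with an idealized index whose leaf blocks each cover a fixed range of 100 key values (an assumption; real B-tree leaves split by space, not by value range). Eight sessions taking consecutive sequence values all land in the same rightmost leaf. Reversing the key, shown here with decimal digits as an analogy for Oracle's reverse key index (which reverses the key's bytes), scatters the same inserts across blocks:

```python
KEYS_PER_LEAF = 100   # idealized: each leaf block covers a fixed range of 100 key values
KEY_WIDTH = 6         # zero-pad keys to 6 digits before reversing


def leaf_block(key: int) -> int:
    """Leaf block that would receive `key` in the idealized index."""
    return key // KEYS_PER_LEAF


def reverse_key(key: int) -> int:
    """Decimal-digit analogy of a reverse key index (Oracle reverses the key's bytes)."""
    return int(str(key).zfill(KEY_WIDTH)[::-1])


# Eight sessions each insert the next value from a sequence that is currently at 100.
keys = range(100, 108)
normal = {leaf_block(k) for k in keys}
reversed_ = {leaf_block(reverse_key(k)) for k in keys}

print(f"plain index:   {len(normal)} hot leaf block(s) {sorted(normal)}")
# plain index:   1 hot leaf block(s) [1]
print(f"reverse index: {len(reversed_)} leaf blocks {sorted(reversed_)}")
# reverse index: 8 leaf blocks [10, 1010, 2010, 3010, 4010, 5010, 6010, 7010]
```

The price of the spread is the one noted on the solutions slide: adjacent IDs no longer sit next to each other, so range scans on the key cannot use the index.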

A quick deep dive
■  What if you use RAC to “scale out” this workload?
−  Assume “N” sessions simultaneously inserting from 2 RAC nodes (2xN)
−  In addition to previously described work, you need to
−  Obtain the Index block from remote node in CURRent mode
−  Session 1 (Node 1) updates Index block with value 100
−  Session 2 (Node 2) requests block in CURRent mode (value 101)
−  LMS processes on both nodes churn CPU coordinating messages and
block transfers back and forth on the interconnect
−  Flush redo changes to disk on Node 1 before shipping CURRent block
to Node 2 (gated by the Log Writer (LGWR) response!!!)
−  Sessions block on “gc current <state>” waits during this process
−  CPU, Redo IO, Interconnect, LMS/LMD processes involved

A quick deep dive
■  Some solutions
−  Spread the pain for the right growing index
−  Use Reverse Key Indexes (cons: Range scan not possible)
−  Use Hash partitioned indexes (cons: All partitions probed for Range
scan, Need Partitioning Option, Additional administration)
−  Prefix RAC node # (or some identifier per node) to key
−  Use a modified key: Use Java UUID, Other distinct prefix/suffixes
−  Use Range-Hash Partitioned tables with Time based ID as key
−  E.g. Epoch Time (# of seconds from Jan 1, 1970) + Sequence value for
lower bits
−  Enables Date/Time based partitioning key
−  Unique values allow Local Index to be unique
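The time-based ID idea can be sketched as a bit layout: epoch seconds in the high bits and a wrapped sequence value in the low bits. The 20-bit split and the function names are assumptions for illustration, not a prescribed format:

```python
from datetime import datetime, timezone

SEQ_BITS = 20   # hypothetical: low 20 bits hold the sequence value (~1M IDs per second)


def make_id(epoch_seconds: int, seq_value: int) -> int:
    """Epoch seconds in the high bits, sequence value (wrapped) in the low bits."""
    return (epoch_seconds << SEQ_BITS) | (seq_value % (1 << SEQ_BITS))


def id_to_datetime(id_value: int) -> datetime:
    """Recover the timestamp, e.g. to route the row to a date-based range partition."""
    return datetime.fromtimestamp(id_value >> SEQ_BITS, tz=timezone.utc)


ts = int(datetime(2014, 4, 7, 12, 0, tzinfo=timezone.utc).timestamp())
a = make_id(ts, 41)
b = make_id(ts, 42)
print(a < b, id_to_datetime(a))   # True 2014-04-07 12:00:00+00:00
```

IDs still ascend within a second, so the spreading of concurrent inserts comes from the hash subpartitions of the range-hash scheme; the time component is what lets the same unique ID double as the date-based range partitioning key.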

Relaxing ACID – Skip the Redo/Undo ☺
■  BASE Model
▪  “In partitioned databases, trading some consistency for
availability can lead to dramatic improvements in scalability”
▪  Popularized by Dan Pritchett (eBay) in 2008
▪  ACID is pessimistic; enforces consistency at the end of a
transaction
▪  BASE is optimistic; accepts eventual consistency
▪  Supports partial failure without total failure
■  Enabled new paradigms
▪  New patterns for distributing workload emerge
—  Sharding and Replication
—  Less than perfect (but good enough) consistency

A New Beginning - NoSQL
■  A new dawn emerges…
▪  Brewer proposes CAP theorem (2000)
▪  Google creates BigTable (~ 2006)
▪  Amazon creates Dynamo (~ 2007)
▪  eBay shards over Oracle Databases (2008)
▪  Inspires a new set of alternate data storage
projects
▪  NoSQL databases start appearing…
(~2008 – 2010)
▪  Becomes a buzz word (~ 2011 – 2013)
■  Now we all want “in”…
■  Picture courtesy Kamran Agayev via Twitter

So What is NoSQL?
■  NoSQL – supposed to be “No SQL”, but it is NOT
■  NoSQL – Loosely it is “Not Only SQL” (i.e. NOSQL)
▪  Term coined by Eric Evans (developer at Rackspace)
▪  Adopted by Johan Oskarsson (another developer)
▪  For a meetup of like minds in San Francisco, 2009
▪  Meetup for “open-source, distributed, nonrelational
databases” [Voldemort, Cassandra, CouchDB, MongoDB, etc.]
■  NoSQL does not mean there is no “SQL-Like” interface
▪  Cassandra supports CQL (Cassandra Query Language)
■  NoSQL does NOT always mean Big Data
▪  But Big Data stores are almost always NoSQL based
▪  That is, if you count Hadoop as a NoSQL datastore *
* See: http://wiki.apache.org/hadoop/HadoopIsNot

A small diversion: The Hadoop ecosystem
■  Let’s understand Hadoop vs. the Rest
■  Hadoop – The real Big Data Store
▪  Real Big platform to store data
▪  Store almost anything and everything
▪  Key components of Hadoop:
—  HDFS: A unified file system that combines all storage in the cluster
—  MapReduce: A programming model to handle large data sets
—  An extensible ecosystem: Other components to control, schedule and
manage processing and the cluster
▪  Is NOT a database (although there is HBase)….
▪  But supports SQL-like interface using Hive
▪  Not really meant for Online, Web-site facing implementation
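To make the MapReduce model concrete, here is the classic word count as a single-process Python sketch. No Hadoop API is used; in Hadoop the map and reduce functions would run in parallel on many nodes over HDFS blocks, with the framework doing the shuffle between them:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)


lines = ["big data big value", "Big data needs NoSQL"]
intermediate = chain.from_iterable(map_phase(l) for l in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)   # {'big': 3, 'data': 2, 'value': 1, 'needs': 1, 'nosql': 1}
```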


Big Data / NoSQL Landscape
From http://www.bigdata-startups.com/open-source-tools/

Why NoSQL?
■  Impedance Mismatch
▪  Real world data does not naturally possess structure
▪  A “Person” has many variable characteristics
▪  Applications deal with a “person” object
▪  This is then a set of in-memory structures
▪  Relational Databases require structured tables/columns
though….
▪  Thus, an “impedance mismatch” between Dev and DBA
▪  Which ORMs try to bridge (the gap between Dev and DBA)
—  Cultural mismatch: “Agile” (Dev) seems to be “Fragile” (for a DBA)
—  Technical mismatch: “Objects” to “Relational Tables”
—  Storage structure mismatch: “Un-/Semi-structured” to “Structured”
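A minimal sketch of the “Objects” to “Relational Tables” mismatch: one in-memory Person object has to be flattened into rows for three tables before an RDBMS can store it, which is the work an ORM does. The class, table names and columns are hypothetical; a document store could persist the same object as one aggregate:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """What the application works with: one object, variable-length attributes."""
    person_id: int
    name: str
    phones: list = field(default_factory=list)
    addresses: list = field(default_factory=list)


def to_relational_rows(p: Person) -> dict:
    """Flatten one object into rows for three hypothetical tables, as an ORM would."""
    person_row = (p.person_id, p.name)
    phone_rows = [(p.person_id, seq, phone) for seq, phone in enumerate(p.phones, 1)]
    address_rows = [(p.person_id, seq, addr) for seq, addr in enumerate(p.addresses, 1)]
    return {"person": [person_row], "person_phone": phone_rows,
            "person_address": address_rows}


john = Person(101, "John", phones=["408-555-9999"],
              addresses=["20 First St, San Jose, CA"])
for table, rows in to_relational_rows(john).items():
    print(table, rows)
```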

Why NoSQL?
■  Rapid “web-scale” growth for external entities/users
▪  Ability to support viral/burst traffic patterns
■  Most data does not (usually) need immediate consistency
▪  It is OK to lose some data; it is OK not to have ACID
■  Commodity hardware and the Cloud
▪  RDBMSs don’t run well on clusters (apologies: RAC world)
—  Shared Disk clusters are both a SPOF and expensive!
—  License costs for RDBMS on clusters
—  Failure of one component brings everything down
▪  Clustering cheaper commodity hardware is economical
—  Single or even a small number of failures affect a portion of
workload, not the whole application (due to sharding)
▪  Easier to create a “cloud” with commodity hardware

Why NoSQL?
■  Open patterns
▪  Almost all NoSQL products are open-source
▪  Relatively open learning
—  Meetups; Open seminars run by vendors
—  Lively blogs and passionate contributors
▪  Quick-and-easy installs
▪  Community versions from vendors
▪  Easy to install on for-rent cloud environments
▪  Monitoring/Alerting through open frameworks (Nagios, Ganglia)
■  Enterprise support through vendor
▪  10gen for MongoDB; DataStax for Cassandra; CouchBase
▪  Cloudera, Hortonworks, MapR for Hadoop
■  Large Webscale companies building own NoSQL databases

NoSQL Characteristics
■  “Schema before write” vs. “Schema during read”
▪  Caters to “unstructured” need
▪  Primarily solves Impedance mismatch
▪  Creates its own challenges
■  Modeled by read and write patterns
▪  “customer and orders” together for a customer centric view
▪  “product and orders” for a production/supply-chain centric view
▪  Alternative: Store twice
■  Data modeling driven by physical storage model
■  Read patterns
▪  Secondary indexing (overheads)
▪  Brute-force access via MapReduce jobs
▪  Store multiple, denormalized copies (“disk is cheap”)
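The “store twice” alternative can be sketched with two in-memory dictionaries standing in for two NoSQL tables or collections, each keyed for one read pattern. The structure and names are hypothetical:

```python
from collections import defaultdict

# Two denormalized copies of the same orders, each keyed for one read pattern.
orders_by_customer = defaultdict(list)   # customer-centric view
orders_by_product = defaultdict(list)    # product/supply-chain-centric view


def write_order(order):
    """One logical write fans out to both copies ("disk is cheap")."""
    orders_by_customer[order["customer"]].append(order)
    for item in order["items"]:
        orders_by_product[item["prodid"]].append(
            {"order_id": order["order_id"], "customer": order["customer"],
             "qty": item["qty"]})


write_order({"order_id": "1001", "customer": "John",
             "items": [{"prodid": "20001", "qty": 2}, {"prodid": "45320", "qty": 1}]})

# Each read pattern is now a single key lookup, with no join.
print(len(orders_by_customer["John"]))        # 1
print(orders_by_product["20001"][0]["qty"])   # 2
```

The cost is that one logical write now touches two places; if the second write fails, the copies disagree and the application has to reconcile them.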

NoSQL Characteristics
■  ACID is “relaxed”
▪  A transaction is limited to an aggregate (k-v pair)
▪  Enables distributed, shared-nothing architectures
▪  Ideal for clustered deployments
▪  Optimistic locking
▪  Some loss of data and consistency is expected (and catered to)
■  Write patterns
▪  UPDATEs converted to INSERTs (timestamped/tombstoned)
▪  Time-To-Live (TTL) based DELETE’s/Purges
▪  Compaction based garbage collection
▪  Reduced Write latency due to memory only writes
▪  Transaction logging supported in some NoSQL stores
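The write patterns above can be sketched as a toy, in-memory append-only store: an UPDATE is just a newer timestamped cell, a DELETE writes a tombstone, TTL cells expire, and compaction garbage-collects everything that is shadowed or dead. Class and method names are hypothetical, and real stores (Cassandra, for example) keep tombstones for a grace period before purging them, which this sketch skips:

```python
import time

TOMBSTONE = object()   # marker written by a DELETE instead of removing data in place


class AppendOnlyStore:
    """Toy last-write-wins store: every UPDATE/DELETE is appended as a new cell."""

    def __init__(self):
        self.log = []   # list of (key, timestamp, value, expires_at)

    def put(self, key, value, ts=None, ttl=None):
        ts = time.time() if ts is None else ts
        expires_at = ts + ttl if ttl is not None else None
        self.log.append((key, ts, value, expires_at))

    def delete(self, key, ts=None):
        self.put(key, TOMBSTONE, ts)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        cells = [c for c in self.log if c[0] == key]
        if not cells:
            return None
        _, _, value, expires_at = max(cells, key=lambda c: c[1])   # newest timestamp wins
        if value is TOMBSTONE or (expires_at is not None and expires_at <= now):
            return None
        return value

    def compact(self, now=None):
        """Keep only the newest live cell per key; drop shadowed, tombstoned, expired."""
        now = time.time() if now is None else now
        newest = {}
        for cell in self.log:
            if cell[0] not in newest or cell[1] > newest[cell[0]][1]:
                newest[cell[0]] = cell
        self.log = [c for c in newest.values()
                    if c[2] is not TOMBSTONE and (c[3] is None or c[3] > now)]


s = AppendOnlyStore()
s.put("cart:42", ["tires"], ts=1)
s.put("cart:42", ["tires", "pump"], ts=2)   # an UPDATE is just a newer INSERT
s.put("session:7", "token", ts=2, ttl=30)   # expires at ts=32
s.delete("cart:42", ts=3)                   # DELETE writes a tombstone
print(len(s.log), s.get("cart:42", now=4), s.get("session:7", now=4))   # 4 None token
s.compact(now=40)
print(len(s.log))                           # 0
```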

Why use an RDBMS then?!
■  ACID may be a hard business requirement
▪  Data loss can never be tolerated
▪  Data inconsistency can never be tolerated (e.g. Money
movement)
■  Complex data models favor RDBMS
▪  Try modeling Oracle EBS in NoSQL ☺
■  Standardized interface via SQL
▪  Broadly same across all RDBMS
▪  Well understood, skills availability
■  Inter-application integration
▪  Single platform for data created its own ecosystem
■  Cost to change is prohibitive

Introducing the CAP Theorem
■  Eric Brewer’s conjecture at the July 2000 ACM Symposium on Principles of Distributed Computing (PODC)
■  Formalized by Seth Gilbert and Nancy Lynch in 2002
■  Any networked shared-data system can have at most two of
three desirable properties:
▪  At least one Consistent (C) up-to-date copy of the data
▪  high Availability (A) of that data (for both reads and updates)
▪  tolerance to network Partitions (P)
■  Core systemic requirements in a distributed environment
▪  Special symbiotic relationship
▪  Present during design and deployment of applications in a
distributed environment (whether acknowledged or not)
■  Applies well to the distributed NoSQL world

Components of the CAP Theorem
■  (C)onsistency
▪  All clients see the same results from a query, even in the
presence of an update at the same time as the query
■  High (A)vailability
▪  All clients can write or access data, even in the presence of
system failures. Requestors receive acknowledgment of
success or failure
▪  Performance may degrade, but consuming applications are able
to access data even though some parts of the system may not
be operational at the time of a query
■  (P)artition Tolerance
▪  The system returns results regardless of failures in
communication between partitions in the distributed system; i.e.
system property holds true even if there is a network partition

Illustrating the CAP Theorem (adapted)
■  You start a small business: Provide phone reminders/information
■  Customers call with information; You call back/respond to remind
■  Start small: All information written down in your (single) notebook
■  Business grows: Wife is recruited (scale out, PBX shards calls)
■  Inconsistency: Response misses info updated in Wife’s notebook
■  Resolve inconsistency: All notebooks updated when call ends (lock)
■  Wife’s day off: You leave sticky notes (Inconsistent until next day)
■  Wife fights with you: Network Partition (sticky notes thrown away)
■  You have a choice here: CAP Theorem in play – Pick two
▪  (C) Always provide consistent information to clients
▪  (A) Business is always open if at least one of you is present
▪  (P) Business is open even during a loss of communication between the two of you
■  Run around clerk: Eventual consistency and Compaction

Examples of CAP Theorem pairs
■  Consistency and Partition Tolerance (CP): Banking Transaction at an ATM
▪  Data needs to be consistent in the presence of updates
▪  If there is a network failure, dispense cash but limit the transaction amount
▪  Transaction still available, but system property changed due to network partition
■  Consistency and Availability (CA): Database System-of-Record
▪  Data Consistency is key
▪  During a network failure, clients stop writing (no redo), so there is no write availability
▪  Present in Oracle Data Guard’s Maximum protection mode/Single node DB
■  Availability and Partition Tolerance (AP): Shopping cart in Amazon.com
▪  Spread data across multiple partitions to be always available
▪  Reconcile cart at checkout (may result in dual purchases!)
▪  Sacrifices consistency, but works for most cases, most of the time

CAP Theorem in the Oracle World
■  Application Scalability: Some well-known techniques
▪  Partition workload by function
—  Schema level split: data unrelated to each other is segregated
—  Typically provides headroom for main workload/environment
▪  Distribute transactions
—  For related data that still needs to be viewed together
—  Typically using Database links
—  Typically for master lookups and remote writes
—  Introduces dependencies (more on that soon)
▪  Decouple work asynchronously
—  Use AQ to write tokens or keys to process later
—  Introduces a “delay”: Data not immediately consistent

CAP Theorem in the Oracle World
■  Application Scalability: Some well-known techniques
▪  Offload reads using Active Data Guard (DB 11g and above)
▪  DG copy opened for reads during Real Time Apply
▪  DG allows Redo Data shipping in 3 modes
—  Maximum Protection: Zero loss but dependent on remote redo write
—  Maximum Performance: Remote redo written asynchronously
—  Maximum Availability: Switches to Max Performance mode on remote
redo write failure, operates in Max protection mode otherwise
▪  Offers multiple shades of availability and protection
▪  ADG and “read your writes” pattern
—  Real Time Apply is not the same as “instant” apply
—  Not “immediately consistent” but “eventually consistent”

CAP Theorem in the NoSQL World
■  Realization of CAP enabled NoSQL to “break free”
▪  Opened minds of database developers
■  However, the “2 of 3” rule was somewhat misleading
▪  NoSQL datastores offer options to vary consistency/durability
and availability levels
▪  MongoDB has “Write Concern” – Unacknowledged,
Acknowledged, Journaled, Replica Acknowledged
▪  Cassandra has Write Consistency: From ANY to ALL
■  Reality is a spectrum between C and A in the presence of P
▪  Eventual Consistency is a given
▪  Some data loss is expected
▪  Application code/other techniques will need to cater for this

Sharding and Replication in NoSQL
■  NoSQL datastores: essentially shared-nothing clusters
■  Relaxing ACID allows distributed processing (CAP applies!)
■  Ability to scale out reads/writes is the key
■  Achieved using two techniques: Sharding and Replication
■  Sharding: Divide and Rule
▪  Data is read/written to different servers (“shards”)
▪  Location determined by applying a fixed function to a known key
▪  Different functions: Modulo, Hash, Range, Programmatic
▪  Efficacy of load balancing dependent on function and data
▪  Typically used for Write-scaling (more than Read-scaling)
▪  (Hash partitioned tables/indexes are essentially object level
sharding in Oracle databases to enable write scaling)
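The modulo, hash and range functions listed above can each be sketched in a few lines. The shard count, range boundaries and function names are assumptions for illustration; “programmatic” sharding (a lookup directory maintained by the application) is not shown:

```python
import bisect
import hashlib

NUM_SHARDS = 4


def modulo_shard(key: int) -> int:
    """Modulo: numeric key mod shard count."""
    return key % NUM_SHARDS


def hash_shard(key: str) -> int:
    """Hash: stable digest of the key, then mod."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Range: exclusive upper bounds of shards 0..2, e.g. by customer last name.
RANGE_BOUNDS = ["F", "M", "T"]   # shard 0: < "F", 1: < "M", 2: < "T", 3: the rest


def range_shard(key: str) -> int:
    """Range: binary-search the sorted boundaries."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


print([modulo_shard(k) for k in (100, 101, 102, 103, 104)])                 # [0, 1, 2, 3, 0]
print(range_shard("Adams"), range_shard("Kanagaraj"), range_shard("Zhou"))  # 0 1 3
print(hash_shard("customer:1001") == hash_shard("customer:1001"))           # True
```

The hash variant uses a stable digest rather than Python's built-in hash(), which is salted per process and would route the same string key differently after a restart.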

Sharding and Replication in NoSQL
■  Sharding (contd.)
▪  Difficult, if not impossible to change function once implemented
▪  No consistency across shards, or across aggregates
▪  No joins allowed – no cross-shard dependencies
▪  Resilience does not improve (but enables partial availability)
▪  Not to be implemented lightly: Start single if you can
▪  Many NoSQL stores allow auto-sharding (e.g. CouchBase)
■  Replication: Allow multiple copies
▪  Master-Slave model: Simplest, Scales out reads only; Read
resilience; May need to cater for eventual consistency
▪  Peer-to-Peer or Multi-Master model: Scales out reads and
writes, but consistency/conflict resolution is a big problem
■  Can combine Sharding and Replication!
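Why is the sharding function so hard to change once implemented? With a modulo function, going from 4 to 5 shards relocates most keys. The helper below is a hypothetical illustration that simply counts them:

```python
def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose modulo shard changes when the shard count changes."""
    moved = sum(1 for k in range(num_keys) if k % old_shards != k % new_shards)
    return moved / num_keys


print(moved_fraction(1_000_000, 4, 5))   # 0.8
```

Only keys whose remainders agree under both shard counts stay put (4 out of every 20 here), so 80% of the data has to be copied between servers to add a single shard.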

The NoSQL Datastore Landscape
■  Generally four types:
▪  Key-Value
▪  Document
▪  Column Family
▪  Graph
■  Not using the relational model, i.e. schema-less
▪  But not without a Data Model!
■  Runs on clusters of commodity hardware
■  Generally Open Source
■  Can be considered as storing/retrieving “aggregates”
▪  a collection of related objects that can be treated as a unit
■  Usually described by “Keys” and “Values” (i.e. K-V pairs)

Key-Value NoSQL stores
■  The most basic of NoSQL stores
■  Simple K-V structure: A “blob” of data (“Value”) indexed and
accessed via a “Key”
■  “Value” part also known as Aggregate
■  Aggregate is a collection of related objects treated as a unit
■  Written/Updated/Read/Consistent as single, smallest unit
■  Typically, aggregate is limited in size (BLOB in Oracle)
■  Typically, expressed in JSON, and sometimes in XML
■  JSON/XML aggregates are self-describing
■  Value is “opaque” in a K-V store, but is simple
■  Scale out with sharding
■  Examples of K-V store: Riak, Oracle NoSQL
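A minimal in-memory sketch of the K-V contract described above: the application serializes the aggregate (here as JSON) and the store treats it as an opaque blob reachable only by key. The KeyValueStore class is hypothetical; real K-V stores such as Riak or Oracle NoSQL add persistence, replication and sharding:

```python
import json


class KeyValueStore:
    """Toy K-V store: the value is an opaque blob the store never looks inside."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value          # whole aggregate replaced as one unit

    def get(self, key: str):
        return self._data.get(key)       # access is only by key

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
session = {"user": "john", "cart": ["20001", "45320"], "locale": "en_US"}
store.put("session:9f2c", json.dumps(session).encode("utf-8"))   # application serializes
restored = json.loads(store.get("session:9f2c"))                   # application deserializes
print(restored["cart"])   # ['20001', '45320']
```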

Key-Value NoSQL stores
■  Typical Use cases
▪  Shines when you need simple GET/PUT operations
▪  Session state; Tokens – Enables web-scale
▪  User profiles and preferences – Typically latent caching layer
▪  Latency bridge: Support RYOWs (read-your-own-writes) in some cases
■  Anti-patterns
▪  No ad-hoc query patterns - (i.e. need key to access)
▪  Not meant for analytics type workload
▪  When multi-key/multi-operation consistency is required
▪  Set based operations (i.e. related data)

Document NoSQL stores
■  Datastore able to understand and manipulate structures
■  Needs to follow an agreed format
▪  usually JSON, but also BSON, XML and YAML
■  Support for secondary indexes
▪  Needs ability to understand/index K-V pairs in the aggregate
▪  Secondary indexes may throttle write rate
■  Aggregate size usually limited
■  Scale-out again supported via sharding
▪  Some stores support multiple sharding methods (MongoDB)
■  K-V stores sometimes evolve into Document stores
▪  E.g. CouchBase evolution
■  Needs embedding/linking support (size/other limitations)
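A toy sketch of what a secondary index adds in a document store: the store looks inside each document to maintain the index on every write (the write overhead noted above), and queries on the indexed field avoid a full scan. The class is hypothetical, indexes a single top-level field, and does not handle updates or deletes:

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document store with an optional secondary index on one top-level field."""

    def __init__(self, indexed_field=None):
        self.docs = {}                      # primary key (_id) -> document
        self.indexed_field = indexed_field
        self.index = defaultdict(set)       # field value -> set of _ids

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        if self.indexed_field in doc:       # extra work on every write: index maintenance
            self.index[doc[self.indexed_field]].add(doc["_id"])

    def find_by(self, field_name, value):
        if field_name == self.indexed_field:                       # index lookup
            return [self.docs[i] for i in sorted(self.index.get(value, ()))]
        return [d for d in self.docs.values() if d.get(field_name) == value]  # full scan


orders = DocumentCollection(indexed_field="customer")
orders.insert({"_id": 1001, "customer": "John", "ord_tot": 222})
orders.insert({"_id": 1002, "customer": "Mary", "ord_tot": 54})
print([d["_id"] for d in orders.find_by("customer", "John")])   # [1001]
print([d["_id"] for d in orders.find_by("ord_tot", 54)])        # [1002] (full scan)
```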

Document NoSQL stores
■  Typical Use cases
▪  Of course, any collection of document-type models
▪  Easy-to-start NoSQL projects when moving from RDBMS
▪  Almost any NoSQL use case needing secondary index access
▪  Content and Metadata store: typically multiple keys
▪  Queries using materialized views (CouchBase)
▪  Non-trivial sharding (MongoDB)
▪  Horizontally scaled or Cached reads (MongoDB, CouchBase)
▪  Models requiring simple relationships (Blogs, User modeling)
■  Anti-patterns:
▪  Not a drop-in replacement for RDBMS
▪  Evolving relationships or query patterns
▪  Usually not good for write-heavy

Column Family NoSQL stores
■  Characteristics of CF Stores
▪  Data is mostly organized by sets of columns
▪  Key – Value based access
▪  “Value” consists of sets or ranges of columns
▪  Still unstructured
▪  No joins (except via another keyed table, using MapReduce)
■  Cassandra, HBase, Amazon SimpleDB are prime examples
▪  HDFS on a Hadoop cluster underlies HBase
▪  HBase evolved from Google’s BigTable
▪  Cassandra originated at Facebook
▪  Cassandra also supports CQL (a SQL like language)

Column Family NoSQL stores
▪  Data is mostly organized by sets of columns (super columns)
▪  Key – Value based access
▪  “Value” consists of sets of columns (but still unstructured)
▪  Lots of repeated sets of values (e.g. Customer transactions)
▪  No joins (except via another keyed table, using MapReduce)
▪  Write-intensive patterns (Internet-of-Things type data)
▪  Rolling expiry patterns such as Time series data
▪  IMHO Low-latency reads (in comparison to other NoSQL stores)
▪  Need access via secondary or other keys

Graph NoSQL stores
■  Stores Nodes and Edges
■  Provides “Index-free Adjacency”
■  Nodes are entities: People, Accounts, Items, Locations
■  Edges connect Nodes to other Nodes
■  Edges have properties
■  Can mine patterns present in these relationships
■  Supports graph-like queries:
▪  Shortest distance between two locations
▪  Social Graphing: Connecting people
▪  Products that your friends liked
■  Neo4j is a well-known graph database
■  Giraph: An open source graph processing system (FB!)
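A shortest-path query of the kind listed above is a breadth-first search over relationships. In this sketch each hypothetical Node object holds direct references to its neighbours, mimicking index-free adjacency; edges are unweighted and edge properties are omitted:

```python
from collections import deque


class Node:
    """A graph node holding direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def connect(a, b):
    """Undirected edge: each endpoint stores a reference to the other."""
    a.neighbours.append(b)
    b.neighbours.append(a)


def shortest_path(start, goal):
    """Breadth-first search: fewest hops between two nodes (unweighted edges)."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node is goal:
            path = []
            while node is not None:
                path.append(node.name)
                node = previous[node]
            return path[::-1]
        for neighbour in node.neighbours:   # follow references, no index lookup
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


alice, bob, carol, dave, erin = (Node(n) for n in ["Alice", "Bob", "Carol", "Dave", "Erin"])
for a, b in [(alice, bob), (alice, carol), (bob, dave), (carol, dave), (dave, erin)]:
    connect(a, b)
print(shortest_path(alice, erin))   # ['Alice', 'Bob', 'Dave', 'Erin']
```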

Graph NoSQL stores
■  Typical Use Cases
▪  Social Graphs
▪  Recommendation Engines
▪  Graph traversal use cases
▪  Relationships with defined end-points
▪  Routing and Location based solutions
▪  Account Linking (e.g. for fraud detection; peer risk checking)
▪  Scale-out via sharding not supported in some products
▪  Update all/Update most patterns
▪  Dangling end-points

Some more concepts: JSON
■  You need to understand JSON
▪  JavaScript Object Notation
▪  Self describing, English text key-value pairs
▪  In other words, a simpler version of XML
▪  No externally imposed structure (hint: No table/column mapping!)
{
"id":101,
"first_name":"John",
"second_name":"Kanagaraj",
"residential_address":[{"add1":"20 First St", "city":"San Jose", "state":"CA"}],
"phone":"408-555-9999"
}
▪  Can you spot some optimization here?
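Because the keys travel with the values, a JSON document can be parsed with no schema at all. The sketch below uses Python's standard json module on the document above; the key-shortening step is only one possible answer to the slide's question, offered as an assumption:

```python
import json

doc_text = """{
  "id": 101,
  "first_name": "John",
  "second_name": "Kanagaraj",
  "residential_address": [{"add1": "20 First St", "city": "San Jose", "state": "CA"}],
  "phone": "408-555-9999"
}"""

person = json.loads(doc_text)                     # no schema needed: keys describe the data
print(person["residential_address"][0]["city"])   # San Jose

# One possible optimization (an assumption): key names are stored again in every
# document, so shorter keys shrink each stored copy.
short_keys = {"id": "i", "first_name": "fn", "second_name": "sn",
              "residential_address": "ra", "phone": "ph"}
compact = {short_keys[k]: v for k, v in person.items()}
print(len(json.dumps(person)), len(json.dumps(compact)))
```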

Some more concepts: Languages
■  You need to understand JVMs and some Java
▪  Many NoSQL stores use JVM based programs
▪  E.g. Hadoop, Cassandra
▪  Ability to understand JVM’s and their internals is key
▪  JVM’s Garbage Collection needs to be managed
▪  Need to understand/configure JMX (Java Management Extensions)
▪  Most NoSQL stores support Java API’s out of the box
■  Most NoSQL stores support more than just Java
▪  E.g. Python, Ruby, Perl, C/C++, Node.js, Go
▪  Less-well known ones such as Erlang, Haskell, Scala
▪  Need to be able to install and troubleshoot app issues
■  Deploy/Management: Puppet, Nagios, Ganglia, Fab
▪  Frameworks can do more than just NoSQL!

MongoDB: Document datastore
[Figure: MongoDB sharded cluster. A client connects through two MongoS routers (with MongoD config servers); data is sharded across Replica Set 1 and Replica Set 2, each with one MongoD master and two MongoD slaves]
•  Write scaling through sharding (MongoS)
•  Read scaling via replica sets
•  Writes go to the Master node; reads from Master and Slave nodes (optional)

MongoDB: Data Modeling
[Figure: RDBMS to MongoDB terminology, and one order modeled as relational tables versus a single MongoDB document]
RDBMS          MongoDB
Database       Database
Table          Collection
Row            Document
RowID          _id
Index          Index
Join           Embedded Document (DBRef)
Foreign Key    Reference

Order ID: 1001
Customer: John
Order Line Items:
20001 – Tires – 2 x $84 - $168
45320 – Pump – 1 x $54 - $54
Payment Details:
Card: Amex
CC: 3425268768
Exp: 03/17
Total: $222

Relational tables: Order, Customer, Line Items, Financial Instrument, FinTrans Journal

{
"order_id": "1001",
"customer": "John",
"orderitems": [
{"prodid": "20001", "prodname": "Tires", "Qty": 2, "price": 168},
{"prodid": "45320", "prodname": "Pump", "Qty": 1, "price": 54}
],
"pcard": "Amex", "pcc": "3425268768", "pexp": "03/17", "ord_tot": 222
}

MongoDB: Essentials
■  Stands for “huMONGOus DataBase”
■  Reads and Writes using memory-mapped files
▪  Try to fit the working set in memory
▪  Use SSDs for faster I/O
■  Very good index support on identified JSON fields
▪  Allows Key-Value, Range and text search queries
▪  Unique as well as Compound Indexes
▪  Special TTL (Time-to-Live) index to retire data
■  Stores documents in BSON format (Binary JSON)
■  Interact, manage, program through Mongo Shell
■  Many other drivers and interfaces
■  Support for Geospatial data and queries
■  Aggregation Framework and MapReduce support

MongoDB Physical/Memory Mapping

MongoDB: Essentials
■  Query optimizer exposes execution plan
■  Multiple sharding methods:
▪  Range-based sharding: Optimized for range queries
▪  Hash-based sharding: Ensure uniform distribution
▪  Tag-aware sharding: Partitioned by user-specified configuration
■  Write-ahead journaling
▪  Journal commits every 100ms (oplog is capped collection)
■  Configurable Write-availability via Write Concern
▪  Unacknowledged (memory only)
▪  Acknowledgement for specific levels:
—  Write to at least 2 replicas in the same datacenter
—  Write to at least 1 replica in remote datacenter
■  Commercially supported by 10gen (now called MongoDB, Inc.)

MongoDB: The Not-so-good…
■  Reads block Writes (albeit for very short periods ~ microsecs)
▪  Be careful about aggregation/MapReduce: Intense reads
▪  Read lock yields when read has to go to disk
▪  Read locks can be shared by multiple readers
■  Writes block Reads (Writer-greedy, for very short periods)
■  Locks are at a “database” level
▪  Careful with your data model!
▪  Typically restrict one collection per database if possible
▪  Write to multiple documents will yield periodically
■  Index creation (writes) locks your entire database
■  Replicates to slaves and locks all slaves in the replica set
■  Compaction also locks the database
■  Secondaries block on replication writes

CouchBase – Another Document Store
[Figure: Couchbase cluster, multitenant architecture. Clients read/write documents from/to data buckets; user/application data is partitioned by bucket, and the buckets live on server nodes that form a dynamically scalable cluster]

CouchBase Single-Node Architecture
[Figure: Couchbase single-node architecture. Cluster Manager (Erlang/OTP): replication, rebalance and shard state manager; REST management API/Web UI and Admin Console on port 8091. Data Manager: data access ports 11210/11211; object-managed cache; storage engine; Query API (HTTP) on port 8092 and Query Engine]

CouchBase: Background and Use cases
■  Created as a Merge of code and ideas:
▪  MemCache – An excellent memory only cache
▪  CouchDB – A document store
▪  Now a Persistent Cache
▪  Code in Erlang and C++ (??)
▪  Different ports for both products – now merging
▪  Lots of MemCache implementations
▪  Now can upgrade into CouchBase quickly – Moxi client
■  Primarily as a Caching solution
▪  Very fast for reads and writes
▪  Some concerns with cross data center replication
▪  IMHO - Not yet suited for RYOWs via secondary key

Cassandra: Column-Family datastore
[Figure: six Cassandra nodes (Node 1 to Node 6) and a client]
•  Hash function(Key) => Token
•  Client writes to selected Node as per Token
•  Coordinator Node replicates to other nodes (Timed per Quorum setting)
•  Node acknowledges to coordinator
•  Acknowledgement to client
•  Data written to internal commit log
•  If a node goes offline, writes to that node stop
•  When the node rejoins, a “hinted handoff” process completes the pending writes + “read repair”
•  Requests can range from ANY to ALL
•  ANY: Write to commit log on at least 1 node
•  ALL: Writes complete to memory and commit log on ALL replicas
•  Availability precedes Consistency (AP)
•  Read and Write Paths are separate

Cassandra: Column-Family datastore
[Figure: Cassandra write path. Memory: Memtable. Disk: Commit log, SSTable and its Index]
(1)  Write: (K1, {C1:V1})
(2)  Write: (K1, {C2:V2})
(3)  Write: (K2, {C1:V3, C2:V4})
(4)  Write: (K1, {C1:V5, C3:V6})
Commit log (every write appended): K1 C1:V1 | K1 C2:V2 | K2 C1:V3 C2:V4 | K1 C1:V5 C3:V6
Memtable, flushed to SSTable: K1 → C2:V2, C1:V5, C3:V6; K2 → C1:V3, C2:V4
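The write path in the figure can be replayed with a toy model: each write is appended to a commit log and merged into an in-memory memtable, and a flush writes the memtable out as a sorted, immutable SSTable. The class is hypothetical and omits the SSTable index, Bloom filters and compaction:

```python
class ColumnFamilyWriter:
    """Toy Cassandra-style write path: commit log append + memtable merge, then flush."""

    def __init__(self):
        self.commit_log = []   # append-only (durable on disk in a real node)
        self.memtable = {}     # row key -> {column: value}, merged in memory
        self.sstables = []     # immutable, sorted snapshots written on flush

    def write(self, key, columns):
        self.commit_log.append((key, dict(columns)))        # 1. append to commit log
        self.memtable.setdefault(key, {}).update(columns)    # 2. merge into memtable

    def flush(self):
        """Write the memtable out as a sorted, immutable SSTable and start a new one."""
        sstable = {k: dict(sorted(cols.items())) for k, cols in sorted(self.memtable.items())}
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []   # a real node only recycles log segments covered by the flush
        return sstable


cf = ColumnFamilyWriter()
cf.write("K1", {"C1": "V1"})
cf.write("K1", {"C2": "V2"})
cf.write("K2", {"C1": "V3", "C2": "V4"})
cf.write("K1", {"C1": "V5", "C3": "V6"})
print(len(cf.commit_log))   # 4
print(cf.flush())
# {'K1': {'C1': 'V5', 'C2': 'V2', 'C3': 'V6'}, 'K2': {'C1': 'V3', 'C2': 'V4'}}
```

The later write to K1 overwrites C1 in memory, so the flushed SSTable holds only V5 for C1, while the commit log recorded all four writes until the flush.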

Cassandra: Essentials
■  Write Path is simpler; Reads are a little more complex
▪  Merge Memtable (Row/Key cache) and Row Reads from Disk
▪  Uses Bloom Filter to decide which SSTables to skip (false positives possible)
▪  In-memory caches are stored in Java heap (GC!!!!)
▪  Can return inconsistent data for RYOW (depending on Quorum)
▪  Consistent: (nodes_written + nodes_read) > replication_factor
■  Compaction: Merge SSTables; Expire Tombstoned data (TTL)
■  Data Modeling:
▪  Model your queries – Optimize for reads
▪  Denormalize – Reads: Slow; Writes: Fast; Disk: Cheap
▪  Columns within a row are stored sorted by column name (often a timestamp for time-series data)
■  CQL: Cassandra Query Language – A familiar interface
■  Maintaining the Cluster: Gossip and Snitch ☺
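The consistency rule on the read path above, (nodes_written + nodes_read) > replication_factor, can be checked with simple arithmetic. This sketch assumes QUORUM means a majority of replicas (RF // 2 + 1) and ignores failures during the write; the function names are hypothetical:

```python
def quorum(replication_factor: int) -> int:
    """QUORUM = a majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(nodes_written: int, nodes_read: int,
                           replication_factor: int) -> bool:
    """Read and write replica sets must overlap in at least one node: W + R > RF."""
    return nodes_written + nodes_read > replication_factor


RF = 3
print(quorum(RF))                                          # 2
print(reads_see_latest_write(quorum(RF), quorum(RF), RF))  # True  (2 + 2 > 3)
print(reads_see_latest_write(1, 1, RF))                    # False (ONE/ONE may miss a write)
```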

Choosing the right NoSQL database:
ASCII the right question!
■  Is this a site-facing, P1 Application?
■  Is this a BI/Analytics type problem waiting to be solved?
■  Is this Write Intensive or Read Intensive?
■  Is this a Caching problem?
■  Can the application afford some data loss?
■  What about data consistency?
■  What is more important – consistency or availability?
■  How many data centers need to be supported?
■  What are the query patterns? Are they widely varying?
■  How many distinct clusters of data are present, and how are
they related?
■  Is my organization ready to support this product?

Generic problems
■  Consistency is and will be a problem in the NoSQL world
■  Data loss will be present - application should cater to this
▪  Consider the cost of workarounds/cost of data loss
■  The world of NoSQL is evolving:
▪  Maturing slowly: Peak -> Sliding into the trough
▪  Too many choices: 150 choices: http://nosql-database.org/
▪  Many picking the wrong product…
—  (and had to change it later: Check my Delicious stream #nosql)
▪  Most NoSQL vendors still VC funded
▪  New Versions/Features every 6 months!
▪  We will learn lessons the hard way…..

Real World problems
■  Need to break out of the RDBMS/ACID world
▪  Imagine a world with no COMMITs, no “Transactions”
▪  Data loss and Data inconsistency are inevitable
▪  Data Owners/Architects shy away: FUDs, Real dangers
■  Everyone wants to become (or is!) a NoSQL expert
▪  Spell NoSQL and earn $$$ ☺
▪  Best way to learn: Create a “Big Data” need and fulfill it
▪  Who makes the decisions?
■  Lack of skills and maturity
▪  Product choice: Knowledge/Experience/Forethought required
▪  Many NoSQL products still basic in functionality
▪  Be prepared to back out of your initial choice

How to get there (from here)?
■  This presentation is just the beginning
■  Lots and lots of reading and experimenting required
■  Recommended Reading:
▪  NoSQL Distilled by Fowler and Sadalage
▪  Seven Databases in Seven Weeks: Redmond and Wilson
▪  Many NoSQL books – browse at Safari Online
■  Lots of links to read – Live links:
▪  Follow me on http://delicious.com/jkanagaraj - Tag #nosql
■  Play with the community versions:
▪  Available from the vendors: No support though
▪  Spin up/use Cloud based VMs – Rackspace or AWS

A warning – And some advice
“Some people, when confronted with a big data
problem, think, I’ll use Hadoop. Now, they have a
big data problem and a big Hadoop cluster”
Dmitry Ryaboy, Engineering Manager, Twitter
▪  Start small
▪  Grow with success
▪  Create your own expertise
▪  It is about the untapped potential in your data

Please fill in the feedback form!

Link up with me on LinkedIn

John Kanagaraj, PayPal, an eBay Inc. Company
