2. The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
3. Agenda
• NoSQL Use Case
• Oracle NoSQL Database
• Architecture
• Integration with the RDBMS
• Benchmark Results
4. Use Case – Online Display Advertising
• Problem
• Very low latency requirements – Publishers require 50 – 60 ms response
time from the ad serving platform
• Extreme data velocity – Multi-millions of requests per second
• Highly available – 24/7 sites
• Revenue maximization – Deliver the most relevant ad to maximize
revenue
• Solution – Where to use a NoSQL Database?
• Cookie store – NoSQL database used to store cookies and associated
behavioral segments
• Track behavioral data – Beacons utilized during browsing to store
timestamp, frequency, and behavioral segments by cookie
• Optimize ad delivery – Recency, frequency, and behavioral segments
used to determine optimal ad to deliver to user
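The cookie-store idea above can be sketched in a few lines. This is a minimal illustration only, with a plain dictionary standing in for the NoSQL database; the names CookieProfile, record_beacon, choose_ad, and the one-hour recency window are assumptions made for the example and do not come from the slides.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CookieProfile:
    """Hypothetical per-cookie record: last-seen time, visit count, segments."""
    last_seen: float = 0.0
    frequency: int = 0
    segments: set = field(default_factory=set)


# Stand-in for the NoSQL cookie store: cookie id -> profile.
cookie_store = {}


def record_beacon(cookie_id, segment, now=None):
    """Update recency, frequency, and segments when a beacon fires."""
    now = time.time() if now is None else now
    profile = cookie_store.setdefault(cookie_id, CookieProfile())
    profile.last_seen = now
    profile.frequency += 1
    profile.segments.add(segment)
    return profile


def choose_ad(cookie_id, ads, now=None, max_age_s=3600.0):
    """Pick an ad for a cookie from `ads` (segment name -> ad id).

    Unknown or stale cookies get the "default" ad; otherwise the first
    matching segment (in sorted order) wins.
    """
    now = time.time() if now is None else now
    profile = cookie_store.get(cookie_id)
    if profile is None or now - profile.last_seen > max_age_s:
        return ads["default"]
    for segment in sorted(profile.segments):
        if segment in ads:
            return ads[segment]
    return ads["default"]
```

A real ad server would weigh recency and frequency in a scoring model; the lookup-by-cookie access pattern is the part that maps onto a key-value store.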
5. Online Display Advertising Overall Solution
[Diagram components: Ad Server; RDBMS; Hadoop Cluster; Real Time Reporting and Campaign Management; Multi Dimensional Reporting]
6. Online Display Advertising – Usage Characteristics
• NoSQL Database
• Low latency, high volume
• Millions of ad serving requests per minute or second
• Stringent latency requirements from publishers
• Loose consistency
• Cookie data used for ad targeting – Increase probability that user will click on ad.
• Relational Database
• Campaign booking information – hundreds of users
• Real time business metrics for publishers and advertisers
• Business financials for ad serving company
• Year to date revenue, quarter over quarter etc.
• Billing
• SOX reporting for public companies
• Hadoop
• Unique visits (select count(distinct)) over many terabytes of data
• Inventory forecasting across behavioral segments
7. Agenda
• NoSQL Use Case
• Oracle NoSQL Database
• Architecture
• Integration with the RDBMS
• Benchmark Results
8. A Distributed, Scalable Key-Value Database
• Simple Data Model
• Key-value pair with major+minor-key paradigm
• CRUD + range scans
• Scalability
• Dynamic data partitioning and distribution
• Optimized data access via intelligent driver
• High availability
• One or more replicas
• Resilient to partition failures
• Disaster recovery through location of replicas
• No single point of failure
• Transparent load balancing
• Reads from master or replicas
• Driver is network topology & latency aware
• Elastic Expansion
• Online addition/removal of storage nodes and automatic data redistribution
[Diagram: applications, each with a NoSQL DB driver, connected to storage nodes in Data Center A and Data Center B]
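The major+minor key model with CRUD and range scans can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the Oracle NoSQL Database API: the class KeyValueStore and its method names are hypothetical, and keys are plain strings kept in sorted order so that a scan over one major key is a contiguous walk.

```python
import bisect


class KeyValueStore:
    """Toy in-memory store keyed by (major, minor) string pairs, kept sorted."""

    def __init__(self):
        self._keys = []    # sorted list of (major, minor)
        self._values = {}  # (major, minor) -> value

    def put(self, major, minor, value):
        """Create or update one record."""
        key = (major, minor)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, major, minor):
        """Return the value, or None if the record does not exist."""
        return self._values.get((major, minor))

    def delete(self, major, minor):
        """Remove one record; return True if it existed."""
        key = (major, minor)
        if key in self._values:
            del self._values[key]
            self._keys.remove(key)
            return True
        return False

    def range_scan(self, major, start_minor="", end_minor=None):
        """Yield (minor, value) under one major key, minor in [start, end)."""
        i = bisect.bisect_left(self._keys, (major, start_minor))
        while i < len(self._keys):
            k_major, k_minor = self._keys[i]
            if k_major != major:
                break
            if end_minor is not None and k_minor >= end_minor:
                break
            yield k_minor, self._values[(k_major, k_minor)]
            i += 1
```

For example, storing a cookie's segments under minor keys that start with "seg/" lets a single range scan on that cookie's major key return all of its segments.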
9. Architecture – The Application’s Perspective
[Diagram: the application calls the NoSQL DB driver, which sends requests to Shard 1, Shard 2, ... Shard N; each shard consists of one master and its replicas]
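The routing role of the driver in this picture can be sketched as follows. The slides do not describe the actual partitioning scheme, so hashing the major key modulo the shard count is an assumption made purely for illustration; the Shard and Driver classes and the require_master flag are likewise hypothetical.

```python
import hashlib
import random


class Shard:
    """One shard: a master node plus a fixed number of replicas."""

    def __init__(self, name, replica_count=2):
        self.master = f"{name}-master"
        self.replicas = [f"{name}-replica{i}" for i in range(1, replica_count + 1)]


class Driver:
    """Toy client-side driver: maps a major key to a shard, then to a node."""

    def __init__(self, shards, rng=None):
        self.shards = shards
        self.rng = rng or random.Random()

    def shard_for(self, major_key):
        # Hashing only the major key keeps all records that share it on one shard.
        digest = hashlib.sha256(major_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def node_for_write(self, major_key):
        # Writes always go to the shard's master.
        return self.shard_for(major_key).master

    def node_for_read(self, major_key, require_master=False):
        # Reads may be served by the master or by any replica of the shard.
        shard = self.shard_for(major_key)
        if require_master:
            return shard.master
        return self.rng.choice([shard.master] + shard.replicas)
```

A topology-aware driver would prefer the closest or least-loaded node instead of choosing at random; the random choice here only marks where that decision is made.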
10. Transactions
• ACID transactions at shard granularity
• Transaction Scope
• Single API call
• All records must have the same major key
• Multiple operations within a transaction via collections
• Can be relaxed for increased performance on a per-operation basis
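The transaction scope described above (one call, several operations, one major key) can be sketched as an all-or-nothing batch. This is an illustration only: execute_atomically, MajorKeyMismatch, and the tuple format for operations are hypothetical names chosen for the example, and a dictionary stands in for a shard.

```python
class MajorKeyMismatch(ValueError):
    """Raised when a batch touches more than one major key."""


def execute_atomically(store, operations):
    """Apply a list of operations to `store` as all-or-nothing.

    `store` maps (major, minor) -> value. Each operation is a tuple:
    ("put", major, minor, value) or ("delete", major, minor).
    """
    majors = {op[1] for op in operations}
    if len(majors) > 1:
        raise MajorKeyMismatch(f"operations span major keys: {sorted(majors)}")

    working = dict(store)  # work on a copy so a failure leaves `store` untouched
    for op in operations:
        kind, major, minor = op[0], op[1], op[2]
        if kind == "put":
            working[(major, minor)] = op[3]
        elif kind == "delete":
            if (major, minor) not in working:
                raise KeyError((major, minor))
            del working[(major, minor)]
        else:
            raise ValueError(f"unknown operation: {kind}")

    # Commit: every operation succeeded, so publish the new state.
    store.clear()
    store.update(working)
```

Restricting a batch to one major key is what lets the whole transaction run on a single shard, with no coordination between shards.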
11. Simple Data Model
ACID Transactions – Configurability
• Configurable Durability Policy
• Configurable Consistency Policy
12. Integration with the RDBMS and Other Products
• Oracle External Tables
• Export data directly from NoSQL database and create Oracle
External Table
• Pre-packaged utility
• Oracle Loader for Hadoop
• Parallel map reduce job
• Utilizes InputFormat
• Oracle Event Processing
• NoSQL data available through OEP query language (CQL)
13. Benchmarks – General Configuration
• YCSB-based QA/benchmarking
• Key ~= 10 bytes, Data = 1108 bytes
• Configurations of 6-30 nodes
• Typical Replication Factor of 3 (master + 2 replicas)
• 200 million records per shard, 2 billion records in total
• 2 replication nodes per storage node
• Used SSDs - Two of them per host
• Minimal I/O overhead
• B+Tree fits in memory => one I/O per record read
• Buffered writes + log-structured storage system => fast write throughput
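The figures on this slide imply a rough cluster size, worked through below. This is back-of-the-envelope arithmetic only: it assumes the 2 billion total, the replication factor of 3, and the 2 replication nodes per storage node all describe the same configuration, and it ignores B+Tree, log, and other storage overhead.

```python
KEY_BYTES = 10                    # "Key ~= 10 bytes"
VALUE_BYTES = 1108                # "Data = 1108 bytes"
RECORDS_PER_SHARD = 200_000_000   # 200 million records per shard
TOTAL_RECORDS = 2_000_000_000     # 2 billion records in total
REPLICATION_FACTOR = 3            # master + 2 replicas
REP_NODES_PER_STORAGE_NODE = 2

# Cluster shape implied by the record counts.
shards = TOTAL_RECORDS // RECORDS_PER_SHARD                       # 10
replication_nodes = shards * REPLICATION_FACTOR                   # 30
storage_nodes = replication_nodes // REP_NODES_PER_STORAGE_NODE   # 15

# Raw data volume (keys + values only).
logical_bytes = TOTAL_RECORDS * (KEY_BYTES + VALUE_BYTES)         # 2.236e12
replicated_bytes = logical_bytes * REPLICATION_FACTOR             # 6.708e12
per_storage_node_bytes = replicated_bytes // storage_nodes        # 4.472e11

print(f"{shards} shards, {replication_nodes} replication nodes, "
      f"{storage_nodes} storage nodes")
print(f"logical data: {logical_bytes / 1e12:.3f} TB, "
      f"replicated: {replicated_bytes / 1e12:.3f} TB, "
      f"per storage node: {per_storage_node_bytes / 1e9:.1f} GB")
```

Under these assumptions the largest run corresponds to 10 shards on 15 storage nodes, which falls within the 6-30 node range if "nodes" there means storage nodes, with roughly 447 GB of raw data per host spread over its two SSDs.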