The document summarizes Oracle's Big Data Appliance and related solutions. It covers the Big Data Appliance hardware: 18 servers, each with 48 GB of memory, 12 Intel cores, and 24 TB of storage. The software includes Oracle Linux, Apache Hadoop, Oracle NoSQL Database, Oracle Data Integrator, and Oracle Loader for Hadoop. Oracle Loader for Hadoop can be used to load data from Hadoop into Oracle Database in online or offline mode. The Big Data Appliance provides an optimized platform for storing and analyzing large amounts of data and is integrated with Oracle Exadata.
2. The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions.
The development, release, and timing of any features or
functionality described for Oracle’s products remain at the
sole discretion of Oracle.
3. Case: On-line Ads and Content
[Diagram. Low-latency (real-time) path: look up the user profile in NoSQL DB, add the user if not present, and feed the profile into an expert system that determines the best ad to place on the page for this user. Batch path: web logs in HDFS go through high-scale data reductions that feed BI and analytics, billing, and predictions on browsing. Other labels: Profiles, Actual ads served.]
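A minimal sketch of the low-latency path in this use case (look up the user profile, add the user if not present, pick an ad). Everything here is hypothetical: the in-memory dict stands in for the NoSQL DB profile store, and `choose_ad` is a placeholder rule, not the expert system from the slide.

```python
# In-memory stand-ins (assumptions, not Oracle APIs).
profiles = {}     # user_id -> profile; plays the role of the NoSQL DB profile store
served_log = []   # record of ads actually served, later usable for billing


def choose_ad(profile):
    """Hypothetical placeholder for the expert system: pick an ad from interests."""
    interests = profile.get("interests", [])
    return f"ad-for-{interests[0]}" if interests else "default-ad"


def serve_ad(user_id):
    """Look up the profile, add the user if not present, and choose an ad."""
    profile = profiles.get(user_id)
    if profile is None:
        # "Add user if not present"
        profile = {"interests": []}
        profiles[user_id] = profile
    ad = choose_ad(profile)
    served_log.append((user_id, ad))
    return ad


profiles["u2"] = {"interests": ["hiking"]}
print(serve_ad("u1"))  # new user, no interests known yet
print(serve_ad("u2"))  # existing user with a known interest
```

In the slide's architecture the batch side (web logs in HDFS) would be what enriches the profiles over time; that part is not modeled here.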
4. Agenda
• Big Data Technology
• Oracle Big Data Appliance
• Big Data Applications
• Summary
• Q&A
6. Big Data: Infrastructure Requirements
Acquire
• Low, predictable latency
• High transaction volume
• Flexible data structures
Organize
• High throughput
• In-place preparation
• All data sources/structures
Analyze
• Deep analytics
• Agile development
• Massive scalability
• Real-time results
12. Oracle Big Data Appliance Hardware
• 18 Sun X4270 M2 servers
– 48 GB memory per node = 864 GB memory
– 12 Intel cores per node = 216 cores
– 24 TB storage per node = 432 TB storage
• 40 Gb/s InfiniBand
• 10 Gb/s Ethernet
13. Big Data Appliance
Cluster of industry standard servers for Hadoop and NoSQL Database
• Focus on Scalability and Availability at low cost
Compute and Storage
• 18 high-performance, low-cost servers acting as Hadoop nodes
• 24 TB capacity per node
• Two 6-core CPUs per node
• Hadoop triple replication
• NoSQL Database triple replication
10GigE Network
• 8 10GigE ports
• Datacenter connectivity
InfiniBand Network
• Redundant 40Gb/s switches
• IB connectivity to Exadata
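The per-rack totals and the effect of triple replication can be checked with a few lines of arithmetic. This is a rough sketch: the per-node figures come from the slides, while the "unique data" estimate assumes every node's full capacity holds replicated data and ignores operating system, temp space, and file system overhead.

```python
# Per-node figures from the hardware slides.
NODES = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12          # two 6-core CPUs
STORAGE_TB_PER_NODE = 24
REPLICATION_FACTOR = 3       # Hadoop / NoSQL Database triple replication


def rack_totals(nodes=NODES):
    """Return raw per-rack totals for memory, cores, and storage."""
    return {
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "raw_storage_tb": nodes * STORAGE_TB_PER_NODE,
    }


def approx_unique_data_tb(raw_storage_tb, replication=REPLICATION_FACTOR):
    """Rough upper bound on unique data: raw capacity divided by replica count.

    Assumption: no allowance for OS, temp space, or file system overhead.
    """
    return raw_storage_tb / replication


totals = rack_totals()
print(totals)
print(approx_unique_data_tb(totals["raw_storage_tb"]), "TB of unique data (upper bound)")
```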
14. Scale Out to Infinity
Scale out by connecting racks to each other using InfiniBand
• Expand up to eight racks without additional switches
• Scale beyond eight racks by adding an additional switch
15. Oracle Big Data Appliance Software
• Oracle Linux 5.6
• Java HotSpot VM
• Apache Hadoop Distribution v0.20.x
• R Distribution
• Oracle NoSQL Database Enterprise Edition
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop
16. Why Open-Source Apache Hadoop?
• Fast evolution in critical features
• Built by the Hadoop experts in the community
• Practical instead of esoteric
• Focus on what is needed for large clusters
• Proven at very large scale
• In production at all the large consumers of Hadoop
• Extremely stable in those environments
• Well-understood by practitioners
17. Software Layout
• Node 1:
  – M: Name Node, Balancer & HBase Master
  – S: HDFS Data Node, NoSQL DB Storage Node
• Node 2:
  – M: Secondary Name Node, Management, Zookeeper, MySQL Slave
  – S: HDFS Data Node, NoSQL DB Storage Node
• Node 3:
  – M: JobTracker, MySQL Master, ODI Agent, Hive Server
  – S: HDFS Data Node, NoSQL DB Storage Node
• Node 4 – 18:
  – S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
  – Your MapReduce runs here!
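The layout above can be written down as a small lookup table, which makes it easy to ask which nodes run a given service. This is only an illustrative sketch: the dictionary is transcribed from the slide, "M" and "S" are kept as the slide's own labels (presumably master and slave services), and `nodes_running` is a hypothetical helper.

```python
# Services every node in the rack runs under "S" per the slide.
COMMON_S = ["HDFS Data Node", "NoSQL DB Storage Node"]

LAYOUT = {
    1: {"M": ["Name Node", "Balancer", "HBase Master"], "S": list(COMMON_S)},
    2: {"M": ["Secondary Name Node", "Management", "Zookeeper", "MySQL Slave"],
        "S": list(COMMON_S)},
    3: {"M": ["JobTracker", "MySQL Master", "ODI Agent", "Hive Server"],
        "S": list(COMMON_S)},
}

# Nodes 4 through 18 carry only worker services; MapReduce tasks run here.
for node in range(4, 19):
    LAYOUT[node] = {
        "M": [],
        "S": COMMON_S + ["Task Tracker", "HBase Region Server"],
    }


def nodes_running(service):
    """Return the sorted node numbers that host the named service."""
    return sorted(
        node for node, roles in LAYOUT.items()
        if service in roles["M"] or service in roles["S"]
    )


print(nodes_running("JobTracker"))
print(nodes_running("Task Tracker"))
```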
18. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  – Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  – Analyze all your data
• Easy to Deploy
  – Risk Free, Quick Installation and Setup
• Single Vendor Support
  – Full Oracle support for the entire system and software set
20. Key-Value Store Workloads
• Large, dynamic-schema-based data repositories
• Data capture
• Web applications
• Online retail
• Sensor/statistics/network capture/Mobile Devices
• Data services
• Scalable authentication
• Real-time communication (MMS, SMS, routing)
• Personalization / Localization
• Social Networks
21. Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
• Key-value pair with major+sub-key paradigm
• Read/insert/update/delete operations
• Scalability
• Dynamic data partitioning and distribution
• Optimized data access via intelligent driver
• High availability
• One or more replicas
• Disaster recovery through location of replicas
• Resilient to partition master failures
• No single point of failure
• Transparent load balancing
• Reads from master or replicas
• Driver is network topology & latency aware
[Diagram: applications, each with a NoSQLDB Driver, and Storage Nodes in Data Center A and Data Center B.]
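To make the "major + sub-key" data model concrete, here is a toy in-memory store with read/insert/update/delete operations. It is a sketch of the idea only: `TinyKVStore` and its method names are hypothetical and are not the Oracle NoSQL Database API.

```python
from collections import defaultdict


class TinyKVStore:
    """Toy key-value store: records are grouped under a major key, then a sub-key."""

    def __init__(self):
        self._data = defaultdict(dict)  # major key -> {sub-key -> value}

    def put(self, major, sub, value):
        """Insert or update the value stored under (major, sub)."""
        self._data[major][sub] = value

    def get(self, major, sub):
        """Return the value under (major, sub), or None if absent."""
        return self._data.get(major, {}).get(sub)

    def delete(self, major, sub):
        """Remove (major, sub); return True if a record was deleted."""
        records = self._data.get(major, {})
        if sub in records:
            del records[sub]
            return True
        return False

    def get_all(self, major):
        """Return a copy of every sub-key record that shares the major key."""
        return dict(self._data.get(major, {}))


store = TinyKVStore()
store.put("user/42", "profile", {"name": "Ada"})
store.put("user/42", "last_login", "2012-01-15")
print(store.get_all("user/42"))
```

Grouping records by major key is what lets a store keep all sub-key records for one major key together, which ties in with the slide's point about partitioning.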
22. Resolving a Request
Client request: Operation + Key[M,m] + Value + Transaction Policy
Steps, in order:
• Hash the major key to determine the partition id
• Use the Partition Map to map the partition id to a Rep Group
• Use the State Table to determine eligible Storage Node(s) within the Rep Group
• Use the Load Balancer to select the best eligible Rep Node
• Contact the Rep Node directly
Returned to the client:
• Operation result
• New Partition Map
• RepNodeStorageTable information
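The resolution steps on this slide can be sketched as a small routing function. All names and values below are assumptions made for illustration: the partition count, the SHA-256 hash, the sample state table, and "lowest latency" as the load-balancing rule are not taken from the product.

```python
import hashlib
from typing import NamedTuple

N_PARTITIONS = 12  # hypothetical partition count


class RepNode(NamedTuple):
    name: str
    is_master: bool
    up: bool
    latency_ms: float


# Partition Map: partition id -> Rep Group (hypothetical round-robin assignment).
PARTITION_MAP = {p: f"rg{p % 3 + 1}" for p in range(N_PARTITIONS)}

# State Table: Rep Group -> known state of its Rep Nodes (sample data).
STATE_TABLE = {
    "rg1": [RepNode("sn1", True, True, 2.0), RepNode("sn2", False, True, 0.8),
            RepNode("sn3", False, False, 0.5)],
    "rg2": [RepNode("sn4", True, True, 1.5), RepNode("sn5", False, True, 1.1),
            RepNode("sn6", False, True, 3.0)],
    "rg3": [RepNode("sn7", True, True, 0.9), RepNode("sn8", False, True, 2.2),
            RepNode("sn9", False, True, 1.7)],
}


def partition_id(major_key):
    """Hash the major key to a partition id (the hash choice is an assumption)."""
    digest = hashlib.sha256(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS


def resolve(major_key, needs_master):
    """Return (partition id, rep group, chosen rep node name) for a request."""
    pid = partition_id(major_key)
    group = PARTITION_MAP[pid]
    eligible = [n for n in STATE_TABLE[group]
                if n.up and (n.is_master or not needs_master)]
    if not eligible:
        raise RuntimeError(f"no eligible rep node in {group}")
    best = min(eligible, key=lambda n: n.latency_ms)  # stand-in load balancer
    return pid, group, best.name


print(resolve("user42", needs_master=True))   # e.g. a write
print(resolve("user42", needs_master=False))  # e.g. a read that allows replicas
```

Because the client resolves the target itself, the request goes straight to the chosen Rep Node with no intermediate router.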
23. ACID Transactions
Transaction Policy
Write Durability
• Configurable per operation; application can set defaults
• Write transaction durability consists of both:
a) Sync policy (on master and replica)
• Sync – force to disk
• Write No Sync – force to OS buffer
• No Sync – write to local log buffer, flush when convenient
b) Replica acknowledgement policy
• All
• Simple Majority
• None
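A sketch of how a write durability policy might be modeled: a sync policy for master and replica plus a replica acknowledgement policy. The class and function names are hypothetical, and the acknowledgement counting assumes the master itself counts toward a simple majority of the replication group; check the product documentation before relying on that.

```python
from dataclasses import dataclass
from enum import Enum


class SyncPolicy(Enum):
    SYNC = "force to disk"
    WRITE_NO_SYNC = "force to OS buffer"
    NO_SYNC = "write to local log buffer, flush when convenient"


class AckPolicy(Enum):
    ALL = "all"
    SIMPLE_MAJORITY = "simple majority"
    NONE = "none"


@dataclass(frozen=True)
class Durability:
    master_sync: SyncPolicy
    replica_sync: SyncPolicy
    ack: AckPolicy


def replica_acks_needed(ack, group_size):
    """Replica acknowledgements required before a write is considered durable.

    group_size counts the master plus its replicas. Assumption: the master
    counts toward the majority, so only group_size // 2 replicas must respond.
    """
    if ack is AckPolicy.ALL:
        return group_size - 1
    if ack is AckPolicy.SIMPLE_MAJORITY:
        return group_size // 2
    return 0


def commit_ok(durability, group_size, acks_received):
    """True if enough replicas acknowledged under the given policy."""
    return acks_received >= replica_acks_needed(durability.ack, group_size)


policy = Durability(SyncPolicy.SYNC, SyncPolicy.NO_SYNC, AckPolicy.SIMPLE_MAJORITY)
print(replica_acks_needed(policy.ack, group_size=3))
print(commit_ok(policy, group_size=3, acks_received=1))
```

The trade-off is the usual one: stricter sync and acknowledgement settings reduce the chance of losing an acknowledged write but add latency to each operation.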
Transaction Policy
Read Consistency
• Configurable per operation; application can set defaults
• Read consistency specified as Absolute, Time-based, Version, or None
• Absolute: read from the master
• Time-based: read from any replica that is within <time-interval> of the master or better
• Version: read from any replica that is current with <transaction-token> or higher
• None: read from any replica
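The four read consistency levels amount to a filter over the nodes allowed to serve a read. The sketch below models that filter under stated assumptions: `Node`, its fields, and the integer `version` standing in for a transaction token are all hypothetical, and the master is always treated as eligible.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_master: bool
    lag_s: float   # how far this node trails the master, in seconds
    version: int   # stand-in for the latest transaction token applied


def eligible_nodes(nodes, policy, max_lag_s=None, min_version=None):
    """Return names of nodes allowed to serve a read under the given policy."""
    if policy == "absolute":
        return [n.name for n in nodes if n.is_master]
    if policy == "time":
        if max_lag_s is None:
            raise ValueError("time-based consistency needs max_lag_s")
        return [n.name for n in nodes if n.is_master or n.lag_s <= max_lag_s]
    if policy == "version":
        if min_version is None:
            raise ValueError("version consistency needs min_version")
        return [n.name for n in nodes if n.is_master or n.version >= min_version]
    if policy == "none":
        return [n.name for n in nodes]
    raise ValueError(f"unknown policy: {policy}")


group = [
    Node("sn1", True, 0.0, 100),
    Node("sn2", False, 0.5, 99),
    Node("sn3", False, 4.0, 90),
]
print(eligible_nodes(group, "absolute"))
print(eligible_nodes(group, "time", max_lag_s=1.0))
print(eligible_nodes(group, "version", min_version=95))
print(eligible_nodes(group, "none"))
```

Looser policies widen the set of nodes that can answer, which is what allows reads to be load balanced across replicas.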
24. Oracle NoSQL DB Differentiation
• Commercial Grade Software and Support
• General-purpose
• Reliable – Based on proven Berkeley DB JE HA
• Easy to install and configure
• Scalable throughput, bounded latency
• Simple Programming and Operational Model
• Simple Major + Sub key and Value data structure
• ACID transactions
• Configurable consistency & durability
• Easy Management
• Web-based console, API accessible
• Manages and Monitors: Topology; Load; Performance; Events; Alerts
• Completes Oracle large scale data storage offerings
25. Try NoSQL Database on OTN
Oracle NoSQL Database:
• Community Edition is available as a software-only distribution
• Enterprise Edition is available as a separately licensable product or as part of Big Data Appliance
39. Big Data Appliance and Exadata
Big Data for the Enterprise
[Diagram labels: NoSQL DB, HDFS, Hadoop, RDBMS]