The effective use of big data is the key to gaining a competitive advantage and outperforming the competition. Doing so demands that companies consume and blend enormous amounts of data created from divergent and inherently mismatched sources, which represents a paradigm shift from the traditional data warehouse.
Companies need to modernize their data warehouse, augmenting it with a platform that allows storage, processing, exploration and analysis of large and diverse datasets without limiting data access or the flexibility to respond to the needs of the business. That’s where Oracle Cloud and Qubole work together to deliver a new breed of data platform, capable of storing and processing the overwhelming amounts of data that on-premises big data deployments cannot handle.
Watch this on-demand webinar to understand:
- Why deploying big data on-premises is expensive, complex to maintain and limits your ability to scale across new use cases and data sources
- How Oracle Bare Metal Cloud's predictable and fast performance compute and network services deliver the foundation of a cost-effective, high-performance big data platform
- How Qubole leverages Oracle Bare Metal Cloud to provide a turnkey big data service that optimizes cost, performance, and scale, enabling self-service data exploration.
Qubole delivers a cloud-based, turnkey, self-service big data service that removes the complexity and reduces the cost of doing big data. It leverages Oracle Bare Metal Cloud’s next generation of scalable, inexpensive and performant compute, network and storage public cloud infrastructure to provide a solution that accelerates time to market and reduces the risk of your big data initiatives.
2. Today’s Speakers
Craig Carl, Director of Solutions Architecture
Xing Quan, Senior Director of Product Management
3. Big Data Disrupts Markets
What do they have in common?
• Design products that fit customers according to their DNA
• Program recommendations and commissioning new content
• Accurate estimated time of arrival
• Price suggestions for hosts
• New stores in very close proximity
• Search for similar images
4. Challenges Implementing Big Data
• Variety (40%) and Volume (14%) are the main drivers of the big data explosion
– Many disjointed sources
• Data silos only provide partial answers
• Deploying big data on-premises:
– Is complex to maintain and operate
– Is expensive
– Requires expertise
– Cannot scale
[Diagram: Big Data: collect multiple data sources, make them usable, make them available to the business]
5. Why Spark?
[Diagram: the Spark stack. Spark Streaming (real-time), Spark SQL (structured ad hoc), MLlib (machine learning) and GraphX (graph processing), on top of Spark Core (Scala, Python)]
• Spark processes data in memory, which is faster than reading from and writing to traditional HDDs
• It has a fully featured ecosystem of products and use cases; in particular, it is tailored toward data scientists and algorithm/machine learning development
• It has a very simple API
• It’s open source and helps you avoid vendor and technology lock-in
6. Hadoop and Spark Model & Issues
• Hadoop/Spark puts compute and storage together within a compute node
• Forces compute and storage to scale together, which is not ideal
• The cluster must be persistently on or else the data is inaccessible
[Diagram: a cluster of twelve nodes, each combining compute and storage (C+S)]
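To make the "scale together" issue concrete, here is a minimal sketch of the sizing arithmetic for a cluster whose nodes bundle compute and storage. The function names and every number in it (400 TB of data, 72 cores of demand, 10 TB and 36 cores per node) are hypothetical and chosen only for illustration; they are not taken from the webinar.

```python
import math


def coupled_nodes(storage_tb, cores, node_storage_tb, node_cores):
    """Nodes needed when every node bundles compute and storage (C+S).

    The cluster has to satisfy whichever requirement is larger, so the
    other resource ends up over-provisioned.
    """
    nodes_for_storage = math.ceil(storage_tb / node_storage_tb)
    nodes_for_compute = math.ceil(cores / node_cores)
    return max(nodes_for_storage, nodes_for_compute)


def idle_cores(storage_tb, cores, node_storage_tb, node_cores):
    """Cores that are provisioned only because storage forced extra nodes."""
    nodes = coupled_nodes(storage_tb, cores, node_storage_tb, node_cores)
    return nodes * node_cores - cores


# Hypothetical workload: a lot of data, modest compute demand.
nodes = coupled_nodes(storage_tb=400, cores=72, node_storage_tb=10, node_cores=36)
wasted = idle_cores(storage_tb=400, cores=72, node_storage_tb=10, node_cores=36)
compute_only_nodes = math.ceil(72 / 36)

print(f"coupled cluster: {nodes} nodes, {wasted} idle cores")
print(f"compute alone would need: {compute_only_nodes} nodes")
```

In this made-up case storage dictates the cluster size, so most of the cores sit idle. If storage lived in a separate service, compute could be sized on its own.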
7. A Modern Data Platform
• Leverage the cloud
– On-demand and elastic compute
– Scale-out object storage
• Expand and contract based on workloads
• Turnkey service, rather than managed software or hardware
– Accelerate time to value
• High degree of automation, orchestration and self-service enablement
– Reduce costs and complexity
[Diagram: Big Data: ephemeral, automation, self-service, orchestration]
8. Oracle Bare Metal Cloud Services
Craig Carl, Director of Solutions Architecture, Bare Metal Cloud
9. Deep cloud engineering experience
• Over 600 people in Seattle and Northern California
• Hundreds of experts at delivering high-scale production cloud products
– AWS, Azure, Google, Joyent, F5, Salesforce
• To a one, we’re passionate about solving large-scale distributed compute problems; passionate people build amazing products
• Combined with Oracle’s decades of success in the enterprise market
10. Industry’s first Bare Metal Cloud Service (with Virtual Machines, of course!)
• Fully Dedicated: industry’s first fully dedicated instances, with no hypervisor, agents, noisy neighbors or shared resources
• Built for Enterprise Apps: built to support demanding enterprise applications
• Performance-First: performance-first approach with significantly higher performance than existing cloud options
• Pay-as-you-go Pricing: pay by the hour for everything (compute, IP address and block storage); burst up or down quickly
• Automated and API Driven: RESTful APIs, SDKs, orchestration, CLIs, complete and public documentation
• Fast Provisioning: spin up bare metal instances in less than 5 minutes, virtual instances in 90 seconds
• Mix Bare Metal and Virtual Instances: identical user experience between Bare Metal and Virtual instances
11. OBMCS Fundamentals: Availability Domains
Regional Model
Sub-millisecond latency between ADs
10Gb/sec between each instance, inter and intra AD
12. Compute
• Multiple instance types
– Standard – 256 GB RAM
– High I/O – 12.8 TB NVMe SSD, 512 GB RAM
– Dense I/O – 28.8 TB NVMe SSD, 512 GB RAM
– 1, 2, 4, 8, 16 core VMs (7 GB memory per core)
• Bare Metal instance shapes
– 36 cores 2.3 GHz Intel® Xeon® processor E5-2600 v3
– 10Gb network
• Images
– Oracle Linux, CentOS, Ubuntu, Windows
– Support for custom images and custom OSes
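The VM sizes on this slide follow from the stated ratio of 7 GB of memory per core. A short sketch of that arithmetic, with hypothetical variable names and using only the core counts and ratio listed on the slide:

```python
# Values taken from the slide: VM core counts and 7 GB of memory per core.
GB_PER_CORE = 7
VM_CORE_COUNTS = (1, 2, 4, 8, 16)


def vm_memory_gb(cores):
    """Memory for a VM, assuming memory scales linearly with core count."""
    return cores * GB_PER_CORE


vm_shapes = {cores: vm_memory_gb(cores) for cores in VM_CORE_COUNTS}

for cores, memory_gb in vm_shapes.items():
    print(f"{cores:>2} cores: {memory_gb} GB")
```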
13. DB Systems
• Single node Oracle database
– High I/O and Dense I/O instances
• 2 node Oracle RAC
• Exadata
– Quarter rack
– Half rack
– Full rack
14. Oracle BMCS vs AWS
• High Performance Compute (DenseIO compared to AWS I2.8xlarge): 2.25x cores, 2x RAM, 11.5x IOPS, 4.5x storage
• 8 core Virtual Machine (compared to AWS M4.2xlarge): same cores, similar RAM
• Outbound Data Transfer: 1 pricing dimension vs. 4, free inter-AD, 10x free egress
• Price ($) figures shown on the slide: 86% lower, 38% lower, 21% lower
16. Qubole is a Turnkey Big Data Service on Oracle Bare Metal Cloud
Simple
• A complete data platform solution
• No need to manage infrastructure
• Self-service data access across the enterprise
Agile and Fast
• Spark and Hadoop clusters in minutes
• Builds on Oracle Bare Metal Cloud performance advantages
• Get business insights faster
Cost
• Stand up your Spark or Hadoop infrastructure at a fraction of the cost
• Reduce operation and management cost
17. Built for Anyone who Uses Data
Analysts | Data Scientists | Data Engineers | Data Admins
Big Data Your Way.
Qubole automates, controls and orchestrates your big data workloads so that you can optimize performance, cost and scale.
A Single Platform for Any Use Case
ETL & Reporting | Ad Hoc Queries | Machine Learning | Streaming | Vertical Apps
Open Source Engines, Optimized for the Cloud
Native Integration with Oracle Bare Metal Cloud Service
Leverages the Oracle Cloud Platform’s speed and performance
18. Spin up real-time streaming data processing on-demand
115% Faster than on-premises
Qubole Data Service (QDS) Spark SQL on Oracle Cloud Platform Infrastructure
• 115% faster on reporting queries and 50% faster on analytics queries than Cloudera Impala on-premises*
19. What makes us different
User Productivity
• Self-service data access
• Simple Interfaces
• Increased Personas on Oracle BMC
Amplify the Cloud
• Object Store as data lake
• Leverage Network Performance
• Support for all shapes
Automation
• Automatic use of Oracle BMC APIs
• Cluster lifecycle management
• Auto-scaling
• Software Upgrades
Elasticity
• Scale 34x on average
• Reduce TCO by 33%
• Drives scale to Oracle BMC
20. The Most Scalable Platform
• 500 PB: data processed in the cloud monthly
• 500 nodes: largest Spark cluster in the cloud
• 2000: clusters started per month
[Chart labels: 6 PB, 80 PB, 150 PB, 500 PB]
22. Data Engineers and Data Admins
• Maximize productivity and reduce complexity with automated cluster lifecycle management
• Control costs: pay only for what you use with auto-scaling
• Control mixed workloads, multiple clusters and different engines with a single control panel or REST API
23. Data Analysts and Data Scientists
• Faster exploration & iteration with an agile infrastructure
• Built to adopt existing, new & future technologies, with no vendor lock-in
• Improve productivity with a collaborative platform
24. Qubole auto-scaling advantage
[Chart: Commands per Hour and Auto-scale Nodes per Hour; x-axis 7 to 17, y-axis 5.0 to 12.5; workload fluctuation 60% of the time]
• Ten Node Cluster (fixed): 13% faster, but 32% more expensive
• Five Node Cluster (fixed): 10% cheaper, but 90% slower
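The trade-off between fixed clusters and auto-scaling can be sketched as node-hour arithmetic. This is an illustrative model only: the hourly workload, the capacity of 10 commands per node per hour, and the 2 to 10 node scaling bounds are hypothetical assumptions, not Qubole's algorithm, and the sketch does not attempt to reproduce the percentages on the slide.

```python
import math


def nodes_needed(commands, commands_per_node):
    """Nodes required to serve one hour of load at the assumed per-node capacity."""
    return math.ceil(commands / commands_per_node)


def node_hours_fixed(nodes, hours):
    """A fixed cluster is billed for every node in every hour."""
    return nodes * hours


def node_hours_autoscaled(workload, commands_per_node, min_nodes, max_nodes):
    """Resize each hour to the load, clamped to [min_nodes, max_nodes]."""
    total = 0
    for commands in workload:
        needed = nodes_needed(commands, commands_per_node)
        total += min(max(needed, min_nodes), max_nodes)
    return total


def undersized_hours(workload, commands_per_node, nodes):
    """Hours in which a fixed cluster has fewer nodes than the load calls for."""
    return sum(1 for c in workload if nodes_needed(c, commands_per_node) > nodes)


# Hypothetical commands per hour across an 11-hour day.
workload = [20, 35, 60, 95, 100, 80, 50, 40, 30, 25, 20]

fixed_ten = node_hours_fixed(10, len(workload))
fixed_five = node_hours_fixed(5, len(workload))
auto = node_hours_autoscaled(workload, commands_per_node=10, min_nodes=2, max_nodes=10)

print(f"ten fixed nodes:  {fixed_ten} node-hours")
print(f"five fixed nodes: {fixed_five} node-hours, "
      f"undersized in {undersized_hours(workload, 10, 5)} hours")
print(f"auto-scaled:      {auto} node-hours")
```

With these made-up inputs, the auto-scaled cluster uses about as many node-hours as the five-node cluster while still reaching ten nodes at peak. The ten-node cluster pays for peak capacity all day.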
25. Dataflow Diagram
• User Access: Qubole UI via browser, SDK, ODBC/JDBC
• Qubole SaaS Tier: web servers and control logic; database (account and user settings); default Hive Metastore
• REST API
• Customer’s Bare Metal Cloud Tenancy: Oracle Cloud VCN, compartment, Oracle user, DBs; ephemeral clusters on Oracle Bare Metal Compute; Oracle Cloud Platform Object Store; persistent storage
27. Thank You
Get Free Trial | Get Book | Register for a Webinar | Register for Conference
http://bit.ly/DataOpsBook https://www.dataplatforms.com/ https://www.qubole.com/event/