SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk Enterprise
1. Turnkey and Scalable Infrastructure for Splunk
Denis Guyadeen, VCPx4, CCAH
Sr. Systems Engineer
2. Nutanix – Who are we
Delivering Google-like Infrastructure for the Enterprise
Incorporated: Sep 2009
Raised: $72M
• Lightspeed
• Khosla Ventures
• Goldman Sachs
• Battery Ventures
Product launch: Nov 2011
Employees: +300 in 25 countries
Spent: ~50% of capital raised
IP: 50+ patents filed, 1 granted, filing ~10 / quarter
Recognition: Nutanix is the only company to receive Best of VMworld recognition for the past three consecutive years!
• Best of VMworld, 2011
• Best of VMworld, 2012
• Best of VMworld, 2013
• Best of Interop, Tokyo 2012
European coverage: +150 partners, +100 customers, +15 employees
3. Splunk Requirements
• Splunk is IO intensive
  • Write-intensive (ingest data)
  • Read-intensive (search)
• Project Timeline
  • Time to value (incredible business value, how fast can you get to it?)
  • Coordinating across different groups (Storage, Compute, Networking, etc.)
• Operational Challenges
  • Server Sprawl
• More data sources
  • How do I add capacity?
4. Splunk Requirements
• Splunk is IO intensive – Use SSD
  • Write-intensive (ingest data)
  • Read-intensive (search)
• Project Timeline
  • Time to value (incredible business value, how fast can you get to it?)
  • Coordinating across different groups (Storage, Compute, Networking, etc.)
• Operational Challenges
  • Server Sprawl
• More data sources
  • How do I add capacity?
5. Splunk Requirements
• Splunk is IO intensive – Use SSD
  • Write-intensive (ingest data)
  • Read-intensive (search)
• Project Timeline – Use a Datacenter Appliance
  • Time to value (incredible business value, how fast can you get to it?)
  • Coordinating across different groups (Storage, Compute, Networking, etc.)
• Operational Challenges
  • Server Sprawl
• More data sources
  • How do I add capacity?
6. Splunk Requirements
• Splunk is IO intensive – Use SSD
  • Write-intensive (ingest data)
  • Read-intensive (search)
• Project Timeline – Use a Datacenter Appliance
  • Time to value (incredible business value, how fast can you get to it?)
  • Coordinating across different groups (Storage, Compute, Networking, etc.)
• Operational Challenges – Use a Scale-Out Architecture
  • Server Sprawl
• More data sources
  • How do I add capacity?
7. Splunk Requirements
• Splunk is IO intensive – Use SSD
  • Write-intensive (ingest data)
  • Read-intensive (search)
• Project Timeline – Use a Datacenter Appliance
  • Time to value (incredible business value, how fast can you get to it?)
  • Coordinating across different groups (Storage, Compute, Networking, etc.)
• Operational Challenges – Use a Scale-Out Cluster
  • Server Sprawl
• More data sources – Use a Scale-Out Datacenter Appliance
  • How do I add capacity?
8. Nutanix – The Big Picture
Convergence of Compute and Storage
• A Datacenter Appliance to run Splunk
• Compute & Storage all-in-one appliance
• Higher performance – SSD built-in
• Faster time to value – Delivered as an Appliance
• Scalable
• Pay only for what you need now
• No unexpected surprises ($$$!) from architectural limits
• Less expensive
• Smaller datacenter footprint, less power, less cooling
• Easier to manage – All-in-one solution
9. Definition of the Next-Gen Datacenter
• Physical transforms to virtual
• Scale-out architectures
• Services delivered via software
• Commodity hardware alters economics
Massively Scalable. Elastic. Agile.
11. Virtualization Changes Everything
SAN/NAS Storage Network
• Complex to manage
• Costly to scale
Centralized Storage
• Managed separately from virtualization
• Difficult to provision
• Performance bottleneck
12. Cloud-Generation Systems
Convergence of Compute and Storage
The consumer cloud guys argued for…
Flatter datacenters that scale by adding another x86 server…
Building embarrassingly parallel DCs
NUTANIX INC. – CONFIDENTIAL AND PROPRIETARY
13. Cloud and Web-Scale Architectures
Convergence of Compute and Storage
• Add scale one x86 server at a time
• Flat and simple datacenters
• Software driven to reduce CapEx
14. Software-Defined Data Centers
Compute, Network, Storage, Security converged in x86 servers
So we are now converging on…
Flatter datacenters that scale by adding another x86 server…
Building embarrassingly parallel DCs
15. Software-Defined Data Centers
The Core Building Blocks
1 Simplicity
2 Scale Out
3 True Convergence
4 All Sizes, All Workloads, All Hypervisors
5 End-to-end Visibility
16. Convergence 2.0
Storage Fabric inside the x86 server, where VMs run!
Virtual Machine/Virtual Disk
Virtual Storage Control (Flash, HDD)
Enterprise Grade Data Services: Clones, snapshots, replication, compression, thin provisioning
Fastest Performance: Data Locality, Real-Time tiering, De-Duplication
Hypervisor Agnostic: vSphere, KVM, Hyper-V
17. The Virtual Computing Platform
Scale-out, Converged, Software-defined, Flash-enabled, and Hybrid
Convergence is only one of the pillars of the next-generation datacenter…
18. Converged
• Storage fabric collapsed on to compute
• Hypervisor the de facto substrate, i.e., the new datacenter (DC) OS
• All DC services are now virtual. No room for special-purpose “appliances”
Software-Defined
• Zero hardware crutch
• Deliver technology as a portfolio: pre-packaged, all sw, usage-based
• VM-awareness for everything; mechanism decoupled from policy
Server-Side Flash
• Flash needs to begin at the server, i.e., as close to compute as possible
• Server-side form factors – DIMM-based, PCIe-based, SATA-based – critical
Hybrid Computing
• Single control and data fabric to unify VMware ESXi, KVM, and Microsoft Hyper-V environments
• Private Cloud transparently bleeds into the Public Cloud
19. Achieving Scale
Metadata, Data Movement, Recovery, Self-description for Versioning
Self-Describing {Storage, Service}
• Protobufs for backward compatibility of data
• Protobufs for versioning APIs and services
NoSQL
• Metadata must scale with the cluster
• Lock-less operations for metadata update: optimistic concurrency control
Compression
• No impact to your workloads, an order of magnitude faster than traditional algorithms, runs on cold dataset
MapReduce for Scaling Operations
• Massively Parallel Disk Recovery
• Massively Parallel Data Rebalancing (when machines are added/removed)
• Massively Parallel Data Tiering Algorithms
• … and so on.
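The "lock-less operations for metadata update" point can be illustrated with a small sketch of optimistic concurrency control. This is not Nutanix's implementation: the VersionedStore class, the optimistic_update helper and the key name are hypothetical, and the internal guard only stands in for the atomic compare-and-set primitive a real NoSQL store would provide. The idea is that a writer never holds a lock while it computes; it reads a version, prepares the new value, and commits only if the version is unchanged, retrying otherwise.

```python
import threading


class VersionedStore:
    """Toy in-memory metadata store (hypothetical) with compare-and-set."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        # Stands in for the store's atomic compare-and-set primitive.
        self._guard = threading.Lock()

    def read(self, key):
        # Missing keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Commit only if nobody else has written since expected_version."""
        with self._guard:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # stale read: the caller must retry
            self._data[key] = (current_version + 1, value)
            return True


def optimistic_update(store, key, fn, max_retries=10):
    """Read, compute without holding any lock, then try to commit."""
    for _ in range(max_retries):
        version, value = store.read(key)
        if store.compare_and_set(key, version, fn(value)):
            return version + 1
    raise RuntimeError("too much contention on " + key)


store = VersionedStore()
optimistic_update(store, "vdisk:7:refcount", lambda v: (v or 0) + 1)
print(store.read("vdisk:7:refcount"))
```

Under low contention almost every commit succeeds on the first attempt, which is why this pattern tends to scale better than taking a cluster-wide lock per metadata update.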
20. Large Clusters: Single Fabrics
Designed for Scalability Day One
1 Analytics: Patterns, Hotspots
• Hive-based log analytics, heat-maps
2 Configuration
3 Scaling Ops: Rolling Upgrades, Add/Remove
• Every node upgrades itself, Auto-Discovery
• Cluster re-balancing via MapReduce
4 Scaling UI: Web Service, Stats
• Web service runs on all machines; leader elected on the fly using ZooKeeper
• Fine-grained stats stored in NoSQL
21. Pay-As-You-Grow
• Scale incrementally one server at a time
• Protect infrastructure investment by eliminating forklift upgrades
• Scale storage capacity and performance independently
22. Elastic Deduplication Engine
• Real-time deduplication for RAM and flash
• 100% software-driven
• Designed for scale-out
• Extensible for all storage, including HDD
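As a rough illustration of what software-driven deduplication means, the sketch below stores each fixed-size chunk once, keyed by a content fingerprint. It is a generic technique and not the Nutanix engine: the DedupStore class, the SHA-256 fingerprint and the 4 KB chunk size are all assumptions chosen for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed chunk store (hypothetical names throughout)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, each stored once

    def write(self, data):
        """Split data into fixed-size chunks and return its list of fingerprints."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # A chunk seen before is not stored again.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its fingerprints."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
recipe = store.write(b"AAAABBBBAAAA")
print(len(recipe), "chunks referenced,", store.stored_bytes(), "bytes stored")
```

Because the fingerprint table is just a key-value map, it can be partitioned across nodes, which is one way a design like this stays compatible with scale-out.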
23. Dynamic Cluster Expansion
Self-discovery with zero downtime
Flexible Clusters
• Add nodes in 2 clicks
• Expand cluster in minutes, not days or weeks
Self discovery
• Automatically detects new nodes
Zero cluster downtime
24. Rolling Upgrades
Zero downtime
Upgrade SW with NO DOWNTIME
Service Continuity
• Dynamically utilizes neighboring controller
Simple.
• Minimal administrative intervention
Data remains available
• No impact to end user
• Even for large clusters
25. Capacity Optimization
Inline and Post-processed compression
Inline compression
• Data compressed as it is written (synchronously)
• Ideal for archival data
• High performance for sequential workloads
Post-process compression
• Data compressed after “cold” data is migrated to lower-performance storage tiers
• Processed only when data and compute resources are available
• No impact to normal IO path
• Ideal for random batch workloads
VM Centric: Purpose-built for virtualization
• Increased usable capacity across all storage tiers
• Compression policies align with VM-centric workflows
• Maximum compression/decompression performance with Snappy algorithm
• Sub-block compression for granularity and maximum efficiency
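The difference between the two compression modes can be sketched in a few lines. This is an illustration under stated assumptions, not the product's code path: the BlockStore class and its method names are hypothetical, and zlib from the Python standard library stands in for Snappy, which is not part of the standard library.

```python
import zlib


class BlockStore:
    """Toy block store contrasting inline and post-process compression."""

    def __init__(self):
        self.blocks = {}  # key -> (is_compressed, payload)

    def write(self, key, data, inline=False):
        if inline:
            # Inline: compress synchronously, on the write path.
            self.blocks[key] = (True, zlib.compress(data))
        else:
            # Post-process: store raw now, so the write path is untouched.
            self.blocks[key] = (False, data)

    def compress_cold(self, cold_keys):
        """Background pass: compress blocks that have gone cold."""
        newly_compressed = 0
        for key in cold_keys:
            is_compressed, payload = self.blocks[key]
            if not is_compressed:
                self.blocks[key] = (True, zlib.compress(payload))
                newly_compressed += 1
        return newly_compressed

    def read(self, key):
        is_compressed, payload = self.blocks[key]
        return zlib.decompress(payload) if is_compressed else payload


store = BlockStore()
store.write("archive-001", b"x" * 1000, inline=True)  # sequential, archival
store.write("batch-001", b"y" * 1000)                 # random batch workload
print(store.compress_cold(["archive-001", "batch-001"]), "block compressed later")
```

In this sketch the background pass is driven by an explicit list of cold keys; a real system would pick candidates from access statistics and run only when spare resources are available.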
26. Snapshots / Clones
VM-centric with no LUNs or Volumes
Full VMware integration
• Support for vStorage APIs for Array Integration (VAAI) primitives
• Support for View Composer for Array Integration (VCAI) standards
• Offloads the virtualization tier to increase performance of common operations
Native VM-centric snapshots
No LUNs or Volumes
Array-based quick-clones for efficient provisioning
27. Native Disaster Recovery
VM-centric replication
VM-Centric workflows
• Granular VM-based snapshots and policies
• Better than LUN or file system based
Flexible n-way protection
• Simultaneous bi-directional replication between sites
• N-way master-master model
Data protection
• VM and application level crash consistency
• Flexible protection domains for VM grouping and policies
28. Nutanix Prism
Consumer-like simplicity and cloud-ready
Prism GUI
• Consumer-grade user experience
• Vantage points for at-a-glance view of server, storage and network operations
• HTML5 based for multi-device mgmt
Prism REST APIs
• Supports all Nutanix functionality: server, storage, virtualization and networking
• Provides extensibility with OpenStack and Cloud Management solutions
29. Why Virtualize Splunk?
Splunk Status Check
• Typically Bare Metal
  • Dedicated, single-purpose
• No Tiering
  • Hot and cold data reside on the same tier
• Lacks Enterprise Features
  • HA, vMotion, Snapshots, Backup, DR, Quick Clones, etc.
  • Development & IT are tightly coupled
BIG DATA PRIVATE CLOUD
Provision clusters on demand for test-and-dev and ephemeral jobs
SECURITY & MULTITENANCY
Keep data separate for different business units & prevent runaway jobs
MANAGEABILITY
Use the same monitoring and management tools you know and love
ELASTICITY
Reclaim power, cooling, and rack space and use only what you need, when you need it.
30. Linear scalability for Splunk
Convergence of Compute and Storage
[Chart: Events Per Second (EPS, axis 0 to 2,500,000) and Raw Capacity (TB, axis 0 to 70) plotted against Nutanix Nodes (4, 8, 12, 16; 4 nodes per 2U Appliance)]
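The linear-scalability claim can be turned into a back-of-the-envelope projection. The sketch below assumes perfectly linear scaling and takes 125,000 EPS per node, the midpoint of the 124,000-126,000 EPS single-node result quoted elsewhere in this deck; both are assumptions for illustration, and the function names are hypothetical.

```python
# Midpoint of the 124,000-126,000 EPS single-node figure (assumption).
EPS_PER_NODE = 125_000
# The deck states 4 nodes per 2U appliance.
NODES_PER_APPLIANCE = 4


def projected_eps(nodes, eps_per_node=EPS_PER_NODE):
    """Projected events per second, assuming perfectly linear scaling."""
    return nodes * eps_per_node


def appliances_needed(nodes):
    """Number of 2U appliances required to host the given node count."""
    return -(-nodes // NODES_PER_APPLIANCE)  # ceiling division


for nodes in (4, 8, 12, 16):
    print(nodes, "nodes:", projected_eps(nodes), "EPS in",
          appliances_needed(nodes), "appliance(s)")
```

Under these assumptions a single 2U appliance lands on 500,000 EPS, which matches the figure on the reference architecture slide.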
31. Dispelling the Myth
Nutanix outperforms virtualized and bare metal
Testing Events Per Second (EPS) with Splunk on different appliances
[Chart: EPS by platform, axis 0 to 160,000]
EMC: 38,731 EPS
• Rack size: 48U
• EMC Isilon x400 (8 node), 2x UCS C240 Servers
• vSphere 5
• Per VM specs: 8 GB RAM, 8 vCPU
HP (bare metal): 73,409 EPS
• Rack size: 2U
• DL 380, 2 x 6-core Xeon, 12 GB RAM
Nutanix: 124,000-126,000 EPS
• Rack size: .5U
• Nutanix 3000 series (1 node), 2x Xeon
• vSphere 5.1
• Per VM specs: 8 vCPU, 8 GB RAM
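The relative performance can be worked out directly from the three EPS figures on this slide. The sketch assumes that 38,731 EPS belongs to the EMC Isilon setup and 73,409 EPS to the HP bare-metal server (the extracted chart does not state the pairing explicitly), and it uses the midpoint of the Nutanix range; the speedup function is a hypothetical helper.

```python
# Midpoint of the 124,000-126,000 EPS range reported for one Nutanix node.
NUTANIX_EPS = (124_000 + 126_000) / 2

# Pairing of values to platforms is an assumption (see lead-in).
BASELINES = {
    "EMC Isilon x400 (virtualized)": 38_731,
    "HP DL 380 (bare metal)": 73_409,
}


def speedup(baseline_eps, nutanix_eps=NUTANIX_EPS):
    """How many times more events per second the Nutanix node handled."""
    return nutanix_eps / baseline_eps


for name, eps in BASELINES.items():
    print(name + ":", round(speedup(eps), 1), "x")
```

With these inputs the ratios come out at roughly 3.2x and 1.7x, in the same range as the rounded figures quoted in the speaker notes.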
32. Splunk on Nutanix Reference Architecture
http://go.nutanix.com/rs/nutanix/images/TG_Splunk_on_Nutanix_RA.pdf
33. Splunk on Nutanix Reference Architecture
3 GB/s sequential
100,000 Random Read IOPS
500,000 EPS
2U
http://go.nutanix.com/rs/nutanix/images/TG_Splunk_on_Nutanix_RA.pdf
34. Technical Specifications
Complete Portfolio
NX-1000 Series: NX-1050, per node (4 per block)
• Compute: Dual 6 core SandyBridge E5-2620 / 2.0GHz
• Cold Tier: 4 x 1 TB per node
• Hot Tier: 400GB SSD per node
• Memory: 64 or 128GB/node (DDR3 1600MHz)
• Network Connections: Dual 10GbE, 2 x 1GbE, 1 x IPMI (10/100 Mb/s)
• Power Supply: Redundant 1100W, 110/1620W, 208V
NX-3000 Series: NX-3050/NX-3051, per node (4 per block)
• Compute: Dual 8 core SandyBridge E5-2670 / 2.6GHz
• Cold Tier: 4 x 1 TB per node
• Hot Tier: 2 x 400GB/800GB SSD per node (800GB/1.6TB)
• Memory: 128 or 256GB/node (DDR3 1600MHz)
• Network Connections: Dual 10GbE, 2 x 1GbE, 1 x IPMI (10/100 Mb/s)
• Power Supply: Redundant 1620W, 208V
NX-6000 Series: NX-6050 and NX-6070, per node (2 per block)
• Compute: NX-6050: Dual 8 core SandyBridge E5-2670 / 2.6GHz; NX-6070: Dual 8 core SandyBridge E5-2690 / 2.9GHz
• Cold Tier: 4 x 4TB per node
• Hot Tier: NX-6050: 2 x 400GB SSD per node (800GB); NX-6070: 2 x 800GB SSD per node (1.6 TB)
• Memory: 128 or 256GB/node (DDR3 1600MHz)
• Network Connections: Dual 10GbE, 2 x 1GbE, 1 x IPMI (10/100 Mb/s)
• Power Supply: Redundant 1620W, 208V
35. Next-Gen Infrastructure
• All physical resources pooled and abstracted
• Storage containers maintain logical separation between business units
• Runs mixed workloads with multiple hypervisors
• Simple elasticity through linear scale-out
• On-demand provisioning with existing virtualization tools
37. Broad Industry Recognition
Top 10 Storage Startups
“Converged storage makes SAN look like the mainframe.”
Computerworld
“I am always hesitant to declare a product a "game changer" but Nutanix may have just done that with their Nutanix Complete Cluster.”
George Crump, Founding Analyst
“In the case of EMC, HP or NetApp, they're taking the same storage products they've been selling for years and repackaging for virtual server environments. I think Nutanix's product is a powerful solution. It's a powerful architecture concept.”
Andrew Reichman, Senior Analyst
“Did Nutanix just create the ultimate server/storage big data combo hardware for VDI?”
Brian Madden, Independent Desktop Virtualization Expert
“If workable in real-time, that would mean Nutanix has one-upped competitors like EMC Corporation, Cisco Systems, NetApp, VMware and Hewlett-Packard”
Riley McDermid of VentureBeat
The underlying solution that is used to store Splunk must be flexible and able to scale easily without interruption to the operation of the Splunk environment.
Splunk processing requires dynamically scalable compute and storage that can be non-disruptively scaled for capacity and performance.
Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats. Google developed Protocol Buffers to improve the performance and efficiency of communication in a distributed system. Awesome serialization tech.
Anomalies
Key Points: The Nutanix architecture is based on the same design principles that are powering the world’s largest cloud datacenters. Specifically, the Nutanix virtual computing platform converges all compute and storage resources into a single, integrated system. Multiple server nodes and Nutanix blocks can be seamlessly clustered to achieve massive scale. Each Nutanix solution is delivered as an easy-to-deploy 2U appliance, with virtualization software pre-installed and ready to run out of the box. Each server node integrates a virtual storage controller to manage storage resources across the cluster, and for all guest VMs.
3 ½ times more perf than Isilon; 1 ½ times more perf than HP