This document discusses strategies for scaling a Splunk deployment to handle more use cases, data, and users. It covers indexing strategies like indexer clustering and cross-site clustering. It also discusses search head clustering for high availability and scaling search capacity. Other topics include distributed management, centralized configuration, and hybrid cloud deployments.
2. 2
Splunk at the Next Level
Time to move beyond initial Splunk environment
• More use cases – how to tackle?
• More data – how do we scale?
• Splunk is mission critical == HA
• Global deployments
• Splunk user experience
4. 4
Growing your Splunk Deployment
Many customers start with a single use case…
• Ex: Monitor the web servers
• Help ensure up-time & response times
• Track usage, errors
• Provides business value
5. 5
Growing your Splunk Deployment
Value statement for each overall service
Your services exist in a larger context than just one app, or one tier.
What is the value of the service as a whole?
What are CIO commitments for the service?
• The company’s web store is one of the most critical parts of the business.
• Performance of the overall environment must be maintained at all times.
• Failures in any portion of the web store must be quickly identified, and
notifications sent to the appropriate parties.
• Dependencies on external processes must be monitored as well.
6. 6
Growing your Splunk Deployment
The larger context
• Failure in one system cascades
• Map dependencies, estimate costs
• Use Splunk to track all dependencies.
• What happens when it is down?
Dependencies often include:
• Networking dependencies
• Shared storage
• Databases, middleware, custom apps
• Virtualization layer
10. 10
Scaling - Storage
Simple storage to complex
Raw data nets out to ~50% of its original size on disk after compression.
Simple: rate * compression * retention
200 GB/day * 50% * 100 days = 10 TB
Consider cold storage on NAS
– Changes storage story.
– Retention on fast, retention on slow
Clustering
– Changes storage story
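The simple sizing rule on this slide can be written out as a small calculation. This is a minimal sketch: the function names are made up for illustration, and the 30/70 day split between fast and slow storage is an assumed example, not a figure from the talk.

```python
def disk_needed_gb(daily_gb: float, compression: float, retention_days: int) -> float:
    """Simple sizing rule from the slide: rate * compression * retention."""
    return daily_gb * compression * retention_days


def split_by_tier(daily_gb: float, compression: float,
                  fast_days: int, slow_days: int) -> dict:
    """Split retention between fast storage and slower (e.g. NAS) cold storage."""
    return {
        "fast_gb": disk_needed_gb(daily_gb, compression, fast_days),
        "slow_gb": disk_needed_gb(daily_gb, compression, slow_days),
    }


if __name__ == "__main__":
    # The slide's example: 200 GB/day, ~50% on disk, 100 days of retention.
    total = disk_needed_gb(200, 0.5, 100)
    print(f"total: {total / 1000:.1f} TB")
    # Assumed split: 30 days on fast disk, 70 days on cold storage.
    print(split_by_tier(200, 0.5, fast_days=30, slow_days=70))
```

Moving part of the retention to cold storage does not change the total; it only changes how much of it has to sit on fast disk.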
12. 12
Scaling - Storage
RAID + SSD deep dive
• For spinning disks, Splunk recommends RAID 1+0 with 1k IOPS
• SSDs provide extremely high IOPS (45,000+)
• RAID 5 SSD arrays give great Splunk performance in most scenarios.
Additional details: Splunk Docs, Capacity Planning Manual
13. 13
Forwarder Load Balancing
Have the universal forwarder (UF) balance across multiple indexers
List multiple indexer hosts in outputs.conf
No external load balancer needed!
Geography-based routing
(DNS round robin)
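To see why a forwarder that cycles randomly through a pool of indexers spreads load without an external load balancer, here is a small simulation. It is an illustrative model only, not Splunk's forwarder code: the host names, the number of switches, and the uniform random choice are all assumptions made for the sketch.

```python
import random
from collections import Counter


def simulate_forwarder(indexers, switches, seed=0):
    """Count how often a forwarder lands on each indexer when it picks a
    random target from the pool at every switch interval."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter()
    for _ in range(switches):
        counts[rng.choice(indexers)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical pool of ten indexers listed in the forwarder's outputs.
    pool = [f"idx{n:02d}.example.com:9997" for n in range(1, 11)]
    for host, hits in sorted(simulate_forwarder(pool, 10_000).items()):
        print(host, hits)
```

Over many switches each indexer receives close to an equal share, which is the behavior the slide relies on when it says a load balancer is not needed.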
14. HTTP Event Collector
Introducing a New Way to send data!
• Created w/Developers in Mind
• Send via HTTP/HTTPS
• token-based
• Send directly from anywhere!
• Easy to configure / works out of the box.
• Highly performant and scalable
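A token-based HTTP send can be sketched with the Python standard library alone. The host name and token below are placeholders, and the helper name `build_hec_request` is invented for this example; the `/services/collector/event` path and the `Authorization: Splunk <token>` header follow the HTTP Event Collector documentation, but check them against your own deployment before relying on this.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype=None):
    """Build (but do not send) a POST request carrying one JSON event."""
    payload = {"event": event}
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    return urllib.request.Request(
        url=f"{base_url}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",  # token-based, no user account
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        "https://splunk.example.com:8088",        # placeholder host and port
        "00000000-0000-0000-0000-000000000000",   # placeholder token
        {"action": "login", "user": "alice"},
        sourcetype="_json",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Because the token is tied to the input rather than to a person, nothing has to be re-issued when the developer who wrote the sender leaves.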
15. 15
Indexer Clustering
High-Availability, Out of the Box
Splunk indexer clustering
Active-Active = better performance
Specific terms:
– Master Node
– Peer Node
– Search Factor
– Replication Factor
Additional details: Splunk Docs, Distributed Deployment Manual
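Replication factor and search factor also change the storage math. The sketch below assumes the roughly 50% on-disk footprint splits into about 15% compressed raw data (kept once per replicated copy) and about 35% index files (kept once per searchable copy). Those two ratios are rule-of-thumb assumptions, not figures from this talk, and the function name is made up for illustration.

```python
RAWDATA_RATIO = 0.15  # assumed: compressed raw data as a share of ingested volume
INDEX_RATIO = 0.35    # assumed: index files as a share of ingested volume


def clustered_disk_gb(daily_gb, retention_days, replication_factor, search_factor):
    """Estimate cluster-wide disk usage for a given replication/search factor."""
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    rawdata = daily_gb * RAWDATA_RATIO * replication_factor
    index_files = daily_gb * INDEX_RATIO * search_factor
    return (rawdata + index_files) * retention_days


if __name__ == "__main__":
    # Non-clustered baseline (one copy of everything).
    print(f"{clustered_disk_gb(200, 100, 1, 1) / 1000:.1f} TB")
    # Replication factor 3, search factor 2.
    print(f"{clustered_disk_gb(200, 100, 3, 2) / 1000:.1f} TB")
```

With one copy of everything the estimate reduces to the simple rate * compression * retention rule; raising the factors multiplies only the part of the data each factor governs, and the total is spread across all peer nodes.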
17. 17
Scaling the Search Heads
Splunk Search is critical, too!
Splunk Search high availability needs
Scale to handle # of concurrent queries
Search Parallelization (Optimized CPU…Let’s talk about DMs!)
Intelligent Job Scheduling
18. 18
SHP vs SHC
SHP (Search Head Pooling)
• Available since v4.2
• Sharing configurations through NFS
• Single point of failure
• Performance issues
SHC (Search Head Clustering)
• No NFS
• Replication using local storage
• Commodity hardware
20. 20
Search Head Clustering
The coordinating node is called the “Captain” rather than the “Master”, to avoid confusion with indexer clustering
Minimum 3 nodes required. Odd is always preferred.
Cluster takes certain key decisions based on *majority* (consensus)
In multi-site setup have more nodes in main datacenter
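The majority rule explains both the three-node minimum and the preference for odd member counts. A minimal sketch of the arithmetic follows; the function names are illustrative and this is not Splunk's election code.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def has_quorum(members: int, reachable: int) -> bool:
    """True if the reachable members can still make majority decisions."""
    return reachable >= majority(members)


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - majority(members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "members tolerate", tolerated_failures(n), "failure(s)")
    # Two sites with 3 + 2 members: if the link between sites drops,
    # only the larger (main) site still holds a majority.
    print(has_quorum(5, 3), has_quorum(5, 2))
```

Going from three to four members adds hardware but no extra failure tolerance, which is why odd counts are preferred. The two-site case shows why the main datacenter should hold more nodes: it keeps its majority when the sites are cut off from each other.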
22. 22
Deployment Server
Central management of Splunk Forwarders
Deployment Server manages Apps, Configs
Select one or more classes for each host
Class defines apps & configs
Works by phone-home
Notes:
DS does not push forwarder binaries
Use Cluster Master to manage indexers in cluster, not DS
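The class-to-host relationship can be modeled in a few lines. This is a simplified sketch of the idea, not the deployment server's actual matching logic: the class names, host patterns, and app names are hypothetical, and hosts are matched with shell-style wildcards.

```python
from fnmatch import fnmatch

# Hypothetical classes: each lists host patterns and the apps it delivers.
SERVER_CLASSES = {
    "all_forwarders": {"whitelist": ["*"], "apps": ["outputs_base"]},
    "web_servers": {"whitelist": ["web-*"], "apps": ["apache_inputs"]},
    "windows_hosts": {"whitelist": ["win-*"], "apps": ["windows_inputs"]},
}


def apps_for_host(hostname, server_classes=SERVER_CLASSES):
    """Return the apps a host would receive when it phones home."""
    apps = []
    for server_class in server_classes.values():
        if any(fnmatch(hostname, pattern) for pattern in server_class["whitelist"]):
            for app in server_class["apps"]:
                if app not in apps:  # a host may match several classes
                    apps.append(app)
    return apps


if __name__ == "__main__":
    for host in ("web-01", "win-dc-02", "db-01"):
        print(host, apps_for_host(host))
```

A host can belong to more than one class, so it receives the union of the apps from every class it matches.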
I want to look at my Web Server Environment
Gauge end-user responses, look for 404 errors, or whatever you are doing in your environment
Look at dependencies that Web Servers have.
Same conversations about email, or Active Directory or other Key services
Dependencies, middleware, storage, information available to you on the wire.
If any part of this environment goes down, what is the business impact of that?
Started with just looking at a web server.
But load balancers, firewalls, and Internet-facing DNS servers all guide people to your web site.
None of it works when the database or middleware is down.
How to Plan out the Number of Disks you Need as well as Scaling out your Search Heads
SSDs deliver around 50K IOPS, which is far off the charts. That is on a SATA-based SSD using MLC, the cheapest thing you can buy, and it only goes up from there.
RAID 5 is terrible for performance on standard spinning disks.
RAID 5 with SSDs is an option.
Avoid RAID 5 when you can afford RAID 1+0, or any time you have spinning disks.
When you want to scale that out, consider moving your Cold out to NAS
He said Splunk supports “SIFS” (presumably CIFS); it is not recommended, but it can be used for cold storage. Heavy reads, no writes.
Virtualizing:
Biggest concern is shared disk storage.
Do you have OLTP high transactional Oracle Databases running in your
Too high of a Disk Profile, if no, then your Splunk Indexers shouldn’t be running there either.
Give 100% reservation when you can
Use the same reference specs
Our Splunk in the Cloud is all Virtualized. There is nothing inherently wrong about a virtualized environment, you just have to be careful
Splunk App for VMware
Side note: if you double the number of indexers, you will effectively double the performance.
28.00
Outputs.conf file: IP or hostname of a single indexer
Pointed to a DNS multi-value A record (what they call a round robin A record) or you can identify the indexers
If you are using DNS round Robin. Lots of solutions, the first one they see
Indexers.splunk
Put them into a pool and randomly cycle through all ten of those indexers
Don’t need a Load Balancer, if you have
Any time you have an application that understands Load Balancing, it is going to do a better job because
31.20
Geography Based Routing.
How many have more than one physical location
Indexers that are geographically located. Data from all of those local sources can roll to the Indexer located locally within that data center
Developers sending data to splunk – had to make a service ID user – then maintain that – then what happens when that developer leaves the company?
JSON – no fields
By the
32.25
Active-Active All are ‘in service’ at any given time
Search head distributes its query across all of the indexers
How it knows that is
Replication Factor
Cross Site Clustering
Search Affinity by Location (how does that work?)
37.0
Search Factor of Two, Rep Factor of 3
I want to have a copy of that Data sent over t
SH in New York knows to query the local copy of the LA data it has
How do you get from Not Clustered to Clustered
Master Node manages all of the Apps and Configurations
Turn on Clustering with a Search Factor of 1 and a Replication Factor of 1
Splunk is going to add a little bit of additional metadata at ingest time
Stand up another Indexer and increase your Replication and Search Factor
Now have the option of turning OFF Search Affinity
If I have 5 Regions, can I have a local set of replicated copies at that one location?
Multi-Site Clustering
41.00
Search Head Pooling: Sorry. Had to have extremely high speed NFS to handle it. Single point of failure if the storage went down
Search Head Clustering: Doesn’t require NFS
Replication using local storage. Splunk itself replicates that data back and forth between the search heads
One search per core
Deployment Manager (see number of concurrent Searches Running)
Example Topology
One of the Cluster Members will self-elect as a Captain
Deployer is responsible for managing the configurations of all of these Search Heads
Take away from this slide: Clustering Works, No longer requires NFS.
Talk to your Engineer
Architecture Class
Documentation
Came out last October
We require three nodes in the SH cluster. We use majority decision consensus approach.
Load Balancer should be ‘pretty sticky’. How much affinity to that session.
Use Search Head Clustering so I can scale out (not really focused on HA so much)
If you have used SOS in the Past. Support analyzing diag
Scaling discussion we already had
Health of your Indexers, Search Heads, License Master, Deployment Servers, KV Store (new feature 6.1 or later)
Distributed Management Console rolls in all kinds of info
50.59
Puppet or Chef or some fancy auto sync method
If you don’t have those tools, can use the Deployment Server
Enables the Splunk
Knows kind of OS it’s coming from
Active Directory, Mac OS
Manually managing
Allows your Splunk Admins to control what Splunk is collecting without having to contact Puppet or Chef environment
Instead of waiting for change control