If you're just getting started with Splunk, this session will help you understand how to use Splunk software to turn your silos of data into actionable insights. In this session, we’ll dive right into a Splunk environment and show you how to use the simple Splunk search interface to quickly find the needle in the haystack, or multiple needles in multiple haystacks. We’ll demonstrate how to perform rapid ad-hoc searches to conduct routine investigations across your entire IT infrastructure in one place, whether physical, virtual or in the cloud. We’ll then show you how to convert these searches into real-time alerts and dashboards, so you can proactively monitor for problems before they impact your end users. We'll demonstrate how you can use Splunk to connect the dots across heterogeneous systems in your environment for cross-tier, cross-silo visibility. You'll have access to a demo environment, so don't forget to bring your laptop and follow along for a hands-on experience.
You may have many disparate, complex, siloed solutions. When you need to find a solution to a problem, you may need to get a war room ready, which leads to finger pointing and trying to debug in your production environment. You may spend hours trying to find a solution. Often, you end up using a brute force approach like restarting the system, leaving no evidence of what the problem actually was.
All of which means that IT is no longer spending time on innovating but losing valuable time just keeping the lights on or fighting fires.
Splunk Enterprise is a fully featured platform for collecting, searching, monitoring and analyzing machine data and getting operational intelligence. You can monitor both real-time data (as it streams in) and historical data. Splunk collects machine data securely and reliably from wherever it’s generated, in any format. It stores and indexes the data in real time in a centralized location and protects it with role-based access controls. You can troubleshoot your network problems and investigate security incidents in minutes (not hours or days). Monitor your end-to-end infrastructure to avoid service degradation or outages. Gain real-time visibility and critical insights into customer experience, transactions and behavior.
We don’t require you to have a deep understanding of your data, or to have a predefined schema and requirements. You don’t need expensive custom connectors to get data into Splunk. We have our own MapReduce-based, high-speed data indexing and retrieval mechanism. We can index data from any part of your infrastructure. We scale from a single server to petabytes of data, and you can use commodity hardware. You can leverage our Splunk Cloud offering if you don’t want to manage your own Splunk instance.
You can start by getting to the core of a problem. If your current systems lack proactive capabilities, Splunk Enterprise can provide them. From there you can expand into security, capacity planning and application management – a true gold mine of use cases from your data. Once our customers start to gain that operational visibility, they evolve toward getting deeper insights from their data. There is no database in the back end, because we apply schema on the fly. You need the raw data to be able to reuse it. We create intelligence on top of the data, which makes scaling easy.
We’ve found that most companies start using Splunk in one of these 5 areas, and typically as more teams use Splunk, their usage traverses each of these 5 areas. Both IT and business professionals can analyze machine data to get real-time visibility and operational intelligence. With our platform for machine data, organizations can meaningfully improve their performance in a wide range of areas: meeting service levels, reducing costs, mitigating security risks, maintaining compliance and gaining insights.
Today we are going to focus on some of the major use cases and values related to the IT Operations space.
In IT Operations, this maturity model is a great template for how Splunk is utilized. Most teams start by downloading Splunk on a laptop; from there it gets scaled to a server, then to multiple servers, and so on. The ITOps maturity model follows very much the same progression:
Search and investigation. Using Splunk, organizations identify and resolve issues up to 70% faster and reduce costly escalations by up to 90%. Splunk is one place to find and fix problems, and investigate incidents across all your IT systems and infrastructure.
Proactive monitoring. Monitor IT systems in real time to identify issues, problems and attacks before they impact your customers, services and revenue. Splunk keeps watch of specific patterns, trends and thresholds in your machine data so you don't have to. Trigger notifications in real-time via email or RSS, execute a script to take remedial actions, send an SNMP trap to your system management console or generate a service desk ticket.
Operational visibility. See the whole picture, track performance and make better decisions. Visualize usage trends to better plan for capacity; spot SLA infractions, track how you are being measured by the business. Do all of this using your existing machine data without spending millions of dollars instrumenting your IT infrastructure.
Real-time business insight. Make better-informed business decisions by understanding trends and patterns and gaining Operational Intelligence from your machine data. See the success of new online services by channel or demographic, reconcile 3rd-party service provider fees against actual use, find your heaviest users and heaviest abusers, and more. Because machine data captures every behavior, the possibilities are game changing. You'll find the lead times to get to this intelligence dramatically shorter than with other solutions: measured in minutes or hours instead of months.
Who is at Search and Investigate? Raise your hands. Proactive Monitoring and Alerting? Raise your hands. Operational Visibility? Raise your hands. Real-time Business Insight? Raise your hands.
Who thinks it makes sense for all of us to have our business at Real-time Business Insight? Why?
So how do we get there?
Over the last couple of years Splunk has evolved from an engine for machine data to a platform for machine data – nothing shows this better than the apps in our app store, which range from plugins and templates to full-fledged apps that help you collect, analyze and harness data from every layer of your technology stack. These apps are built by our customers, by technology partners such as Cisco and NetApp, and by Splunk employees. We are a platform because it is very easy to get data into and out of Splunk. We complement other solutions in the data center.
A few important things to remember:
If a logo you have doesn’t show up here, Splunk still doesn’t limit you – you can always index data from that technology – Splunk extensions simply help you accelerate the process.
We provide a full-featured REST API and a variety of SDKs that help you build your own custom apps for technologies and insights specific to your business. This helps you create a tailored interface to your data in the formats and development languages your team is used to.
Lastly, each of the Splunk extensions is not comparable to point solutions in every silo, simply because your data from each silo is more valuable when in context of other data from other technology tiers. Splunk apps simply help you get to the point faster where you can see correlations and comparisons of machine data ACROSS silos.
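As a rough illustration of the REST API mentioned above, here is a minimal Python sketch that only builds (and does not send) a request to the search export endpoint. The host name, the sample search string and the helper name build_export_request are assumptions made for illustration; authentication is deliberately left out, and 8089 is only the default management port.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, spl, earliest="-15m"):
    """Build (but do not send) a POST request for the search export endpoint.

    base_url is a placeholder for your own management URL; a real call
    would also need credentials, which this sketch leaves out.
    """
    # Over REST, a search string is expected to start with a command,
    # usually "search", or with "|" for a generating command.
    if not spl.lstrip().startswith(("search ", "|")):
        spl = "search " + spl
    body = urlencode(
        {"search": spl, "earliest_time": earliest, "output_mode": "json"}
    ).encode("utf-8")
    url = base_url.rstrip("/") + "/services/search/jobs/export"
    return Request(url, data=body, method="POST")


# Hypothetical host and search string, for illustration only.
request = build_export_request(
    "https://splunk.example.com:8089",
    "index=main status=503 | stats count by host",
)
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from sending it makes it easy to inspect what would go over the wire before pointing the code at a real instance.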
We also recently introduced two new offerings: the Splunk App for Stream, which collects wire data (stemming from the acquisition of Cloudmeter), and MINT (Mobile Intelligence), which stems from our acquisition of Bugsense. The Splunk App for Stream enables the capture of real-time streaming wire data, which is the data transmitted between applications over the network. It enables visibility into application, business and user activity without the need for instrumentation, enhancing various operational use cases across IT, security and the business.
And Splunk MINT helps you gain visibility into mobile app performance and quality, so you can deliver better mobile apps.
Splunk MINT helps you combine and correlate mobile app data with other data in Splunk so you can pinpoint problems faster and analyze user experience/behavior across mobile, desktop and web channels.
The main value from the apps is providing context for data from silos and making it available inside Splunk for correlation with other data from other silos.
In addition to prebuilt apps, customers can also build their own.
What have developers been building using Splunk Enterprise? Examples include the following:
Run searches and retrieve Splunk data from existing Customer Service/Call Center applications (Comcast use case)
Integrate Splunk data into existing BI tools and dashboards (Tableau, MS Excel)
Build mobile applications with KPI dashboards and alerts powered by Splunk (Otto Group use case)
Log directly to Splunk from remote devices (Bosch use cases)
Build customer-facing dashboards powered by user-specific data in Splunk (Socialize, Hurricane Labs use cases)
Programmatically extract data from Splunk for long-term data warehousing
We hope this is just the beginning. We hope to open up a whole new world of enterprise apps.
And now, Splunk offers a tailored selection of apps along with a Splunk Enterprise license, Splunk Education credits, passes to .conf, and support. These quick start bundles are a way to get up and running with Splunk quickly: the personalized support makes sure you start getting value out of it right away, and the education credits and passes to .conf, our annual user conference, help you keep getting value out of Splunk.
(details below if needed)
Splunk Enterprise License
Discounted by volume
Available in Small, Medium, and Large Sizes
Perfect to get you started for your use case
Tailored selection of Apps and Add-ons
Infrastructure or Application Monitoring
Hundreds of pre-built visualizations
Collect and Correlate disparate data sources
Splunk Education Credits and .Conf Passes
Make the most of your investment
Ensure your team succeeds
Get your team Splunk Admin or Power User Certified
Professional Services and Customer Success Managers
Jumpstart your deployment
Align business objectives
Get up and running in as little as 5 days
The Infrastructure Monitoring Quick Start includes add-ons and apps for AWS, Unix and Linux, Microsoft Active Directory, VMware and Microsoft Windows, plus the Add-on Builder to help you build custom apps of your own.
---
Details if needed
Splunk License: Splunk Enterprise (just enterprise in the bundles)
Education Credits [details]
1 credit = $500
List price of course / 500 = number of credits
10 credits enough to get 1 person admin certified, or 2 people power user certified
20 credits enough to get 2 people admin certified, or 4 people power user certified
Professional Services [details]
3 day ($6000 or $2k/day)
Install 1 search head and 1 indexer
Walk through deployment of universal forwarder
Highlight/walk through apps
4 day ($8k or $2k/day)
Install 1 search head, 1 indexer, and 1 app
Walk through universal forwarder deployment
Highlight/walk through apps
High level review of data onboarding
5 day ($10k or $2k/day)
Install 1 search head, 1 indexer, 1 app
Universal Forwarder Deployment
Two-day review of data onboarding
Highlight/walk through apps
Free .conf Pass [how many?]
1 to 2
Splunkbase Apps to DL: (note at this time apps are not coming pre-installed in the deployment, the PS will cover working through installing up to 1 app and reviewing the other suggested apps)
App and Add on for Unix and Linux
App and Add on for AWS
App and Add on for Active Directory
App and Add on for VMware
App and Add on for MS Windows
App for Stream
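If it helps to sanity-check the numbers in the details above, the credit and services arithmetic can be sketched in a few lines of Python. The $5,000 course price is a made-up example, and rounding up to a whole credit is an assumption; the notes only give the price / 500 rule and the $2k day rate.

```python
import math

CREDIT_VALUE_USD = 500   # from the notes: 1 credit = $500
PS_DAY_RATE_USD = 2000   # from the notes: $2k/day


def credits_for_course(list_price_usd):
    """Credits needed for a course; rounding up is an assumption."""
    return math.ceil(list_price_usd / CREDIT_VALUE_USD)


def services_cost(days):
    """Professional Services cost for an engagement of the given length."""
    return days * PS_DAY_RATE_USD


# Hypothetical $5,000 course; 3-, 4- and 5-day engagements from the notes.
print(credits_for_course(5000))
print([services_cost(d) for d in (3, 4, 5)])
```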
And the Application Management Quick Start also includes add-ons and apps for AWS, Unix and Linux, Microsoft Active Directory and Microsoft Windows, plus the Add-on Builder, but adds MINT for mobile application data, Stream for wire data, and the Machine Learning Toolkit so you can explore a variety of machine learning concepts.
---
Details if needed
Visual representation of specific components for ITTM bundles
This slide has instructions for setting up the app we’ll be using for this session. There will also be 1-2 cloud instances available at each SplunkLive – it would be nice if people weren’t all using the same credentials to log in, but it won’t be the end of the world either.
This is the Splunk user interface. When you first log in, you’ll see a list of apps down the left side and some icons in the center that will take you to product tours, the app base, and documentation. At the top of the page, there are some menus for settings. We’re going to use the IT Operations app, so click on the purple button that says “IT Operations”.
When you launch the app, you’ll be taken directly to a dashboard. We won’t be using this one in the session, but it is a good example of the sort of dashboard you can build with Splunk! But more on dashboards later…
A typical problem that comes up is a website throwing errors. One of the first steps we can take is to see if there’s one server that’s having problems or if the problems are with all servers. One server having problems can be taken out of the web farm until it’s been fixed, but if all the servers are having problems, something else can be going on. If you’re responsible for the servers, you’ll probably blame the developers, the database, or, if all else fails, the network.
We’ll be using a dashboard to help us find the root of the problem. At the top of the screen in the navigation menu, click “Troubleshooting Examples”, then “IT Troubleshooting Basics 1 – Web Site Errors”. This dashboard has a lot of information. At the top, there’s some background info, the key techniques that we’ll be using, and some next steps. Then we have the search we’ll be using, broken out into parts, so we can explore what each part of the search does. At the very bottom, we can see the results of the search, and hopefully the root cause of our problem. But let’s not get ahead of ourselves…
In this scenario, we’ll be using our Apache access logs. These logs contain a status code for every request made to our servers. To start with, we’re going to search the web logs and filter to just the log entries, or events, that returned an HTTP status code 503. Scroll down to the “Line by Line” section of the dashboard and click “Load search to this point” under the first part of the search to see what the raw events look like in Splunk.
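To make the filtering step concrete outside of Splunk, here is a small Python sketch of the same idea: pull the status code out of each access log line and keep only the 503s. The sample lines and the regular expression are assumptions for illustration; this is not the search used on the dashboard.

```python
import re

# Hypothetical access log lines in common log format (not the session's data).
SAMPLE_LINES = [
    '10.0.0.1 - - [12/Mar/2016:10:01:02 +0000] "GET /cart HTTP/1.1" 503 512',
    '10.0.0.2 - - [12/Mar/2016:10:01:03 +0000] "GET /home HTTP/1.1" 200 2048',
    '10.0.0.3 - - [12/Mar/2016:10:01:04 +0000] "POST /checkout HTTP/1.1" 503 0',
]

# The quoted request line is followed by the three-digit status code.
STATUS_PATTERN = re.compile(r'"[A-Z]+ \S+ [^"]*" (?P<status>\d{3}) ')


def status_of(line):
    """Return the HTTP status code of a log line, or None if it doesn't parse."""
    match = STATUS_PATTERN.search(line)
    return int(match.group("status")) if match else None


def filter_by_status(lines, status):
    """Keep only the lines whose status code equals the given value."""
    return [line for line in lines if status_of(line) == status]


errors = filter_by_status(SAMPLE_LINES, 503)
for line in errors:
    print(line)
```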
This is the Splunk search screen. At the top, you can see the search we just ran. On the left are the fields that Splunk has extracted from our data. These fields are extracted on the fly – we don’t have to know what questions we want to ask our data beforehand. On the right are the events themselves. So now we can see our raw events, which is great, but we want to be able to analyze this data. Let’s go back to the tab with our dashboard and see how we do that.
Now we’ll load the results of the full search by clicking on “Load search to this point” for the second part of our search.
This time, instead of all the raw events, we have aggregated the data into a table. We used the “stats” command to get a count by host – that is, the count of access log events with HTTP status code 503 for each host. You can use the stats command to do other aggregations – some examples would be finding the minimum, maximum, average, or 95th percentile.
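If it helps to picture what a count by host produces, here is a rough Python equivalent of that aggregation. The events, host names and numbers are invented for illustration and say nothing about how Splunk computes the result internally.

```python
from collections import Counter

# Hypothetical, already-parsed events; hosts and counts are illustrative only.
EVENTS = [
    {"host": "web-01", "status": 503},
    {"host": "web-05", "status": 503},
    {"host": "web-05", "status": 503},
    {"host": "web-02", "status": 200},
    {"host": "web-05", "status": 503},
]


def count_by(events, field):
    """Count events per distinct value of the given field."""
    return Counter(event[field] for event in events)


# Filter to the 503s first, then count per host.
errors = [event for event in EVENTS if event["status"] == 503]
counts = count_by(errors, "host")

# One row per host, busiest first.
for host, count in counts.most_common():
    print(host, count)
```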
Now if we go back to our dashboard and scroll all the way to the bottom, we can see the search results showing us that web-05 is throwing more 503 errors than the other servers.
CLICK TO BRING IN THE REST OF THE SLIDE
To continue troubleshooting a problem like this, we would probably want to look at what other data we have for web-05 – we’ll take a look at that by clicking on “search” in the “Next Steps” section of the dashboard.
This search shows us the sourcetypes that we have for web-05. A sourcetype is a particular format of data – we can see the Apache access logs, AWS CloudWatch logs, and some OS metrics like CPU and disk space.
If we didn’t know where to look, we could click on each sourcetype to see the events for each one, and see what fields are being extracted and use those fields to see if our server is healthy. Luckily for us, though, I already know what the problem is (yeah, I’m cheating). We’ll be using another dashboard now, so click on Troubleshooting Examples, then IT Troubleshooting Basics 2 – Server Issues
This time we’ll be looking at some data that helps us troubleshoot our infrastructure. This data is coming from the Unix and Linux Add-on, but Splunk also has apps for Microsoft Windows, VMware, AWS, Azure, and Google Cloud to help you understand the health of your infrastructure.
CLICK TO BRING UP THE REST OF THE SLIDE
The format of this dashboard is just like the first one, so let’s look at the search we’re using.
Like we did last time, we’ll start with the first part of the search. This time instead of Apache access logs, we’re going to look for a sourcetype called CPU, and filter just to the web servers.
Just like before, you can see our search at the top, fields on the left, and raw events on the right. What field do you think we might want to take a closer look at to find a server having a problem? Let’s go back to our dashboard and see…
And now by scrolling to the very bottom of the dashboard, we can see that web-05 has a higher CPU load than the other web servers, confirming that there is something wrong with it.
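The comparison we just made by eye can be sketched in Python as a per-host summary. The samples and the field name cpu_load_percent are assumptions for illustration, and the dashboard's search may well use a different field or aggregation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CPU samples; "cpu_load_percent" is an assumed field name.
SAMPLES = [
    {"host": "web-01", "cpu_load_percent": 22.0},
    {"host": "web-01", "cpu_load_percent": 26.0},
    {"host": "web-05", "cpu_load_percent": 91.0},
    {"host": "web-05", "cpu_load_percent": 95.0},
]


def summarize_by_host(samples, field):
    """Return the average and maximum of a numeric field for each host."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["host"]].append(sample[field])
    return {
        host: {"avg": mean(values), "max": max(values)}
        for host, values in grouped.items()
    }


summary = summarize_by_host(SAMPLES, "cpu_load_percent")
# The host with the highest average load is the one to look at first.
busiest = max(summary, key=lambda host: summary[host]["avg"])
print(busiest, summary[busiest])
```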
CLICK TO BRING IN THE REST OF THE SLIDE
Next let’s look at how we might put these search results on a graph.
By default, you’ll probably see a column chart, but if you see something else, don’t worry. We’re going to change it anyway.
Choose a couple of different options just to see what they look like. See how easy it is to change the graph type?
CLICK TWICE TO BRING IN DIFFERENT VISUALIZATIONS
And if you click on the “Statistics” tab, you can see the search results in a table, just like it was on our dashboard.
From here, we could save this search as a report, so we can share it with others. When we share a saved report, we know that everyone is looking at the same thing when we’re discussing a problem. We could also use that report as a panel in a dashboard, but we’ll look at dashboards a little later.
Saving a search as a report is simple – just click “Save As”, then choose “Report”. In the pop-up, you would just set a title, decide if you want to display just the graph, just the table, or both, and decide if you want to include a time range picker to make the report dynamic or if you want to keep the time range of your existing search, which can be useful if you want to share information about a specific time range with people on your team.
You can also save searches as alerts – click “Save As” then “Alert” and we’ll take a look at how you’d do that.
CLICK TO LOAD POP-UP
We’re just going to look at this pop-up briefly. You can see that you can have your search run on a schedule, or in real time if it’s a very critical metric. You can trigger based on the number of results, number of sources, number of hosts, or set a custom condition like a max CPU load greater than 80. You can throttle alerts so you don’t get flooded by them. And you can configure what action to take when an alert fires – you can use Splunk’s built-in triggered alerts list. If you just want a record of the alert for future reference, you can just log an event. You could run a script that might be able to do some corrective action. You can send email, of course, or you can generate an HTTP POST request, which would let you integrate with a ticketing, chat, or notification system.
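To show how a custom trigger condition and throttling fit together, here is a minimal Python sketch. The class name, the 80 threshold (borrowed from the example above) and the 300-second throttle window are illustrative assumptions; this is not how Splunk implements alerting.

```python
class ThrottledAlert:
    """Fire when a value exceeds a threshold, at most once per throttle window."""

    def __init__(self, threshold, throttle_seconds):
        self.threshold = threshold
        self.throttle_seconds = throttle_seconds
        self._last_fired = None  # time of the most recent firing, if any

    def evaluate(self, max_cpu_load, now):
        """Return True if the alert should fire for this search result."""
        if max_cpu_load <= self.threshold:
            return False  # trigger condition not met
        recently_fired = (
            self._last_fired is not None
            and now - self._last_fired < self.throttle_seconds
        )
        if recently_fired:
            return False  # condition met, but suppressed by throttling
        self._last_fired = now
        return True


# Times are plain seconds, to keep the sketch deterministic.
alert = ThrottledAlert(threshold=80, throttle_seconds=300)
print(alert.evaluate(85, now=0))    # condition met, fires
print(alert.evaluate(90, now=100))  # condition met, throttled
print(alert.evaluate(95, now=400))  # window has passed, fires again
```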
Now let’s move on to the next exercise! Click “Cancel” to close the pop-up and we’ll go to our next dashboard.
Click on Troubleshooting Examples and go to the next dashboard – IT Troubleshooting Basics 3 – Mobile App Errors. For this part, we’ll be using MINT. MINT is a Splunk app that gives you access to data for your mobile apps.
Instead of the stats command, we’re going to use chart. Chart also does aggregations on data, but lays the output out as a table with one field down the rows and another across the columns. We’re going to search for all the data of sourcetype “mint:network”. Let’s start with the first part of the search as we’ve done before – click on “Load search to this point”.
MINT data is in JSON, which works really well in Splunk. We can see some of the data that is available in MINT – app version, carrier, device… information that you can’t get from the server side. Let’s go back to our dashboard and see how we use this data to find the problem.
Let’s take a look at the full search since we’re using the chart command for the first time. “Count” we’re familiar with, and we’ve used “by” clauses before when we counted by host. But what about the “over appVersionName” part?
When we use “chart”, the “over” clause tells Splunk the field to group by while the “by” clause tells Splunk the field to split by. So in this search, we want to group by version name and split by status code. Now let’s scroll down to the results.
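One way to picture the “over” and “by” clauses is as the rows and columns of a pivot table. The Python sketch below counts events that way; the sample events and the field name statusCode are assumptions for illustration (appVersionName comes from the search above), and the function is only an analogy for what chart produces.

```python
from collections import defaultdict

# Hypothetical MINT-style events; "statusCode" is an assumed field name.
EVENTS = [
    {"appVersionName": "1.0", "statusCode": 200, "platform": "iOS"},
    {"appVersionName": "1.0", "statusCode": 200, "platform": "Android"},
    {"appVersionName": "2.0", "statusCode": 401, "platform": "Android"},
    {"appVersionName": "2.0", "statusCode": 401, "platform": "Android"},
    {"appVersionName": "2.0", "statusCode": 200, "platform": "iOS"},
]


def chart_count(events, over, by):
    """Count events with one row per `over` value and one column per `by` value."""
    table = defaultdict(lambda: defaultdict(int))
    for event in events:
        table[event[over]][event[by]] += 1
    return {row: dict(columns) for row, columns in table.items()}


by_version = chart_count(EVENTS, over="appVersionName", by="statusCode")
for version, counts in sorted(by_version.items()):
    print(version, counts)
```

Passing a different field as `over`, for example "platform", regroups the same counts without touching the rest of the logic.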
CLICK TO ADD THE SEARCH RESULTS TO THE SLIDE
We can see that the latest version of the app is throwing errors – instead of the expected 200 status code, we’re getting 401 status codes.
CLICK TO ADD THE NEXT STEPS
Next, let’s see if there’s a particular platform that is having a problem – click on the search link to continue.
This is the search that we used to identify the bad app version. Now remember the fields we saw earlier? There was one called “platform” that identified whether the OS was Android, iOS or Windows Mobile.
So if we want to group the results by platform instead of app version, what do we need to do to the search?
CLICK TO BRING UP THE NEXT PART OF THE SLIDE
We changed the “appVersionName” to “platform” and now we can see that only Android phones seem to be having problems with this version of the app. That could be very valuable information to the mobile app developers.
For our final scenario, we won’t actually be troubleshooting, we’re just going to explore one way that we might use APM data in Splunk. Click on Troubleshooting Examples, then choose the last dashboard – IT Troubleshooting Basics 4 – Using APM Data.
APM tools can be very helpful when troubleshooting issues with your application. Splunk has apps for AppDynamics, Dynatrace, and New Relic. For this last example we’re going to look at New Relic data. New Relic offers an API, which we can use to pull information into Splunk.
CLICK TO BRING UP THE REST OF THE SLIDE
We’ll start by looking at all data with sourcetype newrelic_account.
Our New Relic data is also in JSON, like the MINT data was. Click the plus signs in an event to expand all the nodes in the data. For this exercise, we’ll be looking at the health status for the key transactions that we’re measuring with New Relic. Look at the field list and think about how we’ll refer to that field in our search.
Now let’s go back to our dashboard to see the rest of the search.
We’re using stats again for this one, but our aggregation is new – we want to see the most recent health status from our data, so we use “latest”. And because this JSON data has nested nodes, in order to access data in the key_transaction node we have to specify key_transaction followed by a period, then the name of the field that we want to use – in this case health_status and name. Later, you can go back to the search results to see how we could access other fields, if you want. For now, let’s scroll down in the dashboard and see the search results.
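The dotted-path idea and the “latest” aggregation can be sketched in Python like this. The JSON events and the timestamp field are invented for illustration; only the key_transaction, name and health_status names come from the walkthrough above.

```python
import json

# Hypothetical New Relic-style events; "timestamp" is an assumed field.
RAW_EVENTS = [
    '{"timestamp": 100, "key_transaction": {"name": "checkout", "health_status": "green"}}',
    '{"timestamp": 200, "key_transaction": {"name": "login", "health_status": "green"}}',
    '{"timestamp": 300, "key_transaction": {"name": "checkout", "health_status": "red"}}',
]


def get_path(event, dotted_path):
    """Follow a dotted path such as 'key_transaction.name' into nested JSON."""
    value = event
    for part in dotted_path.split("."):
        value = value[part]
    return value


def latest_by(events, value_path, group_path, time_field="timestamp"):
    """Return the most recent value of value_path for each group_path value."""
    latest = {}
    # Walk the events oldest to newest so later values overwrite earlier ones.
    for event in sorted(events, key=lambda e: e[time_field]):
        latest[get_path(event, group_path)] = get_path(event, value_path)
    return latest


events = [json.loads(raw) for raw in RAW_EVENTS]
status = latest_by(events, "key_transaction.health_status", "key_transaction.name")
print(status)
```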
CLICK TO BRING UP THE SEARCH RESULTS
Now we have a list of all the key transactions that we’re monitoring and the latest health status for each one. Sure, you can get this information directly from the APM tool, but by bringing this data into Splunk, you can see the APM data with your other data. Let’s see how easy it would be to add this table to a dashboard.
CLICK TO BRING UP THE REST OF THE SLIDE
Click to load the full search results in a new tab and we’ll see how to add them to a dashboard.
When your search results have loaded, click “Save As”, then “Dashboard Panel”.
CLICK TO SHOW POP-UP
In the pop-up select “Existing” and choose “Web Store Health”, then click “Save”. Then click “View Dashboard” to see the dashboard!
This dashboard shows health information for a website, including overall errors and response time as well as response time by tier – hopefully you can see it better on your screen than the picture on this slide. Each of these panels is backed by a Splunk search like the ones we’ve seen. At the very bottom, you’ll see your new panel. If you click “Edit” at the top, you can drag and drop that panel into a new position. Now you’ll be able to see your APM data along with data from your access logs and wire data from Splunk Stream.
And that’s the end of our hands-on exercise! I hope you had fun and learned something about Splunk along the way.
Before the session ends, let’s look at some real customer examples, and how those map back to CIO priorities. Splunk is a strategic solution for IT, and provides value along every vector of interest to IT. On this slide you will see several examples of Splunk fostering visibility across silos and making it faster to deliver business solutions.
Here are some key metrics and rapid ROI achieved by customers using Splunk for IT Operational Intelligence.
Verisign reduced the time to track deliveries by 90%.
Cars.com achieved a 200% ROI in usage analytics.
CERYX achieved a 200% ROI and a better customer experience.
And Ping Identity achieved a 70% reduction in MTTR.
Other areas where people are seeing value include:
Reduce/avoid downtime
Gain control over costs, capacity, user experience
User and usage analytics to support real-time business decision-making
Real-time and historical data analysis for trending and pattern detection
Next steps!
You can take the USB key with you and use the data in the application on it to work with the features we’ve already looked at, or any of the other features that Splunk offers!
You can also try out Splunk Cloud with your own data. Splunk Cloud gives you all the benefits of Splunk in a SaaS product.
You can use the Splunk installer from your USB key or download a free copy from splunk.com – load in your own data and see what you can find!
And if you’re interested in the Infrastructure Monitoring Quick Start or the Application Management Quick Start, you can find more information on splunk.com.