Note: this presentation is updated often. Please look for updates in the Partner Portal.
Agenda: how long this is going to take, and an overview of the approach and technology. Many evaluators of Splunk are on this call and are curious about how to deploy the tool, so I'll be showing Splunk used in many different ways. Q&A is at the end of the session; if questions come up during the presentation, type them into the Q&A or Chat box.
Machine data is an incredibly valuable resource, but organizations rarely get the value they need from it. Existing data analysis, management and monitoring solutions are simply not engineered for this type of data.

Take Information Management. Data Warehouses and Relational Database Management Systems are based on rigid schemas and designed for structured, consistent data. They provide historical analysis but not real-time visibility. Enterprise Search is designed for human-generated data, such as documents and Web pages. This data is very different from machine data, which is an order of magnitude greater in scale and diversity.

IT Management tools and Security Information and Event Management, on the other hand, are siloed and designed for one level of the organization. They provide a narrow view of the underlying data and are hard-wired for specific data types and sources. Or they monitor across systems, with serious gaps in the data they collect. They also don't provide any historical context.

The fact is, finding a better way to sift, distill and understand the vast amounts of machine data can transform how IT organizations manage, secure and audit IT. It can also provide valuable insights for the business on trends and behaviors of their customers and services. We call this gaining Operational Intelligence.
Splunk is the engine for machine data. It can take any machine data and automatically index it for fast searching. Because Splunk doesn't use a database, there are no additional licenses and, most importantly, no pre-defined schema to limit how you use your information. For data that isn't already in a text-based format, such as data locked up in APIs and databases, Splunk offers many free connectors to attach and retrieve that data from other systems. Examples include the Windows registry and WMI. But the most important thing to note is how easy it is to get data into Splunk and make it useful. So even if you have a custom app with custom formats, Splunk can make sense of it.

Over time, IT has developed as "silos" of systems, focused on specific technologies, functions, departments, groups of systems and people. As a consequence, IT ends up being managed as silos, with narrow, focused tools that provide a limited view of what's really going on. What's more, all IT systems in these silos generate data. This machine data or "exhaust" contains a categorical record of behavior – behavior of customers, user transactions, networks, servers, applications, and more. This data helps diagnose and fix issues, but it is also a source of critical intelligence for the business.

Even purported "single panes of glass", like SIEMs, Application Performance Management, event correlation and analysis systems, and data warehouses, don't provide the complete picture, because they aren't designed for the full scope of this data. Today's IT management tools, security solutions and even business intelligence systems are NOT designed to leverage the full scope of machine data: data which is non-standard, unstructured, high volume and generated every millisecond of every day.

Machine data used to be called log files: syslog data from networks, web log data from web servers. There are lots of different sources of data, including customer-facing data such as cloud services.
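To make the "easy to get data in" point concrete, here is a minimal, illustrative monitor input as it might appear in a forwarder's inputs.conf; the file path, sourcetype and index are examples, not a prescription:

```ini
# inputs.conf (illustrative): tail a standard syslog file and tag it with a
# sourcetype so Splunk can group similar events at search time.
[monitor:///var/log/messages]
sourcetype = syslog
index = main
```

Because there is no pre-defined schema, a stanza like this is typically all that is needed; timestamp recognition and event breaking happen automatically when the data is indexed.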
The challenge with machine data: the volume is growing exponentially, it comes from a vast number of sources, and none of it is the same.
Reinforce points from previous slide.
Data is spread out everywhere, and getting it all to one place is often harder than expected. Splunk helps make that job easy, with both agent-less data gathering, and Splunk forwarders. Splunk forwarders collect, process, and forward data to a central Splunk indexer. Forwarders can be load-balanced, are fault tolerant and centrally managed by either Splunk’s Deployment Server or your own config management system, and come in several footprint options.
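As a sketch of what load-balanced forwarding looks like in practice, a forwarder's outputs.conf might resemble the following; the hostnames and group name are placeholders:

```ini
# outputs.conf (illustrative): auto-load-balance events across two indexers.
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# If one indexer becomes unreachable, the forwarder fails over to the other.
autoLB = true
```

This same flat-file configuration can be pushed from Splunk's Deployment Server or from your own config management system.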
Splunk is a data engine for your machine data. It gives you real-time visibility and intelligence into what's happening across your IT infrastructure, whether it's physical, virtual or in the cloud. Everybody now recognizes the value of this data; the problem up to now has been getting to it. At Splunk we applied the search engine paradigm to rapidly harnessing any and all machine data, wherever it originates. The "no predefined schema" design means you can point Splunk at any of your data, regardless of format, source or location. There is no need to build custom parsers or connectors, there's no traditional RDBMS, and there's no need to filter and forward.

Here we see just a sample of the kinds of data Splunk can 'eat'. A reminder of what the 'big deal' about machine data is: it holds a categorical record of user transactions, customer behavior, machine behavior, security threats and fraudulent activity. You can imagine that a single user transaction can span many systems and sources of this data, or that a single service relies on many underlying systems. Splunk gives you one place to search, report on, analyze and visualize all this data.

Data is coming from everywhere: log files, network equipment, transactions, statistics. As soon as a machine is turned on it sends data – VMware, Amazon, RightScale, shipping systems. What's unique about Splunk? It came out of the world of Yahoo and search engines: "all the lights are green but the web site is still having problems." The engineers came out of environments without any pre-defined data types or database schemas, so they built a time-series engine that indexes data without needing to parse or modify it. I will show you how easy it is to get intelligence out of your data with Splunk.
To deliver Operational Intelligence requires handling three primary workloads from within the same system:

1. Providing real-time visibility of live data, including correlating transactions and events across multiple sources, monitoring against thresholds and alerting, tracking against SLAs, etc.
2. Enabling powerful navigation of the data to get to "the needle in the haystack" – to troubleshoot, identify root cause and perform incident investigations.
3. Providing the ability to analyze historical (as well as live streaming) data – to identify trends and patterns, to prove compliance, etc.

Supporting these three workloads in the same system delivers value across the organization. Specific dashboards can provide meaningful information for different users and roles – from the server room to the boardroom – so the value of Operational Intelligence can be recognized deep within the organization.
What does this mean for somebody "in the trenches" – for example, troubleshooting an application issue?

1. Customer calls the service desk – the service desk logs the call and escalates (red light/green light, everything looks green).
2. Escalated to app support – they look at Java monitoring tools and everything looks fine, because they rely on instrumentation; but they have no access to logs!
3. A developer gets pulled in and has to stop working on new code.
4. The developer needs to ask a sysadmin for logs.
5. The developer establishes it's not his problem and escalates to the DB guy.
6. The DB guy looks at the audit logs and points to a bad query.

We call this "human latency", and customers we talk to say it can consume hours or sometimes days of precious time when issues occur! Wouldn't it be great if all this went away?
So what does "harnessing your machine data" practically translate to for your organization? From working closely with customers successfully harnessing their machine data, we think in terms of an "operational intelligence" maturity model with four stages.

Stage 0 is IT silo chaos – we call it that to impress upon you the fact that there really are a lot of sources of data in non-standard formats.

Stage 1 is "search and investigate": using the data to quickly find and fix problems across IT silos and systems, or finding that "needle in the haystack" (or "multiple needles in multiple haystacks"). In other words, "fix IT". By effectively harnessing this data, customers have experienced up to 70% improvements in mean time to identify and resolve issues. This alone removes much of the human latency experienced "in the trenches".

The path moves up from reactive to proactive:
- Search and Investigate: download Splunk and start searching and investigating. The interface features a search bar for entering errors.
- Proactive Monitoring: I want an alert when issues are happening – an interface that shows my alerts, which might be sent up to a higher-level console.
- Operational Visibility: information on systems from a business standpoint. Which applications are working? SLA infractions.
- Real-Time Business Insights: combining machine data with pricing data. An executive might have a single widget on their internal wiki that shows SLA or revenue information.
Stage 2 is about starting to get more proactive by automatically monitoring your infrastructure to identify issues, problems and attacks before they impact your customers and services. In other words, "better run IT". By monitoring trends and thresholds across a wider scope of data, customers find problems well before they impact customers and services, or before they cause pain. Systems that used to experience outages have remained running because of this approach.
Stage 3 is about gaining end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions. In other words, “managing IT as a business”.Rolling up visibility to align it to IT KPIs (how IT is measured by the business) provides unprecedented intelligence to the NOC and senior IT personnel. Being able to spot SLA infractions in real time, or measure utilization as new services are launched enables IT to meet and exceed its objectives.
Finally, stage 4 is about delivering real-time business insight – gaining real-time insight from operational data to make better-informed business decisions. In other words, "transforming business decisions". Combining and correlating machine data with business data provides unique business insights: watching the consumption of new online services by channel or demographics, or combining telecoms call records with tariff databases to get a real-time view of revenue and third-party charges. There is a diverse set of cases where surfacing machine data provides operational intelligence to the business. And the lead times to get to this intelligence are dramatically shorter than with other solutions – from months down to a few days in many cases.
Splunkbase is the home for our Splunk Apps. There you'll find cool and useful downloads to extend Splunk. You can share what you make, from simple add-ons with a useful search, script or report to full-fledged apps with multiple views. You'll also find apps from Splunk and our partners. Apps are being created all the time, so bookmark the site and check in frequently. Examples on this page include apps for Cisco, F5, Twitter sentiment, external 'WHOIS' lookups, license usage, and more. Vendors, customers and Splunk Business Development create solutions that customers can easily start taking advantage of. Most of them are out in the community and they are free, built on the knowledge of Splunk.
Data normalization – it’s hard to let go of. For decades we’ve been taking information and chopping it into rows and columns, and then looking up the data by referencing those like a virtual address. Unfortunately, with the radical increase in machine-generated data around most organizations, there aren’t enough skilled people to define how that data should be handled. Like how search overtook directories in how we navigate the Internet, search is the only way to handle so much data, in so many formats, and subject to change without notice. Universal indexing is a way of handling text based, time series data. Those are the only limitations for Splunk – the data must be text based (or converted into text) and must be orderable into a series. From there, Splunk handles the rest. It finds most timestamps (or creates one), breaks up the raw data stream into discrete events and retains the original data in a compressed form on disk. No normalization, no modification, just a flat-file index on top of the original, now compressed, data. Search is where the data is customized for how you want to visualize your IT information. At search time, fields are extracted that can be used like database ‘columns’ to pivot data. If you don’t like the fields Splunk finds, define your own with the graphical field extractor. Then you can create relationships between the rows, using event types to name certain kinds of events for easy aggregation and searching, or define transactions that tie events from multiple systems together into a logical structure. With that limited and easy-to-create structure, you can find almost any data or pattern across petabytes of information directly from the web UI, without mastering SQL or statistical query languages. Best of all, it’s data your way. Don’t like your search or your schema? Just rewrite or delete it – the data is unchanged. 
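As an example of this search-time schema, consider a hypothetical search over web access logs. The sourcetype and the extracted field name here are illustrative, but the pattern – extract a field with a regex at search time, then pivot on it – is the general one:

```
sourcetype=access_combined error
| rex "user=(?<user>\w+)"
| stats count by user
```

Nothing in this search changes the indexed data; delete or rewrite the extraction and the raw events on disk remain untouched.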
Any user can pivot their lens onto the underlying data without affecting others – unless, of course, they choose to share their knowledge through apps and shared searches.
Splunk can be divided into four logical functions.

First, from the bottom up, is forwarding. Splunk forwarders come in two packages: the full Splunk distribution or a dedicated "Universal Forwarder". The full Splunk distribution can be configured to filter data before transmitting, execute scripts locally, or run SplunkWeb. This gives you several options depending on the footprint size your endpoints can tolerate. The Universal Forwarder is an ultra-lightweight agent designed to collect data in the smallest possible footprint. Both flavors of forwarder come with automatic load balancing, SSL encryption and data compression, and the ability to route data to multiple Splunk instances or third-party systems.

To manage your distributed Splunk environment, there is the Deployment Server. The Deployment Server helps you synchronize the configuration of your search heads during distributed searching, as well as your forwarders, to centrally manage your distributed data collection. Of course, Splunk has a simple flat-file configuration system, so feel free to use your own config management tools if you're more comfortable with what you already have.

The core of the Splunk infrastructure is indexing. An indexer does two things: it accepts and processes new data, adding it to the index and compressing it on disk; and it services search requests, looking through the data it has via its indices and returning the appropriate results to the searcher over a compressed communication channel. Indexers scale out almost limitlessly and with almost no degradation in overall performance, allowing Splunk to scale from single-instance small deployments to truly massive Big Data challenges.

Finally, the part of Splunk most users see is the search head. This is the web server and app-interpreting engine that provides the primary, web-based user interface.
Since most of the data interpretation happens as needed at search time, the role of the search head is to translate user and app requests into actionable searches for its indexer(s) and display the results. The Splunk web UI is highly customizable, either through our own view and app system, or by embedding Splunk searches in your own web apps via includes or our API.
Splunk scales linearly to big data deployments across commodity servers thanks to a MapReduce-based architecture (the scalability architecture made popular by Google). A single Splunk indexer can index hundreds of gigabytes per day, depending on the data sources and the load from searching.

If you have terabytes a day, you can linearly scale a single, logical Splunk deployment by adding index servers, using Splunk's built-in forwarder load balancing to distribute the data, and using distributed search to provide a single view across all of these servers. Unlike some log management products, you get full consolidated reporting and alerting, not simply merged query results.

We provide a rich set of benchmarking tools and recommend using them to get the indexing throughput and compression rate on your particular data in your target configuration. And of course, if you or your customers are not sure how much data you need to index, you can set up a test deployment with a trial license and use Splunk itself to measure how much data you're indexing.

A single Splunk server is called an indexer. It might be receiving syslog data from a port, or Windows event log data either locally or remotely.
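For instance, one common way to have Splunk measure its own indexing volume is to report on its internal license-usage log. This search assumes a default installation where that log lands in the _internal index:

```
index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) AS bytes_indexed
```

The resulting daily totals give a realistic sizing baseline for your own data, rather than an estimate.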
During your evaluation you might be indexing over 100GB of data per day. You can deploy multiple indexers to handle the load, and you might need to deploy indexers to different data centers.
Splunk can distribute not only the data collection challenge, but search tasks as well. To achieve massive scale, and to meet data segmentation requirements, Splunk can distribute searches from a single Splunk search head to any number of Splunk indexers. These indexers can all be local, for massive parallelization of Big Data problems, or spread across a global enterprise to help you keep data wherever makes the most sense for your network and security requirements.
Splunk allows you to divide up the work of search and indexing across as many servers as you need to achieve the performance and scale you require. Using work-dividing techniques such as MapReduce, Splunk can take a single search and query as many indexers as needed to complete the job, allowing you to use inexpensive commodity hardware in massively parallel clusters. For example, if you had 1 million events to search, one indexer could easily complete that search, but it would take a little time – let's say 30 seconds. If the same million events were spread across 10 indexers, the same search would complete in about 3 seconds. How fast and how large you want your searches to be is yours to control by adding indexers as desired.
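On the search head, fanning a search out across indexers is configuration rather than code. A distsearch.conf along these lines (hostnames are placeholders) registers the search peers that every search is mapped across:

```ini
# distsearch.conf (illustrative): the search head splits each search across
# these peers and reduces their partial results into a single answer.
[distributedSearch]
servers = idx1.example.com:8089, idx2.example.com:8089
```

Adding a peer to this list is how you add parallelism: the search head handles the "reduce" side automatically.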
Splunk is a different kind of company with a different kind of product. Our technology is built by IT pros, for IT pros, to be software people will want to use, from novice to guru. The product features one code base and package, regardless of how it's deployed. Splunk is standards-based and built on an open architecture. In addition, Splunk is flexible and extensible, allowing you to access any data in any format and provide it for viewing across an organization. The Splunk architecture was designed to scale from a single user to truly massive and distributed global deployments. Splunk doesn't dumb down or normalize data to fit into a database, potentially removing context. And finally, we are easy to work with and provide a transparent support environment: our documentation is all public, as is our product roadmap, and we even have real engineers on our IRC channel.