Splunk Data Onboarding Overview - Splunk Data Collection Architecture

© 2017 SPLUNK INC.
Data Onboarding Overview
Jon Harris | Senior Sales Engineer
11 JULY | SYDNEY

© 2017 SPLUNK INC.
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Safe Harbor Statement

© 2017 SPLUNK INC.
1. Splunk Data Collection Architecture
2. Apps and Technology Add-ons
3. Demos / Examples
4. Best Practices
5. Resources and Q&A
We Will Discuss:

© 2017 SPLUNK INC.
Splunk Data
Collection
Architecture

© 2017 SPLUNK INC.
Basic Architecture Refresh
How Splunk works at a high level
distributed search
auto-load balanced indexing
change tickets
web access logs
windows event logs / perfmon linux logs vmware logs, configs and metrics firewall data
app sever logs jmx and jvm metrics database logs and metrics product pricing
Search Head - Splunk’s UI
Indexer – Data Store/Processing
Forwarder - Collect & Send
Agentless

© 2017 SPLUNK INC.
What can Splunk Ingest?
Agent-Less and Forwarder Approach for Flexibility and Optimization
syslog
TCP/UDP
Event Logs, Active Directory, OS Stats
Unix, Linux and Windows hosts
Universal Forwarder
syslog hosts
and network devices
Local File Monitoring
Universal Forwarder
Aggretation
host Windows
Aggregated/API Data Sources
Pre-filtering, API subscriptions
Heavy Forwarder
Mainframes*nix
Wire Data
Splunk Stream
Universal Forwarder or
HTTP Event Collector
DevOps, IoT,
Containers
HTTP Event Collector
(Agentless)
shell
API
perf

© 2017 SPLUNK INC.
Collects Data From Remote Sources
• Splunk Universal Forwarders collect data from a local data source and sends it to
one or more Splunk indexers.
Scalable
• Thousands of universal forwarders can be installed with little impact on network
and host performance.
Broad Platform Support
• Available for installation on diverse computing platforms and architectures. Small
computing/disk/memory footprint.
Splunk Universal Forwarder
The Splunk Universal Forwarder is a Separate Download

© 2017 SPLUNK INC.
Also Collects Data From Remote Sources...
• ...but is typically used for data aggregation for passage through firewalls, data
routing and/or filtering, scripted/modular inputs, or for HEC endpoints (more on this
in a bit).
Often run as a “data collection node” for API/scripted data access
• A heavy forwarder is typically run as a “data collection node” for technologies
requiring access via API, and not for collection of data from the node itself
Platform Support limited to that of Splunk Enterprise
• Being standalone, Heavy Forwarders are typically run on Linux VMs...
Splunk Heavy Forwarder
Configured via the regular Splunk Enterprise download

© 2017 SPLUNK INC.
Large-Scale Data Collection Directly from Applications
• Provides a simple, load-balancer-friendly, secure way (token-based JSON or RAW
API) to send data at scale from applications directly to Splunk
Agentless
• Data at scale can be sent directly to indexer tier, bypassing forwarder layer
Broad Development Platform Support
• Logging drivers available for many platforms (docker, AWS Lambda, etc.) and
simple HTTP endpoint compatible with all development environments
Splunk HTTP Event Collector (HEC)
The Newest Way to Collect Data at Scale

© 2017 SPLUNK INC.
Apps and
Technology
Add-Ons (TAs)

© 2017 SPLUNK INC.
App??? Add-on (TA)??
▶ Your first choice when onboarding
new data
• Clean and ready to go out-of-the-box
▶ App is a complete solution
• Typically uses one or more TAs
▶ Technology Add-on
• Abstracts collection methodology (log file, API,
scripted input, HEC)
• Typically includes relevant field extractions
(schema-on-the-fly)
• Includes relevant config files (props/transforms)
and ancillary scripts binaries

© 2017 SPLUNK INC.
Where do you get Apps? Splunkbase!

© 2017 SPLUNK INC.
Thriving Community
dev.splunk.com
40,000+ questions
and answers
1,300+ apps
Local User Groups &
SplunkLive! events
apps.splunk.com answers.splunk.com usergroups.splunk.com
Development and SDK

© 2017 SPLUNK INC.
Data Onboarding: DEMOS

© 2017 SPLUNK INC.
▶ Using the Data Previewer
• Upload a File (You did this in the Getting Started Hands-on Session!)
▶ Installing and using Apps and Add-ons
▶ Continuous Local File Monitoring (Universal Forwarder)
• Monitor a directory and multiple files in real-time
• Most common architecture for syslog-based sourcetypes
What You Will See

© 2017 SPLUNK INC.
Data Onboarding
Best Practices

© 2017 SPLUNK INC.
Components of a Splunk Success Program
Architecture
&
Infrastructure
Operations
& Supporting
Tools
Staffing
Data
On-
Boarding
User
On-Boarding
Inform

© 2017 SPLUNK INC.
▶ Architect
• Design and optimize Splunk architecture for large-scale/distributed
deployments.
▶ System Administrator
• Implement and maintain Splunk infrastructure and configuration
▶ Search Expert
▶ App Developer
▶ Knowledge Manager
• Perform data interpretation, classification and enrichment
• Work with System Administrator to properly onboard data
Typical Splunk Staffing RolesArch &
Infra
Ops &
Tools
Staffing
Data
On-
Boarding
User
On-
Boarding
Inform

© 2017 SPLUNK INC.
▶ Define on-boarding process for
new data sources / apps
▶ Repeatable, documented
process
▶ Provide customer interview
forum or survey
▶ Integrate with service workflow
Data On-boarding TasksArch &
Infra
Ops &
Tools
StaffingData
On-
Boarding
User
On-
Boarding
Inform
New Data Source Request
q Provide a data sample
q Describe the data’s structure
§ timestamp | timezone n single-/multi-line
§ sourcetype n interesting fields
q Describe initial uses for the data
§ searches | alerts | reports | dashboards
q How to collect the data?
§ UF | syslog | API
q How long to retain the data?
q Who should have access?
q Apply Common information Model
§ Are there TA’s available?
q Validate

© 2017 SPLUNK INC.
Ladies and Gentlemen, We’ll be Boarding Soon!
Six Things to Get Right at Index Time
Source
Event
Boundary /
LineBreaking
Host
Index
Sourcetype
Date
Timestamp

© 2017 SPLUNK INC.
▶ Gather info (New Data Source Request):
• Where does this data originate/reside? How will Splunk collect it?
• Which users/groups will need access to this data? Access controls?
• Determine the indexing volume and data retention requirements
• Will this data need to drive existing dashboards (ES, PCI, etc.)?
• Who is the Owner/SME for this data?
▶ Map it out:
• Get a "big enough" sample of the event data
• Identify and map out fields (ensure CIM compliance)
• Assign sourcetype and TA names according to CIM conventions
Pre-Board Essentials

© 2017 SPLUNK INC.
▶ Identify the specific sourcetype(s) - onboard each separately
• Important – syslog is not a sourcetype!
• More on this later
▶ Check for pre-existing app/add-on on splunk.com – don't
reinvent the wheel!
▶ Start with a “Test” index, Verify index-time settings correct
(previous slide)
• Try the Data Previewer first
• tweak props/transforms “by hand” only if absolutely necessary
Pre-Board Essentials (cont.)

© 2017 SPLUNK INC.
▶ Find and fix index-time problems BEFORE
polluting your index
▶ A try-it-before-you-fry-it interface for figuring out
• Event breaking
• Timestamp recognition
• Timezone assignment
▶ Provides most necessary props.conf parameter settings
Your Friend, the Data Previewer

© 2017 SPLUNK INC.
If you have to get into the weeds...
Always set these six parameters in props.conf
# SL17
[SL17]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
SHOULD_LINEMERGE = False
LINE_BREAKER = ([nr]+)d{4}-d{2}-d{2}sd{2}:d{2}:d{2}
TRUNCATE = 10000

© 2017 SPLUNK INC.
▶ The Common Information Model (CIM) defines relationships in
the underlying data, while leaving the raw machine data intact
▶ A naming convention for fields, eventtypes & tags
▶ More advanced reporting and correlation requires that the data
be normalized, categorized and parsed
▶ CIM-compliant data sources can drive CIM-based dashboards
(ES, PCI, others)
What Is the CIM and Why Should I Care?

© 2017 SPLUNK INC.
▶ Syslog is a protocol – not a sourcetype
▶ Syslog typically carries multiple sourcetypes
▶ Best to pre-filter syslog traffic using syslog-ng or rsyslog
• Do not send syslog data directly to Splunk over a network port (514)
▶ Use a UF or HEC to transport data to Splunk (next slide)
• Ensures proper load balancing and data distribution
• Secure and efficient
• Insulates against Splunk component failures
▶ See https://www.splunk.com/blog/2017/03/30/syslog-ng-and-hec-scalable-
aggregated-data-collection-in-splunk.html for more info on this topic
A special note on Syslog

© 2017 SPLUNK INC.
▶ Videos!
• http://www.splunk.com/view/education-videos/SP-CAAAGB6
▶ Getting Data In – Splunk Docs
• http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
▶ Date and time format variables
• http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables
▶ Getting Data In – Dev Manual (very thorough!)
• http://dev.splunk.com/view/dev-guide/SP-CAAAE3A
▶ HTTP Event Collector
• http://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector
▶ .conf Sessions
• https://conf.splunk.com/session/2015/conf2015_Aduca_Splunk_Delpoying_OnboardingDataIntoSplunk.pdf
▶ GOOGLE!
Where to Go to Learn More

© 2017 SPLUNK INC.
Splunk Fundamentals Course
This course teaches you how to search and navigate in Splunk, use fields, get statistics from your
data, create reports, dashboards, lookups, and alerts. Scenario-based examples and hands-on
challenges will enable you to create robust searches, reports, and charts. It will also introduce you to
Splunk's datasets features and Pivot interface.
• https://www.splunk.com/view/SP-CAAAPX9
Free 101 Hands On e-Learning

© 2017 SPLUNK INC.
▶ https://splunkbase.splunk.com/app/2962/
▶ For creating REST API, Scripted or Modular Inputs through a GUI
▶ Helps your Add-ons get Certified
▶ Can also use on sample data to build out configs as well
Check Out the New Add-on Builder!

© 2017 SPLUNK INC.
SEPT 25-28, 2017
Walter E. Washington Convention Center
Washington, D.C.
.conf2017
The 8th Annual Splunk Conference
conf.splunk.com

Splunk Data Onboarding Overview - Splunk Data Collection Architecture

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Splunk Data Onboarding Overview - Splunk Data Collection Architecture

Ähnlich wie Splunk Data Onboarding Overview - Splunk Data Collection Architecture (20)

Mehr von Splunk

Mehr von Splunk (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Splunk Data Onboarding Overview - Splunk Data Collection Architecture