This document provides an overview and disclaimer for a Splunk presentation on best practices for data onboarding. It introduces the speaker and outlines the topics to be covered, including data, Splunk components, indexing data, proper parsing, challenging data types, and advanced inputs. The presentation cautions that forward-looking statements are based on current expectations and may differ from actual results.
2. Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
3. About Me
• Senior Professional Services Consultant based in Boston, MA
• 14+ years of worldwide Professional Services consulting, the last two at Splunk
• Involved in 20+ deployments ranging from 1 GB to 5 TB
4. Agenda
• Data
• Splunk Components
• Index Data
• Proper Parsing
• Challenging Data
• Advanced Inputs
5. Are You in the Right Room?
• You have used Splunk at least once, or have at least read about it
• You are interested in Splunk best practices
• You like to use Splunk's default parsing rules
• You just took over a Splunk deployment and you're not sure what to do
• This is not an education class; it's about best practices
6. Data
• Machine data is more than just logs: it's configuration data, data from APIs and message queues, change events, the output of diagnostic commands, and more
• Log types: Application, Web Access and Proxy, Call Detail Records (CDR), Clickstream, Message Queues, Packet, Database audit and tables, File audit, Syslog, WMI, PerfMon
• Manual: Getting Data In, http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
Splunk is the engine for machine data
7. Splunk Apps
• Look to Splunk Apps first and utilize Technical Add-ons (TAs)
• TAs apply the Common Information Model (CIM)
• The CIM details the standard fields, event type tags, and host tags that Splunk uses when it processes most IT data
• Example TAs: Windows, Unix, Exchange, Active Directory, VMware vCenter, WebSphere
9. Test Environment
• Every Splunk deployment should have a test environment
• It can be a laptop, a virtual machine, or a spare server
• It should run the same version of Splunk as production
• It should be accessible to other Splunk developers and administrators
10. One Shot
• The easiest way to get data into your test environment
• Components of the oneshot command:
    ./splunk add oneshot user_conf.txt -index indexname -sourcetype sourcetypename
• Where to find more information: http://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorfilesanddirectoriesusingtheCLI
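If you reload test files often, it can help to build the oneshot command in a script instead of retyping it. The following is a minimal sketch under some assumptions. The helper name `oneshot_command` is hypothetical. It assumes you run the `splunk` binary from `$SPLUNK_HOME/bin`. The helper only assembles the arguments shown on the slide, so you can inspect them before running anything.

```python
import shlex
from typing import List


def oneshot_command(path: str, index: str, sourcetype: str,
                    splunk_bin: str = "./splunk") -> List[str]:
    """Build the argument list for a one-time 'splunk add oneshot' load.

    Hypothetical helper: it only assembles the arguments, it does not run them.
    """
    return [splunk_bin, "add", "oneshot", path,
            "-index", index, "-sourcetype", sourcetype]


cmd = oneshot_command("user_conf.txt", "indexname", "sourcetypename")
print(shlex.join(cmd))
# To actually load the file (assumes the working directory is $SPLUNK_HOME/bin):
# import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string means `subprocess.run` receives each argument separately. As a result, file names with spaces need no extra quoting.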
16. Props
• SHOULD_LINEMERGE: by default set to True; here it is set to False so that LINE_BREAKER alone decides where events break
    # USER CONFERENCE
    [user_conf_2014]
    TIME_PREFIX = ^
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 19
    SHOULD_LINEMERGE = False
    LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
    TRUNCATE = 10000
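It can help to reason about these settings with a rough Python model of what the stanza asks for. The model does three things. First, it breaks the stream wherever LINE_BREAKER matches. Second, it discards only the text in the first capture group. Third, it reads the timestamp from the start of each event with TIME_FORMAT, looking at no more than MAX_TIMESTAMP_LOOKAHEAD characters. This is a sketch for testing the regex, not Splunk's parser, and the sample log lines are made up.

```python
import re
from datetime import datetime
from typing import List

# Settings copied from the [user_conf_2014] stanza on the slide.
LINE_BREAKER = re.compile(r"([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 19


def break_events(raw: str) -> List[str]:
    """Split raw text where LINE_BREAKER matches, discarding only capture group 1.

    With SHOULD_LINEMERGE = False nothing is merged back together afterwards.
    """
    events: List[str] = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        # The timestamp after the newline is not discarded; it opens the next event.
        start = match.end(1)
    events.append(raw[start:])
    return [e.rstrip("\r\n") for e in events if e.strip()]


def event_time(event: str) -> datetime:
    """TIME_PREFIX = ^ : parse only the first MAX_TIMESTAMP_LOOKAHEAD characters."""
    return datetime.strptime(event[:MAX_TIMESTAMP_LOOKAHEAD], TIME_FORMAT)


# Hypothetical sample: the second line has no timestamp, so it stays with the first event.
raw = (
    "2014-10-06 09:00:00 INFO session started\n"
    "  continuation line without a timestamp\n"
    "2014-10-06 09:05:00 WARN session slow\n"
)
for ev in break_events(raw):
    print(event_time(ev).isoformat(), repr(ev))
```

The value 19 is exactly the length of a `%Y-%m-%d %H:%M:%S` timestamp. Because of that, the timestamp search never reaches into the message text.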
17. Props
• LINE_BREAKER: by default set to ([\r\n]+); change it to a positive lookahead so that a break only occurs where a new timestamp follows
18. Props
• TRUNCATE: by default set to 10000 bytes; set to 0 to never truncate
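The effect of TRUNCATE can be approximated in a few lines of Python. This is a sketch, not Splunk's implementation. It assumes the limit counts UTF-8 bytes. Dropping a multi-byte character that gets cut in half is a choice made for this sketch only.

```python
TRUNCATE = 10000  # bytes, from the stanza; 0 would mean "never truncate"


def apply_truncate(event: str, limit: int = TRUNCATE) -> str:
    """Approximate TRUNCATE: keep at most `limit` bytes of the event."""
    if limit == 0:
        return event
    data = event.encode("utf-8")
    if len(data) <= limit:
        return event
    # Cut on a byte boundary, then drop any partial multi-byte character.
    return data[:limit].decode("utf-8", errors="ignore")


long_event = "x" * 12000
print(len(apply_truncate(long_event)))      # 10000
print(len(apply_truncate(long_event, 0)))   # 12000
```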
21. Why Use Splunk Web to On-board?
A quick and easy way to…
• Easily visualize the data as events rather than lines of text
• Quickly get the data properly broken into events
• Accurately extract the timestamp
All in a wicked cool GUI. Once everything looks good, take your props.conf settings and deploy them.
22. Splunk Web Data On-boarding
• Locate the source file on the Splunk server's file system
23. Splunk Web Data On-boarding
• Validate event breaking and timestamp recognition
31. Limit Indexed Data
Windows forwarders, version 6.x or later
• Whitelist events or blacklist specific events
• inputs.conf configuration
32. Index Extractions
• Provides reliable and consistent indexing of data with headers
• Address the issue on the forwarder:
    INDEX_EXTRACTIONS = {CSV | W3C | TSV | PSV | JSON}
• Supports custom header parsing and an easy mode for common formats
• Extract IIS fields using props.conf on the Windows forwarder:
    [iis]
    INDEX_EXTRACTIONS = w3c
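Header-based extraction is reliable because the field names come from the file itself rather than from fixed column positions. The Python sketch below shows that idea for a W3C-style log. It does not reproduce how Splunk implements INDEX_EXTRACTIONS. The IIS lines are invented, and the sketch assumes the data lines are space-separated, as in the W3C extended format.

```python
from typing import Dict, Iterator, List


def parse_w3c(lines: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line, keyed by the most recent '#Fields:' header."""
    fields: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # Column names come from the header, not from hard-coded positions.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue  # other directives such as #Software or #Date, and blank lines
        elif fields:
            yield dict(zip(fields, line.split()))


# Hypothetical IIS log in which the logged field set changes partway through.
log = [
    "#Software: Microsoft Internet Information Services 8.5",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2014-10-06 16:05:12 GET /default.htm 200",
    "#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status",
    "2014-10-06 16:06:40 GET /search.aspx q=splunk 200",
]
for record in parse_w3c(log):
    print(record["cs-uri-stem"], record["sc-status"])
```

A parser with hard-coded positions would read the second data line wrong, because an extra column has shifted `sc-status`. The header-driven version is unaffected.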
33. Multiple Timestamps
datetime.xml
    <datetime>
      <define name="two_tz" extract="day, litmonth, year, hour, minute, second, zone">
        <text><![CDATA[^(\d+)-(\w+)-(\d+),(\d+):(\d+):(\d+),(?:[^,]*,){2}([\w-]*)]]></text>
      </define>
      <timePatterns>
        <use name="two_tz"/>
      </timePatterns>
      <datePatterns>
        <use name="two_tz"/>
      </datePatterns>
    </datetime>
props.conf
    # USER CONF
    [user_conf]
    DATETIME_CONFIG = /etc/apps/splk_ps_user_conf_props/local/datetime.xml
* Do not set TIME_FORMAT
Sample events:
    12-Sep-2014,09:01:00,12-Sep-2014,09:02:00,-4 INFO title="User Conference" msg="Splunk hosted user conference in Las Vegas."
    12-Sep-2014,19:01:00,12-Sep-2014,19:02:00,-5 DEBUG title="User Conference" msg="Getting Data In, Correctly is a solid session."
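Before deploying datetime.xml, you can check the CDATA regex against the sample events in Python. This sketch only mimics the mapping of capture groups to the fields named in `extract`. It also treats the zone column as a whole-hour UTC offset. That is an assumption about this sample data, not a description of how Splunk interprets `zone`.

```python
import re
from datetime import datetime, timedelta, timezone

# Same pattern as the CDATA block in datetime.xml.
TWO_TZ = re.compile(r"^(\d+)-(\w+)-(\d+),(\d+):(\d+):(\d+),(?:[^,]*,){2}([\w-]*)")


def first_timestamp(event: str) -> datetime:
    """Build a datetime from the day, litmonth, year, hour, minute, second, zone groups."""
    m = TWO_TZ.match(event)
    if m is None:
        raise ValueError("no timestamp found")
    day, mon, year, hh, mm, ss, zone = m.groups()
    naive = datetime.strptime(f"{day}-{mon}-{year} {hh}:{mm}:{ss}", "%d-%b-%Y %H:%M:%S")
    # Assumption: the zone column is a whole-hour UTC offset such as -4 or -5.
    return naive.replace(tzinfo=timezone(timedelta(hours=int(zone))))


events = [
    '12-Sep-2014,09:01:00,12-Sep-2014,09:02:00,-4 INFO title="User Conference" msg="Splunk hosted user conference in Las Vegas."',
    '12-Sep-2014,19:01:00,12-Sep-2014,19:02:00,-5 DEBUG title="User Conference" msg="Getting Data In, Correctly is a solid session."',
]
for ev in events:
    print(first_timestamp(ev).isoformat())
```

The `(?:[^,]*,){2}` group skips the second date and time columns. As a result, the first timestamp is paired with the zone that appears later on the line.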
35. Database Connect
• Allows for indexing data from database sources directly
• Allows for adding metadata to events from database sources using lookups
Caveats
• Java required on the Splunk server
• Search head pooling requires custom configuration to share the DB connection passwords
Not meant for data input sources
36. Database Connect Best Practices
• Normalize timestamps natively inside the SQL query
• Filter results down in the SQL query to reduce garbage in the Splunk index
• Repeated DB lookups should be converted to static lookups
• Search head pooling requires encrypted password replication
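The first two practices mean doing the work in the query rather than in Splunk. The sketch below shows this with Python's built-in sqlite3 module. The `logins` table, its columns, and its rows are hypothetical. SQLite's `strftime` stands in for whatever date functions your database provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (username TEXT, status TEXT, login_epoch INTEGER);
    INSERT INTO logins VALUES ('alice', 'FAILED', 1412586000),
                              ('bob',   'OK',     1412586060),
                              ('carol', 'FAILED', 1412586120);
""")

# Normalize the timestamp and filter inside the query, so only clean,
# relevant rows would ever reach the index.
query = """
    SELECT strftime('%Y-%m-%d %H:%M:%S', login_epoch, 'unixepoch') AS event_time,
           username, status
    FROM logins
    WHERE status = 'FAILED'
    ORDER BY login_epoch
"""
for row in conn.execute(query):
    print(row)
```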
38. Modular and Scripted Inputs
Benefits
• Almost any program that can output text can be used to index data
• Modular inputs allow for configuration files and configuration settings inside Splunk
Differences
• Scripted inputs require configuration to be done in the script
• Modular inputs can be configured via deployed .conf files and accessed via the REST API
• Scripted inputs are specific to the OS they are deployed on, whereas modular inputs can support multiple OSes
Examples
vmstat, iostat, Check Point OPSEC, Twitter, Stream, Amazon S3 online storage, and more…
39. Scripted Inputs Example
• Shell script saved in /opt/splunk/bin/scripts/ or in a specific app
• Allows you to execute any program on a Splunk forwarder and index its STDOUT data
• Utilizing key-value pairs makes for easier searching
Sample output from custom script /Applications/Splunk/bin/scripts/FantasyFootball.sh
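The slide's script is a shell script whose source is not shown. The same idea can be sketched in Python: any executable that prints lines to STDOUT can serve as a scripted input. The field names and values here are hypothetical placeholders. Values are written as `key="value"` pairs so they are easy to search on, and this simple formatter does not escape embedded quotes.

```python
import sys
import time
from typing import Dict


def format_event(fields: Dict[str, object]) -> str:
    """Render one event: a timestamp, then space-separated key="value" pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    pairs = " ".join(f'{key}="{value}"' for key, value in fields.items())
    return f"{stamp} {pairs}"


def main() -> None:
    # Hypothetical readings; a real script would gather them from a command or API.
    sample = {"team": "home", "opponent": "away", "points": 21}
    # Whatever the script writes to STDOUT is what gets indexed.
    print(format_event(sample))
    sys.stdout.flush()


if __name__ == "__main__":
    main()
```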
40. Scripted Inputs Example
The shell script calls local system binaries and can provide configuration options. Use inputs.conf to define the INDEX, SOURCETYPE, and INTERVAL for the scripted input.
42. Production Environment
• Complexity of managing configurations across tens, hundreds, or thousands of forwarders
• Not all indexers and search heads receive the same configurations
• Think about version control for deployment apps, e.g., GitHub
SHP
43. Deployment Server Terminology
• Deployment Server: A Splunk instance that acts as a centralized configuration manager, grouping together and collectively managing any number of Splunk instances. Any Splunk instance can act as a deployment server, even one that is indexing data locally. Splunk instances that are remotely configured by deployment servers are called deployment clients.
• Deployment Client: A Splunk instance that is remotely configured by a deployment server.
• Server Class: Represents a configuration of Splunk deployment clients. Server classes enable the management of a group of deployment clients as a single unit. A server class can be used to group deployment clients together by application, OS, data type to be indexed, or any other feature of your Splunk deployment.
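One deployment client can belong to several server classes at once. The toy model below illustrates that grouping. It assumes membership is decided only by wildcards on the host name, loosely like whitelist entries in serverclass.conf. The real file has more matching options, and the class names and host names here are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical server classes, grouped by OS and application as the slide suggests.
SERVER_CLASSES: Dict[str, List[str]] = {
    "all_forwarders": ["*"],
    "windows_iis": ["web-win-*"],
    "linux_apache": ["web-lnx-*"],
}


def classes_for(client: str) -> List[str]:
    """Return every server class whose patterns match the client's host name."""
    return [name for name, patterns in SERVER_CLASSES.items()
            if any(fnmatchcase(client, p) for p in patterns)]


print(classes_for("web-win-01"))  # ['all_forwarders', 'windows_iis']
```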
44. Deployment App
• A deployment app (configuration bundle) is a set of deployment content (including configuration files) deployed as a unit to clients of a server class
• Located in $SPLUNK_HOME/etc/deployment-apps and pushed to the deployment client's $SPLUNK_HOME/etc/apps folder
• DO NOT store configurations in $SPLUNK_HOME/etc/system/local
• Use deployment apps regardless of your deployment tool
45. Deployment App - Naming Convention
    org    group      application  configuration
    acme   finance    apache       inputs
    acme   marketing  iis          props
    splk   all        indexer      base
    splk   ps         user_conf    inputs
50. Deployment App - Naming Convention
Example app name built from org (splk), group (ps), application (user_conf), and configuration (inputs):
splk_ps_user_conf_inputs
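A predictable convention also lets tooling build and parse app names. The sketch below uses the org_group_application_configuration order from the slide's example. The `AppName` type and `parse_app_name` helper are hypothetical. Because the application part (`user_conf`) may itself contain an underscore, the parser assumes that only the application part ever does.

```python
from typing import NamedTuple


class AppName(NamedTuple):
    """org_group_application_configuration, e.g. splk_ps_user_conf_inputs."""
    org: str
    group: str
    application: str
    configuration: str

    def __str__(self) -> str:
        return "_".join(self)


def parse_app_name(name: str) -> AppName:
    """Split an app name back into its four parts.

    Assumes org, group and configuration never contain '_', so only the
    application segment (for example user_conf) may.
    """
    org, group, rest = name.split("_", 2)
    application, configuration = rest.rsplit("_", 1)
    return AppName(org, group, application, configuration)


print(AppName("splk", "ps", "user_conf", "inputs"))  # splk_ps_user_conf_inputs
print(parse_app_name("splk_ps_user_conf_inputs").application)  # user_conf
```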
52. Collecting Syslog
• Send logs from devices (e.g., routers, firewalls) to a syslog collector
• Write files to this directory structure: /sourcetype/host/log.txt
• Monitor at the sourcetype level
Example: cisco_asa/my.firewall.name/
    # CISCO ASA
    [monitor:///data/cisco_asa/.../]
    sourcetype = cisco_asa
    host_segment = 3
    index = firewall
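With `host_segment = 3`, the host is taken from the third `/`-separated segment of the monitored file's path. In this layout, that segment is the directory named after the device. The Python sketch below models that lookup. The file path is a hypothetical example that follows the /sourcetype/host/log.txt structure under /data.

```python
from pathlib import PurePosixPath


def host_from_path(path: str, host_segment: int) -> str:
    """Mimic host_segment: return the Nth '/'-separated segment (1-based) of the path."""
    segments = PurePosixPath(path).parts[1:]  # drop the leading '/'
    return segments[host_segment - 1]


print(host_from_path("/data/cisco_asa/my.firewall.name/log.txt", 3))  # my.firewall.name
```

Because each device writes into its own directory, one monitor stanza covers every firewall. Each event still gets the correct host value.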
53. Summary
• Test in a non-production environment
• Always use these key props.conf parameters:
  – TIME_PREFIX
  – TIME_FORMAT
  – MAX_TIMESTAMP_LOOKAHEAD
  – SHOULD_LINEMERGE
  – LINE_BREAKER
  – TRUNCATE
• Deploy apps to /etc/apps, not /etc/system/local
• Use a clear, predictable naming convention
• When you're stuck, use Splunk Answers and re-use apps from apps.splunk.com
54. Resources
• Get educated: http://www.splunk.com/view/education/SP-CAAAAH9
• Download Splunk applications: http://apps.splunk.com/
• Hire Splunk Professional Services: http://www.splunk.com/view/professional-services/SP-CAAABH9
• Watch some videos: http://www.splunk.com/videos