2. Splunk – The Big Data Company
Company (NASDAQ: SPLK)
Founded 2004, first software release in 2006
HQ: San Francisco / Regional HQ: London, Hong Kong
Over 600 employees, based in 12 countries
FY2012 $120 million; +83% year-over-year
5,000+ Customers
Customers in over 80 countries
54 of the Fortune 100
Largest license: 100 Terabytes per day
3. Over 3,000 Customers in 70+ Countries
Cloud and Online Services | Education | Energy and Utilities | Financial Services and Insurance
Government | Healthcare | Manufacturing | Media
Retail | Technology | Telecommunications | Travel and Leisure
4. Some Splunk Big Data Customers
Customer Daily Data Volume
12 TB
6 TB
4 TB
1.2 TB
900 GB
800 GB
5. Big Data Comes from Machines
Volume | Velocity | Variety | Variability
Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data
Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
6. Big Data Technologies
[Diagram: big data technologies plotted over time]
Technologies: Aster Data, Greenplum, Cassandra, Hbase, MongoDB, Hadoop
Approaches: Single RDBMS | Single Bigger RDBMS | RDBMS Sharding | SQL & Map/Reduce | NoSQL Map/Reduce
Data types: Relational Database (highly structured) | Key/Value, Tables or Other (semi-structured) | Temporal, Unstructured, Heterogeneous
Axis: Time
7. Splunk: the Platform for Machine Data
Innovative, Easy to Use and Powerful
Ad hoc search | Monitor and alert | Report and analyze | Custom dashboards | Developer Platform
Data collection and indexing
Splunk storage | Other Big Data stores
8. Apps and Solutions
Application Monitoring | IT Operations | Security | Web Intelligence | Compliance | Business Analytics
User Interface | APIs | SDK
Core Functions
Alerts | Reports | Dashboards | Access Controls | Stats/Analytics
Search
Indexing
Collection
9. Scales to TBs/day and Thousands of Users
Automatic load balancing linearly scales indexing
Distributed search and MapReduce linearly scales search and reporting
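The distributed search idea can be sketched in a few lines of Python. This is only an illustration of the map/reduce pattern, not Splunk's implementation: the shard contents, the function names and the notion of one shard per indexing node are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical event shards, one list per indexing node (illustrative data only).
SHARDS: List[List[str]] = [
    ["status=200", "status=500", "status=200"],
    ["status=404", "status=200"],
    ["status=500", "status=500"],
]


def map_shard(events: Iterable[str]) -> Counter:
    """Map step: each node counts status values in its own shard."""
    counts: Counter = Counter()
    for event in events:
        _, _, value = event.partition("=")
        counts[value] += 1
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-shard partial counts into one result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_count(shards: Iterable[Iterable[str]]) -> Counter:
    """Run the map step on every shard, then reduce the partial results."""
    return reduce_counts(map_shard(shard) for shard in shards)


if __name__ == "__main__":
    print(distributed_count(SHARDS))
```

Because each shard is counted independently, adding nodes spreads the map work; only the small partial counts travel to the merge step.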
10. What Does Machine Data Look Like?
Sources
Order Processing
Middleware Error
Care IVR
Twitter
11. Machine Data Contains Critical Insights
Sources and the fields found in their events:
Order Processing: Customer ID, Order ID, Product ID
Middleware Error: Order ID, Customer ID
Care IVR: Time Waiting On Hold, Customer ID
Twitter: Twitter ID, Customer’s Tweet, Company’s Twitter ID
12. What do we do? Collect and index Machine Data
Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data
Outside the Datacenter: Manufacturing, logistics…, CDRs & IPDRs, Power consumption, RFID data, GPS data
Logfiles | Configs | Messages | Traps, Alerts | Metrics | Scripts | Changes | Tickets
Windows: Registry, Event logs, File system, sysinternals
Linux/Unix: Configurations, syslog, File system, ps, iostat, top
Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud
Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
Databases: Configurations, Audit/query logs, Tables, Schemas
Networking: Configurations, syslog, SNMP, netflow
13. What do we do? Collect and index Machine Data
• Any amount, any location, any source
• No upfront schema
• No custom connectors
• No RDBMS
• No need to filter/forward
14. Inside Universal Indexing
Automatic event boundary identification
Automatic timestamp normalization
...enable accurate searching and
trending by time across all data:
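A minimal Python sketch of these two steps, under stated assumptions: only two timestamp formats are recognized, timestamps are treated as UTC (any offset in the event is ignored), and a line containing a timestamp starts a new event. The sample text and all names are hypothetical; Splunk's actual rules are not shown in the deck.

```python
import re
from datetime import datetime, timezone

# Assumed formats for this sketch: ISO-like and Apache-style timestamps.
# Note: %b month names assume an English/C locale.
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}"), "%d/%b/%Y:%H:%M:%S"),
]

# Hypothetical raw input mixing two sources and a multi-line stack trace.
SAMPLE = (
    "2012-10-10 13:55:36 ERROR payment failed\n"
    "    at Checkout.pay(Checkout.java:42)\n"
    '127.0.0.1 - - [10/Oct/2012:13:55:37 +0000] "GET /cart HTTP/1.1" 200\n'
)


def find_timestamp(line):
    """Return epoch seconds (UTC assumed) for the first known timestamp, else None."""
    for regex, fmt in PATTERNS:
        match = regex.search(line)
        if match:
            parsed = datetime.strptime(match.group(0), fmt)
            return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return None


def split_events(raw):
    """Group lines into events: a line with a timestamp starts a new event."""
    events = []
    for line in raw.splitlines():
        ts = find_timestamp(line)
        if ts is not None or not events:
            events.append({"time": ts, "raw": line})
        else:
            # Continuation line (e.g. stack trace) belongs to the previous event.
            events[-1]["raw"] += "\n" + line
    return events


if __name__ == "__main__":
    for event in split_events(SAMPLE):
        print(event["time"], repr(event["raw"]))
```

Once every event carries the same epoch-based time field, events from different sources can be sorted and trended on one timeline.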
15. Inside Universal Indexing
Segmentation & dense
indexing of every term
...enable Boolean search on
anything in the original event:
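One way to picture term indexing and Boolean search is a small inverted index. The sketch below is an assumption-laden illustration (segmentation on non-alphanumeric characters, in-memory sets, hypothetical sample events), not a description of Splunk's index format.

```python
import re
from collections import defaultdict

# Hypothetical events used only for illustration.
EVENTS = [
    "ERROR payment failed for order 42",
    "INFO order 42 shipped",
    "ERROR timeout contacting gateway",
]


def segment(event):
    """Split an event into lowercase terms on non-alphanumeric characters."""
    return {term for term in re.split(r"[^A-Za-z0-9]+", event.lower()) if term}


def build_index(events):
    """Map every term to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for term in segment(event):
            index[term].add(event_id)
    return index


def search_and(index, *terms):
    """Events containing all of the given terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Events containing at least one of the given terms."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


def search_not(index, total_events, term):
    """Events that do not contain the given term."""
    return set(range(total_events)) - index.get(term.lower(), set())


if __name__ == "__main__":
    idx = build_index(EVENTS)
    print(search_and(idx, "error", "order"))
```

Because every term is indexed without a schema, any word in the original event can be used in a query.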
16. Inside Search-time Knowledge Extraction
Automatically discovered fields
And user-defined fields
... enable statistics and precise search on
specific fields:
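Search-time extraction can be sketched as follows: fields are pulled out of the raw event only when a query runs, so nothing has to be declared up front. The key=value pattern, the `order_id` rule and the sample events are hypothetical choices for this example rather than Splunk's own extraction rules.

```python
import re
from collections import Counter

# Automatic discovery (assumed rule): key=value pairs, value optionally quoted.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Hypothetical user-defined extraction: each named group becomes a field.
USER_FIELDS = [re.compile(r"order (?P<order_id>\d+)")]

# Hypothetical raw events used only for illustration.
EVENTS = [
    'status=500 user="alice smith" order 42 failed',
    "status=200 user=bob order 43 ok",
    "status=500 user=carol",
]


def extract_fields(raw):
    """Extract fields from a raw event at search time."""
    fields = {key: value.strip('"') for key, value in KV_PATTERN.findall(raw)}
    for pattern in USER_FIELDS:
        match = pattern.search(raw)
        if match:
            fields.update(match.groupdict())
    return fields


def count_by(events, field):
    """Simple statistic: count events per value of one extracted field."""
    extracted = (extract_fields(event) for event in events)
    return Counter(fields[field] for fields in extracted if field in fields)


if __name__ == "__main__":
    print(extract_fields(EVENTS[0]))
    print(count_by(EVENTS, "status"))
```

Adding a new rule to `USER_FIELDS` changes what later searches see without re-indexing the stored events.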
17. New Approach to Heterogeneous Data
Universal Indexing
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front
Search-time Knowledge
• Knowledge applied at search-time
• No brittle schema to work around
• Multiple views into the same data
Flexibility and Fast Time to Value
• Normalization as it’s needed
• Faster implementation
• Easy search language
• Multiple views into the same data
• Splunk helps find transactions, patterns and trends
18. Splunk Used Across IT and the Business
Application Management | Operations Management | Security & Compliance | Web and Business Analytics
19. Provides Strong Machine Data Governance
Provides comprehensive controls for data security, retention and integrity
Single sign-on integration enables pass-through authentication of user credentials
20. Splunk Big Data Strategy
Deliver ease of use, real-time analytics and enterprise capabilities
Ad hoc search | Monitor and alert | Report and analyze | Custom dashboards | Developer Platform
Data collection and indexing
Splunk storage | Other Stores
22. Splunk-Hadoop: Co-existence use cases
Side by Side: Real-time Analytics | ETL / recommendation system
Splunk in front of Hadoop: Collect, Visualize, Report | ETL, Archival, Long Running Queries
Combine: Splunk visualizes and secures Hadoop data | Splunk indexes Hadoop data
23. Splunk: Enabling the Big Data Ecosystem
Real-time Collection and Analysis | Dashboards, Reports, Access Controls
Splunk Hadoop Connect
• Reliable Data Export
• Import Hadoop Data
Splunk App for HadoopOps
• End-to-end monitoring, troubleshooting, analysis of Hadoop environment
24. Splunk Hadoop Connect
Delivers reliable integration between Splunk and Hadoop
Export events to Hadoop
Explore and Browse Hadoop directories
Import and Index Hadoop data into Splunk
25. Splunk App for HadoopOps
Monitoring the full Hadoop environment – Hadoop, Switch, OS, AS, and Database
Splunk Forwarder, HadoopOps Package on every host
Add Knowledge | Collect & Index Data | Distributed Search | Monitor & Alert
Splunk HadoopOps: Dashboards, alerts and notifications, Rich UI powered by Splunk search Framework
Host | Operating System | Infrastructure
26. Splunk and Big Data
Product-based Solution
• Easy to download and deploy
• Pre-integrated, end-to-end functionality
• Enterprise-grade features
• Developer API, SDKs
Integrated and End-to-end
• Collects data from tens of thousands of sources
• Advanced real-time and historical analysis of data
• Fast, custom visualizations for IT and business users
Performance at scale
• Proven at multi-terabyte scale per day
• Upwards of PB under management
• Thousands of enterprise customers