Hunk - Unlocking The Power of Big Data Breakout Session
1. Copyright © 2015 Splunk Inc.
Splunk / Hunk Big Data
Analytics
Raanan Dagan
Sr. SE, Hadoop DE
2. SPLUNK TODAY
Platform for Machine Data
[Diagram: data sources – Forwarders, Syslog/TCP/Other, DB Connect, Stream, Mobile, Mainframe Data, VMware, Sensors & Control Systems – feed the platform, with apps such as Exchange and PCI Security on top]
600+ Ecosystem of Apps
3. Splunk – Big Data Technologies
• Relational Database (highly structured): Oracle, MySQL, IBM DB2, Teradata – RDBMS, SQL & MapReduce
• Distributed File System (semi-structured): Hadoop – HDFS Storage + MapReduce
• Key/Value, Columnar or Other (semi-structured): Cassandra, Accumulo, MongoDB – NoSQL
• Splunk: Temporal, Unstructured, Heterogeneous – Real-Time Indexing
5. Splunk and Hadoop
Hunk:
– Main use case = Analyze Hadoop data using Hadoop processing
Splunk Hadoop Connect:
– Main use case = Real-time export of data from Splunk to Hadoop
Hunk Archive:
– Main use case = Archive Splunk indexers to Hadoop
Splunk Monitor Hadoop:
– Main use case = Monitor Hadoop
7. Hunk – Unique
1. Run Natively in Hadoop:
– Uses Hadoop MapReduce
2. Mixed Mode:
– Allows for data preview
3. Auto-deploy splunkd to DataNodes:
– On-the-fly indexing
4. Access Control:
– Allows many users / many Hadoop directories / supports Kerberos
5. Schema on the Fly
9. Run Natively in Hadoop
[Diagram: the Hunk search head submits MapReduce jobs to an external resource (e.g. hadoop.prod); the NameNode and JobTracker (YARN) schedule tasks on DataNode/TaskTracker machines; indexing happens on the data nodes; tasks use a working directory in HDFS and results flow back to the search head]
10. Mixed-mode Search
[Diagram: over time, a Splunk stream phase returns previews until a switch-over to Hadoop MR / Splunk index]
• Data preview
• Allows users to search interactively by pausing and refining queries
11. Indexing on the Fly – Hunk Data Processing
[Diagram: the search head's ERP search process submits MapReduce jobs; search processes on the TaskTrackers read raw data from HDFS, preprocess it, and stream remote results back to the search head, which merges them into the final search results]
12. Role-based Security for Shared Clusters
• Provide role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security
[Diagram: pass-through authentication maps each user to a cluster queue – Business Analyst to "Biz Analytics", Marketing Analyst to "Marketing", Sys Admin to "Prod"]
13. 13
We added these in Hunk 6.*
13
1. Report Acceleration: Get results in seconds
2. Hive Schema: Expose User Created Schema, Parquet, Sequence,
ORC, RC
3. Data Exploration: UI to navigate Hadoop
4. Hunk on EMR (Amazon): Hunk by the Hour
5. Search Head Clustering: Unlimited number of end-users
6. Archive Splunk Indexers to HDFS: Search through years of data
15. Archiving Splunk Enterprise to Hunk-HDFS
• Archive buckets to Hadoop (HDFS) instead of freezing buckets or throwing data away
• Store old data at up to 1/10 the cost in cheap Hadoop batch storage instead of on SANs
• Optimize Splunk Enterprise search head performance for real-time monitoring, alerting and dashboarding with short-term historical context
• Use Hunk to search, analyze and visualize months or years of historical data in Hadoop
• Run federated queries and dashboards across Splunk Enterprise and Hunk
[Diagram: WARM and COLD buckets roll to FROZEN copies archived on Hadoop clusters]
17. Yahoo – Visualizing Hadoop
New Search:
index="jobsummary_logs_all_red" cluster="dilithium*" | eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds) | eval gb_hours=((total_slot_seconds * 0.5) / 3600) | eval gb_hours=round(gb_hours) | timechart span=6h sum(gb_hours) as gb_hours by queue
Last 7 days – 1,175,726 events (5/20/14 8:00:00.000 PM to 5/27/14 8:26:26.000 PM)
[Visualization: sum of gb_hours per 6h span, Wed May 21 through Mon May 26, 2014, split by queue – apg_dailyhigh_p3, apg_dailymedium_p5, apg_hourlyhigh_p1, apg_hourlylow_p4, apg_hourlymedium_p2, apg_p7, curveball_large, curveball_med, slingshot, slingstone, OTHER]
• 600PB of data
• Very large clusters used by many groups across the enterprise
• 35,000 individual DataNodes
• Hadoop is provided as a self-service
18. Vantrix – Mobile Media Optimization
Analytics application: 144 Hadoop nodes, 69 TB SSD storage
10 million subscribers generate:
• 80GB of raw session log data / day
• 26 million video data session records
Hunk query:
• 20 sec – search through 27M events
• Returning 4.7M events
Hunk as indexer – automatically indexed and counted field value occurrences
Hunk as self-service – proved invaluable for identifying and exploring use cases
Hunk business value – helped identify when subscribers abandon video
20. Hunk – Connect to NoSQL & SQL Databases
• Build custom streaming resource libraries
• Search and analyze data from other data stores in Hunk
• In partnership with leading NoSQL vendors
• Use in conjunction with DB Connect for relational database lookups
22. MongoDB-Specific Integration Highlights
index=mongodb foo=xyz | timechart avg(bar) by baz
Predicate pushdown: filtering terms are processed on the MongoDB side, so only results where the field foo matches xyz are returned.
Projections: only the fields mentioned in the particular search are returned – in this case _time, bar and baz.
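A minimal pure-Python model of these two optimizations may help. The real connector pushes the query down to MongoDB itself; this sketch only simulates the filter-then-project behavior over in-memory documents, with invented field values:

```python
def pushdown_query(docs, predicate, fields):
    """Simulate predicate pushdown and projection: the data store applies
    the filter and returns only the requested fields, so the search head
    never sees non-matching documents or unused fields."""
    for doc in docs:
        if all(doc.get(k) == v for k, v in predicate.items()):
            yield {k: doc[k] for k in fields if k in doc}

docs = [
    {"_time": 1, "foo": "xyz", "bar": 10, "baz": "a", "unused": "x"},
    {"_time": 2, "foo": "abc", "bar": 20, "baz": "b", "unused": "y"},
]
# Mirrors: index=mongodb foo=xyz | timechart avg(bar) by baz
results = list(pushdown_query(docs, {"foo": "xyz"}, ["_time", "bar", "baz"]))
```

Only the first document survives the predicate, and the `unused` field is dropped by the projection, which is exactly the traffic saving the slide describes.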
23. Splunk DB Connect
Reliable, scalable, real-time integration between Splunk and traditional relational databases
• Enrich search results with additional business context
• Easily import data into Splunk for deeper analysis
• Integrate multiple DBs concurrently
• Simple set-up, non-invasive and secure
[Diagram: a Java Bridge Server with connection pooling provides database lookup and database query over JDBC to Microsoft SQL Server, Oracle and other databases]
24. The 6th Annual Splunk Worldwide Users' Conference
September 21-24, 2015 – The MGM Grand Hotel, Las Vegas
Did you like this session on Splunk for Big Data? You should check out these sessions at .conf2015:
• Splunk Hunk – Performance, Best Practices, and Troubleshooting
• Archive Splunk Data and Access Using Hadoop Tools
• Hunk and Elastic MapReduce (Amazon EMR)
• Real World Big Data Architecture (Splunk, Hunk, DB Connect)
• Splunk Distributed Processing with Spark
Register at: conf.splunk.com
25. The 6th Annual Splunk Worldwide Users' Conference
September 21-24, 2015 – The MGM Grand Hotel, Las Vegas
• 50+ Customer Speakers
• 50+ Splunk Speakers
• 35+ Apps in Splunk Apps Showcase
• 65 Technology Partners
• 4,000+ IT & Business Professionals
• 2 Keynote Sessions
• 3 days of technical content (150+ Sessions)
• 3 days of Splunk University
– Get Splunk Certified
– Get CPE credits for CISSP, CAP, SSCP, etc.
– Save thousands on Splunk education!
Register at: conf.splunk.com
27. We Want to Hear Your Feedback!
After the breakout sessions conclude, text "Splunk" to 878787 and be entered for a chance to win a $100 AMEX gift card!
Since then, Splunk has invested significantly to expand from a search tool to a mission-critical platform. The platform includes hundreds of data types and can scale to massive volumes.
Today, it's more than Splunk Enterprise: we've added Splunk Cloud, Hunk, and Splunk MINT for mobile intelligence, and have more than 600 apps.
Machine data is more than logs! It's wire data, mainframe data, mobile device data, sensor data, and metrics.
Your use cases have evolved well beyond troubleshooting, so we're investing in solutions that leverage the power of Splunk Enterprise to provide you with packaged views into your data for faster, deeper insights.
Our most well-known solution is Splunk Enterprise Security, and if you aren't using it yet, we encourage you to find out why it's turning the traditional SIEM market upside down.
How has big data evolved over time? For a long time, "big data" was simply a large database.
The database industry, in order to handle large data, moved to many smaller databases. Horizontal partitioning (also known as sharding) is a database design principle whereby rows of a database table are held separately (for example, A–D in one database, E–H in a second database, and so on).
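The range-based sharding just described can be sketched in a few lines. The shard names and letter ranges here are invented for illustration, not any particular product's layout:

```python
# Illustrative range-based sharding: a row is routed to a database by the
# first letter of its key, as in the A-D / E-H example above.
SHARDS = {
    "shard_1": ("A", "D"),
    "shard_2": ("E", "H"),
    "shard_3": ("I", "Z"),
}

def route(key: str) -> str:
    """Return the shard whose letter range covers the key's first letter."""
    first = key[0].upper()
    for shard, (lo, hi) in SHARDS.items():
        if lo <= first <= hi:
            return shard
    raise KeyError(f"no shard covers key {key!r}")
```

Each shard holds a disjoint slice of the rows, so queries that know the key only touch one database; queries that don't must fan out to all of them, which is the scaling pain Hadoop later addressed differently.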
Hadoop, which grew out of Google's MapReduce and GFS papers, was adopted as the de facto big data system. Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It has emerged as a popular way to handle massive amounts of data, including structured and complex unstructured data. Its popularity is due in part to its ability to store and process large amounts of data effectively across clusters of commodity hardware. Apache Hadoop is not actually a single product but rather a collection of several components. For the most part, Hadoop is a batch-oriented system.
** Teradata Aster Data and SQL-on-Hadoop systems provide SQL interfaces that can talk to Hadoop.
** Cassandra and HBase are NoSQL databases that can process data using a key/value model in real time.
Splunk = a temporal, unstructured, heterogeneous, real-time analytics platform.
Quick to set-up, scales to multiple concurrent databases
Enrich machine data with structured data from relational databases
Execute database queries directly from the Splunk user interface
Browse and navigate database schemas and tables
Combine machine data with structured data from relational databases
Search execution:
The Hunk search head takes the list of contents of the directories in the virtual index. The search head filters directories and files based on the search and time range (partition pruning).
The NameNode and JobTracker (the MapReduce Resource Manager in YARN) feed data from the MapReduce framework to the search process. The process computes file splits, then constructs and submits the MapReduce jobs.
Hunk streams a few file splits from HDFS and processes them in the search head to provide quick previews. The search head consumes and merges the MapReduce results (providing incremental previews) while the MapReduce jobs kick off.
The data nodes run a copy of splunkd to process the jobs and write them to a working directory in HDFS.
Final results are stored in the Hunk search head.
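The partition-pruning step above can be sketched as a time-range filter over date-partitioned directories. The /YYYY/MM/DD/ layout and helper name are assumptions for illustration, not Hunk's actual internals:

```python
from datetime import date

def prune_partitions(paths, start, end):
    """Keep only paths whose trailing /YYYY/MM/DD/ component falls inside
    [start, end]; everything else can be skipped without reading it."""
    kept = []
    for p in paths:
        parts = p.strip("/").split("/")
        y, m, d = (int(x) for x in parts[-3:])  # last 3 components = date
        if start <= date(y, m, d) <= end:
            kept.append(p)
    return kept

paths = ["/data/2014/05/20/", "/data/2014/05/25/", "/data/2014/06/01/"]
pruned = prune_partitions(paths, date(2014, 5, 21), date(2014, 5, 27))
```

Pruning before submitting MapReduce jobs is what keeps a narrow time-range search from scanning the whole virtual index.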
Hunk utilizes the Splunk Search Processing Language, the industry-leading method to enable interactive data exploration across large, diverse data sets. There is no requirement to "understand" data up front. Customers of Splunk Enterprise can reuse their Search Processing Language knowledge and skill set for data stored in Hadoop. Any command whose output depends on the event input order may yield different results: Splunk Enterprise guarantees that events are delivered in descending time order, but Hunk doesn't. This is why transaction and localize do not work.
We can see the results from the intermediate Hadoop Map jobs being streamed into the Splunk UI even before all the Map jobs are finished; once all the Hadoop Maps are done processing, Splunk displays the full results.
In essence, Splunk acts as the Hadoop Reduce phase, so there is no need to use Hadoop for that phase.
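The "Splunk as the Reduce phase" idea can be sketched as the search head merging partial aggregates streamed from each Map task. The data and names below are invented for illustration:

```python
from collections import Counter

def merge_partials(partials):
    """Merge streamed partial aggregates into one running total, the way a
    search head can reduce Map outputs as they arrive (enabling previews)."""
    total = Counter()
    for partial in partials:   # each element is one Map task's output
        total.update(partial)  # a preview could be rendered after each merge
    return dict(total)

# Three Map tasks emit partial per-queue event counts:
map_outputs = [{"prod": 3, "marketing": 1}, {"prod": 2}, {"marketing": 4}]
merged = merge_partials(map_outputs)
```

Because the merge is incremental, partial results can be shown as soon as the first Map task finishes, without waiting for a Hadoop Reduce stage.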
Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in, allowing users to search interactively by pausing and refining queries.
This is a major, unique advantage of Hunk compared to alternative approaches such as Hive or other SQL-on-Hadoop systems, which require a fixed schema in an effort to speed up searches; Hunk retains the combination of schema on the fly with results preview.
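"Schema on the fly" means fields are extracted at search time rather than declared before loading. A minimal sketch, with an invented log line and regex (not a Hunk format):

```python
import re

def extract(line, pattern):
    """Apply a search-time field extraction to one raw event; fields exist
    only because this particular search's pattern asks for them."""
    m = re.search(pattern, line)
    return m.groupdict() if m else {}

# No schema was declared for this raw event; the search supplies one ad hoc.
LINE = "10.0.0.1 - GET /index.html 200 512"
fields = extract(LINE, r"(?P<method>GET|POST) (?P<uri>\S+) (?P<status>\d{3})")
```

A different search could apply a different pattern to the same raw data, which is exactly what fixed-schema systems like Hive cannot do without redefining tables.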
In this new feature, planned for the next Hunk release (version 6.2.1), you archive buckets to Hadoop (the Hadoop Distributed File System, or HDFS) instead of freezing buckets or throwing data away. This significantly lowers the total cost of ownership (TCO) for Splunk Enterprise installations while giving security analysts, risk managers and marketers access to months or years of historical data integral to their job success.
Store old data at up to 1/10 the cost in cheap Hadoop batch storage instead of on SANs.
Optimize Splunk Enterprise search head performance for real-time monitoring, alerting and dashboarding with short-term historical context.
Use Hunk to search, analyze and visualize months or years of historical data in Hadoop.
Run federated queries and dashboards across Splunk Enterprise and Hunk.
Search execution:
The Hunk search head receives a search from the end user and splits it into multiple queries against multiple indexes.
Each query spawns a new search process. Each search is processed depending on whether it's a native Splunk distributed search or whether it uses an External Results Provider. MongoDB and Hadoop are implemented via External Results Providers.
The MongoDB provider receives JSON config via stdin, translates and executes the Hunk query against MongoDB, and returns results via stdout.
Hunk receives the results from multiple providers and runs a reduction to merge them into a single set of results.
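The stdin/stdout contract just described can be sketched as follows. The config field names (`query`, `sample_rows`) are assumptions for this sketch, not the actual ERP protocol, and the real provider would query MongoDB instead of filtering canned rows:

```python
import io
import json
import sys

def run_provider(stdin=sys.stdin, stdout=sys.stdout):
    """Read a JSON search config on stdin; emit matching rows as
    newline-delimited JSON on stdout, like an External Results Provider."""
    config = json.load(stdin)
    query = config.get("query", {})  # the translated predicate
    for row in config.get("sample_rows", []):
        if all(row.get(k) == v for k, v in query.items()):
            stdout.write(json.dumps(row) + "\n")

# Demo: one matching and one non-matching row.
cfg = {"query": {"foo": "xyz"},
       "sample_rows": [{"foo": "xyz", "bar": 1}, {"foo": "abc", "bar": 2}]}
out = io.StringIO()
run_provider(io.StringIO(json.dumps(cfg)), out)
```

Keeping the provider a plain stdin/stdout process is what lets Hunk plug in new data stores without changing the search head itself.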
Splunk DB Connect delivers reliable, scalable, real-time integration between Splunk Enterprise and traditional relational databases. With Splunk DB Connect, structured data from relational databases can be easily integrated into Splunk Enterprise, driving deeper levels of operational intelligence and richer business analytics across the organization.
Organizations can drive more meaningful insights for IT operations, security and business users. For example, IT operations teams can track performance, outage and usage by department, location and business entities. Security professionals can correlate machine data with critical assets and watch-lists for: incident investigations, real-time correlations and advanced threat detection using the award-winning Splunk Enterprise. Business users can analyze service levels and user experience by customer in real-time to make more informed decisions.
And finally, I would like to encourage all of you to attend our user conference in September.
The energy level and passion that our customers bring to this event is simply electrifying.
Combined with inspirational keynotes and 150+ breakout sessions across all areas of operational intelligence, it is simply the best forum to bring our Splunk community together, to learn about new and advanced Splunk offerings, and most of all to learn from one another.