Log Data Analysis Platform is a fully automated system for ingesting, processing and storing huge amounts of log data, built on Flume, Spark, Hadoop, Impala, Hive, Elasticsearch and Kibana.
4. Demo Lab: Why Did We Start This Project?
1) Build Internal Expertise
2) Create a Reference Solution Without NDA Limitations
3) Get a Playground for Tests
4) Provide a Demo Environment for Customers (Using Their Own Data)
5) Decrease Time to Market (by Introducing Automation)
6. Log Data Analysis Platform Details
Key Facts:
• ~270-300 Web Servers
• Log Types: HTTPD Access Logs, Error Logs, Application Server Servlet Logs, OS Service Logs
• ~500K events per minute
• 150GB of data per day
Technologies:
• Flume
• Hadoop/HDFS, MapReduce
• Hive, Impala
• Oozie
• Elasticsearch, Kibana 3
• Tableau Analytics platform
• Puppet + Vagrant
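The Key Facts above imply a few derived rates that matter for sizing the ingest pipeline. The sketch below works them out, assuming 150 GB means decimal gigabytes and that raw data lands in HDFS with the default replication factor of 3 before any Avro/Parquet compression; both are assumptions, not figures from the project.

```python
# Figures from the Key Facts; decimal GB and the replication factor are assumptions.
EVENTS_PER_MINUTE = 500_000
BYTES_PER_DAY = 150 * 10**9   # assumption: 150 GB read as decimal gigabytes
SERVERS = (270, 300)          # reported range of web servers
HDFS_REPLICATION = 3          # assumption: HDFS default dfs.replication, no compression


def derived_rates(events_per_minute, bytes_per_day, servers, replication):
    """Derive average rates and sizes from daily volume and event throughput."""
    minutes_per_day = 24 * 60
    events_per_day = events_per_minute * minutes_per_day
    low, high = servers
    return {
        "events_per_second": events_per_minute / 60,
        "events_per_day": events_per_day,
        "avg_event_bytes": bytes_per_day / events_per_day,
        "ingest_bytes_per_second": bytes_per_day / (minutes_per_day * 60),
        # More servers means fewer events per server, so the bounds swap.
        "events_per_server_per_minute": (events_per_minute / high, events_per_minute / low),
        "hdfs_raw_bytes_per_day": bytes_per_day * replication,
    }


for name, value in derived_rates(EVENTS_PER_MINUTE, BYTES_PER_DAY, SERVERS, HDFS_REPLICATION).items():
    print(name, value)
```

Under these assumptions the platform averages roughly 8,300 events per second, about 208 bytes per event and about 1.7 MB/s of sustained ingest; real traffic peaks will be higher than these daily averages.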
7. Log Data Examples
Access log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
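The access log line above is in Apache's Common Log Format (host, ident, user, timestamp, request line, status, response size). A minimal parser for that format is sketched below; the field names and the choice to map a "-" size to 0 are illustrative, not taken from the platform's code.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_access_line(line):
    """Parse one Common Log Format line into a dict; return None if it does not match."""
    m = CLF_PATTERN.match(line.strip())
    if m is None:
        return None
    record = m.groupdict()
    # "10/Oct/2000:13:55:36 -0700" becomes a timezone-aware datetime.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Split "GET /apache_pb.gif HTTP/1.0" into method, path and protocol when well-formed.
    parts = record["request"].split()
    if len(parts) == 3:
        record["method"], record["path"], record["protocol"] = parts
    return record


SAMPLE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_access_line(SAMPLE))
```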
Error log:
[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome
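The error log lines follow a different layout: a bracketed timestamp, a bracketed severity, an optional bracketed client address and a free-text message. The sketch below parses lines shaped like the two examples; it assumes exactly that layout and would need changes for other Apache error log formats.

```python
import re
from datetime import datetime

# [timestamp] [level] [client ip] message  (client part is optional)
ERROR_PATTERN = re.compile(
    r'^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)$'
)


def parse_error_line(line):
    """Parse one error log line into a dict; return None if it does not match."""
    m = ERROR_PATTERN.match(line.strip())
    if m is None:
        return None
    record = m.groupdict()
    # "Sun Mar 7 20:58:27 2004"; strptime treats a space in the format as one or
    # more whitespace characters, so the padded "Mar  7" form also parses.
    record["time"] = datetime.strptime(record["time"], "%a %b %d %H:%M:%S %Y")
    return record


SAMPLES = [
    "[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: "
    "client stopped connection before send body completed",
    "[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: "
    "/home/httpd/twiki/view/Main/WebHome",
]
for line in SAMPLES:
    rec = parse_error_line(line)
    print(rec["time"], rec["level"], rec["client"], rec["message"])
```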
vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0
iostat:
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db) 07/09/2011
avg-cpu: %user %nice %system %iowait %steal %idle
5.68 0.00 0.52 2.03 0.00 91.76
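The vmstat and iostat samples are whitespace-separated tables: a header row of column names and a row of values. One simple way to turn such output into structured events is to zip the two rows together, as sketched below; the helper name is hypothetical, the group header line ("procs ... cpu") is skipped, and the "%" prefix is stripped from iostat column names for convenience.

```python
VMSTAT_HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
VMSTAT_VALUES = "0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0"
IOSTAT_HEADER = "%user %nice %system %iowait %steal %idle"
IOSTAT_VALUES = "5.68 0.00 0.52 2.03 0.00 91.76"


def parse_columns(header, values, convert=float):
    """Zip a header row and a value row into a dict, converting each value."""
    names = header.split()
    fields = values.split()
    if len(names) != len(fields):
        raise ValueError(f"{len(names)} columns but {len(fields)} values")
    # Strip iostat's "%" prefix so keys read as plain names (e.g. "iowait").
    return {name.lstrip("%"): convert(field) for name, field in zip(names, fields)}


vmstat = parse_columns(VMSTAT_HEADER, VMSTAT_VALUES, convert=int)
iostat = parse_columns(IOSTAT_HEADER, IOSTAT_VALUES)
print(vmstat["free"], vmstat["id"], iostat["iowait"])
```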
16. Solution Architecture
[Figure: solution architecture split into a batch layer, a speed layer and a serving layer. Components: Apache HTTP servers, data stream, raw data storage, pre-computing of batch views (static and ad-hoc), real-time processing and aggregations, real-time views, dashboard/search, and a corporate BI tool. Legend: layer boundary, data flow (with direction indicated), query flow.]
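The batch, speed and serving layers in the architecture follow the Lambda architecture pattern: the batch layer periodically recomputes views from raw storage, the speed layer aggregates only the data the last batch run has not covered yet, and in the classic form of the pattern a query combines both views. The in-memory sketch below illustrates that idea; it is not the platform's implementation, and the events, cutoff and function names are hypothetical.

```python
from collections import Counter

# Hypothetical access events: (timestamp in seconds, HTTP status).
EVENTS = [(10, 200), (20, 404), (30, 200), (70, 200), (80, 500)]
BATCH_CUTOFF = 60  # hypothetical: the last batch run covered events with ts < 60


def batch_view(events, cutoff):
    """Batch layer: precompute counts per status from raw storage up to the cutoff."""
    return Counter(status for ts, status in events if ts < cutoff)


def realtime_view(events, cutoff):
    """Speed layer: aggregate only events that arrived after the last batch run."""
    return Counter(status for ts, status in events if ts >= cutoff)


def query(status, batch, realtime):
    """Serving layer: answer a query by combining both views."""
    return batch.get(status, 0) + realtime.get(status, 0)


batch = batch_view(EVENTS, BATCH_CUTOFF)
realtime = realtime_view(EVENTS, BATCH_CUTOFF)
print({s: query(s, batch, realtime) for s in (200, 404, 500)})
```

When the next batch run moves the cutoff forward, the batch view absorbs those events and the speed layer can discard its aggregates for them, which keeps the real-time state small.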
• Avro as the raw data storage file format
• Parquet as the batch views file format
• Star schema as the batch views data model
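A star schema keeps measures in a central fact table whose rows reference surrounding dimension tables by key. The sketch below illustrates that layout with a hypothetical schema (a request fact table plus host and date dimensions; the table, column and datacenter names are invented) and a query that joins and aggregates it, roughly what a GROUP BY over joined tables would do in Hive or Impala.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DimHost:          # hypothetical host dimension
    host_key: int
    hostname: str
    datacenter: str


@dataclass(frozen=True)
class DimDate:          # hypothetical date dimension
    date_key: int       # e.g. 20001010
    iso_date: str


@dataclass(frozen=True)
class FactRequest:      # hypothetical fact table: one row per request
    date_key: int
    host_key: int
    status: int
    bytes_sent: int


def bytes_by_datacenter_and_day(facts, hosts, dates):
    """Join facts to their dimensions and sum bytes per (datacenter, date)."""
    host_by_key = {h.host_key: h for h in hosts}
    date_by_key = {d.date_key: d for d in dates}
    totals = defaultdict(int)
    for f in facts:
        key = (host_by_key[f.host_key].datacenter, date_by_key[f.date_key].iso_date)
        totals[key] += f.bytes_sent
    return dict(totals)


hosts = [DimHost(1, "web-001", "dc-a"), DimHost(2, "web-002", "dc-b")]
dates = [DimDate(20001010, "2000-10-10")]
facts = [
    FactRequest(20001010, 1, 200, 2326),
    FactRequest(20001010, 1, 404, 512),
    FactRequest(20001010, 2, 200, 1000),
]
print(bytes_by_datacenter_and_day(facts, hosts, dates))
```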
30. Outcome
1) Demo lab, playground and testing platform (up and running in 1 hour)
2) Sizing Calculator
3) Helped win 3 new customers (one of them very large)
4) Strategic Partnership with Cloudera
5) Tons of experience and fun
Plans
1) Add Support for Other Hadoop Distributions (Hortonworks, MapR)
2) Make the Project Open Source
31. Thank You!
SoftServe US Office
One Congress Plaza,
111 Congress Avenue, Suite 2700, Austin, TX 78701
Tel: 512.516.8880
Contacts
Valentyn Kropov
vkrop@softserveinc.com
Tel: 866.687.3588 x4341