Jan 2014, HAPPIEST MINDS TECHNOLOGIES

Innovation @Work
Log Management with Logstash and ElasticSearch
Rishav Rohit

SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY.

Copyright Information

This document is the exclusive property of Happiest Minds Technologies Pvt. Ltd. It is
intended for limited circulation.

© 2013 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved

Contents
Copyright Information
Abstract
Introduction
Problem Definition
High Level Solution
Solution Details
Solution Benefits
Solution extend-ability
Deliverables
Conclusion
References
Happiest Mind Innovators

Abstract
Gathering logs from a wide array of servers and applications so that they can be collected, searched and
analyzed centrally, in real time, is a challenging task. Once we overcome this challenge, we
can gain a wealth of insights from these logs, identify problems, and arrive at solutions
or corrective measures much more quickly. In this paper, we build a highly scalable, real-time
log collection, search, visualization and analysis application using Logstash, ElasticSearch and
Kibana.

Introduction
Recent compliance mandates require not only that organizations collect all logs, but also that
they be reviewed regularly, are searchable, and are stored in their original, unaltered, raw
form for mandate-specific timeframes. Log management solutions address data collection
and retention needs in a way that allows them to inexpensively collect, store and manage
large amounts of log data.
To solve this problem we can build a highly scalable solution with real-time analysis using
Logstash, ElasticSearch and Kibana.
Logstash: Logstash is a lightweight, highly integrable tool for managing events and
logs. It can collect logs, parse them and store them in a central location. It is free and open
source under the Apache License.
ElasticSearch: Elasticsearch is a search server based on Lucene. It provides a distributed,
multitenant-capable full-text search engine with a RESTful web interface and schema-free
JSON documents. Elasticsearch is free and open source under Apache license.
Kibana: Kibana is a web-based, highly scalable dashboard solution that integrates seamlessly
with ElasticSearch and provides real-time analysis of streaming data. It is also free and
open source.

Problem Definition
Logs are extremely useful in identifying security incidents, policy violations, fraudulent
activity, and operational problems. They are also valuable when performing audits, forensic
analysis, internal investigations and identifying operational trends and long-term problems.
However, the wide variety of log data formats makes it very hard to use the data
without first normalizing it.
As organizations grow, the variety of log data sources and the volume of data will increase.
Compounding this challenge is the variability of data formats and distributed nature of these
sources; in addition, every network infrastructure is in a constant state of change, with new
systems, applications, users, and devices being added every day of the year.


All these challenges can be handled cost-effectively and efficiently by a log
management solution that offers the following features:
Centralized
Highly reliable
Searchable
Scalable
Secure

High Level Solution
Given below is a brief overview of the technologies used in the log management solution.
Logstash is a tool for managing events and logs. It can filter, modify and ship out events
and logs. Logstash natively offers plugins for a variety of sources such as ElasticSearch,
RabbitMQ, Redis, S3, Twitter and ZeroMQ. Apart from single-line logs, it can also handle
JSON and multi-line logs. It offers a wide range of filters such as grok, csv, date, geoip and kv,
and it can ship the parsed logs to ElasticSearch, S3, Redis, ZeroMQ, MongoDB and more.
A complete list of Logstash input, output and filter plugins is available at
http://logstash.net/docs/latest/.
Alternatives to Logstash include Splunk, Chukwa, Flume and Graylog, but none of these
offers the same combination of features: free and open source, highly flexible, low memory
consumption, and native plugins for a wide range of inputs, codecs, filters and outputs.
ElasticSearch is a rapidly growing open source search solution used by thousands of
enterprises in virtually every industry. It is used in production at companies such as
Mozilla, StackOverflow, GitHub, Klout and McGraw-Hill.
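ElasticSearch provides features such as faceted search, auto-complete, routing and
sharding, and it scales easily. It works in near real time: newly indexed documents become
searchable shortly after indexing (by default within about one second), and searches
typically complete in milliseconds.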
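To illustrate what "faceted search" means in practice, the Python sketch below builds a search request body that combines a full-text query with counts of the most frequent values of one field (the "facet"). It is a sketch under stated assumptions: it uses the `terms` aggregation syntax of Elasticsearch 1.0 and later (aggregations superseded the older facets API), the query part uses Lucene query-string syntax, and the field names `shop_id` and `page_visited` are borrowed from the clickstream demo later in this paper. The function name is hypothetical, and it only builds JSON without contacting a cluster.

```python
import json


def faceted_search_body(query_text: str, facet_field: str, top_n: int = 10) -> dict:
    """Build a search body: a full-text query plus counts of the top facet values.

    Hypothetical helper; it only builds JSON and does not contact a cluster.
    """
    return {
        # Full-text part, in Lucene query-string syntax
        "query": {"query_string": {"query": query_text}},
        # Facet part: the top_n most frequent values of facet_field with their counts
        "aggs": {"facet": {"terms": {"field": facet_field, "size": top_n}}},
    }


body = faceted_search_body("shop_id:12", "page_visited", top_n=5)
print(json.dumps(body, indent=2))
```

Sent to a cluster's `_search` endpoint, a body like this would return the matching documents together with per-value counts, which is the shape a dashboard needs for a "top pages" panel.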
Kibana is a lightweight, web-based dashboard and analysis application capable of real-time
analysis of streaming data. It provides dashboard components such as maps, histograms and
trends, along with many other basic components.
The high level architecture for this solution is given in the diagram below:


Diagram – HLD of Log Management Solution
In the above architecture we have three components:
Logstash Agent
ElasticSearch Cluster
Kibana UI
The Logstash agent is a lightweight Java application running on each server that produces logs.
It filters and parses the logs and then ships them as JSON documents to the ElasticSearch cluster.
The ElasticSearch cluster acts as a persistent store for logs and offers real-time search
capabilities. Thanks to its distributed architecture, ElasticSearch can scale massively without
compromising performance.
Kibana is a UI dashboard and analysis tool. It offers both pre-configured and on-demand
dashboards, and it uses ElasticSearch's REST APIs to interact with the cluster.
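As a rough illustration of "ships them as JSON documents", the Python sketch below serializes parsed events into the newline-delimited body accepted by ElasticSearch's REST `_bulk` endpoint. It is not Logstash's own implementation. Assumptions: the daily index name follows the `logstash-%{+YYYY.MM.dd}` default of the Logstash elasticsearch output, the action metadata matches recent Elasticsearch versions (1.x-era clusters also expected a `_type`), and the function name is hypothetical. The sketch only builds the request body rather than sending it.

```python
import json
from datetime import datetime


def build_bulk_body(events: list) -> str:
    """Serialize events into an Elasticsearch _bulk request body (NDJSON).

    Hypothetical helper: each event contributes an action line naming its
    daily index, followed by the event itself as a JSON document.
    """
    lines = []
    for event in events:
        # Daily index derived from the event time, e.g. "logstash-2004.02.01"
        moment = datetime.fromisoformat(event["@timestamp"].replace("Z", "+00:00"))
        lines.append(json.dumps({"index": {"_index": moment.strftime("logstash-%Y.%m.%d")}}))
        lines.append(json.dumps(event))
    # The bulk API expects every line, including the last, to end with "\n"
    return "\n".join(lines) + "\n" if lines else ""


events = [{"@timestamp": "2004-02-01T18:00:06Z",
           "client_ip": "195.146.109.248",
           "page_visited": "/ct/"}]
print(build_bulk_body(events), end="")
```

Batching many documents into one request like this is the usual way to keep indexing throughput high; a real shipper would also handle retries and per-item errors in the bulk response.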

Solution Details
For the purpose of demonstrating this solution, I have used clickstream logs from the
ECML/PKDD 2005 Discovery Challenge. Some sample log lines are shown below:
12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/

These log lines are delimited by semicolons (;) and contain the following fields, in order:
shop_id
unixtime
client ip
session
visited page
referrer
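To make the field layout concrete, here is a minimal Python sketch of what the csv filter does with one raw line: split it on the semicolon and pair each value with a column name. The column names mirror the `columns` setting used in clickstream.conf; the function name `parse_clickstream_line` is hypothetical, and the sketch is deliberately stricter than the Logstash filter in that it rejects lines whose field count does not match.

```python
import csv
import io

# Column names as declared in the csv filter of clickstream.conf
COLUMNS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def parse_clickstream_line(line: str) -> dict:
    """Split one semicolon-delimited clickstream line into named fields.

    Hypothetical helper that mimics the Logstash csv filter for this log format.
    """
    # csv.reader does the splitting; ";" matches the filter's separator setting
    values = next(csv.reader(io.StringIO(line), delimiter=";"))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


sample = ("12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;"
          "/ct/?c=153;http://www.shop3.cz/")
event = parse_clickstream_line(sample)
print(event["client_ip"], event["page"])  # 195.146.109.248 /ct/?c=153
```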
For the demo we need to create a Logstash configuration file (clickstream.conf) that
specifies the inputs, filters and outputs.
The clickstream.conf file looks like this:
input {
  file {
    # path to the clickstream log
    path => "/path/to/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    # read the file from the start instead of only tailing new lines
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}

filter {
  csv {
    # define the columns present in the weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get the visited page and the page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # unixtime holds epoch seconds; convert it to the event timestamp
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # convert the IP to longitude/latitude using the GeoLiteCity database from MaxMind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}

output {
  # store the output in the local ElasticSearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}

In the above Logstash configuration file, we define the input to be a log file and give
the absolute path to the log. In the filter section, we parse the individual fields, convert
epoch seconds to a date-time value and convert each IP address to a latitude-longitude
pair for plotting on a map. Finally, we store the parsed logs in a local
ElasticSearch cluster.
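The remaining filter stages can be approximated in plain Python, as a sketch of what each one does to a single event. Assumptions: the grok `URIPATH`/`URIPARAM` split is approximated by cutting the page at the first `?` (URIPARAM keeps the leading `?`, so the sketch does too); the date filter's `UNIX` format is treated as epoch seconds rendered in UTC; and latitude/longitude are passed in directly because a real GeoLiteCity lookup is out of scope. The helper names are hypothetical. The coordinate helper keeps the order used by the geoip block above, longitude first and latitude second.

```python
from datetime import datetime, timezone


def split_page(page: str) -> dict:
    """Approximate the grok step: path before the first '?', query string after it.

    Hypothetical helper; the real filter uses the URIPATH and URIPARAM patterns.
    """
    path, sep, query = page.partition("?")
    result = {"page_visited": path}
    if sep:  # URIPARAM is optional, so only set page_params when a '?' exists
        result["page_params"] = sep + query
    return result


def epoch_to_timestamp(unixtime: str) -> str:
    """Approximate the date filter's UNIX format: epoch seconds -> ISO 8601 in UTC."""
    moment = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return moment.isoformat().replace("+00:00", "Z")


def coordinates(latitude: float, longitude: float) -> list:
    """Mirror the geoip/mutate steps: floats in [longitude, latitude] order."""
    return [float(longitude), float(latitude)]


print(split_page("/ct/?c=153"))          # {'page_visited': '/ct/', 'page_params': '?c=153'}
print(epoch_to_timestamp("1075658406"))  # 2004-02-01T18:00:06Z
```

The first sample line therefore lands at 18:00:06 UTC on 1 February 2004, which is consistent with the hour encoded in the log file name when read as Central European Time.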
To start the Logstash agent on the server, run the following command:
java -jar logstash-1.3.2-flatjar.jar agent -f clickstream.conf --web
This command starts a Logstash JVM process that parses the logs, indexes them into
ElasticSearch and also serves the Kibana UI at http://localhost:9292/. By building a simple
dashboard in the Kibana UI, we can visualize the logs.
Some sample screenshots from the Kibana UI are given below:

Screenshot 1 – Histogram showing the page landing count over different time intervals.

Screenshot 2 – Map showing geographical distribution of users.

Screenshot 3 – Table showing different fields of logs.
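The histogram in Screenshot 1 is essentially a count of events per fixed time bucket. The Python sketch below reproduces that computation on a list of epoch-second timestamps. It illustrates the idea only and is not Kibana's code (Kibana delegates the bucketing to ElasticSearch); the function name and the interval values are hypothetical.

```python
from collections import Counter


def histogram(timestamps: list, interval_seconds: int) -> dict:
    """Count events per time bucket; each bucket is keyed by its start time.

    Hypothetical helper: floor every epoch-second timestamp to a multiple of
    interval_seconds, then count how many events fall into each bucket.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    buckets = Counter(ts - ts % interval_seconds for ts in timestamps)
    return dict(sorted(buckets.items()))


# unixtime values from the three sample log lines
print(histogram([1075658406, 1075658407, 1075658409], interval_seconds=2))
# {1075658406: 2, 1075658408: 1}
```

Widening the interval merges buckets: with a 60-second interval, all three sample events fall into the single bucket starting at 1075658400.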

Solution Benefits
The benefits offered by this solution are listed below:
1. All the tools used in this solution are free and open source, so this is a very cost-effective solution.
2. Development effort is very low: the only code to write is the Logstash configuration
file, and the UI only requires designing Kibana dashboards.
3. This solution is highly scalable. Logstash has been tested to process around 25,000
events per second per node, and ElasticSearch is used in production by many web-scale
companies.
4. All the tools are open source and are actively developed by a large community of
contributors.
5. Logstash has a small memory footprint of around 150 MB.

Solution extend-ability
Logstash not only manages logs but can also handle other types of events, such as
JSON, ActiveMQ, RabbitMQ, ZeroMQ and Twitter feeds. It can also output aggregated
counts of different events, and it can ship events to a variety of tools such as
Riak, Redis, S3 and Graphite.
Apart from being used as a search engine, ElasticSearch can also be used as a NoSQL database,
a historical archive and a real-time analytics tool.

The above-mentioned features of Logstash and ElasticSearch make this solution
practical for many business problems.

Deliverables
Presentation of the solution with a focus on architecture, design and use cases.

Conclusion
The proposed log management solution using Logstash, ElasticSearch and Kibana is cost-effective, efficient, reliable and highly scalable.
These products are backed by an active user community that keeps adding value and new
functionality to them. They are also backed and supported by the ElasticSearch company.

References
Logstash - http://www.elasticsearch.org/overview/logstash/
ElasticSearch - http://www.elasticsearch.org/overview/
Kibana - http://www.elasticsearch.org/overview/kibana/
ElasticSearch Users - http://www.elasticsearch.com/case-studies/
Logstash Performance Test - https://gist.github.com/paulczar/4513552
Logstash Memory Consumption - http://blog.sematext.com/2013/11/05/logstashperformance-monitoring/
ECML/PKDD 2005 Discovery Challenge - http://lisp.vse.cz/challenge/ecmlpkdd2005/

Happiest Mind Innovators
Number of contributors - 1
Names of the contributors – Rishav Rohit
Role of the contributor – Solution design and development

Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Log Management with Logstash and ElasticSearch

Jan 2014, HAPPIEST MINDS TECHNOLOGIES
Innovation @Work
Log Management with Logstash and ElasticSearch
Rishav Rohit
SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY.
Copyright Information
This document is the exclusive property of Happiest Minds Technologies Pvt. Ltd. It is intended for limited circulation.
© 2013 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Contents
Copyright Information
Abstract
Introduction
Problem Definition
High Level Solution
Solution Details
Solution Benefits
Solution Extensibility
Deliverables
Conclusion
References
Happiest Mind Innovators
Abstract
Collecting logs from a wide array of servers and applications so that they can be searched and analyzed centrally, in real time, is a challenging task. Once we overcome this challenge, we can gain an ocean of insights from these logs, identify problems and arrive at solutions or corrective measures much more quickly. In this paper, we build a highly scalable, real-time log collection, search, visualization and analysis application using Logstash, ElasticSearch and Kibana.

Introduction
Recent compliance mandates require not only that organizations collect all logs, but also that the logs be reviewed regularly, be searchable, and be stored in their original, unaltered, raw form for mandate-specific timeframes. Log management solutions address these collection and retention needs by letting organizations collect, store and manage large amounts of log data inexpensively. To meet these requirements, we can build a highly scalable solution with real-time analysis using Logstash, ElasticSearch and Kibana.

Logstash: Logstash is a free, lightweight tool for managing events and logs. It can collect logs, parse them and store them in a central location. It is free and open source under the Apache License.

ElasticSearch: ElasticSearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. ElasticSearch is free and open source under the Apache License.

Kibana: Kibana is a web-based, highly scalable dashboard solution that integrates seamlessly with ElasticSearch and provides real-time analysis of streaming data. It is also a free and open-source product.

Problem Definition
Logs are extremely useful in identifying security incidents, policy violations, fraudulent activity and operational problems. They are also valuable when performing audits, forensic analysis and internal investigations, and when identifying operational trends and long-term problems.

However, the wide variety of log data formats makes it impractical to use the data without first normalizing it. As organizations grow, the variety of log data sources and the volume of data increase. Compounding this challenge are the variability of data formats and the distributed nature of these sources; in addition, every network infrastructure is in a constant state of change, with new systems, applications, users and devices being added every day of the year.
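As a rough illustration of what normalization means in practice, the Python sketch below maps two differently formatted lines onto one common record layout. The first input uses the semicolon-delimited clickstream format analyzed under "Solution Details"; the second is a hypothetical key=value format invented only for this example. The `page_visited` and `page_params` names follow the grok fields used later in clickstream.conf, while `timestamp` and `source` are assumed names, not a schema defined by Logstash or ElasticSearch.

```python
from datetime import datetime, timezone

# Field order of the clickstream format described under "Solution Details".
CLICKSTREAM_COLUMNS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def epoch_to_iso(epoch_seconds: str) -> str:
    """Convert epoch seconds (given as a string) to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc).isoformat()


def split_page(page: str) -> tuple:
    """Split '/ct/?c=153' into ('/ct/', '?c=153').

    The parameter part keeps its leading '?', like grok's URIPARAM pattern.
    """
    path, sep, query = page.partition("?")
    return path, sep + query


def normalize_clickstream(line: str) -> dict:
    """Normalize one semicolon-delimited clickstream line."""
    # maxsplit=5 keeps any ';' inside the last field (referrer) intact
    values = line.split(";", len(CLICKSTREAM_COLUMNS) - 1)
    record = dict(zip(CLICKSTREAM_COLUMNS, values))
    path, params = split_page(record["page"])
    return {
        "timestamp": epoch_to_iso(record["unixtime"]),
        "client_ip": record["client_ip"],
        "page_visited": path,
        "page_params": params,
        "source": "clickstream",
    }


def normalize_kv(line: str) -> dict:
    """Normalize a hypothetical 'time=<epoch> ip=<address> path=<url>' line."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    path, params = split_page(fields["path"])
    return {
        "timestamp": epoch_to_iso(fields["time"]),
        "client_ip": fields["ip"],
        "page_visited": path,
        "page_params": params,
        "source": "kv",
    }


if __name__ == "__main__":
    print(normalize_clickstream(
        "12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;"
        "/ct/?c=153;http://www.shop3.cz/"
    ))
    print(normalize_kv("time=1075658407 ip=212.65.194.144 path=/ls/?id=34"))
```

Both functions return records with the same keys, so downstream search and dashboards only have to deal with one shape regardless of where a line came from. In the actual solution this work is done declaratively by Logstash filters rather than hand-written parsers.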
All these challenges can be handled in a cost-effective and efficient manner by a log management solution that offers these features:
- Centralized
- Highly reliable
- Searchable
- Scalable
- Secure

High Level Solution
Given below is a brief overview of the technologies used in the Log Management solution.

Logstash is a tool for managing events and logs. It is capable of filtering, modifying and shipping out events and logs. Logstash natively offers input plugins for a variety of sources such as ElasticSearch, RabbitMQ, Redis, S3, Twitter and ZeroMQ. Apart from single-line logs, it can also handle JSON and multi-line logs. It offers a wide range of filters such as grok, csv, date, geoip and kv, and it can ship the parsed logs out to ElasticSearch, S3, Redis, ZeroMQ, MongoDB, etc. A complete list of Logstash input, output and filter plugins is available at http://logstash.net/docs/latest/.

The alternatives to Logstash are Splunk, Chukwa, Flume and Graylog, but none of them combines all of Logstash's strengths: free and open source, highly flexible, low memory consumption, and native plugins for a range of inputs, codecs, filters and outputs.

ElasticSearch is a rapidly growing open-source search solution used by thousands of enterprises in virtually every industry. It is used in production at companies such as Mozilla, StackOverflow, GitHub, Klout and McGraw-Hill. ElasticSearch provides features such as faceted search, auto-complete, routing and sharding, and it scales easily. Newly indexed documents become searchable in near real time (by default within about one second), and queries typically return within milliseconds.

Kibana is a lightweight, web-based dashboard and analysis application capable of real-time analysis of streaming data. It provides dashboard components such as maps, histograms, trends and many other basic components.

The high-level architecture for this solution is given in the diagram below:
Diagram – HLD of Log Management Solution

The above architecture has three components:
- Logstash agent
- ElasticSearch cluster
- Kibana UI

The Logstash agent is a lightweight Java application running on the server(s) that produce logs. It filters and parses the logs and then ships each event as a JSON document to the ElasticSearch cluster.

The ElasticSearch cluster acts as a persistent store for logs and offers real-time search capabilities. Thanks to its distributed architecture, ElasticSearch can scale massively without compromising performance.

Kibana is a UI dashboard and analysis tool. It offers both pre-configured and on-demand dashboards. Kibana uses the ElasticSearch REST APIs to interact with ElasticSearch.

Solution Details
For the demo of this solution, I have used clickstream logs from the ECML/PKDD 2005 Discovery Challenge. Some sample log lines are shown below:

12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/

These log lines are delimited by semicolons (;) and contain the following fields, in order:
- shop_id
- unixtime
- client ip
- session
- visited page
- referrer

For the demo, we need to create a Logstash configuration file (clickstream.conf) that specifies the inputs, filters and outputs. The clickstream.conf file looks like this:
input {
  file {
    # path for the clickstream log
    path => "/path/to/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}

filter {
  csv {
    # define the columns present in weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get the visited page and the page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # unixtime holds epoch seconds; use it as the event timestamp (@timestamp)
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # look up latitude and longitude for the client IP using the GeoLiteCity database from MaxMind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    # coordinates are built as [longitude, latitude], the array order ElasticSearch expects for geo points
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}

output {
  # store the output in the local ElasticSearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}

In the above Logstash configuration file, we define the input as a log file and give the absolute path to the log. In the filter section, we parse the individual fields, convert epoch seconds to a date-time value and convert the IP address to a latitude-longitude pair for plotting on a map. Finally, we store the parsed logs in a local ElasticSearch cluster.

To start the Logstash agent on the server, run the following command:

java -jar logstash-1.3.2-flatjar.jar agent -f clickstream.conf -- web

This command launches a Logstash JVM process that parses the logs, indexes them into ElasticSearch and also starts the Kibana UI at http://localhost:9292/. By building a simple dashboard in the Kibana UI, we can visualize the logs. Some sample screenshots from the Kibana UI are given below:

Screenshot 1 – Histogram showing page landing counts for different time intervals.
Screenshot 2 – Map showing the geographical distribution of users.
Screenshot 3 – Table showing the different fields of the logs.

Solution Benefits
The benefits offered by this solution are listed below:
1. All the tools used in this solution are free and open source, so this is a very cost-effective solution.
2. The development effort required is very low: on the coding side, only the Logstash configuration file needs to be written, and for the UI, only the Kibana dashboards need to be designed.
3. This solution is highly scalable. Logstash has been tested to process around 25,000 events per second per node, and ElasticSearch is used in production by many web-scale companies.
4. All the tools are open source and are actively developed by a large developer community.
5. Logstash has a small memory footprint, around 150 MB.

Solution Extensibility
Logstash does not only manage logs; it can also handle other types of events, such as JSON, ActiveMQ, RabbitMQ, ZeroMQ and Twitter feeds. It can also output aggregated counts of different events, and it can ship events out to a variety of tools such as Riak, Redis, S3 and Graphite. Apart from serving as a search engine, ElasticSearch can be used as a NoSQL database, a historical archive and a real-time analytics tool.
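The idea of emitting aggregated counts, rather than every individual event, can be sketched in a few lines of Python. This only illustrates the concept and is not how Logstash implements it; the one-minute bucket size and the (epoch_seconds, page_visited) input shape are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(epoch_seconds: int) -> str:
    """Truncate epoch seconds to the start of their UTC minute, as ISO 8601."""
    start = epoch_seconds - epoch_seconds % 60
    return datetime.fromtimestamp(start, tz=timezone.utc).isoformat()


def count_page_hits(events) -> Counter:
    """Count events per (minute bucket, visited page) pair.

    `events` is an iterable of (epoch_seconds, page_visited) tuples.
    """
    return Counter((minute_bucket(ts), page) for ts, page in events)


if __name__ == "__main__":
    sample = [
        (1075658406, "/ct/"),   # taken from the sample log lines in Solution Details
        (1075658407, "/ls/"),
        (1075658409, "/txt/"),
        (1075658430, "/ct/"),   # hypothetical extra event for illustration
    ]
    for (bucket, page), hits in sorted(count_page_hits(sample).items()):
        print(bucket, page, hits)
```

Shipping one number per bucket instead of one document per event keeps such summaries compact, which suits time-series tools like Graphite that store numeric values rather than full log documents.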
These features of Logstash and ElasticSearch make this solution applicable to many business problems.

Deliverables
Presentation of the solution with a focus on architecture, design and use cases.

Conclusion
The log management solution proposed here, built on Logstash, ElasticSearch and Kibana, is cost-effective, efficient, reliable and highly scalable. These products are backed by an active user community that keeps adding value and new functionality, and they are also backed and supported by the ElasticSearch company.

References
Logstash - http://www.elasticsearch.org/overview/logstash/
ElasticSearch - http://www.elasticsearch.org/overview/
Kibana - http://www.elasticsearch.org/overview/kibana/
ElasticSearch Users - http://www.elasticsearch.com/case-studies/
Logstash Performance Test - https://gist.github.com/paulczar/4513552
Logstash Memory Consumption - http://blog.sematext.com/2013/11/05/logstashperformance-monitoring/
ECML/PKDD 2005 Discovery Challenge - http://lisp.vse.cz/challenge/ecmlpkdd2005/

Happiest Mind Innovators
Number of contributors: 1
Names of the contributors: Rishav Rohit
Role of the contributor: Solution design and development