The document discusses using Elastic Stack to store and analyze 2 million audit logs per day from distributed systems. It introduces Elastic Stack components like Logstash, Kibana, Elasticsearch and Beats. It describes how the speaker's company Gemalto used Logstash and Elasticsearch to ingest logs from .NET applications into Elasticsearch at speeds of 1000 logs/second. Future plans include using Elasticsearch's machine learning and integrating with Kafka for cross data center replication.
2 Million Audit Logs to Elasticsearch Daily
1. STORE 2 MILLION AUDIT LOGS A DAY INTO ELASTICSEARCH
Taswar Bhatti (Microsoft MVP)
GEMALTO
@taswarbhatti
http://taswar.zeytinsoft.com
taswar@gmail.com
2. WHO AM I?
- 4 years Microsoft MVP
- 17 years in software industry
- Currently working as System Architect in Enterprise Security Space
(Gemalto)
- You may not have heard of Gemalto, but 1/3 of the world population uses Gemalto; they just don't know it
- Gemalto has stacks built in many environments: .NET, Java, Node, Lua, Python, mobile (Android, iOS), e-banking etc.
9/22/2017 2
3. AGENDA
- Problem we had and wanted to solve with Elastic Stack
- Intro to Elastic Stack (Ecosystem)
- Logstash
- Kibana
- Beats
- Elasticsearch flow designs that we considered
- Future plans for using Elasticsearch
4. QUESTION & POLL
- How many of you are using Elastic or some other logging solution?
- How do you normally log? Where do you log?
- Do you log in Relational Database?
5. HOW DO YOU TROUBLESHOOT OR FIND YOUR BUGS
- Typically in a distributed environment one has to go through the logs to find out where the issue is
- This can span multiple systems: you have to work out which machine/server generated the log, or monitor multiple logs at once
- You may even have to monitor firewall logs to find which data center the traffic is routed through
- Chuck Norris never troubleshoots; the troubles kill themselves when they see him coming
7. OUR PROBLEM
- We had distributed systems (microservices) that would generate
many different types of logs, in different data centers
- We also had authentication audit logs that had to be secure and
stored for 1 year
- We generate around 2 million audit log records a day, 4TB with replication
- We need to generate reports out of our data for customers
- We were still using a monolithic solution in some core parts of the application
- Growing pains of a successful application
- We want to use a centralized scalable logging system for all our
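To put the volume in perspective, here is a rough back-of-the-envelope sketch. The 2 million logs/day and 1-year retention come from this slide, and the 1000 logs/second ingest speed from the summary; the function names are made up for illustration.

```python
# Back-of-the-envelope figures for the audit log volume quoted in the talk.
LOGS_PER_DAY = 2_000_000        # from the slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
RETENTION_DAYS = 365            # "stored for 1 year"
INGEST_RATE = 1_000             # logs/second, the ingest speed quoted in the summary


def average_rate(logs_per_day: int) -> float:
    """Average logs per second if the load were spread evenly over a day."""
    return logs_per_day / SECONDS_PER_DAY


def retained_documents(logs_per_day: int, days: int) -> int:
    """Number of audit documents held at the end of the retention window."""
    return logs_per_day * days


def drain_minutes(logs_per_day: int, rate_per_second: int) -> float:
    """Minutes needed to ingest one full day of logs at a given rate."""
    return logs_per_day / rate_per_second / 60


if __name__ == "__main__":
    print(f"average rate: {average_rate(LOGS_PER_DAY):.1f} logs/s")
    print(f"documents after one year: {retained_documents(LOGS_PER_DAY, RETENTION_DAYS):,}")
    print(f"time to ingest a day at {INGEST_RATE}/s: {drain_minutes(LOGS_PER_DAY, INGEST_RATE):.1f} min")
```

The average is only about 23 logs per second, so an ingest speed of 1000 logs/second leaves plenty of headroom for traffic peaks and for replaying a backlog.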
9. A LITTLE HISTORY OF ELASTICSEARCH
- Shay Banon created Compass in 2004
- Released the first version of Elasticsearch in 2010 (version 1.0 followed in 2014)
- Elasticsearch the company was formed in 2012
- Shay's wife is still waiting for her recipe app
14. ELASTICSEARCH INDICES
- Elasticsearch organizes documents in indices
- Lucene writes and maintains the index files
- Elasticsearch writes and maintains metadata on top of Lucene
- Example: field mappings, index settings and other cluster metadata
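As a rough illustration of that metadata, the sketch below builds the kind of JSON body one might send when creating an index. The field names (`user`, `message`, `raw_payload`) are hypothetical, and the body uses the typeless mapping layout of newer Elasticsearch versions; older versions nest `properties` under a type name.

```python
import json


def build_index_body(shards: int, replicas: int) -> dict:
    """Return index settings plus field mappings as a plain dict."""
    return {
        # Index settings: how the index is split and copied.
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # Field mappings: how each field is stored and searched.
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "user": {"type": "keyword"},   # exact-match values
                "message": {"type": "text"},   # analyzed full-text
                # Kept in the stored document but not searchable (hypothetical field).
                "raw_payload": {"type": "keyword", "index": False},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(shards=3, replicas=1), indent=2))
```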
16. ELASTIC CONCEPTS
- Cluster : A cluster is a collection of one or more nodes (servers)
- Node : A node is a single server that is part of your cluster, stores
your data, and participates in the cluster’s indexing and search
capabilities
- Index : An index is a collection of documents that have somewhat
similar characteristics. (e.g Product, Customer, etc)
- Type : Within an index, you can define one or more types. A type is
a logical category/partition of your index.
- Document : A document is a basic unit of information that can be
indexed
- Shard/Replica : An index is divided into multiple pieces called shards; replicas are copies of your shards
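To make shards and replicas a bit more concrete, here is a simplified sketch of how a document ends up on a shard: hash a routing value (by default the document id) and take it modulo the number of primary shards. Elasticsearch uses its own Murmur3-based hash; `zlib.crc32` here is only a stand-in, and the function names are made up for illustration.

```python
import zlib


def pick_shard(routing_key: str, primary_shards: int) -> int:
    """Map a routing key to a primary shard number (simplified sketch)."""
    # Stand-in hash: Elasticsearch itself uses a Murmur3-based hash.
    return zlib.crc32(routing_key.encode("utf-8")) % primary_shards


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard plus its replica copies."""
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    for doc_id in ("audit-1", "audit-2", "audit-3"):
        print(doc_id, "-> shard", pick_shard(doc_id, primary_shards=5))
    print("shard copies in the cluster:", total_shard_copies(5, 1))
```

Because the shard number depends on the primary shard count, that count is fixed when the index is created, while the number of replicas can be changed later.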
17. ELASTIC NODES
- Master Node : Controls the cluster
- Data Node : Data nodes hold data and perform data related
operations such as CRUD, search, and aggregations.
- Ingest Node : Ingest nodes are able to apply an ingest pipeline to a
document in order to transform and enrich the document before
indexing
- Coordinating Node : Only routes requests, handles the search reduce phase, and distributes bulk indexing.
23. LOGSTASH
- Ruby application that runs under JRuby on the JVM
- Collects, parses, and enriches data
- Horizontally scalable
- Apache 2.0 License
- Large number of public plugins written by the community: https://github.com/logstash-plugins
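The collect, parse, enrich flow can be pictured with a small Python sketch. This is not Logstash code or its configuration syntax, just an illustration of the idea; the `host` value and function names are assumptions, while `_jsonparsefailure` mirrors the tag Logstash's JSON filter adds to events it cannot parse.

```python
import json
from datetime import datetime, timezone


def parse(line: str) -> dict:
    """Parse one raw log line; keep unparseable lines instead of dropping them."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        event = None
    if not isinstance(event, dict):
        return {"message": line.rstrip("\n"), "tags": ["_jsonparsefailure"]}
    return event


def enrich(event: dict, host: str) -> dict:
    """Add fields the original application did not provide."""
    enriched = dict(event)
    enriched.setdefault("@timestamp", datetime.now(timezone.utc).isoformat())
    enriched["host"] = host
    return enriched


def pipeline(lines, host: str) -> list:
    """Collect -> parse -> enrich, returning events ready to be indexed."""
    return [enrich(parse(line), host) for line in lines]


if __name__ == "__main__":
    for event in pipeline(['{"user": "alice", "action": "login"}', "not json"], "web-1"):
        print(event)
```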
31. BEATS
- Lightweight shippers written in Go (non-JVM shops can use them)
- They follow the Unix philosophy: do one specific thing, and do it well
- Filebeat : Log files (think of it as tail -f on steroids)
- Metricbeat : CPU, memory (like top), Redis, MongoDB usage
- Packetbeat : Network packets (like Wireshark, it uses libpcap), monitoring HTTP etc.
- Winlogbeat : Windows event logs to Elasticsearch
- Dockbeat : Monitoring Docker
- Large community; lots of other Beats are offered as open source
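The "tail -f" idea behind a file shipper can be sketched in a few lines: remember a byte offset, read whatever was appended since, and only hand over complete lines. This is a minimal illustration, not how Filebeat is implemented (Filebeat also deals with log rotation, back-pressure, and persisting its offsets); the function name is made up.

```python
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after `offset`, plus the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume up to the last newline; a half-written line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log:
        log.write('{"user": "alice"}\n{"user": "bob"}\n')
        path = log.name
    lines, offset = read_new_lines(path, 0)
    print(lines, offset)
    os.remove(path)
```

A shipper would call this in a loop, forward the returned lines, and store the offset so that a restart does not resend old lines.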
34. X-PACK
- Elastic commercial offering (This is one of the ways they make
money)
- X-Pack is an Elastic Stack extension that bundles
- Security (https to elastic, password to access Kibana)
- Alerting
- Monitoring
- Reporting
- Graph capabilities
- Machine Learning
36. KIBANA
- Visual application for Elasticsearch (JS, Angular, D3)
- Powerful dashboard frontend for visualizing index information from Elasticsearch
- Turns historical data into charts, graphs etc.
- Real-time search of index information
39. DESIGNS WE WENT THROUGH
- We started with a simple design to measure throughput
- One instance of Logstash and one instance of Elasticsearch, with Filebeat
40. DOTNET CORE APP
- We used a .NET Core application to generate logs
- Serilog wrote the logs in JSON format to a file
- Filebeat was installed on the Linux machine to ship the logs to Logstash
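The talk used Serilog in .NET for this step. As a language-neutral illustration of the same idea, one JSON object per log line so that a shipper can forward it without custom parsing, here is a rough Python sketch built on the standard `logging` module. The logger name, the `audit` extra key, and the field names are assumptions, not the speaker's format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "@timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Structured fields passed via extra={"audit": {...}} (hypothetical convention).
        payload.update(getattr(record, "audit", {}))
        return json.dumps(payload)


def build_audit_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("audit-sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    out = io.StringIO()
    log = build_audit_logger(out)
    log.info("login succeeded", extra={"audit": {"user": "alice", "ip": "10.0.0.5"}})
    print(out.getvalue(), end="")
```

Swapping the `StreamHandler` for a `logging.FileHandler` would write the same lines to a file for a shipper to pick up.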
47. WHAT WE ARE GOING WITH FOR NOW, UNTIL…..
48. CONSIDERATIONS OF DATA
- Indexing by day makes sense in some cases
- In others you may want to index by size instead (Black Friday brings more traffic than other days); Elasticsearch doesn't like it when shards are not balanced
- Don't index everything; if you are not going to search on specific fields, disable indexing for them in the mapping (for example "index": false)
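Indexing by day usually means putting the date in the index name so that old days can be dropped as a whole. The sketch below derives such a name from each event's timestamp and builds a request body in the newline-delimited format of the Elasticsearch bulk API. The `audit` prefix and the event fields are hypothetical, and the action line uses the typeless form of newer versions (older versions also expect a `_type`).

```python
import json
from datetime import datetime


def daily_index(prefix: str, when: datetime) -> str:
    """Index name for the day an event belongs to, e.g. audit-2017.09.22."""
    return f"{prefix}-{when:%Y.%m.%d}"


def bulk_body(prefix: str, events: list) -> str:
    """Build a bulk request body: one action line, then one source line, per event."""
    lines = []
    for event in events:
        when = datetime.fromisoformat(event["@timestamp"])
        lines.append(json.dumps({"index": {"_index": daily_index(prefix, when)}}))
        lines.append(json.dumps(event))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = [
        {"@timestamp": "2017-09-22T10:15:00", "user": "alice", "action": "login"},
        {"@timestamp": "2017-09-23T00:01:00", "user": "bob", "action": "logout"},
    ]
    print(bulk_body("audit", sample), end="")
```

Indexing by size instead would replace `daily_index` with a name that rolls over once the current index reaches a chosen size, which keeps shards more even on days with unusual traffic.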
49. FUTURE CONSIDERATIONS
- Investigate Elasticsearch machine learning
- Elasticsearch with Kafka for cross data center replication
50. THANK YOU & OPEN TO QUESTIONS
- Questions???
- Contact: Taswar@gmail.com
- Blog: http://Taswar.zeytinsoft.com
- Twitter: @taswarbhatti
- LinkedIn (find me and add me)