My presentation to the Wellington AWS User Group on giving the business situational awareness, anomaly detection, and process monitoring. A how-to guide on applying "traditional" IT tools to the generally more important problems of the business.
3. IT knows the value of fast feedback
The Second Way of DevOps is “shorten and amplify feedback loops”.
https://itrevolution.com/the-three-ways-principles-underpinning-devops/
4. Feedback tools in software development
1. IDE feedback
2. Unit tests
3. Continuous Integration
4. SecOps tests in build pipeline
5. Application Performance Monitoring
5. IT has sub-second feedback
Most business leaders have the
same feedback cycle as they did
50 years ago.
6. “War is the realm of uncertainty;
three quarters of the factors on
which action in war is based are
wrapped in a fog of greater or
lesser uncertainty. A sensitive and
discriminating judgment is called
for; a skilled intelligence to scent
out the truth.”
–Carl von Clausewitz
19th Century Prussian General
7. The fog of business
Information is late, disconnected, and vague.
8. Totals Hide Information
If your analysis uses fixed-length periods you will miss trends.
Weekly hours
          Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday  Total
Constant       8        8          8         8       8         0       0     40
Spiky          2        5          3        12      11         5       2     40
[Chart legend: Constant, Spiky]
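As a small illustration of the table above, the following Python sketch computes the weekly total alongside two figures that the total hides. The `summarise` helper and the choice of maximum day and standard deviation as the extra measures are assumptions made for this example, not something taken from the slides.

```python
from statistics import pstdev

# Weekly hours from the table above, Monday through Sunday.
constant = [8, 8, 8, 8, 8, 0, 0]
spiky = [2, 5, 3, 12, 11, 5, 2]


def summarise(hours):
    """Return the total plus two measures that the total alone hides."""
    return {
        "total": sum(hours),                  # identical for both weeks
        "max_day": max(hours),                # busiest single day
        "spread": round(pstdev(hours), 2),    # day-to-day variability
    }


constant_summary = summarise(constant)
spiky_summary = summarise(spiky)
print(constant_summary)
print(spiky_summary)
```

Both weeks report the same total, so a weekly report would treat them as equal even though the busiest day differs considerably.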
9. Arbitrary Boundaries Hide Trends
[Chart: daily values from 2017-9-1 to 2017-10-25, y-axis 0 to 80]
September Total = 1865
October Total = 1880
November << October
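The effect can be sketched in Python with a trailing window that ignores calendar boundaries. The daily figures below are hypothetical, invented so that the two monthly totals come out close to each other; they are not the data behind the slide's chart, and the helper names are illustrative only.

```python
from datetime import date, timedelta

# HYPOTHETICAL daily counts, invented for illustration (not the slide's data):
# a flat September, a strong first half of October, then a sharp drop.
start = date(2017, 9, 1)
daily = {}
for offset in range(61):  # 1 September to 31 October 2017
    day = start + timedelta(days=offset)
    if day.month == 9:
        daily[day] = 62
    elif day.day <= 15:
        daily[day] = 80
    else:
        daily[day] = 42


def month_total(series, month):
    """Sum every day that falls inside the given calendar month."""
    return sum(value for day, value in series.items() if day.month == month)


def trailing_sum(series, end, window=7):
    """Sum the `window` days ending on `end`, ignoring month boundaries."""
    return sum(series[end - timedelta(days=i)] for i in range(window))


september = month_total(daily, 9)
october = month_total(daily, 10)
mid_october = trailing_sum(daily, date(2017, 10, 15))
end_october = trailing_sum(daily, date(2017, 10, 31))

print("Monthly totals:", september, october)
print("Trailing 7-day sums:", mid_october, end_october)
```

With these made-up numbers the monthly totals differ by less than one percent, while the trailing 7-day sum roughly halves between mid and late October, which is the kind of trend a month-end report would only reveal a month later.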
10. “If it moves, graph it.
If it doesn’t move,
graph it anyway,
just in case it does.”
–Etsy
11. Elasticsearch
1. Makes fast feedback easy,
both for IT and business people
2. Makes awesome graphs
3. Is super fast and massively
scalable
12. What is ELK?
Elasticsearch is a RESTful API and clustering layer over
Apache Lucene, a full-text search library. Together they act as a
document database optimised for search.
Logstash is a data ingestion tool. It transforms and ships data
across networks. Beats are lighter, less capable agents that ship
data to Elasticsearch.
Kibana is a powerful ad hoc query tool that quickly creates
beautiful graphs.
15. Business Intelligence Tips
1. Work closely with a champion customer
2. Start small, both in scope and audience
3. Reuse the language and labels of your audience
4. Reuse the time periods that are already part of your processes
(e.g. financial quarters)
5. Transform data and index the things that your audience think about,
like sessions, products, and orders, especially if your raw data
doesn’t quite map to them
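Tip 5 can be illustrated with a minimal Python sketch that turns raw click events into sessions before indexing. The `(user, timestamp)` event shape, the 30-minute inactivity cut-off, and the `sessionise` function are all assumptions made for this example; real data will need its own definition of a session.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cut-off


def sessionise(events):
    """Group (user, timestamp) events into per-user sessions.

    A new session starts whenever a user has been idle for longer
    than SESSION_GAP.
    """
    sessions = []
    open_sessions = {}  # user -> session dict currently being extended
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        current = open_sessions.get(user)
        if current is not None and ts - current["end"] <= SESSION_GAP:
            current["end"] = ts
            current["events"] += 1
        else:
            current = {"user": user, "start": ts, "end": ts, "events": 1}
            open_sessions[user] = current
            sessions.append(current)
    return sessions


# Hypothetical raw events, as they might arrive from a web log.
raw_events = [
    ("alice", datetime(2017, 10, 16, 9, 0)),
    ("bob", datetime(2017, 10, 16, 9, 5)),
    ("alice", datetime(2017, 10, 16, 9, 10)),
    ("alice", datetime(2017, 10, 16, 11, 0)),
]
sessions = sessionise(raw_events)
for session in sessions:
    print(session)
```

Indexing the resulting session documents, rather than the raw events, lets the audience query the thing they already talk about.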
16. Test Driven Design
1. Use Kinesis Firehose to save all of your production stream to S3,
then apply lifecycle policies
2. At the very beginning, replay a static, fake data set, for example with the
replay feature of the Logstash sleep plugin. Do not develop or test with a random generator!
3. Whenever you encounter undesirable behaviour,
add the recording segment to your test suite.
4. Test Elasticsearch with xUnit in your code pipeline
5. Monitor Kibana and Elasticsearch with your APM
17. Elasticsearch tips
1. Predefine your index mapping
2. Only use one type per index (ES6.x removes support for many types
per index)
3. Partition your index by time, typically by day
4. There are no joins, so use Lambda to enrich data before loading it into
Elasticsearch
5. Ideally an ES cluster has 3 small masters and fewer than 10 workers;
above 10 nodes, scale up before scaling out
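Tips 3 and 4 above can be sketched in plain Python. The `orders` index prefix, the `PRODUCTS` lookup table, and the field names are hypothetical; the date suffix follows the `YYYY.MM.dd` style commonly used for daily indices. In practice the enrichment step might run inside a Lambda function before the documents are loaded, but that wiring is not shown here.

```python
from datetime import datetime, timezone

# Hypothetical reference data that a relational store would normally join in.
PRODUCTS = {
    "sku-1": {"name": "Widget", "category": "Hardware"},
}


def daily_index(prefix, ts):
    """Build a per-day index name such as 'orders-2017.10.16'."""
    return f"{prefix}-{ts:%Y.%m.%d}"


def enrich(order, products):
    """Copy the order and embed the product details it refers to.

    Elasticsearch has no joins, so the lookup happens before indexing.
    Unknown SKUs get an empty product rather than raising an error.
    """
    doc = dict(order)
    doc["product"] = products.get(order["sku"], {})
    return doc


order = {
    "order_id": 42,
    "sku": "sku-1",
    "timestamp": datetime(2017, 10, 16, 9, 30, tzinfo=timezone.utc),
}
index_name = daily_index("orders", order["timestamp"])
document = enrich(order, PRODUCTS)
print(index_name, document)
```

Partitioning by day in the index name means old data can be dropped by deleting whole indices instead of deleting individual documents.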
18. AWS ES tips
1. The Elasticsearch port is 80, not 9200
2. Do NOT expose ES or Kibana to the public internet!
3. Start bigger, then shrink (IMHO, seven M4.large is
big)
4. Do not use ES as a data store; use RDS, or
DynamoDB, or Redshift, or S3 with Athena
19. CloudWatch vs ES
1. Only fixed thresholds for alerts
2. Much easier to use
3. Much less admin
4. Scales elastically
20. Kinesis Analytics vs ES
1. Simpler for detection
2. Elastic scaling
3. No graphs
4. MillisBehindLatest can be minutes!
21. Athena & QuickSight vs ES
1. Massive, admin-free scaling
2. Need to add Lambda, even then runs
periodically not event driven
3. Worse latency
4. Conceivably could be more expensive
(1440 scheduled queries * ?)
22. Photo: Micheal Filion, https://www.flickr.com/photos/mike9alive/
Situational Awareness
100% uptime on the GPS of this car isn’t going to help anything