
You Can't Test Everything, but You Should Monitor It

We had an incident in our warehouse at KRUU: from one day to the next, downloading the photos became very slow. At least, we thought the problem had started on that day.

In fact, the problem had started two years earlier. We noticed it very late because rentals had dropped due to Covid-19.

This will never happen again, thanks to our new metrics and alerting powered by OpenSearch!

  1. Hi, I am Michi! Head of Code at @michilehr
  2. YOU CAN'T TEST EVERYTHING, BUT YOU SHOULD MONITOR IT! 11 October 2022 @ Nerd-BBQ
  3. What is KRUU doing? "As Europe's leading photo booth provider, we have made it our mission to help our brides and grooms with their complete journey to their dream wedding. This is something we work tirelessly on with our team." (Philipp Schreiber, Co-Founder, KRUU.com)
  4. Photo Booth Cycle
  5. The Incident
  6. Good: ~12 MB/s
  7. Bad: 0.95 MB/s
  8. Investigate: 1. When did it start? 2. Why did it happen? 3. How to prevent? 4. How to notice early?
  9. When did it start? We had data in our Slack channel, but…
  10. 1. Write a script to extract the data as CSV 2. Import the data into MySQL 3. Write a query to aggregate by day 4. Create a nice graph
  11. It started a long time ago…
  12. What happened? Network configuration error
  13. How to prevent? No idea. Things like this happen.
  14. How to notice early?
  15. How to notice early?
  16. Alerts!
  17. Query
  18. Trigger
  19. Notification
  20. What next? ● 404 alert by threshold ● Auth failure alert by threshold to detect brute force ● …
  21. Thank you for your time! Questions? Feedback? Notes?
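Slide 10 lists the steps used to find out when the slowdown started: extract the Slack data as CSV, import it into a database, and aggregate it by day. The sketch below illustrates the import and aggregation steps under several assumptions. The CSV layout (columns `timestamp` and `mb_per_s`), the helper names `import_csv` and `daily_average`, and the sample values are all hypothetical and made up for illustration. SQLite from the Python standard library stands in for MySQL so the example is self-contained. The graph step is left out.

```python
import csv
import io
import sqlite3

# Hypothetical CSV as it might be extracted from the Slack channel:
# one row per photo download, ISO-8601 timestamp, measured speed in MB/s.
# The values are illustrative, not real measurements.
SAMPLE_CSV = """timestamp,mb_per_s
2022-09-01T10:00:00,12.1
2022-09-01T15:30:00,11.7
2022-09-02T09:10:00,0.9
2022-09-02T18:45:00,1.0
"""


def import_csv(conn, csv_text):
    """Load the extracted rows into a table (SQLite used instead of MySQL)."""
    conn.execute("CREATE TABLE downloads (ts TEXT, mb_per_s REAL)")
    rows = [
        (row["timestamp"], float(row["mb_per_s"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO downloads VALUES (?, ?)", rows)


def daily_average(conn):
    """Aggregate by day: returns [(day, average MB/s, number of downloads), ...]."""
    return conn.execute(
        "SELECT date(ts) AS day, AVG(mb_per_s), COUNT(*) "
        "FROM downloads GROUP BY day ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_csv(conn, SAMPLE_CSV)
    for day, avg, count in daily_average(conn):
        print(f"{day}  {avg:5.2f} MB/s  ({count} downloads)")
```

Plotting the per-day averages over a long period would make a gradual or long-standing drop visible, which is what slide 11 reports.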
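Slides 16 to 20 describe alerts built from three parts: a query, a trigger, and a notification, with threshold alerts for 404s and for authentication failures (to detect brute force). The following is a minimal plain-Python sketch of that structure, not the OpenSearch Alerting API. The `LogEntry` and `Monitor` classes, the `count_status_last` helper, the use of HTTP 401 for auth failures, the five-minute window, and the thresholds are all hypothetical choices for illustration. In a real setup the notification would go to a channel such as Slack; here it is just a callback.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class LogEntry:
    ts: datetime
    status: int
    path: str


def count_status_last(status: int, window: timedelta):
    """Query: count log entries with a given HTTP status within the last `window`."""
    def query(logs: List[LogEntry], now: datetime) -> int:
        return sum(1 for e in logs if e.status == status and now - window <= e.ts <= now)
    return query


@dataclass
class Monitor:
    name: str
    query: Callable[[List[LogEntry], datetime], int]  # returns a metric value
    threshold: int                                     # trigger fires above this value
    notify: Callable[[str], None]                      # notification channel (hypothetical)

    def run(self, logs: List[LogEntry], now: datetime) -> bool:
        value = self.query(logs, now)
        if value > self.threshold:  # trigger condition
            self.notify(f"[{self.name}] value {value} exceeded threshold {self.threshold}")
            return True
        return False


if __name__ == "__main__":
    now = datetime(2022, 10, 11, 12, 0)
    # One 404 per minute over the last 30 minutes (made-up sample data).
    logs = [LogEntry(now - timedelta(minutes=m), 404, "/missing") for m in range(30)]
    monitors = [
        Monitor("404 alert", count_status_last(404, timedelta(minutes=5)), 3, print),
        Monitor("auth failures", count_status_last(401, timedelta(minutes=5)), 10, print),
    ]
    for monitor in monitors:
        monitor.run(logs, now)
```

Keeping query, trigger, and notification separate means a new alert, such as the brute-force check, only needs a different query and threshold.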
