2. whoami
● Architecture & Operations Engineer
– SANS Institute 1+ years
● Security Architect / Analyst
– University of Minnesota 10+ years
● Application Developer
– SANS Institute 5+ years, contractor
3. Outline
● Background / Fast Forward
● Data Sources
● Framework Integration
● Dashboard Ideas
● Questions
7. Data Sources
● OS - collectd
– All: CPU, memory, disk & network I/O
– Selected: counts of important processes
● httpd processes on web server
● mysqld threads on DB server
8. Data Sources
● Service – custom scripts / graphite; collectd
– MySQL: thread states, users, query stats
– Apache: log analysis, server-status
– Mail Bounce Processor: queue depth
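A custom script for a service metric such as the bounce processor's queue depth can be very small. The sketch below assumes Carbon accepts Graphite's plaintext protocol (one `path value timestamp` line per data point) on its default TCP port 2003; the host name and metric path are hypothetical.

```python
import socket
import time

# Hypothetical Carbon host; 2003 is Carbon's default plaintext-protocol port.
CARBON_HOST = "graphite.example.com"
CARBON_PORT = 2003


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <unix timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host=CARBON_HOST, port=CARBON_PORT):
    """Open a TCP connection to Carbon and send all lines in a single write."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A cron job could then call `send_metrics([format_metric("mail.bounce_processor.queue_depth", depth)])` every minute, where `depth` is read from whatever queue the processor uses (not shown).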
9. Data Sources
● Framework – integrate statsd client library
– e.g. Kohana, Rails, Django, Symfony
– Hook into event, logging systems
– Performance counters:
● page generation time / memory use / cache hit %
– Details per app, controller (warning), function (danger!!)
– Use framework introspection to construct part of metric path
● framework.datacenter.server.application.controller.total_time
● Auto-generated: framework.datacenter.server.application.controller; developer-provided: total_time
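Building the auto-generated part of the name is mostly string handling. A minimal sketch, assuming the framework can report its name, datacenter, server, application and controller (how it does so differs per framework and is not shown). Because Graphite treats `.` as the path separator, each node is sanitized so that a host name such as `web01.example.com` stays a single node.

```python
import re


def sanitize(node):
    """Replace anything that is not a letter, digit, '_' or '-' with '_'
    so one value never splits into several Graphite path nodes."""
    return re.sub(r"[^A-Za-z0-9_\-]", "_", str(node))


def metric_path(framework, datacenter, server, application, controller, name):
    """Join the auto-generated prefix (from introspection / configuration)
    with the developer-provided final node."""
    nodes = [framework, datacenter, server, application, controller, name]
    return ".".join(sanitize(n) for n in nodes)
```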
12. Framework Integration
● Target: make measurements frictionless for developers
– Example frameworks: Kohana, Django, Rails, Symfony
● Look & act like other framework components
– Seamless integration
– Include in “baseline” installation for framework
– Share externally
● POLA
– Principle of Least Astonishment
– Minimize / eliminate the learning curve
16. Framework Integration
● Auto-generate base part of metric name
● Use framework introspection & configuration
– framework.datacenter.server.application.controller.total_time
– Auto-generated: framework.datacenter.server.application.controller; developer-provided: total_time
– e.g. metrics::timing('total_time', $totalTime);
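A rough Python analogue of the `metrics::timing()` call might look like the sketch below. It assumes a StatsD daemon on its default UDP port 8125 and the standard StatsD text format (`name:value|ms` for timers, `name:value|c` for counters). The `Metrics` class and its method names are hypothetical; they only mirror the slide's helper, so developers supply just the last node and the value.

```python
import socket


class Metrics:
    """Hypothetical helper: the prefix is built once per request from
    framework introspection, and the developer only names the final node."""

    def __init__(self, prefix, host="127.0.0.1", port=8125):
        self.prefix = prefix
        self.addr = (host, port)
        self._sock = None  # created lazily on first send

    def _packet(self, name, value, kind):
        return f"{self.prefix}.{name}:{value}|{kind}"

    def _send(self, packet):
        # UDP is fire-and-forget; a missing StatsD daemon must never break the page.
        try:
            if self._sock is None:
                self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self._sock.sendto(packet.encode("ascii"), self.addr)
        except OSError:
            pass

    def timing(self, name, milliseconds):
        """Send a timer in whole milliseconds; returns the packet for inspection."""
        packet = self._packet(name, int(round(milliseconds)), "ms")
        self._send(packet)
        return packet

    def increment(self, name, count=1):
        """Send a counter increment; returns the packet for inspection."""
        packet = self._packet(name, count, "c")
        self._send(packet)
        return packet
```

Sending over UDP keeps the cost of a missing or overloaded StatsD daemon close to zero for the page being served, which suits the goal of frictionless measurement.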
17. Framework Integration
● Starting Point
– Errors: 403, 404, 500
– Execution times: controller & total
– Memory Usage
– Logging events
● Requires no application changes
● Generates useful information
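One way to get this starting set with no application changes is to wrap the application at the server or framework layer. The sketch below assumes WSGI middleware as the hook (Django, for example, can be served through WSGI, but its own middleware API differs). It records a total-time timer and a counter per status code through a `record(name, value, kind)` callback that stands in for a StatsD client; memory usage and logging events are left out to keep it short.

```python
import time


class MetricsMiddleware:
    """Wrap any WSGI app: time each request and count its status code."""

    def __init__(self, app, prefix, record):
        self.app = app
        self.prefix = prefix
        self.record = record  # hypothetical sink: record(name, value, kind)

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        seen = {}

        def capture_status(status, headers, exc_info=None):
            # WSGI status strings look like "404 Not Found"; keep the number.
            seen["code"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            result = self.app(environ, capture_status)
            try:
                body = list(result)  # buffer so the timer covers body generation
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            seen["code"] = "500"  # unhandled exception: the server will answer 500
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.record(f"{self.prefix}.total_time", round(elapsed_ms), "ms")
            self.record(f"{self.prefix}.status.{seen.get('code', 'unknown')}", 1, "c")
        return body
```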
19. Dashboard Ideas
● Focusing on SECURITY mindset
● System & Application Health
– Know your baseline
– vs. 7 days ago – is there a pattern?
– Web server health
● process states; memory & CPU usage
● disk & network I/O
– DB server health
● memory & CPU usage, long queries, I/O
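"Know your baseline" can be reduced to a week-over-week ratio. On a Graphite graph, `timeShift(series, "7d")` overlays last week's data. The sketch below does the equivalent comparison in Python for alerts or reports, assuming the data points are already available as `(timestamp, value)` pairs; the one-hour window is an arbitrary choice.

```python
from statistics import mean

WEEK_SECONDS = 7 * 24 * 3600


def window_mean(points, end, window):
    """Mean of non-null values with end - window < timestamp <= end, or None."""
    vals = [v for t, v in points if end - window < t <= end and v is not None]
    return mean(vals) if vals else None


def week_over_week(points, now, window=3600):
    """Ratio of the current window's mean to the same window seven days earlier.

    Returns None when either window is empty or the baseline mean is zero."""
    current = window_mean(points, now, window)
    baseline = window_mean(points, now - WEEK_SECONDS, window)
    if current is None or baseline is None or baseline == 0:
        return None
    return current / baseline
```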
20. Dashboard Ideas
● Find what works for your team
– Mix breadth & depth
● One metric across many systems / services
– e.g. memory or CPU usage; web server status
● Many (all) metrics for one system
– e.g. page load times, CPU, I/O, db conns, etc.
22. Security Dashboards
2 Classes:
● Application Behaviors
– Custom per application
– Related to application logic, intent
● Errant Behaviors
– More generic
– Can support multiple applications
– Integrate at framework to make them automatic
● Note: determining intent requires human interpretation and logs
24. Security Dashboards
Application Behavior
● Transaction failures
– CC declined
– Non-existent domain for email address
● Access forbidden
– User trying to access parts of the app beyond their authorization
– Forced browsing vs. exposed link
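A hypothetical way to split "access forbidden" counts into the two cases above is to inspect the Referer header when returning a 403: a link followed from inside the application suggests an exposed link, while no Referer (or a foreign one) looks more like forced browsing. This is only a heuristic, since the header is client-controlled and often stripped; the counter names and the dict standing in for StatsD are assumptions.

```python
from urllib.parse import urlparse


def classify_forbidden(referer, own_host):
    """Heuristic label for a 403; Referer can be spoofed, so treat it as a hint."""
    if referer and urlparse(referer).hostname == own_host:
        return "exposed_link"  # our own page linked to something the user may not use
    return "forced_browsing"  # typed, bookmarked, scripted, or linked from elsewhere


def record_forbidden(counters, app, referer, own_host):
    """Increment a per-application counter (a dict stands in for StatsD here)."""
    name = f"{app}.forbidden.{classify_forbidden(referer, own_host)}"
    counters[name] = counters.get(name, 0) + 1
    return name
```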
25. Security Dashboards
Application Behavior
● Trap fields populated
– Unused, empty form field with a tempting name
– Not displayed to users
– Will be filled in by automated scanners / spam bots
– e.g. “subject”
● CAPTCHA failures
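Checking a trap field costs one lookup per request. A minimal sketch, assuming the form arrives as a dict and that `subject` is rendered but hidden from users (for example with CSS); the counter name passed to the hypothetical `record` callback is an assumption.

```python
TRAP_FIELDS = ("subject",)  # rendered in the form but hidden from human users


def trap_fields_filled(form, trap_fields=TRAP_FIELDS):
    """Return the trap fields that arrived with a non-blank value."""
    return [f for f in trap_fields if str(form.get(f, "")).strip()]


def check_traps(form, app, record):
    """Count each request that fills a trap field; the caller decides what to do next."""
    filled = trap_fields_filled(form)
    if filled:
        record(f"{app}.trap_field.filled", 1, "c")
    return bool(filled)
```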
26. Security Dashboards
Errant Behaviors
● Long running SQL Queries
– pages with poorly written queries
– SQLi causing abnormal queries to be executed
– e.g. WAITFOR DELAY, BENCHMARK()
● Blind SQLi
● Concept holds for any external data source
– Service / API call; LDAP query; etc.
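Because time-based attacks show up as latency, one generic timing wrapper around each external call yields both the page-quality signal and the injection signal. A minimal sketch, assuming a hypothetical `record(name, value, kind)` callback in front of StatsD and an arbitrary 1000 ms "slow" threshold that should be tuned against your own baseline:

```python
import time


def timed_call(name, fn, *args, record, slow_ms=1000, **kwargs):
    """Call fn(*args, **kwargs), always record its duration, and count it as
    slow when it takes at least slow_ms milliseconds (even if it raises)."""
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        record(f"{name}.time", round(elapsed_ms), "ms")
        if elapsed_ms >= slow_ms:
            record(f"{name}.slow", 1, "c")
```

The same wrapper can time a database cursor's `execute`, an LDAP search, or an HTTP API call, for example `timed_call("store.db.product_lookup", cursor.execute, sql, params, record=record)` (names hypothetical).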
30. Security Dashboards
Errant Behaviors
● Input Validation Errors
– Application scanners tend to cause a sharp rise
– Generate as part of framework integration
– Check for empty inputs too (application dependent)
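Counting validation failures at the framework level can be as simple as one counter per failed field. In the sketch below the field names, patterns and counter names are hypothetical. Empty required fields are counted separately from malformed values, since whether an empty input is suspicious depends on the application.

```python
import re

# Hypothetical rules: field name -> pattern the whole value must match.
RULES = {
    "zip": re.compile(r"\d{5}(-\d{4})?"),
    "qty": re.compile(r"[1-9]\d{0,2}"),
}


def validate(form, rules=RULES, required=("zip", "qty"), record=None):
    """Return (field, 'empty' | 'invalid') pairs and optionally count each one."""
    errors = []
    for field, pattern in rules.items():
        value = str(form.get(field, "")).strip()
        if not value:
            if field in required:
                errors.append((field, "empty"))
        elif not pattern.fullmatch(value):
            errors.append((field, "invalid"))
    if record is not None:
        for field, kind in errors:
            record(f"validation.{field}.{kind}", 1, "c")
    return errors
```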
31. Security Dashboards
Errant Behaviors
● Page Load Times
– Also a key UX / performance indicator
– Back end slowness (DB, internal services)
– Injection attacks (SQLi, command injection)
– Insufficient resources (too many requests to handle)
– Fruitful data to identify measurement gaps
● What is not measured, but impacts page performance?
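The measurement-gap question can be turned into a number by subtracting every instrumented component from the total page time. A minimal sketch with hypothetical component names; a remainder that is large, or that grows over time, points at work nobody is measuring yet.

```python
def unmeasured_ms(total_ms, components):
    """Page time not accounted for by any instrumented component.

    Clamped at zero because components that overlap or run in parallel
    can sum to more than the total."""
    accounted = sum(components.values())
    return max(total_ms - accounted, 0.0)
```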
32. Security Dashboards
Errant Behaviors
● Page Load Times (ctd.)
– What level of detail?
● App / Controller / Method / View / Model
– Scanning activity can cause a DoS of metric collection
● Create whisper db file for every new 404 error?
– Aggregation rules can help here
● e.g. aggregate all 404 metrics by application
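Carbon ships a separate aggregator daemon, configured through `aggregation-rules.conf`, that can roll metrics up server-side. As a client-side alternative, the sketch below collapses per-URL 404 counters before they are sent, so a scanner requesting thousands of random paths still yields one series per application on each server. The metric layout `<framework>.<datacenter>.<server>.<application>.404.<path>` is an assumption.

```python
from collections import Counter


def aggregate_404s(metric_names):
    """Count one increment per name, dropping every node after '404'.

    Names without a '404' node pass through unchanged."""
    totals = Counter()
    for name in metric_names:
        nodes = name.split(".")
        if "404" in nodes:
            key = ".".join(nodes[: nodes.index("404") + 1])
        else:
            key = name
        totals[key] += 1
    return totals
```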
33. Page Load Times
● Slowest 5 applications in one framework
● Based on the upper 90th percentile of page generation time
highestMax(groupByNode(framework.datacenter.*.*.*.*.total_execution.upper_90,4,"maxSeries"), 5)
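To make the target above easier to read, here is a small Python model of its two functions. `groupByNode` buckets series by one zero-based node of the metric name and combines each bucket point by point (here with `max`, like `maxSeries`), and `highestMax` keeps the N series with the largest single value. It is an illustration only, assuming equal-length series with no gaps; Graphite's own implementations also handle `None` values and differing time steps. Which node index holds the application depends on your metric layout.

```python
def group_by_node(series, node, agg=max):
    """Bucket {name: [values]} by name.split('.')[node] and combine each
    bucket point by point with agg (zip truncates to the shortest series)."""
    groups = {}
    for name, points in series.items():
        groups.setdefault(name.split(".")[node], []).append(points)
    return {key: [agg(vals) for vals in zip(*members)] for key, members in groups.items()}


def highest_max(series, n):
    """Keep the n series whose single largest value is highest."""
    ranked = sorted(series, key=lambda k: max(series[k]), reverse=True)
    return {k: series[k] for k in ranked[:n]}
```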
34. Security Dashboards
Errant Behaviors
● Web Server Response Codes
– Per site / application / server
– Group codes into buckets
● 1xx, 2xx, 3xx, 4xx, 5xx
● 0-399, 400+
– Percentage balance should be fairly stable
● e.g. small % 4xx; no 5xx
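The bucketing and the error share are simple arithmetic. A minimal sketch, assuming per-code counts have already been collected (for instance from Apache log analysis) as a dict keyed by status code:

```python
from collections import Counter


def bucket_counts(status_counts):
    """Fold per-code counts such as {'200': 950, '404': 40} into 1xx..5xx classes."""
    buckets = Counter()
    for code, count in status_counts.items():
        buckets[f"{str(code)[0]}xx"] += count
    return buckets


def error_percentage(status_counts):
    """Percentage of responses with status 400 or above (0.0 when there is no traffic)."""
    total = sum(status_counts.values())
    errors = sum(c for code, c in status_counts.items() if int(code) >= 400)
    return 100.0 * errors / total if total else 0.0
```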
35. Web Server Error Percentages
alias(summarize(sumSeries(apache2.*.*.*.*.status.{4??,5??}.count), '$window', 'sum', false), 'error 4xx 5xx')
alias(summarize(sumSeries(apache2.*.*.*.*.status.{2??,3??}.count), '$window', 'sum', false), 'success 2xx 3xx')
36. Security Dashboards
Errant Behaviors
● Web Server Response Codes
– Typo in link (404)
● e.g. bulk mailer auto-corrects part of a URL
– Page removed but still referenced (404)
– Scan for known vulnerable software (404)
● e.g. /wp-admin
– Injection attacks (500)
37. Summary
● Magnify benefits by minimizing the cost to generate / use metrics
● Establish a baseline
● Pay attention to what's going wrong too
● Measure across full vertical range
– Bits in/out
– Business transactions completed
● Create & instrument misuse detectors
– Trap fields, spider trap URLs
41. Grafana Tips
● Shared Crosshair
– Dashboard Settings > Features > Shared Crosshair (Ctrl+O)
– Eases time correlation on multi-graph dashboards
● Templating Variables
– Dashboard Settings > Features > Templating
– Set a standard practice for variable names – POLA
– server, site, action, etc.
42. Grafana Tips
● Summarization window
– Templating > Variables > Add > Interval
– Include auto interval = 200
– summarize($window, max, false) in metrics
– Can provide a hint to Graphite about which rank of data to read from the whisper file
● Tooltip: all series, individual
– Graph > Display Styles
– See all values at a point in time