HighLoad++ 2017
Mumbai Hall, November 7, 16:00
Abstract:
http://www.highload.ru/2017/abstracts/3065.html
Brute-force attacks against web-based applications are on the rise.
You will be presented with an architecture built on top of ELK (https://www.elastic.co/products) and consul (https://www.consul.io/) that is capable of reliably detecting, analysing and mitigating large-scale brute-force attacks against WordPress, Drupal, Magento and Joomla based web sites in near real time.
...
Protecting the Web at a scale using consul and Elk / Valentin Chernozemski (SiteGround)
1. Protecting the Web
at a scale using
consul and ELK
vaLentin chernoZemski • 07.Nov.2017
SiteGround • https://siteground.com
2. Who am I:
Valentin Chernozemski / 33 / Bulgaria
SiteGround … 12 years
What I do
Computers for food and fun since I was 15
What I like
Science, art, history, science, *ing things
Dislike problems at work … :)
3. A problem of a hosting company
We host sites, sites get hacked
• brute-force attacks
• sql injection attacks
• command execution
• remote file disclosure
• unauthorized file upload
• … you name it
Malicious activity
• Spam
• Scam
• Phishing
• Malware
• Global sadness
• No silver bullet
4. Scale of the problem
● Managing more than 12 000 servers
● Hosting more than 1 000 000 web sites
● 60% WordPress, Drupal, Joomla!, Magento
● More than 120 000 000 malicious requests/day
● 10 000 malicious hits per server/day
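The per-server figure follows directly from the totals on this slide. A quick arithmetic check, using only the rounded numbers quoted above (the per-second rate is derived here and is not a figure from the talk):

```python
# Rounded figures from the slide.
servers = 12_000
malicious_requests_per_day = 120_000_000

per_server_per_day = malicious_requests_per_day / servers
# Derived value, not quoted in the talk: average rate across the whole fleet.
per_second_overall = malicious_requests_per_day / 86_400  # seconds in a day

print(f"{per_server_per_day:,.0f} malicious hits per server per day")
print(f"{per_second_overall:,.0f} malicious requests per second overall")
```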
5. What did we need?
● Reliably detect ongoing attacks
● Block them before they succeed or reach other sites
● Avoid false positives as much as possible
● Make customers happy again
6. System requirements
● Scalable - handle number of events
● Available - tolerate failures
● Easy to maintain - managed as code (git)
● Easy to extend - plug & play
● Data center aware - “web scale”
7. Layers TL;DR;
● Attack detection
● Data collection
● Data analysis
● Data distribution
● Data visualisation
11. Data collection layer
● filebeat - ship
● logstash - enrich
● elasticsearch - store
● kibana - visualise
12. Filebeat
● Read lines
● Send them over the wire
● As JSON
● In “real time” (see GOTCHA)
● Payload is encrypted
● Payload is signed and authenticated
● Fault tolerant with exponential backoff on failure
https://www.elastic.co/products/filebeat
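The "JSON over the wire" and "exponential backoff" bullets can be illustrated with a small sketch. This is not Filebeat's implementation: the function names, the delay values and the single `message` field are assumptions made for illustration, and encryption and authentication are left out.

```python
import json
import time


def backoff_delays(initial=1.0, factor=2.0, max_delay=60.0):
    """Yield retry delays in seconds: initial, initial*factor, ... capped at max_delay."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, max_delay)


def ship_line(line, send, sleep=time.sleep, max_attempts=5):
    """Wrap one log line in a JSON event and retry send() with growing pauses.

    `send` is a hypothetical callable that raises ConnectionError on failure.
    Returns the number of attempts that were needed.
    """
    event = json.dumps({"message": line.rstrip("\n")})
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            send(event)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(next(delays))
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.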
13. Logstash
● Read events over the wire
● Real time
● Authenticate them
● Decrypt them
● Normalize and enrich them with grok filters
● Send them to backend for storage
● Support multiple storage backends
● Fault tolerant in case of storage backend outage
https://www.elastic.co/products/logstash
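To give a feel for what "normalize and enrich" means, here is a rough Python stand-in for a grok filter. It is a sketch only: the regular expression assumes a common-log-format access line, and the `LOGIN_PATHS` list and `login_attempt` field are hypothetical, not the rules used in the talk.

```python
import re

# Rough Python equivalent of a grok pattern for a common-log-format line.
ACCESS_LOG = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Hypothetical list of login endpoints worth flagging.
LOGIN_PATHS = ("/wp-login.php", "/xmlrpc.php", "/administrator/index.php")


def normalize(line):
    """Turn a raw access-log line into a dict of typed fields, or None if it does not parse."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    # Enrichment step: mark POSTs to known login endpoints.
    event["login_attempt"] = (
        event["method"] == "POST" and event["path"].split("?")[0] in LOGIN_PATHS
    )
    return event
```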
14. Elasticsearch
● Search engine
● Store and index arbitrary “documents”
● Scalable
● Highly available
● Fault tolerant by design
● Query large data sets
● Allows us to get meaning from vast amount of data
https://www.elastic.co/products/elasticsearch
15. Kibana
● A picture is worth a thousand words
● Visualise data stored in elasticsearch
● Visualisers
● Dashboards
● Time series databases
● Graph explorer
https://www.elastic.co/products/kibana
19. Block list generation
● Uses data stored in elasticsearch as input
● Custom rules
● Produces ip and network blocklists as output
● A lot of room for improvements
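The talk does not show the actual rules, so the following is only a sketch of the general idea: count login attempts per IP, block the heavy hitters, and promote a network to the blocklist when several of its addresses are already blocked. The thresholds, the /24 aggregation, the event field names and the IPv4-only handling are all assumptions.

```python
import ipaddress
from collections import Counter


def build_blocklists(events, ip_threshold=20, network_threshold=3, prefix=24):
    """Return (blocked_ips, blocked_networks) as sorted lists of strings.

    `events` is an iterable of dicts with hypothetical 'client_ip' and
    'login_attempt' keys. IPv4 only in this sketch.
    """
    hits = Counter(e["client_ip"] for e in events if e.get("login_attempt"))
    bad_ips = sorted(
        (ip for ip, count in hits.items() if count >= ip_threshold),
        key=ipaddress.ip_address,
    )
    # Group blocked addresses by their enclosing network.
    per_network = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in bad_ips
    )
    bad_networks = sorted(
        net for net, count in per_network.items() if count >= network_threshold
    )
    return bad_ips, [str(net) for net in bad_networks]
```

In a real deployment the events would come from an elasticsearch query over a recent time window rather than from an in-memory list.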
21. Data distribution layer
● consul - central kv store and services catalog
● consul-replicate - cross dc, consul data distribution
● consul-template - handle dynamic configurations
22. consul
● Simplify service discovery and service configuration
● Join all machines in a cluster
● Failure detection on top of GOSSIP
● Built in KV store master -> slave
● Distributed locking in KV
● Event message distribution on top of GOSSIP
● Easy to query DNS, HTTP, CLI
● Concept to use your DC as a database
https://www.consul.io/
23. consul security
● Clear threat model
● Encrypted communication (pre shared key)
● Authenticated membership (signed SSL certificate)
● Per client token(s)
● Per token ACL roles
● Per endpoint ACL roles
● Does not require extra privileges
https://www.consul.io/docs/internals/security.html
24. consul scalability
● Highly available - multiple servers, raft, built in failover
● Highly scalable - consistent or stale responses
● Data center aware - members, states
● Services aware - registered services, service states
● Low net overhead - GOSSIP
https://www.consul.io/docs/guides/index.html
25. consul watches
● Key watches - /role/nginx/reload
● Key prefix watches - /role/ipset/blocked
● Service(s) watches - “logstash service registered on node X”
● Checks watches - “elasticsearch failing on node Y”
● Nodes watches - “example.com joined cluster”
● Event watches - “eventname” {payload < 100 bytes}
https://www.consul.io/docs/agent/watches.html
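A key prefix watch hands its handler a JSON array of KV entries on stdin, with each value base64-encoded. The sketch below decodes such a payload. The key layout under the prefix (one key per blocked address, with a free-form value) is an assumption made for illustration and is not described in the talk.

```python
import base64
import json


def blocked_entries(payload, prefix="role/ipset/blocked/"):
    """Map key suffix -> decoded value for every KV entry under `prefix`.

    `payload` is the JSON text a keyprefix watch passes to its handler.
    """
    entries = json.loads(payload) or []  # the payload can be null when no keys match
    result = {}
    for entry in entries:
        key = entry["Key"]
        if not key.startswith(prefix):
            continue
        raw = entry.get("Value") or ""  # keys may exist with an empty value
        result[key[len(prefix):]] = base64.b64decode(raw).decode("utf-8")
    return result
```

A handler script would call `blocked_entries(sys.stdin.read())` and then update the local ipset from the result.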
26. consul in production?
● Cluster size - 1900 and growing
● DCs - 6 and growing
● Easy to integrate - via common tools, HTTP, CLI
● It complements ansible, puppet, etc.
● Awesome docs
● Community driven
https://www.consul.io/intro/vs/index.html
27. consul-replicate
● Replicate specific KV prefixes
● Skip prefixes that are not important
● As easy as go binary
This is your master dc | This KV is important | Sync it
● Running on each consul server node
● Only one instance active at a time - HA with “consul lock”
https://github.com/hashicorp/consul-replicate/
28. consul-template
● read / monitor arbitrary consul data (from KV, service catalog, etc.)
● parse Go template format
{{range service "logstash"}}
hosts: "{{.Address}}"
{{end}}
● output is rendered on disk, e.g. /etc/filebeat/filebeat.yml
● execute post render handler, e.g. service filebeat reload
https://github.com/hashicorp/consul-template
29. consul-template + service discovery
● /etc/filebeat/filebeat.yml - logstash nodes
● /etc/logstash/output.conf - elasticsearch nodes
● /etc/nginx/captcha.conf - captcha nodes
● node with service/role X added/removed/failed
● Render configs & reload services on state change
https://github.com/hashicorp/consul-template
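The "render configs & reload services on state change" loop can be sketched in a few lines. This is a conceptual illustration, not how consul-template is implemented: the function names, the port 5044 and the shape of the rendered YAML are assumptions, and the list of addresses would in practice come from the consul service catalog.

```python
import os
import subprocess
import tempfile


def render_hosts(addresses, port=5044):
    """Render a filebeat-style logstash output section for the given node addresses."""
    lines = ["output.logstash:", "  hosts:"]
    lines += [f'    - "{addr}:{port}"' for addr in sorted(addresses)]
    return "\n".join(lines) + "\n"


def write_if_changed(path, content):
    """Atomically replace `path`; return True only if the content actually differs."""
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.read() == content:
                return False
    except FileNotFoundError:
        pass
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.replace(tmp, path)  # rename within one directory, so readers never see a partial file
    return True


def sync_config(path, addresses, reload_cmd, run=subprocess.run):
    """Re-render the config and run the reload command only when something changed."""
    changed = write_if_changed(path, render_hosts(addresses))
    if changed:
        run(reload_cmd, check=True)
    return changed
```

Skipping the reload when the rendered output is unchanged is what keeps frequent catalog updates from restarting services needlessly.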
34. Deployment results
● More than 12 000 servers protected
● More than 1 000 000 web sites protected
● Global attack detection latency ~5 seconds
● Global blocklist distribution 60 seconds (by design)
35. Webapps bruteforce filtering rate
● WordPress, Drupal, Joomla etc. bruteforce attacks rendered useless
Filtered malicious http(s) requests
● 120 million / 24 hours
● 800 million / 7 days
● 3 billion / 30 days
37. Services bruteforce filtering rate
● ftp, smtp, imap etc. bruteforce attacks rendered useless
Failed logins rate
● ~52 failed logins / second (on the whole infrastructure)
● ~8 failed logins / second / per service
● ~0.00066 failed logins / second / service / per server
● ~0.00000874 failed logins / second / service / per domain
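The per-server and per-domain rates can be roughly reproduced from the rounded totals quoted elsewhere in the deck (12 000 servers, 1 000 000 sites). The slide's own figures presumably come from unrounded inputs, so the results below match only approximately. The "seconds between failed logins" value is derived here and is not a number from the talk.

```python
# Rounded inputs taken from the slides.
servers = 12_000
domains = 1_000_000
per_service_rate = 8.0  # failed logins / second / service

per_server = per_service_rate / servers
per_domain = per_service_rate / domains
# Derived: average gap between failed logins for one service on one server.
seconds_between = 1 / per_server

print(f"{per_server:.5f} failed logins / second / service / server")
print(f"{per_domain:.8f} failed logins / second / service / domain")
print(f"about one failed login every {seconds_between / 60:.0f} minutes per service per server")
```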
39. So what is so cool?
● Scalable
● Available
● Fault tolerant
● Self assembled - built in service discovery
● Event based
● Easy to manage
● Easy to extend
41. TODO
● Make system real time
● filebeat tuning
● more logstash ingest nodes
● more elasticsearch index nodes for faster indexing
● Plug more sensors and correlations
● Block attackers earlier
● Blocklist sharing and automated abuse reports to ISPs
42. Gotchas
● consul ARP traffic - https://www.youtube.com/watch?v=LUgE-sM5L4A
● filebeat realtime log shipping configuration is not that straightforward
● elasticsearch is feeling much better on top of SSDs … for obvious reasons
● Machine with role X exports a set of services [Y,Z]
● these services can later be referenced by consul-template
● consul GOSSIP events payload size is limited (< 100 bytes)
43. Why don’t you?
● Problem can be solved in many ways
● No solution solves them all
● System fits our stack and requirements
● Security is a process, systems are adaptive and evolve
● Qrator, Cloudflare, Sucuri etc. … our system does not replace them but rather complements them