This document provides a technical summary of Dan Kaminsky's keynote presentation. The keynote discusses three main topics:
1) Denial of service attacks and how Overflowd aims to make DDoS attacks less annoying by sharing netflow data between networks.
2) Cryptography and how JFE (Jump to Full Encryption) aims to automate TLS deployment to make encryption easier.
3) Data loss prevention and how Ratelock restricts data loss by enforcing rate limits and other policies at the serverless cloud layer to increase survivability even if complex parts are compromised.
2. I’m Dan Kaminsky
Chief Scientist and Co-Founder
White Ops
• Been fixing things for almost
two decades
• Broke a big thing
• People only remember that
14. JBOS
Just A Bunch of Servers
(“There’s no such thing as a cloud,
It’s just other people’s computers.”)
15. JBOS IS A DIRTY LIE
“There’s no such thing as a skyscraper,
it’s just another pile of rock.”
16. JBOS IS A DIRTY LIE
TOO MANY BELIEVE
LOOKS LIKE REMOTE SERVERS, BUT
17. Clouds have identities that cross
organizational boundaries
Clouds have a neutral arbiter
Servers are sold.
Clouds are operated.
19. • Denial of Service Attacks: DDoS is hard to remediate
• Cryptography: TLS is hard to deploy
• Data Loss Prevention: Attacks are hard to survive
• Code Safety: Not getting owned is hard
SECURITY IS HARD
20. • Denial of Service Attacks: Think globally, act locally
• Cryptography: Servers were hard to deploy too, once
• Data Loss Prevention: The cloud makes compromise
survivable, I built this here with Lambda
• Code Safety: Preventing compromise might not be
impossible after all...I would like to build this here too.
WHY AMAZON
21. MAKE SECURITY EASY: WHAT WE’RE DOING ABOUT IT
• Denial of Service Attacks: DDoS is hard to remediate
Overflowd: Let the victims of network floods learn from Netflow
• Cryptography: TLS is hard to deploy
JFE: Launch one Daemon, all networking is TLS secured w/ valid cert
• Data Loss Prevention: Attacks are hard to survive
Ratelock: Make the cloud enforce security policies, including hard rate limits
• Code Safety: Not getting owned is hard
Autoclave: Run entire operating systems in tighter sandboxes than Chrome
23. SOMEDAY, SYSTEMS WILL NOT GET HACKED
• That day is not today.
• Mirai vs. Dyn = Parts of the Internet actually went down
• No defense survives that many nodes flooding you
• When things go wrong, what can we do?
• Step 1: Communicate
• Step 0: Figure out who we’re supposed to communicate with
25. Spoofed Traffic
Attackers lie about where they are on the network
This will always be possible
Asymmetrically Routed Traffic
Traceroute just shows how to reach your attacker
It doesn’t show how their traffic is reaching you
These are the problematic packets!
Bad Contact Data
IP address ranges are large, “Autonomous systems”
aren’t, contact data is stale
26. ATTACKS ARE USUALLY REMEDIATED,
BUT IT’S HARD, SLOW,
UNRELIABLE, NOT SCALING
28. THE TWO GREAT HOPES
The Stage Is Set: Attacker networks hit victim networks.
• They’re not directly connected – many parties in the middle.
Hope 1: Everyone monitors their networks
• At least for traffic management and capacity planning
• Generally use Netflow – provides source/dest metrics with light protocol
analysis
Hope 2: Not everyone on the Internet is a jerk
• And even if they are, getting abuse calls is annoying, and the big floods are
bad for business
• Many would act, if the benefit was incremental and the risk was low
33. DEMO
'data': {'bcount': 682512, 'protocol': 6, 'tos': 0, 'etime': 1325314888, 'daddr': '122.166.77.74', 'pcount':
17001…
Whitelisted flow metadata, so recipient can match
'signature': {'key': 'd52b9644ba6ffd2bdaa6505e649fd80ca…
'signature': 'z5yMEHH0pYe++uOiNhWzLkCyXsT…
NaCl Signatures, unchained for now
“Oh, somebody’s spoofing? OK, what signature have I been seeing all year, on other networks”
'metadata': {'info': 'FLOWSEEN', 'class': 'INFORMATIONAL', 'time': 1477778027.138109}}
Could also have MACHINE_SUSPICIOUS, HUMAN_SUSPICIOUS,
HUMAN_CONFIRMED_PLEASE_CONTACT, etc
‘contact’: {‘email’: ‘dan@whiteops.com’}
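The demo output above can be sketched end-to-end in stdlib Python. This is a hedged sketch: HMAC-SHA256 stands in for the NaCl (Ed25519) signature on the slide (the real scheme is public-key, so verifiers need no shared secret), and `SIGNING_KEY`, `build_report`, and `verify_report` are hypothetical names.

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared key; the real overflowd uses NaCl (Ed25519)
# public-key signatures, so verifiers would not hold a secret.
SIGNING_KEY = b"demo-signing-key"

def build_report(flow: dict, contact_email: str) -> dict:
    """Assemble a signed report of whitelisted flow metadata."""
    body = {
        "data": flow,  # only coarse metadata: counts, addresses, times
        "metadata": {"info": "FLOWSEEN", "class": "INFORMATIONAL",
                     "time": time.time()},
        "contact": {"email": contact_email},
    }
    # Canonical encoding so the signature is reproducible.
    blob = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, blob, hashlib.sha256).hexdigest()
    body["signature"] = {"key": "demo", "signature": sig}
    return body

def verify_report(report: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in report.items() if k != "signature"}
    blob = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, report["signature"]["signature"])
```

A recipient that has been collecting these all year can then answer the slide’s question: “what signature have I been seeing on other networks?”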
34. HOW DO WE REPORT?
65535/udp
• The end
• Doesn’t require acknowledgement, but does need fragmentation
ICMP
• Would follow packets further along route, maybe
• Might get dropped earlier too
HTTP/HTTPS
• Many networks have an easier time picking up .well-known web paths
• Can’t just be passively received
TODO
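Of the transports above, the UDP option is trivial to sketch. `send_udp_report` is a hypothetical helper; the fire-and-forget semantics match the slide (no acknowledgement expected, large reports left to IP fragmentation).

```python
import json
import socket

def send_udp_report(report: dict, dst_ip: str, dst_port: int = 65535) -> int:
    """Fire one datagram toward the flow's source; no reply is expected."""
    payload = json.dumps(report).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Returns the number of bytes sent; delivery is best-effort.
        return s.sendto(payload, (dst_ip, dst_port))
```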
35. EXPLICIT PLAN
We have no idea how, precisely, this data would or should be consumed
• We do know we don’t want to share much more data than a legitimate
recipient should already know
• Not sending raw netflow, not sending at high rates
• May send faster on known badness – badness and packet count are not
equal!
We think interesting and useful things would be built in the presence of
overflowd
36. AMAZON TAKEAWAY #1:
FLOODS ANNOY YOU TOO
NETFLOW SHARING COULD MAKE
THEM LESS ANNOYING
(OH HAI AMAZON SHIELD)
45. REALITY (WHEN INDEPENDENT SOFTWARE WAS
WRITTEN FOR ISOLATED SERVERS)
• TLS required certificate authorities
• Certificate authorities required bizdudes
• Software vendors couldn’t automate bizdudes
• Software vendors couldn’t automate TLS
• Software vendors could and did automate listening on standard ports
• Just not with security
• The TLS mess chains back to the devops non-viability of automatically
acquiring certificates
46. WE LIVE IN THE (NEAR) FUTURE
Let’s Encrypt
• Free Certificate Authority
• Allows Automatic Certificate Provisioning using open ACME
protocol
Services can in fact autoprovision certificates now!
• Caddy
• HAProxy
• Nginx
47. SHOULD THEY BE USING AWS CERTIFICATES?
(Spoiler alert: Yes.)
54. ONE SERVICE IS LAUNCHED.
ALL SERVICES SUPPORT TLS.
ALL OF THE CRYPTO
NONE OF THE DRAMA
55. HOW THIS IS WORKING NOW
• Grab all traffic from port 23 through 65K, send it to port 1
• Allow the listener on Port 1 to receive traffic from other IPs and
Ports
• Sniff the first 128 bytes on the socket, without actually
“draining” from it
• In TLS, the client speaks first. If it demands crypto, we can provide it.
• Do things (like get a new cert) during initial handshaking
• Get cert from Let’s Encrypt (with a little help)
• Mechanisms: iptables TPROXY, setsockopt IP_TRANSPARENT, MSG_PEEK,
set_servername_callback in Python SSL, certbotClient.issue_certificate in
free_tls_certificates
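The MSG_PEEK step above can be sketched with nothing but the stdlib: look at the first bytes without draining the socket, so a real TLS stack can still read the full ClientHello afterwards. `wants_tls` is a hypothetical name, and a real JFE also needs the TPROXY/IP_TRANSPARENT plumbing this sketch omits.

```python
import socket

TLS_HANDSHAKE = 0x16  # first byte of a TLS handshake record (ClientHello)

def wants_tls(conn: socket.socket) -> bool:
    """Peek at the first bytes on the socket without consuming them.

    If the client is speaking TLS, we can answer with crypto; if not,
    the bytes are still there for whatever plaintext handling follows.
    """
    head = conn.recv(128, socket.MSG_PEEK)
    return len(head) > 0 and head[0] == TLS_HANDSHAKE
```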
56. OK, LINUX MAKES THIS A LOT OF DRAMA
LINUX DOES NOT LIKE INTERCEPTING SOCKETS
JFE DON’T CARE
58. PROBLEMS WITH JFE
• Low Performance
• Very few languages support all the operational dependencies
(setsockopt and MSG_PEEK and cert acquisition and in-handshake
replacement)
• Only Python did, and only in a particularly slow threading mode
• Localhost
• Connections appear to come from localhost (not great)
• Connections are routed to localhost (actually bad, things that TCP
bind to 127.0.0.1 are exposed to the Internet)
• Security blocking!
59. FIXING JFE WITH KERNEL SURGERY
• IPTables TPROXY is janky and clearly nobody else has fixed this either
• Squid, HAProxy, various SSL MITM attack tools (lol) all get stuck here, try to
just be an intercepting proxy to another host downwire
• NFTables clearly the approach to take
• New firewalling subsystem in Linux
• Could gate packet redirection with IP Address Aliases (eth0:1)
• Could gate packet redirection with cgroups (as per containers)
62. HOW ELSE COULD JFE WORK
• Docker Containers
• Theoretically have Network Plugins
• “VPN”/VPC modes could intercept and upgrade
• They’re already doing crazy kernel surgery
• With mixed results
• (ECS)
• Virtual Machines
• Already intercepting packets (or in a position to choose to)
• Encryption/Decryption breaks zero copy by definition
63. HOW ELSE COULD JFE WORK
(Amazon Edition)
• EC2 Hypervisor
• We know it’s QEMU-XEN
• Has keys at 169.254.169.254
• MAGIC REST ENDPOINT WITH GREAT THINGS NOBODY KNOWS ABOUT
• It can sign things
• It can’t leak keys
• JFE has a real problem with knowing which domains to request certs for
• Zero config == attacker tells you what to request == “please give me cert for
google.com”
• Wouldn’t matter, but rate limits at LE are harsh and non-negotiable
• It’s much nicer to be able to pay someone for service
64. YOU KNOW MY DOMAINS. WE USE ROUTE 53.
YOU CAN SECURE MY DOMAINS. YOU HAVE A CA.
YOU SEE MY PACKETS. YOU CAN FIX THEM.
ALL THE CRYPTO
EVEN LESS DRAMA
though I said it was none
65. WE DON’T EVEN NEED TO OVERLOAD THE
HYPERVISOR (WE MIGHT WANT TO)
66. SOME NOTES
• With ELB, server wouldn’t be able to easily differentiate encrypted
from unencrypted link
• Can’t opportunistically secure clients like server
• Attacker: “Aw shucks, that TCP endpoint doesn’t support TLS. Better go
plaintext”.
• Could require TLS for all outbound connections, though.
• Not constrained to TCP – DTLS exists
• Don’t need Hypervisor/ELB for Route 53 integration
• Upcoming release of JFE will get zones via libcloud (still config )
• This is the path for DNSSEC/DANE
• The hard part is pushing key material back into DNS
• Only hard in JBoS, much easier in an integrated cloud
67. USEFUL TO WRAP TLS WITH TLS
ALWAYS SCORE PERFECT, RDP WOULD FINALLY WORK
70. RISK MANAGEMENT IS NOT ALL OR NOTHING
• There’s $20 in the Gas Station Cash Register
• Not all corporate payroll for the month of July
• But we assume if they can get any of our data,
they probably got all of our data
• Why?
72. OUR DESIGNS ARE OFTEN
“ALL OR NOTHING” AFFAIRS
• Classical JBOS (Just a Bunch Of Servers) design
• Shared credentials
• Complex services
• Full mutual trust – root on one is root on all
• Rate limits for a database would be useless in the event of a hack
• If you can steal some data…
• …you can disable the rate limits…
• …and steal all the data.
• This is why you’re supposed to salt and stretch stored password hashes
• “After your data is lost, make it hard for an attacker to convert it back to
passwords”
77. AWS IS
NOT JBOS.
Somebody else’s problem
It provides services
with authenticated
semantics.
78. HOW RATELOCK WORKS
1) Proxy access to data via Lambda function
2) Store data (possibly encrypted) in DynamoDB
3) Provide client enough rights to access function but not
enough to modify or bypass
4) Implement arbitrary policy in Lambda, isolated by Amazon
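The four steps above, sketched with in-process stand-ins: a dict plays DynamoDB and a plain function plays the Lambda. All names here are hypothetical; in the real design, IAM is what keeps the caller from reaching the table or the limit at all.

```python
import time

TABLE = {}          # stand-in for DynamoDB: username -> stored secret
WINDOW = []         # timestamps of recent requests
MAX_PER_SECOND = 4  # hard rate limit enforced by the policy layer

def handler(event: dict) -> bool:
    """Proxy all data access through one function that enforces policy."""
    now = time.monotonic()
    # Keep only requests from the last second, then check the limit.
    WINDOW[:] = [t for t in WINDOW if now - t < 1.0]
    if len(WINDOW) >= MAX_PER_SECOND:
        return False        # indistinguishable from a failed check
    WINDOW.append(now)
    if event["action"] == "add":
        TABLE[event["user"]] = event["password"]
        return True
    if event["action"] == "check":
        return TABLE.get(event["user"]) == event["password"]
    return False
```

Because the caller holds only the `invoke` right, compromising the complex client buys nothing: the simple policy above survives.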
81. ./ratelock.py add foo bar
true
(Password stored in DynamoDB, proxied through Lambda)
82. ./ratelock.py check foo bar
true
./ratelock.py check foo wrong
false
• Both checks against DynamoDB, proxied.
• Lambda “invoke” right against function “ratelock” only thing required.
83. # while [ 1 ]; do ./ratelock.py check foo bar; sleep 0.25; done
true ... true ... true ... true ... false…
false ... false
• The proxy starts providing false errors. The caller doesn’t have the ability
to directly bypass the proxy.
• (Yes, vulnerable to timing – can differentiate fake from real false).
• The complex server can get completely compromised. The simple policy
survives.
86. HERE’S A STRING AMAZON WILL VERIFY,
BUT NEVER LEAK, EVEN TO YOU. USEFUL
87. $ ./walliam.py add
demouser 1234567
$ cat authdb.json
{"demouser":"BvL40myloWAo39h
bIpRpKOy4Skdtswcaa7WJUzWf"}
We actually create an IAM user “demouser” under a special path. We
just create the user; we don’t grant privileges. But we do get a secret
key…which that string above isn’t.
88. add_user
aes = (CTR, sha256(userpw))
raw = b64decode(aws_secret)
enc = aes.encrypt(raw)
saved_pw = b64encode(enc)
The secret key is first base64 decoded, and then encrypted
with the user’s password. We save that. Why decode?
89. check_user
enc = b64decode(saved_pw)
aes = (CTR, sha256(userpw))
raw = aes.decrypt(enc)
aws_secret = b64encode(raw)
To invert the process, we decrypt the saved value with what is
supposed to be the user’s password, and base64 encode.
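The add_user/check_user pseudocode above, made runnable as a sketch. A sha256-based keystream stands in for AES-CTR here (the property that matters — malleable encryption with no integrity check — survives the swap), and the decode-before-encrypt step is the one the talk insists on.

```python
import base64
import hashlib

def _keystream(userpw: str, length: int) -> bytes:
    """Stand-in stream cipher keyed by sha256(password); the talk uses
    AES-CTR. Deliberately no integrity: any password 'decrypts'."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(userpw.encode() + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def add_user(aws_secret_b64: str, userpw: str) -> str:
    raw = base64.b64decode(aws_secret_b64)   # decode FIRST, so raw bytes get encrypted
    enc = bytes(a ^ b for a, b in zip(raw, _keystream(userpw, len(raw))))
    return base64.b64encode(enc).decode()

def check_user(saved_pw: str, userpw: str) -> str:
    enc = base64.b64decode(saved_pw)
    raw = bytes(a ^ b for a, b in zip(enc, _keystream(userpw, len(enc))))
    # Candidate secret key; only IAM, online, can say if it's the right one.
    return base64.b64encode(raw).decode()
```

Every password yields *some* well-formed candidate secret, so there is nothing to test offline — which is the whole point.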
90. aws_secret can’t be checked offline.
They have to ask IAM. Online.
GOOD LUCK DOING THAT 100M TIMES.
91. If there’s one thing you
(Amazon) are going to
keep online, it’s IAM.
92. If we didn’t b64decode the Secret
Key, there’d be a simple offline
attack – post-decrypt, is it Base64?
This is why we aren’t using PyNaCl – we need
encryption without integrity, for maybe the first time ever!
93. SOME NOTES
• One of the largest e-commerce sites in the world provided the required rate for
their password server
• 7/sec
• Yahoo 500M / 7 per sec = 2.26 years
• Who are we building instadump for, anyway?
• Backups can go to an asymmetric key – encrypt online, decrypt offline
• Not just for passwords, this can rate limit any sort of data loss
• Working on this
• Not just rate limits – can apply any policy
• Notification, delay, extra approvals
• What else can we factor out to the cloud functions?
• OpenSSL Engine?
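The Yahoo arithmetic above is easy to check: 500M stored hashes, verified online at the measured 7 checks per second.

```python
records = 500_000_000   # Yahoo-scale credential dump
rate_per_second = 7     # measured password-server rate from the slide
seconds = records / rate_per_second
years = seconds / (365.25 * 24 * 3600)
print(round(years, 2))  # ≈ 2.26 years to exhaust the dump online
```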
94. Many server breaches.
No known Lambda breaches.
No known IAM breaches.
Nice table, is it…actuarial?
95. JBOS IS A DIRTY LIE
(told ya so)
This would be painfully obvious
if we were developing actuarial tables.
The Great Hope of Cyberinsurance is that somebody will.
97. I CANNOT LOSE WHAT I DO NOT HAVE:
LET ME STRIP AT LEAST ALL ONLINE ACCESS
(LIKE GOOGLE CLOUD)
98. LAMBDA HAS ITS OWN RATE LIMITS
I might just want Lambda’s isolation,
and provision servers as in ECS
not hope for the best as in ELB (or IAM?)
99. HEALTH CARE
Lambda isn’t part of Amazon’s HIPAA portfolio
Ratelock’s strongest support has come from
organizations preferring to lose
40 medical records, not 4 million
Partnered with medal.com (also on AWS) to
develop with dedicated resources
100. IF WE CAN TRUST THE CLOUD
WE SHOULD USE MORE OF IT
IF WE CAN’T TRUST THE CLOUD
IF LAMBDA MAKES SACRIFICES
FOR MILLISECONDS
CAN WE FIX THAT
106. Have you ever tried to find
documentation on sandboxing?
Chrome source code
doesn’t count.
#DocBounty
107. WHAT ARE WE TRYING TO
GET FROM A SANDBOX?
A safe place to play,
that starts out clean,
and ends up thrown away.
108. WHAT ARE WE TRYING TO
GET FROM A SANDBOX?
Well defined interfaces.
Known good state.
109. WHAT’S WRONG WITH EC2 THEN?
We still need performance.
60,000-180,000ms to reset to
Known Good State.
(And there’s a lot you can’t do in Lambda.)
(I tried.)
(“I spent a month there one weekend.”)
118. ALL OF CHROME, DOCKER, LINUX, JAVA…
13 SYSCALLS.
• futex ioctl ppoll read recvfrom recvmsg sendto write rt_sigaction
rt_sigreturn readv writev close
• (Yes, shared memory maps and open files are minimal as well.)
• It is much easier to secure 13 syscalls than 98. In fact…
119. ACTUALLY, IT LOOKS LIKE THIS.
(PLUS A BIT OF GOOP TO FURTHER
LOCKDOWN IOCTL.)
IT COULD PROBABLY BE SMALLER.
125. IF YOU’D LIKE TO TRY TO BREAK OUT, HERE’S
HYPERVISOR ROOT (CTRL-F2)
126. WHO WANTS TO HAVE A
PDF PARSING PARTY!
(They’re even more fun than
crypto parties)
127. HOW IS THIS SECURE
HOW IS THIS FAST???
I’m glad you asked!
128. WHAT’S GOING ON?
• VMs have always required less of the host than containers
• Easier to secure kernel-to-kernel than userspace-to-kernel
• VMs require many more syscalls to start up, than to
continue running
• Syscall firewall is thus delayed as long as possible – until
VNC/network/explicit post-boot activation
• Probably the one significant security contribution here
• VMs can be restored from memory, I mean, they actually
can
• Linux does not really allow process freeze/restore
• CRIU tries. Oh, does it try.
• Hibernation does not work on EC2, at any speed
129. BYPASS-SHARED-MEMORY
• Patch from hyper.sh crew
• I was trying to do this myself, but they actually manage a qemu
fork
• When restoring from memory, the big part is system memory. It’s read()
in during restore, not fast
• Better method: Generate memory image incrementally with
mmap/MAP_SHARED, execute new restorations with
mmap/MAP_PRIVATE
• Means 100 instances share the “template state” via Copy on Write
• It’s fine, we block madvise
• (Well, now we do)
• Restores move from 5s to <250ms
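The MAP_SHARED/MAP_PRIVATE trick above can be demonstrated at file level with Python's `mmap` (POSIX only). This illustrates the copy-on-write sharing, not the actual qemu fork's patch: many private views of one "template state" share pages until someone writes.

```python
import mmap
import os

def private_view(path: str) -> mmap.mmap:
    """Map a template-state file copy-on-write.

    Each restoration gets its own MAP_PRIVATE view; physical pages are
    shared between views until a write, and the file itself never changes.
    """
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    view = mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)
    os.close(fd)
    return view
```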
130. I CAN RENT A MACHINE WITH
1TB RAM
COMPUTERS ARE DIFFERENT
NOW
131. NO AUTOCLAVE ON AWS
QEMU software emulation doesn’t
count
No nested virtualization on AWS
No bare metal cloud on AWS
132. WHY NO BARE METAL CLOUD?
10,000 PARTS FLYING IN CLOSE FORMATION
133. WHY NO NESTED VIRTUALIZATION?
• Traditionally pretty slow, even with hardware acceleration (EPT)
• Disney-fication (n): To make a fragmented memory space appear
contiguous for purposes of a guest operating system
• Allows higher densities
• Kills perf (or at least, appears to in $UNNAMED_OTHER_VENDOR)
134. APPROACHES BEING EXPLORED
• User-Mode Linux
• It’s still around, and still works
• Not entirely sure I need Windows support, don’t entirely love KVM SMP
• Works with Ptrace – basically, you’re running an internal kernel inside a
debugger that makes it compatible with a real kernel
• Ptrace is slow
• SECCOMP is not
• We could potentially implement the Ptrace jump in a SECCOMP action
• Fast Nested Virt
• Maybe I can guarantee contiguous memory with a fixed offset
• Maybe I can have my guest VMs share 64 bit address space, and EPT
is only used to guarantee page faults when guests try to muck with
each other
135. FORKALL
• Just how fast can this be?
• Right now – subsecond to spin
up a new VM
• But still doing redundant QEMU
init
• Would fork() but QEMU has threads
and fork() doesn’t actually clone
thread structure
• So we’ll add a syscall or a process
attribute…
• Already faster than container init
in many cases
• Yes. That’s a surgeon with a
fork.
138. MAYBE WE DON’T NEED UNIKERNELS
TO GIVE EVERY INCOMING
CONNECTION A COMPLETELY
FRESH/EPHEMERAL VM
• We like to cheat
• We really like to cheat
139. SECURITY GETS A SYSCALL FIREWALL.
PERFORMANCE GETS INSTANT BOOT.
DEVELOPERS GET FREE REIN AS ROOT.
THIS IS NOT A ZERO SUM GAME!
Developer Ergonomics is the best phrase.
140. LET’S MAKE SECURITY EASY
• Finding an abuse contact was hard. Now you just look for the
tracers amongst the noise. Easy.
• TLS was hard. Now you run a daemon, and it’s just there.
Easy.
• Surviving a breach was hard. Now you design your systems to
lose an amount you can live with. Easy.
• Running dangerous code was…ok, it was always easy. But now
not getting infected by that code is also easy.
141. #MAKESECURITYEASY
NOT JUST A HASHTAG. WE CAN DO THIS.
• HALP
• I can’t write it all!
• https://github.com/dakami
• https://labs.whiteops.com