Wo defensive trickery_13mar2017

Published in: Engineering

  1. 1. A TECHNICAL DIVE INTO DEFENSIVE TRICKERY <iframe src=“Dan Kaminsky”>
  2. 2. I’m Dan Kaminsky Chief Scientist and Co-Founder White Ops • Been fixing things for almost two decades • Broke a big thing • People only remember that
  3. 3. MISSION OF THIS TALK
  4. 4. DEATH TO NIHILISM WITH THE HEALING POWER OF SURPRISING DATA 
  5. 5. You may think things are IMPOSSIBLE
  6. 6. You may think some very specific things are IMPOSSIBLE
  7. 7. I want to challenge your ASSUMPTIONS
  8. 8. THIS KEYNOTE IS DIFFERENT NOT GENERIC THIS IS FOR YOU, $CLOUDVENDOR
  9. 9. No, seriously Amazon. You build the cloud. You build yourselves on your cloud. My company built itself on your cloud.
  10. 10. YES, YOU You build AWS. You write code running on AWS. You guide products using AWS. You document AWS
  11. 11. DOC BOUNTIES Like bug bounties, only for documentation. Including documentation on how not to write bugs.
  12. 12. THE CRITICAL LESSON If you remember one thing, remember this
  13. 13. JBOS Just A Bunch of Servers (“There’s no such thing as a cloud, It’s just other people’s computers.”)
  14. 14. JBOS IS A DIRTY LIE “There’s no such thing as a skyscraper, it’s just another pile of rock.”
  15. 15. JBOS IS A DIRTY LIE TOO MANY BELIEVE LOOKS LIKE REMOTE SERVERS, BUT
  16. 16. Clouds have identities that cross organizational boundaries Clouds have a neutral arbiter Servers are sold. Clouds are operated.
  17. 17. • Denial of Service Attacks: DDoS is hard to remediate • Cryptography: TLS is hard to deploy • Data Loss Prevention: Attacks are hard to survive • Code Safety: Not getting owned is hard SECURITY IS HARD
  18. 18. • Denial of Service Attacks: Think globally, act locally • Cryptography: Servers were hard to deploy too, once • Data Loss Prevention: The cloud makes compromise survivable, I built this here with Lambda • Code Safety: Preventing compromise might not be impossible after all...I would like to build this here too. WHY AMAZON
  19. 19. MAKE SECURITY EASY: WHAT WE’RE DOING ABOUT IT • Denial of Service Attacks: DDoS is hard to remediate Overflowd: Let the victims of network flows, learn from Netflow • Cryptography: TLS is hard to deploy JFE: Launch one Daemon, all networking is TLS secured w/ valid cert • Data Loss Prevention: Attacks are hard to survive Ratelock: Make the cloud enforce security policies, including hard rate limits • Code Safety: Not getting owned is hard Autoclave: Run entire operating systems in tighter sandboxes than Chrome
  20. 20. DENIAL OF SERVICE ATTACKS DDOS IS HARD TO REMEDIATE
  21. 21. SOMEDAY, SYSTEMS WILL NOT GET HACKED • That day is not today. • Mirai vs. Dyn = Parts of the Internet actually went down • No defense survives that many nodes flooding you • When things go wrong, what can we do? • Step 1: Communicate • Step 0: Figure out who we’re supposed to communicate with
  22. 22. (Besides being called monkeys) THE NOCMONKEY CURSE
  23. 23. 01 Spoofed Traffic: Attackers lie about where they are on the network. This will always be possible. 02 Asymmetrically Routed Traffic: Traceroute just shows how to reach your attacker. It doesn’t show how their traffic is reaching you. These are the problematic packets! 03 Bad Contact Data: IP address ranges are large, “Autonomous systems” aren’t, contact data is stale.
  24. 24. ATTACKS ARE USUALLY REMEDIATED, BUT IT’S HARD, SLOW, UNRELIABLE, NOT SCALING
  25. 25. LITERALLY THE OPPOSITE OF WHAT THE NET IS SUPPOSED TO BE CAN WE DO BETTER?
  26. 26. THE TWO GREAT HOPES The Stage Is Set: Attacker networks hit victim networks. • They’re not directly connected – many parties in the middle. Hope 1: Everyone monitors their networks • At least for traffic management and capacity planning • Generally use Netflow – provides source/dest metrics with light protocol analysis Hope 2: Not everyone on the Internet is a jerk • And even if they are, getting abuse calls is annoying, and the big floods are bad for business • Many would act, if the benefit was incremental and the risk was low
  27. 27. NETFLOW USUALLY JUST GOES TO A NETWORK’S OWN OPERATORS, AND MASS AGGREGATORS.
  28. 28. MAYBE JUST A LITTLE SHOULD FLOW TO THE NETWORKS BEING AFFECTED.
  29. 29. IF ATTACKING NETWORKS ALREADY “KNEW”, WHY DO WE HAVE TO CALL THEM?
  30. 30. OVERFLOWD: Stochastic Traffic Factoring Utility 1/1M packets cause anti-abuse metadata to be sent to source and dest, by Netflow infrastructure.
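A minimal sketch of the 1-in-a-million sampling idea, not overflowd's actual code. The function name `on_packet`, the `saddr` field, and the exact set of whitelisted fields are assumptions for illustration; the `FLOWSEEN` / `INFORMATIONAL` values come from the demo slide. Signing and transport are left out.

```python
import random
import time

SAMPLE_RATE = 1 / 1_000_000  # the slide's "1/1M packets"

# Hypothetical whitelist: only fields both endpoints should already know.
WHITELIST = ("saddr", "daddr", "protocol", "pcount", "bcount")

def on_packet(flow, rate=SAMPLE_RATE, rng=random.random, now=time.time):
    """Called once per observed packet.

    Returns a list of (recipient_address, report) pairs when this packet is
    sampled, and an empty list otherwise.
    """
    if rng() >= rate:
        return []
    report = {
        "data": {k: flow[k] for k in WHITELIST},
        "metadata": {"info": "FLOWSEEN", "class": "INFORMATIONAL", "time": now()},
    }
    # The same metadata goes to both ends of the flow.
    return [(flow["saddr"], report), (flow["daddr"], report)]
```

Because the sample is drawn per packet, a flood naturally produces more reports than a trickle, without anyone having to classify the traffic first.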
  31. 31. DEMO 'data': {'bcount': 682512, 'protocol': 6, 'tos': 0, 'etime': 1325314888, 'daddr': '122.166.77.74', 'pcount': 17001… Whitelisted flow metadata, so recipient can match. 'signature': {'key': 'd52b9644ba6ffd2bdaa6505e649fd80ca… 'signature': 'z5yMEHH0pYe++uOiNhWzLkCyXsT… NaCl Signatures, unchained for now. “Oh, somebody’s spoofing? OK, what signature have I been seeing all year, on other networks?” 'metadata': {'info': 'FLOWSEEN', 'class': 'INFORMATIONAL', 'time': 1477778027.138109}} Could also have MACHINE_SUSPICIOUS, HUMAN_SUSPICIOUS, HUMAN_CONFIRMED_PLEASE_CONTACT, etc. 'contact': {'email': 'dan@whiteops.com'}
  32. 32. HOW DO WE REPORT? 65535/udp • The end • Doesn’t require acknowledgement, does need fragmentation ICMP • Would follow packets further along route, maybe • Might get dropped earlier too HTTP/HTTPS • Many networks have an easier time picking up .well-known web paths • Can’t just be passively received TODO
  33. 33. EXPLICIT PLAN We have no idea precisely how this data would be, or should be, consumed • We do know we don’t want to share much more data than a legitimate person should already know • Not sending raw netflow, not sending at high rates • May send faster on known badness – badness and packet count are not equal! We think interesting and useful things would be built in the presence of overflowd
  34. 34. AMAZON TAKEAWAY #1: FLOODS ANNOY YOU TOO NETFLOW SHARING COULD MAKE THEM LESS ANNOYING (OH HAI AMAZON SHIELD)
  35. 35. AMAZON TAKEAWAY #2: INTERNET ARCHITECTURE IS NOT SET IN STONE WE CAN BIAS IT TOWARDS BEING EASIER TO MANAGE
  36. 36. BETTER IS NOT ENOUGH. BETTER DOESN’T EVEN MEAN ANYTHING. SECURITY NEEDS TO BE CHEAPER OPS-SECONDS MATTER
  37. 37. CRYPTOGRAPHY TLS IS HARD TO DEPLOY
  38. 38. CRYPTO IS HARD.
  39. 39. THAT’S JUST ONE SERVICE. HERE’S MORE.
  40. 40. HAS ANYONE EVER NOT SEEN THIS?
  41. 41. WELL, AT LEAST NOBODY’S JUDGING YOU FOR A NOT ENTIRELY PERFECT TLS SUITE…
  42. 42. THOSE ARE SECURE CONFIGURATIONS. HERE’S THE INSECURE ONE.
  43. 43. REALITY (WHEN INDEPENDENT SOFTWARE WAS WRITTEN FOR ISOLATED SERVERS) • TLS required certificate authorities • Certificate authorities required bizdudes • Software vendors couldn’t automate bizdudes • Software vendors couldn’t automate TLS • Software vendors could and did automate listening on standard ports • Just not with security • The TLS mess chains back to the devops non-viability of automatically acquiring certificates
  44. 44. WE LIVE IN THE (NEAR) FUTURE Let’s Encrypt • Free Certificate Authority • Allows Automatic Certificate Provisioning using open ACME protocol Services can in fact autoprovision certificates now! • Caddy • HAProxy • Nginx
  45. 45. SHOULD THEY BE USING AWS CERTIFICATES? (Spoiler alert: Yes.)
  46. 46. APPS STILL NEED TO BE PORTED DEV-SECONDS MATTER TOO
  47. 47. JFE: Jump to Full Encryption
  48. 48. # ./jfe -D https://github.com/dakami/jfe Step 1: Start JFE
  49. 49. # curl http://163.jfe.example hello worl Step 2: Access basic webserver
  50. 50. # curl https://163.jfe.example hello worl Step 3: Access webserver w/ TLS
  51. 51. # curl https://163.jfe.example:40080 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> Step 4: Access anything w/ TLS
  52. 52. ONE SERVICE IS LAUNCHED. ALL SERVICES SUPPORT TLS. ALL OF THE CRYPTO NONE OF THE DRAMA
  53. 53. HOW THIS IS WORKING NOW • Grab all traffic from port 23 through 65K, send it to port 1 • Allow listener on Port 1 to receive traffic from other IPs and Ports • Sniff the first 128 bytes on the socket, without actually “draining” from it • In TLS, client speaks first. If demands crypto, can provide. • Do things (like get a new cert) during initial handshaking • Get cert from Let’s Encrypt (with a little help) • Mechanisms: iptables TPROXY, setsockopt IP_TRANSPARENT, MSG_PEEK, set_servername_callback in Python SSL, certbotClient.issue_certificate in free_tls_certificates
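The "sniff without draining" step can be sketched with `MSG_PEEK`. This is an illustration of that one mechanism, not JFE's code: the helper name `looks_like_tls` is an assumption, and the check only looks at the TLS record header (content type 0x16 for a handshake, then major version 0x03). The TPROXY redirection, the certificate fetch and the handshake itself are out of scope here.

```python
import socket

def looks_like_tls(conn, peek_len=128):
    """Peek at the first bytes of a connection without consuming them.

    Returns True when the bytes look like the start of a TLS handshake
    record, so the caller can decide to wrap the socket in TLS. Whatever
    the answer, the bytes stay queued for the next reader.
    """
    head = conn.recv(peek_len, socket.MSG_PEEK)
    return len(head) >= 2 and head[0] == 0x16 and head[1] == 0x03
```

Since the client speaks first in TLS, the listener can make this decision before sending a single byte, and a plaintext client gets passed through untouched.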
  54. 54. OK, LINUX MAKES THIS A LOT OF DRAMA LINUX DOES NOT LIKE INTERCEPTING SOCKETS JFE DON’T CARE
  55. 55. PROBLEMS WITH JFE • Low Performance • Very few languages support all the operational dependencies (setsockopt and MSG_PEEK and cert acquisition and in-handshake replacement) • Only Python did, and only in a particularly slow threading mode • Localhost • Connections appear to come from localhost (not great) • Connections are routed to localhost (actually bad, things that TCP bind to 127.0.0.1 are exposed to the Internet) • Security blocking!
  56. 56. FIXING JFE WITH KERNEL SURGERY • IPTables TPROXY is janky and clearly nobody else has fixed this either • Squid, HAProxy, various SSL MITM attack tools (lol) all get stuck here, try to just be an intercepting proxy to another host downwire • NFTables clearly the approach to take • New firewalling subsystem in Linux • Could gate packet redirection with IP Address Aliases (eth0:1) • Could gate packet redirection with cgroups (as per containers)
  57. 57. COMPUTER SCIENCE: BULLDOZING YOUR PROBLEMS ELSEWHERE
  58. 58. WE’RE GOING TO NEED A BIGGER BULLDOZER
  59. 59. HOW ELSE COULD JFE WORK • Docker Containers • Theoretically have Network Plugins • “VPN”/VPC modes could intercept and upgrade • They’re already doing crazy kernel surgery • With mixed results • (ECS) • Virtual Machines • Already intercepting packets (or in a position to choose to) • Encryption/Decryption breaks zero copy by definition
  60. 60. HOW ELSE COULD JFE WORK (Amazon Edition) • EC2 Hypervisor • We know it’s QEMU-XEN • Has keys at 169.254.169.254 • MAGIC REST ENDPOINT WITH GREAT THINGS NOBODY KNOWS ABOUT • It can sign things • It can’t leak keys • JFE has a real problem with knowing which domains to request certs for • Zero config == attacker tells you what to request == “please give me cert for google.com” • Wouldn’t matter, but rate limits at LE are harsh and non-negotiable • It’s much nicer to be able to pay someone for service
  61. 61. YOU KNOW MY DOMAINS. WE USE ROUTE 53. YOU CAN SECURE MY DOMAINS. YOU HAVE A CA. YOU SEE MY PACKETS. YOU CAN FIX THEM. ALL THE CRYPTO EVEN LESS DRAMA though I said it was none
  62. 62. WE DON’T EVEN NEED TO OVERLOAD THE HYPERVISOR (WE MIGHT WANT TO)
  63. 63. SOME NOTES • With ELB, server wouldn’t be able to easily differentiate encrypted from unencrypted link • Can’t opportunistically secure clients like server • Attacker: “Aw shucks, that TCP endpoint doesn’t support TLS. Better go plaintext”. • Could require TLS for all outbound connections, though. • Not constrained to TCP – DTLS exists • Don’t need Hypervisor/ELB for Route 53 integration • Upcoming release of JFE will get zones via libcloud (still config) • This is the path for DNSSEC/DANE • The hard part is pushing key material back into DNS • Only hard in JBoS, much easier in an integrated cloud
  64. 64. USEFUL TO WRAP TLS WITH TLS ALWAYS SCORE PERFECT, RDP WOULD FINALLY WORK
  65. 65. TCP IS NOT HARD TO DEPLOY. WHY SHOULD TLS BE?
  66. 66. DATA LOSS PREVENTION ATTACKS ARE HARD TO SURVIVE
  67. 67. RISK MANAGEMENT IS NOT ALL OR NOTHING • There’s $20 in the Gas Station Cash Register • Not all corporate payroll for the month of July • But we assume if they can get any of our data, they probably got all of our data • Why?
  68. 68. THEY PROBABLY GOT ALL OF OUR DATA
  69. 69. OUR DESIGNS ARE OFTEN “ALL OR NOTHING” AFFAIRS • Classical JBOS (Just a Bunch Of Servers) design • Shared credentials • Complex services • Full mutual trust – root on one is root on all • Rate limits for a database would be useless in the event of a hack • If you can steal some data… • …you can disable the rate limits… • …and steal all the data. • This is why you’re supposed to salt and stretch stored password hashes • “After your data is lost, make it hard for an attacker to convert it back to passwords”
  70. 70. WHAT IS THIS “AFTER”?
  71. 71. SURVIVABILITY > NIHILISM SPLIT COMPLEX PARTS YOU CAN LOSE SIMPLE PARTS YOU CAN SAVE
  72. 72. RATELOCK: https://github.com/dakami/ratelock Restricting Data Loss with Serverless Cloud Enforcement
  73. 73. AWS IS NOT JBOS. Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem Somebody else’s problem It provides services with authenticated semantics.
  74. 74. HOW RATELOCK WORKS 1) Proxy access to data via Lambda function 2) Store data (possibly encrypted) in DynamoDB 3) Provide client enough rights to access function but not enough to modify or bypass 4) Implement arbitrary policy in Lambda, isolated by Amazon
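A toy model of the policy in step 4, with everything in memory. This is not ratelock's code: `RateLockedStore` is a hypothetical name, the dict stands in for DynamoDB, and the class boundary stands in for the isolation Amazon provides around the Lambda function. The caller can only reach `add` and `check`, never the counter.

```python
import hmac
import time

class RateLockedStore:
    """In-memory stand-in for the Lambda proxy plus its DynamoDB table."""

    def __init__(self, max_checks, per_seconds, clock=time.monotonic):
        self._secrets = {}   # stored in the clear only to keep the sketch short
        self._max = max_checks
        self._window = per_seconds
        self._clock = clock
        self._recent = []    # timestamps of checks inside the current window

    def add(self, user, password):
        self._secrets[user] = password
        return True

    def check(self, user, password):
        now = self._clock()
        # Forget checks that have aged out of the sliding window.
        self._recent = [t for t in self._recent if now - t < self._window]
        if len(self._recent) >= self._max:
            return False     # over budget: answer false regardless of the truth
        self._recent.append(now)
        stored = self._secrets.get(user, "")
        return hmac.compare_digest(stored.encode(), password.encode())
```

The point of the design is where the counter lives: if the limit sat on the complex server, whoever owned that server could switch it off.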
  75. 75. LAMBDA IS NOT TPM Actual human beings can deploy code on it
  76. 76. LAMBDA IS NOT SECURE ENCLAVES Actual human beings can deploy code on it It is not (as) obsessed with hiding what it’s doing
  77. 77. ./ratelock.py add foo bar true (Password stored in DynamoDB, proxied through Lambda)
  78. 78. ./ratelock.py check foo bar true ./ratelock.py check foo wrong false • Both checks against DynamoDB, proxied. • Lambda “invoke” right against function “ratelock” only thing required.
  79. 79. # while [ 1 ]; do ./ratelock.py check foo bar; sleep 0.25; done true ... true ... true ... true ... false… false ... false • The proxy starts providing false errors. The caller doesn’t have the ability to directly bypass the proxy. • (Yes, vulnerable to timing – can differentiate fake from real false). • The complex server can get completely compromised. The simple policy survives.
  80. 80. SURVIVABILITY IS COOL
  81. 81. “What if you can’t trust Lambda?”
  82. 82. HERE’S A STRING AMAZON WILL VERIFY, BUT NEVER LEAK, EVEN TO YOU. USEFUL
  83. 83. $ ./walliam.py add demouser 1234567 $ cat authdb.json {"demouser":"BvL40myloWAo39h bIpRpKOy4Skdtswcaa7WJUzWf"} We actually create an IAM user “demouser” under a special path. We just create the user, we don’t grant privileges. But we do get a secret key…which that isn’t.
  84. 84. add_user aes = (CTR, sha256(userpw)) raw = b64decode(aws_secret) enc = aes.encrypt(raw) saved_pw = b64encode(enc) The secret key is first base64 decoded, and then encrypted with the user’s password. We save that. Why decode?
  85. 85. check_user enc = b64decode(saved_pw) aes = (CTR, sha256(userpw)) raw = aes.decrypt(enc) aws_secret = b64encode(raw) To invert the process, we decrypt the saved value with what is supposed to be the user’s password, and base64 encode.
  86. 86. aws_secret can’t be checked offline. They have to ask IAM. Online. GOOD LUCK DOING THAT 100M TIMES.
  87. 87. If there’s one thing you (Amazon) are going to keep online, it’s IAM.
  88. 88. If we didn’t b64decode the Secret Key, there’d be a simple offline attack – post-decrypt, is it Base64? This is why we aren’t using PyNaCl – we need encryption without integrity, for maybe the first time ever!
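The add_user / check_user pseudocode can be made runnable roughly as follows. One loud assumption: to stay inside the standard library, this sketch swaps AES-CTR for a SHA-256 counter keystream XOR (`_keystream_xor`, a hypothetical helper). Like CTR mode it is encryption without integrity, which is the property the slides ask for, but it is a stand-in and not the cipher the talk uses. The final online check against IAM is not shown.

```python
import base64
import hashlib

def _keystream_xor(key, data):
    """Unauthenticated stream cipher: XOR data with SHA-256(key || counter)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

def add_user(userpw, aws_secret):
    """Decode the secret to raw bytes first, then encrypt under the password."""
    key = hashlib.sha256(userpw.encode()).digest()
    raw = base64.b64decode(aws_secret)
    return base64.b64encode(_keystream_xor(key, raw)).decode()

def check_user(userpw, saved_pw):
    """Return the candidate secret for this password guess.

    Every guess yields well-formed base64, so nothing can be ruled out
    offline; the candidate still has to be presented to IAM.
    """
    key = hashlib.sha256(userpw.encode()).digest()
    enc = base64.b64decode(saved_pw)
    return base64.b64encode(_keystream_xor(key, enc)).decode()
```

A wrong password produces a different but equally plausible 40-character string, which is why the base64 layer has to come off before encryption.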
  89. 89. SOME NOTES • One of the largest e-commerce sites in the world provided required rates for their password server • 7/sec • Yahoo 500M / 7 per sec = 2.26 years • Who are we building instadump for, anyway? • Backups can go to an asymmetric key – encrypt online, decrypt offline • Not just for passwords, this can rate limit any sort of data loss • Working on this • Not just for rate limiting, can apply any policy • Notification, delay, extra approvals • What else can we factor out to the cloud functions? • OpenSSL Engine?
  90. 90. Many server breaches. No known Lambda breaches. No known IAM breaches. Nice table, is it…actuarial?
  91. 91. JBOS IS A DIRTY LIE (told ya so) This would be painfully obvious if we were developing actuarial tables. The Great Hope of Cyberinsurance is that somebody will.
  92. 92. AMAZON WISHLIST This was already built on Amazon tech I can ask for more 
  93. 93. I CANNOT LOSE WHAT I DO NOT HAVE: LET ME STRIP AT LEAST ALL ONLINE ACCESS (LIKE GOOGLE CLOUD)
  94. 94. LAMBDA HAS ITS OWN RATE LIMITS I might just want Lambda’s isolation, and provision servers as in ECS not hope for the best as in ELB (or IAM?)
  95. 95. HEALTH CARE Lambda isn’t part of Amazon’s HIPAA portfolio Ratelock’s strongest support has come from organizations preferring to lose 40 medical records, not 4 million Partnered with medal.com (also on AWS) to develop with dedicated resources
  96. 96. IF WE CAN TRUST THE CLOUD WE SHOULD USE MORE OF IT IF WE CAN’T TRUST THE CLOUD IF LAMBDA MAKES SACRIFICES FOR MILLISECONDS CAN WE FIX THAT
  97. 97. CODE SAFETY NOT GETTING OWNED IS HARD.
  98. 98. “If only users would stop running dangerous code.”
  99. 99. THIS PDF MUST BE READ BY SOMEBODY. THAT IS THEIR JOB.
  100. 100. STOP VICTIM SHAMING. It’s not helping.
  101. 101. “Why isn’t everything run in a sandbox? Or at least AV?”
  102. 102. Have you ever tried to find documentation on sandboxing? Chrome Source Code doesn’t count. #DocBounty
  103. 103. WHAT ARE WE TRYING TO GET FROM A SANDBOX? A safe place to play, that starts out clean, and ends up thrown away.
  104. 104. WHAT ARE WE TRYING TO GET FROM A SANDBOX? Well defined interfaces. Known good state.
  105. 105. WHAT’S WRONG WITH EC2 THEN? We still need performance. 60,000-180,000ms to reset to Known Good State. (And there’s a lot you can’t do in Lambda.) (I tried.) (“I spent a month there one weekend.”)
  106. 106. WHAT ABOUT CONTAINERS? What about Docker?
  107. 107. docker run -it --privileged -p80:80 dakami/guachrome
  108. 108. GREAT FOR DEVELOPERS Security? Is it easy?
  109. 109. THERE’S JUST A LOT THAT CONTAINERS NEED TO SECURE: accept access arch_prctl bind brk capset chdir chmod clone close connect creat dup epoll_create epoll_ctl epoll_wait execve exit exit_group fchmod fchown fcntl fdatasync fstat ftruncate futex getcwd getdents getegid geteuid getgid getpeername getpid getpriority getrlimit getsockname getsockopt gettid getuid ioctl kill listen lseek lstat madvise mkdir mmap mount mprotect mremap munmap nanosleep newfstatat open openat pipe poll ppoll prctl pread pwrite read readlink recvfrom recvmsg rename rt_sigaction rt_sigprocmask sched_getaffinity sched_setscheduler sched_yield select sendmsg sendto setfsgid setfsuid setitimer setpriority setrlimit set_robust_list setsockopt shmat shmctl shmget shutdown signaldeliver sigreturn socket socketpair stat statfs times umask uname unlink wait4 write writev That chrome instance needs 98 syscalls from the host.
  110. 110. 1. WHY IT’S 122 PAGES 2. HOW IT’S NOT EASY (FOR ANYONE)
  111. 111. So, zero sum game, then? Security is hard Vulnerability is easy Let’s all go to the pub
  112. 112. Let’s
  113. 113. SAME CODE, HOSTED SLIGHTLY DIFFERENTLY…
  114. 114. ALL OF CHROME, DOCKER, LINUX, JAVA… 13 SYSCALLS. • futex ioctl ppoll read recvfrom recvmsg sendto write rt_sigaction rt_sigreturn readv writev close • (Yes, shared memory maps and open files are minimal as well.) • It is much easier to secure 13 syscalls than 98. In fact…
  115. 115. ACTUALLY, IT LOOKS LIKE THIS. (PLUS A BIT OF GOOP TO FURTHER LOCKDOWN IOCTL.) IT COULD PROBABLY BE SMALLER.
  116. 116. AUTOCLAVE: • https://github.com/dakami/autoclave • WARNING: Lots of stuff hasn’t been pushed to master. I prioritized the code other people helped with, and I’d do it again. Syscall firewalls for vm isolation
  117. 117. LIVE DEMO? SURE, GO TO https://autoclave.run
  118. 118. YOU’LL SEE:
  119. 119. LINUX AND WINDOWS RUNNING FINE UNDER EXTREME SYSCALL FIREWALLS.
  120. 120. FULLY EPHEMERAL, FULLY REPEATABLE. (SLIGHTLY WIDER RULESET THAN JUST DESCRIBED)
  121. 121. IF YOU’D LIKE TO TRY TO BREAK OUT, HERE’S HYPERVISOR ROOT (CTRL-F2)
  122. 122. WHO WANTS TO HAVE A PDF PARSING PARTY! (They’re even more fun than crypto parties)
  123. 123. HOW IS THIS SECURE HOW IS THIS FAST??? I’m glad you asked!
  124. 124. WHAT’S GOING ON? • VMs have always required less of the host than containers • Easier to secure kernel-to-kernel than userspace-to-kernel • VMs require many more syscalls to start up, than to continue running • Syscall firewall is thus delayed as long as possible – until VNC/network/explicit post-boot activation • Probably the one significant security contribution here • VMs can be restored from memory, I mean, they actually can • Linux does not really allow process freeze/restore • CRIU tries. Oh, does it try. • Hibernation does not work on EC2, at any speed
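The delayed-activation idea can be modelled in a few lines. This is a pure-Python model of the policy logic only, not a real seccomp filter and not Autoclave's code; `DelayedSyscallFirewall` is a hypothetical name, and the allowlist is the 13-syscall set quoted earlier in the talk.

```python
ALLOWED_AFTER_BOOT = frozenset({
    "futex", "ioctl", "ppoll", "read", "recvfrom", "recvmsg", "sendto",
    "write", "rt_sigaction", "rt_sigreturn", "readv", "writev", "close",
})

class DelayedSyscallFirewall:
    """Permit everything during startup, then only a small allowlist."""

    def __init__(self, allowed=ALLOWED_AFTER_BOOT):
        self._allowed = allowed
        self._active = False

    def activate(self):
        # One-way switch, mirroring the fact that an installed seccomp
        # filter cannot be removed again.
        self._active = True

    def permits(self, syscall):
        return (not self._active) or syscall in self._allowed
```

In the talk's design, the switch is thrown on the first VNC or network activity or on an explicit post-boot trigger, after the expensive startup syscalls are done.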
  125. 125. BYPASS-SHARED-MEMORY • Patch from hyper.sh crew • I was trying to do this myself, but they actually manage a qemu fork • When restoring from memory, the big part is system memory. It’s read() in during restore, not fast • Better method: Generate memory image incrementally with mmap/MAP_SHARED, execute new restorations with mmap/MAP_PRIVATE • Means 100 instances share the “template state” via Copy on Write • It’s fine, we block madvise • (Well, now we do) • Restores move from 5s to <250ms
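The MAP_SHARED / MAP_PRIVATE split can be tried from Python on Linux. A sketch under stated assumptions: the function names and the idea of a plain file as the "template state" are illustrative, and this shows only the copy-on-write mapping, not the QEMU patch.

```python
import mmap
import os

def build_template(path, content):
    """Write the template image through a MAP_SHARED mapping.

    Writes to a shared mapping are carried through to the file, so the
    image is built up in place.
    """
    with open(path, "w+b") as f:
        f.truncate(len(content))
        with mmap.mmap(f.fileno(), len(content), flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:] = content
            m.flush()

def restore_instance(path):
    """Map the template copy-on-write.

    Writes through a MAP_PRIVATE mapping stay private to that mapping;
    untouched pages keep being shared with every other instance.
    """
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        return mmap.mmap(f.fileno(), size, flags=mmap.MAP_PRIVATE,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
```

Restoring is then a matter of mapping, not reading, which is why many instances can start from one template without each paying for a full copy.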
  126. 126. I CAN RENT A MACHINE WITH 1TB RAM COMPUTERS ARE DIFFERENT NOW
  127. 127. NO AUTOCLAVE ON AWS QEMU software emulation doesn’t count No nested virtualization on AWS No bare metal cloud on AWS
  128. 128. WHY NO BARE METAL CLOUD? 10,000 PARTS FLYING IN CLOSE FORMATION
  129. 129. WHY NO NESTED VIRTUALIZATION? • Traditionally pretty slow, even with hardware acceleration (EPT) • Disney-fication (n): To make a fragmented memory space appear contiguous for purposes of a guest operating system • Allows higher densities • Kills perf (or at least, appears to in $UNNAMED_OTHER_VENDOR)
  130. 130. APPROACHES BEING EXPLORED • User-Mode Linux • It’s still around, and still works • Not entirely sure I need Windows support, don’t entirely love KVM SMP • Works with Ptrace – basically, you’re running an internal kernel inside a debugger that makes it compatible with a real kernel • Ptrace is slow • SECCOMP is not • We could potentially implement the Ptrace jump in a SECCOMP action • Fast Nested Virt • Maybe I can guarantee contiguous memory with a fixed offset • Maybe I can have my guest VMs share 64 bit address space, and EPT is only used to guarantee page faults when guests try to muck with each other
  131. 131. FORKALL • Just how fast can this be? • Right now – subsecond to spin up a new VM • But still doing redundant QEMU init • Would fork() but QEMU has threads and fork() doesn’t actually clone thread structure • So we’ll add a syscall or a process attribute… • Already faster than container init in many cases • Yes. That’s a surgeon with a fork.
  132. 132. WELL DEFINED INTERFACES KNOWN GOOD STATE
  133. 133. MAYBE WE DON’T NEED UNIKERNELS TO GIVE EVERY INCOMING CONNECTION A COMPLETELY FRESH/EPHEMERAL VM • We like to cheat • We like we like to cheat
  134. 134. SECURITY GETS A SYSCALL FIREWALL. PERFORMANCE GETS INSTANT BOOT. DEVELOPERS GET FREE REIN AS ROOT. THIS IS NOT A ZERO SUM GAME! Developer Ergonomics is the best phrase.
  135. 135. LET’S MAKE SECURITY EASY • Finding an abuse contact was hard. Now you just look for the tracers amongst the noise. Easy. • TLS was hard. Now you run a daemon, and it’s just there. Easy. • Surviving a breach was hard. Now you design your systems to lose an amount you can live with. Easy. • Running dangerous code was…ok, it was always easy. But now not getting infected by that code is also easy.
  136. 136. #MAKESECURITYEASY NOT JUST A HASHTAG. WE CAN DO THIS. • HALP • I can’t write it all! • https://github.com/dakami • https://labs.whiteops.com
  137. 137. Thank You