
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent 2017


Netflix is big and dynamic. At Netflix, IP addresses mean nothing in the cloud. This is a big challenge with Amazon VPC Flow Logs. VPC Flow Log entries only present network-level information (L3 and L4), which is virtually meaningless. Our goal is to map each IP address back to an application, at scale, to derive true network-level insight within Amazon VPC. In this session, the Cloud Network Engineering team discusses the temporal nature of IP address utilization in AWS and the problem with looking at OSI Layer 3 and Layer 4 information in the cloud.


  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent. A Day in the Life of a Cloud Network Engineer at Netflix. Donavan Fritz: Sr. Cloud Network SRE. Joel Kodama: Sr. Cloud Network SRE. NET303
  2. 2. 109,000,000 Global Subscribers
  3. 3. 1,000,000+ Requests Per Second
  4. 4. 150,000+ EC2 Instances
  5. 5. 75+ Accounts
  6. 6. 4 AWS Regions
  7. 7. AWS Infrastructure
  8. 8. AWS EC2-Classic
  9. 9. AWS EC2-Classic: EC2 instances from Account A, Account B, and Account C in 10.0.0.0/8
  10. 10. VPC: EC2 instances in Public and Private subnets, VPC NAT Gateway, Internet
  11. 11. EC2 instances connected by VPC peering
  12. 12. Internet MPLS Backbone
  13. 13. Globally Unique IP Space is Good 100.64.0.0/10
  14. 14. IP Management is Hard
  15. 15. Infrastructure Insight
  16. 16. Infrastructure Insight
  17. 17. DNS
  18. 18. api.netflix.com api.netflix.com www.netflix.com
  19. 19. DNS Insight Availability and Performance
  20. 20. Network Insight
  21. 21. “Hi there, can someone help me resolve a network connectivity issue between one microservice to another?” - Sr. Platform Engineer
      “Does anyone know if there are any network weather events in us-east-1? We’ve seen a couple hosts run into network partitions.” - Sr. Database Engineer
      “I'm thinking this might be due to networking unpleasantness...” - Sr. Edge Engineer
      “I am seeing what seem to be network related errors on start-up.” - Stunning Colleague #1
  22. 22. VPC Flow Logs: Really Good, Meaningless Data.
  23. 23. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
  24. 24. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
  25. 25. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK 172.31.16.139 172.31.16.21
  26. 26. Network Segmentation
  27. 27. EC2 instance Foo EC2 instance Foo Auto Scaling group EC2 instance Bar EC2 instance Bar Auto Scaling group EC2 instance Baz EC2 instance Baz Auto Scaling group Classic Load Balancer Lambda Function RDS DB instance Application Load Balancer ElastiCache Redis Instance
  28. 28. 172.31.16.139 172.31.16.21 Foo Foo Auto Scaling group 172.31.16.54 Bar 172.31.16.248 Bar Auto Scaling group 172.31.61.95 Baz 172.16.31.10 Baz Auto Scaling group 172.31.16.22 172.31.16.19 172.31.16.60 172.31.16.133 172.31.16.231
  29. 29. EC2 Instance EC2 Instance Foo Foo Auto Scaling group Amazon SQS
  30. 30. 172.31.16.139 172.31.16.21 Foo Foo Auto Scaling group 72.21.207.173
  31. 31. What app has these IPs?
  32. 32. IP Address: 172.16.100.100 over time (t0 to tn): t1 EC2 Instance, t2 Lambda Function, t3 EC2 Instance
  33. 33. What app had these IPs, at this time?
  34. 34. EC2 instance Foo EC2 instance Foo Auto Scaling group 172.31.0.0/16 EC2 instance Bar EC2 instance Bar Auto Scaling group 172.31.0.0/16
  35. 35. What app had these IPs, at this time, in this routing domain?
  36. 36. VPC Flow Logs Challenges: IP Addresses Mean Nothing
  37. 37. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK 172.31.16.139 172.31.16.21
  38. 38. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/20641, 172.31.16.21 TCP/22. Challenges: IP Addresses Mean Nothing, Stateless
  39. 39. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211, 172.31.16.21 TCP/8008: ???
  40. 40. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211, 172.31.16.21 TCP/8008: memcached
  41. 41. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211, 172.31.16.21 TCP/8008; Classic Load Balancer, EC2 Instance; memcached
  42. 42. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211, 172.31.16.21 TCP/8008; Classic Load Balancer, EC2 Instance; memcached
  43. 43. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211, 172.31.16.21 TCP/8008; Classic Load Balancer, EC2 Instance; HTTP
  44. 44. 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 11211 8008 6 20 4249 1418530010 1418530070 ACCEPT OK. 172.31.16.139 TCP/11211, 172.31.16.21 TCP/8008; Classic Load Balancer, EC2 Instance; HTTP
  45. 45. VPC Flow Logs Challenges: IP Addresses Mean Nothing, Stateless, Fragmented
  46. 46. instance, instance: 1 TCP Connection
  47. 47. instance, instance: 1 TCP Connection, 4 VPC Flow Log Records
  48. 48. instance, instance (each with an elastic network interface): 1 TCP Connection, 4 VPC Flow Log Records
  49. 49. instance, VPC NAT Gateway, Amazon SQS: 1 TCP Connection, 6 VPC Flow Log Records
  50. 50. instance, VPC NAT Gateway, Classic Load Balancer, instance: 2 TCP Connections, 12 VPC Flow Log Records
  51. 51. instance, instance: What I care about
  52. 52. VPC Flow Logs Challenges: IP Addresses Mean Nothing, Stateless, Fragmented, We have a lot of Flow Logs
  53. 53. 1,000,000+ Requests Per Second, 4 AWS Regions, 75+ Accounts, 150,000+ EC2 Instances
  54. 54. 10,000,000+ Flow Log Records Every Second
  55. 55. VPC Flow Logs Solutions: IP Addresses Mean Nothing, Stateless, Fragmented, We have a lot of Flow Logs
  56. 56. What app had these IPs, at this time, in this routing domain?
  57. 57. f(domain, ip, time) = app
  58. 58. Sonar
  59. 59. Sonar: Extract, Transform, Load
      AWS APIs / Logs, Netflix APIs / Logs, CloudWatch Events, DNS, Crawling, Polling, Event Processing, Netflix Events
      IP Change Events:
        t1: ip(172.31.2.2) + eni-123
        t2: ip(172.31.2.2) + i-abcdef
        t3: ip(172.31.2.2) + titus cabc
        t10: ip(172.31.2.2) - titus cabc
        t11: ip(172.31.2.2) - eni-123
        t12: ip(172.31.2.2) - i-abcdef
        ...
        t20: ip(1.1.1.1) + AWS SNS
        t21: ip(2.2.2.2) + AWS SQS
        t30: ip(2.2.2.2) - AWS SQS
        t31: ip(1.1.1.1) - AWS SNS
        ...
  60. 60. VPC Flow Logs Solutions: IP Addresses Mean Nothing, Stateless, Fragmented, We have a lot of Flow Logs
  61. 61. TCP/80, TCP/443, TCP/8080, TCP/8443, ...; SSM Agent; EC2 Instances
  62. 62. VPC Flow Logs Solutions: IP Addresses Mean Nothing, Stateless, Fragmented, We have a lot of Flow Logs
  63. 63. Known Deficiency
  64. 64. VPC Flow Logs Solutions: IP Addresses Mean Nothing, Stateless, Fragmented, We have a lot of Flow Logs
  65. 65. Dredge
  66. 66. Dredge: Amazon VPC Flow Logs (via Kinesis), IP Change Events (Sonar), Stream Joins, Netflix Data Pipeline
  67. 67. VPC Flow Logs (via Amazon Kinesis)
  68. 68. Stream Joins: 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
  69. 69. Stream Joins: f(domain, ip, time) = app, using the Routing Domain, IPv4 Addresses, and Timestamp of 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
  70. 70. Stream Joins: f(0, 172.31.16.139, 1418530010) = foo; f(0, 172.31.16.21, 1418530010) = bar
  71. 71. Stream Joins: f(0, 172.31.16.139, 1418530010) = foo; f(0, 172.31.16.21, 1418530010) = bar; 172.31.16.139:20641 = Not Listening = Outbound Request
  72. 72. Transform: { srcIP: '172.31.16.139', dstIP: '172.31.16.21', srcPort: 20641, dstPort: 22, packets: 20, bytes: 4249, startTs: 1418530010, endTs: 1418530070, action: 'ACCEPT', srcApp: 'foo', dstApp: 'bar', state: 'Outbound Request', … }
  73. 73. Load (Netflix Data Pipeline): { srcIP: '172.31.16.139', dstIP: '172.31.16.21', srcPort: 20641, dstPort: 22, packets: 20, bytes: 4249, startTs: 1418530010, endTs: 1418530070, action: 'ACCEPT', srcApp: 'foo', dstApp: 'bar', state: 'Outbound Request', … }
  74. 74. Results
  75. 75. Slack App
  76. 76. $ netstat -tpna
      Proto Recv-Q Send-Q Local Address         Foreign Address        State
      tcp        0      0 100.81.51.147:35024   100.81.3.96:8080       TIME_WAIT   -
      tcp        0      0 100.81.51.147:27945   100.76.196.35:12345    ESTABLISHED -
      tcp        0      0 100.81.51.147:40881   100.76.155.222:12345   ESTABLISHED -
      tcp        0      0 100.81.51.147:58127   100.81.77.56:8080      ESTABLISHED -
      tcp        0      0 100.81.51.147:57581   100.76.157.241:12345   ESTABLISHED -
      tcp        0      0 100.81.51.147:8080    100.81.213.243:47269   ESTABLISHED -
      tcp        0      0 100.81.51.147:42184   100.81.50.229:8080     ESTABLISHED -
      tcp        0      0 100.81.51.147:37429   100.81.57.18:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:38336   100.81.75.198:8080     ESTABLISHED -
      tcp        0      0 100.81.51.147:21432   100.81.90.93:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:22824   100.76.228.39:12345    ESTABLISHED -
      tcp        1      0 100.81.51.147:13514   100.81.107.125:8080    CLOSE_WAIT  -
      tcp        0      0 100.81.51.147:63556   100.81.160.56:8080     ESTABLISHED -
      tcp        0      0 100.81.51.147:21591   100.81.19.52:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:41689   100.81.59.253:8081     ESTABLISHED -
      tcp        0      0 100.81.51.147:8080    100.81.37.100:31639    FIN_WAIT2   -
      tcp       54      0 100.81.51.147:52883   52.218.128.113:443     ESTABLISHED -
      tcp        0      0 100.81.51.147:27556   100.76.198.44:12345    ESTABLISHED -
      tcp        1      0 100.81.51.147:25435   100.81.79.120:8080     CLOSE_WAIT  -
      tcp        1     54 100.81.51.147:14703   52.218.128.121:443     ESTABLISHED -
      tcp        1      0 100.81.51.147:53777   100.81.107.125:8080    CLOSE_WAIT  -
      tcp        0      0 100.81.51.147:38366   100.76.157.217:12345   ESTABLISHED -
      tcp        1      0 100.81.51.147:62763   100.81.107.125:8080    ESTABLISHED -
      tcp        0      0 100.81.51.147:55510   100.81.22.63:8080      TIME_WAIT   -
      tcp        0      0 100.81.51.147:8080    100.81.234.159:27884   ESTABLISHED -
  77. 77.
      +-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
      | Direction | ForeignKind | ExtraInfo                  | Account   | Region    | State       | Qty |
      +-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
      | inbound   | Instance    | asg: bastion-v078          | 111111111 | us-west-1 | ESTABLISHED | 1   |
      | outbound  | Instance    | asg: ledo-v004             | 222222222 | us-east-1 | ESTABLISHED | 80  |
      | outbound  | Instance    | asg: ledo-v003             | 222222222 | us-east-1 | ESTABLISHED | 80  |
      | outbound  | AwsService  | dynamodb                   |           | us-east-1 | ESTABLISHED | 19  |
      | outbound  | AwsService  | kinesis                    |           | us-east-1 | ESTABLISHED | 14  |
      | outbound  | Instance    | asg: brigo-us-east-1e-v011 | 333333333 | us-east-1 | ESTABLISHED | 8   |
      | outbound  | Instance    | asg: brigo-us-east-1d-v011 | 333333333 | us-east-1 | ESTABLISHED | 8   |
      | outbound  | Instance    | asg: brigo-us-east-1c-v012 | 333333333 | us-east-1 | ESTABLISHED | 8   |
      | outbound  | Instance    | asg: berberb-v012          | 333333333 | us-east-1 | ESTABLISHED | 3   |
      | outbound  | Instance    | asg: pikango-v003          | 444444444 | us-east-1 | ESTABLISHED | 2   |
      | outbound  | Instance    | asg: endai-v003            | 555555555 | us-east-1 | ESTABLISHED | 1   |
      | outbound  | Instance    | asg: kotts-v111            | 444444444 | us-east-1 | ESTABLISHED | 1   |
      | outbound  | Instance    | asg: akrah-v000            | 333333333 | us-east-1 | ESTABLISHED | 1   |
      | outbound  | Instance    | asg: barta-v095            | 333333333 | us-east-1 | ESTABLISHED | 1   |
      | outbound  | Instance    | asg: padok-v061            | 333333333 | us-east-1 | ESTABLISHED | 1   |
      | outbound  | AwsService  | kinesis                    |           | us-east-1 | TIME_WAIT   | 3   |
      | outbound  | Instance    | asg: ledo-v004             | 222222222 | us-east-1 | TIME_WAIT   | 2   |
      | outbound  | Instance    | asg: ledo-v003             | 222222222 | us-east-1 | TIME_WAIT   | 1   |
      | outbound  | Instance    | asg: berberb-v012          | 333333333 | us-east-1 | TIME_WAIT   | 1   |
      +-----------+-------------+----------------------------+-----------+-----------+-------------+-----+
  78. 78. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Donavan Fritz: Sr. Cloud Network SRE, dfritz@netflix.com. Joel Kodama: Sr. Cloud Network SRE, jkodama@netflix.com
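
The record shown on slides 23 to 25 uses the default (version 2) VPC Flow Log format. As a minimal sketch of how such a line could be split into named fields, the Python below assumes a space-separated record with log status OK; the names `FlowRecord` and `parse_flow_record` are illustrative and do not come from the talk.

```python
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Fields of a default-format (version 2) VPC Flow Log record, in order."""
    version: int
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int   # IANA protocol number, 6 is TCP
    packets: int
    bytes: int
    start: int      # capture window start, Unix seconds
    end: int        # capture window end, Unix seconds
    action: str     # ACCEPT or REJECT
    log_status: str


# Fields that hold numbers in a record with log status OK.
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_record(line: str) -> FlowRecord:
    """Parse one record. Assumes log status OK; NODATA and SKIPDATA
    records carry '-' placeholders and are not handled here."""
    parts = line.split()
    if len(parts) != len(FlowRecord._fields):
        raise ValueError(f"expected {len(FlowRecord._fields)} fields, got {len(parts)}")
    values = [int(value) if name in INT_FIELDS else value
              for name, value in zip(FlowRecord._fields, parts)]
    return FlowRecord(*values)


record = parse_flow_record(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(record.srcaddr, record.srcport, record.dstaddr, record.dstport)
```

Even once parsed, the record only carries addresses, ports, and counters, which is the point the slides make: nothing in it names an application.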
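
Slides 57 and 59 describe f(domain, ip, time) = app and a stream of IP change events with "+" (attach) and "-" (detach) entries. The sketch below shows one way such events could answer that lookup. It is not Sonar's implementation: the `IpTimeline` class, its in-memory interval list, and the integer timestamps standing in for t1 to t12 are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    owner: str
    start: int
    end: Optional[int] = None  # None means the owner still holds the IP


class IpTimeline:
    """Hypothetical in-memory index of IP ownership per routing domain."""

    def __init__(self) -> None:
        # (domain, ip) -> intervals in the order the events arrived
        self._intervals = defaultdict(list)

    def add(self, domain: int, ip: str, time: int, owner: str) -> None:
        """Handle a '+' event: owner starts using ip at time."""
        self._intervals[(domain, ip)].append(Interval(owner, time))

    def remove(self, domain: int, ip: str, time: int, owner: str) -> None:
        """Handle a '-' event: close the owner's open interval."""
        for interval in self._intervals[(domain, ip)]:
            if interval.owner == owner and interval.end is None:
                interval.end = time
                return

    def lookup(self, domain: int, ip: str, time: int) -> List[str]:
        """f(domain, ip, time): every owner holding ip at that time."""
        return [
            interval.owner
            for interval in self._intervals.get((domain, ip), [])
            if interval.start <= time
            and (interval.end is None or time < interval.end)
        ]


# Replay the first event sequence from slide 59 with t1..t12 as 1..12.
timeline = IpTimeline()
timeline.add(0, "172.31.2.2", 1, "eni-123")
timeline.add(0, "172.31.2.2", 2, "i-abcdef")
timeline.add(0, "172.31.2.2", 3, "titus cabc")
timeline.remove(0, "172.31.2.2", 10, "titus cabc")
timeline.remove(0, "172.31.2.2", 11, "eni-123")
timeline.remove(0, "172.31.2.2", 12, "i-abcdef")
print(timeline.lookup(0, "172.31.2.2", 5))
```

Keying on the routing domain as well as the address reflects slide 34, where two VPCs both use 172.31.0.0/16. The linear scan is only suitable for a demonstration; at the event rates quoted in the talk a real index would need a different structure.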
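
Slides 68 to 72 walk through joining a flow log record with the IP lookup and a listening-port check to produce an enriched record. The following sketch reproduces that transformation for the slide's example. The function `enrich`, the `lookup_app` callable, and the `listening` set are hypothetical stand-ins; the talk only shows the "Not Listening = Outbound Request" case, so every other case is labelled "Unknown" here rather than guessed.

```python
from typing import Callable, Dict, Set, Tuple


def enrich(
    line: str,
    lookup_app: Callable[[int, str, int], str],
    listening: Set[Tuple[str, int]],
    domain: int = 0,
) -> Dict[str, object]:
    """Join one default-format flow log record with app names and a state.

    lookup_app plays the role of f(domain, ip, time) = app.
    listening holds (ip, port) pairs known to accept connections.
    """
    fields = line.split()
    src_ip, dst_ip = fields[3], fields[4]
    src_port, dst_port = int(fields[5]), int(fields[6])
    start_ts, end_ts = int(fields[10]), int(fields[11])

    # Slide 71: a source socket that is not listening implies an outbound
    # request. Other labels are not given in the slides, so use a placeholder.
    if (src_ip, src_port) not in listening:
        state = "Outbound Request"
    else:
        state = "Unknown"

    return {
        "srcIP": src_ip,
        "dstIP": dst_ip,
        "srcPort": src_port,
        "dstPort": dst_port,
        "packets": int(fields[8]),
        "bytes": int(fields[9]),
        "startTs": start_ts,
        "endTs": end_ts,
        "action": fields[12],
        "srcApp": lookup_app(domain, src_ip, start_ts),
        "dstApp": lookup_app(domain, dst_ip, start_ts),
        "state": state,
    }


# Static stand-ins for the lookup and the listening-port inventory.
apps = {(0, "172.31.16.139"): "foo", (0, "172.31.16.21"): "bar"}
enriched = enrich(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    lookup_app=lambda domain, ip, time: apps.get((domain, ip), "unknown"),
    listening={("172.31.16.21", 22)},
)
print(enriched)
```

The output keys mirror the record on slide 72. Both lookups use the record's start timestamp, as on slide 70.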
