Netflix Cloud Architecture

QCon Tokyo, April 12, 2011

Adrian Cockcroft
@adrianco #netflixcloud http://slideshare.net/adrianco
acockcroft@netflix.com
Who, Why, What

Netflix in the Cloud
Cloud Challenges and Learnings
Systems and Operations Architecture
Netflix Inc.

With more than 20 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows.

International Expansion

We plan to expand into an additional market in the second half of 2011… If the second market meets our expectations… we will continue to invest and expand aggressively in 2012.

Source: http://ir.netflix.com
Unlimited streaming for $7.99/month, large and growing catalog of movies and TV
Adrian Cockcroft

•  Director, Architecture for Cloud Systems, Netflix Inc.
     –  Previously Director for Personalization Platform
•  Distinguished Availability Engineer, eBay Inc. 2004-7
     –  Founding member of eBay Research Labs
•  Distinguished Engineer, Sun Microsystems Inc. 1988-2004
     –  2003-4 Chief Architect High Performance Technical Computing
     –  2001 Author: Capacity Planning for Web Services
     –  1999 Author: Resource Management
     –  1995 & 1998 Author: Sun Performance and Tuning
     –  1996 Japanese Edition of Sun Performance and Tuning
           •  SPARC & Solaris
Why is Netflix Talking about Cloud?
Netflix is Path-finding

The Cloud ecosystem is evolving very fast
Share with and learn from the cloud community

We want to use clouds, not build them
Cloud technology should be a commodity
Public cloud and open source for agility and scale
Why Use Cloud?

For Better Business Agility
For Unpredictable Business Growth
Data Center

Netflix could not build new datacenters fast enough
Capacity growth is accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS3, XBox
20 Million Customers

2010-Q3 year/year: +52% Total and +145% Streaming

[Chart: customers in millions (0 to 25) by quarter, 2009Q2 to 2010Q4]

Source: http://ir.netflix.com
Out-Growing Data Center

http://techblog.netflix.com/2011/02/redesigning-netflix-api.html

[Chart: 37x growth, Jan 2010 to Jan 2011, against Datacenter Capacity]
Netflix.com is now ~100% Cloud

Account sign-up is currently being moved to cloud
All international product will be cloud based
USA-specific logistics remain in the Datacenter
Leverage AWS Scale

“the biggest public cloud”
AWS investment in tooling and automation
Use many AWS zones for high availability, scalability
AWS skills are most common on resumes…
Leverage AWS Feature Set

“the market leader”
EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDS, VPC…

http://aws.amazon.com/jp
Amazon Cloud Terminology
See http://aws.amazon.com/jp for Japanese
This is not a full list of Amazon Web Services features

•  AWS – Amazon Web Services (common name for Amazon cloud)
•  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•  EC2 – Elastic Compute Cloud
     –  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
     –  Instance – a running computer system. Ephemeral: when it is de-allocated nothing is kept.
     –  Reserved Instances – pre-paid to reduce cost for long term usage
     –  Availability Zone – datacenter with own power and cooling hosting cloud instances
     –  Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
•  ASG – Auto Scaling Group (instances booting from the same AMI)
•  S3 – Simple Storage Service (HTTP access)
•  EBS – Elastic Block Store (network disk filesystem that can be mounted on an instance)
•  RDS – Relational Database Service (managed MySQL master and slaves)
•  SDB – SimpleDB (hosted HTTP-based NoSQL data store)
•  SQS – Simple Queue Service (HTTP-based message queue)
•  SNS – Simple Notification Service (HTTP and email based topics and messages)
•  EMR – Elastic MapReduce (automatically managed Hadoop cluster)
•  ELB – Elastic Load Balancer
•  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•  VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
•  IAM – Identity and Access Management (fine grain role based security keys)
“The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”

Werner Vogels
Amazon CTO
We want to use clouds,
we don’t have time to build them

Public cloud for agility and scale

AWS because they are big enough to allocate thousands of instances per hour when we need to
Netflix EC2 Instances per Account
(summer 2010, production is much higher now…)

[Chart: instance counts over “Several Months”, reaching “Many Thousands”, for the Content Encoding, Test and Production, and Log Analysis accounts]
Netflix Deployed on AWS

Content: Video Masters, EC2, S3, CDN
Logs: S3, EMR Hadoop, Hive, Business Intelligence
Play: DRM, CDN routing, Bookmarks, Logging
WWW: Search, Movie Choosing, Ratings, Similars
API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone
Cloud Encoding Pipeline

[Diagram: Movie Studios master tapes are uploaded over the network to S3 as a Netflix Master, encoded to Mezzanine files on S3, encoded again to 50+ files, copied to CDN Origin servers, and streamed from the CDN to TV]

•  Licensed content is provided to Netflix as high quality master tapes
•  Many formats are reduced to a single high quality mezzanine format on S3
•  Individual formats and speeds are encoded in over 50 combinations
     –  Many formats for older and newer hardware and various game consoles
     –  Many speeds from mobile through standard and high definition
•  Static files are copied to each Content Delivery Network’s “origin server”
•  CDNs migrate files to “edge servers” near the end user
•  Files stream to PC/Mac/iPad or TV over HTTP, using HTTP range GET requests to move chunks
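The “range get” step uses ordinary HTTP/1.1 byte-range requests: the client asks for one slice of a static file at a time and a server that supports ranges answers with 206 Partial Content. A minimal Python sketch of how a client might split a file into chunks, assuming a fixed chunk size and a hypothetical URL; the slide does not say how Netflix clients actually choose chunk boundaries or bitrates.

```python
import urllib.request
from typing import Dict, Iterator, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) byte offsets that cover a file of total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # HTTP byte ranges are inclusive
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build the Range request header for one chunk, e.g. 'bytes=0-1048575'."""
    return {"Range": f"bytes={start}-{end}"}


def fetch_chunk(url: str, start: int, end: int) -> bytes:
    """GET one chunk; a server that honours the range replies 206 Partial Content."""
    req = urllib.request.Request(url, headers=range_header(start, end))
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # A 10-byte file in 4-byte chunks needs three range requests.
    for first, last in byte_ranges(10, 4):
        print(range_header(first, last))
```

Run as a script, it prints the headers for bytes=0-3, bytes=4-7 and bytes=8-9. `fetch_chunk` is not called in the demo because any URL here would be hypothetical.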
  
Cloud Architecture
Product Trade-off

User Experience: Consistent Experience, Low Latency
Implementation: Development complexity, Operational complexity
Netflix Cloud Goals

•  Faster
     –  Lower latency than the equivalent datacenter web pages and API calls
     –  Measured as mean and 99th percentile
     –  For both first hit (e.g. home page) and in-session hits for the same user
•  Scalable
     –  Avoid needing any more datacenter capacity as subscriber count increases
     –  No central vertically scaled databases
     –  Leverage AWS elastic capacity effectively
•  Available
     –  Substantially higher robustness and availability than datacenter services
     –  Leverage multiple AWS availability zones
     –  No scheduled down time, no central database schema to change
•  Productive
     –  Optimize agility of a large development team with automation and tools
     –  Leave behind complex tangled datacenter code base (~8 year old architecture)
     –  Enforce clean layered interfaces and re-usable components
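Measuring latency as both mean and 99th percentile matters because a few slow requests barely move the mean but dominate the tail. A small Python sketch, assuming the nearest-rank definition of a percentile (one of several in common use; the slide does not say which one Netflix used) and latencies in milliseconds.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Report the two statistics named in the goals: mean and 99th percentile."""
    return {"mean": mean(samples_ms), "p99": percentile(samples_ms, 99)}


if __name__ == "__main__":
    # 98 fast calls and two slow ones: the mean stays low, the p99 exposes the tail.
    calls = [10.0] * 98 + [500.0, 1000.0]
    print(latency_summary(calls))
```

For that sample the sketch reports a mean of 24.8 ms but a p99 of 500 ms, which is why a goal stated only as a mean would hide the slow requests.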
  
Old Datacenter vs. New Cloud Arch

Central SQL Database  ->  Distributed Key/Value NoSQL
Sticky In-Memory Session  ->  Shared Memcached Session
Chatty Protocols  ->  Latency Tolerant Protocols
Tangled Service Interfaces  ->  Layered Service Interfaces
Instrumented Code  ->  Instrumented Service Patterns
Fat Complex Objects  ->  Lightweight Serializable Objects
Components as Jar Files  ->  Components as Services
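The move from sticky in-memory sessions to a shared memcached session means any instance behind the load balancer can serve any request, so instances can come and go freely. A minimal Python sketch of that pattern; the dict-backed `SharedSessionStore` is a hypothetical stand-in for a memcached client, and the session is kept as a small JSON string to echo “lightweight serializable objects”.

```python
import json
import uuid
from typing import Dict, List, Optional


class SharedSessionStore:
    """Hypothetical stand-in for a shared memcached tier reachable from every instance."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class AppInstance:
    """A stateless web instance: session state lives in the shared store, not in local memory."""

    def __init__(self, instance_id: str, store: SharedSessionStore) -> None:
        self.instance_id = instance_id
        self.store = store

    def add_to_queue(self, session_id: str, title: str) -> List[str]:
        raw = self.store.get(session_id)
        queue = json.loads(raw) if raw else []  # deserialize a lightweight object
        queue.append(title)
        self.store.set(session_id, json.dumps(queue))  # serialize it back for any instance
        return queue


if __name__ == "__main__":
    store = SharedSessionStore()
    a = AppInstance("instance-a", store)
    b = AppInstance("instance-b", store)
    sid = str(uuid.uuid4())
    a.add_to_queue(sid, "movie-1")
    print(b.add_to_queue(sid, "movie-2"))  # a different instance sees the same session
```

Because instance-b reads the session that instance-a wrote, the load balancer needs no stickiness, and terminating instance-a loses no session state.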
  
Learnings

•  Datacenter oriented tools don’t work
     –  Ephemeral instances
     –  High rate of change
     –  Need too much hand-holding and manual setup
•  Cloud Tools Don’t Scale for Enterprise
     –  Too many tools are “Startup” oriented
     –  Built our own tools for 1000’s of instances
     –  Drove vendors to be dynamic, scale, add APIs
•  Un-modified Datacenter Apps are Fragile
     –  Too many datacenter oriented assumptions
     –  We re-wrote our code base!
     –  (We re-write it continuously anyway)
Netflix Systems Architecture

[Diagram. AWS EC2: Front End Load Balancer, API Proxy, API etc., Discovery Service, Load Balancer, Component Services, memcached, SQS. AWS Storage: EBS, S3, SimpleDB. Netflix Data Center: Oracle databases, linked to the cloud by Replication.]
Database Migration

•  Why SimpleDB?
     –  No DBAs in the cloud, Amazon hosted service
     –  Work started two years ago, fewer viable options
     –  Worked with Amazon to speed up and scale SimpleDB
•  Alternatives?
     –  Rolling out Cassandra as “upgrade” from SimpleDB
     –  Need several options to match use cases well
•  Detailed NoSQL and SimpleDB Advice
     –  Sid Anand - QConSF Nov 5th – Netflix’s Transition to High Availability Storage Systems
     –  Blog - http://practicalcloudcomputing.com/
     –  Download Paper PDF - http://bit.ly/bhOTLu
Cloud Operations

Model Driven Architecture
Capacity Planning & Monitoring
Tools and Automation

•  Developer and Build Tools
     –  Jira, Perforce, Eclipse, Jeeves, Ivy, Artifactory
     –  Builds, creates .war file, .rpm, bakes AMI and launches
•  Custom Netflix Application Console
     –  AWS Features at Enterprise Scale (hide the AWS security keys!)
     –  Auto Scaling Group is the unit of deployment to production
•  Open Source + Support
     –  Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
•  Monitoring Tools
     –  Keynote – service monitoring and alerting
     –  AppDynamics – Developer focus for cloud http://appdynamics.com
     –  EpicNMS – flexible data collection and plots http://epicnms.com
     –  Nimsoft NMS – ITOps focus for Datacenter + Cloud alerting
Model Driven Architecture

•  Datacenter Practices
     –  Lots of unique hand-tweaked systems
     –  Hard to enforce patterns
•  Model Driven Cloud Architecture
     –  Perforce/Ivy/Jeeves based builds for everything
     –  Every production instance is a pre-baked AMI
     –  Every application is managed by an Autoscaler

No exceptions, every change is a new AMI
High Availability Zones

•  Each zone is a separate datacenter
     –  Private power, cooling, network connections
     –  Located close together for low latency
•  ASG Instances are distributed over 3 zones
•  Data written to one zone appears in all zones
•  Netflix can survive total failure of one zone
     –  Increase capacity of the remaining zones by 50%
     –  Small or zero downtime
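The 50% figure follows from simple arithmetic: with load spread evenly over three zones each zone carries one third, and when one zone fails the two survivors must carry one half each, which is 1.5 times their previous share. A small Python sketch of that calculation, assuming an even spread and a hypothetical required instance count; the same functions give the figure for other zone counts.

```python
import math
from typing import List, Tuple


def spread(instances: int, zones: int) -> List[int]:
    """Distribute instances over zones as evenly as possible (round-robin)."""
    base, extra = divmod(instances, zones)
    return [base + (1 if z < extra else 0) for z in range(zones)]


def capacity_after_zone_loss(required: int, zones: int = 3) -> Tuple[int, int, float]:
    """Per-zone instances before and after losing one zone, and the fractional increase."""
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    before = math.ceil(required / zones)
    after = math.ceil(required / (zones - 1))
    return before, after, after / before - 1


if __name__ == "__main__":
    print(spread(300, 3))                    # [100, 100, 100]
    print(capacity_after_zone_loss(300, 3))  # (100, 150, 0.5): survivors grow by 50%
```

With four zones the same calculation gives an increase of about 33% per surviving zone.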
  
Region Migration
(Netflix is working to have this in place during 2011, for international roll-out and disaster recovery)

•  Data is backed up into a different cloud region
     –  Cloud bandwidth is much higher than Datacenter
•  Restore to a new region
     –  “A few hours” to load data and create databases
•  Create model driven architecture
     –  “A few hours” to create service instances and test
•  Send traffic to new region
     –  Set up DNS records and start customer service
Model Driven Implications

•  Automated “Least Privilege” Security
     –  Tightly specified security groups
     –  Fine grain IAM keys to access AWS resources
     –  Performance tools security and integration
•  Model Driven Performance Monitoring
     –  Hundreds of instances appear in a few minutes…
     –  Tools have to “garbage collect” dead instances
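“Fine grain IAM keys” means each application gets credentials that allow only the AWS actions it needs. A Python sketch that generates a least-privilege IAM policy document for an application that only reads objects under one S3 prefix; the bucket and prefix names are hypothetical, and the slide does not show Netflix's actual policies. The document layout (Version, Statement, Effect, Action, Resource, Condition) follows the standard IAM policy grammar.

```python
import json
from typing import Any, Dict


def read_only_prefix_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """IAM policy allowing listing and reading of one S3 prefix, and nothing else."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Read objects only under the application's own prefix.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            # List the bucket, but only for keys under that prefix.
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix for an encoding worker.
    print(json.dumps(read_only_prefix_policy("example-media-bucket", "mezzanine/"), indent=2))
```

IAM denies anything not explicitly allowed, so a worker holding only this policy could not write, delete, or read other buckets. A model-driven tool could generate one such policy per application rather than having people hand-edit them.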
  	
  
Netflix App Console

Auto Scale Group Configuration
Capacity Planning & Monitoring
Capacity Planning in Clouds
(a few things have changed…)

•  Capacity is expensive
•  Capacity takes time to buy and provision
•  Capacity only increases, can’t be shrunk easily
•  Capacity comes in big chunks, paid up front
•  Planning errors can cause big problems
•  Systems are clearly defined assets
•  Systems can be instrumented in detail
•  Depreciate assets over 3 years (reservations!)
Monitoring Issues

•  Problem
     –  Too many tools, each with a good reason to exist
     –  Hard to get an integrated view of a problem
     –  Too much manual work building dashboards
     –  Tools are not discoverable, views are not filtered
•  Solution
     –  Get vendors to add deep linking URLs and APIs
     –  Integration “portal” ties everything together
     –  Underlying dependency database
     –  Dynamic portal generation, relevant data, all tools
Data Sources

External Testing
     •  External URL availability and latency alerts and reports – Keynote
     •  Stress testing – SOASTA
Request Trace Logging
     •  Netflix REST calls – Chukwa to DataOven with GUID transaction identifier
     •  Generic HTTP – AppDynamics service tier aggregation, end to end tracking
Application logging
     •  Tracers and counters – log4j, tracer central, Chukwa to DataOven
     •  Trackid and Audit/Debug logging – DataOven, AppDynamics GUID cross reference
JMX Metrics
     •  Application specific real time – Nimsoft, AppDynamics, Epic
     •  Service and SLA percentiles – Nimsoft, AppDynamics, Epic, logged to DataOven
Tomcat and Apache logs
     •  Stdout logs – S3 – DataOven, Nimsoft alerting
     •  Standard format Access and Error logs – S3 – DataOven, Nimsoft Alerting
JVM
     •  Garbage Collection – Nimsoft, AppDynamics
     •  Memory usage, call stacks, resource/call – AppDynamics
Linux
     •  System CPU/Net/RAM/Disk metrics – AppDynamics, Epic, Nimsoft Alerting
     •  SNMP metrics – Epic, Network flows – Fastip
AWS
     •  Load balancer traffic – Amazon CloudWatch, SimpleDB usage stats
     •  System configuration – CPU count/speed and RAM size, overall usage – AWS
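The Request Trace Logging row relies on a GUID transaction identifier that follows a request through every REST call, so log records from different services can be cross-referenced. A minimal Python sketch of that idea, assuming a hypothetical header name `X-Trace-Id` and plain dict log records; it does not reproduce Chukwa, DataOven or AppDynamics.

```python
import uuid
from collections import defaultdict
from typing import Dict, List

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name


def ensure_trace_id(headers: Dict[str, str]) -> str:
    """Reuse the caller's GUID if present; otherwise this is the edge, so mint one."""
    trace_id = headers.get(TRACE_HEADER)
    if not trace_id:
        trace_id = str(uuid.uuid4())
        headers[TRACE_HEADER] = trace_id
    return trace_id


def downstream_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Headers for an outbound REST call that carry the same GUID."""
    return {TRACE_HEADER: ensure_trace_id(incoming)}


def log_record(trace_id: str, service: str, status: int, elapsed_ms: float) -> Dict[str, object]:
    """One structured log entry; every service logs the same trace field."""
    return {"trace": trace_id, "service": service, "status": status, "elapsed_ms": elapsed_ms}


def group_by_trace(records: List[Dict[str, object]]) -> Dict[str, List[Dict[str, object]]]:
    """Cross-reference: collect every service's records for each transaction."""
    grouped: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for rec in records:
        grouped[str(rec["trace"])].append(rec)
    return dict(grouped)


if __name__ == "__main__":
    edge: Dict[str, str] = {}  # request arriving from a device, no GUID yet
    tid = ensure_trace_id(edge)
    call = downstream_headers(edge)  # API tier calls a component service
    logs = [log_record(tid, "api", 200, 42.0),
            log_record(call[TRACE_HEADER], "ratings", 200, 11.5)]
    print(len(group_by_trace(logs)[tid]))  # both records share one transaction
```

Run as a script it prints 2: the API record and the downstream ratings record are joined by the same GUID.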
  
Integrated Dashboards
Dashboards Architecture

•  Integrated Dashboard View
     –  Single web page containing content from many tools
     –  Filtered to highlight most “interesting” data
•  Relevance Controller
     –  Drill in, add and remove content interactively
     –  Given an application, alert or problem area, dynamically build a dashboard relevant to your role and needs
•  Dependency and Incident Model
     –  Model Driven - Interrogates tools and AWS APIs
     –  Document store to capture dependency tree and states
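The Relevance Controller and the dependency model can be sketched together: starting from the application that raised an alert, walk the dependency tree and emit one panel per dependent service and tool, each with a deep-linking URL. A minimal Python sketch; the service names, dependency map, tool names and URL templates are all hypothetical, and a real system would read the tree from the document store described on the slide.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency tree, as the document store might hold it.
DEPENDENCIES: Dict[str, List[str]] = {
    "api": ["metadata", "ratings"],
    "metadata": ["simpledb"],
    "ratings": ["memcached", "simpledb"],
    "memcached": [],
    "simpledb": [],
}

# Hypothetical deep-linking URL templates that vendors might expose.
LINK_TEMPLATES: Dict[str, str] = {
    "latency": "https://apm.example.com/apps/{service}/latency",
    "alerts": "https://nms.example.com/alerts?service={service}",
}


def relevant_services(root: str, deps: Dict[str, List[str]], max_depth: int = 2) -> List[str]:
    """Breadth-first walk from the alerting service, up to max_depth hops."""
    depth = {root: 0}
    order = [root]
    queue = deque([root])
    while queue:
        svc = queue.popleft()
        if depth[svc] == max_depth:
            continue
        for dep in deps.get(svc, []):
            if dep not in depth:
                depth[dep] = depth[svc] + 1
                order.append(dep)
                queue.append(dep)
    return order


def build_dashboard(root: str, max_depth: int = 2) -> List[Dict[str, str]]:
    """One panel per (service, tool) pair, each pointing at that tool's deep link."""
    return [
        {"service": svc, "tool": tool, "url": template.format(service=svc)}
        for svc in relevant_services(root, DEPENDENCIES, max_depth)
        for tool, template in LINK_TEMPLATES.items()
    ]


if __name__ == "__main__":
    for panel in build_dashboard("ratings", max_depth=1):
        print(panel["service"], panel["tool"], panel["url"])
```

Limiting the walk depth is what keeps the generated page filtered: an alert on ratings pulls in memcached and SimpleDB panels, but not unrelated services.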
  
Dashboard Prototype
(not everything is integrated yet)
AppDynamics
How to look deep inside your cloud applications

•  Automatic Monitoring
     –  Base AMI bakes in all monitoring tools
     –  Outbound calls only – no discovery/polling issues
     –  Inactive instances removed after a few days
•  Incident Alarms (deviation from baseline)
     –  Business Transaction latency and error rate
     –  Alarm thresholds discover their own baseline
     –  Email contains URL to Incident Workbench UI
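“Alarm thresholds discover their own baseline” can be illustrated with a rolling mean and standard deviation: a sample alarms when it sits more than k standard deviations from recent history. This Python sketch is a generic deviation-from-baseline detector with assumed window and k values, not AppDynamics' actual algorithm, which the slide does not describe.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque


class BaselineAlarm:
    """Flag samples that deviate from a self-learned rolling baseline (generic sketch)."""

    def __init__(self, window: int = 60, k: float = 3.0, min_samples: int = 10) -> None:
        self.history: Deque[float] = deque(maxlen=window)  # recent samples form the baseline
        self.k = k
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if value is more than k standard deviations from the baseline mean."""
        alarm = False
        if len(self.history) >= self.min_samples:
            samples = list(self.history)
            mu = mean(samples)
            sigma = max(pstdev(samples), 1e-9)  # avoid a zero threshold on perfectly flat data
            alarm = abs(value - mu) > self.k * sigma
        # Every sample, including an anomalous one, feeds the baseline, so it keeps adapting.
        self.history.append(value)
        return alarm


if __name__ == "__main__":
    detector = BaselineAlarm(window=20, k=3.0, min_samples=5)
    latencies_ms = [100, 102, 98, 101, 99, 100, 500]
    print([detector.observe(x) for x in latencies_ms])
```

Run as a script it prints six False values followed by True for the 500 ms spike. Feeding anomalies back into the baseline is a design choice; a production detector might exclude alarmed samples so that a sustained outage does not become the new normal.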
  
Using AppDynamics
(simple example from early 2010)

Point Finger and Assess Impact
(an async S3 write was slow, no big deal)
Monitoring Summary

•  Broken datacenter oriented tools are a big problem
•  Integrating many different tools
     –  They are not designed to be integrated
     –  We have “persuaded” vendors to add APIs
•  If you can’t see deep inside your app, you’re ☹
Wrap Up
Implications for IT Operations

•  Cloud is run by developer organization
     –  Our IT department is Amazon Cloud
•  Cloud capacity is much bigger than Datacenter
     –  Datacenter oriented IT staffing is flat
     –  We have no IT staff working on cloud
     –  We have moved 3 people out of IT to write code
•  Traditional IT Roles are going away
     –  Don’t need SA, DBA, Storage, Network admins
Next Few Years…

•  “System of Record” moves to Cloud (now)
     –  Master copies of data live only in the cloud, with backups
     –  Cut the datacenter to cloud replication link
•  International Expansion – Global Clouds (later in 2011)
     –  Rapid deployments to new markets
•  Cloud Standardization?
     –  Cloud features and APIs should be a commodity not a differentiator
     –  Differentiate on scale and quality of service
     –  Competition also drives cost down
     –  Higher resilience and scalability

We would prefer to be an insignificant customer in a giant cloud
Takeaway

Netflix is path-finding the use of public AWS cloud to replace in-house IT for non-trivial applications with hundreds of developers and thousands of systems.

acockcroft@netflix.com
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud

Netflix oss season 1 episode 3 Ruslan Meshenberg
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareMark Hinkle
 
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsNetflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsChris Saint-Amant
 
Community IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration PlanningCommunity IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration PlanningCommunity IT Innovators
 

Andere mochten auch (12)

Performance in The Cloud - AppDynamics
Performance in The Cloud - AppDynamicsPerformance in The Cloud - AppDynamics
Performance in The Cloud - AppDynamics
 
Netflix cloud architecture...continued
Netflix cloud architecture...continuedNetflix cloud architecture...continued
Netflix cloud architecture...continued
 
Cloud Migration: Tales from the Trenches
Cloud Migration: Tales from the TrenchesCloud Migration: Tales from the Trenches
Cloud Migration: Tales from the Trenches
 
Cloud Migration
Cloud MigrationCloud Migration
Cloud Migration
 
Migrating your Existing Applications to the Cloud
Migrating your Existing Applications to the CloudMigrating your Existing Applications to the Cloud
Migrating your Existing Applications to the Cloud
 
Planning the Migration to the Cloud - AWS India Summit 2012
Planning the Migration to the Cloud - AWS India Summit 2012Planning the Migration to the Cloud - AWS India Summit 2012
Planning the Migration to the Cloud - AWS India Summit 2012
 
Migration to Cloud - How difficult is it ? A sample migration scenario
Migration to Cloud - How difficult is it ? A sample migration scenarioMigration to Cloud - How difficult is it ? A sample migration scenario
Migration to Cloud - How difficult is it ? A sample migration scenario
 
Migrating to Public Cloud
Migrating to Public CloudMigrating to Public Cloud
Migrating to Public Cloud
 
Netflix oss season 1 episode 3
Netflix oss season 1 episode 3 Netflix oss season 1 episode 3
Netflix oss season 1 episode 3
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source Software
 
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.jsNetflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
Netflix JavaScript Talks - Scaling A/B Testing on Netflix.com with Node.js
 
Community IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration PlanningCommunity IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration Planning
 

Ähnlich wie Netflix Cloud Architecture at Qcon Tokyo 2011

Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Adrian Cockcroft
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
CommunityOneEast 09 - Running Java On Amazon EC2
CommunityOneEast 09 - Running Java On Amazon EC2CommunityOneEast 09 - Running Java On Amazon EC2
CommunityOneEast 09 - Running Java On Amazon EC2Chris Richardson
 
SD Forum Java SIG - Running Java Applications On Amazon EC2
SD Forum Java SIG - Running Java Applications On Amazon EC2SD Forum Java SIG - Running Java Applications On Amazon EC2
SD Forum Java SIG - Running Java Applications On Amazon EC2Chris Richardson
 
AWS IoT: From Testing to Scaling
AWS IoT: From Testing to ScalingAWS IoT: From Testing to Scaling
AWS IoT: From Testing to ScalingNeel Sendas
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
 
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...Amazon Web Services
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qconYiwei Ma
 
Razorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - IntroductionRazorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - IntroductionRazorfish
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaAmazon Web Services
 
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018Amazon Web Services
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Cloud computing workshop at IIT bombay
Cloud computing workshop at IIT bombayCloud computing workshop at IIT bombay
Cloud computing workshop at IIT bombayNilesh Satpute
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...Amazon Web Services
 
C# Client to Cloud
C# Client to CloudC# Client to Cloud
C# Client to CloudStuart Lodge
 
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018Laure Vergeron
 
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017Amazon Web Services
 

Ähnlich wie Netflix Cloud Architecture at Qcon Tokyo 2011 (20)

Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
CommunityOneEast 09 - Running Java On Amazon EC2
CommunityOneEast 09 - Running Java On Amazon EC2CommunityOneEast 09 - Running Java On Amazon EC2
CommunityOneEast 09 - Running Java On Amazon EC2
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
 
SD Forum Java SIG - Running Java Applications On Amazon EC2
SD Forum Java SIG - Running Java Applications On Amazon EC2SD Forum Java SIG - Running Java Applications On Amazon EC2
SD Forum Java SIG - Running Java Applications On Amazon EC2
 
AWS IoT: From Testing to Scaling
AWS IoT: From Testing to ScalingAWS IoT: From Testing to Scaling
AWS IoT: From Testing to Scaling
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qcon
 
Razorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - IntroductionRazorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - Introduction
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
 
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Cloud computing workshop at IIT bombay
Cloud computing workshop at IIT bombayCloud computing workshop at IIT bombay
Cloud computing workshop at IIT bombay
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
 
C# Client to Cloud
C# Client to CloudC# Client to Cloud
C# Client to Cloud
 
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
 
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
 

Mehr von Adrian Cockcroft

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Adrian Cockcroft
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Adrian Cockcroft
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connectAdrian Cockcroft
 

Mehr von Adrian Cockcroft (20)

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
 

Kürzlich hochgeladen

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Kürzlich hochgeladen (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Netflix Cloud Architecture at Qcon Tokyo 2011

  • 1. Netflix Cloud Architecture. Qcon Tokyo, April 12, 2011. Adrian Cockcroft, @adrianco, #netflixcloud, http://slideshare.net/adrianco, acockcroft@netflix.com
  • 2. Who, Why, What: Netflix in the Cloud; Cloud Challenges and Learnings; Systems and Operations Architecture
  • 3. Netflix Inc. With more than 20 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows. International Expansion: We plan to expand into an additional market in the second half of 2011… If the second market meets our expectations… we will continue to invest and expand aggressively in 2012. Source: http://ir.netflix.com
  • 4. Unlimited streaming for $7.99/month, large and growing catalog of movies and TV
  • 5. Adrian Cockcroft
      • Director, Architecture for Cloud Systems, Netflix Inc.
          – Previously Director for Personalization Platform
      • Distinguished Availability Engineer, eBay Inc. 2004-7
          – Founding member of eBay Research Labs
      • Distinguished Engineer, Sun Microsystems Inc. 1988-2004
          – 2003-4 Chief Architect High Performance Technical Computing
          – 2001 Author: Capacity Planning for Web Services
          – 1999 Author: Resource Management
          – 1995 & 1998 Author: Sun Performance and Tuning
          – 1996 Japanese Edition of Sun Performance and Tuning
      • SPARC & Solaris ( )
  • 6. Why is Netflix Talking about Cloud?
  • 7. Netflix is Path-finding. The Cloud ecosystem is evolving very fast. Share with and learn from the cloud community.
  • 8. We want to use clouds, not build them. Cloud technology should be a commodity. Public cloud and open source for agility and scale.
  • 9. Why Use Cloud? For Better Business Agility. For Unpredictable Business Growth.
  • 10. Data Center: Netflix could not build new datacenters fast enough. Capacity growth is accelerating and unpredictable. Product launch spikes: iPhone, Wii, PS3, Xbox.
  • 11. 20 Million Customers
     2010-Q3 year/year +52% Total and +145% Streaming
     (Chart: subscribers in millions by quarter, 2009Q2 to 2010Q4)
     Source: http://ir.netflix.com
  • 12. Out-Growing Data Center
     http://techblog.netflix.com/2011/02/redesigning-netflix-api.html
     (Chart: 37x growth Jan 2010-Jan 2011, compared with Datacenter Capacity)
  • 13. Netflix.com is now ~100% Cloud
     Account sign-up is currently being moved to cloud
     All international product will be cloud based
     USA-specific logistics remain in the Datacenter
  • 14. Leverage AWS Scale
     “the biggest public cloud”
     AWS investment in tooling and automation
     Use many AWS zones for high availability, scalability
     AWS skills are most common on resumes…
  • 15. Leverage AWS Feature Set
     “the market leader”
     EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDS, VPC…
     http://aws.amazon.com/jp
  • 16. Amazon Cloud Terminology
     See http://aws.amazon.com/jp for Japanese
     This is not a full list of Amazon Web Services features
     • AWS – Amazon Web Services (common name for Amazon cloud)
     • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
     • EC2 – Elastic Compute Cloud
        – Range of virtual machine types (m1, m2, c1, cc, cg) with varying memory, CPU and disk configurations
        – Instance – a running computer system. Ephemeral: when it is de-allocated, nothing is kept.
        – Reserved Instances – pre-paid to reduce cost for long term usage
        – Availability Zone – datacenter with its own power and cooling, hosting cloud instances
        – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
     • ASG – Auto Scaling Group (instances booting from the same AMI)
     • S3 – Simple Storage Service (http access)
     • EBS – Elastic Block Store (network disk that can be mounted as a filesystem on an instance)
     • RDS – Relational Database Service (managed MySQL master and slaves)
     • SDB – SimpleDB (hosted http based NoSQL data store)
     • SQS – Simple Queue Service (http based message queue)
     • SNS – Simple Notification Service (http and email based topics and messages)
     • EMR – Elastic MapReduce (automatically managed Hadoop cluster)
     • ELB – Elastic Load Balancer
     • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
     • VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
     • IAM – Identity and Access Management (fine grain role based security keys)
  • 17. “The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”
     Werner Vogels, Amazon CTO
  • 18. We want to use clouds, we don’t have time to build them
     Public cloud for agility and scale
     AWS because they are big enough to allocate thousands of instances per hour when we need to
  • 19. Netflix EC2 Instances per Account
     (summer 2010, production is much higher now…)
     (Chart: instance counts up to “Many Thousands” over “Several Months” for the Content Encoding, Test and Production, and Log Analysis accounts)
  • 20. Netflix Deployed on AWS
     • Content – Video Masters, EC2, S3, CDN
     • Logs – S3, EMR Hadoop, Hive, Business Intelligence
     • Play – DRM, CDN routing, Bookmarks, Logging
     • WWW – Search, Movie Choosing, Ratings, Similars
     • API – Metadata, Device Config, TV Movie Choosing, Mobile iPhone
  • 21. Cloud Encoding Pipeline
     (Diagram: encoding pipeline from movie studio master tapes through S3 to CDN origin servers and streaming to TV)
     • Licensed content is provided to Netflix as high quality master tapes
     • Many formats are reduced to a single high quality mezzanine format on S3
     • Individual formats and speeds are encoded in over 50 combinations
        – Many formats for older and newer hardware and various game consoles
        – Many speeds from mobile through standard and high definition
     • Static files are copied to each Content Delivery Network’s “origin server”
     • CDNs migrate files to “edge servers” near the end user
     • Files stream to PC/Mac/iPad or TV over HTTP using “range get” to move chunks
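Streaming “using range get to move chunks” means the client issues a series of ordinary HTTP GET requests, each with a `Range` header naming one byte span of a static file on the CDN. The sketch below is a minimal illustration under assumptions: `byte_ranges`, `fetch_chunk` and the chunk size are hypothetical, not Netflix player code, and the URL is whatever CDN location the client has been given.

```python
import urllib.request


def byte_ranges(file_size: int, chunk_size: int) -> list[str]:
    """Split a file into HTTP Range header values of at most chunk_size bytes."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    ranges = []
    for start in range(0, file_size, chunk_size):
        # HTTP byte ranges are inclusive at both ends, hence the -1.
        end = min(start + chunk_size, file_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


def fetch_chunk(url: str, range_header: str) -> bytes:
    """Fetch one chunk; a server that honours Range replies 206 Partial Content."""
    request = urllib.request.Request(url, headers={"Range": range_header})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `byte_ranges(10, 4)` returns `['bytes=0-3', 'bytes=4-7', 'bytes=8-9']`. Because each chunk is a plain GET on a static file, any CDN edge server holding a copy can answer it.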
  • 23. Product Trade-off
     User Experience: Consistent Experience, Low Latency
     Implementation: Development complexity, Operational complexity
  • 24. Netflix Cloud Goals
     • Faster
        – Lower latency than the equivalent datacenter web pages and API calls
        – Measured as mean and 99th percentile
        – For both first hit (e.g. home page) and in-session hits for the same user
     • Scalable
        – Avoid needing any more datacenter capacity as subscriber count increases
        – No central vertically scaled databases
        – Leverage AWS elastic capacity effectively
     • Available
        – Substantially higher robustness and availability than datacenter services
        – Leverage multiple AWS availability zones
        – No scheduled down time, no central database schema to change
     • Productive
        – Optimize agility of a large development team with automation and tools
        – Leave behind complex tangled datacenter code base (~8 year old architecture)
        – Enforce clean layered interfaces and re-usable components
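The “Faster” goal is measured as mean and 99th percentile latency. A minimal sketch of that summary over a batch of latency samples, assuming the nearest-rank definition of percentile (one of several common definitions; the slide does not say which one was used). `percentile` and `latency_summary` are hypothetical helpers.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs give an exact rank.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Mean and 99th percentile, the two measures named in the Faster goal."""
    return {"mean": mean(samples_ms), "p99": percentile(samples_ms, 99)}
```

Reporting both matters because a mean can look healthy while the slowest 1% of requests, which p99 exposes, are what some users actually experience.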
  • 25. Old Datacenter vs. New Cloud Arch
     Central SQL Database → Distributed Key/Value NoSQL
     Sticky In-Memory Session → Shared Memcached Session
     Chatty Protocols → Latency Tolerant Protocols
     Tangled Service Interfaces → Layered Service Interfaces
     Instrumented Code → Instrumented Service Patterns
     Fat Complex Objects → Lightweight Serializable Objects
     Components as Jar Files → Components as Services
  • 26. Learnings
     • Datacenter oriented tools don’t work
        – Ephemeral instances
        – High rate of change
        – Need too much hand-holding and manual setup
     • Cloud Tools Don’t Scale for Enterprise
        – Too many tools are “Startup” oriented
        – Built our own tools for 1000s of instances
        – Drove vendors to be dynamic, scale, add APIs
     • Un-modified Datacenter Apps are Fragile
        – Too many datacenter oriented assumptions
        – We re-wrote our code base!
        – (We re-write it continuously anyway)
  • 28. API
     (Diagram. AWS EC2: Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, API, API etc., Component Services, memcached, SQS. AWS Storage: S3, EBS, SimpleDB. Netflix Data Center: Oracle databases, memcached, Replication.)
  • 29. Database Migration
     • Why SimpleDB?
        – No DBAs in the cloud, Amazon hosted service
        – Work started two years ago, fewer viable options
        – Worked with Amazon to speed up and scale SimpleDB
     • Alternatives?
        – Rolling out Cassandra as “upgrade” from SimpleDB
        – Need several options to match use cases well
     • Detailed NoSQL and SimpleDB Advice
        – Sid Anand, QConSF Nov 5th: Netflix’s Transition to High Availability Storage Systems
        – Blog: http://practicalcloudcomputing.com/
        – Download Paper PDF: http://bit.ly/bhOTLu
  • 30. Cloud Operations
     Model Driven Architecture
     Capacity Planning & Monitoring
  • 31. Tools and Automation
     • Developer and Build Tools
        – Jira, Perforce, Eclipse, Jeeves, Ivy, Artifactory
        – Each build creates a .war file and .rpm, bakes an AMI and launches it
     • Custom Netflix Application Console
        – AWS Features at Enterprise Scale (hide the AWS security keys!)
        – Auto Scaling Group is the unit of deployment to production
     • Open Source + Support
        – Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
     • Monitoring Tools
        – Keynote – service monitoring and alerting
        – AppDynamics – Developer focus for cloud http://appdynamics.com
        – EpicNMS – flexible data collection and plots http://epicnms.com
        – Nimsoft NMS – ITOps focus for Datacenter + Cloud alerting
  • 32. Model Driven Architecture
     • Datacenter Practices
        – Lots of unique hand-tweaked systems
        – Hard to enforce patterns
     • Model Driven Cloud Architecture
        – Perforce/Ivy/Jeeves based builds for everything
        – Every production instance is a pre-baked AMI
        – Every application is managed by an Autoscaler
     No exceptions: every change is a new AMI
  • 33. High Availability Zones
     • Each zone is a separate datacenter
        – Private power, cooling, network connections
        – Located close together for low latency
     • ASG Instances are distributed over 3 zones
     • Data written to one zone appears in all zones
     • Netflix can survive total failure of one zone
        – Increase capacity of the remaining zones by 50%
        – Small or zero downtime
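The 50% figure follows from spreading capacity evenly over three zones: losing one zone removes a third of the instances, so each of the two surviving zones must grow by a factor of 3/2 to restore the original total. The sketch below assumes an even round-robin spread (a simplification of how an Auto Scaling Group balances zones) and uses hypothetical zone names and helper functions.

```python
import math


def spread_over_zones(instance_count: int, zones: list[str]) -> dict[str, int]:
    """Assign instances to zones round-robin, approximating ASG zone balancing."""
    counts = {zone: 0 for zone in zones}
    for i in range(instance_count):
        counts[zones[i % len(zones)]] += 1
    return counts


def per_zone_after_zone_loss(per_zone: int, zone_count: int) -> int:
    """Instances each surviving zone needs so total capacity is unchanged
    after one zone fails: per_zone * zone_count / (zone_count - 1)."""
    if zone_count < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    return math.ceil(per_zone * zone_count / (zone_count - 1))
```

For example, 30 instances over `["us-east-1a", "us-east-1b", "us-east-1c"]` gives 10 per zone, and `per_zone_after_zone_loss(10, 3)` returns 15, the 50% increase on the slide. With more zones the required increase shrinks, since each zone carries a smaller share.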
  • 34. Region Migration
     (Netflix is working to have this in place during 2011, for international roll-out and disaster recovery)
     • Data is backed up into a different cloud region
        – Cloud bandwidth is much higher than Datacenter
     • Restore to a new region
        – “A few hours” to load data and create databases
     • Create model driven architecture
        – “A few hours” to create service instances and test
     • Send traffic to new region
        – Set up DNS records and start customer service
  • 35. Model Driven Implications
     • Automated “Least Privilege” Security
        – Tightly specified security groups
        – Fine grain IAM keys to access AWS resources
        – Performance tools security and integration
     • Model Driven Performance Monitoring
        – Hundreds of instances appear in a few minutes…
        – Tools have to “garbage collect” dead instances
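When hundreds of instances appear in a few minutes and disappear just as quickly, a monitoring tool cannot rely on a static host list. One way to “garbage collect” dead instances is to expire any instance that has not reported within a time-to-live. A minimal sketch; `InstanceRegistry` and the TTL are hypothetical, and a real tool might instead reconcile against the list of running instances from the AWS APIs.

```python
import time
from typing import Optional


class InstanceRegistry:
    """Remembers when each instance last reported and forgets silent ones."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, instance_id: str, now: Optional[float] = None) -> None:
        """Record that an instance reported in (now defaults to the wall clock)."""
        self.last_seen[instance_id] = time.time() if now is None else now

    def collect_garbage(self, now: Optional[float] = None) -> list[str]:
        """Drop and return instances silent for longer than the TTL."""
        now = time.time() if now is None else now
        dead = [iid for iid, seen in self.last_seen.items()
                if now - seen > self.ttl_seconds]
        for iid in dead:
            del self.last_seen[iid]
        return dead
```

The explicit `now` parameter is only there to make the behaviour easy to test; in normal use the wall clock is read automatically.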
  • 37. Auto Scale Group Configuration
  • 38. Capacity Planning & Monitoring
  • 39. Capacity Planning in Clouds
     (a few things have changed…)
     • Capacity is expensive
     • Capacity takes time to buy and provision
     • Capacity only increases, can’t be shrunk easily
     • Capacity comes in big chunks, paid up front
     • Planning errors can cause big problems
     • Systems are clearly defined assets
     • Systems can be instrumented in detail
     • Depreciate assets over 3 years (reservations!)
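Reserved Instances bring back a small piece of the old up-front purchase decision. Whether a reservation pays off comes down to a break-even point: with an up-front fee U, an on-demand hourly rate and a lower reserved hourly rate, the reservation is cheaper once the instance runs more than U / (on-demand rate - reserved rate) hours during the term. A minimal sketch; the function names and all prices used with it are hypothetical, not actual AWS pricing.

```python
def break_even_hours(upfront: float, on_demand_rate: float, reserved_rate: float) -> float:
    """Hours of use after which a reservation becomes cheaper than on-demand."""
    saving_per_hour = on_demand_rate - reserved_rate
    if saving_per_hour <= 0:
        raise ValueError("reserved rate must be below the on-demand rate")
    return upfront / saving_per_hour


def break_even_utilization(upfront: float, on_demand_rate: float,
                           reserved_rate: float, term_years: int = 3) -> float:
    """Fraction of the term an instance must run for the reservation to pay off."""
    term_hours = term_years * 365 * 24
    return break_even_hours(upfront, on_demand_rate, reserved_rate) / term_hours
```

With hypothetical figures of 1000 up front, 0.50/hour on demand and 0.25/hour reserved, the break-even is 4000 hours, about 15% of a 3-year term of 26,280 hours; capacity that runs continuously is the natural candidate for reservations.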
  • 40. Monitoring Issues
     • Problem
        – Too many tools, each with a good reason to exist
        – Hard to get an integrated view of a problem
        – Too much manual work building dashboards
        – Tools are not discoverable, views are not filtered
     • Solution
        – Get vendors to add deep linking URLs and APIs
        – Integration “portal” ties everything together
        – Underlying dependency database
        – Dynamic portal generation, relevant data, all tools
  • 41. Data Sources
     • External Testing
        – External URL availability and latency alerts and reports – Keynote
        – Stress testing – SOASTA
     • Request Trace Logging
        – Netflix REST calls – Chukwa to DataOven with GUID transaction identifier
        – Generic HTTP – AppDynamics service tier aggregation, end to end tracking
     • Application logging
        – Tracers and counters – log4j, tracer central, Chukwa to DataOven
        – Trackid and Audit/Debug logging – DataOven, AppDynamics GUID cross reference
     • JMX Metrics
        – Application specific real time – Nimsoft, AppDynamics, Epic
        – Service and SLA percentiles – Nimsoft, AppDynamics, Epic, logged to DataOven
     • Tomcat and Apache logs
        – Stdout logs – S3 – DataOven, Nimsoft alerting
        – Standard format Access and Error logs – S3 – DataOven, Nimsoft alerting
     • JVM
        – Garbage Collection – Nimsoft, AppDynamics
        – Memory usage, call stacks, resource/call – AppDynamics
     • Linux
        – System CPU/Net/RAM/Disk metrics – AppDynamics, Epic, Nimsoft alerting
        – SNMP metrics – Epic, Network flows – Fastip
     • AWS
        – Load balancer traffic – Amazon CloudWatch, SimpleDB usage stats
        – System configuration – CPU count/speed and RAM size, overall usage – AWS
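Cross-referencing request trace logs depends on every hop carrying the same GUID transaction identifier. A minimal sketch of that propagation, assuming the GUID travels in an HTTP header; the header name `X-Transaction-GUID` and both helpers are hypothetical, since the slide does not say how the identifier is transported.

```python
import logging
import uuid

# Hypothetical header name; the slide does not say how the GUID is carried.
TRACE_HEADER = "X-Transaction-GUID"


def outgoing_headers(incoming_headers: dict[str, str]) -> dict[str, str]:
    """Reuse the caller's transaction GUID if present, otherwise start a new one."""
    guid = incoming_headers.get(TRACE_HEADER) or str(uuid.uuid4())
    return {TRACE_HEADER: guid}


def trace_logger(guid: str) -> logging.LoggerAdapter:
    """Logger whose records carry the GUID as a 'guid' attribute for formatters."""
    return logging.LoggerAdapter(logging.getLogger("request-trace"), {"guid": guid})
```

With a log format that includes `%(guid)s`, every line written through `trace_logger(guid)` carries the identifier, so log analysis can join entries from different services that handled the same request.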
  • 43. Dashboards Architecture
     • Integrated Dashboard View
        – Single web page containing content from many tools
        – Filtered to highlight most “interesting” data
     • Relevance Controller
        – Drill in, add and remove content interactively
        – Given an application, alert or problem area, dynamically build a dashboard relevant to your role and needs
     • Dependency and Incident Model
        – Model Driven - Interrogates tools and AWS APIs
        – Document store to capture dependency tree and states
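Given an application, the relevance controller needs the set of components it depends on, directly or indirectly, to decide what belongs on the dashboard. A minimal sketch, assuming the dependency document store can be read as a map from each component to the components it calls; `relevant_components` and the example map are hypothetical, using component names from the architecture slide.

```python
from collections import deque


def relevant_components(app: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk: the app plus everything it transitively depends on."""
    found = [app]
    visited = {app}
    queue = deque([app])
    while queue:
        current = queue.popleft()
        for dependency in depends_on.get(current, []):
            if dependency not in visited:  # shared dependencies are listed once
                visited.add(dependency)
                found.append(dependency)
                queue.append(dependency)
    return found
```

For a hypothetical map `{"api": ["discovery", "memcached", "simpledb"], "discovery": ["simpledb"]}`, starting from `"api"` yields `["api", "discovery", "memcached", "simpledb"]`. Breadth-first order puts direct dependencies first, which suits a dashboard that shows the nearest components most prominently.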
  • 44. Dashboard Prototype
     (not everything is integrated yet)
  • 45. AppDynamics
     How to look deep inside your cloud applications
     • Automatic Monitoring
        – Base AMI bakes in all monitoring tools
        – Outbound calls only – no discovery/polling issues
        – Inactive instances removed after a few days
     • Incident Alarms (deviation from baseline)
        – Business Transaction latency and error rate
        – Alarm thresholds discover their own baseline
        – Email contains URL to Incident Workbench UI
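Alarm thresholds that “discover their own baseline” can be approximated by comparing each new measurement with the recent history of the same metric. The sketch below flags a value more than k standard deviations above a rolling mean; it is an assumption for illustration, not AppDynamics’ actual algorithm, and the window size, k and warm-up count are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class BaselineAlarm:
    """Flags values far above a rolling baseline learned from recent history."""

    def __init__(self, window: int = 60, k: float = 3.0, min_samples: int = 10):
        self.history = deque(maxlen=window)  # oldest samples drop off automatically
        self.k = k
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if value exceeds mean + k * stdev of the history, then learn it."""
        alarm = False
        if len(self.history) >= self.min_samples:
            baseline = mean(self.history)
            spread = pstdev(self.history)
            alarm = value > baseline + self.k * spread
        self.history.append(value)
        return alarm
```

Feeding it latencies around 100 ms raises no alarm, while a sudden 150 ms reading does. This sketch learns every value, including the spike itself; a production alarm might exclude alarmed values from its baseline so that a sustained problem does not quietly become the new normal.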
  • 46. Using AppDynamics
     (simple example from early 2010)
  • 47. Point Finger and Assess Impact
     (an async S3 write was slow, no big deal)
  • 48. Monitoring Summary
     • Broken datacenter oriented tools are a big problem
     • Integrating many different tools
        – They are not designed to be integrated
        – We have “persuaded” vendors to add APIs
     • If you can’t see deep inside your app, you’re ☹
  • 50. Implications for IT Operations
     • Cloud is run by developer organization
        – Our IT department is Amazon Cloud
     • Cloud capacity is much bigger than Datacenter
        – Datacenter oriented IT staffing is flat
        – We have no IT staff working on cloud
        – We have moved 3 people out of IT to write code
     • Traditional IT Roles are going away
        – Don’t need SA, DBA, Storage, Network admins
  • 51. Next Few Years…
     • “System of Record” moves to Cloud (now)
        – Master copies of data live only in the cloud, with backups
        – Cut the datacenter to cloud replication link
     • International Expansion – Global Clouds (later in 2011)
        – Rapid deployments to new markets
     • Cloud Standardization?
        – Cloud features and APIs should be a commodity not a differentiator
        – Differentiate on scale and quality of service
        – Competition also drives cost down
        – Higher resilience and scalability
     We would prefer to be an insignificant customer in a giant cloud
  • 52. Takeaway
     Netflix is path-finding the use of public AWS cloud to replace in-house IT for non-trivial applications with hundreds of developers and thousands of systems.
     acockcroft@netflix.com
     http://www.linkedin.com/in/adriancockcroft
     @adrianco #netflixcloud