Global Netflix

Replacing Datacenter Oracle with Global Apache Cassandra on AWS

October 24th, 2011
Adrian Cockcroft
@adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
  
Netflix Inc.

With over 25 million members in the United States, Canada and Latin America, Netflix, Inc. is the world's leading Internet subscription service for enjoying movies and TV shows.

International Expansion

Netflix, Inc., the leading global Internet movie subscription service, today announced it will expand to the United Kingdom and Ireland in early 2012.

Source: http://ir.netflix.com
  
Building a Global Netflix Service

Netflix Cloud Migration
Highly Available and Globally Distributed Data
Scalability and Performance
  
Why Use Public Cloud?

Things We Don’t Do

Better Business Agility
  
Data Center

Netflix could not build new datacenters fast enough

Capacity growth is accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS3, XBox
  
Out-Growing Data Center
http://techblog.netflix.com/2011/02/redesigning-netflix-api.html

[Chart labels: 37x Growth Jan 2010-Jan 2011; Datacenter Capacity]
  
Netflix.com is now ~100% Cloud

A few small back end data sources still in progress
All international product is cloud based
USA specific logistics remains in the Datacenter
Working aggressively on billing, PCI compliance on AWS
  
Netflix Choice was AWS with our own platform and tools

Unique platform requirements and extreme agility and flexibility

Leverage AWS Scale “the biggest public cloud”
AWS investment in features and automation

Use AWS zones and regions for high availability, scalability and global deployment
  
We want to use clouds, we don’t have time to build them

Public cloud for agility and scale

AWS because they are big enough to allocate thousands of instances per hour when we need to
  
Netflix Deployed on AWS

Content          Logs                    Play           WWW              API
Video Masters    S3                      DRM            Sign-Up          Metadata
EC2              EMR Hadoop              CDN routing    Search           Device Config
S3               Hive                    Bookmarks      Movie Choosing   TV Movie Choosing
CDN              Business Intelligence   Logging        Ratings          Mobile iPhone
  
Datacenter Anti-Patterns

What did we do in the datacenter that prevented us from meeting our goals?
  
Old Datacenter vs. New Cloud Arch

Central SQL Database          Distributed Key/Value NoSQL
Sticky In-Memory Session      Shared Memcached Session
Chatty Protocols              Latency Tolerant Protocols
Tangled Service Interfaces    Layered Service Interfaces
Instrumented Code             Instrumented Service Patterns
Fat Complex Objects           Lightweight Serializable Objects
Components as Jar Files       Components as Services
  
The Central SQL Database

•  Datacenter has central Oracle databases
    –  Everything in one place is convenient until it fails
    –  Customers, movies, history, configuration
•  Schema changes require downtime

Anti-pattern impacts scalability, availability
  
The Distributed Key-Value Store

•  Cloud has many key-value data stores
    –  More complex to keep track of, do backups etc.
    –  Each store is much simpler to administer
    –  Joins take place in java code
•  No schema to change, no scheduled downtime
•  Latency for typical queries
    –  Memcached is dominated by network latency <1ms
    –  Cassandra replication takes a few milliseconds
    –  Oracle for simple queries is a few milliseconds
    –  SimpleDB replication and REST auth overheads >10ms
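
The slide notes that joins move into application code once the data lives in separate key-value stores. A minimal sketch of that idea follows, written in Python for brevity rather than Java. The three dicts stand in for separate stores, and every store name, key, and field here is hypothetical, not taken from the talk.

```python
# Three plain dicts stand in for separate key-value stores (hypothetical data).
customers = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}
history = {"c1": ["m10", "m11"], "c2": ["m10"]}
movies = {"m10": {"title": "Movie Ten"}, "m11": {"title": "Movie Eleven"}}


def viewing_history(customer_id):
    """Join customer -> history -> movies in application code.

    Each step is a key lookup; there is no server-side JOIN to lean on.
    """
    customer = customers[customer_id]
    titles = [movies[m]["title"] for m in history.get(customer_id, [])]
    return {"name": customer["name"], "titles": titles}


print(viewing_history("c1"))
```

Each lookup is a separate round trip in a real deployment, which is why the per-store latencies listed on the slide matter.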
  
Data Migration to Cassandra
  
Transitional Steps

•  Bidirectional Replication
    –  Oracle to SimpleDB
    –  Queued reverse path using SQS
    –  Backups remain in Datacenter via Oracle
•  New Cloud-Only Data Sources
    –  Cassandra based
    –  No replication to Datacenter
    –  Backups performed in the cloud
  
API

[Diagram: API on AWS EC2 during the transition. Labeled components: Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, API, Component Services, SQS, memcached, Cassandra on EC2 Internal Disks, S3, SimpleDB, and Replication to Oracle in the Netflix Data Center.]
  
Cutting the Umbilical

•  Transition Oracle Data Sources to Cassandra
    –  Offload Datacenter Oracle hardware
    –  Free up capacity for growth of remaining services
•  Transition SimpleDB+Memcached to Cassandra
    –  Primary data sources that need backup
    –  Keep simplest small use cases for now
•  New challenges
    –  Backup, restore, archive, business continuity
    –  Business Intelligence integration
  
API

[Diagram: API on AWS EC2 after the transition. Labeled components: Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, API, Component Services, memcached, Cassandra on EC2 Internal Disks, Backup to S3, SimpleDB.]
  
High Availability

•  Cassandra stores 3 local copies, 1 per zone
    –  Synchronous access, durable, highly available
    –  Read/Write One fastest, least consistent - ~1ms
    –  Read/Write Quorum 2 of 3, consistent - ~3ms
•  AWS Availability Zones
    –  Separate buildings
    –  Separate power etc.
    –  Close together
	
  
Cassandra Write Data Flows
Single Region, Multiple Availability Zone

1.  Client writes to any Cassandra node
2.  Coordinator node replicates to nodes and zones
3.  Nodes return ack to coordinator
4.  Coordinator returns ack to client
5.  Data written to internal commit log disk

If a node goes offline, hinted handoff completes the write when the node comes back up.

Requests can choose to wait for one node, a quorum, or all nodes to ack the write.

SSTable disk writes and compactions occur asynchronously.

[Diagram: clients and six Cassandra nodes with disks, two in each of Zones A, B and C; arrows numbered 1-5 correspond to the steps above.]
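
The write path and hinted handoff described on this slide can be sketched as a toy, single-threaded simulation. This is an illustration only, not Cassandra's implementation: the `Node` and `Coordinator` classes are hypothetical, replication happens in a loop rather than in parallel, and SSTables and compaction are left out.

```python
class Node:
    def __init__(self, name, zone):
        self.name, self.zone = name, zone
        self.up = True
        self.commit_log = []   # step 5: appended before any SSTable write

    def write(self, key, value):
        self.commit_log.append((key, value))
        return True            # step 3: ack back to the coordinator


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []        # writes held for replicas that were down

    def write(self, key, value, required_acks):
        acks = 0
        for node in self.replicas:         # step 2: replicate to every replica
            if node.up:
                if node.write(key, value):
                    acks += 1
            else:
                self.hints.append((node, key, value))
        return acks >= required_acks       # step 4: ack (or not) to the client

    def replay_hints(self):
        """Deliver held writes to replicas that have come back up."""
        pending = []
        for node, key, value in self.hints:
            if node.up:
                node.write(key, value)
            else:
                pending.append((node, key, value))
        self.hints = pending


nodes = [Node("a", "A"), Node("b", "B"), Node("c", "C")]
nodes[2].up = False                                    # zone C replica is offline
coord = Coordinator(nodes)
ok_quorum = coord.write("k1", "v1", required_acks=2)   # quorum: succeeds
ok_all = coord.write("k2", "v2", required_acks=3)      # all: fails for now
nodes[2].up = True
coord.replay_hints()                                   # node c catches up
print(ok_quorum, ok_all, nodes[2].commit_log)
```

The `required_acks` argument models the choice of waiting for one node, a quorum, or all nodes.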
  
Data Flows for Multi-Region Writes
Consistency Level = Local Quorum

1.  Client writes to any Cassandra node
2.  Coordinator node replicates to other nodes, zones and regions
3.  Local write acks returned to coordinator
4.  Client gets ack when 2 of 3 local nodes are committed
5.  Data written to internal commit log disks
6.  When data arrives, remote node replicates data
7.  Ack direct to source region coordinator
8.  Remote copies written to commit log disks

If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.

[Diagram: US and EU regions, each with clients and Cassandra nodes with disks in Zones A, B and C; 100+ms latency between regions; arrows numbered 1-8 correspond to the steps above.]
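
Step 4 is why the 100+ms inter-region latency does not show up in client write latency: only local replicas count toward the quorum. A small sketch of that rule, using made-up timings and a hypothetical helper name:

```python
def client_ack_time_ms(ack_times_ms, local_region, replicas_per_region=3):
    """Time at which a local-quorum write is acked to the client.

    ack_times_ms: list of (region, milliseconds) pairs, one per replica.
    Only replicas in the coordinator's own region count toward the quorum.
    """
    needed = replicas_per_region // 2 + 1
    local = sorted(ms for region, ms in ack_times_ms if region == local_region)
    if len(local) < needed:
        raise RuntimeError("not enough local replicas acknowledged the write")
    return local[needed - 1]


# Hypothetical timings: three US replicas, three EU replicas 100+ ms away.
acks = [("us", 1.0), ("us", 2.5), ("us", 4.0),
        ("eu", 110.0), ("eu", 112.0), ("eu", 115.0)]
print(client_ack_time_ms(acks, "us"))   # 2.5
```

The client is acked when the second-fastest local replica responds; the EU replicas finish later without holding up the request.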
  
Remote Copies

•  Cassandra duplicates across AWS regions
    –  Asynchronous write, replicates at destination
    –  Doesn’t directly affect local read/write latency
•  Global Coverage
    –  Business agility
    –  Follow AWS…
•  Local Access
    –  Better latency
    –  Fault Isolation
  
Cassandra Backup

•  Full Backup
    –  Time based snapshot
    –  SSTable compress -> S3
•  Incremental
    –  SSTable write triggers compressed copy to S3
•  Continuous Option
    –  Scrape commit log
    –  Write to EBS every 30s

[Diagram: ring of Cassandra nodes with Backup to S3]
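
The incremental backup bullet describes a callback: each newly written SSTable triggers a compressed copy to S3. A minimal sketch of that shape follows. `FakeObjectStore` is an in-memory stand-in for an S3 bucket, and the callback name, key layout, and file name are assumptions made for illustration, not Cassandra or AWS APIs.

```python
import gzip


class FakeObjectStore:
    """In-memory stand-in for an S3 bucket (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data


def on_sstable_written(store, node_id, sstable_name, sstable_bytes):
    """Callback fired after an SSTable is flushed: compress it and upload it."""
    key = f"{node_id}/{sstable_name}.gz"
    store.put(key, gzip.compress(sstable_bytes))
    return key


store = FakeObjectStore()
key = on_sstable_written(store, "node-1", "users-1-Data.db", b"row data" * 100)
print(key, len(store.objects[key]))
```

Because SSTables are immutable once written, each file only needs to be copied once.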
  
Cassandra Restore

•  Full Restore
    –  Replace previous data
•  New Ring from Backup
    –  New name old data
•  Scripted
    –  Create new instances
    –  Parallel load - fast

[Diagram: ring of Cassandra nodes restored from S3 Backup]
  
Cassandra Online Analytics

•  Brisk = Hadoop + Cass
    –  Use split Brisk ring
    –  Size each separately
•  Direct Access
    –  Keyspaces
    –  Hive/Pig/Map-Reduce
    –  Hdfs as a keyspace
    –  Distributed namenode

[Diagram: ring of Cassandra and Brisk nodes with Backup to S3]
  
Cassandra Archive

Appropriate level of paranoia needed…

•  Archive could be un-readable
    –  Restore S3 backups weekly from prod to test
•  Archive could be stolen
    –  PGP Encrypt archive
•  AWS East Region could have a problem
    –  Copy data to AWS West
•  Production AWS Account could have an issue
    –  Separate Archive account with no-delete S3 ACL
•  AWS S3 could have a global problem
    –  Create an extra copy on a different cloud vendor
  
Tools and Automation

•  Developer and Build Tools
    –  Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
    –  Builds, creates .war file, .rpm, bakes AMI and launches
•  Custom Netflix Application Console
    –  AWS Features at Enterprise Scale (hide the AWS security keys!)
    –  Auto Scaler Group is unit of deployment to production
•  Open Source + Support
    –  Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
    –  Datastax support for Cassandra, AWS support for Hadoop via EMR
•  Monitoring Tools
    –  Datastax Opscenter for monitoring Cassandra
    –  AppDynamics – Developer focus for cloud http://appdynamics.com
  
Developer Migration

•  Detailed SQL to NoSQL Transition Advice
    –  Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems
    –  Blog - http://practicalcloudcomputing.com/
    –  Download Paper PDF - http://bit.ly/bhOTLu
•  Mark Atwood, "Guide to NoSQL, redux"
    –  YouTube http://youtu.be/zAbFRiyT3LU
  
Cloud Operations

Cassandra Use Cases
Model Driven Architecture
Performance and Scalability
  
Cassandra Use Cases

•  Key by Customer – Cross-region clusters
    –  Many app specific Cassandra clusters, read-intensive
    –  Keys+Rows in memory using m2.4xl Instances
•  Key by Customer:Movie – e.g. Viewing History
    –  Growing fast, write intensive – m1.xl instances
    –  Keys cached in memory, one cluster per region
•  Large scale data logging – lots of writes
    –  Column data expires after time period
    –  Distributed counters, one cluster per region
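
The logging use case relies on column data expiring after a time period. The toy model below illustrates the idea of a per-column time-to-live; it is an in-memory sketch with a hypothetical class name, not how Cassandra stores or purges expired columns. The caller passes `now` explicitly so the behaviour is deterministic.

```python
class ExpiringColumns:
    """Toy row whose columns disappear once their time-to-live has passed."""

    def __init__(self):
        self._cols = {}   # column name -> (value, expires_at)

    def put(self, name, value, ttl_seconds, now):
        self._cols[name] = (value, now + ttl_seconds)

    def get(self, name, now):
        value, expires_at = self._cols.get(name, (None, 0))
        return value if now < expires_at else None


row = ExpiringColumns()
row.put("event-1", "play started", ttl_seconds=60, now=0)
print(row.get("event-1", now=59))   # play started
print(row.get("event-1", now=60))   # None
```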
  
Model Driven Architecture

•  Datacenter Practices
    –  Lots of unique hand-tweaked systems
    –  Hard to enforce patterns
•  Model Driven Cloud Architecture
    –  Perforce/Ivy/Jenkins based builds for everything
    –  Every production instance is a pre-baked AMI
    –  Every application is managed by an Autoscaler

Every change is a new AMI
  
Netflix Platform Cassandra AMI

•  Tomcat server
    –  Always running, registers with platform
    –  Manages Cassandra state, tokens, backups
•  Removed Root Disk Dependency on EBS
    –  Use S3 backed AMI for stateful services
    –  Normally use EBS backed AMI for fast provisioning
  
Chaos Monkey

•  Make sure systems are resilient
    –  Allow any instance to fail without customer impact
•  Chaos Monkey hours
    –  Monday-Thursday 9am-3pm random instance kill
•  Application configuration option
    –  Apps now have to opt-out from Chaos Monkey
•  Computers (Datacenter or AWS) randomly die
    –  Fact of life, but too infrequent to test resiliency
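
The rules on this slide (a Monday-Thursday 9am-3pm window, a random instance kill, opt-out per application) can be sketched as a small selection function. This illustrates the policy only and is not Netflix's Chaos Monkey code: the function names, instance ids, and app names are hypothetical, and nothing is actually terminated.

```python
import random
from datetime import datetime


def in_chaos_hours(now):
    """Monday-Thursday (weekday 0-3), from 9am up to but not including 3pm."""
    return now.weekday() <= 3 and 9 <= now.hour < 15


def pick_victim(instances, opted_out_apps, now, rng=random):
    """Return one instance id to terminate, or None if nothing is eligible.

    instances: list of (instance_id, app_name) pairs.
    """
    if not in_chaos_hours(now):
        return None
    eligible = [i for i, app in instances if app not in opted_out_apps]
    return rng.choice(eligible) if eligible else None


instances = [("i-1", "api"), ("i-2", "www"), ("i-3", "billing")]
monday = datetime(2011, 10, 24, 10, 0)   # a Monday morning
friday = datetime(2011, 10, 28, 10, 0)   # outside Chaos Monkey hours
print(pick_victim(instances, {"billing"}, monday))
print(pick_victim(instances, {"billing"}, friday))   # None
```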
  
AppDynamics Monitoring of Cassandra – Automatic Discovery
  
Netflix Contributions to Cassandra

•  Cassandra as a mutable toolkit
    –  Cassandra is in Java, pluggable, well structured
    –  Netflix has a building full of Java engineers….
•  Actual Contributions delivered in 0.8
    –  First prototype of off-heap row cache
    –  Incremental backup SSTable write callback
•  Work In Progress
    –  AWS integration and backup using Tomcat helper
    –  Astyanax re-write of Hector Java client library
  
Performance Testing

•  Cloud Based Testing – frictionless, elastic
    –  Create/destroy any sized cluster in minutes
    –  Many test scenarios run in parallel
•  Test Scenarios
    –  Internal app specific tests
    –  Simple “stress” tool provided with Cassandra
•  Scale test, keep making the cluster bigger
    –  Check that tooling and automation works…
    –  How many ten column row writes/sec can we do?
  
<DrEvil>ONE MILLION</DrEvil>
  
Scale-Up Linearity
Client Writes/s by node count – Replication Factor = 3

[Chart: x-axis node count, 0 to 350; y-axis client writes/s, 0 to 1,200,000. Labeled data points: 174373, 366828, 537172, 1099837.]
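
One way to read the linearity claim is to divide each data point by its node count and compare against the smallest cluster. The sketch below does that. The write rates are the chart labels; pairing them with 48, 96, 144 and 288 nodes is taken from the test tables later in the deck, and the "efficiency" ratio is a metric chosen here, not one from the talk.

```python
# (node count, client writes/s) pairs.
points = [(48, 174373), (96, 366828), (144, 537172), (288, 1099837)]

base_nodes, base_writes = points[0]
base_per_node = base_writes / base_nodes

efficiencies = []
for nodes, writes in points:
    per_node = writes / nodes
    efficiency = per_node / base_per_node   # 1.0 means perfectly linear
    efficiencies.append(efficiency)
    print(f"{nodes:>3} nodes: {per_node:7.0f} client writes/s per node, "
          f"efficiency {efficiency:.2f}")
```

A ratio that stays at or slightly above 1.0 as the cluster grows is what linear scale-up looks like in these terms.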
  
Per Node Activity

Per Node              48 Nodes      96 Nodes      144 Nodes     288 Nodes
Per Server Writes/s   10,900 w/s    11,460 w/s    11,900 w/s    11,456 w/s
Mean Server Latency   0.0117 ms     0.0134 ms     0.0148 ms     0.0139 ms
Mean CPU %Busy        74.4 %        75.4 %        72.5 %        81.5 %
Disk Read             5,600 KB/s    4,590 KB/s    4,060 KB/s    4,280 KB/s
Disk Write            12,800 KB/s   11,590 KB/s   10,380 KB/s   10,080 KB/s
Network Read          22,460 KB/s   23,610 KB/s   21,390 KB/s   23,640 KB/s
Network Write         18,600 KB/s   19,600 KB/s   17,810 KB/s   19,770 KB/s

Node specification – Xen Virtual Images, AWS US East, three zones
•  Cassandra 0.8.6, CentOS, SunJDK6
•  AWS EC2 m1 Extra Large – Standard price $ 0.68/Hour
•  15 GB RAM, 4 Cores, 1Gbit network
•  4 internal disks (total 1.6TB, striped together, md, XFS)
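
The per-server write rates are roughly three times the client write rate divided by the node count, which fits a replication factor of 3: every client write lands on three servers. This reading is an inference from the numbers, not something the slide states. The sketch below shows the arithmetic; the client write rates are the chart values from the scale-up test.

```python
REPLICATION_FACTOR = 3


def per_server_writes(client_writes_per_s, nodes, rf=REPLICATION_FACTOR):
    """Each client write lands on rf replicas, spread evenly over the nodes."""
    return client_writes_per_s * rf / nodes


print(round(per_server_writes(174373, 48)))     # 10898, table shows 10,900
print(round(per_server_writes(366828, 96)))     # 11463, table shows 11,460
print(round(per_server_writes(1099837, 288)))   # 11457, table shows 11,456
```

Under the same reading the 144-node run works out to about 11,190 writes/s per server, which does not match the 11,900 in the table as closely as the other three columns do.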
  
Time is Money

                        48 nodes      96 nodes      144 nodes     288 nodes
Writes Capacity         174,373 w/s   366,828 w/s   537,172 w/s   1,099,837 w/s
Storage Capacity        12.8 TB       25.6 TB       38.4 TB       76.8 TB
Nodes Cost/hr           $32.64        $65.28        $97.92        $195.84
Test Driver Instances   10            20            30            60
Test Driver Cost/hr     $20.00        $40.00        $60.00        $120.00
Cross AZ Traffic        5 TB/hr       10 TB/hr      15 TB/hr      30 TB/hr [1]
Traffic Cost/10min      $8.33         $16.66        $25.00        $50.00
Setup Duration          15 minutes    22 minutes    31 minutes    66 minutes [2]
AWS Billed Duration     1 hr          1 hr          1 hr          2 hr
Total Test Cost         $60.97        $121.94       $182.92       $561.68

[1] Estimate two thirds of total network traffic
[2] Workaround for a tooling bug slowed setup
  

Takeaway

    Netflix is using Cassandra on AWS as a key
    infrastructure component of its globally
    distributed streaming product.

    Also, benchmarking in the cloud is fast, cheap and scalable.

             http://www.linkedin.com/in/adriancockcroft
                     @adrianco #netflixcloud
                     acockcroft@netflix.com

Amazon Cloud Terminology Reference
     See http://aws.amazon.com/ This is not a full list of Amazon Web Service features

•    AWS – Amazon Web Services (common name for Amazon cloud)
•    AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•    EC2 – Elastic Compute Cloud
       –    Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
       –    Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
       –    Reserved Instances – pre-paid to reduce cost for long term usage
       –    Availability Zone – datacenter with own power and cooling hosting cloud instances
       –    Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
•    ASG – Auto Scaling Group (instances booting from the same AMI)
•    S3 – Simple Storage Service (http access)
•    EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
•    RDS – Relational Database Service (managed MySQL master and slaves)
•    SDB – Simple Data Base (hosted http based NoSQL data store)
•    SQS – Simple Queue Service (http based message queue)
•    SNS – Simple Notification Service (http and email based topics and messages)
•    EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•    ELB – Elastic Load Balancer
•    EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•    VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
•    IAM – Identity and Access Management (fine grain role based security keys)

Weitere ähnliche Inhalte

Was ist angesagt?

AmebaPico 裏側の技術やAWSの利用について
AmebaPico 裏側の技術やAWSの利用についてAmebaPico 裏側の技術やAWSの利用について
AmebaPico 裏側の技術やAWSの利用についてKohei Morino
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
Netflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksNetflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksSudhir Tonse
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Sid Anand
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Millicomputing Ignite Talk
Millicomputing Ignite TalkMillicomputing Ignite Talk
Millicomputing Ignite TalkAdrian Cockcroft
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudIntuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudSid Anand
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Adrian Cockcroft
 
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialStuart Charlton
 
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...Amazon Web Services
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012Amazon Web Services
 

Was ist angesagt? (20)

AmebaPico 裏側の技術やAWSの利用について
AmebaPico 裏側の技術やAWSの利用についてAmebaPico 裏側の技術やAWSの利用について
AmebaPico 裏側の技術やAWSの利用について
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Netflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksNetflix Cloud Platform Building Blocks
Netflix Cloud Platform Building Blocks
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Millicomputing Ignite Talk
Millicomputing Ignite TalkMillicomputing Ignite Talk
Millicomputing Ignite Talk
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudIntuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
 
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
 
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
 

Andere mochten auch

AddThis: Scaling Cassandra up and down into containers with ZFS
AddThis: Scaling Cassandra up and down into containers with ZFSAddThis: Scaling Cassandra up and down into containers with ZFS
AddThis: Scaling Cassandra up and down into containers with ZFSDataStax Academy
 
Firebird Scalability, by Dmitry Yemanov (in English)
Firebird Scalability, by Dmitry Yemanov (in English)Firebird Scalability, by Dmitry Yemanov (in English)
Firebird Scalability, by Dmitry Yemanov (in English)Alexey Kovyazin
 
Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5Alexey Kovyazin
 
Firebird's Big Databases (in English)
Firebird's Big Databases (in English)Firebird's Big Databases (in English)
Firebird's Big Databases (in English)Alexey Kovyazin
 
FBScanner: IBSurgeon's tool to solve all types of performance problems with F...
FBScanner: IBSurgeon's tool to solve all types of performance problems with F...FBScanner: IBSurgeon's tool to solve all types of performance problems with F...
FBScanner: IBSurgeon's tool to solve all types of performance problems with F...Alexey Kovyazin
 
Fail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something moreFail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something moreAlexey Kovyazin
 
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...DataStax Academy
 
Cassandra Performance Benchmark
Cassandra Performance BenchmarkCassandra Performance Benchmark
Cassandra Performance BenchmarkBigstep
 
Life with big Firebird databases
Life with big Firebird databasesLife with big Firebird databases
Life with big Firebird databasesAlexey Kovyazin
 
High-load performance testing: Firebird 2.5, 3.0, 4.0
High-load performance testing:  Firebird 2.5, 3.0, 4.0High-load performance testing:  Firebird 2.5, 3.0, 4.0
High-load performance testing: Firebird 2.5, 3.0, 4.0Alexey Kovyazin
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problemsAlexey Kovyazin
 
Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)Prasan Samtani
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 

Andere mochten auch (15)

AddThis: Scaling Cassandra up and down into containers with ZFS
AddThis: Scaling Cassandra up and down into containers with ZFSAddThis: Scaling Cassandra up and down into containers with ZFS
AddThis: Scaling Cassandra up and down into containers with ZFS
 
Firebird Scalability, by Dmitry Yemanov (in English)
Firebird Scalability, by Dmitry Yemanov (in English)Firebird Scalability, by Dmitry Yemanov (in English)
Firebird Scalability, by Dmitry Yemanov (in English)
 
Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5
 
Firebird's Big Databases (in English)
Firebird's Big Databases (in English)Firebird's Big Databases (in English)
Firebird's Big Databases (in English)
 
FBScanner: IBSurgeon's tool to solve all types of performance problems with F...
FBScanner: IBSurgeon's tool to solve all types of performance problems with F...FBScanner: IBSurgeon's tool to solve all types of performance problems with F...
FBScanner: IBSurgeon's tool to solve all types of performance problems with F...
 
Fail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something moreFail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something more
 
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
 
Cassandra Performance Benchmark
Cassandra Performance BenchmarkCassandra Performance Benchmark
Cassandra Performance Benchmark
 
Life with big Firebird databases
Life with big Firebird databasesLife with big Firebird databases
Life with big Firebird databases
 
High-load performance testing: Firebird 2.5, 3.0, 4.0
High-load performance testing:  Firebird 2.5, 3.0, 4.0High-load performance testing:  Firebird 2.5, 3.0, 4.0
High-load performance testing: Firebird 2.5, 3.0, 4.0
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problems
 
Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)Inside Hulu's Data platform (BigDataCamp LA 2013)
Inside Hulu's Data platform (BigDataCamp LA 2013)
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 

Ähnlich wie Global Netflix Migrates Data from Oracle to Apache Cassandra on AWS

Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qconYiwei Ma
 
Cloud computing with AWS
Cloud computing with AWS Cloud computing with AWS
Cloud computing with AWS ikanow
 
Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012CLOUDIAN KK
 
The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source PlatformRuslan Meshenberg
 
O'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Media
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...IndicThreads
 
Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013MassTLC
 
Oracle cloud environment architecture orientation
Oracle cloud environment  architecture orientationOracle cloud environment  architecture orientation
Oracle cloud environment architecture orientationOsama Abdullah
 
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebCloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebjineshvaria
 
Amazon Ec2 Application Design
Amazon Ec2 Application DesignAmazon Ec2 Application Design
Amazon Ec2 Application Designguestd0b61e
 
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSAcquia
 
OSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesMatt Ray
 
Oracle Solutions on AWS : May 2014
Oracle Solutions on AWS : May 2014Oracle Solutions on AWS : May 2014
Oracle Solutions on AWS : May 2014Tom Laszewski
 
On Metal - The Future Of Hybrid Cloud
On Metal - The Future Of Hybrid CloudOn Metal - The Future Of Hybrid Cloud
On Metal - The Future Of Hybrid CloudRackspace Asia
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your StartupAmazon Web Services
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaAmazon Web Services
 
Daneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver Meetup
Daneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver MeetupDaneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver Meetup
Daneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver MeetupShannon McFarland
 

Ähnlich wie Global Netflix Migrates Data from Oracle to Apache Cassandra on AWS (20)

Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qcon
 
Cloud computing with AWS
Cloud computing with AWS Cloud computing with AWS
Cloud computing with AWS
 
Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012
 
The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source Platform
 
O'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The Cloud
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
 
Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013Netflix presents at MassTLC Cloud Summit 2013
Netflix presents at MassTLC Cloud Summit 2013
 
Oracle cloud environment architecture orientation
Oracle cloud environment  architecture orientationOracle cloud environment  architecture orientation
Oracle cloud environment architecture orientation
 
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebCloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWeb
 
Amazon Ec2 Application Design
Amazon Ec2 Application DesignAmazon Ec2 Application Design
Amazon Ec2 Application Design
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWS
 
OSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best Practices
 
Oracle Solutions on AWS : May 2014
Oracle Solutions on AWS : May 2014Oracle Solutions on AWS : May 2014
Oracle Solutions on AWS : May 2014
 
On Metal - The Future Of Hybrid Cloud
On Metal - The Future Of Hybrid CloudOn Metal - The Future Of Hybrid Cloud
On Metal - The Future Of Hybrid Cloud
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
 
Daneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver Meetup
Daneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver MeetupDaneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver Meetup
Daneyon Hansen - Intro to OpenStack - Feb13 OpenStack Denver Meetup
 

Mehr von Adrian Cockcroft

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is uselessAdrian Cockcroft
 

Mehr von Adrian Cockcroft (9)

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Migrating to Public Cloud
Migrating to Public CloudMigrating to Public Cloud
Migrating to Public Cloud
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is useless
 
NoSQL for Netflix
NoSQL for NetflixNoSQL for Netflix
NoSQL for Netflix
 

Kürzlich hochgeladen

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Global Netflix Migrates Data from Oracle to Apache Cassandra on AWS

  • 1. Global Netflix: Replacing Datacenter Oracle with Global Apache Cassandra on AWS. October 24th, 2011. Adrian Cockcroft, @adrianco #netflixcloud, http://www.linkedin.com/in/adriancockcroft
  • 2. Netflix Inc. With over 25 million members in the United States, Canada and Latin America, Netflix, Inc. is the world's leading Internet subscription service for enjoying movies and TV shows. International Expansion: Netflix, Inc., the leading global Internet movie subscription service, today announced it will expand to the United Kingdom and Ireland in early 2012. Source: http://ir.netflix.com
  • 3. Building a Global Netflix Service: Netflix Cloud Migration; Highly Available and Globally Distributed Data; Scalability and Performance
  • 4. Why Use Public Cloud?
  • 7. Data Center: Netflix could not build new datacenters fast enough. Capacity growth is accelerating, unpredictable. Product launch spikes - iPhone, Wii, PS3, XBox
  • 8. Out-Growing Data Center. http://techblog.netflix.com/2011/02/redesigning-netflix-api.html [Chart: 37x growth, Jan 2010-Jan 2011, versus datacenter capacity]
  • 9. Netflix.com is now ~100% Cloud
      A few small back end data sources still in progress
      All international product is cloud based
      USA specific logistics remains in the Datacenter
      Working aggressively on billing, PCI compliance on AWS
  • 10. Netflix Choice was AWS with our own platform and tools. Unique platform requirements and extreme agility and flexibility
  • 11. Leverage AWS Scale: “the biggest public cloud”. AWS investment in features and automation. Use AWS zones and regions for high availability, scalability and global deployment
  • 12. We want to use clouds, we don’t have time to build them. Public cloud for agility and scale. AWS because they are big enough to allocate thousands of instances per hour when we need to
  • 13. Netflix Deployed on AWS [Diagram, five columns of services]
      Content: Video Masters, EC2, S3, CDN
      Logs: S3, EMR Hadoop, Hive, Business Intelligence
      Play: DRM, CDN routing, Bookmarks, Logging
      WWW: Sign-Up, Search, Movie Choosing, Ratings
      API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone
  • 14. Datacenter Anti-Patterns: What did we do in the datacenter that prevented us from meeting our goals?
  • 15. Old Datacenter vs. New Cloud Arch
      Central SQL Database -> Distributed Key/Value NoSQL
      Sticky In-Memory Session -> Shared Memcached Session
      Chatty Protocols -> Latency Tolerant Protocols
      Tangled Service Interfaces -> Layered Service Interfaces
      Instrumented Code -> Instrumented Service Patterns
      Fat Complex Objects -> Lightweight Serializable Objects
      Components as Jar Files -> Components as Services
  • 16. The Central SQL Database
      • Datacenter has central Oracle databases
        – Everything in one place is convenient until it fails
        – Customers, movies, history, configuration
      • Schema changes require downtime
      Anti-pattern impacts scalability, availability
  • 17. The Distributed Key-Value Store [Image label: DBA]
      • Cloud has many key-value data stores
        – More complex to keep track of, do backups etc.
        – Each store is much simpler to administer
        – Joins take place in Java code
      • No schema to change, no scheduled downtime
      • Latency for typical queries
        – Memcached is dominated by network latency <1ms
        – Cassandra replication takes a few milliseconds
        – Oracle for simple queries is a few milliseconds
        – SimpleDB replication and REST auth overheads >10ms
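The point that joins move out of the database and into application code can be sketched as follows. This is a minimal illustration only: it is written in Python for brevity (the slide names Java), the three dictionaries are hypothetical stand-ins for separate key-value stores, and none of the names come from the talk.

```python
from typing import Dict, List

# Hypothetical stand-ins for three independent key-value stores.
customers: Dict[str, dict] = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}
viewing_history: Dict[str, List[str]] = {"c1": ["m10", "m11"], "c2": []}
movies: Dict[str, dict] = {
    "m10": {"title": "Movie Ten"},
    "m11": {"title": "Movie Eleven"},
}


def titles_watched(customer_id: str) -> List[str]:
    """Join viewing history to movie metadata in application code.

    Each step is a plain key lookup; no store knows about the others.
    """
    movie_ids = viewing_history.get(customer_id, [])
    return [movies[m]["title"] for m in movie_ids if m in movies]
```

Each lookup here would be a network round trip in a real deployment, which is why the per-store latencies listed on the slide matter for this style of access.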
  • 18. Data Migration to Cassandra
  • 19. Transitional Steps
      • Bidirectional Replication
        – Oracle to SimpleDB
        – Queued reverse path using SQS
        – Backups remain in Datacenter via Oracle
      • New Cloud-Only Data Sources
        – Cassandra based
        – No replication to Datacenter
        – Backups performed in the cloud
  • 20. [Architecture diagram, transitional state. AWS EC2: Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, API, Component Services, SQS, memcached, Cassandra on EC2 Internal Disks, S3, SimpleDB. Netflix Data Center: Oracle databases, memcached, with Replication between the data center and the cloud]
  • 21. Cutting the Umbilical
      • Transition Oracle Data Sources to Cassandra
        – Offload Datacenter Oracle hardware
        – Free up capacity for growth of remaining services
      • Transition SimpleDB+Memcached to Cassandra
        – Primary data sources that need backup
        – Keep simplest small use cases for now
      • New challenges
        – Backup, restore, archive, business continuity
        – Business Intelligence integration
  • 22. [Architecture diagram, cloud-only state. AWS EC2: Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, API, Component Services, memcached, Cassandra on EC2 Internal Disks, Backup to S3, SimpleDB]
  • 23. High Availability
      • Cassandra stores 3 local copies, 1 per zone
        – Synchronous access, durable, highly available
        – Read/Write One fastest, least consistent - ~1ms
        – Read/Write Quorum 2 of 3, consistent - ~3ms
      • AWS Availability Zones
        – Separate buildings
        – Separate power etc.
        – Close together
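The "Quorum 2 of 3, consistent" trade-off above follows from simple arithmetic on the replica count. A minimal sketch, assuming the usual rule that a read is guaranteed to see the latest write when read and write acknowledgements together exceed the replication factor; the function names are illustrative and are not Cassandra APIs.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when every read set must overlap every write set."""
    return write_acks + read_acks > replication_factor


RF = 3  # three local copies, one per zone, as on the slide
q = quorum(RF)
one_one = reads_see_latest_write(RF, 1, 1)        # Read/Write One
quorum_quorum = reads_see_latest_write(RF, q, q)  # Read/Write Quorum
```

With three copies, reading and writing at One leaves room for a read to hit a replica the write has not reached yet, while Quorum on both sides forces at least one replica in common.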
  • 24. Cassandra Write Data Flows: Single Region, Multiple Availability Zone
      1. Client writes to any Cassandra node
      2. Coordinator node replicates to nodes and zones
      3. Nodes return ack to coordinator
      4. Coordinator returns ack to client
      5. Data written to internal commit log disk
      If a node goes offline, hinted handoff completes the write when the node comes back up.
      Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
      SSTable disk writes and compactions occur asynchronously.
      [Diagram: clients and six Cassandra nodes with disks, two each in Zones A, B and C, with arrows numbered by step]
  • 25. Data Flows for Multi-Region Writes: Consistency Level = Local Quorum
      1. Client writes to any Cassandra node
      2. Coordinator node replicates to other nodes, zones and regions
      3. Local write acks returned to coordinator
      4. Client gets ack when 2 of 3 local nodes are committed
      5. Data written to internal commit log disks
      6. When data arrives, remote node replicates data
      7. Ack direct to source region coordinator
      8. Remote copies written to commit log disks
      If a node or region goes offline, hinted handoff completes the write when the node comes back up.
      Nightly global compare and repair jobs ensure everything stays consistent.
      [Diagram: US and EU clients, each with six Cassandra nodes across Zones A, B and C, linked with 100+ms latency, arrows numbered by step]
  • 26. Remote Copies
      • Cassandra duplicates across AWS regions
        – Asynchronous write, replicates at destination
        – Doesn’t directly affect local read/write latency
      • Global Coverage
        – Business agility
        – Follow AWS…
      • Local Access
        – Better latency
        – Fault Isolation
  • 27. Cassandra Backup [Diagram: ring of Cassandra nodes feeding an S3 Backup]
      • Full Backup
        – Time based snapshot
        – SSTable compress -> S3
      • Incremental
        – SSTable write triggers compressed copy to S3
      • Continuous Option
        – Scrape commit log
        – Write to EBS every 30s
  • 28. Cassandra Restore [Diagram: S3 Backup feeding a ring of Cassandra nodes]
      • Full Restore
        – Replace previous data
      • New Ring from Backup
        – New name old data
      • Scripted
        – Create new instances
        – Parallel load - fast
  • 29. Cassandra Online Analytics [Diagram: ring of Cassandra and Brisk nodes with S3 Backup]
      • Brisk = Hadoop + Cass
        – Use split Brisk ring
        – Size each separately
      • Direct Access
        – Keyspaces
        – Hive/Pig/Map-Reduce
        – Hdfs as a keyspace
        – Distributed namenode
  • 30. Cassandra Archive: Appropriate level of paranoia needed…
      • Archive could be un-readable
        – Restore S3 backups weekly from prod to test
      • Archive could be stolen
        – PGP Encrypt archive
      • AWS East Region could have a problem
        – Copy data to AWS West
      • Production AWS Account could have an issue
        – Separate Archive account with no-delete S3 ACL
      • AWS S3 could have a global problem
        – Create an extra copy on a different cloud vendor
  • 31. Tools and Automation
      • Developer and Build Tools
        – Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
        – Builds, creates .war file, .rpm, bakes AMI and launches
      • Custom Netflix Application Console
        – AWS Features at Enterprise Scale (hide the AWS security keys!)
        – Auto Scaler Group is unit of deployment to production
      • Open Source + Support
        – Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
        – Datastax support for Cassandra, AWS support for Hadoop via EMR
      • Monitoring Tools
        – Datastax Opscenter for monitoring Cassandra
        – AppDynamics – Developer focus for cloud http://appdynamics.com
  • 32. Developer Migration
      • Detailed SQL to NoSQL Transition Advice
        – Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems
        – Blog - http://practicalcloudcomputing.com/
        – Download Paper PDF - http://bit.ly/bhOTLu
      • Mark Atwood, "Guide to NoSQL, redux”
        – YouTube http://youtu.be/zAbFRiyT3LU
  • 33. Cloud Operations: Cassandra Use Cases; Model Driven Architecture; Performance and Scalability
  • 34. Cassandra Use Cases
      • Key by Customer – Cross-region clusters
        – Many app specific Cassandra clusters, read-intensive
        – Keys+Rows in memory using m2.4xl Instances
      • Key by Customer:Movie – e.g. Viewing History
        – Growing fast, write intensive – m1.xl instances
        – Keys cached in memory, one cluster per region
      • Large scale data logging – lots of writes
        – Column data expires after time period
        – Distributed counters, one cluster per region
  • 35. Model Driven Architecture
      • Datacenter Practices
        – Lots of unique hand-tweaked systems
        – Hard to enforce patterns
      • Model Driven Cloud Architecture
        – Perforce/Ivy/Jenkins based builds for everything
        – Every production instance is a pre-baked AMI
        – Every application is managed by an Autoscaler
      Every change is a new AMI
  • 36. Netflix Platform Cassandra AMI
      • Tomcat server
        – Always running, registers with platform
        – Manages Cassandra state, tokens, backups
      • Removed Root Disk Dependency on EBS
        – Use S3 backed AMI for stateful services
        – Normally use EBS backed AMI for fast provisioning
  • 37. Chaos Monkey
      • Make sure systems are resilient
        – Allow any instance to fail without customer impact
      • Chaos Monkey hours
        – Monday-Thursday 9am-3pm random instance kill
      • Application configuration option
        – Apps now have to opt-out from Chaos Monkey
      • Computers (Datacenter or AWS) randomly die
        – Fact of life, but too infrequent to test resiliency
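The schedule and opt-out rule above can be sketched as a small selection routine. This is a hypothetical illustration, not Netflix's Chaos Monkey: the function names, the instance identifiers, and the choice to treat 3pm as an exclusive bound are all assumptions, and nothing here terminates a real instance.

```python
import random
from datetime import datetime
from typing import Iterable, Optional, Set


def in_chaos_window(now: datetime) -> bool:
    """Monday-Thursday, 9am up to (not including) 3pm local time."""
    is_mon_to_thu = now.weekday() <= 3  # Monday is 0, Thursday is 3
    return is_mon_to_thu and 9 <= now.hour < 15


def pick_victim(instances: Iterable[str],
                opted_out: Set[str],
                now: datetime,
                rng: random.Random) -> Optional[str]:
    """Choose one random instance to kill, or None outside the window.

    Instances are eligible by default; only explicit opt-outs are skipped.
    """
    if not in_chaos_window(now):
        return None
    candidates = sorted(i for i in instances if i not in opted_out)
    return rng.choice(candidates) if candidates else None
```

Restricting kills to weekday working hours means failures are induced when engineers are around to respond, which fits the slide's point that natural failures are too infrequent to test resiliency.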
  • 38. AppDynamics Monitoring of Cassandra – Automatic Discovery
  • 39. Netflix Contributions to Cassandra
      • Cassandra as a mutable toolkit
        – Cassandra is in Java, pluggable, well structured
        – Netflix has a building full of Java engineers….
      • Actual Contributions delivered in 0.8
        – First prototype of off-heap row cache
        – Incremental backup SSTable write callback
      • Work In Progress
        – AWS integration and backup using Tomcat helper
        – Astyanax re-write of Hector Java client library
  • 40. Performance Testing
      • Cloud Based Testing – frictionless, elastic
        – Create/destroy any sized cluster in minutes
        – Many test scenarios run in parallel
      • Test Scenarios
        – Internal app specific tests
        – Simple “stress” tool provided with Cassandra
      • Scale test, keep making the cluster bigger
        – Check that tooling and automation works…
        – How many ten column row writes/sec can we do?
  • 41.
  • 42.
  • 44. Scale-Up Linearity: Client Writes/s by node count – Replication Factor = 3 [Chart: client writes/s (0 to 1,200,000) against node count (0 to 350); labeled points 174373, 366828, 537172, 1099837]
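The linearity claim can be checked by dividing each labeled throughput by its cluster size. A minimal sketch, assuming the four labeled points correspond to the 48, 96, 144 and 288 node runs listed in the cost table later in the deck; the variable names are illustrative.

```python
# Client writes/s per cluster size (node counts taken from the later cost table).
client_writes_per_sec = {48: 174_373, 96: 366_828, 144: 537_172, 288: 1_099_837}

# Throughput contributed by each node at every cluster size.
per_node = {n: w / n for n, w in client_writes_per_sec.items()}

# Ratio to the smallest cluster: values near 1.0 mean linear scale-up.
baseline = per_node[48]
relative = {n: v / baseline for n, v in per_node.items()}

for n in sorted(per_node):
    print(f"{n:>3} nodes: {per_node[n]:8.0f} client writes/s per node "
          f"({relative[n]:.2f}x baseline)")
```

If per-node throughput stays roughly flat as nodes are added, total throughput grows in proportion to cluster size, which is what the chart title asserts.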
  • 45.
  • 46.
  • 47. Per Node Activity
      Per Node              48 Nodes     96 Nodes     144 Nodes    288 Nodes
      Per Server Writes/s   10,900 w/s   11,460 w/s   11,900 w/s   11,456 w/s
      Mean Server Latency   0.0117 ms    0.0134 ms    0.0148 ms    0.0139 ms
      Mean CPU %Busy        74.4 %       75.4 %       72.5 %       81.5 %
      Disk Read             5,600 KB/s   4,590 KB/s   4,060 KB/s   4,280 KB/s
      Disk Write            12,800 KB/s  11,590 KB/s  10,380 KB/s  10,080 KB/s
      Network Read          22,460 KB/s  23,610 KB/s  21,390 KB/s  23,640 KB/s
      Network Write         18,600 KB/s  19,600 KB/s  17,810 KB/s  19,770 KB/s
      Node specification – Xen Virtual Images, AWS US East, three zones
      • Cassandra 0.8.6, CentOS, SunJDK6
      • AWS EC2 m1 Extra Large – Standard price $0.68/Hour
      • 15 GB RAM, 4 Cores, 1Gbit network
      • 4 internal disks (total 1.6TB, striped together, md, XFS)
  • 48. Time is Money
                            48 nodes     96 nodes     144 nodes    288 nodes
      Writes Capacity       174373 w/s   366828 w/s   537172 w/s   1,099,837 w/s
      Storage Capacity      12.8 TB      25.6 TB      38.4 TB      76.8 TB
      Nodes Cost/hr         $32.64       $65.28       $97.92       $195.84
      Test Driver Instances 10           20           30           60
      Test Driver Cost/hr   $20.00       $40.00       $60.00       $120.00
      Cross AZ Traffic      5 TB/hr      10 TB/hr     15 TB/hr     30 TB/hr [1]
      Traffic Cost/10min    $8.33        $16.66       $25.00       $50.00
      Setup Duration        15 minutes   22 minutes   31 minutes   66 minutes [2]
      AWS Billed Duration   1 hr         1 hr         1 hr         2 hr
      Total Test Cost       $60.97       $121.94      $182.92      $561.68
      [1] Estimate two thirds of total network traffic
      [2] Workaround for a tooling bug slowed setup
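The totals in this table can be reproduced from the per-hour prices. A rough sketch under stated assumptions: the $0.68 node price comes from the node specification slide, the $2.00 driver price is implied by $20.00/hr for 10 driver instances, and the 288-node total only matches if the nodes are billed for two hours while the test drivers are billed for one. That last point is an inference from the numbers, not something the slide states.

```python
NODE_PRICE_PER_HOUR = 0.68    # m1 Extra Large, from the node specification
DRIVER_PRICE_PER_HOUR = 2.00  # assumption: implied by $20.00/hr for 10 drivers


def total_test_cost(nodes: int,
                    drivers: int,
                    node_hours: int,
                    traffic_cost: float,
                    driver_hours: int = 1) -> float:
    """Node cost plus test driver cost plus the 10-minute traffic charge."""
    node_cost = nodes * NODE_PRICE_PER_HOUR * node_hours
    driver_cost = drivers * DRIVER_PRICE_PER_HOUR * driver_hours
    return round(node_cost + driver_cost + traffic_cost, 2)


# (nodes, drivers, billed node hours, traffic cost per 10 min) from the table.
runs = [(48, 10, 1, 8.33), (96, 20, 1, 16.66), (144, 30, 1, 25.00), (288, 60, 2, 50.00)]
totals = [total_test_cost(*run) for run in runs]
```

Under these assumptions the four results line up with the "Total Test Cost" row, which supports the takeaway that even the largest benchmark run cost a few hundred dollars.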
  • 49. Takeaway: Netflix is using Cassandra on AWS as a key infrastructure component of its globally distributed streaming product. Also, benchmarking in the cloud is fast, cheap and scalable. http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud acockcroft@netflix.com
  • 50. Amazon Cloud Terminology Reference. See http://aws.amazon.com/ This is not a full list of Amazon Web Service features
      • AWS – Amazon Web Services (common name for Amazon cloud)
      • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
      • EC2 – Elastic Compute Cloud
        – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
        – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
        – Reserved Instances – pre-paid to reduce cost for long term usage
        – Availability Zone – datacenter with own power and cooling hosting cloud instances
        – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
      • ASG – Auto Scaling Group (instances booting from the same AMI)
      • S3 – Simple Storage Service (http access)
      • EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
      • RDS – Relational Database Service (managed MySQL master and slaves)
      • SDB – Simple Data Base (hosted http based NoSQL data store)
      • SQS – Simple Queue Service (http based message queue)
      • SNS – Simple Notification Service (http and email based topics and messages)
      • EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
      • ELB – Elastic Load Balancer
      • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
      • VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
      • IAM – Identity and Access Management (fine grain role based security keys)