@r39132
Motivation

    Circa late 2008, Netflix had a single data center
           Single-point-of-failure (a.k.a. SPOF)
           Approaching limits on cooling, power, space, traffic
            capacity

    Alternatives
           Build more data centers
           Outsource the majority of capacity planning and scale
            out
              Allows us to focus on core competencies



@r39132                             3
Motivation



    Winner : Outsource the majority of capacity planning and scale
     out
           Leverage a leading Infrastructure-as-a-service provider
                 Amazon Web Services




@r39132                                  4
Cloud Migration Strategy

    Components
           Applications and Software Infrastructure
           Data



    Migration Considerations
           Avoid sensitive data for now
                  PII and PCI DSS data stays in our DC; the rest can go to the cloud
           Favor Web Scale applications & data




@r39132                                  6
Cloud Migration Strategy

   Examples of Data that can be moved

    Video-centric data
           Critics’ and Users’ reviews
           Video Metadata (e.g. director, actors, plot description, etc…)

    User-video-centric data – some of our largest data sets
           Video Queue
           Watched History
           Video Ratings (i.e. a 5-star rating system)
           Video Playback Metadata (e.g. streaming bookmarks, activity
            logs)



@r39132                                   7
Cloud Migration Strategy




@r39132              9
Cloud Migration Strategy




@r39132              10
Cloud Migration Strategy




@r39132              11
Pick a Data Store in the Cloud


 An ideal storage solution should have the following features:
 •    Be hosted in AWS
      • We wanted a database-as-a-service

 •    Be highly scalable and available and have acceptable latencies
      • It should automatically scale with Netflix’s traffic growth
      • It should be as available as television – i.e. zero downtime

 •    Support SQL
      • Developers already familiar with the model



@r39132                             13
Pick a Data Store in the Cloud


    We picked SimpleDB and S3
           SimpleDB was targeted as the AP (c.f. CAP theorem)
            equivalent of our RDBMS databases in our Data
            Center

           S3 was used for data sets where item or row data
            exceeded SimpleDB limits and could be looked up
            purely by a single key (i.e. does not require
            secondary indices and complex query semantics)
                Video encodes
              Streaming device activity logs (i.e. CLOB, BLOB, etc…)
                Compressed (old) Rental History

@r39132                               14
SimpleDB
Technology Overview : SimpleDB

   Terminology
          SimpleDB          Hash Table           Relational Databases

           Domain           Hash Table                   Table

             Item              Entry                     Row

          Item Name             Key              Mandatory Primary Key

           Attribute   Part of the Entry Value          Column




@r39132                         16
Technology Overview : SimpleDB
Soccer Players
Key            Value
ab12ocs12v9    First Name = Harold;     Last Name = Kewell;    Nickname = Wizard of Oz;    Teams = Leeds United, Liverpool, Galatasaray
b24h3b3403b    First Name = Pavel;      Last Name = Nedved;    Nickname = Czech Cannon;    Teams = Lazio, Juventus
cc89c9dc892    First Name = Cristiano;  Last Name = Ronaldo;                               Teams = Sporting, Manchester United, Real Madrid

SimpleDB’s salient characteristics
       • SimpleDB offers a range of consistency options

       • SimpleDB domains are sparse and schema-less

       • The Key and all Attributes are indexed

       • Each item must have a unique Key

       • An item contains a set of Attributes
              • Each Attribute has a name
                 • Each Attribute has a set of values
                 • All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)


@r39132                                                         17
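To make that data model concrete, here is a minimal, hedged Python sketch (plain in-memory dictionaries, not the AWS API) of the soccer-player domain above: items are keyed by a unique item name, attributes are sparse and multi-valued, and every value is a UTF-8 string.

```python
# A toy, in-memory view of a SimpleDB domain: items keyed by a unique item
# name, each holding a sparse set of multi-valued attributes whose values
# are all UTF-8 strings. Purely illustrative -- not the AWS API.
soccer_players = {
    "ab12ocs12v9": {
        "First Name": {"Harold"},
        "Last Name":  {"Kewell"},
        "Nickname":   {"Wizard of Oz"},
        "Teams":      {"Leeds United", "Liverpool", "Galatasaray"},
    },
    "b24h3b3403b": {
        "First Name": {"Pavel"},
        "Last Name":  {"Nedved"},
        "Nickname":   {"Czech Cannon"},
        "Teams":      {"Lazio", "Juventus"},
    },
    # Sparse and schema-less: this item simply has no Nickname attribute.
    "cc89c9dc892": {
        "First Name": {"Cristiano"},
        "Last Name":  {"Ronaldo"},
        "Teams":      {"Sporting", "Manchester United", "Real Madrid"},
    },
}

# Everything is stored as a string -- numbers and dates included -- so any
# ordering or range logic has to be handled in the value encoding.
assert all(isinstance(value, str)
           for attributes in soccer_players.values()
           for values in attributes.values()
           for value in values)
```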
Technology Overview : SimpleDB
   What does the API look like?
      Manage Domains
           CreateDomain
           DeleteDomain
           ListDomains
           DomainMetaData
      Access Data
           Retrieving Data
                 GetAttributes – returns a single item
                 Select – returns multiple items using SQL syntax
           Writing Data
                 PutAttributes – put single item
                 BatchPutAttributes – put multiple items
           Removing Data
                 DeleteAttributes – delete single item
                 BatchDeleteAttributes – delete multiple items




@r39132                                         18
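The following is a small, self-contained Python mock that mirrors the operation names listed above (CreateDomain, PutAttributes, GetAttributes, Select, and so on). It is an illustration of the call shapes only and deliberately avoids any real AWS SDK; the Select stand-in takes a Python predicate instead of the SQL-like query string.

```python
# A toy, in-memory stand-in that mirrors the SimpleDB operation names listed
# above. This is NOT the AWS SDK -- just an illustration of the call shapes.
class FakeSimpleDB:
    def __init__(self):
        self.domains = {}

    # --- Manage Domains ---
    def create_domain(self, name):
        self.domains.setdefault(name, {})

    def delete_domain(self, name):
        self.domains.pop(name, None)

    def list_domains(self):
        return list(self.domains)

    # --- Writing Data ---
    def put_attributes(self, domain, item_name, attrs):
        self.domains[domain].setdefault(item_name, {}).update(attrs)

    def batch_put_attributes(self, domain, items):
        for item_name, attrs in items.items():
            self.put_attributes(domain, item_name, attrs)

    # --- Retrieving Data ---
    def get_attributes(self, domain, item_name):
        return self.domains[domain].get(item_name)

    def select(self, domain, predicate):
        # Stand-in for the SQL-like Select: filters items with a Python
        # predicate instead of parsing a query string.
        return {k: v for k, v in self.domains[domain].items() if predicate(v)}

    # --- Removing Data ---
    def delete_attributes(self, domain, item_name):
        self.domains[domain].pop(item_name, None)

db = FakeSimpleDB()
db.create_domain("players")
db.put_attributes("players", "cc89c9dc892",
                  {"First Name": "Cristiano", "Last Name": "Ronaldo"})
print(db.get_attributes("players", "cc89c9dc892"))
print(db.select("players", lambda item: item.get("Last Name") == "Ronaldo"))
```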
Technology Overview : SimpleDB

    Options available on reads and writes
           Consistent Read
                Read the most recently committed write
                May have lower throughput/higher latency/lower
                 availability


           Conditional Put/Delete
                i.e. Optimistic Locking
                Useful if you want to build a consistent multi-master data
                 store – you will still require your own consistency
                 checking




@r39132                             19
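A hedged sketch of how a conditional put gives you optimistic locking: the write is applied only if a guard attribute (here a hypothetical "version" attribute, not something SimpleDB prescribes) still holds the value the writer last read.

```python
class ConditionalPutError(Exception):
    """Raised when the guard attribute no longer matches the expected value."""

def conditional_put(domain, item_name, new_attrs, guard_name, expected_value):
    # Optimistic locking: re-check the guard attribute and apply the write
    # only if it still holds the value this writer previously read.
    current = domain.get(item_name, {})
    if current.get(guard_name) != expected_value:
        raise ConditionalPutError(
            f"{guard_name} is {current.get(guard_name)!r}, expected {expected_value!r}")
    current.update(new_attrs)
    domain[item_name] = current

# Two writers both read version "3"; the second write loses the race.
domain = {"movie-42": {"title": "Heat", "version": "3"}}
conditional_put(domain, "movie-42",
                {"title": "Heat (1995)", "version": "4"}, "version", "3")
try:
    conditional_put(domain, "movie-42",
                    {"title": "HEAT", "version": "4"}, "version", "3")
except ConditionalPutError as error:
    print("write rejected:", error)
```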
SimpleDB
Major Issues with SimpleDB


 Manual data set partitioning was needed to work around size and
  throughput limits

 Read & Write latency variance could be large

 SimpleDB multi-tenancy created both throughput and latency issues

 Bad cost model – poor performance cost us more money

 Not Global – new requirement : global expansion

 No (external) back-up and recovery available – new requirement : refresh
  Test DBs from Prod DBs



  @r39132                            21
Pick Another Data Store in the Cloud


          An ideal storage solution should have the following
              features:
          •   New Requirements
              •   Support Global Use-cases
              • Support Back-up and Recovery
          •   Retained Requirements
              •   Be highly scalable and available and have acceptable
                  latencies
          •   Obsolete Requirements
              •   Be hosted in AWS
              •   Support SQL



@r39132                                23
Pick Another Data Store in the Cloud


          We picked Cassandra (i.e. Dynamo + BigTable)
          •   Support Global Use-cases
              •   Easy to add new nodes to the cluster, even if the new nodes
                  are on a different continent
          •   Be easy to own and operate
              •   Dynamo’s masterless design avoids SPOF
              •   Dynamo’s repair mechanisms (i.e. read repair, anti-entropy
                  repair, and hinted handoff) promote self-management




@r39132                                 24
Pick Another Data Store in the Cloud



          We picked Cassandra (i.e. Dynamo + BigTable)
          •   Be highly scalable and available and have acceptable
              latencies
              •   Known to scale for writes
                  •   Netflix achieved >1 million writes/sec
              •   Netflix leverages caches for reads


          •   Bonus
              • Data model identical to SimpleDB – i.e. one less thing
                 for developers to learn



@r39132                                    25
Cassandra
Technology Overview : Cassandra



          Features
          •   Consistent Hashing
          •   Tunable Consistency at the Request-level (like SimpleDB)
          •   Automatic Healing
              •    Read Repair
              •    Anti-entropy Repair
              •    Hinted-Handoff
          •   Failure Detection
          •   Clusters are configurable & upgradeable without restart
          •   Infinite Incremental Write Scalability


@r39132                                  27
Consistent Hashing


How does Consistent Hashing
work in Cassandra?

• Take a number line from [0-159]

• Wrap it on itself, so you now
  have a number ring




@r39132                           28
Consistent Hashing


• Given a key k, map the key to the ring
  using:

    • hash_func(k)%160 = token = position
      on the ring




@r39132                          29
Consistent Hashing


• You can then manually map
  machines to the ring

• 8 machines are mapped to the
  ring here
   • N1 → 0
   • N2 → 20
   • N3 → 40




@r39132                       30
Consistent Hashing

Now map key ranges to
machines:

• Host N2 owns all tokens in
  the range (0, 20]
   • In other words, “foo” is
      mapped to bucket 9 but
      assigned to server N2

• Host N5 owns all tokens in
  the range (60, 80]
   • “bar” is mapped to bucket
      79 and assigned to server
      N5
@r39132                           31
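Putting slides 28-31 together, here is a minimal Python sketch of this toy 160-token ring: a key hashes to a token, and the node whose position closes the token's range owns it. The md5-based hash below is purely illustrative (Cassandra uses its own partitioners), so the printed tokens for "foo" and "bar" will not match the 9 and 79 used on the slides.

```python
import bisect
import hashlib

RING_SIZE = 160
# 8 machines placed manually on the ring: N1 -> 0, N2 -> 20, ..., N8 -> 140.
NODE_POSITIONS = [(i * 20, f"N{i + 1}") for i in range(8)]
TOKENS = [position for position, _ in NODE_POSITIONS]

def token_for(key: str) -> int:
    """hash_func(k) % 160 = token = position on the ring."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return digest % RING_SIZE

def owner_of(token: int) -> str:
    """A node owns the token range (previous position, its own position],
    so N2 at position 20 owns (0, 20] and token 9 is assigned to N2."""
    index = bisect.bisect_left(TOKENS, token) % len(TOKENS)  # wrap past N8 back to N1
    return NODE_POSITIONS[index][1]

for key in ("foo", "bar"):
    token = token_for(key)
    print(f"{key!r} -> token {token} -> {owner_of(token)}")
```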
Consistent Hashing : Dead node?
What happens if a node dies or
becomes unresponsive?

With a replication factor of say 3, data
is always written to 3 places

• Writes to tokens in N2’s primary
  token range are also written to N3
  and N4

• Writes to tokens in N5’s primary
  token range are also written to N6
  and N7

 @r39132                            32
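Continuing the same toy ring, replica placement with a replication factor of 3 can be sketched as the primary owner plus the next two nodes clockwise, which is why writes in N2's primary range also land on N3 and N4. This is a simplification in the spirit of Cassandra's SimpleStrategy, not the full placement logic.

```python
import bisect

# Same toy ring as above: N1..N8 at positions 0, 20, ..., 140.
NODE_POSITIONS = [(i * 20, f"N{i + 1}") for i in range(8)]
TOKENS = [position for position, _ in NODE_POSITIONS]
REPLICATION_FACTOR = 3

def replicas_for(token: int) -> list:
    """Primary owner of the token plus the next RF - 1 nodes clockwise."""
    start = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return [NODE_POSITIONS[(start + i) % len(NODE_POSITIONS)][1]
            for i in range(REPLICATION_FACTOR)]

print(replicas_for(9))    # ['N2', 'N3', 'N4'] -- N2's range replicates to N3, N4
print(replicas_for(79))   # ['N5', 'N6', 'N7'] -- N5's range replicates to N6, N7
```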
Writes in Cassandra
The Write Path

• Cassandra clients are not
  required to know token-ring
  mapping

• Client can send a request to any
  node in the cluster

• The receiving node is called the
  “coordinator” for that request

• The coordinator will always
  execute <Replication Factor>
  number of writes (e.g. 3)
@r39132                          34
The Write Path

• Coordinators take care of:
   • Key routing

   • Executing the consistency level
     of the request
      • e.g. “CL=1” means that the
         coordinator will wait till 1
         node ACKs the write before
         sending a response to the
          client

        • e.g. “CL=Quorum” means
          that the coordinator will wait
          till 2 of 3 nodes ACK the
          write
  @r39132                          35
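A hedged sketch of the coordinator's role on writes: all <Replication Factor> writes are attempted, but the client is answered once the requested consistency level's worth of ACKs has arrived. The send_write callable below is a stand-in for the real replica RPC.

```python
def coordinate_write(replicas, consistency_level, send_write):
    """Sketch of the coordinator role: attempt the write on all <RF>
    replicas, and report success once <consistency_level> of them ACK
    (CL=1 waits for one ACK, CL=Quorum waits for 2 of 3, and so on)."""
    acks = 0
    for node in replicas:            # the coordinator always executes RF writes
        if send_write(node):         # True means this replica ACKed the write
            acks += 1
    return acks >= consistency_level

# Toy usage: replica N3 is down, but CL=Quorum (2 of 3) still succeeds.
down = {"N3"}
ok = coordinate_write(["N2", "N3", "N4"],
                      consistency_level=2,
                      send_write=lambda node: node not in down)
print(ok)   # True
```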
The Write Path

• If node N3 is presumed dead by
  the failure detection algorithm or
  if N3 is too busy to respond

    • N5 will log the write to N3
      and deliver it as soon as N3
      comes back

    • This is called Hinted
      Handoff




@r39132                          36
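A toy model of hinted handoff: when a replica is unreachable, the coordinator stores the write locally as a hint and replays it once the replica comes back. Node and key names here are illustrative.

```python
class Coordinator:
    """Toy hinted handoff: when a replica is down, the coordinator keeps a
    hint locally and replays it once the replica is reachable again."""

    def __init__(self):
        self.hints = []                            # (target_node, key, value)

    def write(self, node, key, value, cluster):
        if node in cluster:
            cluster[node][key] = value             # normal replica write
        else:
            self.hints.append((node, key, value))  # log the hint for later

    def replay_hints(self, cluster):
        remaining = []
        for node, key, value in self.hints:
            if node in cluster:
                cluster[node][key] = value         # deliver as soon as the node is back
            else:
                remaining.append((node, key, value))
        self.hints = remaining

# N3 is down during the write; it later rejoins and receives the hinted write.
cluster = {"N2": {}, "N4": {}}                     # N3 absent = unreachable
coordinator = Coordinator()
for node in ("N2", "N3", "N4"):
    coordinator.write(node, "user:42:queue", "Heat", cluster)
cluster["N3"] = {}                                 # N3 comes back
coordinator.replay_hints(cluster)
print(cluster["N3"])                               # {'user:42:queue': 'Heat'}
```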
The Write Path

• Now that N5 is holding
  uncommitted writes, what
  happens if N5 dies before it can
  replay the failed write?

    • How will the logged write
      ever make it to N3?

    • Another repair mechanism
      helps in this case: Read
      Repair (stay tuned)


@r39132                           37
Reads in Cassandra
The Read Path

• Again, N5 acts as a proxy, this
  time for a get request

• Assume that the
  consistency_level on the
  request is “Quorum”; N5 will
  send a full read request to N2
  and digest requests to N3 & N4

   • This is a network
     optimization


  @r39132                       39
The Read Path : Read Repair

• N5 will compare the responses as
  soon as any 2 requests return

    • At least one response will be a
      digest (e.g. if N2 and N3 respond first)

• If the digest is more recent than the
  full read, N5 doesn’t have the latest
  data

    • N5 then asks the N3 replica for a
      full read



 @r39132                             40
The Read Path : Read Repair

• Once N5 receives the response of
  the full read from N3, N5 returns this
  data to the client

• But wait! What do we do about the
  stale data on N2?

• N5 then schedules an async update
  for the stale node(s)

• This is called read repair




 @r39132                             41
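Here is a simplified Python sketch of the quorum read plus read repair described on slides 39-41. It models the digest response as a value hash plus a timestamp and performs the repair synchronously, whereas Cassandra does it asynchronously; treat it as an illustration of the flow, not the actual implementation.

```python
import hashlib

def digest(value):
    """Digest-request stand-in: a hash of the replica's stored value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# Each replica stores (value, timestamp); N2 holds a stale copy.
replicas = {
    "N2": ("Heat",        100),
    "N3": ("Heat (1995)", 200),
    "N4": ("Heat (1995)", 200),
}

def quorum_read(replicas, full_from, digest_from):
    """CL=Quorum sketch: one full read plus one digest. On a mismatch where
    the digest side is newer, do a second full read there, return the newest
    value, and repair any stale replicas."""
    value, ts = replicas[full_from]
    digest_value, digest_ts = replicas[digest_from]
    if digest(value) != digest(digest_value) and digest_ts > ts:
        value, ts = replicas[digest_from]          # second, full read from N3
    for node, (_, node_ts) in replicas.items():    # read repair (async in Cassandra)
        if node_ts < ts:
            replicas[node] = (value, ts)
    return value

print(quorum_read(replicas, full_from="N2", digest_from="N3"))  # Heat (1995)
print(replicas["N2"])    # repaired: ('Heat (1995)', 200)
```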
Expanding the Ring
Consistent Hashing : Growing the Ring

• To double the capacity of
  the ring and keep the data
  distributed evenly, we add
  nodes (N9, N10, etc… ) in
  the interstitial positions
  (10, 30, etc… )

• Cassandra bootstraps new
  nodes via data streaming

• For large data sets, Netflix
  recovers from backup and
  runs anti-entropy repair (AER)
@r39132                          43
Consistent Hashing : Growing the Ring
• After new nodes have
  been added, they take
  over half of the primary
  token ranges of each old
  node




@r39132                      44
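A tiny sketch of the doubling step: new nodes (hypothetically named N9-N16) are placed at the interstitial positions, so each old node's primary token range is cut in half.

```python
# Doubling the toy ring: the 8 old nodes sit at 0, 20, ..., 140; 8 new nodes
# (hypothetically N9..N16) are added at the interstitial positions 10, 30, ..., 150.
old_nodes = [(i * 20, f"N{i + 1}") for i in range(8)]
new_nodes = [(i * 20 + 10, f"N{i + 9}") for i in range(8)]
ring = sorted(old_nodes + new_nodes)

# Each old node's primary range is cut in half: N2's (0, 20] becomes
# N9's (0, 10] plus N2's (10, 20].
print(ring[:4])   # [(0, 'N1'), (10, 'N9'), (20, 'N2'), (30, 'N10')]
```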
A Global Ring

• One of the benefits of
  Cassandra is that it
  can support
  deployment of a
  global ring




@r39132                    45
A Global Ring

• In January 2012,
  Netflix rolled out its
  streaming service to
  the UK and Ireland

• It did this by running a
  few Cassandra
  clusters between Va
  and the UK




@r39132                      46
A Global Ring
• Growing the ring into the UK from
  its origin in Va was easy

   • Double the ring as mentioned
     before

   • New interstitial nodes are in the
     UK

   • Increase the global replication
  factor 3 → 6

• Now each key is covered by 6
  nodes, 3 in Va, 3 in the UK
  @r39132                          47
Single Node Design
Single Node Design

A single node is implemented as per the LSM
tree (i.e. log structured merge tree) pattern:

• Writes first append to the commit log and
  then write to the memtable in memory

• This model gives fast writes

When the memtable is full, it is flushed to disk
as a new SSTable (e.g. SSTable 3)

The SSTables are immutable


  @r39132                            49
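A minimal sketch of that write path: append to the commit log, update the memtable, and flush to a new immutable SSTable once a size threshold is reached. The threshold and data structures below are simplified stand-ins, not Cassandra's actual implementation.

```python
class ToyLSMNode:
    """Sketch of the single-node write path: append to the commit log, write
    into the memtable, and flush to a new immutable SSTable once the
    memtable reaches a (simplified) size threshold."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []         # sequential appends -> fast writes
        self.memtable = {}           # in-memory, most recent values
        self.sstables = []           # immutable, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to the commit log
        self.memtable[key] = value             # 2. write to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The full memtable becomes a new, immutable, sorted SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

node = ToyLSMNode()
for i in range(4):
    node.write(f"k{i}", f"v{i}")
print(len(node.sstables), node.memtable)   # 1 SSTable on disk, 'k3' still in memory
```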
Single Node Design

In a pure KV LSM store (e.g. LevelDB),
when a Read occurs, the memtable is first
consulted

If the key is found, the data is returned from
the memtable

This is a fast read

If the data is not in the memtable, then it is
read from the latest SSTable

This is a slower read

   @r39132                            50
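The same read flow as a small Python function: the memtable is checked first, then SSTables from newest to oldest. This is a sketch of a pure KV LSM read, not Cassandra's actual code path.

```python
def lsm_read(key, memtable, sstables):
    """Pure-KV LSM read: check the memtable first (fast path), then scan the
    SSTables from newest to oldest until the key is found (slower path)."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):   # newest SSTable first
        if key in sstable:
            return sstable[key]
    return None

memtable = {"k3": "v3"}
sstables = [{"k0": "v0", "k1": "stale"}, {"k1": "v1", "k2": "v2"}]
print(lsm_read("k3", memtable, sstables))   # "v3" straight from the memtable
print(lsm_read("k1", memtable, sstables))   # "v1" from the newest SSTable
```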
Single Node Design

Since Cassandra supports multiple value
columns, only a subset of which may be
written at any time, a read must consult the
memtable and potentially multiple SSTables

Hence, reads can be slower than they would
be for pure LSM KV stores




  @r39132                          51
Single Node Design

To reduce the number of SSTables
that need to be read, periodic
compaction needs to be run

This puts an additional load on GC
and I/O

The end result of compaction is
fewer files with hopefully fewer total
rows

To mitigate the problems inherent in
compaction, Cassandra 1.x supports
Leveled Compaction
   @r39132                               52
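A sketch of what compaction achieves: merging several SSTables into one while keeping only the newest value per key, so later reads touch fewer files. Real compaction strategies (size-tiered, leveled) are considerably more involved than this.

```python
def compact(sstables):
    """Sketch of compaction: merge many SSTables into one, keeping only the
    newest value for each key, so later reads touch fewer files."""
    merged = {}
    for sstable in sstables:        # oldest first, so newer values overwrite older ones
        merged.update(sstable)
    return [merged]

sstables = [{"k1": "stale", "k2": "v2"}, {"k1": "v1", "k3": "v3"}]
print(compact(sstables))   # [{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}]
```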
Cassandra
Current Status
   Timeline of Migration

   • Jan 2009 – Start investigating the AWS cloud & SimpleDB

   • Feb 2010 – Deliver the app to production

   • Dec 2010 – ~95% of Netflix traffic has been moved into both the
     AWS cloud and SimpleDB (+10 months)

   • Apr 2011 – Start migration to Cassandra

   • Jan 2012 – Netflix EU launched on Cassandra (+9 months)




@r39132                            54

Editor's Notes

  1. Existing functionality needs to move in phases: this limits the risk and exposure to bugs, and limits conflicts with new product launches
  2. Existing functionality needs to move in phases: this limits the risk and exposure to bugs, and limits conflicts with new product launches
  3. Existing functionality needs to move in phases: this limits the risk and exposure to bugs, and limits conflicts with new product launches
  4. Dynamo storage doesn’t suffer from this!