Data Infrastructure @ LinkedIn (v2)
Sid Anand
QCon NY (June 2012)




                                      1
What Do I Mean by V2?
V2 == version of this talk, not version of our architecture.


Version 1 of this talk
•  Presented at QCon London (March 2012)
    •  http://www.infoq.com/presentations/Data-Infrastructure-LinkedIn



Version 2 – i.e. this talk
•  Contains some overlapping content
•  Introduces Espresso, our new NoSQL database



For more information on what LinkedIn Engineering is doing, feel free to
follow @LinkedInEng



                                          @r39132                          2
About Me
Current Life…
  LinkedIn
      Site Ops
      Web (Software) Engineering
           Search, Network, and Analytics (SNA)
               Distributed Data Systems (DDS)
                   Me (working on analytics projects)

In Recent History…
  Netflix, Cloud Database Architect (4.5 years)
  eBay, Web Development, Research Labs, & Search Engine (4 years)




                                      @r39132                        3
Let’s Talk Numbers!




        @r39132       4
The world’s largest professional network
Over 60% of members are outside of the United States



•  161M+ members *
•  82% *
•  90 Fortune 100 Companies use LinkedIn to hire
•  >2M Company Pages *
•  17 Languages *
•  ~4.2B Professional searches in 2011

[Chart: LinkedIn Members (Millions), 2004–2010]

* as of March 31, 2012


                                                        @r39132                                       5
Our Architecture




      @r39132      6
LinkedIn : Architecture	


Overview

•  Our site runs primarily on Java, with some use of Scala for specific
   infrastructure

•  What runs on Scala?
   •  Network Graph Service
   •  Kafka

•  Most of our services run on Apache Traffic Server + Jetty




                                 @r39132                             7
LinkedIn : Architecture	


A web page requests information A and B

•  Presentation Tier:  A thin layer focused on building the UI. It assembles the page by making
   parallel requests to the BT
•  Business Tier:  Encapsulates business logic. Can call other BT clusters and its own DAT cluster
•  Data Access Tier:  Encapsulates DAL logic
•  Data Infrastructure (Oracle, Memcached):  Concerned with the persistent storage of and easy
   access to data

[Diagram: the Presentation Tier fans out requests to Business Tier clusters (A, B, C), each of which
 calls its own Data Access Tier cluster, backed by Oracle instances and Memcached]




                                            @r39132                                               8
Data Infrastructure Technologies




              @r39132              10
LinkedIn Data Infrastructure Technologies

Oracle: Source of Truth for User-Provided Data




                              @r39132            11
Oracle : Overview	

Oracle
•  Until recently, all user-provided data was stored in Oracle – our source of truth
    •  Espresso is ramping up
•  About 50 Schemas running on tens of physical instances
•  With our user base and traffic growing at an accelerating pace, how do we scale
   Oracle for user-provided data?

Scaling Reads
•  Oracle Slaves
•  Memcached
•  Voldemort – for key-value lookups



Scaling Writes
•  Move to more expensive hardware or replace Oracle with something better




                                        @r39132                                     12
LinkedIn Data Infrastructure Technologies
Voldemort: Highly-Available Distributed Data Store




                              @r39132                13
Voldemort : Overview	

•    A distributed, persistent key-value store influenced by the Amazon Dynamo paper

•    Key Features of Dynamo
        Highly Scalable, Available, and Performant
        Achieves this via Tunable Consistency
          •  Strong consistency comes with a cost – i.e. lower availability and higher response times
          •  The user can tune this to his/her needs

        Provides several self-healing mechanisms when data does become inconsistent
          •  Read Repair
                 Repairs value for a key when the key is looked up/read
          •  Hinted Handoff
                 Buffers value for a key that wasn’t successfully written, then writes it later
          •  Anti-Entropy Repair
                 Scans the entire data set on a node and fixes it

        Provides means to detect node failure and a means to recover from node failure
          •  Failure Detection
          •  Bootstrapping New Nodes
                                                @r39132                                            14
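A quick worked example of what "tunable" means here (the numbers are illustrative, not LinkedIn's production settings): with N = 3 replicas per key, choosing W = 2 and R = 2 gives R + W > N, so every read quorum overlaps every write quorum and a read always sees the latest successful write; relaxing to W = 1 and R = 1 trades that guarantee for lower response times and higher availability.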
Voldemort : Overview	

API
•  VectorClock<V> get (K key)
•  put (K key, VectorClock<V> value)
•  applyUpdate(UpdateAction action, int retries)

Voldemort-specific Features
  Implements a layered, pluggable architecture
  Each layer implements a common interface (c.f. API). This allows us to replace or
   remove implementations at any layer
     •  Pluggable data storage layer
           BDB JE, Custom RO storage, etc…
     •  Pluggable routing supports
           Single or Multi-datacenter routing

Layered, Pluggable Architecture
  Client stack:  Client API, Conflict Resolution, Serialization, Repair Mechanism, Failure Detector, Routing
  Server stack:  Repair Mechanism, Failure Detector, Routing, Storage Engine (plus an Admin service)



                                                     @r39132                                  15
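As a concrete illustration of the client API above, here is a minimal sketch using the open-source Voldemort Java client as documented in its quick-start guide; the bootstrap URL, store name, and keys are placeholders, and the surrounding setup is assumed:

```java
import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class VoldemortClientSketch {
    public static void main(String[] args) {
        // Bootstrap against any node; the client then routes requests itself
        // (the "fat client" model described on the next slide).
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        StoreClient<String, String> client = factory.getStoreClient("test-store");

        // Reads return a Versioned value carrying the version (vector clock).
        Versioned<String> value = client.get("member:12345");

        if (value == null) {
            // First write: no prior version to carry forward.
            client.put("member:12345", "initial-payload");
        } else {
            // Subsequent write: supply the prior version so concurrent,
            // conflicting writes can be detected and resolved.
            value.setObject("updated-payload");
            client.put("member:12345", value);
        }
    }
}
```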
Voldemort : Overview	

Layered, Pluggable Architecture

Voldemort-specific Features

•  Supports Fat Client or Fat Server
    •  The Repair Mechanism, Failure Detector, and Routing layers can run on either the server or the client

•  LinkedIn currently runs the Fat Client, but we would like to move to a Fat Server model

[Diagram: the same client-side and server-side layer stacks as the previous slide]




                                       @r39132                                  16
Where Does LinkedIn use
      Voldemort?




          @r39132         17
Voldemort : Usage Patterns @ LinkedIn	



 2 Usage-Patterns

   Read-Write Store
     –  A key-value alternative to Oracle
     –  Uses BDB JE for the storage engine
     –  50% of Voldemort Stores (aka Tables) are RW


   Read-only Store
     –  Uses a custom Read-only format
     –  50% of Voldemort Stores (aka Tables) are RO


   Let’s look at the RO Store

                                 @r39132              18
Voldemort : RO Store Usage at LinkedIn	

•  People You May Know
•  Viewers of this profile also viewed
•  Related Searches
•  Events you may be interested in
•  LinkedIn Skills
•  Jobs you may be interested in

[Screenshots of these RO-store-backed features on linkedin.com]





                                               @r39132                                       19
Voldemort : Usage Patterns @ LinkedIn	

RO Store Usage Pattern
1.    We schedule a Hadoop job to build a table (called “store” in Voldemort-speak)

2.    Azkaban, our Hadoop scheduler, detects Hadoop job completion and tells Voldemort to fetch the new data
      and index files

3.    Voldemort fetches the data and index files in parallel from HDFS. Once fetch completes, swap indexes!

4.     Voldemort serves fast key-value look-ups on the site
      –    e.g. For key=“Sid Anand”, get all the people that “Sid Anand” may know!
      –    e.g. For key=“Sid Anand”, get all the jobs that “Sid Anand” may be interested in!




                                                     @r39132                                           20
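A rough sketch of the fetch-then-swap idea from a single storage node's point of view; the class, method names, and directory layout below are hypothetical illustrations of the pattern, not Voldemort's actual admin code:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration: keep serving version N while version N+1 is fetched, then flip. */
public class ReadOnlySwapSketch {

    /** Called after the Hadoop-built data and index files have been fetched from HDFS. */
    static void swapToNewVersion(Path storeRoot, long newVersion) throws Exception {
        Path versionDir = storeRoot.resolve("version-" + newVersion); // already populated by the fetch
        Path currentLink = storeRoot.resolve("current");
        Path tmpLink = storeRoot.resolve("current.tmp");

        // Build the new pointer off to the side, then rename it over "current".
        // On POSIX file systems the rename is atomic, so readers always see either the
        // old complete version or the new complete version, never a partial fetch.
        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, versionDir);
        Files.move(tmpLink, currentLink, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        swapToNewVersion(Paths.get("/data/voldemort/ro-stores/pymk"), 42); // paths are placeholders
    }
}
```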
How Do The Voldemort RO
     Stores Perform?




          @r39132         21
Voldemort : RO Store Performance : TP vs. Latency	


[Chart: latency (ms) vs. throughput (qps) for MySQL and Voldemort read-only stores, showing median
 and 99th-percentile latency across roughly 100–700 qps; 100 GB data, 24 GB RAM]


                                                                    @r39132                                                   22
LinkedIn Data Infrastructure Solutions
Databus : Timeline-Consistent Change Data Capture




                               @r39132              23
Where Does LinkedIn use
       Databus?




          @r39132         24
Databus : Use-Cases @ LinkedIn	




[Diagram: Oracle receives updates and emits data change events through Databus to the
 Standardization, Search Index, Graph Index, and Read Replica consumers]



A user updates his profile with skills and position history. He also accepts a connection

•    The write is made to an Oracle master, and Databus replicates:
       •  the profile change to the Standardization service
              e.g. the many (actually 40) forms of IBM are canonicalized for search-friendliness and
               recommendation-friendliness
       •  the profile change to the Search Index service
              Recruiters can find you immediately by new keywords
       •  the connection change to the Graph Index service
              The user can now start receiving feed updates from his new connections immediately


                                              @r39132                                        25
Databus Architecture




        @r39132        26
Databus : Architecture	

Databus consists of 2 components

•  Relay Service
    •  “Relays” DB changes to subscribers as they happen in Oracle
    •  Uses a sharded, in-memory, circular buffer
    •  Very fast, but has a limited amount of buffered data!

•  Bootstrap Service
    •  Serves historical data to subscribers who have fallen behind on the latest data from the “relay”
    •  Not as fast as the relay, but has a large amount of data buffered on disk

[Diagram: the DB's captured changes flow into the Relay's in-memory event window; the Bootstrap
 service, with its own DB, sits alongside the Relay to serve catch-up reads]




                                  @r39132                                       27
Databus : Architecture	


[Diagram: the Relay captures on-line changes from the DB into its event window and streams them to
 Consumers 1…n through the Databus client library; consumers that have fallen behind first fetch a
 consistent snapshot at U plus a consolidated delta since T from the Bootstrap service, then resume
 the on-line change stream]




                                  @r39132                                       28
Databus : Architecture - Bootstrap	

    Generate consistent snapshots and consolidated deltas during continuous updates

[Diagram: the Bootstrap server reads recent events from the Relay's event window; a Log Writer
 appends them to Log Storage, a Log Applier replays them into Snapshot Storage, and clients catch up
 by reading from the Bootstrap server through the Databus client library]


                                    @r39132                                        29
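To make the subscriber side concrete, here is a hypothetical sketch of the kind of callback a Databus consumer implements; the interface, method names, and fields below are illustrative placeholders rather than the actual Databus client API:

```java
/** Illustrative change-capture callback; not the real Databus client interface. */
interface ChangeEventConsumer {
    /** Invoked for every change in commit order, whether it came from the relay or the bootstrap. */
    void onChangeEvent(String sourceTable, long sequenceNumber, byte[] payload);

    /** Invoked when the consumer has fallen behind the relay's buffer and is switched to bootstrap mode. */
    void onBootstrapStart();

    /** Invoked when bootstrap catch-up finishes and online consumption from the relay resumes. */
    void onBootstrapComplete();
}

/** Example subscriber that feeds a search index, as in the use-case slide. */
class SearchIndexConsumer implements ChangeEventConsumer {
    @Override
    public void onChangeEvent(String sourceTable, long sequenceNumber, byte[] payload) {
        // Apply the change to the index; the monotonically increasing sequence number lets the
        // consumer checkpoint its progress and replay idempotently after a restart.
        System.out.printf("indexing change from %s at seq %d (%d bytes)%n",
                sourceTable, sequenceNumber, payload.length);
    }

    @Override
    public void onBootstrapStart() {
        System.out.println("fell behind the relay; catching up from the bootstrap service");
    }

    @Override
    public void onBootstrapComplete() {
        System.out.println("caught up; resuming the online relay stream");
    }
}
```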
LinkedIn Data Infrastructure Technologies
Espresso: Indexed Timeline-Consistent Distributed
Data Store




                              @r39132               30
Why Do We Need Yet Another
       NoSQL DB?




           @r39132           31
Espresso: Overview	

What is Oracle Missing?

•  (Infinite) Incremental Scalability
     •  Adding 50% more resources gives us 50% more scalability (e.g. storage and serving
         capacity, etc…). In effect, we do not want to see diminishing returns

•  Always Available
    •  Users of the system do not perceive any service interruptions
        •  Adding capacity or doing any other maintenance does not incur downtime
        •  Node failures do not cause downtime

•  Operational Flexibility and Cost Control
    •  No recurring license fees
    •  Runs on commodity hardware

•  Simplified Development Model
     •  E.g. Adding a column without running DDL ALTER TABLE commands




                                              @r39132                                       32
Espresso: Overview	

What Features Should We Retain from Oracle?
•  Limited Transactions : Constrained Strong Consistency
     •  We need consistency within an entity (more on this later)

•  Ability to Query by Fields other than the Primary Key
    •  a.k.a. non-PK columns in RDBMS

•  Must Feed a Change Data Capture system
    •  i.e. can act as a Databus Source
    •  Recall that a large ecosystem of analytic services is fed by the Oracle-Databus pipe. We
       need to continue to feed that ecosystem



 Guiding Principles when replacing a system (e.g. Oracle)
 •  Strive for Usage Parity, not Feature Parity
      •  In other words, first look at how you use Oracle and look for those features in candidate
         systems. Do not look for general feature parity between systems.
 •  Don’t shoot for a full replacement
      •  Buy yourself headroom by migrating the top K use-cases by load off Oracle. Leave the
         others.

                                               @r39132                                        33
What Does the API Look Like?




            @r39132            34
Espresso: API Example	

Consider the User-to-User Communication Case at LinkedIn
•  Users can view
    •  Inbox, sent folder, archived folder, etc….

•     On social networks such as LinkedIn and Facebook, user-to-user messaging consumes substantial database
      resources in terms of storage and CPU activity
        •  At LinkedIn, an estimated 40% of host DB CPU goes to serving U2U Comm
        •  Footnote : Facebook migrated their messaging use-case to HBase

•     We’re moving this off Oracle to Espresso




                                                   @r39132                                           35
Espresso: API Example	

Database and Tables
•  Imagine that you have a Mailbox Database containing tables needed for the U2U Communications case
•  The Mailbox Database contains the following 3 tables:
     •  Message_Metadata – captures subject text
          •  Primary Key = MemberId & MsgId
     •  Message_Details – captures full message content
          •  Primary Key = MemberId & MsgId
     •  Mailbox_Aggregates – captures counts of read/unread & total
          •  Primary Key = MemberId



Example Read Request
Espresso supports REST semantics.

To get unread and total email count for “bob”, issue a request of the form:
•  GET /<database_name>/<table_name>/<resource_id>
•  GET /Mailbox/Mailbox_Aggregates/bob




                                                       @r39132                                   36
Espresso: API Example	

Collection Resources vs. Singleton Resources
A resource identified by a resource_id may be either a singleton or a collection

Examples (Read)
  For singletons, the URI refers to an individual resource.
    –  GET /<database_name>/<table_name>/<resource_id>
    –  E.g.: Mailbox_Aggregates table
            To get unread and total email count for “bob”, issue a request of the form:
               –  GET /Mailbox/Mailbox_Aggregates/bob

    For collections, a secondary path element defines individual resources within the collection
      •  GET /<database_name>/<table_name>/<resource_id>/<subresource_id>
      •  E.g.: Message_Metadata & Message_Details tables
           •  To get all of “bob’s” mail metadata, specify the URI up to the <resource_id>
                 •  GET /Mailbox/Message_Metadata/bob
           •  To display one of “bob’s” messages in its entirety, specify the URI up to the
               <subresource_id>
                 •  GET /Mailbox/Message_Details/bob/4



                                              @r39132                                         37
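Since the API is plain HTTP, a client-side read can be sketched with nothing but the JDK; the hostname and port below are placeholders, and only the URI shapes come from the slides:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class EspressoReadSketch {
    /** Issues a GET against an Espresso-style REST URI and returns the raw response body. */
    static String get(String path) throws Exception {
        URL url = new URL("http://espresso.example.com:12918" + path); // host and port are placeholders
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            StringBuilder body = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                body.append(line);
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Singleton resource: unread and total counts for "bob".
        System.out.println(get("/Mailbox/Mailbox_Aggregates/bob"));
        // Collection resource: all of bob's message metadata.
        System.out.println(get("/Mailbox/Message_Metadata/bob"));
        // Sub-resource within the collection: one message in full.
        System.out.println(get("/Mailbox/Message_Details/bob/4"));
    }
}
```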
What Does the Architecture Look
             Like?




              @r39132             38
Espresso: Architecture	

•    Components
     •    Request Routing Tier
           •    Stateless
           •    Accepts HTTP request
           •    Consults Cluster Manager to
                determine master for partition
           •    Forwards request to appropriate
                storage node
     •    Storage Tier
           •    Data Store (e.g. MySQL)
           •    Data is semi-structured (e.g. Avro)
           •    Local Secondary Index (Lucene)
     •    Cluster Manager
           •    Responsible for data set
                partitioning
           •    Notifies Routing Tier of partition
                locations for a data set
           •    Monitors health and executes
                repairs on an unhealthy cluster
     •    Relay Tier
           •    Replicates changes in commit
                order to subscribers (uses
                Databus)

                                                      @r39132   39
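A hypothetical sketch of what the routing tier's decision boils down to: hash the resource_id to a partition, then look up that partition's current master from the table the Cluster Manager maintains (all names below are illustrative, not Espresso's actual code):

```java
import java.util.Map;

/** Illustrative only: picks the storage node that masters a request's partition. */
class RequestRouterSketch {
    private final int numPartitions;
    // Kept up to date from Cluster Manager notifications (partition -> master node).
    private final Map<Integer, String> partitionToMaster;

    RequestRouterSketch(int numPartitions, Map<Integer, String> partitionToMaster) {
        this.numPartitions = numPartitions;
        this.partitionToMaster = partitionToMaster;
    }

    /** Every row for a given resource_id hashes to the same partition, which is what keeps a
     *  single entity (e.g. one member's mailbox) on one node and makes the "constrained
     *  strong consistency" of the earlier slides possible. */
    String masterFor(String resourceId) {
        int partition = (resourceId.hashCode() & 0x7fffffff) % numPartitions;
        return partitionToMaster.get(partition); // e.g. "storage-node-7:12918" (placeholder)
    }
}
```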
Next Steps for Espresso




          @r39132         40
Espresso: Next Steps	

Key Road Map Items

•  Internal Rollout for more use-cases within LinkedIn

•  Development
    •  Active-Active across data centers
    •  Global Search Indexes : “<resource_id>?query=…” for singleton resources as well
    •  More Fault Tolerance measures in the Cluster Manager (Helix)

•  Open Source Helix (Cluster Manager) in Summer of 2012
    •  Look out for an article on High Scalability

•  Open Source Espresso in the beginning of 2013




                                             @r39132                                     41
Acknowledgments	

Presentation & Content
    Tom Quiggle (Espresso)                                       @TomQuiggle
    Kapil Surlaker (Espresso)                                    @kapilsurlaker
    Shirshanka Das (Espresso)                                    @shirshanka
    Chavdar Botev (Databus)                                      @cbotev
    Roshan Sumbaly (Voldemort)                                   @rsumbaly
    Neha Narkhede (Kafka)                                        @nehanarkhede

A Very Talented Development Team
     Aditya Auradkar, Chavdar Botev, Antony Curtis, Vinoth Chandar, Shirshanka Das, Dave
     DeMaagd, Alex Feinberg, Phanindra Ganti, Mihir Gandhi, Lei Gao, Bhaskar Ghosh, Kishore
     Gopalakrishna, Brendan Harris, Todd Hendricks, Swaroop Jagadish, Joel Koshy, Brian
     Kozumplik, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha
     Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian,
     Oliver Seeliger, Rupa Shanbag, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Subbu
     Subramaniam, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Cuong Tran, Balaji
     Varadarajan, Jemiah Westerman, Zhongjie Wu, Zach White, Yang Ye, Mammad Zadeh,
     David Zhang, and Jason Zhang



                                          @r39132                                     42
Any Questions?




   @r39132     43

Weitere ähnliche Inhalte

Was ist angesagt?

Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
Flink Forward
 

Was ist angesagt? (20)

BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
 
AWS DynamoDB and Schema Design
AWS DynamoDB and Schema DesignAWS DynamoDB and Schema Design
AWS DynamoDB and Schema Design
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
 
Data Modeling with Power BI
Data Modeling with Power BIData Modeling with Power BI
Data Modeling with Power BI
 
Best practices on building data lakes and lake formation
Best practices on building data lakes and lake formationBest practices on building data lakes and lake formation
Best practices on building data lakes and lake formation
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine Learning
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Extreme Salesforce Data Volumes Webinar
Extreme Salesforce Data Volumes WebinarExtreme Salesforce Data Volumes Webinar
Extreme Salesforce Data Volumes Webinar
 
MelOn 빅데이터 플랫폼과 Tajo 이야기
MelOn 빅데이터 플랫폼과 Tajo 이야기MelOn 빅데이터 플랫폼과 Tajo 이야기
MelOn 빅데이터 플랫폼과 Tajo 이야기
 
Data partitioning
Data partitioningData partitioning
Data partitioning
 
Webinar | Introduction to Amazon DynamoDB
Webinar | Introduction to Amazon DynamoDBWebinar | Introduction to Amazon DynamoDB
Webinar | Introduction to Amazon DynamoDB
 
Database migration
Database migrationDatabase migration
Database migration
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 
BI in the Cloud - Microsoft Power BI Overview and Demo
BI in the Cloud - Microsoft Power BI Overview and DemoBI in the Cloud - Microsoft Power BI Overview and Demo
BI in the Cloud - Microsoft Power BI Overview and Demo
 
Mis assignment (database)
Mis assignment (database)Mis assignment (database)
Mis assignment (database)
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
Apache drill
Apache drillApache drill
Apache drill
 
DBMS - Normalization
DBMS - NormalizationDBMS - Normalization
DBMS - Normalization
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Object Oriented Dbms
Object Oriented DbmsObject Oriented Dbms
Object Oriented Dbms
 

Andere mochten auch

EBAY_VIACOM_UPFRONT.pptm
EBAY_VIACOM_UPFRONT.pptmEBAY_VIACOM_UPFRONT.pptm
EBAY_VIACOM_UPFRONT.pptm
Suhail Mahajan
 
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقية
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقيةخليك بروفيشنال | حدث وجود | انوار رسالة الشرقية
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقية
Ahmed Faris
 

Andere mochten auch (20)

Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Viacom
ViacomViacom
Viacom
 
Victors & Spoils at DIS: Rethinking the Agency Model
Victors & Spoils at DIS: Rethinking the Agency ModelVictors & Spoils at DIS: Rethinking the Agency Model
Victors & Spoils at DIS: Rethinking the Agency Model
 
EBAY_VIACOM_UPFRONT.pptm
EBAY_VIACOM_UPFRONT.pptmEBAY_VIACOM_UPFRONT.pptm
EBAY_VIACOM_UPFRONT.pptm
 
Viacom culture
Viacom cultureViacom culture
Viacom culture
 
+Data analysis using tableau
+Data analysis using tableau+Data analysis using tableau
+Data analysis using tableau
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
 
MongoDB @ Viacom
MongoDB @ ViacomMongoDB @ Viacom
MongoDB @ Viacom
 
(1) Espresso Shots of Business Wisdom
(1) Espresso Shots of Business Wisdom(1) Espresso Shots of Business Wisdom
(1) Espresso Shots of Business Wisdom
 
The Age of Distraction at VIACOM Identify
The Age of Distraction at VIACOM IdentifyThe Age of Distraction at VIACOM Identify
The Age of Distraction at VIACOM Identify
 
لتخطيط الاستراتيجي عبد الرحمن تيشوري اتصالات طرطجوس
لتخطيط الاستراتيجي عبد الرحمن تيشوري اتصالات طرطجوسلتخطيط الاستراتيجي عبد الرحمن تيشوري اتصالات طرطجوس
لتخطيط الاستراتيجي عبد الرحمن تيشوري اتصالات طرطجوس
 
Linked Data in Bibliothekskatalogen
Linked Data in BibliothekskatalogenLinked Data in Bibliothekskatalogen
Linked Data in Bibliothekskatalogen
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
B2B Nespresso Presentation
B2B Nespresso PresentationB2B Nespresso Presentation
B2B Nespresso Presentation
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Linked in presentation
Linked in presentationLinked in presentation
Linked in presentation
 
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقية
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقيةخليك بروفيشنال | حدث وجود | انوار رسالة الشرقية
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقية
 

Ähnlich wie LinkedIn Data Infrastructure Slides (Version 2)

Kuldeep presentation ppt
Kuldeep presentation pptKuldeep presentation ppt
Kuldeep presentation ppt
kuldeep khichar
 
Cloud Computing overview and case study
Cloud Computing overview and case studyCloud Computing overview and case study
Cloud Computing overview and case study
Babak Hosseinzadeh
 
Unleash The Power Of Your Data Using Open Source Business Intelligence
Unleash The Power Of Your Data Using Open Source Business IntelligenceUnleash The Power Of Your Data Using Open Source Business Intelligence
Unleash The Power Of Your Data Using Open Source Business Intelligence
MySQLConference
 
Sap business objects BI4.0 reporting presentation
Sap business objects BI4.0 reporting presentationSap business objects BI4.0 reporting presentation
Sap business objects BI4.0 reporting presentation
shaktell2
 
IOT Editable Design Presentation Editable
IOT Editable Design Presentation EditableIOT Editable Design Presentation Editable
IOT Editable Design Presentation Editable
hareyaa
 

Ähnlich wie LinkedIn Data Infrastructure Slides (Version 2) (20)

LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)
 
(.Net Portfolio) Td Rodda
(.Net Portfolio) Td Rodda(.Net Portfolio) Td Rodda
(.Net Portfolio) Td Rodda
 
Ofm msft-interop-v5c-132827
Ofm msft-interop-v5c-132827Ofm msft-interop-v5c-132827
Ofm msft-interop-v5c-132827
 
Kuldeep presentation ppt
Kuldeep presentation pptKuldeep presentation ppt
Kuldeep presentation ppt
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Introduction inbox v2.0
Introduction inbox v2.0Introduction inbox v2.0
Introduction inbox v2.0
 
Netflix in the cloud 2011
Netflix in the cloud 2011Netflix in the cloud 2011
Netflix in the cloud 2011
 
Babak Hosseinzadeh IT Portfolio Management In Shared Services & CC
Babak Hosseinzadeh   IT Portfolio Management In Shared Services & CCBabak Hosseinzadeh   IT Portfolio Management In Shared Services & CC
Babak Hosseinzadeh IT Portfolio Management In Shared Services & CC
 
Document Classification using DMX in SQL Server Analysis Services
Document Classification using DMX in SQL Server Analysis ServicesDocument Classification using DMX in SQL Server Analysis Services
Document Classification using DMX in SQL Server Analysis Services
 
The Internet
The InternetThe Internet
The Internet
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
 
Cloud Computing overview and case study
Cloud Computing overview and case studyCloud Computing overview and case study
Cloud Computing overview and case study
 
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
 
Unleash The Power Of Your Data Using Open Source Business Intelligence
Unleash The Power Of Your Data Using Open Source Business IntelligenceUnleash The Power Of Your Data Using Open Source Business Intelligence
Unleash The Power Of Your Data Using Open Source Business Intelligence
 
Cloud Enterprise Integration
Cloud Enterprise IntegrationCloud Enterprise Integration
Cloud Enterprise Integration
 
Sap business objects BI4.0 reporting presentation
Sap business objects BI4.0 reporting presentationSap business objects BI4.0 reporting presentation
Sap business objects BI4.0 reporting presentation
 
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudIBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
 
IOT Editable Design Presentation Editable
IOT Editable Design Presentation EditableIOT Editable Design Presentation Editable
IOT Editable Design Presentation Editable
 
Leveraging PowerPivot
Leveraging PowerPivotLeveraging PowerPivot
Leveraging PowerPivot
 
Cloud Computing: Changing the software business
Cloud Computing: Changing the software businessCloud Computing: Changing the software business
Cloud Computing: Changing the software business
 

Mehr von Sid Anand

Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)
Sid Anand
 

Mehr von Sid Anand (20)

Building High Fidelity Data Streams (QCon London 2023)
Building High Fidelity Data Streams (QCon London 2023)Building High Fidelity Data Streams (QCon London 2023)
Building High Fidelity Data Streams (QCon London 2023)
 
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
 
Low Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionLow Latency Fraud Detection & Prevention
Low Latency Fraud Detection & Prevention
 
YOW! Data Keynote (2021)
YOW! Data Keynote (2021)YOW! Data Keynote (2021)
YOW! Data Keynote (2021)
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
 
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
Cloud Native Data Pipelines (QCon Shanghai & Tokyo 2016)
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016
 
Airflow @ Agari
Airflow @ Agari Airflow @ Agari
Airflow @ Agari
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
 
Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)
 
Hands On with Maven
Hands On with MavenHands On with Maven
Hands On with Maven
 
Learning git
Learning gitLearning git
Learning git
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

LinkedIn Data Infrastructure Slides (Version 2)

  • 1. Data Infrastructure @ LinkedIn (v2) Sid Anand QCon NY (June 2012) 1
  • 2. What Do I Mean by V2? V2 == version of this talk, not version of our architecture. * Version 1 of this talk •  Presented at QCon London (March 2012) •  http://www.infoq.com/presentations/Data-Infrastructure-LinkedIn Version 2 – i.e. this talk •  Contains some overlapping content •  Introduces Espresso, our new NoSQL database For more information on what LinkedIn Engineering is doing, feel free to follow @LinkedInEng @r39132 2
  • 3. About Me Current Life… *   LinkedIn   Site Ops   Web (Software) Engineering   Search, Network, and Analytics (SNA)   Distributed Data Systems (DDS)   Me (working on analytics projects) In Recent History…   Netflix, Cloud Database Architect (4.5 years)   eBay, Web Development, Research Labs, & Search Engine (4 years) @r39132 3
  • 5. The world’s largest professional network Over 60% of members are outside of the United States 82% * 161M+ * 90 Fortune 100 Companies use LinkedIn to hire >2M * Company Pages 55 17 * 32 Languages ~4.2B 17 8 2 4 Professional searches in 2011 2004 2005 2006 2007 2008 2009 2010 *as of March 31, 2012 LinkedIn Members (Millions) @r39132 5
  • 6. Our Architecture @r39132 6
  • 7. LinkedIn : Architecture Overview •  Our site runs primarily on Java, with some use of Scala for specific infrastructure •  What runs on Scala? •  Network Graph Service •  Kafka •  Most of our services run on Apache Traffic Server + Jetty @r39132 7
  • 8. LinkedIn : Architecture   A web page requests information A and B Presentation Tier   A thin layer focused on building the UI. It assembles the page by making parallel requests to the BT A A B B C C Business Tier   Encapsulates business logic. Can call other BT clusters and its own DAT cluster. A A B B C C Data Access Tier   Encapsulates DAL logic Data Infrastructure Oracle Oracle Oracle Oracle   Concerned with the persistent storage of and easy access to data Memcached @r39132 8
  • 9. LinkedIn : Architecture   A web page requests information A and B Presentation Tier   A thin layer focused on building the UI. It assembles the page by making parallel requests to the BT A A B B C C Business Tier   Encapsulates business logic. Can call other BT clusters and its own DAT cluster. A A B B C C Data Access Tier   Encapsulates DAL logic Data Infrastructure Oracle Oracle Oracle Oracle   Concerned with the persistent storage of and easy access to data Memcached @r39132 9
  • 11. LinkedIn Data Infrastructure Technologies Oracle: Source of Truth for User-Provided Data @r39132 11
  • 12. Oracle : Overview Oracle •  Until recently, all user-provided data was stored in Oracle – our source of truth •  Espresso is ramping up •  About 50 Schemas running on tens of physical instances •  With our user base and traffic growing at an accelerating pace, how do we scale Oracle for user-provided data? Scaling Reads •  Oracle Slaves •  Memcached •  Voldemort – for key-value lookups *** Scaling Writes •  Move to more expensive hardware or replace Oracle with something better @r39132 12
  • 13. LinkedIn Data Infrastructure Technologies Voldemort: Highly-Available Distributed Data Store @r39132 13
  • 14. Voldemort : Overview •  A distributed, persistent key-value store influenced by the Amazon Dynamo paper •  Key Features of Dynamo – Highly Scalable, Available, and Performant – Achieves this via Tunable Consistency •  Strong consistency comes with a cost – i.e. lower availability and higher response times •  The user can tune this to his/her needs – Provides several self-healing mechanisms when data does become inconsistent •  Read Repair – Repairs value for a key when the key is looked up/read •  Hinted Handoff – Buffers value for a key that wasn’t successfully written, then writes it later •  Anti-Entropy Repair – Scans the entire data set on a node and fixes it – Provides means to detect node failure and a means to recover from node failure •  Failure Detection •  Bootstrapping New Nodes @r39132 14
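A small illustrative Java check of the quorum rule behind "tunable consistency" (N replicas, R required reads, W required writes). This is conceptual, not Voldemort code; Voldemort exposes knobs of this kind as per-store settings.

```java
/**
 * Illustrative only: the R + W > N rule behind Dynamo-style tunable consistency.
 * With N replicas, requiring R successful reads and W successful writes guarantees
 * that read and write quorums overlap, so a read sees at least one up-to-date replica.
 */
public final class QuorumMath {
    public static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(readsSeeLatestWrite(3, 2, 2)); // true  -> stronger consistency, lower availability
        System.out.println(readsSeeLatestWrite(3, 1, 1)); // false -> higher availability, weaker consistency
    }
}
```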
  • 15. Voldemort : Overview •  API: VectorClock&lt;V&gt; get (K key); put (K key, VectorClock&lt;V&gt; value); applyUpdate(UpdateAction action, int retries) •  Layered, Pluggable Architecture – Client layers: Client API, Conflict Resolution, Serialization, Repair Mechanism, Failure Detector, Routing; Server layers: Repair Mechanism, Failure Detector, Admin, Routing, Storage Engine •  Voldemort-specific Features – Implements a layered, pluggable architecture – Each layer implements a common interface (c.f. API). This allows us to replace or remove implementations at any layer •  Pluggable data storage layer – BDB JE, Custom RO storage, etc… •  Pluggable routing supports – Single or Multi-datacenter routing @r39132 15
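Below is a minimal read/write sketch against the open-source Voldemort client API (the get/put/applyUpdate surface described above). The bootstrap URL, store name, and key/value choices are placeholders.

```java
import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class VoldemortReadWriteExample {
    public static void main(String[] args) {
        // Bootstrap URL and store name are placeholders.
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://voldemort-host:6666"));
        StoreClient<String, String> client = factory.getStoreClient("member_profile_store");

        // put() resolves the current version and writes a new one.
        client.put("member:12345", "some serialized profile blob");

        // get() returns the value together with its vector clock.
        Versioned<String> versioned = client.get("member:12345");
        System.out.println(versioned.getValue() + " @ " + versioned.getVersion());

        factory.close();
    }
}
```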
  • 16. Voldemort : Overview •  Layered, Pluggable Architecture (same layer diagram as slide 15) •  Voldemort-specific Features – Supports Fat client or Fat Server – Repair Mechanism + Failure Detector + Routing can run on server or client – LinkedIn currently runs Fat Client, but we would like to move this to a Fat Server Model @r39132 16
  • 17. Where Does LinkedIn use Voldemort? @r39132 17
  • 18. Voldemort : Usage Patterns @ LinkedIn 2 Usage-Patterns •  Read-Write Store –  A key-value alternative to Oracle –  Uses BDB JE for the storage engine –  50% of Voldemort Stores (aka Tables) are RW •  Read-only Store –  Uses a custom Read-only format –  50% of Voldemort Stores (aka Tables) are RO •  Let’s look at the RO Store @r39132 18
  • 19. Voldemort : RO Store Usage at LinkedIn – People You May Know; Viewers of this profile also viewed; Related Searches; Events you may be interested in; LinkedIn Skills; Jobs you may be interested in @r39132 19
  • 20. Voldemort : Usage Patterns @ LinkedIn RO Store Usage Pattern 1.  We schedule a Hadoop job to build a table (called “store” in Voldemort-speak) 2.  Azkaban, our Hadoop scheduler, detects Hadoop job completion and tells Voldemort to fetch the new data and index files 3.  Voldemort fetches the data and index files in parallel from HDFS. Once fetch completes, swap indexes! 4.  Voldemort serves fast key-value look-ups on the site –  e.g. For key=“Sid Anand”, get all the people that “Sid Anand” may know! –  e.g. For key=“Sid Anand”, get all the jobs that “Sid Anand” may be interested in! @r39132 20
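The build-then-swap flow above is driven from Hadoop and Azkaban. As a hedged sketch, an Azkaban job definition for Voldemort's read-only "build and push" flow might look roughly like this; the property names follow that convention but should be treated as assumptions (exact keys depend on the Voldemort and Azkaban versions), and the store name, paths, and owners are made up.

```properties
# Sketch of an Azkaban job that builds a Voldemort read-only store on Hadoop and pushes it.
# Property names are illustrative of the build-and-push job style, not an exact reference.
type=java
job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
push.store.name=people_you_may_know
push.cluster=tcp://voldemort-ro-host:6666
build.input.path=/data/derived/pymk/latest
build.output.dir=/tmp/voldemort-build/pymk
build.replication.factor=2
push.store.owners=recommendations-team@example.com
push.store.description=Member -> ranked list of suggested connections
```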
  • 21. How Do The Voldemort RO Stores Perform? @r39132 21
  • 22. Voldemort : RO Store Performance : TP vs. Latency (charts: median and 99th-percentile read latency in ms vs. throughput in qps, roughly 100–700 qps, for MySQL vs. Voldemort; dataset of 100 GB with 24 GB RAM) @r39132 22
  • 23. LinkedIn Data Infrastructure Solutions Databus : Timeline-Consistent Change Data Capture @r39132 23
  • 24. Where Does LinkedIn use Databus? @r39132 24
  • 25. Databus : Use-Cases @ LinkedIn (diagram: Oracle data change events are fanned out as updates to the Standardization, Search Index, and Graph Index services and to Read Replicas) A user updates his profile with skills and position history. He also accepts a connection •  The write is made to an Oracle Master and Databus replicates: •  the profile change to the Standardization service – E.g. the many (actually 40) forms of IBM are canonicalized for search-friendliness and recommendation-friendliness •  the profile change to the Search Index service – Recruiters can find you immediately by new keywords •  the connection change to the Graph Index service – The user can now start receiving feed updates from his new connections immediately @r39132 25
  • 26. Databus Architecture @r39132 26
  • 27. Databus : Architecture – Databus consists of 2 components (diagram: the DB’s captured changes feed the Relay’s event window and the Bootstrap DB) •  Relay Service •  “Relays” DB changes to subscribers as they happen in Oracle •  Uses a sharded, in-memory, circular buffer •  Very fast, but has limited amount of buffered data! •  Bootstrap Service •  Serves historical data to subscribers who have fallen behind on the latest data from the “relay” •  Not as fast as the relay, but has large amount of data buffered on disk @r39132 27
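To make the relay/bootstrap hand-off concrete, here is a hypothetical consumer sketch in Java. This is not the actual Databus client API; it only illustrates catching up from the bootstrap service when the relay's in-memory window no longer covers the consumer's position.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical change-stream types for illustration (not the real Databus API). */
interface ChangeEvent {
    long sequence();   // position in the commit-ordered change stream
    String payload();  // e.g. an Avro-encoded row image
}

interface ChangeSource {
    /** Events after the given sequence, or null if that point has aged out of this source. */
    List<ChangeEvent> pullSince(long sequence);
}

public class DatabusStyleConsumer {
    private final ChangeSource relay;      // fast, bounded in-memory buffer
    private final ChangeSource bootstrap;  // slower, but large on-disk history
    private long lastApplied = -1;

    public DatabusStyleConsumer(ChangeSource relay, ChangeSource bootstrap) {
        this.relay = relay;
        this.bootstrap = bootstrap;
    }

    /** One polling pass: prefer the relay; fall back to bootstrap when we have fallen too far behind. */
    public void poll() {
        List<ChangeEvent> batch = relay.pullSince(lastApplied);
        if (batch == null) {
            batch = bootstrap.pullSince(lastApplied);
        }
        if (batch == null) {
            batch = Collections.emptyList(); // nothing available; try again later
        }
        for (ChangeEvent e : batch) {
            apply(e);
            lastApplied = e.sequence();
        }
    }

    private void apply(ChangeEvent e) {
        // In practice this would update a search index, graph index, cache, etc.
        System.out.println("applying change #" + e.sequence() + ": " + e.payload());
    }
}
```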
  • 28. Databus : Architecture (diagram) – The source DB’s captured changes flow into the Relay’s event window; on-line clients (Consumer 1 … Consumer n, via the Databus Client Lib) read on-line changes from the Relay, while clients that have fallen behind read a consistent snapshot at U plus a consolidated delta since T from the Bootstrap service. @r39132 28
  • 29. Databus : Architecture - Bootstrap •  Generate consistent snapshots and consolidated deltas during continuous updates (diagram: clients read on-line changes from the Relay’s event window or read from the Bootstrap server, whose Log Writer captures recent events into Log Storage and whose Log Applier replays those events into Snapshot Storage) @r39132 29
  • 30. LinkedIn Data Infrastructure Technologies Espresso: Indexed Timeline-Consistent Distributed Data Store @r39132 30
  • 31. Why Do We Need Yet Another NoSQL DB? @r39132 31
  • 32. Espresso: Overview What is Oracle Missing? •  (Infinite) Incremental Scalability •  Adding 50% more resources gives us 50% more scalability (e.g. storage and serving capacity, etc…). In effect, we do not want to see diminishing returns •  Always Available •  Users of the system do not perceive any service interruptions •  Adding capacity or doing any other maintenance does not incur downtime •  Node failures do not cause downtime •  Operational Flexibility and Cost Control •  No recurring license fees •  Runs on commodity hardware •  Simplified Development Model •  E.g. Adding a column without running DDL ALTER TABLE commands @r39132 32
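One way to picture the "adding a column without running DDL ALTER TABLE" point: Espresso stores documents as semi-structured Avro (see slide 39), and Avro schema evolution lets a reader with a newer schema consume data written with an older one. The record and field names below are made up for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.*;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Illustrative sketch of adding a "column" via Avro schema evolution instead of DDL. */
public class SchemaEvolutionDemo {
    private static final String V1 = "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
            + "{\"name\":\"subject\",\"type\":\"string\"}]}";
    private static final String V2 = "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
            + "{\"name\":\"subject\",\"type\":\"string\"},"
            + "{\"name\":\"priority\",\"type\":\"int\",\"default\":0}]}"; // new field, no ALTER TABLE

    public static void main(String[] args) throws Exception {
        Schema v1 = new Schema.Parser().parse(V1);
        Schema v2 = new Schema.Parser().parse(V2);

        // Write a record with the old schema...
        GenericRecord oldRecord = new GenericData.Record(v1);
        oldRecord.put("subject", "Hello from 2012");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v1).write(oldRecord, encoder);
        encoder.flush();

        // ...then read it back with the new schema; the added field takes its default value.
        BinaryDecoder decoder = DecoderFactory.get()
                .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
        GenericRecord upgraded = new GenericDatumReader<GenericRecord>(v1, v2).read(null, decoder);
        System.out.println(upgraded); // {"subject": "Hello from 2012", "priority": 0}
    }
}
```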
  • 33. Espresso: Overview What Features Should We Retain from Oracle? •  Limited Transactions : Constrained Strong Consistency •  We need consistency within an entity (more on this later) •  Ability to Query by Fields other than the Primary Key •  a.k.a. non-PK columns in RDBMS •  Must Feed a Change Data Capture system •  i.e. can act as a Databus Source •  Recall that a large ecosystem of analytic services are fed by the Oracle-Databus pipe. We need to continue to feed that ecosystem Guiding Principles when replacing a system (e.g. Oracle) •  Strive for Usage Parity, not Feature Parity •  In other words, first look at how you use Oracle and look for those features in candidate systems. Do not look for general feature parity between systems. •  Don’t shoot for a full replacement •  Buy yourself headroom by migrating the top K use-cases by load off Oracle. Leave the others. @r39132 33
  • 34. What Does the API Look Like? @r39132 34
  • 35. Espresso: API Example Consider the User-to-User Communication Case at LinkedIn •  Users can view their Inbox, sent folder, archived folder, etc. •  On social networks such as LinkedIn and Facebook, user-to-user messaging consumes substantial database resources in terms of storage and CPU activity •  At LinkedIn, an estimated 40% of host DB CPU is spent serving U2U Comm •  Footnote: Facebook migrated their messaging use-case to HBase •  We’re moving this use-case off Oracle to Espresso @r39132 35
  • 36. Espresso: API Example Database and Tables •  Imagine that you have a Mailbox Database containing tables needed for the U2U Communications case •  The Mailbox Database contains the following 3 tables: •  Message_Metadata – captures subject text •  Primary Key = MemberId & MsgId •  Message_Details – captures full message content •  Primary Key = MemberId & MsgId •  Mailbox_Aggregates – captures counts of read/unread & total •  Primary Key = MemberId Example Read Request Espresso supports REST semantics. To get unread and total email count for “bob”, issue a request of the form: •  GET /<database_name>/<table_name>/ <resource_id> •  GET /Mailbox/Mailbox_Aggregates/bob @r39132 36
  • 37. Espresso: API Example Collection Resources vs. Singleton Resources A resource identified by a resource_id may be either a singleton or a collection Examples (Read) •  For singletons, the URI refers to an individual resource. –  GET /<database_name>/<table_name>/<resource_id> –  E.g.: Mailbox_Aggregates table •  To get unread and total email count for “bob”, issue a request of the form: –  GET /Mailbox/Mailbox_Aggregates/bob •  For collections, a secondary path element defines individual resources within the collection •  GET /<database_name>/<table_name>/<resource_id>/<subresource_id> •  E.g.: Message_Metadata & Message_Details tables •  To get all of “bob’s” mail metadata, specify the URI up to the <resource_id> •  GET /Mailbox/Message_Metadata/bob •  To display one of “bob’s” messages in its entirety, specify the URI up to the <subresource_id> •  GET /Mailbox/Message_Details/bob/4 @r39132 37
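A small Java sketch issuing the singleton and collection reads described above. The URI paths mirror the slide's examples, while the router host and port are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EspressoReadExamples {
    public static void main(String[] args) throws Exception {
        // Host and port are placeholders; the paths follow the slide's REST examples.
        String base = "http://espresso-router:8080";
        HttpClient http = HttpClient.newHttpClient();

        // Singleton resource: unread/total mail counts for member "bob".
        get(http, base + "/Mailbox/Mailbox_Aggregates/bob");

        // Collection resource: all of bob's message metadata.
        get(http, base + "/Mailbox/Message_Metadata/bob");

        // One element of the collection: message with subresource id 4, in full.
        get(http, base + "/Mailbox/Message_Details/bob/4");
    }

    private static void get(HttpClient http, String uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(uri)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(uri + " -> HTTP " + response.statusCode());
    }
}
```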
  • 38. What Does the Architecture Look Like? @r39132 38
  • 39. Espresso: Architecture •  Components •  Request Routing Tier •  Stateless •  Accepts HTTP request •  Consults Cluster Manager to determine master for partition •  Forwards request to appropriate storage node •  Storage Tier •  Data Store (e.g. MySQL) •  Data is semi-structured (e.g. Avro) •  Local Secondary Index (Lucene) •  Cluster Manager •  Responsible for data set partitioning •  Notifies Routing Tier of partition locations for a data set •  Monitors health and executes repairs on an unhealthy cluster •  Relay Tier •  Replicates changes in commit order to subscribers (uses Databus) @r39132 39
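To make the routing-tier bullet concrete, here is a hypothetical sketch of the per-request decision: hash the resource_id to a partition, then forward to that partition's current master as published by the cluster manager. It is illustrative only, not the actual Espresso or Helix code.

```java
import java.util.Map;

/** Hypothetical routing-tier sketch: partition by resource_id, then look up the partition's master. */
public class PartitionRouter {
    private final int numPartitions;
    private final Map<Integer, String> partitionToMasterNode; // kept fresh from cluster-manager notifications

    public PartitionRouter(int numPartitions, Map<Integer, String> partitionToMasterNode) {
        this.numPartitions = numPartitions;
        this.partitionToMasterNode = partitionToMasterNode;
    }

    public String nodeFor(String resourceId) {
        int partition = Math.floorMod(resourceId.hashCode(), numPartitions);
        return partitionToMasterNode.get(partition);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4,
                Map.of(0, "storage-node-a", 1, "storage-node-b", 2, "storage-node-c", 3, "storage-node-d"));
        System.out.println(router.nodeFor("bob")); // all requests for "bob" land on the same master
    }
}
```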
  • 40. Next Steps for Espresso @r39132 40
  • 41. Espresso: Next Steps Key Road Map Items •  Internal Rollout for more use-cases within LinkedIn •  Development •  Active-Active across data centers •  Global Search Indexes : “<resource_id>?query=…” for singleton resources as well •  More Fault Tolerance measures in the Cluster Manager (Helix) •  Open Source Helix (Cluster Manager) in Summer of 2012 •  Look out for an article in High Scalability •  Open Source Espresso in the beginning of 2013 @r39132 41
  • 42. Acknowledgments Presentation & Content •  Tom Quiggle (Espresso) @TomQuiggle •  Kapil Surlaker (Espresso) @kapilsurlaker •  Shirshanka Das (Espresso) @shirshanka •  Chavdar Botev (Databus) @cbotev •  Roshan Sumbaly (Voldemort) @rsumbaly •  Neha Narkhede (Kafka) @nehanarkhede A Very Talented Development Team Aditya Auradkar, Chavdar Botev, Antony Curtis, Vinoth Chandar, Shirshanka Das, Dave DeMaagd, Alex Feinberg, Phanindra Ganti, Mihir Gandhi, Lei Gao, Bhaskar Ghosh, Kishore Gopalakrishna, Brendan Harris, Todd Hendricks, Swaroop Jagadish, Joel Koshy, Brian Kozumplik, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Rupa Shanbag, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Subbu Subramaniam, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Cuong Tran, Balaji Varadarajan, Jemiah Westerman, Zhongjie Wu, Zach White, Yang Ye, Mammad Zadeh, David Zhang, and Jason Zhang @r39132 42
  • 43. Questions? @r39132 43