Hadoop Summit 2010 - application track
Data Applications and Infrastructure at LinkedIn
Jay Kreps, LinkedIn

1. Data Applications and Infrastructure at LinkedIn
   Jay Kreps, LinkedIn
2. Plan
   - `whoami`
   - Data products
   - Data infrastructure
3. Data-centric engineering at LinkedIn
   - LinkedIn's Search, Network & Analytics team
   - Domain: derived data
   - Products
     - Search
     - People You May Know
     - Social graph services
     - Job matching
     - Collaborative filtering
   - Infrastructure
4. People You May Know
5. Other products
6. People You May Know
   - 120 billion relationships scored... every day
   - 82 Hadoop jobs (not counting ETL)
   - Around 16 TB of intermediate data
   - Machine learning model to predict the probability of a connection
   - Bloom filters for approximate filtering joins, a 10x performance improvement (see the sketch below)
   - About 5 test algorithms per week
   - 2 engineers
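The Bloom-filter point is worth a concrete illustration. Below is a minimal sketch, assuming Hadoop's built-in org.apache.hadoop.util.bloom.BloomFilter: a filter built from the smaller join side is loaded into each mapper, and records from the larger side whose keys cannot possibly match are dropped before the shuffle. The class name CandidateFilterMapper, the side-file path, and the tab-delimited record layout are all invented for illustration; this is not the actual PYMK job code.

    // Sketch only: pre-filter the large side of a join with a Bloom filter
    // built from the small side's keys, so non-matching records never reach
    // the shuffle. Paths and class names are hypothetical.
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.bloom.BloomFilter;
    import org.apache.hadoop.util.bloom.Key;

    public class CandidateFilterMapper extends Mapper<Object, Text, Text, Text> {
      private final BloomFilter memberFilter = new BloomFilter();

      @Override
      protected void setup(Context context) throws IOException {
        // Assumption: a filter serialized from the smaller join side was
        // shipped to every mapper (e.g. via the distributed cache).
        Path side = new Path("pymk/member-ids.bloom");
        try (FSDataInputStream in =
                 FileSystem.get(context.getConfiguration()).open(side)) {
          memberFilter.readFields(in);
        }
      }

      @Override
      protected void map(Object offset, Text line, Context context)
          throws IOException, InterruptedException {
        String joinKey = line.toString().split("\t")[0];
        // False positives only cost a little extra shuffle data; true
        // negatives are dropped here, which is where the speedup comes from.
        if (memberFilter.membershipTest(new Key(joinKey.getBytes()))) {
          context.write(new Text(joinKey), line);
        }
      }
    }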
7. Relevance Products
   - You must fly entirely by the instruments
   - Scale and relevance are very closely linked
     - More is often better
     - Iteration time is essential
   - UI matters, really
   - We threw out custom non-Hadoop code that was faster
   - Opportunity to work directly on the business
8. Infrastructure as an Ecosystem
   - An isolated infrastructure team is usually a bad solution
     - Too isolated from the problems
   - The data product team has crushing problems
     - This area is extremely immature
   - People should want to use it
   - Treat it like a product
     - Either make money off it or give it away
     - Open source is a great solution
     - Custom software should be the best
9. Open Source
   - Zoie – real-time search indexing
   - Bobo – faceted search
   - Decomposer – very large matrix decomposition routines (now in Mahout)
   - Norbert – partition-aware cluster management & RPC
   - Voldemort – key/value storage
   - Kamikaze – compression package
   - Sensei – distributed search
   - Azkaban – Hadoop workflow
10. Azkaban: workflow = cron + make
11. Azkaban: workflow : Hadoop :: web framework : web app
12. Azkaban
13. Azkaban Examples
   - Example job source (a sketch of a job file follows below)
   - Example workflow UI
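The actual job source shown on this slide is not preserved in the transcript, so here is a minimal sketch of what an Azkaban job definition looks like, assuming the simple properties-file style the later "simple text files for config" bullet describes. The job name, jar, class, paths, and dependency names are all hypothetical, and the exact property keys may differ between Azkaban versions.

    # score-connections.job -- hypothetical job definition for illustration
    # type=command runs a shell command; dependencies= wires jobs into a DAG
    type=command
    command=hadoop jar pymk.jar com.linkedin.pymk.ScoreConnections /data/derived/pymk/candidates /data/derived/pymk/scores
    dependencies=extract-connections,build-bloom-filter

Given a directory of such files, the scheduler knows the order to run jobs in and, after a failure, can restart from the failed node rather than from the top of the flow.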
14. Workflow
15. Azkaban
   - 82 jobs running every day just for PYMK
     - ...need to run in the right order
     - ...need to restart from failure
     - ...need to enforce dependencies
   - GUI is important for operations
   - Alerting, resource locking, config management, etc.
   - Deployable zip files of code represent a job flow
   - Everyone works independently, releases/deploys independently
   - Simple text files for config (but can use the GUI in a pinch)
   - Aggregated logs and run times
   - Restart from the point of failure
16. Data Deployment
   How do you get your multi-billion-edge probabilistic relationship graph to the live website to serve queries?
17. Voldemort
   - LinkedIn had many prior passes at this problem, all bad
     - MySQL
     - Oracle
     - Etc.
   - Fully distributed, partitioned, decentralized key-value storage (a client sketch follows below)
   - Supports pluggable storage engines
   - Online/offline cycle
   - Is this a good fit?
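For the online side of that cycle, a rough sketch of what a read from a Voldemort store looks like in the web application, using the standard Java client API; the bootstrap URL, store name, and key format are placeholders, not LinkedIn's real configuration.

    // Rough sketch of an online read from Voldemort; host, port, store name
    // and key are placeholders for illustration.
    import voldemort.client.ClientConfig;
    import voldemort.client.SocketStoreClientFactory;
    import voldemort.client.StoreClient;
    import voldemort.client.StoreClientFactory;
    import voldemort.versioning.Versioned;

    public class RecommendationReader {
      public static void main(String[] args) {
        StoreClientFactory factory = new SocketStoreClientFactory(
            new ClientConfig().setBootstrapUrls("tcp://voldemort-host:6666"));
        StoreClient<String, String> store = factory.getStoreClient("pymk");

        // The website's request path is just a key lookup by member id.
        Versioned<String> recs = store.get("member:12345");
        System.out.println(recs == null ? "no recommendations" : recs.getValue());
      }
    }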
18. Voldemort Data Deployment
19. Voldemort Data Deployment
   - Building a multi-TB lookup structure is really, really hard work... it is a batch operation
   - Solution: build this structure in Hadoop
   - Tradeoff: build time vs. lookup time
     - Minimal perfect hashing requires only 2.5 bits per key, but is slow to build
     - Sorted indexes are a fast, simple alternative (see the lookup sketch below)
   - The build is a no-op map/reduce (just sorting)
   - The data load will saturate the network even for a small cluster
   - Voldemort gives
     - failover
     - load balancing
     - monitoring
     - remote access
     - partitioning
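To make the build-time vs. lookup-time tradeoff concrete, here is a small illustration of the sorted-index idea. It is not Voldemort's actual on-disk read-only store format: it just assumes fixed-width (key hash, offset) entries that Hadoop has already sorted, so the server answers a lookup with a binary search over the index plus one seek into the data file.

    // Illustration of the sorted-index approach, with a hypothetical layout:
    // 16-byte key hash followed by an 8-byte data-file offset, entries sorted
    // by key hash. Lookup is binary search over the index file.
    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class SortedIndexLookup {
      private static final int KEY_BYTES = 16;    // e.g. an MD5 of the key
      private static final int OFFSET_BYTES = 8;
      private static final int ENTRY_BYTES = KEY_BYTES + OFFSET_BYTES;

      private final RandomAccessFile index;

      public SortedIndexLookup(RandomAccessFile index) {
        this.index = index;
      }

      /** Returns the data-file offset for keyHash, or -1 if it is absent. */
      public long findOffset(byte[] keyHash) throws IOException {
        long lo = 0, hi = index.length() / ENTRY_BYTES - 1;
        byte[] candidate = new byte[KEY_BYTES];
        while (lo <= hi) {
          long mid = (lo + hi) >>> 1;
          index.seek(mid * ENTRY_BYTES);
          index.readFully(candidate);
          int cmp = compare(candidate, keyHash);
          if (cmp == 0) return index.readLong();  // offset follows the key
          if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
      }

      private static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < KEY_BYTES; i++) {
          int diff = (a[i] & 0xff) - (b[i] & 0xff);
          if (diff != 0) return diff;
        }
        return 0;
      }
    }

Building such a file needs nothing beyond sorting by key hash, which is why the build can be a no-op map/reduce, while lookups stay at a handful of seeks per request.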
20. Voldemort Data Deployment
   - If data takes 24 hours to generate, it may take 24 hours to fix
     - Need a faster rollback strategy
   - Cold disk space is cheap
     - Store the live copy
     - Store the copy currently being updated
     - Store N backup copies
     - "Atomic" swap (a swap sketch follows below)
   - The cache needs to start warm
   - I/O and network throttling to limit the impact of deployment
   - Our production latency is < 3 ms from the client side
   - A 900 GB store takes ~1:30 to build on a 45-node dev cluster
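A sketch of the versioned-directory idea behind the "atomic" swap and cheap rollback: every build lands in its own version directory, the live copy is whatever a single link points at, and both promotion and rollback are one link swap. The directory layout here is invented for the example; Voldemort's read-only stores expose their own swap and rollback operations rather than this exact symlink scheme.

    // Illustration only: each build lands in version-N, "current" is a
    // symlink to the live copy, and promotion or rollback is a single swap.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class StoreSwap {
      public static void pointCurrentAt(Path storeDir, int version) throws IOException {
        Path target = storeDir.resolve("version-" + version);
        Path current = storeDir.resolve("current");
        Path staging = storeDir.resolve("current.tmp");
        // Build the new link off to the side, then rename it over the old
        // one; on a POSIX filesystem the rename is a single atomic step, so
        // readers always see either the old copy or the new one.
        Files.deleteIfExists(staging);
        Files.createSymbolicLink(staging, target);
        Files.move(staging, current, StandardCopyOption.REPLACE_EXISTING);
      }

      public static void main(String[] args) throws IOException {
        Path store = Paths.get("/data/stores/pymk");
        pointCurrentAt(store, 42);  // promote the freshly built copy
        pointCurrentAt(store, 41);  // rollback is the same operation
      }
    }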
21. Questions?
