Spark and Hadoop at Yahoo:
Brought to you by YARN

Andy Feng
Yahoo! Hadoop
(afeng@yahoo-inc.com)
Personalized Web
Big Data in Yahoo!

9/10/13
Hadoop + Spark:
Empowered by YARN

30k+ Yahoo! production nodes on YARN since Q1 2013
Shark Pilot: Advertising Data Analytics
§  Business questions
   ›  Are two sets of audience cohorts similar to each other?
   ›  Which audience segment is most likely to be interested in this ad campaign?
   ›  How did audience engagement with the new front page differ from the previous front page?
   ›  What are the right metrics to define user engagement?
§  Shark pilot
   ›  50 nodes, each with 96 GB RAM
      •  Currently holding 3.2 TB of sample data in memory
   ›  Homegrown BI tools for ad-hoc queries
      •  Using Shark Server (contributed to the community by Yahoo!)
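As a rough capacity check on the pilot figures (50 nodes with 96 GB each, 3.2 TB of sample data held in memory), the aggregate RAM and the share taken by the sample can be worked out directly. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000 GB) and an even spread of data across nodes; it ignores OS, JVM, and query-execution memory, and the helper `capacity_summary` is hypothetical.

```python
# Back-of-the-envelope capacity check for the Shark pilot figures.
# Assumptions: decimal units (1 TB = 1,000 GB), data spread evenly across nodes,
# and no allowance for OS/JVM overhead or query-execution memory.
NODES = 50
RAM_PER_NODE_GB = 96
SAMPLE_DATA_GB = 3_200  # 3.2 TB expressed in GB

def capacity_summary(nodes, ram_per_node_gb, data_gb):
    """Return aggregate RAM, per-node data share, and fraction of RAM used."""
    total_ram_gb = nodes * ram_per_node_gb
    return {
        "total_ram_gb": total_ram_gb,
        "data_per_node_gb": data_gb / nodes,
        "fraction_of_ram_used": data_gb / total_ram_gb,
    }

summary = capacity_summary(NODES, RAM_PER_NODE_GB, SAMPLE_DATA_GB)
print(f"Aggregate RAM: {summary['total_ram_gb']} GB")                # 4800 GB
print(f"Sample data per node: {summary['data_per_node_gb']:.0f} GB")  # 64 GB
print(f"Share of RAM used: {summary['fraction_of_ram_used']:.0%}")    # 67%
```

With these figures the in-memory sample uses about 64 GB of each node's 96 GB, roughly two thirds of the 4.8 TB aggregate, leaving about a third for the operating system, JVMs, and query execution.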
Shark Perf: TPC-H Benchmark
[Chart: average query time in seconds, y-axis 0 to 600]
Spark Pilot: Model Training Pipeline
§  A DAG of M/R jobs in Hadoop Streaming
   ›  Feature extraction
   ›  Train models
   ›  Score and analyze models
§  Initial Spark prototype
   ›  3x speedup on feature extraction
§  Production launch
   ›  Apply Spark to the complete pipeline
   ›  Spark on an 80-node cluster
      •  Thanks to the enhanced UI and metrics in Spark 0.8
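The training pipeline is described as a DAG of three stages (feature extraction, model training, scoring and analysis). A minimal, engine-agnostic sketch of that structure follows: stages are nodes with explicit dependencies, run in topological order with each stage receiving its parents' outputs. The stage bodies, the toy word-count "model", and the sample data are hypothetical stand-ins, not Yahoo!'s actual Hadoop Streaming or Spark jobs; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (the slide's three steps).
PIPELINE = {
    "feature_extraction": set(),
    "train_models": {"feature_extraction"},
    "score_and_analyze": {"train_models", "feature_extraction"},
}

def run_pipeline(dag, stage_fns, raw_input):
    """Run stages in dependency order; each stage receives its parents' outputs."""
    outputs = {}
    for stage in TopologicalSorter(dag).static_order():
        parents = {p: outputs[p] for p in dag[stage]}
        outputs[stage] = stage_fns[stage](raw_input, parents)
    return outputs

# Hypothetical stage bodies, standing in for the real M/R (or Spark) jobs.
def feature_extraction(raw, _):
    # One toy feature per record: its word count.
    return [(label, len(text.split())) for label, text in raw]

def train_models(_, parents):
    feats = parents["feature_extraction"]
    # Toy "model": a threshold halfway between the two class means.
    pos = [f for y, f in feats if y == 1]
    neg = [f for y, f in feats if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def score_and_analyze(_, parents):
    threshold = parents["train_models"]
    feats = parents["feature_extraction"]
    # Accuracy of the threshold rule on the (hypothetical) training data.
    correct = sum((f > threshold) == (y == 1) for y, f in feats)
    return correct / len(feats)

STAGES = {
    "feature_extraction": feature_extraction,
    "train_models": train_models,
    "score_and_analyze": score_and_analyze,
}

data = [
    (1, "great long happy review text here"),
    (0, "bad"),
    (1, "really quite good overall"),
    (0, "meh ok"),
]
results = run_pipeline(PIPELINE, STAGES, data)
print(results["score_and_analyze"])
```

Keeping the dependencies explicit means the same DAG could be mapped onto chained jobs or onto stages of a single program; the sketch itself says nothing about which engine runs each stage.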

Use Case: Ad Targeting

[Diagram: Spark; M/R and Storm]

Use Case: Content Recommendation w/ Collaborative Filtering
[Diagram: Input → CF Learning (Spark) → Ranking (Spark) → Output]
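The slide does not say which collaborative-filtering algorithm was used. As an illustration only, the sketch below swaps in item-based CF with cosine similarity over binary engagement data: the "CF learning" step computes item-item similarities, and the "ranking" step scores a user's unseen items by summed similarity to items they already engaged with. It runs in plain Python rather than Spark, and the data and the helpers `item_similarities` and `rank_for_user` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical implicit-feedback input: user -> set of content items engaged with.
ENGAGEMENT = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "d"},
}

def item_similarities(engagement):
    """CF learning step: cosine similarity between items over binary user vectors."""
    users_by_item = defaultdict(set)
    for user, items in engagement.items():
        for item in items:
            users_by_item[item].add(user)
    sims = {}
    for i in users_by_item:
        for j in users_by_item:
            if i == j:
                continue
            overlap = len(users_by_item[i] & users_by_item[j])
            if overlap:
                norm = math.sqrt(len(users_by_item[i]) * len(users_by_item[j]))
                sims[(i, j)] = overlap / norm
    return sims

def rank_for_user(user, engagement, sims, k=2):
    """Ranking step: score unseen items by summed similarity to the user's items."""
    seen = engagement[user]
    scores = defaultdict(float)
    for (i, j), s in sims.items():
        if i in seen and j not in seen:
            scores[j] += s
    # Highest score first; ties broken by item id for deterministic output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

sims = item_similarities(ENGAGEMENT)
print(rank_for_user("u2", ENGAGEMENT, sims))
```

Item-based similarity is only one option; matrix-factorization methods such as ALS are another common choice for the learning step, and the ranking step would then score items from the learned factors instead.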
Spark-YARN: Deployment Simplified
run spark.deploy.yarn.Client --jar … --class … --args …
--queue … --num-workers … --worker-memory …

Spark-YARN (contributed by Yahoo!) is being adopted by the community (e.g., Taobao) for production use. You should try it on your Hadoop cluster.
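The launch command on the slide elides every value with "…". The sketch below assembles the same flags into an argument list; the helper `yarn_client_argv` and all values passed to it are hypothetical placeholders. `run` and `spark.deploy.yarn.Client` are copied verbatim from the slide; the launcher script and the client class's package name may differ between Spark releases, so check the YARN documentation for your version before using it.

```python
import shlex

def yarn_client_argv(jar, main_class, app_args, queue, num_workers, worker_memory):
    """Assemble the Spark-YARN launch command from the flags shown on the slide."""
    return [
        "run", "spark.deploy.yarn.Client",  # launcher and client class as on the slide
        "--jar", jar,
        "--class", main_class,
        "--args", app_args,
        "--queue", queue,
        "--num-workers", str(num_workers),
        "--worker-memory", worker_memory,
    ]

# All values below are hypothetical placeholders.
argv = yarn_client_argv(
    jar="my-app.jar",
    main_class="com.example.MyJob",
    app_args="hdfs:///tmp/input",
    queue="default",
    num_workers=10,
    worker_memory="2g",
)
print(shlex.join(argv))  # shell-quoted command line (Python 3.8+)
```

Building the command as a list and quoting it with `shlex.join` avoids hand-escaping mistakes when an argument contains spaces or shell metacharacters.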
Acknowledgements
§  AMPLab team
   ›  Outstanding collaboration: Ion, Matei, Reynold, Patrick, Matt, …
§  Yahoo! Hadoop team
   ›  Thomas, Bobby, Paul, Rajiv, Mithun, …
§  Yahoo! Labs
   ›  Mridul, Nathan, …
§  Yahoo! data analytics
   ›  Supreeth, Ram, Tim, …
§  Yahoo! Spark users
   ›  Gavin, Jay, Hirakendu, …

We Are Hiring!
http://careers.yahoo.com/
