Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Computing" by Todd Papaioannou
1. Hadoop & the future of Cloud Computing, by Todd Papaioannou, VP, Cloud Architecture
2. what's happening: more publicly available human-generated content; more interactions being tracked (e.g. clickstream data); more business processes being digitized; more history being kept = the Data Exhaust! Big Data is here! (Photo: Flickr, sub_lime79)
3. CUTTING THROUGH THE NOISE: Location, Social Relationships, Science, Understanding User Interests (word-cloud graphic: access, audience, blogs, communication, computer, internet, mass media, people, networking, technology; Photo: Flickr, Lomo-Cam)
4. turning data into insights: machine learning, time series, logistic regression, content clustering algorithms, factorization models; ad inventory modeling; user interest prediction (Photo: Flickr, NASA Goddard Photo and Video)
10. Adoption -> Investment: mainstream / enterprise adoption funds further development and enhancements
11. HADOOP IS GOING MAINSTREAM (adoption timeline, 2007-2010; source: The Datagraph Blog)
12. hadoop at yahoo! "Where Science meets Data". Products: Data Analytics, Content Optimization, Content Enrichment, Yahoo! Mail Anti-Spam, Advertising Products, Ad Optimization, Ad Selection, Big Data Processing & ETL. Data: dimensional data, content data, data pipelines. Hadoop clusters: tens of thousands of servers. Applied science: user interest prediction, ad inventory prediction, machine learning for search ranking, ad targeting, and spam filtering.
The web is changing. It's always evolving and changing. This evolution is about people-powered experiences and transient, unstructured data. My 16-year-old writes. He deletes. He retweets. In fact, a ton of the data on the web today is transient data. It exists for a moment and then it's gone. It's comments on Facebook, emails, content alerts, messenger updates, blogs, Twitter feeds. In fact, only 5% of the information created in the world today is "structured".
Yahoo!'s role has always been to cut through the noise and help people find what they want. We do that in many ways – primarily with deep science and insights, all relying on Hadoop. From curating people’s relationships to get more meaning out of them, to understanding their interests and their location, to adding a complex layer of science on top of all that – Hadoop’s right at the core of making all of that possible.
Turning data into insights isn't trivial. It's heavy lifting. It’s analysis and refinement of raw, unstructured information. It's also deep, best-in-class technology and science, and applying and improving this science is one of the things we do best at Yahoo! – using a variety of techniques as you see listed here.
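To make one of the techniques listed on that slide concrete, here is a minimal, self-contained sketch of logistic regression for click prediction, trained by plain gradient descent. The features, data, and learning rate are invented for illustration; this is not Yahoo!'s actual modeling code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(rows, labels, lr=0.5, epochs=200):
    """Stochastic gradient descent on log-loss; rows are feature vectors."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log-loss w.r.t. the linear score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Hypothetical features: [viewed_sports_story, viewed_finance_story]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [1, 1, 0, 0]  # did the user click a sports ad?
w, b = train_logreg(X, y)
p_sports = sigmoid(w[0] * 1 + w[1] * 0 + b)   # high predicted click probability
p_finance = sigmoid(w[0] * 0 + w[1] * 1 + b)  # low predicted click probability
```

In practice a model like this would be trained over Hadoop on billions of events rather than four toy rows, but the per-example update is the same.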
Yahoo! has made investments in Hadoop that have enabled us to add much more relevance to our data, enrich it, extract insights, and deliver relevant, personalized content and experiences to our consumers. These same investments help deliver the right audiences to our advertisers. As a result of delivering that highly relevant experience to 600 million users around the world, Yahoo!’s one of the most trusted brands on the Internet.
Hadoop delivers huge value to Yahoo! by enabling the important stuff we do with all of our big data. Without it, we simply couldn’t deliver the engaging consumer experiences and advertiser value the way we do today. With Hadoop, we get the disruptive ability to rapidly innovate by customizing, personalizing and fusing people’s individual worlds with the Web at large, in a way no other company can today.
With 600 million people visiting Yahoo!, 11 billion times a month, generating 98 billion page views, Yahoo! is a leader in many categories, and people trust us to give them a great experience and show them what’s most interesting and relevant to them. Behind every click, we’re using Hadoop to optimize what you see on Yahoo.com. We serve about 3 million different versions of the Today Module every 24 hours. Hadoop allows us to analyze story clicks by applying machine learning so we can figure out what you like and give you more of it. Every click a person makes on our homepage – that’s around half a billion clicks per day – results in multiple personalized rankings being computed, each completing in less than 1/100th of a second. Within ~7 minutes of a user clicking on a story, our entire ranking model is updated. Our Content Optimization Engine creates a real-time feedback loop for our editors. They can serve up popular stories and pull out unpopular stories, based on what the algorithm is telling them in real time. Our modeling techniques help us deeply understand the content and eliminate the guesswork, so we can actually predict a story’s relevance and popularity with our audience.
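The feedback loop described here can be sketched as a toy click-through-rate ranker: record every click and view, keep a smoothed CTR per story, and re-rank continuously. The class name, story IDs, and smoothing prior are hypothetical; the real Content Optimization Engine uses far richer models than a raw CTR estimate.

```python
from collections import defaultdict

class StoryRanker:
    """Toy real-time feedback loop: rank stories by smoothed CTR."""

    def __init__(self, prior_clicks=1.0, prior_views=10.0):
        # Beta-style smoothing so unseen stories start at a neutral prior CTR.
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views
        self.clicks = defaultdict(int)
        self.views = defaultdict(int)

    def record(self, story, clicked):
        """Fold one impression (and optional click) into the model."""
        self.views[story] += 1
        if clicked:
            self.clicks[story] += 1

    def ctr(self, story):
        return ((self.clicks[story] + self.prior_clicks) /
                (self.views[story] + self.prior_views))

    def rank(self, stories):
        # Most promising story first; editors could demote the tail.
        return sorted(stories, key=self.ctr, reverse=True)

ranker = StoryRanker()
for _ in range(50):
    ranker.record("olympics", clicked=True)
for _ in range(50):
    ranker.record("budget", clicked=False)
top = ranker.rank(["budget", "olympics"])  # "olympics" ranks first
```

The key property is the same one the talk describes: every recorded click immediately shifts subsequent rankings, with no offline retraining step in the loop.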
Because of technologies like Hadoop and the rest of our Cloud platform, we’re learning and building faster and faster. It’s all about speed, innovation and real, substantial value to our business. At Yahoo, we’ve been using Hadoop across the company for the last five years, and I’ve shown you just a few examples. Based on our testing and experience, we believe Hadoop is now ready for mainstream enterprise use. We’ve deliberately chosen to invest in open source as the foundation of our cloud. Yahoo! is running the largest implementation of Hadoop in the world today.
An overview of the Hadoop ecosystem: Yahoo! employees, including Doug Cutting, initiated Apache Hadoop in 2005. Since then, the ecosystem has expanded.
Hadoop is at the center of our data ecosystem: every click, page view, and search. It is the foundation of our ad management & targeting systems, and of content enrichment (geolocation, category) used to customize content for users. Where science meets data: machine learning algorithm development for spam detection, ad targeting, and predicting user interest and ad inventory, plus research on ad effectiveness. It provides scale for Big Data: daily 120 TB, 3+ PB; 70+ PB of data in total -- and growing. Web data is growing at a CAGR of 60% -- 667 exabytes by 2013 (Cisco).
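Processing "every click, page view, search" is classic MapReduce work. Below is a minimal Hadoop Streaming-style mapper and reducer, simulated in-process so it runs without a cluster; the log format and page names are invented, and this is a sketch rather than Yahoo!'s production ETL.

```python
import itertools

def mapper(log_line):
    """Emit (page, 1) for each click record; format: 'user<TAB>page<TAB>action'."""
    user, page, action = log_line.split("\t")
    if action == "click":
        yield page, 1

def reducer(key, values):
    """Sum the counts for one page."""
    yield key, sum(values)

def run_job(log_lines):
    # Simulate the shuffle phase: sort mapper output by key, then group.
    pairs = sorted(kv for line in log_lines for kv in mapper(line))
    return dict(itertools.chain.from_iterable(
        reducer(k, (v for _, v in group))
        for k, group in itertools.groupby(pairs, key=lambda kv: kv[0])))

logs = [
    "u1\tnews\tclick",
    "u2\tnews\tclick",
    "u1\tsports\tview",
    "u3\tsports\tclick",
]
counts = run_job(logs)  # {'news': 2, 'sports': 1}
```

On a real cluster the same mapper and reducer functions would run as Hadoop Streaming tasks over HDFS input splits, with the framework performing the sort-and-group shuffle between them.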
We started developing Hadoop 5 years ago with a prototype 20-node cluster, and have had a dedicated team developing Hadoop ever since, focused on supporting Yahoo!'s needs, contributing Hadoop to Apache, and helping build the community. It started as research projects, progressed to applied science efforts supporting search and advertising products, then to production systems (ad targeting, content optimization). Now Hadoop usage has spread to all parts of our business. Hadoop is our Big Data infrastructure -- it provides agility with Big Data. In a recent study, 50% of enterprises said they were strongly considering Hadoop adoption, with agility cited as the number one reason.
People ask why we contribute to open source. Open source helps us avoid technological dead ends, lets us benefit from leveraging community contributions, and allows us to hire a workforce already trained in our technology. Open sourcing our Cloud components starts with Hadoop: Pig, the Yahoo! Distribution of Hadoop (with others being added), Yahoo! Traffic Server, and ZooKeeper. In addition, we benefit from external contributions: Hive, Apache Web Server, Xen.