Cloud Platform Group (CPG)

Presentation at IIT Chennai

March 29, 2012
Agenda

• CPG Mission and Value Proposition
• Fit within the Yahoo Stack
• Drill-down: User Generated Content (UGC)
• Drill-down: User Location
• Drill-down: Web Extractions
• Drill-down: Trending
• Q&A




Yahoo! Presentation, Confidential     2
Cloud Platform Group Mission



Create a global, scalable platform built on
science that enables rapid innovation and
  delivery of personalized, monetizable
       experiences across devices.




CPG Value Proposition 1: Agility with Stability

LEGO powered by Content Agility




CPG Value Proposition 2: Science at Scale




ILLUSTRATIVE SAMPLE

CPG powers all of Yahoo! today

• DISPLAY ADS, powered by Hadoop: 3x improvement over legacy systems in the accuracy of ad placements and in our ability to forecast supply
• MAIL, powered by Edge, Storage, Ranking, & Hadoop: 40% faster download time; 300K+ spam mails blocked per second
• FRONT PAGE, powered by CORE: CTR for the Today Module increased by 263% (over pre-CORE) by serving the right content to the right user
• LEGO (YPP), powered by Content Agility: time to launch new sites reduced from quarters to weeks
• SOCIAL CHROME, powered by Social Platform: over 22M net cumulative installs since launch; integrated into News, Games, Movies, OMG, TV
• LIVESTAND, powered by Mobile & Cocktails Presentation Services: content distributed seamlessly across devices in an elegant, personalized experience


User Generated Content

Unified, scalable platform that enables self-expression and gets users to connect over content

USE CASE
Increase content stickiness and user retention; drive repeat usage across the Yahoo! network

SOLUTION
UGC Cloud is a scalable, real-time platform that lets users express themselves, resulting in increased user engagement and a vibrant Yahoo! community

RESULTS
UGC platforms are used by over 200 Yahoo! properties, with over 650M UGC actions per year
• Comments: 6M comments per month
• Message Boards: 1/3 of US Finance traffic comes from message boards
• Polls: 1.2M poll votes per month
• Ratings & Reviews: 40M user ratings per month
User Generated Content – Applications

Improving Comment Quality




Three-pronged approach: machine, human, and community moderation
300M comments analyzed, 70M blocked by machine moderation
Reactive volume (the cost of reacting to abuse) avoided
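The slide names three moderation layers but not how they are wired together. The sketch below shows one plausible arrangement, assuming a machine classifier that emits an abuse probability in [0, 1], hypothetical thresholds `BLOCK_AT` and `REVIEW_AT`, and a community flag count; none of these names or values come from the deck.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PUBLISH = "publish"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"


# Hypothetical thresholds (assumptions, not from the deck); real values would be tuned on labeled data.
BLOCK_AT = 0.9          # machine score at or above this is blocked outright
REVIEW_AT = 0.5         # scores in [REVIEW_AT, BLOCK_AT) go to human moderators
FLAGS_FOR_REVIEW = 3    # community flags that escalate a comment to human review


@dataclass
class Comment:
    text: str
    abuse_score: float        # assumed output of a machine classifier, in [0, 1]
    community_flags: int = 0  # number of users who reported the comment


def moderate(comment: Comment) -> Decision:
    """Route a comment through machine, community, and human moderation layers."""
    # Layer 1: machine moderation handles clear-cut abuse at scale.
    if comment.abuse_score >= BLOCK_AT:
        return Decision.BLOCK
    # Layer 2: community moderation escalates comments that users flag.
    if comment.community_flags >= FLAGS_FOR_REVIEW:
        return Decision.HUMAN_REVIEW
    # Layer 3: human moderators take the uncertain middle band.
    if comment.abuse_score >= REVIEW_AT:
        return Decision.HUMAN_REVIEW
    return Decision.PUBLISH


if __name__ == "__main__":
    batch = [
        Comment("Great analysis, thanks!", 0.05),
        Comment("borderline remark", 0.6),
        Comment("obvious spam link", 0.97),
        Comment("innocent-looking but reported", 0.2, community_flags=4),
    ]
    for c in batch:
        print(f"{moderate(c).value:>12}  {c.text}")
```

In this arrangement, comments blocked by the machine layer never reach the human or community queues, which is one way the reactive cost mentioned on the slide could be avoided.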




Sentiment Slider

http://news.yahoo.com/open-business-free-agency-set-begin-211828913--spt.html
User Generated Content – Social Poll
User Generated Content – In the Works
• Topical Organization of Comments
• Social Conversations
User Location
Store, manage & share user locations and locations of interest to create deeply personal digital experiences

USE CASE
User location information was siloed, inconsistent, and not shareable across properties and users

SOLUTION
Create a single data store of user locations, shareable across Yahoo! properties and advertising systems

RESULTS
Properties can launch location-aware services with faster time to market on a single platform
237M users with 550M locations

LOCDROP
• Management, Authorization, and Control
• Normalized, Geo-Aware User Locations
• Centralized, Consistent, and Contextual
• Accurate, Relevant, Valuable Experiences
• Increase Content, Targeting and Revenues
Read locations to drive local news, events and deals
Contextual Locations for Yahoo News




User Generated Places: Enable users to submit (and curate) a location if one does not exist
Android Messenger Use Case

• The user cannot find a place and decides to create a new location to check in
• The user is asked for permission to detect the current location from the device
• The user's location is marked on a map; this is used to get the lat/long of the created place
• The user enters a location, e.g. “Russian Tea Room”
• A new location is stored in the UGP platform and the user is checked in to this location
• The user has an option to curate locations created by other users
• The UGP platform enables algorithmic curation
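The flow above stores a new place from a user-entered name plus the device's lat/long. A platform accepting such submissions would plausibly check for an existing nearby place with the same name before creating a duplicate. The sketch below is not the UGP implementation; it assumes an in-memory list of places, a hypothetical 100 m duplicate radius, case-insensitive name matching, and illustrative coordinates.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
DUPLICATE_RADIUS_M = 100    # hypothetical: same name within 100 m counts as a duplicate


@dataclass
class Place:
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_or_create(places: list, name: str, lat: float, lon: float):
    """Return (place, created): reuse a nearby place with the same name, else store a new one."""
    key = name.strip().casefold()
    for p in places:
        same_name = p.name.strip().casefold() == key
        if same_name and haversine_m(p.lat, p.lon, lat, lon) <= DUPLICATE_RADIUS_M:
            return p, False
    new_place = Place(name.strip(), lat, lon)
    places.append(new_place)
    return new_place, True


if __name__ == "__main__":
    store = []
    # Illustrative coordinates only.
    print(find_or_create(store, "Russian Tea Room", 40.7648, -73.9808))
    print(find_or_create(store, "russian tea room ", 40.7649, -73.9807))  # reused, ~14 m away
```

Matching on both name and distance keeps two distinct venues with the same name in different neighborhoods separate, while collapsing near-identical submissions from users standing in the same spot.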
KAFE: Technologies*

[Figure: extraction technologies positioned by editorial effort (vertical axis) vs. precision (horizontal axis)]
• Manual SDE Rules: large aggregator websites (e.g. amazon)
• Dapper: small websites (e.g. community sites) and behind-the-form sites (Deep Web)
• PSOX (Y! Labs): unsupervised extractions from a large number of websites (e.g. Goldrush, Dish-a-wish, Restaurant Photos)

[Figure: KAFE architecture] Web Content (Bing WCC, YST, HVC, Live Pages (LLFS)); KAFE (S.D.E, Dapper, PSOX); W.O.O; Properties; Legacy Backend

* Supports Multiple Sources of Data and Multiple Technologies


Answers Not Links
Dappfactory




Dappfactory has been used by DD Builder to create over 3,000 DD experiences!




Answers Not Links
    S-DE KAFE XSL Rules




Creating Vertical Search Experiences for Recipes
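The slide refers to XSL rules for building a recipe vertical, i.e. hand-written, site-specific extraction rules that map page elements to structured fields. The sketch below illustrates the same rule-based idea using Python's `xml.etree.ElementTree` path expressions in place of XSL; the page markup, class names, and `RECIPE_RULES` mapping are hypothetical, and real-world HTML would need a tolerant parser rather than a strict XML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site rules: output field -> ElementTree path expression.
RECIPE_RULES = {
    "title": ".//h1[@class='recipe-title']",
    "ingredients": ".//ul[@class='ingredients']/li",
    "steps": ".//ol[@class='steps']/li",
}

# Hypothetical well-formed XHTML page for illustration.
SAMPLE_PAGE = """
<html><body>
  <h1 class="recipe-title">Classic Hummus</h1>
  <ul class="ingredients">
    <li>1 can chickpeas</li>
    <li>2 tbsp tahini</li>
  </ul>
  <ol class="steps">
    <li>Drain the chickpeas.</li>
    <li>Blend everything until smooth.</li>
  </ol>
</body></html>
"""


def extract(xhtml: str, rules: dict) -> dict:
    """Apply each rule to the page and collect the matched elements' text as a list per field."""
    root = ET.fromstring(xhtml.strip())
    record = {}
    for field, path in rules.items():
        record[field] = ["".join(el.itertext()).strip() for el in root.findall(path)]
    return record


if __name__ == "__main__":
    print(extract(SAMPLE_PAGE, RECIPE_RULES))
```

The trade-off matches the KAFE slide: rules like these give high precision on a known site layout but must be written and maintained per site, which is why they suit a small number of large sources.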



Answers Not Links
PSOX-Unsupervised Extractions
Looking for where to buy Amana dishwashers? Y! Goldrush




Craving hummus in Sunnyvale? Y! Dish-a-Wish




Enhanced Listings
Dappfactory
[Screenshots: listing before and after enhancement]




•   Taken from Roadmap deck for Y! Local by Erin Johns
•   Data is being provided to Y! Local; the front-end revamp is on the Local roadmap




Local Events for N.I.L.E – Dappfactory




Extracted using Dappfactory




As of Feb ’12, over 22,000 events for 250 US cities have been extracted using Dappfactory


Data Extraction – Challenges

• Technology whitespace
   • Head: fully manual extraction scales fine and gives high precision.
   • Torso: mostly human-assisted learning; recall and precision drop, but remain acceptable for production use.
   • Tail content: the only option is ML models with no human in the loop; recall and precision need a lot of improvement.
      - Semantic Web initiatives: Web of Objects
      - Linked Open Data standards (RDFa, OWL, SPARQL)
      - LOD Cloud: a few thousand data sets, tens of billions of interlinked facts
      - Confhopper: sample/demo application
      - Unstructured corpus: NLP extraction
      - Systems/engineering challenges: low-latency processing, tokenization/parsing, international support
      - Science challenges: polysemy, synonymy, aboutness/concepts, sentiment analysis
      - CAP: Contextual Analysis Platform
TimeSense – Use Cases and Business Value Proposition

• US FP Trending Now local pool for a given DMA, powered by TimeSense: 6% CTR lift attributed to local terms
• Search Suggestions in SD box: TimeSense-powered suggestions triggered for 6% of all gossip requests
• Trending searches in the Left Rail on the Yahoo US SRP: triggered for ~6% of all user queries
• TW FP Trending Now automated by the TimeSense API




                                            Plumbing, Monetization, & Games



TimeSense
In Bucket

AUTOMATED trending module on shopping.yahoo.com: the first module with no editorial intervention, featuring vertically categorized trends, fast refresh, and rotating terms

Soon to Launch

HK, TW, and KR automated trends modules on FP, Mail, OMG, News, etc.

Editorial Power Users of TimeSense

• Search Forecasting Editorial Team: updates sent twice a day to 500+ subscribers
• FP Trending Now team
• Regional content programming, search editorial, and SEO teams: US, UK, HK, TW, IN [Q1 launch: all INTLs]

Upcoming

• Trending Now syndication for Yahoo Hosted Search partners via BOSS
• Trending Image experience
• Trending Now 2.0 automation expansion
Trending topic detection – Challenges

• Systems challenges
   • Low-latency requirement
   • GBs of data analyzed from multiple data sources every 5 minutes
   • Scalability: different verticals, segmented models
   • High-availability requirement
• Science challenges
   • Algorithmic improvements for near-real-time detection without precision loss
   • Short-phrase categorization
   • Deduping/clustering: intent detection
   • Segmentation/smoothing: age/gender/behavioral tracking categories/geography; signal sparsity with fine-grained segmentation
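To make the 5-minute windows, near-real-time detection, and signal-sparsity points concrete, here is a toy burst detector. It is not the TimeSense algorithm; it assumes term counts arrive per 5-minute window, uses an exponentially weighted moving average (EWMA) as each term's baseline, and its parameters `alpha`, `min_count`, `threshold`, and `warmup` are hypothetical.

```python
import math
from collections import Counter


class TrendDetector:
    """Toy burst detector: each call to update() consumes one 5-minute window of terms."""

    def __init__(self, alpha=0.3, min_count=5, threshold=3.0, warmup=3):
        self.alpha = alpha          # EWMA weight given to the newest window
        self.min_count = min_count  # skip sparse terms (the signal-sparsity problem)
        self.threshold = threshold  # score a term needs to be reported as trending
        self.warmup = warmup        # windows to observe before reporting anything
        self.baseline = {}          # term -> smoothed expected count per window
        self.windows_seen = 0

    def update(self, window_terms):
        """Return [(term, score), ...] for terms trending in this window, highest score first."""
        counts = Counter(window_terms)
        trending = []
        if self.windows_seen >= self.warmup:
            for term, count in counts.items():
                expected = self.baseline.get(term, 0.0)
                # Excess over the expected count, scaled by a Poisson-like std dev.
                score = (count - expected) / math.sqrt(expected + 1.0)
                if count >= self.min_count and score >= self.threshold:
                    trending.append((term, round(score, 2)))
        # Blend this window into every term's baseline; absent terms decay toward zero.
        for term in set(self.baseline) | set(counts):
            old = self.baseline.get(term, 0.0)
            self.baseline[term] = (1 - self.alpha) * old + self.alpha * counts.get(term, 0)
        self.windows_seen += 1
        return sorted(trending, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    detector = TrendDetector()
    steady = ["weather"] * 10 + ["news"] * 10
    for _ in range(3):
        detector.update(steady)  # warm-up windows build the baseline
    print(detector.update(steady + ["earthquake"] * 20))
```

This covers only the spike-scoring step; the deduplication, short-phrase categorization, and segmentation challenges listed above would sit around it, and finer segments shrink per-term counts, which is exactly where a `min_count` floor starts to suppress real trends.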



 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

Yahoo Cloud Platform Group Presentation on User Generated Content, Location Data & Web Extractions

  • 1. Cloud Platform Group (CPG) Presentation at IIT Chennai March 29, 2012
  • 2. Agenda: CPG Mission and Value Proposition; Fit within the Yahoo Stack; Drill-down: User Generated Content (UGC); Drill-down: User Location; Drill-down: Web Extractions; Drill-down: Trending; Q&A. Yahoo! Presentation, Confidential 2
  • 3. Cloud Platform Group Mission: Create a global, scalable platform built on science that enables rapid innovation and delivery of personalized, monetizable experiences across devices.
  • 4. CPG Value Proposition 1: Agility with Stability. LEGO powered by Content Agility.
  • 5. CPG Value Proposition 2: Science at Scale.
  • 6. ILLUSTRATIVE SAMPLE. CPG powers all of Yahoo! today. DISPLAY ADS (powered by Hadoop): 3x improvement in accuracy of ad placements and our ability to forecast supply over legacy systems. MAIL (powered by Edge, Storage, Ranking, & Hadoop): 40% faster download time, 300K+ spam mails blocked/sec. FRONT PAGE (powered by CORE): increased CTR by +263% for the Today Module by serving the right content to the right user (over pre-CORE). LEGO (YPP) (powered by Content Agility): reduce time to launch new sites from quarters to weeks; integrated into News, Games, Movies, OMG, TV. SOCIAL CHROME (powered by Social Platform): over 22M net cumulative installs since launch. LIVESTAND (powered by Mobile & Presentation Services, Cocktails): seamlessly distribute content across devices in an experience that is elegant and personalized.
  • 7. User Generated Content: Unified, scalable platform that enables self-expression and gets users to connect over content. USE CASE: Increase content stickiness and user retention; drive repeat usage across the Yahoo! network. SOLUTION: UGC Cloud is a scalable, real-time platform that lets users express themselves, resulting in increased user engagement and a vibrant Yahoo! community. RESULTS: UGC platforms are used by over 200 Yahoo! properties, with over 650M UGC actions per year. Comments: 6M comments per month. Message Boards: 1/3 of US Finance traffic from MB. Ratings & Reviews: 40M user ratings. Polls: 1.2M poll votes per month.
  • 8. User Generated Content – Applications. Improving Comment Quality: 3-pronged approach (machine, human, and community moderation); 300M analyzed, 70M blocked with machine moderation; reactive volume (cost of reacting to abuse) avoided. Sentiment Slider: http://news.yahoo.com/open-business-free-agency-set-begin-211828913--spt.html
  • 9. User Generated Content – Social Poll
  • 10. User Generated Content – In the Works: Topical Organization of Comments; Social Conversations.
  • 11. User Location: Store, manage & share user locations and locations of interest to create deeply personal digital experiences. USE CASE: User location information was siloed, inconsistent, and not shareable across properties and users. SOLUTION: Create a single data store of user locations, shareable across Yahoo! properties and advertising systems. RESULTS: Properties can launch location-aware services with faster time to market on a single platform; 237M users with 550M locations. LOCDROP: Management, Authorization, and Control; Normalized, Geo-Aware User Locations; Centralized, Consistent, and Contextual; Accurate, Relevant, Valuable Experiences; Increase Content, Targeting and Revenues.
  • 12. Read locations to drive local news, events and deals
  • 13. Contextual Locations for Yahoo News
  • 14. User Generated Places: Enable users to submit (and curate) a location if one does not exist. Android Messenger use case: (1) The user cannot find a place and decides to create a new location to check in. (2) The user is asked for permission to detect the current location from the device. (3) The user's location is marked on a map; this is used to get the lat/long of the created place. (4) The user enters a location, "Russian Tea Room". (5) A new location is stored in the UGP platform and the user is checked in to this location. (6) The user has an option to curate the locations created by other users. The UGP platform enables algorithmic curation.
  • 15. KAFE: Technologies* Web Content Manual SDE Rules Bing WCC YST HVC Live Pages; Large Aggregator Websites (LLFS) (e.g. amazon) Editorial Effort Dapper KAFE; Small Websites (e.g. community sites) S.D.E Dapper PSOX; Behind the Form sites (Deep Web) PSOX (Y! Labs); Unsupervised extractions from large number of websites W.O.O Properties; Goldrush, Dish-a- Legacy wish, Restaurant Photos Backend Precision. * Supports Multiple Sources of Data and Multiple Technologies
  • 16. Answers Not Links: Dappfactory. Dappfactory used by DD Builder to create 3,000+ DD experiences!
  • 17. Answers Not Links: Dappfactory. Dappfactory used by DD Builder to create 3,000+ DD experiences!
  • 18. Answers Not Links: S-DE KAFE XSL Rules. Creating Vertical Search Experiences for Recipes.
  • 19. Answers Not Links: PSOX Unsupervised Extractions. Looking for where to buy Amana dishwashers? Y! Goldrush. Craving hummus in Sunnyvale? Y! Dish-a-Wish.
  • 20. Enhanced Listings (Dappfactory). Before / After. • Taken from the Roadmap deck for Y! Local by Erin Johns • Data being provided to Y! Local; front-end revamp on the Local roadmap
  • 21. Local Events for N.I.L.E (Dappfactory): As of Feb '12, over 22,000 events for 250 US cities have been extracted using Dappfactory.
  • 22. Data Extraction – Challenges
      – Technology whitespace
          – Head: fully manual scales fine and gives high precision.
          – Torso: mostly human-assisted learning. Drop in recall and precision, but acceptable for production use.
          – Tail content: the only option is ML/no-human-in-the-loop models. Recall and precision need a lot of improvement.
      – Semantic Web initiatives: Web of Objects
          – Linked Open Data formats (RDFa, OWL, SPARQL)
          – LOD Cloud: a few thousand data sets, tens of billions of interlinked facts
          – Confhopper: sample/demo application
      – Unstructured corpus: NLP extraction
          – Systems/engineering challenges: low-latency processing, tokenization/parsing, international support
          – Science challenges: polysemy, synonymy, aboutness/concepts, sentiment analysis
          – CAP: Contextual Analysis Platform
  • 23. TimeSense – Use Cases/Business Value Proposition: Search Suggestions in SD box – Timesense-powered suggestions triggered for 6% of all gossip requests • US FP Trending Now local pool for a given DMA powered by TS – 6% CTR lift attributed to local terms • Trending searches in Left Rail on Yahoo US SRP – triggered for ~6% of all user queries • TW FP Trending Now automated by Timesense API. Plumbing, Monetization, & Games
  • 24. TimeSense In Bucket: AUTOMATED trending module on shopping.yahoo.com: first module with no editorial intervention, vertically categorized trends, fast refresh and rotating terms. Soon to Launch: HK, TW and KR automated trends modules on FP, Mail, OMG, News, etc. Editorial: power users of Timesense • Search Forecasting Editorial Team – updates sent twice a day to 500+ subscribers • FP Trending Now team • Regional content programming, search editorial and SEO teams: US, UK, HK, TW, IN [Q1 launch – all INTLs]. Upcoming: • Trending Now syndication for Yahoo Hosted Search partners – via BOSS • Trending Image experience • Trending Now 2.0 automation expansion
  • 25. Trending topic detection – Challenges
      – Systems challenges
          – Low latency requirement
          – GBs of data analyzed from multiple data sources every 5 minutes
          – Scalability – different verticals, segmented models
          – High availability requirement
      – Science challenges
          – Algorithmic improvements for near-real-time detection without precision loss
          – Short-phrase categorization
          – Deduping/clustering – intent detection
          – Segmentation/smoothing – age/gender/behavioral tracking categories/geography – signal sparsity with fine-grained segmentation
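The systems and science challenges on slide 25 (GBs of data from multiple sources every 5 minutes, near-real-time detection without precision loss, smoothing, signal sparsity) can be made concrete with a generic burst detector. This is a minimal sketch, not TimeSense's algorithm, which the deck does not describe. It assumes term counts have already been aggregated per 5-minute window, keeps an exponentially weighted mean and variance per term, and flags a term when its current count sits far above that baseline. The class names and the `alpha`, `z_threshold`, `min_count`, and `warmup` values are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TermStats:
    """Exponentially weighted mean and variance of one term's per-window counts."""
    mean: float = 0.0
    var: float = 0.0
    windows_seen: int = 0


@dataclass
class BurstDetector:
    """Hypothetical EWMA/z-score trend detector over fixed time windows."""
    alpha: float = 0.3        # EWMA smoothing factor (assumed value)
    z_threshold: float = 3.0  # std devs above baseline that count as a burst
    min_count: int = 20       # skip sparse terms to limit false positives
    warmup: int = 3           # windows of history required before a term can trend
    stats: defaultdict = field(default_factory=lambda: defaultdict(TermStats))

    def observe_window(self, counts: dict) -> list:
        """Consume one window of {term: count}; return trending terms, sorted."""
        trending = []
        # Terms known from earlier windows but absent now are scored as count 0.
        for term in set(self.stats) | set(counts):
            c = counts.get(term, 0)
            s = self.stats[term]
            std = math.sqrt(s.var)
            z = (c - s.mean) / max(std, 1.0)  # floor std so tiny variances don't explode z
            if s.windows_seen >= self.warmup and c >= self.min_count and z >= self.z_threshold:
                trending.append(term)
            # Update the baseline only after scoring, so a burst is judged against the past.
            diff = c - s.mean
            s.mean += self.alpha * diff
            s.var = (1 - self.alpha) * (s.var + self.alpha * diff * diff)
            s.windows_seen += 1
        return sorted(trending)


if __name__ == "__main__":
    detector = BurstDetector()
    for _ in range(5):
        detector.observe_window({"weather": 50, "hummus": 5})
    print(detector.observe_window({"weather": 52, "hummus": 100}))  # ['hummus']
```

The `min_count` floor is a crude response to the sparsity problem the slide raises: with finer segmentation (per DMA, per age/gender bucket) counts shrink and z-scores get noisy, so a real system would likely need smoothing across segments. The `warmup` rule also means a term with no history is not flagged until it has been observed for a few windows.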

Editor's Notes

  1. From Siloed to Platform. Earlier, everything was a technology and data silo, built one-off. CPG: (1) We had to get everyone on the same technology: stable, unified platform services for powering innovation. Trade-off: agile (does not scale later) vs. stable; people usually give up one for the other as they hurry to market. We can talk about trade-offs in scale, latency, security, etc. Bring up the M&A example of RMX: all acquisition integrations have faced the same problem. RMX's 300MM impressions did not scale (the agility choice); we are now at 12B NGDs. We rebuilt the backend storage, etc. Dapper was the same way. (2) Once everyone is on the same system, we can share data and apply science to data on the "platform" at scale to derive business value. We can bring up LEGO as an example of siloed properties brought to a common content platform, and Sherpa as an example for structured data storage (disparate MySQL to a common data store).
  3. CPG powers ALL of Yahoo!
     1. Display Ads (emphasis on Hadoop): 7 clusters, 15K nodes, 17T/day, 10PB (APT 11 4PB, RMX 16 5.8PB). Categorize ads, BT targeting, predict user response, traffic protection. Hadoop helps Yahoo! target billions of impressions per day across one of the largest ad networks in the world by processing declared data and recent activity to segment users and determine the right ad to serve in milliseconds. 3x improvement in accuracy of ad placements and in our ability to forecast supply over legacy systems (MyNA/Panama and AWACS/All Warehouse Access System).
        • "Predict": critical to the serving apparatus.
        • Machine-learned categorization for ads and queries, to automatically assign categories to web pages, ads, and queries.
        • Keystone (contextual ads): predict and model user response based on all user context, including page content, user attributes such as behavioral and geographical data, referrals to the page (how the user got there), and information about the publisher page.
        • Display supply and demand forecasting: future (supply) inventory forecasting; NGD pricing forecasting computation (advisory use); NGD click estimates from impressions.
        • Traffic protection: execute the trade in serving, and clean it up later for bad traffic, before it hits the revenue system.
     2. Mail (emphasis on Cloud Services): YCPI has been shown to improve download speed by over 40% for Mail. Hadoop helps block over 300,000 spam mails/sec globally. 24B. Mail, the best-monetized product at Yahoo! and at the heart of the Yahoo! network, fully leverages the power of the Cloud. It also leverages several other platform capabilities, such as Membership services (over 226,000 new good accounts created per day for U.S. Mail alone; 72M successful logins/day). MobStor is used to solve the attachment de-dup problem to increase efficiency. Ranking Systems (Vespa) is used to search through mailboxes/folders.
     3. Lego (emphasis on Core Content Services): 135 regional media sites moved to Content Agility last year alone. Lego leverages Content Agility as the single, grid-based, highly scalable CMS, instead of the siloed approaches to CMS, front-end development, and editorial that properties had earlier (pre-Lego). Lego provides reusable UI modules and shared tools to reduce time to launch new sites from quarters to weeks. Content Agility and Lego power the content network and bring agility to Yahoo! properties.
     4. Front Page (emphasis on KAPS – Personalization): CORE increased CTR by +263% for the Today Module vs. pre-CORE. CORE enables a real-time feedback loop across properties, leveraging user interest, intent, and context to optimize user engagement. It increases engagement by showing the right content to users with input from science and human editors. CORE delivers the most relevant experience on the Web by serving the right content to the right user.
     5. Social Chrome (emphasis on KAPS – Social): Over 22M net cumulative installs since launch; 620K Facebook referrals generated daily. Daily active users crossed 1MM within 5 weeks of launch. Vitamix/Vitality powers Social Chrome on all Y! properties worldwide to increase user engagement by surfacing relevant activities from friends. Vitamix provides a Facebar of friends with activity, the activity history of a friend, a friends' activity feed, and friends' activity on top articles. Several ranking-type initiatives for 2012 (rank friends, show most shared article, etc.).
     6. Livestand (emphasis on MPS – Cocktails): Leverages Cocktails, a presentation platform and application framework built on YUI3, to create connected experiences across devices with a single codebase. Provides a simple way for publishers and advertisers to seamlessly distribute content across devices in an experience that is elegant and personalized: a single serving stack across applications, framework, and runtime. Fragmented approaches slow innovation and create tech debt: one stack per device class (web 1.0, web 2.0, iOS/Android, and feature phones). Cocktails provides reusable modules across devices and properties, a server-side JavaScript execution engine, a high-efficiency HTTP server for personalization/2-way browser-server communications, and cloud-hosted applications for easy deployment and bucket testing.
  4. What if the location the user is looking for does not exist? Especially for building the business listings database, User Generated Places (UGP) provides the capability to crowd-source and algorithmically curate locations. Ingesting over 10,000 RSS feeds, with an average of 3,000/day.
  5. Location as a key pillar of personalization. Crowd-sourced, with confidence levels (Messenger use case). For properties that require more precision, like Local and Travel, we are adopting a multi-pronged approach: extractions from the deep web; early work with Sciences to apply algos.
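Slide 14 and note 4 describe User Generated Places: users create a place when none exists, and the platform curates the results algorithmically. The deck does not say how UGP curates, so the following is only a sketch of one plausible step under stated assumptions: before storing a submission, check it against existing places, and if a place with a near-identical normalized name lies within a small radius, check the user in to that place instead of creating a duplicate. The `Place` type, the 150 m radius, the 0.85 name-similarity cutoff (computed with Python's `difflib.SequenceMatcher`), and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Place:
    """Hypothetical user-generated place record."""
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/long points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def normalize(name: str) -> str:
    """Lower-case, drop punctuation and a leading 'the' so name variants compare equal."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = cleaned.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def find_duplicate(candidate: Place, places: list, max_dist_m: float = 150.0,
                   min_name_sim: float = 0.85):
    """Return the most similarly named place within max_dist_m, or None."""
    best, best_sim = None, 0.0
    for p in places:
        if haversine_m(candidate.lat, candidate.lon, p.lat, p.lon) > max_dist_m:
            continue  # too far away to be the same venue
        sim = SequenceMatcher(None, normalize(candidate.name), normalize(p.name)).ratio()
        if sim >= min_name_sim and sim > best_sim:
            best, best_sim = p, sim
    return best


def submit_place(candidate: Place, places: list) -> Place:
    """Check the user in to an existing match, or store the candidate as a new place."""
    match = find_duplicate(candidate, places)
    if match is None:
        places.append(candidate)
        return candidate
    return match


if __name__ == "__main__":
    places = [Place("The Russian Tea Room", 40.7648, -73.9794)]
    chosen = submit_place(Place("Russian Tea Room", 40.7649, -73.9795), places)
    print(chosen.name, len(places))  # The Russian Tea Room 1
```

The linear scan keeps the example short; at production scale a store would first narrow candidates with a spatial index (for example, geohash prefixes) and only then compare names.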