Boston Hadoop User Group
Jeremy Rishel, SVP Engineering, Products, & Data
April 2012
Which is Better?

A. More Data

B. Better Data

C. Better Algorithms




                       Bluefin Labs Proprietary and Confidential
Which is Better?

A. More Data

B. Better Data

C. Better Algorithms

D. All of the Above




Social TV




Television          Social Web
Impressions   Expressions
Kinds of Data and Algorithms
Public social media (Twitter, Facebook): 250M+ documents per day

Programming info for 200+ U.S. networks

Video signal for 65+ U.S. networks

Brand conversation & ad tracking for thousands of brands

Realtime semantic analysis of comments

Demographic & behavioral analysis of authors

Advertising context & effect of advertising on brand dynamics

Overlap between audiences and comparative analysis
Realtime & Historical Data
2M show telecasts

1.5M ad airings / month

50M links between social media users and TV shows / month

10B links between social media users and TV ads / month

End-to-end latency in minutes - visible & searchable in realtime

Historical data visible & searchable through various UIs/tools

Searchable text index of all social media comments in our archive &
methods for large-scale analysis jobs (including MapReduce)
Kinds of Questions
We often work at the intersection of multiple data streams, or of data &
algorithms

How much chatter about a show (realtime)? (Social media +
programming info + semantic analysis)

What ads are airing (near realtime)? (Video signals + programming
info + computer vision/audio fingerprinting)

Which brands does the audience of a show talk about most? Which
shows do brand-engaged authors talk about most? (Social media +
programming info + brand data + semantic analysis + audience
overlap analysis)
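
As a rough illustration of the audience overlap question above, the sketch below ranks brands by how many of a show's commenting authors also mention each brand. The data layout (dicts mapping a show or brand name to a set of author IDs) and every name in it are hypothetical; the slides do not describe the actual schema or method.

```python
from collections import Counter


def top_brands_for_show(show, show_authors, brand_authors, k=3):
    """Rank brands by how many of the show's authors also mention the brand.

    show_authors and brand_authors are hypothetical inputs: dicts mapping
    a name to the set of author IDs who commented on it.
    """
    audience = show_authors.get(show, set())
    overlap = Counter()
    for brand, authors in brand_authors.items():
        shared = len(audience & authors)
        if shared:
            overlap[brand] = shared
    # Sort by overlap (descending), then by name so ties are deterministic.
    return sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def jaccard(a, b):
    """Jaccard similarity of two author sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


show_authors = {"ShowA": {"u1", "u2", "u3", "u4"}, "ShowB": {"u3", "u5"}}
brand_authors = {"BrandX": {"u1", "u2", "u9"}, "BrandY": {"u3"}, "BrandZ": {"u8"}}

print(top_brands_for_show("ShowA", show_authors, brand_authors))
print(jaccard(show_authors["ShowA"], show_authors["ShowB"]))
```

Raw overlap counts favor large audiences; a normalized measure such as the Jaccard score is one way to compare shows or brands of very different sizes.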



More Data

“More data” can mean new streams, broader streams, or more
granular data

“More data” powers better algorithms & aids in creating better data

Capturing color, texture, & audio features from the TV video stream
improved our ad detection

Tapping into full author history permitted better age classification

Analyzing closed captions gave us another dimension of semantic
analysis and avenues to explore social/mass media engagement
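
To make the idea of a color feature concrete, here is a minimal sketch of one common approach: a coarse RGB histogram per frame, compared by histogram intersection. This is an assumed, generic technique, not the detector described in the talk; the function names and the bin count are illustrative only.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram for an iterable of (r, g, b) pixels (0-255)."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin, then flatten the 3-D bin index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy "frames": flat lists of pixels rather than real decoded video.
red_frame = [(255, 0, 0)] * 10
blue_frame = [(0, 0, 255)] * 10

print(histogram_intersection(color_histogram(red_frame), color_histogram(red_frame)))
print(histogram_intersection(color_histogram(red_frame), color_histogram(blue_frame)))
```

A real pipeline would combine several such features (texture, audio) and match them against known ad fingerprints; this sketch only shows how one feature becomes a comparable vector.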



Better Data
“Better data” is achieved through human-machine collaboration, with a
view to continual improvement

“Better data” makes for better algorithms & makes big data more useful

Both realtime and large-scale review & curation

Systematic monitoring, statistical QA, & estimation models

High-quality data supports in-domain benchmarking (How is a show
or network doing vs. competitors? How is a brand doing within its sector?)

High-quality, consistent data permits richer trend analysis (e.g.
season-over-season or ad campaign-to-ad campaign comparison)
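
One way to picture "statistical QA" is sampling: have reviewers check a random sample of machine-generated labels, then put a confidence interval around the observed accuracy. The sketch below uses the Wilson score interval, a standard formula; the sample sizes are made up for illustration and nothing here is taken from Bluefin's actual QA process.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical review: 90 of 100 sampled labels were judged correct.
low, high = wilson_interval(90, 100)
print(f"estimated accuracy: 0.90, 95% interval: {low:.3f} to {high:.3f}")
```

The interval narrows as the reviewed sample grows, which gives a principled way to decide how much human review a given data stream needs.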

Better Algorithms

“Better algorithms” include both new analytics & improvements to
existing ones

“Better algorithm” approaches can be taken with more & better data

Focus areas of NLP/machine learning, computer vision, & statistical
analysis; key to “better” is having a way to measure “goodness”

The ad discovery methods available to us changed once we shifted to a
broader approach

Higher-quality show telecast engagement data permits more precise
audience analysis across domains, e.g. from shows & networks to brands
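
The point about measuring "goodness" can be illustrated with the usual classification metrics. The sketch below scores a binary classifier (say, "is this comment about the show?") against human-reviewed labels using precision, recall, and F1. The labels are invented for the example, and the talk does not say which metrics were actually used.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for parallel sequences of binary labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = "about the show", 0 = "not about the show".
predicted = [1, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 1]

precision, recall, f1 = precision_recall_f1(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With a fixed, trusted evaluation set, two versions of an algorithm can be compared on the same numbers, which is what makes "better" a measurable claim.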

All of the Above

More data helps build better data & algorithms

Better data improves algorithms & makes large data more useful

Better algorithms get leverage out of more & better data

You should care about all three




Jeremy Rishel
 jrishel@bluefinlabs.com
Boston Hadoop User Group Presentation

  • 1. Boston Hadoop User Group Jeremy Rishel, SVP Engineering, Products, & Data April 2012
  • 2. Which is Better? A. More Data B. Better Data C. Better Algorithms Bluefin Labs Proprietary and Confidential
  • 3. Which is Better? A. More Data B. Better Data C. Better Algorithms D. All of the Above Bluefin Labs Proprietary and Confidential
  • 4. Social TV Television Social Web
  • 5. Social TV Television Social Web
  • 6. Social TV Television Social Web
  • 7.
  • 9. Impressions Expressions
  • 10. Impressions Expressions
  • 11. Kinds of Data and Algorithms Public social media (Twitter, Facebook) 250M+ documents per day Programming info for 200+ U.S. networks Video signal for 65+ U.S. networks Brand conversation & ad tracking for thousands of brands Realtime semantic analysis of comments Demographic & behavioral analysis of authors Advertising context & effect of advertising on brand dynamics Overlap between audiences and comparative analysis Bluefin Labs Proprietary and Confidential
  • 12. Realtime & Historical Data 2M show telecasts 1.5M ad airings / month 50M links between social media users and TV shows / month 10B links between social media users and TV ads / month End-to-end latency in minutes - visible & searchable in realtime Historical data visible & searchable through various UIs/tools Searchable text index of all social media comments in our archive & methods for large-scale analysis jobs (including MR) Bluefin Labs Proprietary and Confidential
  • 13. Kinds of Questions We often deal at the intersection of multiple data streams or data & algorithms How much chatter about a show (realtime)? (Social media + programming info + semantic analysis) What ads are airing (near realtime)? (Video signals + programming info + computer vision/audio fingerprinting) Which brands does the audience of a show talk most about? Which shows do brand engaged authors talk most about? (Social media + programming info + brand data + semantic analysis + audience overlap analysis) Bluefin Labs Proprietary and Confidential
  • 14-17. More Data
      • “More data” can mean new streams, broader streams, or more granular data
      • “More data” powers better algorithms & aids in creating better data
      • Capturing color, texture, & audio features from the TV video stream improved our ad detection
      • Tapping into full author history permitted better age classification
      • Analyzing closed captions gave us another dimension of semantic analysis and avenues to explore social/mass media engagement
  • 18-22. Better Data
      • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
      • “Better data” makes for better algorithms & makes big data more useful
      • Both realtime and large-scale review & curation
      • Systematic monitoring, statistical QA, & estimation models
      • High-quality data supports in-domain benchmarking (How is a show or network doing vs. competitors? How is a brand doing within its sector?)
      • High-quality and consistent data permits richer trend analysis (e.g. season-over-season or ad campaign-to-ad campaign comparison)
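The "statistical QA" point on the Better Data slides is not spelled out in the deck. One common way to do it, shown here purely as an assumed example, is to have humans review a random sample of machine-labeled items and put a confidence interval around the observed error rate. The sketch uses the standard Wilson score interval; the sample counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(errors: int, sample_size: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an error rate estimated from a reviewed sample.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= errors <= sample_size:
        raise ValueError("errors must be between 0 and sample_size")
    p = errors / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical review: 12 mislabeled items found in a random sample of 400.
low, high = wilson_interval(errors=12, sample_size=400)
```

Tracking an interval like this over time is one way to make "systematic monitoring" concrete: a shift in the interval flags a data-quality change worth a closer human look.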
  • 23-26. Better Algorithms
      • “Better algorithms” include both new analytics & improvements to existing ones
      • “Better algorithm” approaches can be taken with more & better data
      • Focus areas are NLP/machine learning, computer vision, & statistical analysis; the key to “better” is having a way to measure “goodness”
      • The ad discovery methods available to us changed once we shifted to a broader approach
      • Higher-quality show telecast engagement data permits more precise audience analysis across domains - e.g. shows & networks to brands
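The slides stress that "better" needs a way to measure "goodness" but do not name a metric. A minimal sketch, assuming a human-curated reference set of ad airings is available, is to score each detector version with precision, recall, and F1 and compare the results. The airing IDs and the two detector outputs below are hypothetical.

```python
from typing import Dict, Set


def score_detections(detected: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Score a set of detected items against a curated reference set."""
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical curated reference and outputs of two detector versions.
reference_airings = {"ad1", "ad2", "ad3", "ad4"}
baseline = score_detections({"ad1", "ad2", "ad7"}, reference_airings)
candidate = score_detections({"ad1", "ad2", "ad3", "ad8"}, reference_airings)
```

With a fixed reference set, a change to the algorithm (or to its input features) can be accepted or rejected on the numbers, which also ties algorithm quality back to the curated "better data".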
  • 27. All of the Above
      • More data helps build better data & algorithms
      • Better data improves algorithms & makes large data more useful
      • Better algorithms get leverage out of more & better data
      • You should care about all three
