Early Lessons Learned in Applying Big Data To TV Advertising ARF September 12, 2011 Jack Smith, Chief Product Officer, Simulmedia
About Us
Who We Are
We are a New York-based start-up. We are venture backed by Avalon Ventures, Union Square Ventures and Time-Warner.
Where We Have Been
Our 35-person team has veterans of:
What We Believe
Television is still the most powerful advertising medium in the world. While addressability will come, we’re not waiting for it. We’ve taken a few strategies we learned from the Internet and are applying them to linear TV advertising, today. Through partnerships with major data providers, we have assembled the world’s largest set of actionable television data.
How We Do It
How We Make Money
We sell television advertising. With inventory in over 106 million US households, we can cost-effectively extend reach into high-value target audiences across virtually any advertiser category. We use big data and science to do this.
Why Did We Leave The Web? Television remains the dominant consumer medium (a) Nielsen US TV Viewing Audience. Traditional Live-Only TV based on average monthly viewing during 1Q2011. Internet and Online Video based on average monthly consumption during July 2011. Video on Demand based on consumption during May 2011.
TV Spend Is Increasing Source: MAGNAGLOBAL
Audience Is Fragmenting Source: Nielsen via TVbythenumbers.com
Campaign Reach Is Declining Impossible for measurement and planning tools to keep pace  Source: Simulmedia analysis of data from SQAD, Nielsen and TVB
Big Data
Big Data Is Driving Growth “We are on the cusp of a tremendous wave of innovation, productivity and growth, as well as new modes of competition and value-capture – all driven by Big Data.” - McKinsey Global Institute, May 2011 “For CMOs, Big Data is a very big deal.” - Alfredo Gangotena, CMO, Mastercard, July 2011
Size Is Relative 1 byte x 1000 = 1 kilobyte …x 1000 = 1 megabyte …x 1000 = 1 gigabyte …x 1000 = 1 terabyte …x 1000 = 1 petabyte …x 1000 = 1 exabyte
Telegram = 100 bytes
Page of an encyclopedia = 100 kilobytes
Pickup truck bed full of paper = 1 gigabyte
Entire print collection of the Library of Congress = 10 terabytes
All hard drives produced in 1995 = 20 petabytes
All printed material = 200 petabytes
Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm
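The "x 1000" ladder and the size comparisons above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming decimal (power-of-1000) units as on the slide; the function names `humanize` and `ratio` are illustrative and not part of the original deck.

```python
# Decimal (power-of-1000) units, matching the "x 1000" ladder on the slide.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte"]


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    plural = "" if value == 1 else "s"
    return f"{value:g} {UNITS[unit_index]}{plural}"


def ratio(larger_bytes: int, smaller_bytes: int) -> float:
    """How many times the smaller quantity fits into the larger one."""
    return larger_bytes / smaller_bytes


# Figures quoted on the slides (attributed there to James S. Huggins).
LIBRARY_OF_CONGRESS_PRINT = 10 * 1000**4   # 10 terabytes
ALL_PRINTED_MATERIAL = 200 * 1000**5       # 200 petabytes

if __name__ == "__main__":
    print(humanize(LIBRARY_OF_CONGRESS_PRINT))
    print(ratio(ALL_PRINTED_MATERIAL, LIBRARY_OF_CONGRESS_PRINT))
```

On these figures, all printed material works out to 20,000 times the print collection of the Library of Congress.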
But Big Data Is More Than Size
What happened? Why did it happen? | BIG DATA: What’s going to happen next?
Time: Past | Future
Focus: Reporting | Prediction
Supports: Human decisions | Machine decisions
Data: Structured, Aggregated | Unstructured, Unaggregated
Human Skills: Dashboards, Excel | Discovery, Visualization, Statistics & Physics
Accelerating The Push To Big Data Hadoop, cloud computing, Facebook, Yahoo, quants, Bittorrent, machine learning, Stanford, large hadron collider, Wal-Mart, text processing, Amazon S3 & EC2, open source intelligence, NoSQL, social media, Google, commodity hardware, Hive, fraud detection, trading desks, MapReduce, natural language processing
What Can It Mean For TV Advertising? Big data drove the rise of web & search advertising
Better predictions about consumer interests
Real time return path
Automation
Interim step for addressability
More diligence around consumer privacy
Media buyers and sellers rethinking their approach to audience packaging, campaign planning, technology, data assembly and people
Australian Bureau of Statistics: 250 TB (1)
AT&T: 250 TB (1)
Nielsen: 45 TB (1)
Adidas: 13 TB (1)
Wal-Mart: 1 PB (2)
Data Lakes
Yahoo: 22 PB (4)
Google: ???
(1) Oracle F1Q10 Earnings Call, September 16, 2009, transcript
(2) Stair, Principles of Information Systems, 2009, p. 181
(3) Dhruba Borthakur, Facebook, December 2010, http://www.facebook.com/note.php?note_id=468211193919
(4) Simulmedia estimate
Our Idea of Big Data Bringing the data set together in a single platform Our (comparatively modest) data set:
113,858,592 daily events
Approximately 402,301 weekly ads
Double capacity every 6 months…And we don’t load every data point across all data sets, yet
Rethinking Media Data Architecture
Applying big data to television required us to rethink what our technical architecture should be
Commodity Hardware
Expect hardware failure
Learn from those who have done it
Participate in the Open Source community
Open Source Software
Write Your Own Software
Meddle
Machine learning
Science
Experimentation
The People We Needed A different approach required different skill sets
Pattern recognition
Visualization
Technology
Experimentation
Where do you find hard-to-find tech skills?
You don’t find them. You make them.
A dedicated Science team
Non-traditional researchers (Brain imaging, bioinformatics, economic modeling, genetics)
People who watch a lot of television
Some Things To Know, First
Time shifting lessons is a whole other presentation
Time shifting + live viewing lessons is a whole other other presentation
Video on demand is a whole other other other presentation
We name names and provide numbers where clients and data partners permit
Client confidentiality is important to us
None of this work would’ve been possible without the help of our clients and partners
This box will contain important information about the graphs on each page. Read me…
60% of TV Viewers Watch 90% of TV
Where The Other 40% Are
Chart regions: Networks with relatively fewer lighter viewer impressions; Networks with relatively more lighter viewer impressions
Vertical: Ratio of heavy viewer to light viewer impressions.
Horizontal: Low rated to highly rated networks (Lower rated networks, Higher rated networks)
Call outs: Ratio is the number of Heavier Viewer impressions you would deliver to reach a Lighter Viewer on a given network
Sources: Nielsen & Simulmedia’s a7
Where The Other 40% Are To capture light viewers, media planning and measurement tools must quickly apply new methods to emerging data sets
Quality Control Is A Full Time Job
When Data Goes Missing
Automation of error checking/quality control is essential
Reuse the data to solve other problems
Occasionally observe missing data
Three choices:
Estimate missing fields
Work around the missing data
Time series of SYFY network. 10645 observations from 2010.02.28 at 7:00pm Eastern to 2010.10.14 at 12:30pm Eastern
Source: Simulmedia’s a7
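The "automate the check, then estimate missing fields" idea can be illustrated with a small time-series sketch. Everything here is an assumption for illustration: the half-hour grid, the viewing values and the helper names `find_gaps` and `fill_linear` are hypothetical, and linear interpolation is only one simple estimator, not necessarily the method used in a7.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step):
    """Return the timestamps expected on a regular grid but absent from the data."""
    observed = set(timestamps)
    missing = []
    current = min(timestamps)
    end = max(timestamps)
    while current <= end:
        if current not in observed:
            missing.append(current)
        current += step
    return missing


def fill_linear(series, step):
    """Estimate missing grid points by linear interpolation between
    the nearest observed neighbours on either side."""
    filled = dict(series)
    known = sorted(series)
    for left, right in zip(known, known[1:]):
        n_steps = round((right - left) / step)
        for i in range(1, n_steps):
            frac = i / n_steps
            filled[left + i * step] = series[left] + frac * (series[right] - series[left])
    return filled


if __name__ == "__main__":
    half_hour = timedelta(minutes=30)          # assumed grid spacing
    start = datetime(2010, 2, 28, 19, 0)
    # Hypothetical audience counts; the 20:00 observation is missing.
    series = {
        start: 100.0,
        start + half_hour: 110.0,
        start + 3 * half_hour: 130.0,
    }
    print(find_gaps(list(series), half_hour))
    print(fill_linear(series, half_hour)[start + 2 * half_hour])
```

Running the gap check on every load is what makes the quality control automatic; whether to estimate the gap or work around it remains a judgment call per analysis.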
More Data Really Is Better
Disambiguation: The Madonna Problem Pop icon? OR Religious icon?
The Revolution of Simple Methods More data beats better algorithms. The best performing algorithm underperforms the worst algorithm when the worst algorithm is given an order of magnitude more data. Simple algorithms at very large scale can help better predict audience movement. Peter Norvig | Internet Scale Data Analysis | June 21, 2010 Original graph sourced from: Banko & Brill, 2001. Mitigating the paucity-of-data problem: exploring the effect of training corpus size on classifier performance for natural language processing
Packaging Reach Very large data sets better predict TV audience movements Peter Norvig | Internet Scale Data Analysis | June 21, 2010

The Cost Of More Data More data drives better results but there are costs
The Data Isn’t Biased Just Because It Comes From A Set Top Box
Applying Simple Methods At Scale High correlation of a7 measures and Nielsen estimates. Either bias is insignificant or Nielsen data and our data share the same bias. Multiple methods yield similar results. Regression analysis of Nielsen Household Cume Rating against Simulmedia’s a7 cume rating. 20 Primetime Network shows with HAWAII FIVE-0. Fall 2010. Sources: Nielsen & Simulmedia’s a7
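A regression of one ratings measure against another, as described above, can be sketched with ordinary least squares. This is a minimal illustration only: the ratings below are made-up numbers, not Nielsen or a7 data, and `fit_line` is a hypothetical helper name.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical cume ratings for five shows (illustrative values only).
    set_top_box_rating = [3.1, 4.8, 6.0, 7.4, 9.9]
    panel_rating = [3.4, 4.6, 6.3, 7.1, 10.2]
    slope, intercept, r = fit_line(set_top_box_rating, panel_rating)
    print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

A high r only says the two measures move together. As the slide notes, that is consistent either with negligible bias or with both sources sharing the same bias.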
Cross correlated individual data sets contained in a7 aggregate data set
Closing The Loop On Program Promotion Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot. Sources: Simulmedia’s a7
Closing The Loop Long held beliefs and rules of thumb in planning may or may not be supported by data. TV marketers now have more options for show promotion
Nielsen’s Ratings Are Good (Surprisingly Good)
Time Series: Broadcast: CBS Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)). 60 networks. High correlation between Nielsen large sample measurement and a7 measures. Sources: Nielsen & Simulmedia’s a7
The same chart description and sources apply to each of the following:
Time Series: Broadcast: Fox
Time Series: Broadcast: ABC
Time Series: Cable: Investigation Discovery
Time Series: Cable: Golf
Time Series: Cable: Bravo
Time Series: Cable: ESPN2
Time Series: Cable: Speed
When You Look Closer
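Plotting Z scores, as these charts do, puts two series with different units on one scale so their shapes can be compared directly. The sketch below shows only that standardization step, with made-up hourly values; it does not attempt the multiple imputation mentioned in the captions, and `z_scores` is an illustrative name.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a series: subtract its mean and divide by its
    (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant series has no defined z-scores")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical hourly audience levels in two different units.
    panel_estimate = [1.2, 1.5, 2.8, 3.9, 2.1]          # e.g. rating points
    set_top_box_count = [120, 160, 270, 400, 205]       # e.g. thousands of boxes
    for a, b in zip(z_scores(panel_estimate), z_scores(set_top_box_count)):
        print(f"{a:+.2f}  {b:+.2f}")
```

Because standardization removes level and scale, two series that differ only by a constant multiplier produce identical Z scores, which is why the overlaid red and blue lines can be compared at all.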
High Frequency Time Series: ABC Family
Volatility in dayparts, low rated networks, demographics…. Unrated networks “don’t exist.” Did NOT look at local.
Sample graph from High Frequency (Second and Minute level) Time Series Analysis of 45 networks on January 19th, 2011. Simulmedia a7 sample (second by second to minute); Nielsen sample (minute by minute).
Sources: Nielsen & Simulmedia’s a7
Women Are More Different Than Men
Gender Driven Geographic Variation Viewing by zip code among women across markets is more varied than among men in the same zip codes. Panels: Men 18-54, Women 18-54. Fraction of view time for ages 18-54 as fraction of view time for all TV viewers. Week 2 vs. the same fraction for week 1 (last two weeks in January). Three markets: Philadelphia (blue), Atlanta (red) and Chicago (green). Each point represents a zip code in one of these markets. Source: Simulmedia’s a7
Gender Driven Geographic Variation Planning tactics for female targeted campaigns should be different than for male targeted campaigns. PS… Also a good case for geo based creative versioning
Make consumer privacy protection part of the business from the beginning
No personal data or data that can be related to particular individuals or devices
Mass Reach Is Indiscriminate
Fragmentation Effects On Frequency Each segment was above 70% reach but the frequency distribution was nearly identical. Percent of audience reached for major animated motion picture campaign 2011. Two weeks prior to release. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment. Source: Nielsen & Simulmedia’s a7
Fragmentation Effects On Frequency Fragmentation is affecting all high reach campaigns. Percent of audience reached for insurance advertisers September to October 2010. Approximately 8000 ads. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment. Source: Nielsen & Simulmedia’s a7
Fragmentation Effects On Frequency The TV advertising market can’t continue to support this
40% Of The Audience Is Getting 85% Of The Impressions
Fragmentation Rears Its Head Again
Campaign impressions increasingly concentrated against heavy viewers.
Total US Television Audience, by quintile:
Average Frequency Per Quintile | % of Total Impressions Per Quintile
0.0 | 0.0%
1.4 | 3.6%
4.3 | 10.8%
9.1 | 23.0%
24.8 | 62.6%
Percent of audience reached for a different major animated motion picture campaign 2011. Two weeks prior to release. The stacked bar represents quintiles. Blue labels are average frequency per respective quintile. Red labels are % of total campaign impressions by respective quintile.
Source: Nielsen & Simulmedia’s a7
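The link between the two sets of chart labels can be reproduced with simple arithmetic. The sketch below assumes that each quintile holds an equal share of the audience and that average frequency is computed over everyone in the quintile, so impressions are proportional to average frequency; those assumptions are not stated on the slide, and `impression_shares` is an illustrative name.

```python
# Average frequency per audience quintile, as read from the slide's blue labels.
AVERAGE_FREQUENCY = [0.0, 1.4, 4.3, 9.1, 24.8]


def impression_shares(avg_frequency):
    """Share of total impressions per quintile, assuming equal-sized
    quintiles so that impressions scale with average frequency."""
    total = sum(avg_frequency)
    return [f / total for f in avg_frequency]


if __name__ == "__main__":
    shares = impression_shares(AVERAGE_FREQUENCY)
    for freq, share in zip(AVERAGE_FREQUENCY, shares):
        print(f"avg frequency {freq:>5.1f} -> {share * 100:5.1f}% of impressions")
    # The two heaviest quintiles together are 40% of the audience.
    print(f"top 40% of audience: {(shares[-2] + shares[-1]) * 100:.1f}% of impressions")
```

Under these assumptions the shares land close to the slide's red labels (small differences are expected because the frequencies are rounded), and the two heaviest quintiles together take roughly 85% of impressions, which matches the "40% of the audience, 85% of the impressions" headline.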
Fragmentation Effects on Frequency Advertisers won’t continue to support this
Re-aggregate audiences using big data
What do you think?
Previously: Chief Scientist, Tacoda. Chief Scientist, Real Media.
Doctoral Candidate, Physics. (Condensed Matter Physics) The Ohio State University
MS, Computer & Information Systems. The Ohio State University
MSc, Physics. Indian Institute of Technology, Kanpur
Previously: Clinical Research (Brain Imaging), Mount Sinai College of Medicine
BSE, Computer Science & Engineering. University of Pennsylvania
BA, Psychology. University of Pennsylvania
Previously: Lecturer, Bioinformatics, New York University. Senior Consultant, Weiser LLP.
MS, Bioinformatics. New York University
Dr. Sidd Mukherjee, Scientist
Previously, Visiting Scholar (Atomic Scattering experiments), The Ohio State University
Postdoctoral research, Heat capacity of Helium-4. Pennsylvania State University
PhD, Physics. (Thesis: Measurements of Diffuse and Specular Scattering of 4He Atoms from 4He Films), Ohio State University
MS, Computer & Information Systems. The Ohio State University
BSc, Physics & Mathematics. University of Bombay

Editor's Notes

  1. The revolution will be televised.
  2. Audience fragmentation is going from bad to worse. This fragmentation is wrecking effective campaign reach and creating a massive frequency imbalance. Audience re-aggregation will be key for brand advertisers to maintain scale. TV is not going to the web. The web is going to television.
  3. Audience fragmentation is going from bad to worse. This fragmentation is wrecking effective campaign reach and creating a massive frequency imbalance. Audience re-aggregation will be key for brand advertisers to maintain scale. TV is not going to the web. The web is going to television.
  4. The Huntington copy is one of eleven surviving copies printed on vellum, and one of three such copies in the United States. An additional thirty-six copies printed on paper also survive.
  5. Our claim of the world's largest actionable set of TV viewing data at 75 TB would be hard for anyone to challenge. The fact that we link schedule information, set-top box data and ratings data makes it even more difficult to challenge. The most interesting discovery was that we're 3x larger than Nielsen's biggest single instance transactional datastore. (Netezza has similar kinds of multiplying factors as our data storage scheme, Hadoop.)
     The Numbers:
     Wal-Mart: 1 petabyte (800 million transactions/day across 7000 stores globally) (3) (This is probably in a combination of HP Neoview and Teradata.)
     Yahoo!: 700 terabytes (1) (Doesn't include their Hadoop cluster which is approx 15 petabytes.)
     Australian Bureau of Statistics: 250 terabytes (1)
     AT&T: 250 terabytes (1)
     AC Nielsen: Largest single instances: Netezza: 20 tera, Oracle: 10 tera (500 terabytes TOTAL in Netezza, 45 tera in Oracle). Most are distributed databases with client data. (1)(2)
     Adidas: 13 terabytes
     Largest Hadoop cluster (4): Facebook: 30 petabytes of storage
     The fine print
     NOTES:
     (1) From Oracle F1Q10 Earnings Call September 16, 2009 5:00 pm ET Transcript (Charles E. Phillips Jr.)
     Yahoo!: 700 terabytes
     Australian Bureau of Statistics: 250 terabytes
     AT&T: 250 terabytes
     AC Nielsen: 45-terabyte data [mart], they called it
     Adidas: 13 terabytes
     (2) DBMS2: September 29, 2009. What Nielsen really uses in data warehousing DBMS
     In its latest earnings call, Oracle made a reference to The Nielsen Company that was, to put it politely, rather confusing. I just plopped down in a chair next to Greg Goff, who evidently runs data warehousing at Nielsen, and had a quick chat. Here’s the real story.
     The Nielsen Company has over half a petabyte of data on Netezza in the US. This installation is growing.
     The Nielsen Company indeed has 45 terabytes or whatever of data on Oracle in its European (Customer) Information Factory. This is not particularly growing.
     Nielsen’s Oracle data warehouse has been built up over the past 9 years. It’s not new. It’s certainly not on Exadata, nor planned to move to Exadata.
     These are not single-instance databases. Nielsen’s biggest single Netezza database is 20 terabytes or so of user data, and its biggest single Oracle database is 10 terabytes or so.
     Much (most?) of the rest of the installations are customer data marts and the like, based in each case on the “big” central database. (That’s actually a classic data mart use case.) Greg said that Netezza’s capabilities to spin out those databases seemed pretty good.
     That 10 terabyte Oracle data warehouse instance requires a lot of partitioning effort and so on in the usual way.
     Nielsen has no immediate plans to replace Oracle with Netezza.
     Nielsen actually has 800 terabytes or so of Netezza equipment. Some of that is kept more lightly loaded, for performance.
     (3) Stair, Principles of Information Systems, 2009, p 181.
     (4) Dhruba Borthakur, who is the Hadoop Engineer for Facebook. 30 petabytes in December 2010. This is really interesting.... http://www.facebook.com/note.php?note_id=468211193919
     In May 2010: The Datawarehouse Hadoop cluster at Facebook has become the largest known Hadoop storage cluster in the world. Here are some of the details about this single HDFS cluster:
     21 PB of storage in a single HDFS cluster
     2000 machines
     12 TB per machine (a few machines have 24 TB each)
     1200 machines with 8 cores each + 800 machines with 16 cores each
     32 GB of RAM per machine
     15 map-reduce tasks per machine
     That's a total of more than 21 PB of configured storage capacity! This is larger than the previously known Yahoo!'s cluster of 14 PB. Here are the cluster statistics from the HDFS cluster at Facebook:
  6. Bioinformatics. Federalist Papers. Physics. Business development.
  7. Two reasons for light viewing:
     Modality. People have busy lives.
     Fragmentation to lower measured networks.

     The heaviest viewers watch 3X the volume of television of the average viewer.
     The lightest viewers watch 5% the volume of television of the average viewer.
     60% of the television audience accounts for 90% of television viewing (and therefore ad impressions). Call them the Heavier Viewers.
     The remaining 40% of the viewers account for only 10% of total attention to television. These Lighter Viewers’ attention to television generates less than 1/10 the volume of impressions that a Heavier Viewer does.
     Without careful planning based on the best possible data resource, every 12 impressions an advertiser buys will yield one unit of reach against the 40% of the audience that are Lighter Viewers.
     The ratio of Heavier Viewer viewing to Lighter Viewer viewing varies by network. Networks with a relatively greater share of viewing attributable to Heavier Viewers will tend to accumulate audience more slowly than networks with a lower share of viewing attributable to Heavier Viewers. All else equal, impressions on networks with more Heavier Viewer viewing will create more frequency and less reach than networks with less Heavier Viewer viewing.
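The 60/90 and 40/10 split in the note for slide 7 can be turned into rough per-viewer figures. The sketch below is only an illustration, assuming viewing is spread evenly within each group; the names `per_viewer_index`, `HEAVIER` and `LIGHTER` are made up for this example and do not come from the talk. The note's own figures (under 1/10 per viewer, 12 impressions per unit of reach) presumably rest on the full viewing distribution, so these group averages are coarser.

```python
def per_viewer_index(share_of_viewing, share_of_audience):
    """Viewing per person relative to the average viewer (average = 1.0)."""
    return share_of_viewing / share_of_audience


# Shares taken from the note: 60% of the audience does 90% of the viewing.
HEAVIER = {"audience": 0.60, "viewing": 0.90}
LIGHTER = {"audience": 0.40, "viewing": 0.10}

heavier_index = per_viewer_index(HEAVIER["viewing"], HEAVIER["audience"])
lighter_index = per_viewer_index(LIGHTER["viewing"], LIGHTER["audience"])

# If impressions follow viewing, only 10% of untargeted impressions land on
# Lighter Viewers, so this many impressions are bought per one that does.
impressions_per_lighter_impression = 1 / LIGHTER["viewing"]

if __name__ == "__main__":
    print(f"Heavier Viewer index: {heavier_index:.2f}")
    print(f"Lighter Viewer index: {lighter_index:.2f}")
    print(f"Impressions bought per Lighter Viewer impression: "
          f"{impressions_per_lighter_impression:.0f}")
```

Under this assumption, roughly one untargeted impression in ten reaches a Lighter Viewer, before allowing for repeat exposures to the same person.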
  8. SYFY, 2010.02.28 7:00:00PM to 2010.10.14 12:30PM. 10645 observations for 514 stations. Sometimes easy to spot. Files corrupted. What about inconsistency in field-level data? Possibly a logging problem at the STB level? Possibly an aggregation problem?
  9. Learning the difference between “bank” of a river vs. “bank” as a place where you put your money. In search we called this the “Madonna problem”: Madonna the religious icon vs. Madonna the pop culture icon.
  10. Learning the difference between “bank” of a river vs. “bank” as a place where you put your money. In search we called this the “Madonna problem”: Madonna the religious icon vs. Madonna the pop culture icon.
  11. Learning the difference between “bank” of a river vs. “bank” as a place where you put your money. In search we called this the “Madonna problem”: Madonna the religious icon vs. Madonna the pop culture icon.
  12. Nielsen has Over The Air, Analog, Digital
  13. Nielsen has Over The Air, Analog, Digital
  14. Nielsen has Over The Air, Analog, Digital
  15. Nielsen has Over The Air, Analog, Digital
  16. Nielsen has Over The Air, Analog, Digital. Imputed Nielsen’s numbers.
  17. The first chart shows the fraction of view time for women aged 18-54 (F18-54), as a fraction of view time for all TV viewers, for week 2 vs. the same fraction for week 1 (two weeks in January). The data is for three markets: Philadelphia in blue, Atlanta in red and Chicago in green. Each point represents a zip code in one of these markets. The second chart is similar but for men 18-54 (M18-54). The distance of a point away from the diagonal line represents the variation from one week to the next for that zip code. The separation along the diagonal line represents the varying fraction of adult women between the zip codes. As an example, if there had been no change from the first week to the second, all points would have been along the diagonal. We see strong overlap of all three markets, and they can't be separated in these views. However, we see significant spread of the fraction of the F18-54 group and the M18-54 group between the zip codes that compose these markets. Women appear to show more geographic variation in their viewing habits.
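The week-over-week comparison in the note for slide 17 can be sketched in a few lines. This is a minimal illustration, not the analysis behind the charts: the zip codes, view-time numbers and function names are all hypothetical. It computes the demographic's fraction of view time per zip code for each week, then the perpendicular distance of the point (week 1, week 2) from the diagonal y = x, which is zero when nothing changed between the weeks.

```python
import math


def demo_fraction(demo_view_time, total_view_time):
    """Fraction of all view time attributable to one demographic group."""
    return demo_view_time / total_view_time


def distance_from_diagonal(week1_frac, week2_frac):
    """Perpendicular distance of the point (week1, week2) from the line y = x."""
    return abs(week2_frac - week1_frac) / math.sqrt(2)


def weekly_variation(view_time):
    """Map each zip code to (week 1 fraction, week 2 fraction, distance)."""
    result = {}
    for zip_code, weeks in view_time.items():
        f1 = demo_fraction(*weeks["week1"])
        f2 = demo_fraction(*weeks["week2"])
        result[zip_code] = (f1, f2, distance_from_diagonal(f1, f2))
    return result


# Hypothetical data: (F18-54 view time, all-viewer view time) per week.
SAMPLE_VIEW_TIME = {
    "zip_a": {"week1": (300.0, 1000.0), "week2": (300.0, 1000.0)},
    "zip_b": {"week1": (200.0, 1000.0), "week2": (400.0, 1000.0)},
}

variation = weekly_variation(SAMPLE_VIEW_TIME)

if __name__ == "__main__":
    for zip_code, (f1, f2, dist) in variation.items():
        print(f"{zip_code}: week1={f1:.2f} week2={f2:.2f} distance={dist:.3f}")
```

In this toy data, `zip_a` sits on the diagonal because its fraction is unchanged, while `zip_b` sits off it. The spread of the week 1 fractions across zip codes corresponds to the separation along the diagonal described in the note.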
  18. Audience fragmentation is going from bad to worse. This fragmentation is wrecking effective campaign reach and creating a massive frequency imbalance. Audience re-aggregation will be key for brand advertisers to maintain scale. TV is not going to the web. The web is going to television.
  19. Audience fragmentation is going from bad to worse. This fragmentation is wrecking effective campaign reach and creating a massive frequency imbalance. Audience re-aggregation will be key for brand advertisers to maintain scale. TV is not going to the web. The web is going to television.
  20. Merci.