Video AI for Media and Entertainment Industry
1. Video AI for Media and Entertainment Industry
Albert Y. C. Chen, Ph.D.
Vice President, R&D
Viscovery
2. Albert Y. C. Chen, Ph.D.
Chinese name: 陳彥呈
• Experience
2017-present: Vice President of R&D @ Viscovery
2016-2017: Chief Scientist @ Viscovery
2015: Principal Scientist @ Nervve Technologies
2013-2014: Computer Vision Scientist @ Tandent
2011-2012: @ GE Global Research
• Education
Ph.D. in Computer Science, SUNY-Buffalo
M.S. in Computer Science, NTNU
B.S. in Computer Science, NTHU
3. Viscovery = Video Discovery
[Timeline, 2013 to 2017: Optical Character Recognition; Offline Recognition; Product Recognition; Wearable Devices; Video Content Discovery & Interaction; Video Content-related Advertisements]
Leading provider of Video AI analytic products
4. Current AI does not “solve it all”
[Figure: AI industry stack in three layers. Infrastructure layer: HW companies, machine computing power, data. Technology layer: AI/DNN libraries, modules, general-purpose platforms, app-specific platforms, data accumulation via open API. Application layer: apps and solutions, including vertical AI startups in agriculture, manufacturing, medical, finance, retail and transportation. E.g. (numbered positions in the diagram): 1: Google, Amazon, FB; 2: IBM; 3: Walmart; 5: NVidia.]
5. Vertical AI
Solving industry-specific problems by combining
AI and Subject Matter Expertise.
• Full Stack Products
• Subject Matter Expertise
• Proprietary Data
• AI delivers core value
(Bradford Cross, 2017/06/14)
7. Media & Entertainment Industry’s challenge
• Internet Era: make content free, maximize traffic, and ad revenue will be waiting at the end of the rainbow?
• It worked for nearly 20 years, but Google and Facebook have been the only beneficiaries; they control 75% of digital ad revenue and 99% of its future growth.
• Is this business model still working? Does it work for anyone else? The latest unicorns from Silicon Valley suggest otherwise.
11. The curveball: App Stores and News Syndicators!
• News Republic (acquired for 57M USD, Aug 2016)
  • 12.5 million daily active users
  • 60k USD annual revenue
• 今日頭條 (Toutiao, toutiao.com)
  • 80 million daily active users
  • 1B USD annual revenue
12. Pay source, or pay platform?
• Platform:
  • More focus, less distraction: news outlets can focus on content instead of customer service, software development, etc.
• Potential Problem:
  • Facebook and Google control 75% of all traffic and 99% of expected future growth?
13. Netflix
• Netflix spends 250M USD yearly on personalization and content recommendation.
• 104M subscribers worldwide; 52M in the US (75% market penetration, #1 in the US; YouTube #2 at 53%)
• Netflix subscribers watch 19 days per month, for 28 hours/month (#2, behind Dish’s 47 hours/month)
17. The evolution of methods for monetizing text/video content
[Figure: timeline from 2000 through 2005 and 2010 to now, moving from free content with ad revenue toward subscription revenue. Options for struggling traditional media:
• Do nothing? Sitting duck.
• Improve ad revenue? Ad tech; video content-related ads.
• Own platform? Or a shared platform with licensed content? Tailored recommendations (user- and video-content-related) to improve UX & stickiness.
• Video Data Mining]
18. If we already had such precise indexing of video content
(Jay Chao singing A, dancing B, wearing C, with items D, in front of E, at time F?)
• We would disrupt:
  • advertisement
  • e-commerce
  • the online video platform ecosystem
  • screenwriting, film production and film editing.
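To make "precise indexing" concrete, here is a minimal sketch of a time-coded inverted index, assuming upstream detectors (face, action, clothing) already emit (tag, start, end) records. The `VideoIndex` class, the `Segment` type and the sample tags are hypothetical; a production index would also need per-video IDs and a faster interval join.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A time range in seconds within one video."""
    start: float
    end: float


def intersect(a, b):
    """Pairwise overlap of two segment lists (simple O(len(a) * len(b)) join)."""
    out = []
    for x in a:
        for y in b:
            lo, hi = max(x.start, y.start), min(x.end, y.end)
            if lo < hi:  # keep only non-empty overlaps
                out.append(Segment(lo, hi))
    return sorted(out, key=lambda s: s.start)


class VideoIndex:
    """Hypothetical inverted index: tag -> segments where the tag is present."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, tag, start, end):
        self._postings[tag].append(Segment(start, end))

    def query(self, *tags):
        """Return the time ranges in which ALL given tags co-occur."""
        if not tags:
            return []
        result = sorted(self._postings.get(tags[0], []), key=lambda s: s.start)
        for tag in tags[1:]:
            result = intersect(result, self._postings.get(tag, []))
        return result


# Toy records standing in for face / action / clothing detector output.
idx = VideoIndex()
idx.add("Jay Chao", 10, 95)
idx.add("singing", 30, 60)
idx.add("wearing: red jacket", 40, 120)
print(idx.query("Jay Chao", "singing", "wearing: red jacket"))
```

A multi-tag query returns only the seconds where every tag holds at once, which is the kind of "who, doing what, wearing what, at time F" lookup the slide describes.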
19. Video content-related advertisements
Food Delivery Service Ad:
Previous moment: dining scene → Insert food delivery service ad (饿了吗?快点饿了么! "Hungry? Quick, Ele.me!") → Next moment: dining scene
Restaurant Ad:
Previous moment: dining scene → Insert KFC ad (炸鸡红包快来抢! "Fried-chicken red-envelope coupons, come grab them!") → Next second: dining scene
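One possible way to pick insertion points like these is to match scene tags against a table of ad categories and place the break inside a matching scene. `AD_RULES`, `Scene`, `ad_slots`, the midpoint placement and the 300-second minimum gap are all illustrative assumptions, not Viscovery's actual method.

```python
from dataclasses import dataclass

# Hypothetical tag -> ad-category rules. The slide only shows "dining scene ->
# food delivery / restaurant ads"; the "driving" entry is made up for illustration.
AD_RULES = {
    "dining": ["food delivery service", "restaurant"],
    "driving": ["car"],
}


@dataclass(frozen=True)
class Scene:
    start: float     # seconds
    end: float       # seconds
    tags: frozenset  # tags produced by the video analysis step


def ad_slots(scenes, rules=AD_RULES, min_gap=300.0):
    """Propose (timestamp, ad categories) pairs inside scenes whose tags match a rule.

    The slot sits at the scene midpoint so the same context is on screen right
    before and right after the break; min_gap (seconds) limits ad frequency.
    """
    slots = []
    last = float("-inf")
    for scene in sorted(scenes, key=lambda s: s.start):
        categories = [c for tag in sorted(scene.tags) for c in rules.get(tag, [])]
        if not categories:
            continue
        t = (scene.start + scene.end) / 2
        if t - last < min_gap:
            continue  # too close to the previous ad break
        slots.append((t, categories))
        last = t
    return slots


scenes = [
    Scene(0, 120, frozenset({"living room", "chatting"})),
    Scene(120, 300, frozenset({"dining", "family"})),
    Scene(300, 360, frozenset({"dining"})),
    Scene(900, 1000, frozenset({"driving"})),
]
print(ad_slots(scenes))
```

The second dining scene is skipped because its midpoint falls within `min_gap` of the first break; lowering `min_gap` trades viewer experience for more ad inventory.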
24. Mining Video Content with Computer Vision
• 85% of data are unstructured, e.g., videos.
• Previously, videos needed manual tagging before their content could be indexed and further utilized.
• Computer Vision is the AI subfield that focuses on recognizing and understanding visual content.
25. What algorithms do we need?
• Face
• Motion
• Image scene
• Text
• Audio
• Object
• Semantics
26. Where are we now?
• Face
• Object
• Scene
• Logos
• Text
• Audio
• Motion
• Semantics
27. Where are we now?
Face Recognition
• 1 to 1: 99%+
• 1 to 100: 90%
• 1 to 10,000: 50%-70%
• 1 to 1M: 30%
LFW dataset; common trade-off: FN↑, FP↓ (false negatives up, false positives down)
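The drop from 1:1 to 1:1M accuracy follows from how identification works. The sketch below assumes face embeddings already come from some face model (not shown) and contrasts 1:1 verification with 1:N identification by cosine similarity; the 0.6 threshold and the toy 3-D vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(probe, reference, threshold=0.6):
    """1:1 verification: is the probe the same person as this one reference?"""
    return cosine(probe, reference) >= threshold


def identify(probe, gallery, threshold=0.6):
    """1:N identification: best-matching identity in the gallery, or None (reject).

    Every extra gallery entry is another chance for an impostor to outscore the
    true match, which is one reason accuracy falls as N grows. Raising
    `threshold` rejects more probes: fewer false positives, more false negatives.
    """
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


gallery = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.7, 0.7, 0.0]}
probe = [0.9, 0.1, 0.0]
print(identify(probe, gallery))                   # best match clears the 0.6 threshold
print(identify(probe, gallery, threshold=0.995))  # stricter threshold rejects the probe
```

The stricter threshold turns a correct match into a rejection, which is the FN↑, FP↓ trade-off noted on the slide.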
28. Where are we now?
Image Scene Classification
• MIT Places365 dataset.
• Top-5 accuracy rates >85%.
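Top-5 accuracy counts a prediction as correct when the true class is among the model's five highest-scoring classes. A minimal stdlib sketch, with made-up scores over 7 classes instead of Places365's 365:

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per sample; labels: true class indices.
    """
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(labels)


# Three toy samples over 7 classes.
scores = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],  # true class 0 ranked 1st
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 4 ranked 5th
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 6 ranked 7th
]
labels = [0, 4, 6]
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=5))
```

The second sample counts as a top-5 hit but a top-1 miss, which is why top-5 figures are always at least as high as top-1 figures.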
29. Where are we now?
Object Detection & Classification
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• 1000+ classes, 1.2M images.
[Chart: ILSVRC classification error and classification + localization error, 2011 to 2014; y-axis from 0 to 0.5.]
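Localization error needs a rule for when a predicted box is "right". A common criterion in ILSVRC-style evaluation is an intersection over union (IoU) of at least 0.5 with the ground-truth box. The sketch below is a simplified single-guess version (the challenge scoring allowed several guesses per image), with boxes assumed to be (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection over union of boxes (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 when boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def localization_correct(pred_label, pred_box, true_label, true_box, min_iou=0.5):
    """Simplified single-guess check: right class AND enough box overlap."""
    return pred_label == true_label and iou(pred_box, true_box) >= min_iou


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # box shifted by half its width
print(localization_correct("dog", (1, 1, 10, 10), "dog", (0, 0, 10, 10)))
```

A box shifted by half its width already drops to an IoU of 1/3, below the 0.5 cut-off, which helps explain why classification + localization error stays above plain classification error.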
30. Putting things together is not trivial and often very messy.
Classical Workflow:
1. Data collection
2. Feature Extraction
3. Dimension Reduction
4. Classifier (re)Design
5. Classifier Verification
6. Deploy
Modern Brute-force workflow
1. Data collection
2. Throw everything into a Deep Neural Network
3. Mommy, why doesn’t it work ???
31. Classical Problem #1: Curse of Dimensionality
[Figure: "sit" in several languages: 坐, ze, sit, って, 앉다, sentarse]
• Number of Variables vs Number of Samples
Q. Who would make such naive mistakes?
A. Many “newbies” repeatedly do so.
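One concrete face of the curse of dimensionality: with a fixed number of samples, distances become less informative as variables are added. This sketch assumes uniformly random data (real features are correlated) and measures the relative contrast between the farthest and nearest neighbor of a random query; `relative_contrast` is a hypothetical helper.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min distance from a random query to uniformly random points.

    As this ratio approaches 0, the nearest and farthest points are almost
    equally far away, so distance-based methods lose discriminating power.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


# Same 200 samples, growing number of variables.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```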
32. Example 1-1: illegal parking detection
Data: 100 legal parking samples, 100 illegal parking samples
Let’s train a 150-layer ResNet!!!
What could possibly go wrong?
33. Example 1-1: illegal parking detection
• Data: try cleaner data.
• Feature: fine-tune a pre-trained model; don’t train from scratch.
• Classifier overfitting: beware of statistical coincidences.
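The "statistical coincidences" warning can be reproduced without any neural network. Assuming purely random binary features and the slide's 100 + 100 samples, searching many features for the best training accuracy finds one that looks predictive by chance, yet it falls back to roughly coin-flip accuracy on held-out data. The helper names and the 2,000-feature search are hypothetical; a deep network with millions of parameters has even more room to latch onto such coincidences.

```python
import random


def accuracy(pred, truth):
    """Fraction of positions where prediction and label agree."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def best_noise_feature(n_features=2000, seed=0):
    """Search pure-noise binary features for the one that best 'predicts' the labels.

    Labels mirror the slide: 100 legal + 100 illegal parking samples for training,
    plus an equally sized held-out set. No feature carries any real signal.
    """
    rng = random.Random(seed)
    y_train = [0] * 100 + [1] * 100
    y_test = [0] * 100 + [1] * 100
    best_train, best_test = 0.0, 0.0
    for _ in range(n_features):
        x_train = [rng.getrandbits(1) for _ in y_train]
        x_test = [rng.getrandbits(1) for _ in y_test]
        acc = accuracy(x_train, y_train)
        if acc > best_train:  # selection looks only at the training set
            best_train, best_test = acc, accuracy(x_test, y_test)
    return best_train, best_test


train_acc, test_acc = best_noise_feature()
print(f"train accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```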
35. Example 1-2: Smart Photo Album with Google Cloud Vision
No effective distance measure exists for thousands, if not millions, of dimensions (tags); the similarity between two photos would be approximately zero most of the time.
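To see why tag vectors give near-zero similarity, here is a rough simulation assuming each photo receives 10 tags drawn uniformly from a fixed vocabulary. It reports the fraction of photo pairs that share no tag at all (Jaccard similarity 0). The uniform-tag assumption, the helper names and all parameter values are hypothetical.

```python
import itertools
import random


def jaccard(a, b):
    """Tag-set similarity |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def zero_similarity_fraction(n_photos=100, vocab_size=10_000, tags_per_photo=10, seed=0):
    """Fraction of photo pairs that share no tag at all.

    Assumes tags are drawn uniformly at random; real tag frequencies are skewed,
    so this is only a rough illustration.
    """
    rng = random.Random(seed)
    photos = [set(rng.sample(range(vocab_size), tags_per_photo)) for _ in range(n_photos)]
    pairs = list(itertools.combinations(photos, 2))
    return sum(jaccard(a, b) == 0.0 for a, b in pairs) / len(pairs)


for vocab in (100, 10_000):
    print(vocab, round(zero_similarity_fraction(vocab_size=vocab), 3))
```

With a 10,000-tag vocabulary almost every pair scores exactly 0, so the measure cannot rank which photos belong together.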
36. Classical Problem #2: Overfitting Data
• Make sure your deep learning algorithm is learning better features from the data, not overfitting the data with complex classifiers.
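A quick way to spot a classifier that memorizes rather than learns is to compare training and validation accuracy. The sketch below uses k-nearest neighbors on 1-D toy data with 20% flipped labels as a stand-in for a "complex classifier": k = 1 fits the training set perfectly but generalizes worse than k = 15. The data generator and parameter values are illustrative assumptions.

```python
import random


def make_data(n, noise, rng):
    """1-D points; the true label is x > 0, but a `noise` fraction of labels is flipped."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return int(2 * sum(y for _, y in nearest) > k)


def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(300, noise=0.2, rng=rng)
valid = make_data(1000, noise=0.2, rng=rng)
results = {k: (accuracy(train, train, k), accuracy(train, valid, k)) for k in (1, 15)}
for k, (tr, va) in results.items():
    print(f"k={k:2d}  train={tr:.2f}  validation={va:.2f}  gap={tr - va:.2f}")
```

A large train/validation gap, as with k = 1, is the warning sign; the smoother k = 15 model looks worse on training data but better on unseen data.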
37. Luckily, we’re in an AI startup boom!
(BCG AI Report, 2016/10)
[Figure: AI industry stack diagram, repeated from slide 4.]
38. Vertical AI Startups
Solving industry-specific problems by combining
AI and Subject Matter Expertise.
• Full Stack Products
• Subject Matter Expertise
• Proprietary Data
• AI delivers core value
(Bradford Cross, 2017/06/14)
40. TOP 5 TAGS COMPARISON
Competitive Analysis: Baidu vs. Viscovery
Examples of Vertical AI beating General Purpose AI

"First Love" drama series scene:
Baidu tag | Ad placement value | Viscovery tag | Ad placement value
Person | Low | Coulee Nazha (actress) | High
Anime | Low | Sean Sun (actor) | High
Screenshot | Low | Back of smartphone | High
Cartoon | Low | Female | Medium
Adult | Medium | Young | Medium

Man’s face:
Baidu tag | Ad placement value | Viscovery tag | Ad placement value
Age: 32 | Medium | Necklace | High
Asian | Medium | Baseball cap | High
Male | Medium | Bracelet | High
Not smiling | Low (inaccurate) | Ziwen Wang | High
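41. Use AI to turn unstructured video data into a gold mine!
[Figure: tag timelines (0 to 60 mins) of a video used to target a smartphone ad under three recommendation strategies:
• Using only physical tags for recommendation: CTR 0.2%
• Physical plus abstract and emotional tags: CTR 0.9%
• Physical, abstract and emotional tags plus feedback: CTR 2.0%
Tags shown include apparel (服饰), cars (汽车), spokesperson (代言人), party (聚会), mobile phone (手机), home (居家), travel (旅游), energetic (活力), work (工作), chatting (聊天), studying (学习), living room (客厅), joyful living room (欢乐客厅) and joyful travel (欢乐旅游).]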
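The CTR figures on this slide translate into relative lift with simple arithmetic: 0.9% / 0.2% = 4.5x and 2.0% / 0.2% = 10x. The sketch below repeats that calculation and converts rates into clicks for a hypothetical 100,000-impression campaign; the campaign size is an assumption, and the pairing of CTRs with strategies is read from the slide's labels.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks per impression."""
    return clicks / impressions


def lift(rate, baseline):
    """How many times higher a strategy's CTR is than the baseline CTR."""
    return rate / baseline


# CTRs reported on the slide for the smartphone ad.
strategies = {
    "physical tags only": 0.002,
    "physical + abstract + emotional tags": 0.009,
    "physical + abstract + emotional tags + feedback": 0.020,
}
baseline = strategies["physical tags only"]
impressions = 100_000  # hypothetical campaign size

for name, rate in strategies.items():
    clicks = round(rate * impressions)
    print(f"{name}: {clicks} clicks, CTR {ctr(clicks, impressions):.1%}, "
          f"lift {lift(rate, baseline):.1f}x")
```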