Identifying use cases and evaluating ML technology
Case: the Metadata Machine Project at Yle
Lauri Saarikoski, Yle, the Finnish Broadcasting Company
Matt Eaton, GrayMeta
Background and context
Metadata Machine, FIAT/IFTA 2019
Industry Drivers: Content
● Need for More Content Metadata
● Volume Increasing
● Personalised Content Expected
● New Ways of Consuming Content
● More Distribution Channels
Industry Drivers: Technology
● Generating & Using New Content Metadata Is Easier
● Rapidly Maturing Machine Learning Services
● Growing Adoption of Cloud Services
● APIs Enabling Easier Integration
Archives as part of a Media Company, case Yle
Media Production
● in-house
● production houses
Media Distribution & Publishing
● Online, TV, Radio
● 3rd party platforms
Yle Archives [that’s us, 61 people]
Data in and out of the archives
Between Production, Yle Archives and Distribution:
● Existing collections: 5.6 PB of media, 4.7 M metadata objects; annually, 12k legacy objects get some metadata added
● Audience analytics, publishing metadata; annual hours: 47k audio, 19k linear video
● Media, content descriptions; annual hours: video 6.7k, audio 27k, + photos, music
Help from Automated Metadata?
Why look into machine learning in the first place?
1. Increasing volumes of content being archived
2. Increasing demand for archive content, as distribution to multiple platforms requires content
3. Data entry is limited by the resources available to enter data manually
4. Only a small % of the archive is enriched with detailed metadata
5. New types of metadata are needed, e.g. audio classification & facial detection
6. [Historical] content which lacks metadata entirely
7. Metadata quality is compromised by time constraints
Series of Pilots at Yle Archives, 2016-2019
● Music identification
● Automated keyword extraction market review
● Video recognition technical pilots
● Image recognition market review & piloting
● Face recognition technical pilot
● Speech recognition technical pilot
● Speech recognition on demand MVP
● Speech recognition & automated keyword extraction MVP
● IPR related to data mining, memad.eu
● Study groups on AI
● Metadata Machine
● Facial recognition data as UX in production systems
Pilot projects reduce complexity
After each pilot...
● we know more about what to do next and which questions still need looking into.
● our people have a more realistic view of what to expect and what to discuss.
● our people are more involved in designing their work and [maybe] find new tools more acceptable than before.
People, Technology, Organisation, Metadata Conventions
The Metadata Machine
Pre-study / PoC Spring 2019
All audiovisual content is analysed as early as possible
Vision: The Metadata Machine
Content enters from:
● Content creation (raw material)
● Procurement (ready-made content)
● Publishing (published content)
● Archiving (what do we have?)
Automatic content analysis engine:
● Speech recognition
● Image recognition
● Person identifier
● Fingerprinting
● Sound identifying
● Video frame color analysis
● Music identifier
● Text analysis
● Language identifier
● ...
Result: Company-wide metadata database on all content items
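The vision can be read as a fan-out: every content item, whatever stage it comes from, passes through a set of analysers, and all results land in one company-wide store. Below is a minimal Python sketch of that shape, not Yle's implementation. The analysers `detect_language` and `count_words` are hypothetical toy stand-ins for real speech, image or music recognition services, and a plain dict stands in for the metadata database.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentItem:
    item_id: str
    source: str      # e.g. "content creation", "procurement", "publishing", "archiving"
    transcript: str  # hypothetical: assume text has already been extracted


# An analyser takes an item and returns a dict of metadata fields.
Analyser = Callable[[ContentItem], dict]


def detect_language(item: ContentItem) -> dict:
    # Toy rule standing in for a real language identifier service.
    finnish_markers = {"ja", "on", "ei"}
    words = set(item.transcript.lower().split())
    return {"language": "fi" if words & finnish_markers else "unknown"}


def count_words(item: ContentItem) -> dict:
    # Stand-in for a text-analysis service.
    return {"word_count": len(item.transcript.split())}


@dataclass
class MetadataEngine:
    analysers: dict[str, Analyser]
    database: dict[str, dict] = field(default_factory=dict)  # stand-in for the metadata DB

    def ingest(self, item: ContentItem) -> dict:
        # Run every registered analyser and keep each analyser's output
        # under its own key, so results from different services stay comparable.
        record = {"source": item.source}
        for name, analyse in self.analysers.items():
            record[name] = analyse(item)
        self.database[item.item_id] = record
        return record


engine = MetadataEngine(analysers={"language": detect_language, "text": count_words})
engine.ingest(ContentItem("clip-001", "archiving", "tämä on testi ja vain testi"))
```

Adding a new service then means registering one more analyser, which mirrors the "..." at the end of the engine list.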
Approach of this Pre-study Project
Our focus vs. what we avoided:
● Business cases, not technological details
● A company-wide solution, not a single solution
● Multiple AME services, not single providers
● What's available today, not what could be available
● How, when and why to get into production? How to build the road to the future? Not yet another PoC
Business Units Involved and Covered Today
● Production Management
● Photo Archive
● Video / Audio Archives
● Audience Insight
● OnDemand (Yle Areena)
● News
● Sport
● Radio / Audio
● Master Control Room & Media Logistics
● Senior Leadership
● Translation & Versioning Team
● Architecture & Technology
Project Timeline and Structure
1. Buy a metadata machine (GrayMeta Curio)
2. Involve the whole company in identifying potential use cases for automatic metadata
3. Each team tries to solve its use cases with the machine
4. Collect the results from the teams and identify the most promising business cases
5. Decide on the next step, e.g. investment
Schedule, February to September: test round 1, test round 2, test round 3, then analysis & next steps
©2019 GrayMeta. All rights reserved. PROPRIETARY & CONFIDENTIAL
Curio connects all of your content, creating a single interface & API to find and use any file.
Utilize automated metadata to help find and use your content.
Metadata Machine by the Numbers
● 637 hours of content
● 2193 assets: 37 audio files, 873 images, 1281 videos, 2 application files
● 22 Yle testers
● 5 insight groups
● 23 use cases analysed
● 7 different ML providers used
● 45 different automatic metadata extractors
● 100,000+ API calls
Identifying Use Cases for Automated Metadata
100+ ideas → 10+ proofs of concept → 1+ to production → 10+ to production
Metadata Machine pre-study project 2019
Approaching the Business Value
● What kind of metadata does it require?
● Does it improve existing processes?
● Does it enable something completely new?
● Does it save money or time?
● Does it increase customer satisfaction?
● Is the technology solution available today?
● What are the direct and indirect costs involved?
● How can the costs be optimized?
● How does it affect the surrounding production process / way of working?
● How can human work be combined with automation?
● What are the success criteria / KPIs?
● ...
Archive Use Cases
[After being refined during and after the project]
Main Archive Use Cases
1. Enriching existing metadata for archive content and adding new types of metadata
• This can also be used to improve search functionality
2. New possibilities to browse and navigate through the Archive collection
• New faceted navigation functionality
• Ability to effectively browse and filter the archive based on enriched content
3. Enhancing the metadata creation workflows by increasing automation levels
• Automation allows the team to focus on metadata quality
• Automation allows the team to manage the increase in demand and in content being archived
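Faceted navigation over enriched metadata mostly comes down to two operations: counting how many items carry each value of a facet, and narrowing the collection by the values a user selects. A minimal Python sketch, assuming a hypothetical record layout where each item maps facet names (here `faces`, `tags`, `decade`) to lists of values; a production archive would do this inside its search engine rather than in application code.

```python
from collections import Counter

# Hypothetical enriched archive records: facet name -> list of values.
items = [
    {"id": "a1", "faces": ["Person A"], "tags": ["sauna", "lake"], "decade": ["1970s"]},
    {"id": "a2", "faces": [], "tags": ["lake", "forest"], "decade": ["1980s"]},
    {"id": "a3", "faces": ["Person A", "Person B"], "tags": ["forest"], "decade": ["1970s"]},
]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    counts = Counter()
    for record in records:
        # set() so a value repeated within one record is counted once
        counts.update(set(record.get(facet, [])))
    return counts


def filter_by(records, selections):
    """Keep records containing every selected value for each facet (AND semantics)."""
    return [
        r for r in records
        if all(set(values) <= set(r.get(facet, [])) for facet, values in selections.items())
    ]


selected = filter_by(items, {"decade": ["1970s"]})
print([r["id"] for r in selected])  # ['a1', 'a3']
```

Recomputing `facet_counts` on the filtered subset gives the counts shown next to each remaining facet value as the user drills down.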
What Did We Do?
Aim:
To test various automated metadata services in order to understand where the metadata creation workflows can be enhanced by increasing automation levels, and to generate metadata for collections that currently lack it.
Services Tested:
● OCR - Optical Character Recognition
● Speech to Text
● Tags & Descriptions
● Facial Detection
● Audio Classification
● Logo Detection
● Curio’s User Interface
● Curio’s API
Methods Used:
● Variations on extractor settings & thresholds, to refine results and remove false positives.
● Varied content, from old black-and-white footage to newer content, to understand the value across the entire archive.
1. Selected test videos from the Archive to test against various Machine Learning (ML) services.
2. Configured relevant ML services to process photos.
3. Refined and tested different extractors and confidence thresholds.
4. Compared results and drew conclusions on the validity of the services for the Archive Team.
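Refining confidence thresholds to remove false positives can be illustrated as follows. This is a minimal Python sketch, not Curio's API: the `Detection` record, the extractor names and the 0..1 confidence scale are all assumptions, and "correct labels" stands for a hand-checked list of what really occurs in the test material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    extractor: str     # hypothetical extractor name, e.g. "logo" or "tags"
    label: str         # what the extractor claims to have found
    confidence: float  # assumed to be normalised to 0..1


def apply_thresholds(detections, thresholds, default=0.5):
    """Keep detections whose confidence reaches their extractor's threshold.

    `thresholds` maps extractor name -> minimum confidence; extractors without
    an entry fall back to `default` (an arbitrary choice for this sketch).
    """
    return [d for d in detections if d.confidence >= thresholds.get(d.extractor, default)]


def compare_thresholds(detections, correct_labels, candidates):
    """For each candidate threshold, return (detections kept, false positives kept).

    Anything that survives the threshold but is not in `correct_labels`
    counts as a false positive.
    """
    report = {}
    for threshold in candidates:
        kept = [d for d in detections if d.confidence >= threshold]
        false_positives = sum(1 for d in kept if d.label not in correct_labels)
        report[threshold] = (len(kept), false_positives)
    return report


detections = [
    Detection("logo", "Yle", 0.92),
    Detection("logo", "BrandX", 0.41),
    Detection("tags", "lake", 0.75),
    Detection("tags", "car", 0.35),
]
print(compare_thresholds(detections, {"Yle", "lake"}, [0.3, 0.5, 0.8]))
# {0.3: (4, 2), 0.5: (2, 0), 0.8: (1, 0)}
```

Raising the threshold removes false positives but eventually also drops correct hits (at 0.8 the "lake" tag is lost), which is why per-extractor thresholds are worth tuning separately.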
Sample Facial and Logo Recognition
1. Selected test material from the photo archive.
2. Identified & trained persons of interest within AI Studio.
3. Identified & trained logos of interest within AI Studio.
4. Configured relevant ML services to process images.
5. Analysed the impact on search & discovery through facial detection.
6. Analysed the impact on search & discovery through tags, descriptions & OCR.
7. Analysed the impact on search & discovery through logo detection.
8. Reviewed and reported on the successes & failures of ML for images.
Sample Speech to Text (ASR)
Sample OCR
Sample Audio Detection - Radio Content
Sample Audio Detection - Video Content
Findings and conclusions
Learned during this and previous projects
Conclusion 1: Identified Basic Analysis Bundles for Audio and Video
Drafted based on:
● User needs
● Technology readiness level
● Availability of technology
● Expected impact vs. costs
Basic audio analysis bundle: speech/music segmentation + ASR + automated tagging
● Later expand with speaker identification, spoken language identification etc.
● Can also be applied to video
Basic video analysis bundle: audio analysis bundle + facial recognition + OCR
● Generic video analysis and natural language descriptions are not yet mature enough
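The bundles compose: the video bundle is the audio bundle plus two image-based services. A minimal Python sketch of that composition as sets of analysis steps; the step names are descriptive strings chosen for this illustration, not identifiers from any real service, and `plan_analysis` is a hypothetical helper.

```python
# Step names are illustrative labels only.
AUDIO_BUNDLE = frozenset({"speech_music_segmentation", "asr", "automated_tagging"})
VIDEO_BUNDLE = AUDIO_BUNDLE | {"facial_recognition", "ocr"}  # audio bundle + face + OCR

# Later additions to the audio bundle mentioned on the slide.
AUDIO_LATER = frozenset({"speaker_identification", "spoken_language_identification"})


def plan_analysis(media_type: str, include_later: bool = False) -> frozenset:
    """Return the set of analysis steps to run for one media type."""
    bundles = {"audio": AUDIO_BUNDLE, "video": VIDEO_BUNDLE}
    if media_type not in bundles:
        raise ValueError(f"unsupported media type: {media_type}")
    plan = bundles[media_type]
    if include_later:
        # The audio bundle also applies to video, so later audio steps extend both.
        plan = plan | AUDIO_LATER
    return plan
```

Keeping the video bundle defined in terms of the audio bundle means an extension to the audio bundle is picked up by video processing automatically.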
Conclusion 2: Importance of Focusing on Use Cases - Not Only Technology
Technology requirement:
“We should use ML on archive content to get metadata”
Use case definition (how the business will use the technology):
“We should get a better sense of what happens during a single archive program”
a. As a user, I can navigate within a program based on what is discussed in it.
b. As a user, I can navigate within a program based on the topics discussed in it.
c. As a user, I can see in the ASR result who is speaking and where there is music in the program.
Without a use case, it is hard to know how well you perform.
What to measure?
● For technical tests: technical criteria (WER, recall etc.)
● For a use case: added value
  “What effect did this have on your work?”
  “Did it help?”
Success criteria depend on the case you are solving, not on technical metrics.
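For the technical tests, the two criteria named above have standard definitions: word error rate (WER) is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words; recall is the share of expected items (e.g. hand-assigned keywords) that the service actually found. A minimal Python sketch of both; it does no text normalisation (case, punctuation, number formats), which real WER scoring tools handle and which strongly affects the result.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j] (Levenshtein on words)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def recall(expected: set[str], found: set[str]) -> float:
    """Share of expected items that were found."""
    if not expected:
        raise ValueError("expected set must not be empty")
    return len(expected & found) / len(expected)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

One dropped word out of six gives a WER of 1/6. As the slide stresses, a low WER says nothing by itself about whether the transcript helped anyone navigate a program.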
Conclusion 3: Strategies for Integrating Machine Learning into Human Processes
1. Use the data as it comes from the ML services
→ When people won’t be looking at the data & the quality is good enough?
OR
2. Interact with the ML services and the data
→ If the ROI of human effort makes sense?
OR
3. Look at the ML data but enter the metadata manually
→ When automation helps you grasp the context but end results need to be high quality?
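The three strategies can be read as a simple decision over a few yes/no questions per use case. A minimal Python sketch of one possible reading: the parameter names are hypothetical, and the order in which the checks are applied is our assumption, since the slide poses the conditions as open questions rather than rules.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    USE_AS_IS = "use ML output as it comes"
    INTERACT = "interact with the ML services and the data"
    MANUAL_WITH_ML_CONTEXT = "look at ML output, enter metadata manually"


def choose_strategy(
    people_will_look_at_data: bool,
    ml_quality_good_enough: bool,
    human_effort_roi_ok: bool,
    end_result_must_be_high_quality: bool,
) -> Optional[Strategy]:
    # Strategy 1: nobody will look at the data and the ML quality suffices.
    if not people_will_look_at_data and ml_quality_good_enough:
        return Strategy.USE_AS_IS
    # Strategy 2: human effort spent steering and correcting the ML output pays off.
    if human_effort_roi_ok:
        return Strategy.INTERACT
    # Strategy 3: ML gives context, but the final record must be high quality.
    if end_result_must_be_high_quality:
        return Strategy.MANUAL_WITH_ML_CONTEXT
    # None of the conditions apply: revisit the use case itself.
    return None
```

Returning `None` rather than a default strategy makes the "none of these fit" case explicit, which points back to refining the use case first.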
Human tasks, system roles
Humans:
+ Curate the ML models, metadata mappings etc.
+ Use the metadata and review its quality
+ Review ML services and build on them
Systems:
● Media storage
● Metadata storage - across the content supply chain (not just Archive)
● UX for testing and comparing ML services
● UX for managing ML
● Search & Browse UX
● Process orchestration
● Metadata unifier & mapper
● Machine Learning Services (multiple ML services)
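The "metadata unifier & mapper" role is what lets several ML services feed one metadata storage: each provider's output format is mapped into a single common record. A minimal Python sketch; both vendor payload formats and the common schema are hypothetical, invented for illustration, not the formats of any real provider.

```python
from typing import Callable


def unified(item_id, kind, value, start_s, confidence, provider):
    """Build one vendor-neutral record (hypothetical common schema)."""
    return {"item_id": item_id, "kind": kind, "value": value,
            "start_s": start_s, "confidence": confidence, "provider": provider}


def map_vendor_a(item_id, payload):
    # Hypothetical vendor A: {"labels": [{"name": str, "score": 0-100, "timestamp_ms": int}]}
    return [unified(item_id, "tag", label["name"], label["timestamp_ms"] / 1000,
                    label["score"] / 100, "vendor_a")
            for label in payload["labels"]]


def map_vendor_b(item_id, payload):
    # Hypothetical vendor B: {"faces": [{"person": str, "conf": 0-1, "t": seconds}]}
    return [unified(item_id, "face", face["person"], face["t"], face["conf"], "vendor_b")
            for face in payload["faces"]]


# The mapping table is what humans curate when a provider is added or changes format.
MAPPERS: dict[str, Callable] = {"vendor_a": map_vendor_a, "vendor_b": map_vendor_b}


def unify(item_id, provider, payload):
    if provider not in MAPPERS:
        raise KeyError(f"no mapping configured for provider {provider!r}")
    return MAPPERS[provider](item_id, payload)
```

Normalising units (milliseconds to seconds, 0-100 scores to 0..1) at this layer keeps search, browse and quality review independent of which service produced the data.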
Future Roadmap
Short Term → Medium Term → Long Term
● For archive content with no metadata: deploy the audio & visual ML bundles, using machine learning to generate content metadata on the portion of the archive where no metadata exists; later, process the rest of the archive
● Introduce face recognition in a subset of the Archive
● Further investigate audio classification: detect music vs. speech to speed up assessing rights clearance and to focus ASR use
● Speaker recognition based on audio: identify speakers, dependent on new machine learning services
● Increase automation of metadata forms: augment existing manually created metadata forms for archived content with automated machine learning
● Introduce machine learning for high re-use archive content, perhaps in collaboration with Sports Production or Analytics
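Detecting music vs. speech serves two purposes at once: music segments go to rights clearance, and only speech segments are sent to ASR. A minimal Python sketch of that routing, assuming a segmentation service has already labelled each segment "speech" or "music"; the `Segment` type and the `min_speech_s` cut-off are hypothetical choices for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_s: float
    end_s: float
    label: str  # "speech" or "music", assumed output of a segmentation service

    @property
    def duration(self) -> float:
        return self.end_s - self.start_s


def route_segments(segments, min_speech_s=2.0):
    """Split a programme's segments into ASR work and rights-clearance work.

    Speech segments shorter than `min_speech_s` (a hypothetical tuning knob)
    are skipped, so ASR is only spent where there is enough speech to transcribe.
    """
    asr_queue = [s for s in segments if s.label == "speech" and s.duration >= min_speech_s]
    music = [s for s in segments if s.label == "music"]
    music_seconds = sum(s.duration for s in music)
    return asr_queue, music, music_seconds


segments = [
    Segment(0.0, 30.0, "music"),
    Segment(30.0, 95.0, "speech"),
    Segment(95.0, 96.5, "speech"),
    Segment(96.5, 180.0, "music"),
]
asr_queue, music, music_seconds = route_segments(segments)
```

Here one 65-second speech segment goes to ASR, the 1.5-second fragment is skipped, and 113.5 seconds of music are flagged for rights assessment.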
Practical advice for running a project
● For piloting purposes, content volumes can be kept low
○ Someone typically has to go through the results afterwards
● Involve the wider Archive team to help evaluate the use cases and to increase the sense of involvement
○ This leads to a better understanding of the technology among your staff
○ Expectation management: getting rid of fears and hype
● Distributed work is great for involving people, but needs careful management
○ The project team can support and help with e.g. evaluation criteria and setup, and facilitate discussions
● Reserve enough time for setup and result analysis
○ Setting up the systems and logistics takes time
○ Running the analyses is fast; making sense of the results is slow
○ Before you start:
   ● narrow down the ‘thing’ you are testing / evaluating
   ● decide on your methodology / approach
   ● choose suitable content for testing
○ Reporting results and conclusions is easier if the project setting supports this from the beginning
Final Thoughts
After deciding on your first use case, you can:
- Define success criteria for this case and measure them
- Build a roadmap to realise and enhance your case
- Find the right components to realise your case
- Decide on the type of human curation needed for your case
Some types of machine learning services can already be put to good use.
An iterative approach seems to work well, since everything is changing and you need to start your journey sooner rather than later.
Please share your ideas for smart types of human participation in ML-assisted work!
Contact information:
Lauri Saarikoski
Development Manager at Yle, the Finnish Broadcasting Company
lauri.saarikoski@yle.fi
Matt Eaton
Managing Director, EMEA at GrayMeta Inc
matt.eaton@graymeta.com
Thank you!
Metadata Machine, FIAT/IFTA 2019
SAARIKOSKI YLE metadata machine

Weitere ähnliche Inhalte

Ähnlich wie SAARIKOSKI YLE metadata machine

Selkala viljanen identifying the business case for automatic metadata in the ...
Selkala viljanen identifying the business case for automatic metadata in the ...Selkala viljanen identifying the business case for automatic metadata in the ...
Selkala viljanen identifying the business case for automatic metadata in the ...FIAT/IFTA
 
Stermedia - AI and software solutions for manufacturing/industry 4.0
Stermedia - AI and software solutions for manufacturing/industry 4.0Stermedia - AI and software solutions for manufacturing/industry 4.0
Stermedia - AI and software solutions for manufacturing/industry 4.0stermedia
 
leewayhertz.com-How to build a generative AI solution From prototyping to pro...
leewayhertz.com-How to build a generative AI solution From prototyping to pro...leewayhertz.com-How to build a generative AI solution From prototyping to pro...
leewayhertz.com-How to build a generative AI solution From prototyping to pro...KristiLBurns
 
Leveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data IntegrationLeveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data IntegrationSafe Software
 
The Need for IoT Ecosystem to become a Producer Nation
The Need for IoT Ecosystem to become a Producer NationThe Need for IoT Ecosystem to become a Producer Nation
The Need for IoT Ecosystem to become a Producer NationDr. Mazlan Abbas
 
IoT Analytics From Data to Decision Making - Trends & Challenges
IoT Analytics From Data to Decision Making- Trends & ChallengesIoT Analytics From Data to Decision Making- Trends & Challenges
IoT Analytics From Data to Decision Making - Trends & ChallengesDr. Mazlan Abbas
 
Session 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxSession 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxDataBench
 
OneBot: A Comprehensive Case Study on Enterprise Digital Assistants
OneBot: A Comprehensive Case Study on Enterprise Digital AssistantsOneBot: A Comprehensive Case Study on Enterprise Digital Assistants
OneBot: A Comprehensive Case Study on Enterprise Digital AssistantsSoham Dasgupta
 
WSO2 ITALIA SMART TALK #4 - Telefonica Use Case
WSO2 ITALIA SMART TALK #4 - Telefonica Use CaseWSO2 ITALIA SMART TALK #4 - Telefonica Use Case
WSO2 ITALIA SMART TALK #4 - Telefonica Use CaseProfesia Srl, Lynx Group
 
AI in the Enterprise
AI in the EnterpriseAI in the Enterprise
AI in the EnterpriseRon Bodkin
 
Overview about Emerging Technologies
Overview about Emerging TechnologiesOverview about Emerging Technologies
Overview about Emerging TechnologiesMurali Venkatesh
 
Breaking the barriers of Internet of Things (IoT)
Breaking the barriers of Internet of Things (IoT)Breaking the barriers of Internet of Things (IoT)
Breaking the barriers of Internet of Things (IoT)Dr. Mazlan Abbas
 
ICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTDr. Haxel Consult
 
Bitrock manufacturing
Bitrock manufacturing Bitrock manufacturing
Bitrock manufacturing cosma_r
 
[MindsLab] company intro 201711
[MindsLab] company intro 201711[MindsLab] company intro 201711
[MindsLab] company intro 201711Taejoon Yoo
 
Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...
Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...
Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...Giulio Coraggio
 
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...Safe Software
 
Coinbox Generative AI Slide.pdf
Coinbox Generative AI Slide.pdfCoinbox Generative AI Slide.pdf
Coinbox Generative AI Slide.pdfDavidAlozie4
 

Ähnlich wie SAARIKOSKI YLE metadata machine (20)

Selkala viljanen identifying the business case for automatic metadata in the ...
Selkala viljanen identifying the business case for automatic metadata in the ...Selkala viljanen identifying the business case for automatic metadata in the ...
Selkala viljanen identifying the business case for automatic metadata in the ...
 
Stermedia - AI and software solutions for manufacturing/industry 4.0
Stermedia - AI and software solutions for manufacturing/industry 4.0Stermedia - AI and software solutions for manufacturing/industry 4.0
Stermedia - AI and software solutions for manufacturing/industry 4.0
 
leewayhertz.com-How to build a generative AI solution From prototyping to pro...
leewayhertz.com-How to build a generative AI solution From prototyping to pro...leewayhertz.com-How to build a generative AI solution From prototyping to pro...
leewayhertz.com-How to build a generative AI solution From prototyping to pro...
 
Leveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data IntegrationLeveraging Generative AI: Exploring New Technology for Data Integration
Leveraging Generative AI: Exploring New Technology for Data Integration
 
The Need for IoT Ecosystem to become a Producer Nation
The Need for IoT Ecosystem to become a Producer NationThe Need for IoT Ecosystem to become a Producer Nation
The Need for IoT Ecosystem to become a Producer Nation
 
IoT Analytics From Data to Decision Making - Trends & Challenges
IoT Analytics From Data to Decision Making- Trends & ChallengesIoT Analytics From Data to Decision Making- Trends & Challenges
IoT Analytics From Data to Decision Making - Trends & Challenges
 
Accenture Robotics Platform
Accenture Robotics PlatformAccenture Robotics Platform
Accenture Robotics Platform
 
Session 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxSession 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench Toolbox
 
OneBot: A Comprehensive Case Study on Enterprise Digital Assistants
OneBot: A Comprehensive Case Study on Enterprise Digital AssistantsOneBot: A Comprehensive Case Study on Enterprise Digital Assistants
OneBot: A Comprehensive Case Study on Enterprise Digital Assistants
 
WSO2 ITALIA SMART TALK #4 - Telefonica Use Case
WSO2 ITALIA SMART TALK #4 - Telefonica Use CaseWSO2 ITALIA SMART TALK #4 - Telefonica Use Case
WSO2 ITALIA SMART TALK #4 - Telefonica Use Case
 
AI in the Enterprise
AI in the EnterpriseAI in the Enterprise
AI in the Enterprise
 
Overview about Emerging Technologies
Overview about Emerging TechnologiesOverview about Emerging Technologies
Overview about Emerging Technologies
 
Breaking the barriers of Internet of Things (IoT)
Breaking the barriers of Internet of Things (IoT)Breaking the barriers of Internet of Things (IoT)
Breaking the barriers of Internet of Things (IoT)
 
Jan Oeberg, ITAMOrg: New IT Asset Management Organization launched (TFT14 Sum...
Jan Oeberg, ITAMOrg: New IT Asset Management Organization launched (TFT14 Sum...Jan Oeberg, ITAMOrg: New IT Asset Management Organization launched (TFT14 Sum...
Jan Oeberg, ITAMOrg: New IT Asset Management Organization launched (TFT14 Sum...
 
ICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPT
 
Bitrock manufacturing
Bitrock manufacturing Bitrock manufacturing
Bitrock manufacturing
 
[MindsLab] company intro 201711
[MindsLab] company intro 201711[MindsLab] company intro 201711
[MindsLab] company intro 201711
 
Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...
Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...
Intelligenza artificiale: le sue potenzialità, la bozza di regolamento UE e r...
 
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...
 
Coinbox Generative AI Slide.pdf
Coinbox Generative AI Slide.pdfCoinbox Generative AI Slide.pdf
Coinbox Generative AI Slide.pdf
 

Mehr von FIAT/IFTA

2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline SurveyFIAT/IFTA
 
20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted ListFIAT/IFTA
 
WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020FIAT/IFTA
 
OOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVOOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVFIAT/IFTA
 
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)FIAT/IFTA
 
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉCULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉFIAT/IFTA
 
HULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiativesHULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiativesFIAT/IFTA
 
WILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC ScotlandWILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC ScotlandFIAT/IFTA
 
GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!FIAT/IFTA
 
LORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal depositLORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal depositFIAT/IFTA
 
BIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formatsBIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formatsFIAT/IFTA
 
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...FIAT/IFTA
 
BERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memoriesBERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memoriesFIAT/IFTA
 
AOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archiveAOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archiveFIAT/IFTA
 
HULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open upHULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open upFIAT/IFTA
 
PERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archivesPERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archivesFIAT/IFTA
 
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AIAICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AIFIAT/IFTA
 
VINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methodsVINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methodsFIAT/IFTA
 
LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?FIAT/IFTA
 
AZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archiveAZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archiveFIAT/IFTA
 

Mehr von FIAT/IFTA (20)

2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey
 
20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List
 
WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020
 
OOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVOOMEN MEZARIS ReTV
OOMEN MEZARIS ReTV
 
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
 
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉCULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
 
HULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiativesHULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiatives
 
WILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC ScotlandWILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC Scotland
 
GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!
 
LORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal depositLORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal deposit
 
BIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formatsBIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formats
 
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
 
BERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memoriesBERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memories
 
AOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archiveAOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archive
 
HULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open upHULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open up
 
PERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archivesPERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archives
 
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AIAICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
 
VINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methodsVINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methods
 
LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?
 
AZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archiveAZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archive
 


SAARIKOSKI YLE metadata machine

  • 1. Identifying use cases and evaluating ML technology. Case: the Metadata Machine Project at Yle. Lauri Saarikoski, Yle, the Finnish Broadcasting Company; Matt Eaton, GrayMeta
  • 3. Metadata Machine, FIAT/IFTA 2019 Industry Drivers: Content. Need for More Content; Metadata Volume Increasing; Personalised Content Expected; New Ways of Consuming Content; More Distribution Channels
  • 4. Industry Drivers: Technology. Generating & Using New Content Metadata Easier: Rapidly Maturing Machine Learning Services; Growing Adoption of Cloud Services; APIs Enabling Easier Integration
  • 5. Archives as part of a Media Company, case Yle. Media Production ● in-house ● production houses. Media Distribution & Publishing ● online, TV, radio ● 3rd-party platforms. Yle Archives [that’s us, 61 people]
  • 6. Data in and out of the archives (Yle Archives, Production, Distribution). Existing collections: 5.6 PB of media, 4.7 M metadata objects; annually 12k legacy objects get some metadata added. Audience analytics, publishing metadata. Annual hours: 47k audio, 19k linear video. Media, content descriptions. Annual hours: video 6.7k, audio 27k, + photos, music
  • 7. Help from Automated Metadata? 1. Increasing volumes of content being archived 2. Increasing demand for archive content, as distribution to multiple platforms requires content 3. Data entry is constrained by the number of people available to enter data manually 4. Only a small % of the archive is enriched with detailed metadata 5. New types of metadata needed, e.g. audio classification & facial detection 6. [Historical] content which lacks metadata entirely 7. Metadata quality compromised by time constraints. Why look into machine learning in the first place?
  • 8. Series of Pilots at Yle Archives, 2016-2019 ● Music identification ● Automated keyword extraction market review ● Video recognition technical pilots ● Image recognition market review & piloting ● Face recognition technical pilot ● Speech recognition technical pilot ● Speech recognition on demand MVP ● Speech recognition & automated keyword extraction MVP ● IPR related to data mining, memad.eu ● Study groups on AI ● Metadata Machine ● Facial recognition data as UX in production systems
  • 9. Pilot projects reduce complexity. After each pilot... ...we know more about what to do next and which questions still need looking into; ...our people have a more realistic view of what to expect and what to discuss; ...our people are more involved in designing their work and [maybe] find new tools more acceptable than before. (People, Technology, Organisation, Metadata Conventions)
  • 10. The Metadata Machine: Pre-study / PoC, Spring 2019
  • 11. Vision: The Metadata Machine. All audiovisual content is analysed as early as possible: content creation (raw material), procurement (ready-made content), publishing (published content) and archiving (what do we have?) all feed an automatic content analysis engine (speech recognition, image recognition, person identifier, fingerprinting, sound identifying, video frame color analysis, music identifier, text analysis, language identifier, ...), which populates a company-wide metadata database on all content items.
  • 12. Approach of this Pre-study Project. Our focus: business cases; a company-wide solution; multiple AME services; what’s available today; how, when and why to get into production; how to build the road to the future. Avoided: technological details; a single solution; single providers; what could be available; yet another PoC.
  • 13. Business Units Involved and Covered Today: Production Management; Photo Archive; Video / Audio Archives; Audience Insight; OnDemand (Yle Areena); News; Sport; Radio / Audio; Master Control Room & Media Logistics; Senior Leadership; Translation & Versioning Team; Architecture & Technology
  • 14. Project Timeline and Structure: 1. Buy a metadata machine (GrayMeta Curio) 2. Involve the whole company to identify potential use cases for automatic metadata 3. Each team tries to solve their use cases with the machine 4. Collect the results from the teams and identify the most prominent business cases 5. Decide on the next step, e.g. investment. Timeline, February to September: test round 1, test round 2, test round 3, analysis & next steps
  • 15. ©2019 GrayMeta. All rights reserved. PROPRIETARY & CONFIDENTIAL Curio connects all of your content, creating a single interface & API to find and use any file. Utilize automated metadata to help find and use your content.
  • 17. Metadata Machine by the Numbers: 637 hours of content; 2193 assets (37 audio files, 873 images, 1281 videos, 2 application files); 22 Yle testers; 5 insight groups; 23 use cases analysed; 7 different ML providers used; 45 different automatic metadata extractors; 100,000+ API calls
  • 18. Identifying Use Cases for Automated Metadata: 100+ ideas, 10+ proofs of concept, 1+ to production, 10+ to production (Metadata Machine pre-study project 2019)
  • 19. Approaching the Business Value: What kind of metadata does it require? Does it improve existing processes? Does it enable something completely new? Does it save money or time? Does it increase customer satisfaction? Is the technology solution available today? What are the direct and indirect costs involved? How to optimize the costs? How does it affect the surrounding production process / way of working? How to combine human work with automation? What are the success criteria / KPIs …? ...
  • 21. [After being refined during and after the project] Main Archive Use Cases: 1. Enriching existing metadata for archive content and adding new types of metadata • This can also be used to improve search functionality 2. New possibilities to browse and navigate through the Archive collection • New faceted navigation functionality • Ability to effectively browse through and filter the archive based on enriched content 3. Enhancing the metadata creation workflows by increasing automation levels • Automation allows for a focus on metadata quality • Automation allows the team to manage the increase in demand and in content being archived.
  • 22. What Did We Do? Aim: to test various automated metadata services so as to understand where the metadata creation workflows can be enhanced by increasing automation levels, and to generate metadata for collections that currently lack it. Services tested: ● OCR (optical character recognition) ● Speech to Text ● Tags & Descriptions ● Facial Detection ● Audio Classification ● Logo Detection. Methods used: ● variations on extractor settings & thresholds, to refine results and remove false positives ● varied content, from old black-and-white footage to newer content, to understand the value across the entire archive. Steps: 1. Selected test videos from the archive to test against various machine learning (ML) services 2. Configured relevant ML services to process photos 3. Refined and tested different extractors and confidence thresholds 4. Compared results and drew conclusions on the validity of the services for the Archive Team. Tools used: ● Curio’s User Interface ● Curio’s API
  • 23. Sample Facial and Logo Recognition. Selected test material from the photo archive to test against; identified & trained persons of interest within AI Studio; identified & trained logos of interest within AI Studio; configured relevant machine learning (ML) services to process images; analysed the impact on search & discovery through facial detection; analysed the impact on search & discovery through tags, descriptions & OCR; analysed the impact on search & discovery through logo detection; reviewed and reported on the successes & failures of ML for images
  • 24. Sample Speech to Text (ASR)
  • 25. Sample OCR
  • 26. Sample Audio Detection - Radio Content
  • 27. Sample Audio Detection - Video Content
  • 28. Findings and conclusions Learned during this and previous projects
  • 29. Conclusion 1: Identified Basic Analysis Bundles for Audio and Video. Drafted based on ● user needs ● technology readiness level ● availability of technology ● expected impact vs. costs
  • 30. Basic audio analysis bundle: speech/music segmentation + ASR + automated tagging ● later expand with speaker identification, spoken language identification etc. ● can also be applied to video. Basic video analysis bundle: audio analysis bundle + facial recognition + OCR ● generic video analysis and natural language descriptions are not yet mature enough
  • 31. 2: Importance of Focusing on Use Cases, Not Only Technology. Technology requirement: “We should use ML on archive content to get metadata.” Define the use case (how the business will use the technology): “We should get a better sense of what happens during a single archive program.” a. As a user I can navigate within a program based on what is discussed in it. b. As a user I can navigate within a program based on topics discussed in it. c. As a user I can see in the ASR result who is speaking and where there is music in the program.
  • 32. Without a use case, it is hard to know how well you perform. What to measure? ● For technical tests: technical criteria (word error rate (WER), recall, etc.) ● For a use case: added value (“What effect did this have on your work?”, “Did it help?”). Success criteria depend on the case you are solving, not on technical metrics.
  • 33. 3: Strategies for Integrating Machine Learning into Human Processes 1. Use the data as it comes from ML services → When people won’t be looking at the data & the quality is good enough? OR 2. Interact with the ML services and the data → If the ROI of human effort makes sense? OR 3. Look at the ML data but enter the metadata manually → When automation helps you grasp the context but end results need to be high quality?
  • 34. Human Tasks, System Roles. Humans: + curate the ML models, metadata mappings etc. + use the metadata and review its quality + review ML services and build on them. Systems: media storage; metadata storage (across the content supply chain, not just the archive); UX for testing and comparing ML services; process orchestration; multiple ML services; metadata unifier & mapper; search & browse UX; UX for managing ML services
  • 35. Future Roadmap (Short Term, Medium Term, Long Term): For Archive Content with No Metadata, deploy Audio & Visual ML Bundles; Introduce face recognition in a subset of the Archive; Further Investigate Audio Classification; Speaker Recognition Based on Audio; Increase Automation of metadata forms; Process Rest of Archive; Introduce Machine Learning for High Re-Use Archive Content. Perhaps in collaboration with Sports Production or Analytics. Identify speakers, dependent on new machine learning services. Use machine learning to generate content metadata on the portion of the archive where no metadata exists. Detect music vs. speech to speed up assessing rights clearance and focus ASR use. Augment existing manually created metadata forms for archived content with automated machine learning.
  • 36. Practical advice for running a project ● For piloting purposes the content volumes can be kept low ○ Someone typically has to go through the results afterwards ● Involve the wider Archive team to help evaluate the use cases and to increase the sense of involvement ○ This will lead to a better understanding of technology among your staff ○ Expectation management: getting rid of fears and hype ● Distributed work is great for involving people, but needs careful management ○ The project team can support and help with e.g. evaluation criteria, and set up & facilitate discussions ● Reserve enough time for setting up and result analysis ○ Setting up the systems and logistics takes its own time ○ Running the analyses is fast; making sense of the results is slow ○ Before you start: ● narrow down the ‘thing’ you are testing / evaluating ● decide on your methodology / approach ● choose suitable content for testing ○ Reporting results and conclusions is easier if the project setting supports this from the beginning
  • 37. Final Thoughts. After deciding on your first use case you can: - define success criteria for this case and measure it - build a roadmap to realise and enhance your case - find the right components to realise your case - decide on the type of human curation needed for your case. Some types of machine learning services can already be put to good use. An iterative approach seems to work well, since everything is changing and you need to start your journey sooner rather than later. Please share your ideas for smart types of human participation in ML-assisted work!
  • 38. Contact information: Lauri Saarikoski, Development Manager at Yle, the Finnish Broadcasting Company, lauri.saarikoski@yle.fi; Matt Eaton, Managing Director, EMEA at GrayMeta Inc, matt.eaton@graymeta.com. Thank you!
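Slide 22 describes tuning extractor settings and confidence thresholds to remove false positives. Slide 32 names WER and recall as criteria for technical tests. The following minimal Python sketch shows those two measurements. It is not the project's or GrayMeta Curio's actual evaluation code: the function names, sample transcripts, tags and confidence values are all hypothetical.

```python
"""Sketch: word error rate (WER) for ASR output, and precision/recall of
ML-generated tags at a confidence threshold. All sample data is hypothetical."""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein (edit) distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (one rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def precision_recall(detections, reviewed, threshold):
    """Keep tags whose confidence is >= threshold and compare them with a
    human-reviewed set of correct tags. Returns (precision, recall)."""
    kept = {tag for tag, confidence in detections if confidence >= threshold}
    true_positives = len(kept & reviewed)
    # Convention used here: an empty set scores 1.0 instead of dividing by zero.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(reviewed) if reviewed else 1.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical ASR output with one substituted word out of six.
    print(word_error_rate("the archive has five million objects",
                          "the archives has five million objects"))

    # Hypothetical tag detections (tag, confidence) and reviewed ground truth.
    detections = [("studio", 0.95), ("microphone", 0.80),
                  ("crowd", 0.55), ("dog", 0.40)]
    reviewed = {"studio", "microphone", "crowd"}
    for threshold in (0.3, 0.5, 0.9):
        print(threshold, precision_recall(detections, reviewed, threshold))
```

In this toy data, raising the threshold from 0.3 to 0.5 removes the false positive "dog" without losing any correct tags. At 0.9 two correct tags are also dropped, so precision stays at 1.0 but recall falls to 1/3. As slide 32 argues, whether such a trade-off is acceptable depends on the use case rather than on the metric itself.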

Editor's Notes

  1. An increasing number of distribution platforms is driving the demand for more content that can be personalised and found quickly by consumers, who are looking to consume that content in new ways (e.g. atomised news stories).
  2. Content machine learning services are rapidly maturing and available from a growing list of service providers. APIs are allowing metadata services to be integrated more easily with production systems. Growing adoption of cloud services provides processing power and rapid innovation.
  3. Number of hours / items has been steady for the last few years (out of the 170k objects marked as “not finished”). A larger % of new productions can be archived since the adoption of “mass archiving” in 2015 → fewer manual corrections and less moderation of the metadata, and more of a QC-type approach to the metadata coming in from productions. Manual vs. mass: 2014: 12k / 0; 2015: 12k / 5k; 2016: 12k / 5k; 2017: 7.5k / 9k; 2018: 7k / 11k
  4. Small pilots in different areas, with build-measure-learn iteration by design. Different teams from the archives were involved in pilots, so hands-on experience was gained by a large number of personnel. Active dialogue with researchers and tech providers.
  5. Mention client names - Fox, Sky News, Vice Media, Channel 4
  6. [Fully automated]: Use the data as it comes from ML services → When people won’t be looking at the data & the quality is good enough? OR [Human in the loop]: Interact with the ML services and the data → If the ROI of human effort makes sense? OR [Manual based on automation] Look at the ML data but enter the metadata manually → When automation helps you grasp the context but end results need to be high quality? Housekeeping: Curate the ML models and metadata mappings etc.
  7. Which ones do you need for your pilot? Which parts can aggregated services provide? What are you evaluating?
  8. Using machine-learning-generated metadata would: semi-automate content tagging, providing capacity to deal with higher volumes; create new types of metadata (e.g. facial recognition, audio classification) that will help content search & discovery; require a review of archive processes to combine human and machine learning data creation / curation.
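The manual vs. mass figures in note 3 can be turned into the share of each year's archived items that came in via mass archiving. This is a minimal Python sketch that uses only the figures quoted in the note, read as thousands per its "k" shorthand. The names `COUNTS` and `mass_share` are illustrative and do not come from the project.

```python
"""Share of archived items that came in via mass archiving, per year.
Figures are the manual vs. mass counts listed in note 3, in thousands (k)."""

# year: (manual, mass), in thousands, as listed in note 3
COUNTS = {
    2014: (12.0, 0.0),
    2015: (12.0, 5.0),
    2016: (12.0, 5.0),
    2017: (7.5, 9.0),
    2018: (7.0, 11.0),
}


def mass_share(manual: float, mass: float) -> float:
    """Fraction of the year's total that was mass archived (0.0 if total is 0)."""
    total = manual + mass
    return mass / total if total else 0.0


if __name__ == "__main__":
    for year, (manual, mass) in sorted(COUNTS.items()):
        print(f"{year}: total {manual + mass:.1f}k, "
              f"mass-archived share {mass_share(manual, mass):.0%}")
```

On these figures, the mass-archived share grows from 0% in 2014 to about 29% in 2015 and 2016, 55% in 2017 and 61% in 2018. Over the same period the manual count falls from 12k to 7k, which matches the shift toward QC-style review that the note describes.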