Automatic multi-modal metadata annotation based on trained cognitive solutions
Jakob Rosinski
Lead Architect Video & Broadcast
IBM GBS Europe
Member IBM Global Center of Competence Telco, Media & Entertainment
Member IBM Technical Expert Council Central (TEC CR)
Product Owner IBM AREMA
Jakob Rosinski is the Lead Architect for Video & Broadcast at IBM Global Business
Services Europe and a member of IBM's Global Center of Competence for Telecom,
Media & Entertainment. In this role he is also the product owner of IBM AREMA, a
workflow and essence management solution that is widely used by broadcasters
for essence archives and workflow automation.
Over the last decade Jakob was responsible for various projects in the media industry
at HBO, France24, ORF, SRF, RTL Mediengruppe and Deutsche Bundesliga/Sportcast. He
is a subject matter expert for multi-site & multi-tier essence management and
workflow automation for ingest, archive, production & distribution.
He is also well recognized in topics such as cognitive content enrichment and broadcast
integration.
Dipl.-Inf. (M.Sc.) Jakob Rosinski
2
Agenda
1. Introduction
2. Components
3. Training & Optimization
4. Analysis & Aggregation
5. Overall process & Integration
3
Introduction
"Rich metadata is the key to content discovery and monetization. It powers
advanced video search and recommendation engines..."
FKTG Magazin 03/2017, p. 84
5
Scene detection / segmentation
Deep video analysis
• People, object and context detection
• Classification of actors based on 24 emotions
• Classification of scenes based on 22,000 categories
Deep audio analysis
• Background
• Actor sentiment and tone
Analysis of scene composition
• Classification of light and color
Analysis of successful trailers
https://www.youtube.com/watch?v=gJEzuYynaiw
6
7
Automatic content enrichment of 40+ years of soccer content
• Annotation using a portfolio of cognitive solutions (IBM, FRH, Google, MS)
• Audio: speech-to-text / transcript
• Audio: speaker detection
• Audio: atmosphere (cheers, whistles, ...)
• Video: angle/camera & context detection
• Video: face & object detection
• Domain-trained services including a training portal
• Sharpening of results through domain knowledge, creation of timelines, and identification of concepts
Link with game and player data
• Optimize content analysis and search based on game and player statistics
• Guided search
Persona-based user experience
• Personalized discovery, suggestions, design & projects
Content enrichment for Bundesliga archive
8
Components
Magical Metadata
10
Visual recognition allows us to understand the contents of an image or video frame, answering the question: “What is in this image?” Returns class, class description, face detection, and text recognition.
• Enhanced and automated understanding of personalities and objects present in the frame
Speech to text / Audiomining lets us transcribe audio into text by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal.
• Activate decade-old material by running it through the STT API and then performing deeper analytics
Natural Language Understanding delivers several tools to distill text and dialogue into fundamental concepts of relevance, such as concepts, document-level emotions, sentiment, entities, keywords, language, etc.
• Deeper understanding of concepts, recognized entities, keywords, and relationships
Pattern Detection & Similarity Search indexes visual content based on patterns and makes a similarity search available.
• Search image and video data for untrained objects or contexts
Target: deeply enriched content, second by second
IBM Watson Visual Recognition
Visual Recognition understands the contents of images - visual concepts
tag the image, find human faces, approximate age and gender, and find
similar images in a collection. You can also train the service by creating
your own custom concepts. Use Visual Recognition to detect a dress
type in retail, identify spoiled fruit in inventory, and more.
• Image recognition
• Text recognition
• Face & person detection
• Pattern search / collection
• Trainable
12
13
IBM Watson Visual Recognition
IBM Watson Visual Recognition – a multi-layered trainable architecture for image analysis
• Need to learn effective semantic classifiers using a wide diversity of audio-visual features and models
• Need to design a rich space of semantic concepts that captures multiple facets of audio-visual content
[Diagram: multimedia data (labeled data with positive and negative examples, unlabeled data), features, models, semantics]
Features: color, edges, shape, texture, background, regions, tracks, motion, camera motion, moving objects, scene dynamics, energy, zero-crossings, frequencies, spectrum
Models: AdaBoost, k-means, regression, Bayes net, nearest neighbor, neural net, deep belief nets, GMM, SVMs, clustering, Markov model, decision tree, expectation maximization, factor graph, ensemble classifiers, active learning
Semantics: shot boundaries, scenes, locations, settings, places, objects, activities, actions, behaviors, events, people, faces, living, animals, cars, vehicles
14
Microsoft Cognitive Services
• Image recognition
This feature returns information about visual content found in an image.
Use tagging, descriptions and domain-specific models to identify
content and label it with confidence. Apply the adult/racy settings to
enable automated restriction of adult content. Identify image types and
color schemes in pictures.
• Text recognition
Optical Character Recognition (OCR) detects text in an image and
extracts the recognized words into a machine-readable character
stream. Analyze images to detect embedded text, generate character
streams and enable searching. Allow users to take photos of text
instead of copying to save time and effort.
• Face & person detection
The Celebrity Model is an example of domain-specific models. Our
new celebrity recognition model recognizes 200K celebrities from
business, politics, sports and entertainment around the world.
Domain-specific models are a continuously evolving feature within the
Computer Vision API.
• Emotion detection
15
Google Vision
Google Cloud Vision API enables developers to understand
the content of an image by encapsulating powerful machine
learning models in an easy to use REST API. It quickly
classifies images into thousands of categories (e.g., "sailboat",
"lion", "Eiffel Tower"), detects individual objects and faces
within images, and finds and reads printed words contained
within images. You can build metadata on your image catalog,
moderate offensive content, or enable new marketing scenarios
through image sentiment analysis. Analyze images uploaded
in the request or integrate with your image storage on Google
Cloud Storage.
• Image recognition
• Text recognition
• Face detection
• Emotion detection
• Text analysis (not German)
16
OpenCV
OpenCV is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and
Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and
with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing.
Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform.
Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of
downloads exceeding 14 million. Usage ranges from interactive art to mine inspection, stitching maps on the web, and
advanced robotics.
• Image recognition
• Face & person detection
• Trainable
17
Clarifai Image and Video Recognition API
Predict / Classify
• Predict analyzes your images and tells you what's inside of them.
• The API will return a list of concepts with corresponding probabilities of how likely it is these concepts are contained within the image.
Search
• The Search API allows you to send images (url or bytes) to the service and have them indexed by 'general' model concepts and their visual representations.
• Once indexed, you can search for images by concept or using reverse image search.
Train
• Clarifai provides many different models that 'see' the world differently. A model contains a group of concepts. A model will only see the concepts it contains.
18
Imagga Auto-Tagging
Imagga is an Image Recognition
Platform-as-a-Service providing
Image Tagging APIs for
developers & businesses to
build scalable, image intensive
cloud apps.
19
Fraunhofer IAIS Audiomining
• Segmentation
• Speaker and language detection
• Emotion detection
• Trainable
• Keyword extraction
Alternatives
• IBM Watson Speech2Text (see later)
• Microsoft Cognitive Services – Bing Speech
• Google Speech
21
22
{"segments": [
…
{
"segmentNumber": 1,
"startTime": 4480,
"duration": 3190,
"endTime": 7670,
"speaker": 1,
"gender": "female",
"transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."
},
...
{
"segmentNumber": 20,
"startTime": 238980,
"duration": 23620,
"endTime": 262600,
"speaker": 2,
"gender": "male",
"transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar
das weiß auch der britische Premierminister Cameron und er nutzt es um die EU Partner
unter Druck zu setzen entweder das Staatenbündnis ist zu Reformen bereit oder bei der
geplanten Volksabstimmung über die EU Mitgliedschaft droht ein Nein heute hatte EU
Ratspräsident Tosca ein Kompromisspapier vorgelegt dass die Briten besänftigen soll."
},
Fraunhofer IAIS Audiomining
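A minimal sketch of how segment output of this shape could be consumed, for example to jump to the point in a programme where a keyword is spoken. The sample below is a shortened, hypothetical copy of the two segments shown above; the assumption that times are in milliseconds is inferred from the values (4480 + 3190 = 7670) and is not stated on the slide. The helper names `to_timecode` and `find_segments` are illustrative only.

```python
import json

# Shortened, hypothetical sample in the shape shown on the slide.
# Assumption: startTime, duration and endTime are milliseconds.
SAMPLE = """
{"segments": [
  {"segmentNumber": 1, "startTime": 4480, "duration": 3190, "endTime": 7670,
   "speaker": 1, "gender": "female",
   "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."},
  {"segmentNumber": 20, "startTime": 238980, "duration": 23620, "endTime": 262600,
   "speaker": 2, "gender": "male",
   "transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar"}
]}
"""


def to_timecode(ms):
    """Format a millisecond offset as HH:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def find_segments(segments, keyword):
    """Return (start timecode, speaker, transcript) for segments mentioning keyword."""
    needle = keyword.casefold()
    return [
        (to_timecode(s["startTime"]), s["speaker"], s["transcript"])
        for s in segments
        if needle in s["transcript"].casefold()
    ]


segments = json.loads(SAMPLE)["segments"]
hits = find_segments(segments, "tagesschau")
for timecode, speaker, text in hits:
    print(timecode, f"speaker {speaker}:", text)
```

Keeping the speaker number next to each hit makes it possible to link the transcript with speaker detection results later on.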
IBM Watson Speech to Text
23
https://www-03.ibm.com/press/us/en/pressrelease/51790.wss
24
IBM Watson Natural Language Understanding (NLU)
Extraction of
• Sentiment
• Emotion
• Keywords
• Entities
• Categories
• Concepts
• Semantic Roles
26
Visual Atoms
FIND is a high-speed, high-accuracy, image visual search solution.
Our state-of-the-art visual search engine enables the matching of images
depicting the same objects or scenes based on visual similarities, without the
need for manual annotations or metadata.
If you are a provider of image editing or management solutions, the
FIND engine will equip your product with the necessary tools for the creation
of image databases which are searchable using images as queries. Your
end users will be able to create and maintain their own image databases and
efficiently organise, manage and search their image assets.
For providers of image hosting solutions, the FIND engine will allow the
creation of image databases which users can search using visual queries.
For developers of mobile apps, such as for e-commerce, tourism
or entertainment, the FIND engine will give your app cloud-based and/or
terminal based visual search functionality for retrieval of relevant images
and associated information.
With a streamlined API, the FIND engine is designed so that it can be
easily integrated in any third-party application or workflow.
Alternatives: IBM Watson VR Collections, Clarifai Search
28
Training & Optimization
...
Why is training necessary?
30
Visual Recognition - Training
31
Domain-specific model
32
Domain-specific model - Trainer
33
...
Optimization of keyframe extraction: avoid poor extraction, use adaptive extraction
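The slide does not say how adaptive extraction is done. One common approach, sketched here purely as an assumption, is to emit a keyframe whenever the frame content differs enough from the last keyframe, instead of sampling at a fixed interval. The per-frame histograms, the `threshold` value and the `max_gap` fallback are all hypothetical.

```python
def histogram_distance(a, b):
    """L1 distance between two normalized histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adaptive_keyframes(histograms, threshold=0.5, max_gap=250):
    """Pick frame indices whose content differs enough from the last keyframe.

    histograms: one normalized color histogram (list of floats) per frame.
    threshold:  assumed tuning value for "different enough".
    max_gap:    force a keyframe after this many frames, even in static scenes.
    """
    if not histograms:
        return []
    keyframes = [0]  # the first frame is always a keyframe
    for i in range(1, len(histograms)):
        last = keyframes[-1]
        changed = histogram_distance(histograms[i], histograms[last]) > threshold
        if changed or i - last >= max_gap:
            keyframes.append(i)
    return keyframes


# Toy input: three static "shots" described by two-bin histograms.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 3 + [[0.5, 0.5]] * 2
print(adaptive_keyframes(frames))  # [0, 4, 7]
```

With this scheme a static shot yields one keyframe and each visual change yields another, so fewer near-identical frames are sent to the recognition services.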
34
...
Analysis & Aggregation
Cognitive model for German Soccer League Archive
36
Metadata (technical, statistics, ticker, etc.)
Essences (audio, video, keyframes, etc.)
Analyses of different orders (audio mining, image recognition, face recognition, pattern recognition, etc.)
Timelines of different orders (atmosphere, context, perspective, persons, etc.)
Cognitive model for German Soccer League Archive
– multi-modal analyses
37
38
Cognitive model for German Soccer League Archive
– example for timeline of first order
Uses only results from analysis
39
Cognitive model for German Soccer League Archive
– example for timeline of second order
Uses results from analyses as well as other timelines
40
Cognitive model for German Soccer League Archive
– example for timeline of third order
Uses results from analyses as well as other timelines
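The distinction between timeline orders can be illustrated with a small sketch. It assumes that a first-order timeline is a filtered list of raw analysis events and that a higher-order timeline is derived by combining existing timelines, here by taking their overlap. The `Event` type, the labels and the confidence rule are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    start: float  # seconds
    end: float    # seconds
    label: str
    confidence: float


def first_order(analysis_results, label, min_conf=0.5):
    """First-order timeline: built directly from raw analysis results."""
    return [e for e in analysis_results
            if e.label == label and e.confidence >= min_conf]


def overlaps(a, b):
    """True if the two events share some time span."""
    return a.start < b.end and b.start < a.end


def second_order(timeline_a, timeline_b, label):
    """Higher-order timeline: derived from other timelines (here: their overlap)."""
    derived = []
    for a in timeline_a:
        for b in timeline_b:
            if overlaps(a, b):
                derived.append(Event(max(a.start, b.start), min(a.end, b.end),
                                     label, min(a.confidence, b.confidence)))
    return derived


# Hypothetical raw results from audio and video analysis.
audio = [Event(10, 18, "cheers", 0.9), Event(40, 42, "whistle", 0.7)]
video = [Event(12, 20, "closeup", 0.8), Event(30, 35, "closeup", 0.4)]

atmosphere = first_order(audio, "cheers")
camera = first_order(video, "closeup")
highlights = second_order(atmosphere, camera, "highlight_candidate")
print(highlights)
```

A third-order timeline would follow the same pattern, taking a second-order timeline as one of its inputs.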
41
Camera Timeline
Speed Timeline
Cognitive Aggregator for
Timelines
42
Detected events with confidence: Normal: 60 %, Spidercam: 80 %, SlowMo: 55 %, CloseUp: 83 %, Normal: 67 %, Goalline: 77 %, Normal: 83 %, Spidercam: 76 %, Normal: 87 %, Spidercam: 77 %
Reduce and sharpen from 20 analysis events to 4
Combine timelines
Combine and sharpen SlowMo
Combine timelines
Combine timelines and frames due to near similarity
+20 %
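The reduction of many analysis events to a few timeline entries can be sketched as a merge of neighbouring events with the same label. This is only one possible aggregation rule: the slides do not give the actual sharpening logic, so the `max_gap` tolerance and the duration-weighted mean confidence used here are assumptions, and the event times are invented.

```python
def merge_events(events, max_gap=1.0):
    """Merge consecutive events with the same label separated by at most max_gap seconds.

    events: list of (start, end, label, confidence) tuples.
    The merged confidence is the duration-weighted mean (an assumed choice).
    """
    merged = []
    for start, end, label, conf in sorted(events):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= max_gap:
            p_start, p_end, _, p_conf = merged[-1]
            p_dur, dur = p_end - p_start, end - start
            total = p_dur + dur
            new_conf = ((p_conf * p_dur + conf * dur) / total
                        if total else max(p_conf, conf))
            merged[-1] = (p_start, max(p_end, end), label, new_conf)
        else:
            merged.append((start, end, label, conf))
    return merged


# Hypothetical camera events; labels and confidences follow the slide, times do not.
raw = [
    (0.0, 2.0, "Normal", 0.60),
    (2.0, 4.0, "Normal", 0.67),
    (4.0, 6.0, "Spidercam", 0.80),
    (6.5, 8.0, "Spidercam", 0.76),
    (8.0, 9.0, "SlowMo", 0.55),
]
timeline = merge_events(raw)
for start, end, label, conf in timeline:
    print(f"{start:5.1f} to {end:5.1f}  {label:10s} {conf:.2f}")
```

Five raw events collapse into three timeline entries here; a production aggregator would presumably also use other timelines and frame similarity to raise confidence, as the slide indicates.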
Overall process & Integration
IBM AREMA & Watson at Hackdays/SRF
„Die Zukunft der Mediennutzung“
44
Involving now:
• Watson VR - ClassifyImage
• Watson VR - DetectFaces
• Watson VR - RecognizeText
• Watson Speech2Text
• Alchemy API
Used to find meaningful content from SRG's archives
45
IBM AREMA & Watson at Hackdays/SRF
„Die Zukunft der Mediennutzung“
Cognitive process with Trainer, Analysis Workflow and Aggregator
47
[Diagram: Media Ingestion, Cognitive Analysis Workflow, Cognitive Trainer, Cognitive Aggregator, Image Classifier Inbox, Image Classifier Repository, Taxonomy Database, Metadata Repository (MAM); steps numbered 1 to 6]
1. Configure taxonomy (add classifiers, categories, etc.)
2. Show and organize classifier images
3. Move good classifiers to the repository to optimize training
4. Use the classifier repository to train services and perform custom analysis
5. Move the current frame to the inbox when the confidence is sufficient
6. Use the taxonomy for rule creation
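Step 5 of the process above can be sketched as a confidence gate: an analyzed frame is offered to the classifier inbox only if its best prediction belongs to the configured taxonomy and is confident enough. The prediction format, the `min_confidence` value and the function name `route_frame` are hypothetical and not taken from IBM AREMA or Watson.

```python
def route_frame(frame_id, predictions, taxonomy, min_confidence=0.8):
    """Decide whether an analyzed frame should go to the classifier inbox.

    predictions: {class_name: confidence} from a custom classifier (assumed shape).
    taxonomy:    set of class names configured in step 1.
    Returns (class_name, frame_id) for the inbox, or None.
    """
    known = {c: p for c, p in predictions.items() if c in taxonomy}
    if not known:
        return None
    best = max(known, key=known.get)
    return (best, frame_id) if known[best] >= min_confidence else None


taxonomy = {"Spidercam", "CloseUp", "Goalline"}
inbox = []
for frame_id, preds in [
    ("f001", {"CloseUp": 0.83, "Spidercam": 0.10}),  # confident and in taxonomy
    ("f002", {"Goalline": 0.55}),                    # in taxonomy, too uncertain
    ("f003", {"Crowd": 0.95}),                       # confident, not in taxonomy
]:
    routed = route_frame(frame_id, preds, taxonomy)
    if routed:
        inbox.append(routed)

print(inbox)  # [('CloseUp', 'f001')]
```

Frames that land in the inbox would then be reviewed (step 2) before good examples are moved to the repository for the next training round (step 3).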
Future?
Upcoming:
Watson For Media,
announced in April 2017
at
First use cases available
at IBC in September 2017
49

Weitere ähnliche Inhalte

Ähnlich wie Automatic multi-modal metadata annotation based on trained cognitive solutions - Rosinski, Jakob

How can you get started with machine learning
How can you get started with machine learning How can you get started with machine learning
How can you get started with machine learning Omar Badawi
 
Using Cognitive Services
Using Cognitive ServicesUsing Cognitive Services
Using Cognitive ServicesEng Teong Cheah
 
Machine learning, WTF!?
Machine learning, WTF!? Machine learning, WTF!?
Machine learning, WTF!? Alê Borba
 
Intelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human IngenuityIntelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human IngenuityDavid J Rosenthal
 
Using Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a CylonUsing Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a CylonTodd Whitehead
 
Microsoft Azure beyond IaaS
Microsoft Azure  beyond IaaSMicrosoft Azure  beyond IaaS
Microsoft Azure beyond IaaSBipeen Sinha
 
Microsoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive ServicesMicrosoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive ServicesAI Leadership Institute
 
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...Codemotion
 
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis JugoO365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis JugoNCCOMMS
 
Microsoft Cognitive Services at a Glance
Microsoft Cognitive Services at a GlanceMicrosoft Cognitive Services at a Glance
Microsoft Cognitive Services at a GlanceMarvin Heng
 
Prior AI consulting use cases
Prior AI consulting use casesPrior AI consulting use cases
Prior AI consulting use casesHarendra Singh
 
20160813 102-59-kim youngwook
20160813 102-59-kim youngwook20160813 102-59-kim youngwook
20160813 102-59-kim youngwookitproman35
 
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш....NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...NETFest
 
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptxunleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptxUsama Wahab Khan Cloud, Data and AI
 
Rita Arrigo, Microsoft
Rita Arrigo, Microsoft Rita Arrigo, Microsoft
Rita Arrigo, Microsoft Hilary Ip
 
Inteligencia artificial para todos
Inteligencia artificial para todosInteligencia artificial para todos
Inteligencia artificial para todosJuan Nieto García
 

Ähnlich wie Automatic multi-modal metadata annotation based on trained cognitive solutions - Rosinski, Jakob (20)

How can you get started with machine learning
How can you get started with machine learning How can you get started with machine learning
How can you get started with machine learning
 
Using Cognitive Services
Using Cognitive ServicesUsing Cognitive Services
Using Cognitive Services
 
Machine learning, WTF!?
Machine learning, WTF!? Machine learning, WTF!?
Machine learning, WTF!?
 
Intelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human IngenuityIntelligent Apps - Amplifying Human Ingenuity
Intelligent Apps - Amplifying Human Ingenuity
 
Using Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a CylonUsing Azure, AI and IoT to find out if the person next to you is a Cylon
Using Azure, AI and IoT to find out if the person next to you is a Cylon
 
Microsoft Azure beyond IaaS
Microsoft Azure  beyond IaaSMicrosoft Azure  beyond IaaS
Microsoft Azure beyond IaaS
 
Microsoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive ServicesMicrosoft AI Overview: Cognitive Services
Microsoft AI Overview: Cognitive Services
 
Design Day Workshop
Design Day WorkshopDesign Day Workshop
Design Day Workshop
 
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
Gianni Rosa Gallina - Where and how can AI be used in a real-world multimedia...
 
Guru_poster
Guru_posterGuru_poster
Guru_poster
 
Azure beyond IaaS
Azure  beyond IaaSAzure  beyond IaaS
Azure beyond IaaS
 
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis JugoO365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
O365Con19 - Sharepoint with (Artificial) Intelligence - Adis Jugo
 
Microsoft Cognitive Services at a Glance
Microsoft Cognitive Services at a GlanceMicrosoft Cognitive Services at a Glance
Microsoft Cognitive Services at a Glance
 
Prior AI consulting use cases
Prior AI consulting use casesPrior AI consulting use cases
Prior AI consulting use cases
 
AI NOTES.docx
AI NOTES.docxAI NOTES.docx
AI NOTES.docx
 
20160813 102-59-kim youngwook
20160813 102-59-kim youngwook20160813 102-59-kim youngwook
20160813 102-59-kim youngwook
 
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш....NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
.NET Fest 2018. Олександр Краковецький. Microsoft AI: створюємо програмні ріш...
 
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptxunleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
 
Rita Arrigo, Microsoft
Rita Arrigo, Microsoft Rita Arrigo, Microsoft
Rita Arrigo, Microsoft
 
Inteligencia artificial para todos
Inteligencia artificial para todosInteligencia artificial para todos
Inteligencia artificial para todos
 

Mehr von FIAT/IFTA

2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline SurveyFIAT/IFTA
 
20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted ListFIAT/IFTA
 
WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020WARBURTON FIAT/IFTA Timeline Survey results 2020
WARBURTON FIAT/IFTA Timeline Survey results 2020FIAT/IFTA
 
OOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVOOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVFIAT/IFTA
 
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)FIAT/IFTA
 
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉCULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉFIAT/IFTA
 
HULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiativesHULSENBECK Value Use and Copyright Comission initiatives
HULSENBECK Value Use and Copyright Comission initiativesFIAT/IFTA
 
WILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC ScotlandWILSON Film digitisation at BBC Scotland
WILSON Film digitisation at BBC ScotlandFIAT/IFTA
 
GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!GOLODNOFF We need to make our past accessible!
GOLODNOFF We need to make our past accessible!FIAT/IFTA
 
LORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal depositLORENZ Building an integrated digital media archive and legal deposit
LORENZ Building an integrated digital media archive and legal depositFIAT/IFTA
 
BIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formatsBIRATUNGANYE Shock of formats
BIRATUNGANYE Shock of formatsFIAT/IFTA
 
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...FIAT/IFTA
 
BERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memoriesBERGER RIPPON BBC Music memories
BERGER RIPPON BBC Music memoriesFIAT/IFTA
 
AOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archiveAOIBHINN and CHOISTIN Rehash your archive
AOIBHINN and CHOISTIN Rehash your archiveFIAT/IFTA
 
HULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open upHULSENBECK BLOM A blast from the past open up
HULSENBECK BLOM A blast from the past open upFIAT/IFTA
 
PERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archivesPERVIZ Automated evolvable media console systems in digital archives
PERVIZ Automated evolvable media console systems in digital archivesFIAT/IFTA
 
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AIAICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AIFIAT/IFTA
 
VINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methodsVINSON Accuracy and cost assessment for archival video transcription methods
VINSON Accuracy and cost assessment for archival video transcription methodsFIAT/IFTA
 
LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?LYCKE Artificial intelligence, hype or hope?
LYCKE Artificial intelligence, hype or hope?FIAT/IFTA
 
AZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archiveAZIZ BABBUCCI Let's play with the archive
AZIZ BABBUCCI Let's play with the archiveFIAT/IFTA
 

Mehr von FIAT/IFTA (20)

2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey2021 FIAT/IFTA Timeline Survey
2021 FIAT/IFTA Timeline Survey
 
20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List20211021 FIAT/IFTA Most Wanted List
20211021 FIAT/IFTA Most Wanted List
 
WARBURTON FIAT/IFTA Timeline Survey results 2020
Automatic multi-modal metadata annotation based on trained cognitive solutions - Rosinski, Jakob

  • 1. Automatic multi-modal metadata annotation based on trained cognitive solutions Jakob Rosinski Lead Architect Video & Broadcast IBM GBS Europe
  • 2. Dipl.-Inf. (M.Sc.) Jakob Rosinski: Lead Architect Video & Broadcast, IBM GBS Europe; Member, IBM Global Center of Competence Telco, Media & Entertainment; Member, IBM Technical Expert Council Central (TEC CR); Product Owner, IBM AREMA. Jakob Rosinski is the Lead Architect for Video & Broadcast at IBM Global Business Services Europe and a member of IBM's Global Center of Competence for Telecom, Media & Entertainment. In this role he is also the product owner of IBM AREMA, a workflow and essence management solution that is widely used by broadcasters for essence archives and workflow automation. Over the last decade he was responsible for various media industry projects at HBO, France24, ORF, SRF, RTL Mediengruppe and Deutsche Bundesliga/Sportcast. He is a subject matter expert in multi-site and multi-tier essence management and in workflow automation for ingest, archive, production and distribution. He is also well recognized in topics such as cognitive content enrichment and broadcast integration.
  • 3. Agenda: 1. Introduction 2. Components 3. Training & Optimization 4. Analysis & Aggregation 5. Overall process & Integration
  • 4. Introduction: "Rich metadata is the key to content discovery and monetization. It powers advanced video search and recommendation engines..." (FKTG Magazin 03/2017, p. 84)
  • 6. Analysis of successful trailers. Scene detection / segmentation. Deep video analysis: people, object and context detection; classification of actors based on 24 emotions; classification of scenes based on 22,000 categories. Deep audio analysis: background; actor sentiment and tone. Analysis of scene composition: classification of light and color. https://www.youtube.com/watch?v=gJEzuYynaiw
  • 8. Content enrichment for Bundesliga archive: automatic content enrichment of 40+ years of soccer content. Annotation using a portfolio of cognitive solutions (IBM, FRH, Google, MS): Audio: speech-to-text / transcript; Audio: speaker detection; Audio: atmosphere (cheers, whistles, ...); Video: angle/camera and context detection; Video: face and object detection. Domain-trained services, including a training portal: results are sharpened using domain knowledge, the creation of timelines and the identification of concepts. Link with game and player data: optimize content analysis and search based on game and player statistics. Guided search and persona-based user experience: personalized discovery, suggestions, design and projects.
  • 10. Magical Metadata. Target: deeply enriched content, second by second. Visual recognition lets us understand the contents of an image or video frame, answering the question "What is in this image?". It returns class, class description, face detection and text recognition, giving an enhanced and automated understanding of the personalities and objects present in the frame. Speech to text / audiomining lets us transcribe audio into text by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. Decade-old material can be activated by running it through the STT API and then performing deeper analytics. Natural Language Understanding delivers several tools to distill text and dialogue into fundamental concepts of relevance, such as concepts, document-level emotions, sentiment, entities, keywords and language, for a deeper understanding of concepts, recognized entities, keywords and relationships. Pattern detection & similarity search indexes visual content based on patterns and makes a similarity search available, so that image and video data can be searched for objects or contexts that have not been trained.
  • 12. IBM Watson Visual Recognition: Visual Recognition understands the contents of images: visual concepts tag the image, find human faces, approximate age and gender, and find similar images in a collection. You can also train the service by creating your own custom concepts. Use Visual Recognition to detect a dress type in retail, identify spoiled fruit in inventory, and more. Capabilities: image recognition; text recognition; face and person detection; pattern search / collection; trainable.
  • 13. IBM Watson Visual Recognition
  • 14. IBM Watson Visual Recognition: a multi-layered trainable architecture for image analysis. Need to learn effective semantic classifiers using a wide diversity of audio-visual features and models. Need to design a rich space of semantic concepts that captures multiple facets of audio-visual content. Diagram labels: Multimedia data (labeled data with positive and negative examples, unlabeled data). Features: color, texture, shape, edges, regions, background, motion, camera motion, moving objects, tracks, scene dynamics, shot boundaries, frequencies, spectrum, energy, zero-crossings. Models: AdaBoost, k-means, clustering, regression, Bayes net, nearest neighbor, neural net, deep belief nets, GMM, SVMs, Markov model, decision tree, expectation maximization, factor graph, ensemble classifiers, active learning. Semantics: scenes, locations, settings, places, objects, people, faces, living, animals, cars, vehicles, activities, actions, behaviors, events.
  • 15. Microsoft Cognitive Services. Image recognition: this feature returns information about visual content found in an image. Use tagging, descriptions and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and color schemes in pictures. Text recognition: Optical Character Recognition (OCR) detects text in an image and extracts the recognized words into a machine-readable character stream. Analyze images to detect embedded text, generate character streams and enable searching. Allow users to take photos of text instead of copying to save time and effort. Face and person detection: the Celebrity Model is an example of a domain-specific model. The celebrity recognition model recognizes 200K celebrities from business, politics, sports and entertainment around the world. Domain-specific models are a continuously evolving feature within the Computer Vision API. Emotion detection.
  • 16. Google Vision: Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and finds and reads printed words contained within images. You can build metadata on your image catalog, moderate offensive content, or enable new marketing scenarios through image sentiment analysis. Analyze images uploaded in the request or integrate with your image storage on Google Cloud Storage. Capabilities: image recognition; text recognition; face detection; emotion detection; text analysis (not German).
  • 17. OpenCV: OpenCV is released under a BSD license and hence it is free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform. Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. Usage ranges from interactive art to mine inspection, stitching maps on the web and advanced robotics. Capabilities: image recognition; face and person detection; trainable.
  • 18. Clarifai Image and Video Recognition API. Predict / Classify: Predict analyzes your images and tells you what is inside them. The API returns a list of concepts with corresponding probabilities of how likely it is that these concepts are contained within the image. Search: the Search API allows you to send images (URL or bytes) to the service and have them indexed by 'general' model concepts and their visual representations. Once indexed, you can search for images by concept or by using reverse image search. Train: Clarifai provides many different models that 'see' the world differently. A model contains a group of concepts. A model will only see the concepts it contains.
  • 19. Imagga Auto-Tagging: Imagga is an Image Recognition Platform-as-a-Service providing Image Tagging APIs for developers and businesses to build scalable, image-intensive cloud apps.
  • 21. Fraunhofer IAIS Audiomining: segmentation; speaker and language detection; emotion detection; trainable; keyword extraction. Alternatives: IBM Watson Speech2Text (see later); Microsoft Cognitive Services, Bing Speech; Google Speech.
  • 22. Fraunhofer IAIS Audiomining, example output:
    {"segments": [
      …
      { "segmentNumber": 1, "startTime": 4480, "duration": 3190, "endTime": 7670, "speaker": 1, "gender": "female", "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau." },
      ...
      { "segmentNumber": 20, "startTime": 238980, "duration": 23620, "endTime": 262600, "speaker": 2, "gender": "male", "transcript": "Großbritannien raus aus der Europäischen Union für viele unvorstellbar das weiß auch der britische Premierminister Cameron und er nutzt es um die EU Partner unter Druck zu setzen entweder das Staatenbündnis ist zu Reformen bereit oder bei der geplanten Volksabstimmung über die EU Mitgliedschaft droht ein Nein heute hatte EU Ratspräsident Tosca ein Kompromisspapier vorgelegt dass die Briten besänftigen soll." },
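The segment structure above lends itself to time-coded search. The following is a minimal sketch, not the Fraunhofer API: it only reuses the field names visible in the example output, assumes that `startTime`, `duration` and `endTime` are milliseconds (the values are consistent with that, but the slide does not say so), and uses the hypothetical helper names `to_timecode` and `find_segments`. The second transcript is shortened for brevity.

```python
import json

# Sample shaped like the example output; the second transcript is shortened.
SAMPLE = """
{"segments": [
  {"segmentNumber": 1, "startTime": 4480, "duration": 3190, "endTime": 7670,
   "speaker": 1, "gender": "female",
   "transcript": "Hier ist das erste deutsche Fernsehen mit der Tagesschau."},
  {"segmentNumber": 20, "startTime": 238980, "duration": 23620, "endTime": 262600,
   "speaker": 2, "gender": "male",
   "transcript": "Großbritannien raus aus der Europäischen Union"}
]}
"""


def to_timecode(ms):
    """Format a millisecond offset as HH:MM:SS.mmm (assumes times are in ms)."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def find_segments(segments, keyword):
    """Return (start, end, speaker) for segments whose transcript contains keyword."""
    keyword = keyword.lower()
    return [
        (to_timecode(s["startTime"]), to_timecode(s["endTime"]), s["speaker"])
        for s in segments
        if keyword in s["transcript"].lower()
    ]


segments = json.loads(SAMPLE)["segments"]
hits = find_segments(segments, "tagesschau")
print(hits)
```

A lookup like this is what turns a transcript into searchable, time-coded metadata: each hit points directly at the position in the essence where the keyword was spoken.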
  • 23. IBM Watson Speech to Text
  • 26. IBM Watson Natural Language Understanding (NLU): extraction of sentiment, emotion, keywords, entities, categories, concepts and semantic roles.
  • 28. Visual Atoms: FIND is a high-speed, high-accuracy visual image search solution. Our state-of-the-art visual search engine enables the matching of images depicting the same objects or scenes based on visual similarities, without the need for manual annotations or metadata. If you are a provider of image editing or management solutions, the FIND engine will equip your product with the necessary tools for the creation of image databases which are searchable using images as queries. Your end users will be able to create and maintain their own image databases and efficiently organise, manage and search their image assets. For providers of image hosting solutions, the FIND engine will allow the creation of image databases which users can search using visual queries. For developers of mobile apps, such as for e-commerce, tourism or entertainment, the FIND engine will give your app cloud-based and/or terminal-based visual search functionality for retrieval of relevant images and associated information. With a streamlined API, the FIND engine is designed so that it can be easily integrated in any third-party application or workflow. Alternatives: IBM Watson VR Collections, Clarifai Search.
  • 30. Why is training necessary?
  • 31. Visual Recognition: Training
  • 33. Domain-specific model: Trainer
  • 34. Optimization of keyframe extraction: poor extraction results can be avoided by using adaptive extraction.
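The slide does not say how adaptive extraction is implemented. One common approach, sketched here purely as an illustration, is to pick a new keyframe only when the picture has changed enough since the last selected keyframe, instead of sampling at a fixed interval. The frame representation (flat lists of grayscale values), the function names and the threshold are all assumptions made for this example.

```python
def frame_difference(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def adaptive_keyframes(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last chosen keyframe."""
    if not frames:
        return []
    keyframes = [0]  # always keep the first frame
    for index in range(1, len(frames)):
        last_keyframe = frames[keyframes[-1]]
        if frame_difference(last_keyframe, frames[index]) > threshold:
            keyframes.append(index)
    return keyframes


# Five tiny hypothetical frames: two near-identical dark frames,
# two near-identical bright frames, then a mid-grey frame.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [198, 212, 191, 204],
    [50, 60, 55, 52],
]
selected = adaptive_keyframes(frames)
print(selected)
```

Compared with fixed-interval sampling, this keeps one keyframe per visually distinct shot, which avoids sending near-duplicate frames to the recognition services.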
  • 36. Cognitive model for German Soccer League Archive: metadata (technical, statistics, ticker, etc.); essences (audio, video, keyframes, etc.); analyses of different orders (audiomining, image recognition, face recognition, pattern recognition, etc.); timelines of different orders (atmosphere, context, perspective, persons, etc.).
  • 37. Cognitive model for German Soccer League Archive: multi-modal analyses
  • 38. Cognitive model for German Soccer League Archive: example of a first-order timeline. Uses only the results from analysis.
  • 39. Cognitive model for German Soccer League Archive: example of a second-order timeline. Uses results from analyses as well as other timelines.
  • 40. Cognitive model for German Soccer League Archive: example of a third-order timeline. Uses results from analyses as well as other timelines.
  • 42. Cognitive Aggregator for Timelines. Diagram labels: Camera Timeline, Speed Timeline. Event confidences shown: Normal: 60 %, Spidercam: 80 %, SlowMo: 55 %, CloseUp: 83 %, Normal: 67 %, Goalline: 77 %, Normal: 83 %, Spidercam: 76 %, Normal: 87 %, Spidercam: 77 %. Steps: combine timelines; combine and sharpen SlowMo; combine timelines and frames due to near similarity (+20 %). Result: reduce and sharpen from 20 analysis events to 4.
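The aggregation idea can be illustrated with a small sketch. This is not the IBM AREMA aggregator; the `Event` type, the merge rule (consecutive events with the same label collapse into one) and the boost rule (an additive +0.20, capped at 1.0, when a second timeline overlaps) are assumptions chosen to mirror the "combine" and "+20 %" labels on the slide.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float       # seconds
    end: float         # seconds
    label: str
    confidence: float  # 0.0 .. 1.0


def merge_adjacent(events, max_gap=1.0):
    """Merge consecutive events with the same label into one longer event."""
    merged = []
    for event in sorted(events, key=lambda e: e.start):
        last = merged[-1] if merged else None
        if last and last.label == event.label and event.start - last.end <= max_gap:
            last.end = max(last.end, event.end)
            last.confidence = max(last.confidence, event.confidence)
        else:
            # Copy so the input events are never modified.
            merged.append(Event(event.start, event.end, event.label, event.confidence))
    return merged


def boost_if_supported(events, supporting, boost=0.20):
    """Raise the confidence of events that overlap an event of a second timeline."""
    result = []
    for event in events:
        overlaps = any(s.start < event.end and event.start < s.end for s in supporting)
        confidence = min(1.0, event.confidence + boost) if overlaps else event.confidence
        result.append(Event(event.start, event.end, event.label, confidence))
    return result


camera = [
    Event(0, 2, "Normal", 0.60),
    Event(2, 4, "Normal", 0.67),
    Event(4, 6, "Spidercam", 0.80),
    Event(6, 8, "Spidercam", 0.76),
]
speed = [Event(5, 7, "SlowMo", 0.55)]

merged = merge_adjacent(camera)
boosted = boost_if_supported(merged, speed)
for event in boosted:
    print(event)
```

Here four camera events are reduced to two, and only the event that is corroborated by the second timeline has its confidence raised. Which timelines may support each other is a domain decision that would come from the taxonomy and rules, not from this sketch.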
  • 43. Overall process & Integration
  • 44. IBM AREMA & Watson at Hackdays/SRF "Die Zukunft der Mediennutzung" ("The Future of Media Usage")
  • 45. IBM AREMA & Watson at Hackdays/SRF "Die Zukunft der Mediennutzung". Involving now: Watson VR - ClassifyImage; Watson VR - DetectFaces; Watson VR - RecognizeText; Watson Speech2Text; Alchemy API. Used to find meaningful content from SRG's archives.
  • 46. IBM AREMA & Watson at Hackdays/SRF "Die Zukunft der Mediennutzung"
  • 47. Cognitive Process with Trainer, Analysis Workflow and Aggregator. Components: Cognitive Analysis Workflow, Cognitive Trainer, Cognitive Aggregator, Image Classifier Inbox, Taxonomy Database, Image Classifier Repository, Media Ingestion, Metadata Repository (MAM). Steps: 1. Configure the taxonomy (add classifiers, categories, etc.) 2. Show and organize classifier images 3. Move good classifiers to the repository to optimize training 4. Use the classifier repository to train services and perform custom analysis 5. Move the actual frame to the inbox when the confidence is OK 6. Use the taxonomy for rule creation
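Step 5 of this process (moving a frame to the classifier inbox when its confidence is good enough) can be sketched as a simple routing rule. The tuple layout, the frame identifiers, the function name `route_to_inbox` and the threshold of 0.8 are hypothetical; the slide only states that frames are moved "when confidence ok".

```python
def route_to_inbox(results, threshold=0.8):
    """Group frame ids by classifier for results at or above the threshold."""
    inbox = {}
    for frame_id, classifier, confidence in results:
        if confidence >= threshold:
            inbox.setdefault(classifier, []).append(frame_id)
    return inbox


# Hypothetical analysis results: (frame id, classifier name, confidence).
results = [
    ("frame_0012", "Spidercam", 0.83),
    ("frame_0013", "Spidercam", 0.55),
    ("frame_0040", "Goalline", 0.91),
]
inbox = route_to_inbox(results)
print(inbox)
```

Frames collected this way become candidate training images: in the process above, they would be reviewed and organized in the trainer (step 2) before the good ones move on to the classifier repository (step 3).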
  • 48. Future? Upcoming: Watson For Media, announced in April 2017 at First use cases available at IBC in September 2017