The document presents ACAV, a project that aims to make videos on the Web more accessible through collaborative annotations. The consortium includes the video-sharing company Dailymotion and research groups working on multimedia, the semantic web, and disabilities. The goals are to increase the number of accessible videos through both automatic and manual annotation, and to render annotations in multiple formats depending on users' needs. Preliminary studies explored requirements such as different annotation granularities and outputs, and the use of auditory icons to convey a video's editing rhythm to blind users. The proposed ACAV system will include annotation schemas, a social network for annotations, integrated speech technologies, and authoring/rendering tools.
Towards Collaborative Annotation for Video Accessibility
1. Towards Collaborative Annotation for Video Accessibility
Pierre-Antoine Champin, Benoît Encelle, Magali O. Beldame, Yannick Prié, Nick Evans and Raphaël Troncy <raphael.troncy@eurecom.fr>
2. The consortium
Dailymotion (Paris, FR): video sharing website
Promotes HTML5 using the video tag, http://openvideo.dailymotion.com/
LIRIS (Lyon, FR): CS research group
Silex team: expertise in semantic web, annotation models, video annotation and HCI for disabled people
EURECOM (Sophia Antipolis, FR): research center in communications systems
Multimedia team: expertise in multimedia analysis (speaker diarization/recognition, speech recognition) and semantic web
INS HEA + school (Lyon, FR)
Experience with physical disabilities: blindness, visual impairment, deafness and hearing loss
Blind and deaf high-school students
26/04/2010 - Towards Collaborative Annotations for Video Accessibility - W4A 2010, Raleigh, USA -2
3. Goals and Motivations
What is required to make video accessible on the Web?
How to increase the number of accessible videos?
Technologies:
Annotating: automatic (speech transcription) and manual (social
collaborative annotation tool)
Addressing: pointing to, retrieving, transmitting only parts of media
Rendering: video visualization for the impaired, Braille output
Expected benefits for:
disabled people, getting better access to videos
video providers, reaching a wider audience
the Web in general, through semantic annotations
4. Accessibility Features for Visually Impaired and Blind People
Annotation timeline (example):
Man's actions: puts on his shoes; walks in the street
Son's actions: looks at his mother
Characters: the mother, her son; the son, the man; the man and his friend
Scenery: in the shop; in the street
Multimodal presentation of annotations, depending on video context and user preferences
Output channels: audio track, auditory icons, audio description, Braille
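The mapping from a user profile to the output channels listed above can be sketched as follows. This is an illustrative sketch, not ACAV's actual logic; the profile keys and channel names are assumptions taken from the slide's examples.

```python
# Hypothetical sketch: choose annotation output channels from a user
# profile, following the slide's examples (audio track, auditory icons,
# audio description, Braille). All names are illustrative assumptions.

def select_modalities(profile):
    """Map a user profile to a list of output channels."""
    outputs = ["audio track"]  # the original soundtrack is always kept
    if profile.get("blind"):
        outputs += ["auditory icons", "audio description"]
        if profile.get("braille_reader"):
            outputs.append("braille")
    return outputs

print(select_modalities({"blind": True, "braille_reader": True}))
```

The same dispatch pattern would apply to the deaf-user channels on the next slide (subtitles, surtitles), with different profile keys.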
5. Accessibility Features for Deaf People
Annotation timeline (example):
Mother's dialogues: How are you?
Son's dialogues: Hi mom; Fine and you?
Sound: car horn
Presentation of annotations, depending on video context and user preferences
Output channels: video track, subtitles, surtitles
6. Producing Video Annotations
Automatic annotations:
Speaker diarization: who spoke and when?
Speech recognition: transcription
Example output:
Mother: How are you ?
Son: Ho mom Fine (recognition errors)
Social annotations:
Annotation corrections and enhancement
Audio description (for visually impaired)
Example output (after correction):
Mother: How are you ?
Son: Hi mom Fine and you ?
Sound: car horn
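The pipeline on this slide, automatic transcription followed by social correction, can be sketched with a minimal timed-annotation structure. The field names are assumptions for illustration, not ACAV's actual annotation schema.

```python
# Illustrative sketch: speech recognition yields timed, speaker-attributed
# annotations; users then correct them collaboratively. Field names and
# timestamps are assumptions, not ACAV's schema.

ASR_OUTPUT = [
    {"speaker": "Mother", "begin": 0.0, "end": 1.2, "text": "How are you ?"},
    {"speaker": "Son",    "begin": 1.3, "end": 2.4, "text": "Ho mom Fine"},  # ASR error
]

def apply_correction(annotations, index, new_text):
    """Replace one annotation's text, as a social correction would."""
    corrected = [dict(a) for a in annotations]  # keep the automatic output intact
    corrected[index]["text"] = new_text
    return corrected

fixed = apply_correction(ASR_OUTPUT, 1, "Hi mom Fine and you ?")
print(fixed[1]["text"])
```

Keeping the automatic output immutable and storing corrections separately is one plausible design for reconciling many contributors' edits.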
7. Braille Rendering
The Advene prototype emulation views:
Enriched media player
Timeline with typed annotations
8. Preliminary study (1/2)
Semi-structured interviews with blind users (n=2):
Participants' habits when watching programs with audio description
The audio description process
Multimodal presentations of descriptions
Requirements:
R1: generate additional descriptions and provide unobtrusive access to them (tactile access for blind Braille readers)
R2: provide descriptions at various levels of granularity and verbosity
R3: use the system's multimodal output to provide two or more descriptions (e.g. speech synthesis and Braille display)
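Requirement R2 above can be sketched as storing each description at several verbosity levels and picking the richest one a user accepts. The levels and example texts are illustrative assumptions, not study data.

```python
# Sketch for requirement R2: descriptions stored at several verbosity
# levels; pick the most detailed one within the user's setting.
# Levels and texts are illustrative assumptions.

DESCRIPTIONS = {
    1: "A man in a street.",
    2: "A man puts on his shoes and walks in the street.",
    3: "A man hurriedly puts on his shoes and walks out into the street.",
}

def pick_description(descriptions, max_verbosity):
    """Return the richest description not exceeding the user's verbosity setting."""
    usable = [level for level in descriptions if level <= max_verbosity]
    return descriptions[max(usable)] if usable else None

print(pick_description(DESCRIPTIONS, 2))
```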
9. Preliminary study (2/2)
Goal: see whether auditory icons can convey the rhythm of a movie's editing to blind users
e.g. the sound of a locomotive arriving from the right to convey a tracking shot (travelling) from right to left
Experiment and questionnaires (n=16+9):
Viewing 5 minutes of Ratatouille through headsets, http://www.imdb.com/title/tt0382932/
Results:
Rhythm and movie dynamics were better perceived
Auditory icons are useful, but their number must be limited (5 max) and they must be clearly distinct from the movie's main soundtrack
Editing cues: scene changes, camera movements, flashbacks (e.g. NCIS)
Audio zoom (e.g. Survivor)
10. ACAV Architecture
Benchmarking of speech recognition engines: Sphinx, HTK, Julius
11. Media Fragments URI
Provide URI-based mechanisms for uniquely identifying fragments of media objects on the Web, such as video, audio, and images.
Photo credit: Robert Freund
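In the W3C Media Fragments URI syntax, a temporal fragment such as `http://example.com/video.ogv#t=10,20` addresses seconds 10 to 20 of a video. Below is a minimal sketch of a parser for the common `t=start,end` case with plain NPT seconds; the full specification also covers clock times, other time formats, and spatial (`xywh=`) and track fragments, which this sketch does not handle.

```python
# Minimal sketch: extract (start, end) seconds from a Media Fragments
# temporal fragment of the form #t=start,end (NPT seconds only).
from urllib.parse import urlparse

def parse_temporal_fragment(uri):
    """Return (start, end) in seconds for a #t= fragment, else None.

    A missing start defaults to 0.0; a missing end is returned as None.
    """
    fragment = urlparse(uri).fragment
    for part in fragment.split("&"):
        if part.startswith("t="):
            start, _, end = part[2:].partition(",")
            return (float(start) if start else 0.0,
                    float(end) if end else None)
    return None

print(parse_temporal_fragment("http://example.com/video.ogv#t=10,20"))  # (10.0, 20.0)
```

Such fragment addressing is what lets an annotation point at, retrieve, or transmit only the relevant part of a media object, as motivated on the "Goals" slide.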
13. Conclusion
ACAV will bring:
Dedicated annotation schemas for video accessibility
Social network model for video annotations
Web integration of state-of-the-art speech technologies
GUI models for authoring and rendering video
annotations
Media Fragments reference implementation
Open source Braille plugin for the most widely used Web browsers
http://www.acavideo.fr/