Harm Belt & Kees Janse - Lifelike Communication - front-end audio and video technologies
1. Lifelike Communication: Front-end audio and video technologies. Harm Belt and Kees Janse, Philips Research, Eindhoven, The Netherlands. iMinds, Ghent, Belgium, 16 December 2010.
2. Philips defined: we are… “…a global company of leading businesses creating value with meaningful innovations that improve people’s health and well-being.” Healthcare, Lighting, Consumer lifestyle.
4. A basic human need: social support, belonging, love, friendship, intimacy, connection, sharing, being near friends and family, feeling secure, …
8. Lifelike communication - what is important? Speech! “Without video you talk, without audio you walk.” Video? Definitely! Non-verbal communication. But the quality must be high: intelligibility, clarity (fatigue), eye contact. “Lifelike” implies a large screen, hence a distance between people and sensors.
9. Lifelike communication: important applications. Family and friends connect. Remote health care: doctor at hospital, patient at home; family member can join; remote patient monitoring; doctor and remote colleague during a medical procedure.
10. Telepresence systems. High-quality audio and video. Multiple users. However: very expensive; little freedom to move; limited applicability; the room is fully conditioned from an acoustic and illumination point of view.
11. PC video phone clients. Free. Great for a single user. However: only a small distance to the sensors is allowed. Audio and video quality in general is not sufficient.
12. Audio Video Enhancements for Communication. [Block diagram: microphone(s), audio enhancement, speaker(s); camera(s), video enhancement, display(s); scene analysis; audio/video (de-)coding; transmit / receive]
13. Communication terminal – spatial sound & video. Lifelike communication. Spatial audio: no fatigue during simultaneous conversations. Spatial video: eye contact. Communication dynamics like in real life. [Diagram: participants a, b, c, d]
17. Speech clarity. [Figure: h[n] and c[n] plotted against n. Annotations: this jump in c[n] determines the speech clarity; slope: reverberation time (T60)]
18. Speech clarity. The clarity index is defined by the ratio between direct and diffuse sound. A clarity index of at least 7 dB is needed to avoid listener fatigue. At 4 meters distance in a reverberant room (T60 = 800 ms) this is very difficult to achieve: the direct sound is strongly attenuated, giving a bad direct/diffuse ratio and hence fatigue. With multi-microphone adaptive beamforming we achieve 7 dB even in reverberant rooms (T60 = 800 ms).
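The direct/diffuse ratio on this slide can be illustrated with a small calculation on a room impulse response. The sketch below is not the presenters' method: it assumes the ratio is taken as early energy over late energy with a split point of 50 ms (a common convention, not stated on the slide), and it uses a toy impulse response. The names `synthetic_impulse_response`, `clarity_index_db`, `split_ms` and `tail_gain` are hypothetical.

```python
import math
import random


def synthetic_impulse_response(fs, t60, direct_gain, tail_gain=0.05, seed=0):
    """Toy room impulse response (an assumption, not measured data):
    a direct-path spike at n = 0 followed by a noise tail whose
    amplitude decays by 60 dB over t60 seconds."""
    rng = random.Random(seed)
    length = int(fs * t60)
    h = [direct_gain]
    for n in range(1, length):
        envelope = 10.0 ** (-3.0 * n / (fs * t60))  # -60 dB at n = fs * t60
        h.append(tail_gain * envelope * rng.uniform(-1.0, 1.0))
    return h


def clarity_index_db(h, fs, split_ms=50.0):
    """Early-to-late energy ratio of impulse response h, in dB.
    The 50 ms split point is an assumed convention."""
    split = int(round(fs * split_ms / 1000.0))
    early = sum(x * x for x in h[:split])
    late = sum(x * x for x in h[split:])
    if late == 0.0:
        return math.inf
    if early == 0.0:
        return -math.inf
    return 10.0 * math.log10(early / late)


if __name__ == "__main__":
    fs = 8000
    # A smaller direct gain mimics a talker further from the microphone.
    for gain in (1.0, 0.25):
        h = synthetic_impulse_response(fs, t60=0.8, direct_gain=gain)
        c = clarity_index_db(h, fs)
        verdict = "meets" if c >= 7.0 else "misses"
        print(f"direct gain {gain}: clarity {c:.1f} dB ({verdict} the 7 dB target)")
```

Because the tail is held fixed while the direct gain drops, the computed clarity falls as the talker moves away, which is the effect the slide attributes to a 4 meter distance.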
20. Communication terminal - sound. Two locations with mono connection. One-to-one communication goes well. Technologies: full-duplex acoustic echo cancellation, noise suppression, clarity index improvement, adaptive beamforming, audio/video person localization. [Diagram labels: audio enhancement, audio/video tracking]
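As an illustration of acoustic echo cancellation, one of the technologies listed on this slide, the sketch below uses a normalized LMS (NLMS) adaptive filter. This is a textbook technique chosen for the example, not necessarily what the presenters used, and it omits the double-talk handling that full-duplex operation needs. The function name `nlms_echo_canceller` and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the
    microphone signal. Returns (residual signal, final filter weights)."""
    w = [0.0] * taps    # estimated echo path
    buf = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate
        norm = sum(b * b for b in buf) + eps  # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


if __name__ == "__main__":
    rng = random.Random(1)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
    echo_path = [0.5, -0.3, 0.2]  # made-up loudspeaker-to-microphone path
    # Microphone picks up only the echo here (no near-end speech).
    mic = [
        sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(far_end))
    ]
    residual, w = nlms_echo_canceller(far_end, mic)
    print("estimated echo path:", [round(v, 3) for v in w[:3]])
```

In a real terminal the microphone also carries near-end speech and noise, so adaptation would have to be slowed or frozen while both sides talk.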
21. Communication terminal - sound. Two locations with mono connection. Multi-to-one communication: NOT OK. There is only a mono sound connection. Far-end sound sources cannot be separated by the listener, which creates fatigue. [Diagram: participants a, b, c]
22. Communication terminal - sound. Multiple locations with mono transmission. Each terminal transmits a mono signal and receives multiple signals. Multi-to-one communication goes well. In the near-end terminal: multiple loudspeakers; multi-channel acoustic echo cancellation; spatial sound is achieved by sound panning. Much reduced fatigue. [Diagram: participants a, b, c]
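The sound panning mentioned on this slide can be sketched for the simplest case of two loudspeakers. The example assumes a constant-power (sine/cosine) pan law, which the slide does not specify, and places each received mono signal at its own position between the left and right loudspeaker. The names `pan_gains` and `mix_to_stereo` are hypothetical.

```python
import math


def pan_gains(position):
    """Constant-power pan law (an assumed choice).
    position: -1.0 is fully left, 0.0 is center, +1.0 is fully right.
    Returns (left_gain, right_gain) with left^2 + right^2 = 1."""
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(signals, positions):
    """Pan each received mono signal to its position and sum the results
    into one left and one right loudspeaker signal."""
    length = max(len(s) for s in signals)
    left = [0.0] * length
    right = [0.0] * length
    for signal, position in zip(signals, positions):
        gain_left, gain_right = pan_gains(position)
        for n, x in enumerate(signal):
            left[n] += gain_left * x
            right[n] += gain_right * x
    return left, right


if __name__ == "__main__":
    # Three far-end talkers placed left, center and right.
    talkers = [[1.0, 0.5], [0.2, 0.2], [0.0, 1.0]]
    left, right = mix_to_stereo(talkers, [-1.0, 0.0, 1.0])
    print(left, right)
```

Giving each far-end talker a distinct position lets the listener separate simultaneous voices by direction, which is the fatigue reduction the slide refers to.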
23. Communication terminal – sound. Two locations with multichannel transmission. Each terminal transmits and receives multiple signals. Multi-to-one communication goes well. In addition to all the previously mentioned technologies, source separation is needed: adaptive microphone array processing (“virtual close talk microphones”). Each microphone signal contains contributions from a, b, and c. We want to transmit a, b, and c separately. [Diagram: participants a, b, c]
24. Communication terminal – stereo sound. [Diagram: participants a, b, c, d; blocks: source separation (a/v tracker), coder, decoder, spatial sound reproduction]
26. Eye contact: today’s issue. Drawback of traditional display technologies for telepresence: lack of natural eye contact and directional gaze awareness; 2D displays do not offer the sense of physical presence. [Figure: two photos taken at the same time]
33. Eye contact display: input format for rendering. Dual “image + depth” input. 15 views (7 left + 7 right + 1 transition).
34. Eye contact display: based on lenticular lenses. Natural eye gaze awareness: offering multiple perspectives of the remote person using a multi-view display design. Immersive feeling: 3D autostereoscopic technology to maximize the feeling of physical presence. [Figure: (a) view from position A; (b) view from position B]
35. Communication terminal – spatial sound & video. Experiences: communication dynamics feel like “real life”. People can talk over each other; casual communication is enabled (no discipline needed). Feels relaxed, with less fatigue after a longer time. [Diagram: participants a, b, c, d]
36. Conclusions. Lifelike communication is important for Philips: family & friends; doctor & patient (& family); doctor & doctor. Lifelike communication: spatial sound and video is an important aspect. Presented: the spatial sound & video communication terminal.