This document discusses natural interaction methods for augmented reality applications. It begins with a brief history of augmented reality and defines its key characteristics. It then explores various interaction metaphors for AR, including tangible user interfaces, augmented surfaces, and tangible AR. The document outlines architectures for multimodal gesture and speech interaction. It presents examples of intelligent interfaces and virtual agents. Finally, it discusses promising directions for future AR research, such as mobile gesture interaction, wearable systems like Google Glass, and novel displays including contact lenses.
1. Natural Interaction for Augmented Reality Applications
Mark Billinghurst
mark.billinghurst@hitlabnz.org
The HIT Lab NZ, University of Canterbury
November 28th 2013
3. Augmented Reality Definition
Defining Characteristics
Combines Real and Virtual Images
- Both can be seen at the same time
Interactive in real-time
- The virtual content can be interacted with
Registered in 3D
- Virtual objects appear fixed in space
Azuma, R. T. (1997). A survey of augmented reality. Presence, 6(4), 355-385.
6. AR Interaction Metaphors
Information Browsing: view AR content
3D AR Interfaces: 3D UI interaction techniques
Augmented Surfaces: Tangible UI techniques
Tangible AR: Tangible UI input + AR output
7. Tangible User Interfaces
Use physical objects to interact with digital content
Foreground: graspable user interfaces
Background: ambient interfaces
Ishii, H., & Ullmer, B. (1997). Tangible bits: Towards seamless interfaces between people, bits and atoms. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (pp. 234-241). ACM.
8. TUI Benefits and Limitations
Pros
Physical objects make us smart
Objects aid collaboration
Objects increase understanding
Cons
Difficult to change object properties
Limited display capabilities – 2D view
Separation between object and display
9. Tangible AR Metaphor
AR overcomes the limitations of TUIs
enhance display possibilities
merge task/display space
provide public and private views
TUI + AR = Tangible AR
Apply TUI methods to AR interface design
10. VOMAR Demo (Kato 2000)
AR Furniture Arranging
Elements + Interactions
Book:
- Turn over the page
Paddle:
- Push, shake, incline, hit, scoop
Kato, H., Billinghurst, M., et al. (2000). Virtual object manipulation on a table-top AR environment. In Proceedings of the International Symposium on Augmented Reality (ISAR 2000), Munich, Germany (pp. 111-119).
11. Lessons Learned
Advantages
Intuitive interaction, ease of use
Full 6 DOF manipulation
Disadvantages
Marker-based tracking
- occlusion, limited tracking range, etc
Needs external interface objects
- Paddle, book, etc
16. AR MicroMachines
AR experience with environment awareness and physically-based interaction
Based on the MS Kinect RGB-D sensor
Augmented environment supports:
- occlusion and shadows
- physically-based interaction between real and virtual objects
Clark, A., & Piumsomboon, T. (2011). A realistic augmented reality racing game using a
depth-sensing camera. In Proceedings of the 10th International Conference on Virtual
Reality Continuum and Its Applications in Industry (pp. 499-502). ACM.
19. System Flow
The system flow consists of three sections:
Image Processing and Marker Tracking
Physics Simulation
Rendering
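The three sections map naturally onto a per-frame loop. A structural sketch follows; all class and method names here are hypothetical placeholders, not the actual AR MicroMachines code:

```python
# A structural sketch of the three-stage, per-frame loop described
# above. All class and method names are hypothetical placeholders.

class FrameLoop:
    def __init__(self, camera, tracker, physics, renderer):
        self.camera, self.tracker = camera, tracker
        self.physics, self.renderer = physics, renderer

    def run_frame(self):
        # 1. Image processing and marker tracking: grab the RGB-D
        #    frame and estimate the camera pose from the markers.
        rgb, depth = self.camera.grab()
        pose = self.tracker.estimate_pose(rgb)

        # 2. Physics simulation: refresh the real-world collision
        #    mesh from the depth image, then step the virtual objects.
        self.physics.update_world_mesh(depth)
        self.physics.step(1.0 / 60.0)

        # 3. Rendering: draw the registered virtual content, using
        #    the depth map for occlusion and shadows.
        self.renderer.draw(pose, self.physics.objects, depth)
```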
20. Physics Simulation
Create a virtual mesh over the real world
Update it at 10 fps, so real objects can be moved
Used by the physics engine for collision detection between virtual and real objects (sketched below)
Used by OpenSceneGraph for occlusion and shadows
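A minimal sketch of the depth-to-collision-surface idea, written with pybullet (the original system used the Bullet engine in C++); the heightfield approximation and all parameter values are illustrative assumptions:

```python
# Turn a depth image into a collision surface for virtual objects.
# The heightfield stands in for the "virtual mesh over the real
# world"; `depth` is assumed to be an HxW numpy array of metric depths.

import numpy as np
import pybullet as p

p.connect(p.DIRECT)

def update_world_surface(depth, cell_size=0.01):
    """Create a static heightfield approximating the real surface."""
    rows, cols = depth.shape
    # Convert camera depth to height above the farthest point.
    heights = (depth.max() - depth).flatten().tolist()
    shape = p.createCollisionShape(
        p.GEOM_HEIGHTFIELD,
        meshScale=[cell_size, cell_size, 1.0],
        heightfieldData=heights,
        numHeightfieldRows=rows,
        numHeightfieldColumns=cols)
    return p.createMultiBody(baseMass=0, baseCollisionShapeIndex=shape)

# A virtual object (a toy car reduced to a box) that collides with it.
car = p.createMultiBody(
    baseMass=1.0,
    baseCollisionShapeIndex=p.createCollisionShape(
        p.GEOM_BOX, halfExtents=[0.05, 0.03, 0.02]),
    basePosition=[0, 0, 0.5])

world = update_world_surface(np.full((64, 64), 1.0))  # fake flat depth
for _ in range(240):
    p.stepSimulation()   # in the real system this runs every frame
```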
23. Natural Hand Interaction
Using bare hands to interact with AR content
MS Kinect depth sensing
Real time hand tracking
Physics based simulation model
24. Hand Interaction
Represent hand models as collections of spheres
Bullet physics engine for interaction with the real world (sketched below)
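A sketch of the sphere-proxy idea in pybullet: the tracked hand becomes a set of kinematic spheres that can push virtual objects around. Joint positions are assumed to come from the Kinect hand tracker; the sphere count and radius are illustrative values.

```python
import pybullet as p

p.connect(p.DIRECT)

SPHERE_RADIUS = 0.012    # roughly fingertip sized (assumed)

def make_hand_proxies(num_joints=16):
    """One collision sphere per tracked hand joint."""
    shape = p.createCollisionShape(p.GEOM_SPHERE, radius=SPHERE_RADIUS)
    # Mass 0 makes each sphere kinematic: moved by the tracker, not
    # by the physics engine, but still colliding with virtual objects.
    return [p.createMultiBody(baseMass=0, baseCollisionShapeIndex=shape)
            for _ in range(num_joints)]

def update_hand(proxies, joint_positions):
    """Snap each proxy sphere to its tracked joint position (metres)."""
    for body, pos in zip(proxies, joint_positions):
        p.resetBasePositionAndOrientation(body, pos, [0, 0, 0, 1])

hand = make_hand_proxies()
# Per frame: update_hand(hand, positions_from_tracker); p.stepSimulation()
```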
25. Scene Interaction
Render the AR scene with OpenSceneGraph
Use the depth map for occlusion (sketched below)
Shadows yet to be implemented
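A numpy sketch of depth-map occlusion: a virtual pixel is drawn only where the virtual surface is closer to the camera than the real surface measured by the Kinect. (The actual system does this on the GPU via OpenSceneGraph's depth buffer.)

```python
import numpy as np

def composite(camera_rgb, virtual_rgb, real_depth, virtual_depth):
    """Overlay virtual pixels not occluded by real geometry."""
    visible = virtual_depth < real_depth     # HxW boolean mask
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```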
27. Architecture: Layer 1 - Hardware Interface
Layer stack (bottom up): 1. Hardware Interface, 2. Segmentation, 3. Classification/Tracking, 4. Modeling, 5. Gesture
Supports PCL, OpenNI, OpenCV, and the Kinect SDK
Provides access to depth, RGB, and XYZRGB data
Usage: capturing color images, depth images and concatenated point clouds from a single camera or multiple cameras
Example devices: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live (capture sketched below)
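A sketch of the hardware-interface layer using OpenCV's OpenNI backend, which is one of several supported paths and assumes OpenCV was built with OpenNI support:

```python
import cv2

cap = cv2.VideoCapture(cv2.CAP_OPENNI)        # first OpenNI device
if cap.grab():                                # grab one RGB-D frame
    ok_d, depth = cap.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)  # uint16, mm
    ok_c, bgr = cap.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)    # uint8, BGR
    # depth + bgr can be back-projected into an XYZRGB point cloud
    # for the segmentation layer above.
cap.release()
```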
28. Architecture: Layer 2 - Segmentation
Segments images and point clouds based on color, depth and space
Usage: segmenting images or point clouds using color models, depth, or spatial properties such as location, shape and size
Examples: skin color segmentation, depth threshold (both sketched below)
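An OpenCV sketch of the two segmentation examples named above: skin-color segmentation in HSV space and a simple depth threshold. The HSV bounds and depth range are assumed values for illustration.

```python
import cv2
import numpy as np

def skin_mask(bgr):
    """Rough skin segmentation by HSV color thresholding."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed bounds
    upper = np.array([25, 255, 255], dtype=np.uint8)
    return cv2.inRange(hsv, lower, upper)

def depth_mask(depth_mm, near=400, far=1000):
    """Keep only pixels between near and far millimetres."""
    return ((depth_mm > near) & (depth_mm < far)).astype(np.uint8) * 255

# Combining both cues gives skin-colored pixels within arm's reach:
# hand = cv2.bitwise_and(skin_mask(frame), depth_mask(depth))
```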
29. Architecture: Layer 3 - Classification/Tracking
Identifies and tracks objects between frames based on XYZRGB
Usage: identifying the current position/orientation of the tracked object in space
Example: a training set of hand poses, where colors represent unique regions of the hand; raw output (without cleaning) classified on real hand input (depth image)
30. Architecture: Layer 4 - Modeling
Hand recognition/modeling
- Skeleton based (for low resolution approximation)
- Model based (for more accurate representation)
Object modeling (identification and tracking of rigid-body objects)
Physical modeling (physical interaction)
- Sphere proxy, model based, mesh based
Usage: general spatial interaction in AR/VR environments
31. Architecture: Layer 5 - Gesture
Static gestures (hand pose recognition; sketched below)
Dynamic gestures (meaningful movement recognition)
Context-based gestures (gestures with context, e.g. pointing)
Usage: issuing commands, anticipating user intention, and high-level interaction
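A sketch of static gesture (hand pose) recognition as a simple nearest-neighbour classifier over joint-position features. The feature choice, training data, and distance threshold are illustrative assumptions, not the architecture's actual classifier.

```python
import numpy as np

class StaticGestureRecognizer:
    def __init__(self):
        self.examples = []   # list of (label, feature_vector)

    def add_example(self, label, joints):
        self.examples.append((label, self._features(joints)))

    def classify(self, joints, max_dist=0.1):
        """Return the label of the nearest stored pose, if close enough."""
        if not self.examples:
            return None
        f = self._features(joints)
        label, dist = min(
            ((lbl, np.linalg.norm(f - g)) for lbl, g in self.examples),
            key=lambda e: e[1])
        return label if dist < max_dist else None

    @staticmethod
    def _features(joints):
        """Joint positions relative to the palm, scale-normalised."""
        j = np.asarray(joints, dtype=float)
        j = j - j[0]                          # joint 0 assumed = palm
        return (j / (np.linalg.norm(j, axis=1).max() + 1e-9)).ravel()
```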
32. Skeleton Based Interaction
3 Gear Systems
Kinect/Primesense Sensor
Two hand tracking
http://www.threegear.com
33. Skeleton Interaction + AR
HMD AR View
Viewpoint tracking
Two hand input
Skeleton interaction, occlusion
35. Multimodal Interaction
Combines speech and gesture input
Gesture and speech are complementary
Speech
- modal commands, quantities
Gesture
- selection, motion, qualities
Previous work found multimodal interfaces
intuitive for 2D/3D graphics interaction
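A sketch of one way to fuse the two channels: the speech channel supplies the command and its parameters, the gesture channel supplies the target (what the user is pointing at when the command arrives). All names and the time window are illustrative, not the system's actual fusion logic.

```python
from dataclasses import dataclass

@dataclass
class SpeechEvent:
    command: str      # e.g. "colour", "move"
    value: str        # e.g. "red"
    timestamp: float

@dataclass
class GestureEvent:
    target_id: int    # object currently pointed at
    timestamp: float

def fuse(speech, gestures, window=1.0):
    """Pair a speech command with the gesture closest in time."""
    candidates = [g for g in gestures
                  if abs(g.timestamp - speech.timestamp) <= window]
    if not candidates:
        return None
    g = min(candidates, key=lambda g: abs(g.timestamp - speech.timestamp))
    return (speech.command, speech.value, g.target_id)
```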
36. Free Hand Multimodal Input
Use free hand to interact with AR content
Recognize simple gestures: point, move, pick/drop
Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal
input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
41. User Evaluation
Change object shape, colour and position
Conditions
Speech only, gesture only, multimodal
Measure
performance time, error, subjective survey
42. Results
Average performance time (multimodal and speech fastest)
- Gesture: 15.44 s
- Speech: 12.38 s
- Multimodal: 11.78 s
No difference in user errors
User subjective survey
Q1: How natural was it to manipulate the object?
- MMI, speech significantly better
70% preferred MMI, 25% speech only, 5% gesture only
44. Intelligent Interfaces
Most AR systems are "stupid"
Don’t recognize user behaviour
Don’t provide feedback
Don’t adapt to user
Especially important for training
Scaffolded learning
Moving beyond check-lists of actions
45. Intelligent Interfaces
AR interface + intelligent tutoring system
ASPIRE constraint-based system (from the University of Canterbury)
Constraints consist of
- relevance condition, satisfaction condition, feedback (sketched below)
Westerfield, G., Mitrovic, A., & Billinghurst, M. (2013). Intelligent Augmented Reality Training for
Assembly Tasks. In Artificial Intelligence in Education (pp. 542-551). Springer Berlin Heidelberg.
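A sketch of a constraint in the style of a constraint-based tutor: if the relevance condition holds for the current state, the satisfaction condition must also hold, otherwise the feedback is shown. The state representation here is illustrative, not ASPIRE's actual one.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    relevance: Callable[[dict], bool]      # when does this apply?
    satisfaction: Callable[[dict], bool]   # what must then be true?
    feedback: str                          # shown when violated

def check(state, constraints):
    """Return feedback for every relevant but unsatisfied constraint."""
    return [c.feedback for c in constraints
            if c.relevance(state) and not c.satisfaction(state)]

# Example: once part B is placed, it must be attached to part A.
c = Constraint(
    relevance=lambda s: "B" in s["placed"],
    satisfaction=lambda s: ("A", "B") in s["attached"],
    feedback="Part B must be attached to part A before continuing.")

print(check({"placed": {"B"}, "attached": set()}, [c]))
```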
49. Evaluation Results
16 subjects, with and without ITS
Improved task completion
Improved learning
50. Intelligent Agents
AR characters
Virtual embodiment of system
Multimodal input/output
Examples
AR Lego, Welbo, etc
Mr Virtuoso
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
Wagner, D., Billinghurst, M., & Schmalstieg, D. (2006). How real should virtual characters be?. In
Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer
entertainment technology (p. 57). ACM.
52. Directions for Future Research
Mobile Gesture Interaction
Tablet, phone interfaces
Wearable Systems
Google Glass
Novel Displays
Contact lens
53. Mobile Gesture Interaction
Motivation
Richer interaction with handheld devices
Natural interaction with handheld AR
2D tracking
- Fingertip tracking [Hurst and Wezel 2013]
3D tracking
- Hand tracking [Henrysson et al. 2007]
Henrysson, A., Marshall, J., & Billinghurst, M. (2007). Experiments in 3D interaction for mobile phone AR.
In Proceedings of the 5th international conference on Computer graphics and interactive techniques in
Australia and Southeast Asia (pp. 187-194). ACM.
54. Fingertip Based Interaction
System setup: mobile client + PC server (fingertip detection sketched below)
Bai, H., Gao, L., El-Sana, J., & Billinghurst, M. (2013). Markerless 3D gesture-based interaction for
handheld augmented reality interfaces. In SIGGRAPH Asia 2013 Symposium on Mobile Graphics
and Interactive Applications (p. 22). ACM.
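An OpenCV sketch of markerless fingertip detection: segment the hand (e.g. with the skin/depth masks sketched earlier), take the largest contour, and treat convex-hull points far from the palm centre as fingertip candidates. This is illustrative, not the pipeline from the paper.

```python
import cv2
import numpy as np

def fingertips(mask, min_area=2000):
    """Return (x, y) fingertip candidates from a binary hand mask."""
    contours, _ = cv2.findContours(
        mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)
    if cv2.contourArea(hand) < min_area:
        return []
    hull = cv2.convexHull(hand)
    # Hull vertices furthest from the palm centre are fingertip candidates.
    m = cv2.moments(hand)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    pts = hull.reshape(-1, 2)
    d = np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)
    return pts[d > 0.7 * d.max()].tolist()
```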
56. 3D Prototype System
3 Gear + Vuforia
Hand tracking + phone tracking
Freehand interaction on phone
Skeleton model
3D interaction
20 fps performance
59. User Experience
Truly Wearable Computing
Less than 46 grams
Hands-free Information Access
Voice interaction, Ego-vision camera
Intuitive User Interface
Touch, Gesture, Speech, Head Motion
Access to all Google Services
Map, Search, Location, Messaging, Email, etc
60. Contact Lens Display
Babak Parviz
University of Washington
MEMS components
Transparent elements
Micro-sensors
Challenges
Miniaturization
Assembly
Eye safety
63. Conclusions
AR experiences need new interaction methods
Enabling technologies are advancing quickly
Displays, tracking, depth capture devices
Natural user interfaces possible
Free-hand gesture, speech, intelligent interfaces
Important research for the future
Mobile, wearable, displays
64. More Information
• Mark Billinghurst
– Email: mark.billinghurst@hitlabnz.org
– Twitter: @marknb00
• Website
– http://www.hitlabnz.org/