An alternative tangible interface for
manipulating 3D audio and multiple media.
MSc in Music Technology
September 2006
Timur M. Özsan
timur.ozsan@gmail.com
First Supervisor: Dr. Andy Hunt
Second Supervisor: Mr. Dave Malham
CONTENTS
Page
Abstract ……………………………………………………………. ……………….. 5
Acknowledgements ………………………………………………………………. 6
CHAPTER 1. INTRODUCTION ……………………………………………………… 7
1.1 Report Structure …………………………………………………….. 8
1.2 Aims and Objectives …………………………………………………….. 9
CHAPTER 2. LITERATURE SURVEY ……………………………………………… 11
2.1 Human Computer Interaction and Electronic Musical Instruments ……. 11
2.1.1 Haptic, Audible and Visual Feedback ……………………………………… 14
2.1.1.1 Graphical User Interface Design Issues ……………………… 16
2.1.2. Tangible Interfaces using Vision Systems ……………………………… 17
2.1.2.1 Audio D-Touch ……………………………………………………… 19
2.1.2.2 ReacTABLE ……………………………………………………… 20
2.1.2.3 Open Illusionist ……………………………………………………… 21
2.1.2.4 Others ……………………………………………………………… 22
2.2 Six Dimensional Electronic Hardware Interfaces ……………………… 23
2.3 Alternative tracking technologies ……………………………………… 24
2.3.1 Acoustic ……………………………………………………………………… 24
2.3.2 Laser ……………………………………………………………………… 24
2.3.3 Inertial and GPS ……………………………………………………………… 25
2.3.4 Electromagnetic ……………………………………………………………… 26
2.3.5 Reflection based Vision Systems ……………………………………… 28
2.3.6 Alternative Electronic Interfaces Summary ……………………………… 29
2.4 Existing Vision System Environments ……………………………………… 30
2.4.1 Eyesweb ……………………………………………………………………… 30
2.4.2 Clip ……………………………………………………………………… 32
2.4.3 Open Illusionist and fiducial library ……………………………………… 32
2.5 Protocol ……………………………………………………………………… 33
2.5.1 Open Sound Control (OSC) ……………………………………………… 33
2.6 Mapping ……………………………………………………………………… 34
2.6.1 Explicit Mapping Strategies ……………………………………………… 35
2.6.2 Implicit Mapping (Generative) ……………………………………………… 36
2.6.3 Mapping Layers ……………………………………………………………… 36
2.6.4 Metaphors for musical control ……………………………………………… 36
2.7 Audio Applications ……………………………………………………………… 37
2.7.1 Timbral Manipulation ……………………………………………………… 37
2.7.1.1 Multidimensional perceptual scaling ……………………………… 37
2.7.1.2 Tristimulus …………………………………………………….... 40
2.7.1.3 The Musician's Most Common Use of Timbral Descriptors ……… 41
2.7.2 3D Ambisonic Manipulation ……………………………………………… 41
2.8 Literature Review Summary ……………………………………………… 42
CHAPTER 3. SYSTEM FEASIBILITY ……………………………………………… 43
3.1 Fiducial Tracking Algorithms ……………………………………………… 43
3.1.1 Video Positioning System Fiducials ……………………………………… 44
3.1.2 Reactable Fiducials ……………………………………………………… 45
3.1.3 Open Illusionist Fiducials ……………………………………………… 46
3.2 Open Illusionist Implementation ……………………………………………… 47
3.2.1 Open Illusionist Framework Setup ……………………………………… 47
3.2.1.1 C++ Compiler / Debugger ……………………………………… 47
3.2.1.2 Windows SDK ……………………………………………………… 47
3.2.1.3 WXwidgets ……………………………………………………… 48
3.2.1.4 Visual C++ Environment Setup ……………………………………… 48
3.2.1.5 Install and setup Open Illusionist ……………………………… 50
3.3 VPS Implementation ……………………………………………………… 51
3.3.1 Standard Meta Language Setup ……………………………………… 51
3.3.2 Printing Fiducials Using SML ……………………………………………… 52
3.3.3 Setting Up the Fiducial Tracking Workspace in Visual C++ 6.0 ……… 52
3.4 ReacTABLE Implementation ……………………………………………… 53
3.5 Feasibility Summary ……………………………………………………… 53
CHAPTER 4. OPEN ILLUSIONIST FIDUCIAL LIBRARY IMPLEMENTATION.... 54
4.1. Draw and Save Fiducial Program ……………………………………… 54
4.1.1 Create Fiducial Preliminaries ……………………………………………… 54
4.1.2 Using the correct CXImage conversion function ……………………… 55
4.1.3 Problems with Fiducial Image Distortion ……………………… 55
4.1.4 Negative Function ……………………………………………… 55
4.1.5 Creating a White background ……………………………………… 55
4.1.6 Flipping for correctly read code ……………………………………… 56
4.1.7 Multiple Fiducial Drawing ……………………………………… 57
4.2 Fiducial Tracking Program Development ……………………………... 58
4.2.1 Workspace Creation ……………………………………………………... 58
4.2.2 Frame Grabbing Process ……………………………………... 59
4.2.3 Seeking for Tangible Output Parameters ……………………………... 60
4.2.4 Printing X Position Values ……………………………………………… 60
4.2.5 Finding the Correct Data Output Location …………………………….... 62
4.2.6 Understanding the Outputs …………………………………………….… 63
4.2.7 X Y Position and Marker Direction …………………………………….… 63
4.3 Major and Minor Axes ………………………………………………….….. 64
4.3.1 dTheta ………………………………………………………………….….. 66
4.3.2 dAMatrix and dIAMatrix ……………………………………………………... 66
4.3.3 Flipping the Image in the Horizontal Plane ……………………………… 67
4.4 Fiducial Parameters List ……………………………………………………… 68
CHAPTER 5. PURE DATA CONNECTION ………………………………………. 69
5.1 Transferring data to Pure Data using system sockets ……………………. . 69
5.1.1 Single Fiducial Tracking ………………………………………. 69
5.1.2 Multiple Fiducial Tracking ………………………………………. 71
CHAPTER 6. APPLICATIONS ……………………………………………………….. 74
6.1 Sound Spatialization for Ambisonics ………………………………………. 74
6.1.1 A PAN Ctrl Ambisonic Panning Object ………………………………. 74
6.1.1.1 Simple Mapping ………………………………………………. 74
6.1.1.2 Spatial Mapping ………………………………………………. 77
6.1.2 Ambisonic Soundfield Zoom Control Object ………………………………. 81
6.1.3 Ambisonic Soundfield Orientation Manipulation Object …………………. 82
6.1.4 Volumetric Spatial Triggering ………………………………………………. 83
6.1.5 Timbral Manipulation ………………………………………………………. 87
6.1.5.1 Using the High Level Musician’s Use of Timbral Adjectives ……… 87
6.1.5.2 Timbral Space……………………………………………………………. 88
6.1.6 Musical Adventure …………………………………………………….... 89
6.1.7 Choir Spatialisation ………………………………………………………. 90
6.1.8 Musical Lego ………………………………………………………………. 91
CHAPTER 7. TECHNICAL CHALLENGES ………………………………………. 92
7.1 Application programming Interfaces for Web Cameras ………………. 92
7.2 Distortions ………………………………………………………. 92
7.3 Web Camera communication speed limitations…………………………….. 92
7.4 Multiple Fiducial Tracking issues ………………………………………. 93
CHAPTER 8. CONCLUSIONS AND FUTURE IMPROVEMENTS ……………… 94
8.1 Research conclusions ……………………………………………………… 94
8.2 Application conclusions ....…………………………………………………… 95
8.2.1 Volumetric Spatial Triggering ……………………………………………… 95
8.2.2 3D Ambisonic Panning ……………………………………………………… 96
8.2.3 Accuracy and Latency ……………………………………………………… 96
8.3 Future Improvements and Possibilities ……………………………………… 97
8.4 Summary ………………………………………………………………………….. 98
GLOSSARY ……………………………………………………………………………… 99
REFERENCES ………………………………..………………………………………… 100
BIBLIOGRAPHY ………..……………………………………………………………… 103
APPENDICES …………………………………………………………………………… 104
Appendix - A: Email correspondence ……………………………………… 104
Appendix - B: Programme Code ………………………………………………….. 110
Appendix - C: Cube Nets ……………………………………………………… 116
Appendix - D: VPS Fiducials ……………………………………………………… 120
Appendix - E: Global Pure Data Patches ……………………………………… 121
CONTACT DETAILS ……………………………………………………………… 123
DVD CONTENTS ……………………………………………………………………… 126
Abstract
This report describes the development of a new multidimensional interface which makes use of the
latest technology in robust pattern (fiducial) tracking algorithms using affordable web camera vision
systems.
The cube prototype allows accessible study of the interface design considerations required for
manipulating 3D audio. This project focuses on the areas of 3D sound spatialisation and
multidimensional timbral space with the use of high level timbral descriptors. However,
experimentation with the interface has led to many more foreseeable applications in children’s toys
and affordable products for those with special needs.
The project provides a good technical base as well as inspiration and design consideration for future
audiovisual and musical projects using this technology.
Acknowledgements
I would like to say genuinely that I have never met such a nice group of people. The positive
attitudes of all mentioned really gave me great drive for the project. I found their views and
communication inspiring, and the whole process of being a communicator and drawing on specialist
knowledge has in turn helped me to communicate on a number of different levels.
Firstly I would like to thank my supervisor Andy Hunt, for inspiring this project, and for all
the humorous supervision sessions, and his unforgettable multi personality dramatised lectures.
Also to my second supervisor Dave Malham for his professional experience with Ambisonics. I
really don’t know their secret but somehow they know how to bring out the best in people, whether
you are on top of the world, or underneath it.
Special thanks to Matt Paradis for continuous support, and down to earth explanations
regarding the basics of C++ Object Oriented programming, also thanks for lending me your camera.
A big thanks to John Robinson for allowing me to take part in the Media ECAD taught tutorials to
learn the inner workings of CLIP and providing me with support to get the Fiducial Library up and
running and in sharing his ethics of wearable computing. I don’t believe that helping such a
persistent music technology Masters student is a usual occurrence for him.
Also to the PhD students in the Media Lab for putting up with me, providing me with feedback over
email and throughout the Open Illusionist forums, including Dan Parnham, Justen Hyde and LJ Li.
Thanks to Simon Shelley and Enrico Costanza for communication regarding Audio D Touch and
ReacTABLE fiducials.
A warm thanks to David Johnston, who I was in constant communication with about his
PhD. He sometimes stayed up late through the night to explain Faugeras theory to me, which I still
am struggling with. By the time you read this I will have already helped you revive your old castle.
Thanks also to Alistair Disley for communication and an enjoyable seminar on the musician's use of
high level timbral descriptors. I hope to keep in contact with you about future developments of your
synthesis mappings.
Alistair Edwards, Edwin Hancock and Emine Gokce Aydal of the Computer Science
department for their knowledge and thoughts.
I would like to thank David Howard for the pleasant chat and introducing me to Harald Jers, who
inspired another application of 3D cubes with augmented reality for choir spatialisation.
Thanks to members of Eyesweb, especially Gualtiero Volpe for discussing the current developments
of Eyesweb.
Thanks to Ed Corrigan, for his visual support with the mathematics involved in the project,
and helping me to understand matrices.
Thanks to Ambrose Field and Tony Myatt for their thoughts about the interface and professional
musical advice.
Thanks to a number of industrial contacts for information on alternative tracking technologies,
including Andrew Ray and John Grist.
Many thanks also to the following music technology masters students and friends for
their assistance: Peter Worth, Becky Stewart, Theo Burt and Rob Hinks.
I would like to thank my family for proof reading my report and checking in on my health now and
then.
CHAPTER 1. INTRODUCTION
The aim of the project is to explore the feasibility of a new multidimensional interface
providing a mapping console for multiple media and 3D audio applications. It challenges current
interfacing technologies by providing a robust, affordable web camera vision system that tracks
special patterns (fiducials) freely in 3D space.
As electronics and microprocessor technology become increasingly compact, the physical,
anthropometric limitations of human beings begin to play a greater role in the limiting size of
interfaces.
This project highlights the limitations of one dimensional and two dimensional interfaces
and provides a starting point for a truly multidimensional system.
Born into the monopoly of Microsoft, many of us spend our time cursing two dimensional graphical user
interfaces with little regard for the input devices we are using.
With the globalisation of affordable home internet communication, and charities receiving
money from throw away mobile phones with video imaging technology, cameras are becoming
increasingly inexpensive.
Ongoing research into the field of machine vision is providing robust detection algorithms. This,
combined with developments into augmented reality using projectors paves a new direction for
human computer interfacing entirely, and rivals the use of wearable computers.
Developments in digital audio workstations, combined with the plethora of high quality free
virtual studio technology and the affordability of computing, have given rise to a community of young,
portable electroacoustic composers and performers. Now professional producers must compete with
teenagers that release hit records using software out of cereal packets. An intriguing example of live
performance is Kid Beyond, who uses Ableton Live, a microphone and his laptop to simultaneously
compose and perform live beatboxing. This example gives an idea of how much technology has
changed in the last 40 years.
However the increasing availability has also increased functionality such that the foreseeable
limitations are not so much about what can be done with the software, but how the mass of
parameters can be controlled creatively.
The concept of timbre categorisation spans many decades of research and development,
using a number of clustering and multidimensional scaling techniques. Timbral space is still
nebulous, and to effectively control and manipulate every aspect of timbre itself in real time by the
use of a multidimensional controller requires much experimental research.
However there are more immediate applications such as for ambisonics where at present
composers are required to pan and position sounds using a number of on screen 2D potentiometers
and sliders which can be confusing and unrepresentative of the associated movements.
Experimentation and an appreciation of the interface's interactive merits led to a number of
conceivable future projects. The challenge for these applications would
be to make the interface preferable to using the mouse or keyboard when dealing with
multidimensionality.
1.1 Report Structure
The report structure has been provided to give a short overview of each section in this report.
Chapter 2 is dedicated to the Literature Review which describes human computer interaction and
the need for multidimensional interfacing. It also examines parallel interfacing technologies and
outlines why pattern tracking is more robust than colour tracking. The literature review also
examines two different musical possibilities for the interface.
Chapter 3 details competing fiducial technology and gauges System Feasibility for three different
interfaces to decide the best solution. The three systems are Open Illusionist, the Visual Positioning
System and ReacTABLE.
Chapter 4 disseminates the Open Illusionist fiducial library gathering information about its output
parameters. This chapter takes an experimental approach to retrieve all the fiducial output
parameters deciding their functionality and flexibility.
Chapter 5 describes the Pure Data connection process necessary to link the fiducial library’s
parametric outputs with the Pure Data console via computer sockets. It also describes the
programming code needed to track multiple fiducials and the problems which occurred.
Chapter 6 provides the development procedure for the implemented applications and gives design
thought for those in the near future.
Chapter 7 outlines a number of technical challenges identifying the benefit of web cameras with
easily accessible Application Programming Interfaces and the speed limitations of the transfer
protocols used. It gives guidance as to how these problems can be alleviated.
Chapter 8 concludes all the findings throughout the project and gives future vision and ideas to
inspire a new generation of compositions, installations and interactive sonic works.
1.2 Aims and Objectives
This section describes the project's main aim, which is then broken down into a number of objectives.
Figure 1.1 shows the areas of research to be covered.
Aim
To research into parallel products and gauge feasibility of a new inexpensive musical interface,
combining existing robust visual tracking algorithms for six degrees of freedom with a mapping
console which can be used by multiple media and audio applications.
Objectives
• Give an overview of relevant Human Computer Interfaces.
• Research thoroughly into the latest multidimensional tangible (tactile) user interfaces and
augmented reality technologies, paying attention to ‘the state of the art.’
• Survey literature regarding haptic feedback including articles into Virtual Reality showing
proof of the issues associated with non-tactile feedback in 3D space.
• Research into the sound synthesis output stage to scope vision for the end application.
• Decide which environment is the most feasible for programming a robust six degree of
freedom interface and understand how to program within it.
o Understand Fiducials by speaking to experts and reviewing literature.
o Program an object for use in Pure Data or Eyesweb to recognise fiducials, providing
6-dimensional output.
• Create a mapping console, mapping the degrees of freedom to sound parameters.
o Communicate with local experts regarding high level timbral descriptors.
o Communicate with local experts regarding sound spatialisation
• Experiment and use a piece of music to gauge the musical results.
Figure 1.1: Project research sections and central focus.
CHAPTER 2. LITERATURE SURVEY
This section aims to survey all the relevant literature regarding the factors of Human Computer
Interactivity pertaining to parallel interfaces and alternative technologies. It examines current and
emerging web camera interfaces and previously designed control interfaces. It discusses the
programming environments available for vision based systems in order to identify a robust system
for the applications of sound spatialisation and multidimensional timbral space using current
research.
2.1 Human Computer Interaction and Electronic Musical Instruments
The world of Human Computer Interaction (HCI) and of Electronic Musical Instruments
is vast, and therefore this section has been scoped to cover only the interactive aspects in
systems that sound artists, designers and studio engineers use. Figure 2.1 shows the linear structure
and separation of the components involved in HCI with Electronic Musical Instruments. These
components begin with the physical anthropometric limitations of human control which supports
research into the theory of gestural input to manipulate multiple parameters simultaneously.
Alternative input devices are evaluated comparing the benefits of electrical hardware magnetic
position and orientation tracking with machine vision technologies.
Visual, haptic and audible feedback from various devices is discussed. These include computer
monitors, the real world and projective augmented reality where images are projected onto real
world objects. This front end is however seemingly aimless as a musical instrument if it cannot be
mapped to sound output parameters, so an investigation of current research into communication
protocols and programming environments is required.
The mapping process is where ‘low level’ degrees of freedom are mapped to the many synthesis or
sound parameters. The mapping process can essentially decide the nature of the musical interface.
The final stage is the sound synthesis or application phase. Due to the nature of the controller being
both a multi-parametric and spatial controller, a number of mappings can be created for timbral
manipulation and spatial audio manipulation.
Figure 2.1: Showing the control system flow from human input to sound output
Today computer users are equipped with two main commodity input devices.
“The computer keyboard began with the invention of the typewriter in 1868 which was later
integrated in the teletype machine with the telegraph in 1930. Following punch cards, MIT,
General Electric and Bell Laboratories together created the computer system “Multics” in
1964. This encouraged the development of video display terminals (VDT) enabling computer
users to view graphics on a screen.
In 1964 the first computer mouse prototype was made by Douglas Engelbart with a GUI
(Windows)
Engelbart received a patent for the wooden shell with two metal wheels in 1970, describing it
in the patent application as an X-Y position indicator for a display system. During the time he
was working at his own lab (Augmentation Research Centre, Stanford Research Institute) he
staged a public demonstration of his mouse, windows and hypermedia with object linking and
addressing, and video conferencing. Due to the commodity of his invention Douglas
Engelbart was awarded the 1997 Lemelson-MIT Prize of $500,000, the world’s largest single
prize for invention and innovation. He now has his own company “Bootstrap Institute”
housed rent free, courtesy of the Logitech Corporation.”
(About Inventors, 2006)
The mouse is an example of a time multiplexed user pointing device (figure 2.2) which the user
must drag in order to search through menu systems and control parameters, clicking to operate the
virtual world. The mouse uses a sequential process of clicking, where each operation requires a
number of clicks and drags to reach the desired outcome. The computer keyboard and audio mixer
are examples of space multiplexed interfaces (figure 2.2). All the parameters are physically spaced
out on a surface and the user can reach all the hardware functionality, although the subjects are
limited to operating some of them at the same time (Fitzmaurice et al, 1997). Imagine a virtual
keyboard where the mouse is used to click each letter. The process of writing a sentence would take
much longer than using the keyboard (provided the user had no disabilities and the test was fair.)
This simple analogy demonstrates the weaknesses of the mouse. Conversely, one might
favour the mouse for navigating a 2D space rather than using two one-dimensional mixer sliders
or the arrow keys on a keyboard.
Figure 2.2 : Shows time (mouse) and space (mixer) multiplexed interfaces
(Reproduced from Fitzmaurice et al, 1997)
The question, then, is whether manipulating parameters in 3D space using a mouse or
keyboard is as efficient as using a commodity multidimensional input device which was
purposefully designed to control these parameters simultaneously, and how creative one can be
with such an interface. (Hunt et al, 2000) prove by survey that the “multi-parametric interface
allowed people to think gesturally and had the most long-term potential”. Also the use of two-
dimensional controllers in this survey was found “confusing, frustrating or at odds with their way of
thinking.” Proof is given later in this survey which shows that multidimensional electrical hardware
alternatives are expensive. There is also an ergonomic issue with gravity: holding heavy objects in
the air is more strenuous than simply interfacing with them on a desktop. It is possibly for these reasons
that they have not made it to the desktop as commodity controllers. The beauty of commodity
interfacing is that it grants people affordable access to improve their technique, and also makes it much
easier to run surveys and gather first person interface reviews from people on the other side of the world.
The invention of Midi Creator at the University of York continues to provide musicians with the
ability to rapidly create new custom-built electronic interfaces, so that their feasibility and
musicality can be determined, “inspiring technology for sensory experiences”. (Hildred, 2006)
Emerging technological developments point to the use of vision systems due to the increasing
robustness of tracking algorithms. Systems using vision are not bound to the physical connections
that limit many electronic interfaces, allowing for gestural control of parameters without cables.
They take information from the real world as their source of interaction. The details of vision
systems are discussed in section 2.1.2 (Tangible Interfaces using Vision Systems).
However HCI encompasses not just the input device, but also the application in which it is used.
Consideration for how these interact is especially important today since electronic musical
instruments can be split into the input control and sound generator. Ergonomics, anthropometrics
and aesthetics are important factors in the design of any human musical interface. (Hunt et al, 2000)
proved that complex interfaces are more engaging and rewarding than simple interfaces which take
very little effort and imagination to reach their maximum potential.
2.1.1 Haptic, Audible and Visual Feedback
As human beings we use our senses to gather information from the environment. Without these
senses we would not be able to communicate. Feedback is the process of communication with the
interface one interacts with. Unlike traditional musical instruments where the feedback is bound to
the object, the feedback of electronic musical instruments can be separated into three main parts, the
Haptic (tactile), Audible and Visual feedback. It is because of this separation that electronic musical
instruments (EMI) require a lot of skill, experimentation and knowledge to communicate correct
feedback to the user and audience. Design considerations which help to prevent miscommunication
associated with multiple feedback systems are very much application dependent.
Haptic Feedback “refers to technology which interfaces the user via the sense of touch. It is an
emerging technology that promises to have wide reaching implications” (Wikipedia, 2006)
“The Moose” is an example of a “general-purpose haptic display device” (Gillespie et al, 1997)
prototype where haptic technology was experimented with to display elements of Graphical User
Interfaces such as MS Windows (see figure 2.3). When white ‘puck’ (connected to double flexures)
is moved, the graphical mouse pointer also moves. If the pointer goes over an icon, a haptic
representation of it is provided for the user to explore. For example “The edge of a window is
realised haptically as a groove. A check box is represented as a border with a repelling block at one
end which becomes an attracting spring when the checkbox is checked. Thus by moving the cursor
over haptic objects the user can simultaneously identify their type and check their status”
Visual Feedback is the most commonly used type of feedback, whereby through computer screens,
or directly from the object, information is relayed to the user visually. However there is an inherent
awkwardness with moving objects in 3D space and using 2D screen visual feedback to accurately
judge their position as will be described. This is due to a lack of intuitive three dimensional spatial
visual feedback technology. Instead Graphical User Interfaces (GUIs) have been designed around
control of the commodity mouse and keyboard to manipulate virtual, non tactile objects. An
example of how GUIs have been developed to portray 3D information in two dimensions can be
seen in figure 2.4.

Figure 2.3: “The Moose” haptic feedback display device prototype.
Figure 2.4: Orthographic projections of a simple model.

By selecting individual orthographic projections the direction in which
manipulations occur in 3D space is observed from the perspective view (bottom right). This
separation can cause confusion for novice users. It is therefore conceivable that changing the input
device to a multidimensional controller could benefit from a new type of visual feedback technology
to provide accurate and direct feedback of parameter changes in 3D space. Current research by
(Chiu, 2006) provides a more immediate solution to adapt 2D computer screens into 3D utilising a
head tracking magnetic 3D position pointer. The system uses stereo vision spectacles (red and blue)
to portray 3D information. Experimentation with the interface using a mouse and keyboard seemed
very complicated and unnatural. The biggest visual feedback problem occurred when the virtual
world was rotated but the directionality of the controllers remained the same: forward and backward
mouse movement, which moved objects forward and backward while the virtual world faced the
front, became left and right movement once the world had been rotated. User preference would dictate whether the forward and
backward movements of the mouse should be mapped to the forward and backward movements on
the screen or the upward and downward movements, which simply adds to confusion and
demonstrates again the limitations of the mouse.
Augmented reality interfacing provides a fresh approach to this problem whereby three dimensional
visual feedback is projected onto objects which exist physically in their real world environment. A
headset can be used to provide head tracked feedback, so that moving around the real world object
physically provides an alternative perspective. However headsets have been described as an
encumbrance and as costly (Open Illusionist, 2006).
Attempts at 3D effects exist in the latest Windows Vista, where interactive windows are spaced in
perspective in a feature known as Flip 3D (see figure 2.5). Advancements in augmented reality mixed with
holographic imaging systems could be the end of GUIs contorted to represent 3D information on 2D
visual feedback technology.
Figure 2.5 Windows Vista FLIP 3D showing window group in perspective.
Audible feedback can also be referred to as a sonification process of complex data or parameter
manipulations. In the case of musical instruments “performer gestures affecting the control surface
have to result in perceptible changes of sound and music parameters” (Mulder, 1998).
In the case of musical instruments audible feedback and sound output is a fundamental descriptor.
However it would be disturbing to the audience and user if spatial audible feedback was given
which was related to the instrument's position rather than the musician's musical expression. For
example, bleeps to tell a performer that they are out of range would be a distraction and a possible
performance spoiler. For this specific problem a more musical sonification could prove useful,
such as the volume reducing as the performer reaches the edge of their space.
2.1.1.1 Graphical User Interface Design Issues
The Graphical User Interface (GUI) is a very important part of modern day technologies.
Companies now look for 2D GUI ergonomists who spend their time creating intuitive GUIs.
“Cognitive ergonomics or cognitive engineering is an emergent branch of ergonomics that places
particular emphasis on the analysis of cognitive mental processes – e.g., diagnosis, decision making
and planning – required of operators in modern industries.” (Wikipedia, 2006)
Some of the factors which contribute to good cognitive design are:-
• A smaller visual distance between tasks.
• Fewer tasks to reach the desired goal.
• Grouped function families.
A method has been devised called Hierarchical Task Analysis (HTA) where each mental and
physical task is drawn into a methodical flow chart. This design flow diagram can be analysed and
decisions made to increase the interface’s effectiveness. Though this can be extremely complicated
involving thousands of steps when considering the above factors for each process throughout the
application.
Imagine that the goal within a particular GUI is to increase audio volume. Firstly the user would
need to find the volume control on the GUI (a mental process), then they would receive visual
feedback about the graphical slider or knob, which would tell them the position of the volume. If it
was understood that the volume control was at its maximum then the goal would be abandoned, if
not the user would grasp the mouse pointer and move it until it was hovering above the visual
volume control. They would then click on the control and then move it. At this stage there exists a
mental question whilst moving the mouse with the mouse button pressed down, “Is the volume loud
enough” and if it is then the mouse button would be released for the mission to be accomplished. A
study on the subject of HTA can be found in (Shepherd, 2001).
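As an illustration only, the volume-adjustment example above could be written out as a simple task hierarchy. The structure and task names below are hypothetical and are not taken from any particular HTA tool; the sketch merely shows how such a decomposition might be represented programmatically.

#include <iostream>
#include <string>
#include <vector>

// Illustrative sketch: the volume-control example expressed as an HTA task tree.
struct Task {
    std::string description;
    std::vector<Task> subtasks;
};

void printTask(const Task& task, int depth = 0)
{
    std::cout << std::string(depth * 2, ' ') << "- " << task.description << "\n";
    for (const auto& sub : task.subtasks) printTask(sub, depth + 1);
}

int main()
{
    Task increaseVolume{"Increase audio volume", {
        {"Locate the volume control on the GUI (mental search)", {}},
        {"Read the current level from the slider (visual feedback)", {}},
        {"Move the pointer over the control and press the mouse button", {}},
        {"Drag until the level sounds loud enough (audible feedback)", {}},
        {"Release the mouse button", {}}
    }};
    printTask(increaseVolume);
    return 0;
}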
2.1.2. Tangible Interfaces using Vision Systems
This section introduces commodity web camera technology and describes a number of audiovisual
tangible interfaces (those which are tactile and can be grasped) using vision systems.
Vision based systems have advantages and disadvantages. However web cameras are becoming
increasingly affordable with improved quality, faster USB 2 interfacing, greater resolution (greater
pixel count), motorised movement, zoom with the ability to be wireless, increased frame rate, and a
wider field of view (See figure 2.6). The Creative “Game Star Webcamera” comes with ten
eyegames (similar scheme to those available using the Sony Playstation Eyetoy.) Perhaps a web
camera could be branded by an audio equipment manufacturer and come supplied with a number of
audio interfacing applications. The main disadvantage of web camera technologies is occlusion (when
an object obscures the camera's view, or moves in front of the object being tracked). However, using
multiple webcameras arranged at different perspectives so that the object is always in view can help
to overcome this problem. Superstores now sell web cameras in pairs. The author was excited by
the possibilities of high speed USB 2 web cameras with motor control. These allow for faster data
transfer in the image grabbing process and the ability to map the camera's tracking to tilt and pan, so
that gestural movement over a greater distance could be tracked. The Creative Live Motion boasts a
rotation angle of 200º horizontal and 105º vertical. The general price range of web cameras at
present is from £3 for a basic USB 1.0 model, £10 for a high quality USB 2 model, and £70 for a high quality
USB motor equipped web camera. During the time of this project the Creative Live Motion web
camera could be found on sale at half price from E-buyer. Suffice to say, it is a rapidly developing
market.
Figure 2.6: Showing the motorised Creative Live Motion web camera
(Reproduced from Creative Labs Website)
The Creative Live Motion web camera comes supplied with a face tracking algorithm which
automatically moves the camera to focus on a person's face or zoom out to fit in multiple persons.
On the whole, the technology seemed to be somewhat in its infancy because it was not very robust and needed
a great deal of time before it locked onto facial characteristics.
The following tangible interfaces are similar in many ways and are also all at different stages of
development. PhD research by (Fitzmaurice, 1996) defines graspable (tangible) user interfaces as
providing access to multiple objects which can be moved and manipulated at the same time,
allowing multiple persons to interact with spatial arrangements. Tangible user
interfaces (TUIs) are described by (Fitzmaurice, 1996) as superior to graphical user interfaces
due to the reduction in interactive tasks. Fitzmaurice states that the GUI requires a three
stage process, reinterpreted here as follows:
1. Take hold of the controller (commonly the mouse),
2. Find the visual device
3. Manipulate it for the end result,
whereas the TUI only requires two stages.
1. Take hold of the controller,
2. Manipulate it for the end result.
“By using physical objects, we not only allow users to employ a larger expressive range of gestures
and grasping behaviours but also to take leverage from a user's innate spatial reasoning skills and
everyday knowledge of object manipulations.” (Fitzmaurice, 1996).
(Fitzmaurice, 1996) also discusses how TUIs provide interactive clues, derived from the object itself,
about how one might use that particular object. The example given was a
pair of scissors, where the finger handles guide one towards how it might be used.
It was found that human beings exploit their spatial awareness to increase productivity. The example
given involved the letters S P A C E M A T T E R S printed onto individual blocks left randomly in
groups and on top of each other in towers. The subject was asked to arrange these into two block
groups to spell the words “Space” and “Matters.” There were two limitations: 1. only one
block can be moved at a given time; 2. a block cannot be moved if another is on top of it. It was
observed that the human approach taken was the easier route by arranging the blocks horizontally on
the table top rather than vertically in columns. This also reduced the number of steps needed to
reach the goal.
2.1.2.1 Audio D-Touch
Audio D-Touch is a tangible user interface for music composition and performance developed and
created at the University of York. It uses a “consumer grade web camera and customizable block
objects to provide an interactive tangible interface for a variety of time based musical tasks such as
sequencing, drum editing and collaborative composition” (Shelley et al., 2003)
The Synthesis Toolkit (STK) is a free synthesis class library for C++ and was used in Audio D-
Touch by (Shelley et al, 2003).
It is freely downloadable from http://ccrma.stanford.edu/software/stk/
Audio D-Touch tracks the position of blocks by means of a fiducial (special pattern). These patterns
can be seen in figure 2.7 as square black and white grids containing positive and negative signs. The
recognition of these fiducials is done by a set of complex algorithms, which are discussed in section
2.4 (Vision Tracking algorithms). However the fiducials are only detected in two dimensions; their
real world stance i.e. the depth from camera has not been implemented. This means that lifting them
above the table top or twisting them from left to right does not produce another separate output
parameter.
(Costanza 2006)
Figure 2.7: Audio D-touch (a) Detail of physical sequencer (b) Augmented Musical Stave
(Reproduced from Shelley et al 2003)
Video Demonstrations of the interface can be seen on the attached DVD in the following path:-
Research Documents and Videos\HCI\Vision Systems\Audio D-Touch
2.1.2.2 ReacTABLE
“ReacTABLE is a state of the art multi-user, electro-acoustic music instrument with a tabletop
tangible user interface.” (ReacTABLE, 2006) At ReacTABLE's core is a
software program which tracks fiducials and creates adaptive graphics.
ReacTABLE's fiducials are a development of those used in Audio D-touch. It incorporates a table
underneath which a projector augments graphical, dynamic animations onto the surface. The
camera underneath recognises the objects placed on the table allowing multiple users to interact with
it simultaneously. It incorporates an audio engine using ‘Open Sound Control’ (OSC, discussed
further in section 2.5 – Protocols) and the software and source code is downloadable from the
ReacTABLE website. (ReacTABLE, 2006) It can be used on Win 32, MacOS X or Linux and works
with any webcam with a proper driver.
The website includes a number of up to date publications in the area of fiducial tracking and tangible
user interfaces. However at present the fiducials are only tracked in two dimensions. They do not
provide a third, depth dimension, which is necessary for
extracting data about the fiducial's position in 3D space. (Costanza, 2006)
The various objects which can be placed onto the table surface are different in shape and feel
providing haptic feedback. These different objects also have different fiducials, and have been
assigned to particular synthesis parameter groups. So a performer knows what an object's synthesis
function is before they have even introduced it.
The fiducials shown in Figure 2.8 are configured onto the sides of a hollowed out plastic cube so
that different synth instances exist on a single physical object. This means one can simply rotate the
block to reach the functionality of a fiducial on a perpendicular or opposite face rather than
introducing a completely new object and fiducial onto the table. (Kaltenbrunner, 2006)
Figure 2.8 ReacTABLE Photos (a) Interactivity with projection and fiducials
(b) Fiducials on cubes. Reproduced from (ReacTABLE, 2006)
Videos of the interaction can be seen on the DVD in the following path:-
Research Documents and Videos\HCI\Vision Systems\Reactable
2.1.2.3 Open Illusionist
Open Illusionist (OI) is a software framework that allows you to create augmented interactivity.
Like ReacTABLE it has also been implemented for use with a table top. The Open Illusionist team
have already created a number of applications, and have named each application an “illusion”.
‘Robot Ships’ is an Open Illusionist illusion and uses sophisticated algorithms to detect the edges of
objects which are placed on a table. The projector projects an oil rig, moving oil tankers and rescue
craft onto the table’s surface from above. Smashing one of the oil tankers creates a pool of oil.
Rescue boats and cleaning craft are then sent out from the oil rig to retrieve the oil and rescue
survivors. The craft find a way round anything you place on the table eventually reaching the
spillage and survivors. See Figure 2.9a. Each of the boats act as agents. Agents contain programs
which decide what they do as individuals.
Another illusion is ‘Fish Tank’, where edge detection of human movement decides which direction
the fish flee in; the PC screen provides graphical feedback, as can be seen in figure 2.9b.
A fiducial Library has been implemented and has been integrated with Open Illusionist. The system
at present can augment a cube onto the surface of one of the fiducials so that even when you move
the fiducial and twist it in space, the cube stays stuck to the surface and changes perspective and
distance accordingly. See figure 2.9c. This is because the fiducial library contains additional
programming code for pose or stance detection, effectively capturing the fiducial's position and tilt in
our three dimensional world.
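The augmentation described above depends on recovering the fiducial's pose (a rotation and a translation relative to the camera) and then projecting virtual geometry through a pinhole camera model. The following is a minimal sketch of that projection step only; the intrinsic values, the identity pose and the variable names are assumptions for illustration and are not taken from the Open Illusionist code.

#include <cstdio>

struct Vec3 { double x, y, z; };

// p_camera = R * p_fiducial + t : the pose places fiducial coordinates in camera space.
Vec3 toCameraSpace(const double R[3][3], const Vec3& t, const Vec3& p)
{
    return { R[0][0]*p.x + R[0][1]*p.y + R[0][2]*p.z + t.x,
             R[1][0]*p.x + R[1][1]*p.y + R[1][2]*p.z + t.y,
             R[2][0]*p.x + R[2][1]*p.y + R[2][2]*p.z + t.z };
}

int main()
{
    const double f = 500.0, cx = 320.0, cy = 240.0;    // assumed pinhole intrinsics (pixels)
    const double R[3][3] = {{1,0,0},{0,1,0},{0,0,1}};  // identity rotation for illustration
    const Vec3 t = {0.05, 0.0, 0.5};                   // fiducial half a metre in front of the camera
    const Vec3 corner = {0.025, 0.025, 0.05};          // one corner of a 5 cm virtual cube

    Vec3 c = toCameraSpace(R, t, corner);
    double u = cx + f * c.x / c.z;                     // perspective division gives the pixel position
    double v = cy + f * c.y / c.z;
    std::printf("cube corner projects to pixel (%.1f, %.1f)\n", u, v);
    return 0;
}

Repeating this projection for all eight cube corners on every frame is what keeps the drawn cube stuck to the fiducial as it is moved and tilted.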
It boasts a frame rate of 30 frames per second on a 3GHz Intel Pentium 4 Processor with 2GB of
RAM. The Library also tracks multiple fiducials, and the maximum limit can be set in the program
code. However the frame rate reduction for tracking multiple fiducials is negligible. (Open
Illusionist 2006)
Figure 2.9: Open Illusionist images of (a) left Robot Ships Illusion (b) middle Fishtank
Illusion Screen Shot (c) right Augmented, red, wireframe cube onto fiducial
surface.
2.1.2.4 Others
There exist many other tangible user interfaces varying slightly in their application and approach,
and www.pixelsumo.com includes some of the latest inspiring audiovisual interactive art works.
The ReacTABLE website contains a section titled ‘related’, where even more recent tangible user
interfaces are displayed, including the audio and the music cube.
http://www.iua.upf.edu/mtg/reacTable/?related
There are many tangible, augmented user interfaces. However they seem to rely on the use of a flat
table and a projector which projects up onto the table surface from underneath. The literature review
has shown no TUIs which target the applications of 3D audio, which will be discussed in section
2.7 (Audio Applications) at the end of this literature review.
2.2 Six Dimensional Electronic Hardware Interfaces
This section describes three hardware based interfaces, the first two of which attempt to solve the
problems of manipulating 3D audio, and the last of which targets MRI and seismic data.
(Paschalidou 2003) discusses a multi-parameter control for Audio
Mixing developed at MIT. Attempts by Craig Wisneski and Ed
Hammond to create a 6D controller (see figure 2.10) using
hardware without haptic feedback led to confusion and the
necessity to reduce the number of degrees of freedom.
Figure 2.10: Photo of first attempt. (Reproduced
from Wisneski et al 1998)
The second attempt was inspired by SPIDAR developed by Ishii
and Sato at the Tokyo Institute of Technology. This was a ‘totally
hardware’ based approach where a ball was suspended in air by
strings (see figure 2.11). Moving the ball around gave positional
feedback. (Wisneski et al 1998)
Figure 2.11: Photo of second attempt. (Reproduced
from Wisneski et al 1998)
The Cubic Mouse was developed by Frohlich et al, (2000), see figure 2.12. They describe a new
device for “Three-Dimensional Input”: “Potentiometers are used to measure the positions of the rods.
A Polhemus Fastrak sensor provides the spatial position and orientation information for the Cubic
Mouse.” The Polhemus magnetic sensor is described in the next section on alternative
technologies.
However (Frohlich et al, 2000) also describe several interesting
applications which include:-
• Data exploration for the Oil and Gas Industry,
where the cube is used to navigate a subsurface virtual 3D model
• Visualising Volumetric Medical Data, Such as CT,
MRI and PET scanners. The cubic mouse was used to position
and view cross sections.
Figure 2.12: Photo of Cubic Mouse. (Reproduced
from Frohlich et al, 2000)
2.3 Alternative tracking technologies
There are a number of alternative tracking technologies. This section aims to critically analyse each
technology for the application of both orientation and positional output. (Johnston 2001) describes a
series of tracking technologies including, Acoustic, Laser, Inertial, Electromagnetic and Vision,
detailing the possible benefits of their hybrids.
2.3.1 Acoustic
Ultrasonic range finding devices are available which send sonic pulses above the human hearing
range and measure the time taken for their return (see figure 2.13). These devices need to be on axis
to the object they are measuring their distance from. Interference problems exist when using
multiple pre-built ultrasonic range finding devices in close proximity, and therefore it is necessary
for the devices to be spread a distance apart. This was found by experimentation in the author’s
undergraduate major project “The Airstation” (Ozsan 2004). These devices would be ill suited to
ascertaining the absolute rotational position of a cube. However they do have a reasonable range.
Commercial musical devices using this technology exist such as Midi Gesture and Sound Beam,
however these do not attempt to solve the problems associated with the control of 3D Audio.
Figure 2.13: SRF04 Ultrasonic Range Finder by Daventech,
Receiver and Transmitter
http://www.acroname.com/robotics/parts/R93-SRF04.html
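The range calculation itself is straightforward: the device measures the round-trip time of the pulse, and the one-way distance follows from the speed of sound. The sketch below assumes a speed of sound of roughly 343 m/s at room temperature; the SRF04's own interface details are not shown.

#include <iostream>

// Convert an ultrasonic echo (round-trip) time to a one-way distance estimate.
double echoTimeToDistance(double echoSeconds, double speedOfSound = 343.0)
{
    return 0.5 * speedOfSound * echoSeconds;   // metres (half, because the pulse travels there and back)
}

int main()
{
    double echo = 0.0058;                                       // example round-trip time of 5.8 ms
    std::cout << echoTimeToDistance(echo) << " m" << std::endl; // roughly 0.99 m
    return 0;
}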
2.3.2 Laser
“Lasers are usually used in combination with vision systems to scan the surface shape of objects to
provide a depth field rather than an object's position.” (Johnston 2001) Lasers measuring depth are
somewhat similar in approach to ultrasonics but are more accurate; however, accuracy and
distance with this type of technology come at a great cost. Break beam sensors (i.e. on / off
measuring lasers) are much cheaper, but are inappropriate for capturing continuous controller
information (Ozsan 2004)
2.3.3 Inertial and GPS
Inertial sensors are based on measuring acceleration. “In just 60 seconds, a one-dimensional Inertial
Measurement Unit (IMU) using an accelerometer with an output noise level of just 0.004 g yields a
positional uncertainty of about 70 metres. It is to do with the time scale involved, An inertial device
will only be positionally accurate over a short time (sub seconds!) so it needs to be used with
another device to continually correct its drift.” (Johnston, 2001)
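As a rough check on the quoted figure, and treating the noise as an uncorrected constant bias over the interval (a simplifying assumption), double integration of a constant acceleration a over a time t gives a displacement of

    x = ½ a t²

With a = 0.004 g ≈ 0.004 × 9.81 ≈ 0.039 m/s² and t = 60 s, this gives x ≈ 0.5 × 0.039 × 60² ≈ 70 m, which matches the positional uncertainty quoted above and illustrates why an inertial device must be paired with another technology that can continually correct its drift.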
Hybrids sensors exist that combine Survey Grade Global Positioning Systems (GPS) Receivers and
Inertial Navigation System (INS) technology. See figure 2.14.:
The cost varies from 20 to 70 pounds, and the unit is 234x108x63 mm in size. The device is not really
aimed at use in small spaces: “If the RT3000 is constrained to operating in a very small area, it is
unlikely that it will be properly warmed up; this might degrade accuracy.” (Datron, 2006)
This is only accurate down to 2 cm over a long time but this value varies with the satellite signal,
which is sometimes good, other times not. It needs to be connected to one or two antennas, power
and a serial cable to receive the data back. The obvious advantages are that it can be used over any
distance, as long as the altitude does not exceed 18,000 m. The weight (1.7 kg), number of
connections and signal-dependent accuracy, however, make it an inappropriate gestural controller for
commodity control of 3D audio. (Datron, 2006)
Figure 2.14: RT3000 Inertial and GPS Navigation System Hybrid for measuring motion,
position and orientation. (Reproduced from www.oxts.co.uk)
2.3.4 Electromagnetic
Inition Ltd (Inition Ltd 2006) sell a number of products.
The Polhemus Liberty is a magnetic sensor based kit tracking six degrees of freedom (see figure
2.15). The Base unit can be upgraded to allow 16 sensor inputs. The prices start at a base price of
£4,750 with one sensor allowing up to four inputs. This base unit then increases to £11,645 with
one sensor but allowing up to 16 inputs. Each individual sensor addition costs £370, the range of
which extends up to 2.52 metres (for extra cost). The further away, the higher the degradation. The
optimal range from the transmitter is within 0.9 metres.
Figure 2.15: Showing photo of Polhemus Liberty.
(Reproduced from Inition Ltd 2006)
The Polhemus Patriot (see figure 2.16) is slightly less expensive; costing £1,550.00 with six DOF
(positional and orientational) but only comes with two possible inputs.
Figure 2.16: Showing photo of Polhemus Patriot.
(Reproduced from Inition Ltd 2006)
In both cases the magnetic field is distorted by metallic objects put within its range. Ascension
Technology Corporation (Ascension 2006), like Inition, sell a number of products. The “Flock of
Birds” (see figure 2.17) uses magnetic field detection with orientation and positional output.
However it uses pulsed DC magnetic technology which makes it less susceptible to distortion
caused by nearby metal. It costs $2,495 with one sensor and the price increases with more sensors,
achieved by chain linking the Flock of Birds boxes.
Figure 2.17: Showing photo of Ascension Flock of Birds.
(Reproduced from Ascension 2006)
The 6D Mouse is a pointing device (see figure 2.18) for use with Ascension trackers supporting a
serial interface. “It contains an embedded DC magnetic sensor for continuously tracking its position
and orientation.” The mouse costs $795 but must be used in conjunction with the Flock of Birds
($2,495).
Figure 2.18: Showing photo of Ascension 6D Mouse
(Reproduced from Ascension 2006)
One of the primary benefits of magnetic tracking is that no line of sight is needed. Ascension are
focused in their technology, but do not have any cheap vision based systems.
2.3.5 Reflection based Vision Systems
VICON PEAK (Vicon 2006) use a number of optical, high resolution camera solutions to track
position and orientation ( see Figure 2.19). Communication with Andy Ray revealed the following:-
The system is based on reflective markers which are positioned on an object or person. At least two
cameras are needed to track the reflective markers. Multiple cameras can be linked together, and
this very much depends on the application (See Figure 2.20 for multiple camera linking). “Feature
Films use systems in excess of 200 cameras for 15 person real time capture with hands and face
detail or up to 50 subjects at once for post process capture, game developers use from 6 – 100
cameras again dependant on what they wish to capture and Gait labs can run with just 5 cameras up
to 24 being the largest camera count for a Gait lab.
Therefore optical camera systems have a large varied price; as a guide you would be looking at
£50,000 for a very basic 6 camera system and in excess of £1.5 million for a system with over 200
cameras.” (Vicon 2006)
“The second method of motion capture is via the video based system called Peak Motus. This uses
anything from standard DV cameras to high end HD cameras used for creating 3D analysis.
However this technique is used predominantly for Biomechanical sports analyses and does not
provide a Real time output. The software for this solution starts around £15,000 for the 3D type and
will be around £15,000 for the hardware required.” (Vicon 2006)
After the recording the Vicon software reconstructs the trajectory of each ball, although the system
also provides realtime feedback down to 3ms latency.
Figure 2.19: Showing photo of Vicon professional camera (Reproduced from Vicon 2006)
Figure 2.20: Showing photo of Vicon Motus 3D tracking system.
(Reproduced from Vicon 2006)
2.3.6 Alternative Electronic Interfaces Summary
In conclusion, the alternatives to commodity vision system technologies are extremely
expensive, and although they do have some advantages, they do not follow the trend set by the
populated world of free Virtual Studio Technology and cheaper computing, which has led to home
studios and bedroom producers.
2.4 Existing Vision System Environments
There are a number of emerging environments to help programmers of the visual community. The
following section examines their benefits and drawbacks to help decide which is most applicable.
2.4.1 Eyesweb
Eyesweb is a programming environment developed by the visual community. It is visually based
and open source, much like Pure Data, and requires you to link objects (blocks) using inlets and
outlets. It is freely downloadable from the internet (Eyesweb, 2006).
“An alternative digital effects controller” by Paschalidou (2003) employed vision system technology
(webcam) to capture coloured finger hand gestures against a white background. These gestures
were mapped using high level perceptual parameters to a reverb VST using the audio programming
environment Pure Data. This area of research is closely related, and the findings gave this project a
boost forward.
Paschalidou (2003) describes problems with the use of the colour tracking algorithms (objects) in
Eyesweb.
“It was found very difficult to extract information from Eyesweb that would stay steady and would
exactly represent our hands position and motion” she further says “the system has been very
sensitive with lighting and background conditions in the colour tracking analysis process” (for
example confusing bright yellow with skin). Therefore a white T-shirt with coloured thimbles was
adopted, with the camera pointing towards the person. (Paschalidou 2003)
These problems exist due to the nature of lighting conditions and how a camera detects colour
through the process of segmentation.
Colour perception can be described by the resultant colours of light that the object does not absorb,
and which are therefore reflected. So for example if a full spectrum of light were shone on an object which
absorbed all the colours of light apart from blue, it would appear blue. Pigments selectively absorb
particular colours of light. If equal intensities of Red Green and Blue shone on an object which
absorbed Blue, then the colours reflected would be Red and Green. When these colours are
reflected, the object would appear yellow. Hence variations of the incident light would create
different coloured reflections and confusion for the camera. If you can imagine that every
environment contains a different selection of objects which all reflect different colours and barriers
which change the lighting in different parts of the room including artificial and natural lighting at
different times of the day, the problem becomes extremely complicated.
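A small numerical sketch makes the brittleness concrete: a fixed RGB threshold tuned under one light source can miss the very same surface under dimmer or warmer illumination. The threshold and lighting values below are illustrative only, and the per-channel reflectance-times-illumination model is the simplification described above.

#include <cstdio>

struct RGB { double r, g, b; };

// Is a pixel "yellow" according to a threshold tuned under full white light? (illustrative values)
bool isYellow(const RGB& p)
{
    return p.r > 180 && p.g > 180 && p.b < 80;
}

int main()
{
    RGB reflectance  = {0.9, 0.85, 0.1};    // surface reflects red and green, absorbs blue (appears yellow)
    RGB whiteLight   = {255, 255, 255};     // even illumination
    RGB dimWarmLight = {170, 140, 110};     // dimmer, warmer illumination

    RGB seen1 = {reflectance.r * whiteLight.r,   reflectance.g * whiteLight.g,   reflectance.b * whiteLight.b};
    RGB seen2 = {reflectance.r * dimWarmLight.r, reflectance.g * dimWarmLight.g, reflectance.b * dimWarmLight.b};

    std::printf("white light:    %s\n", isYellow(seen1) ? "detected" : "missed");  // detected
    std::printf("dim warm light: %s\n", isYellow(seen2) ? "detected" : "missed");  // missed
    return 0;
}

Fiducial tracking sidesteps this problem because detection relies on the geometric structure of a high-contrast black and white pattern rather than on absolute colour values.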
However the decision to use colour tracking is dependent on the application. The Interactive Audio
Environment project at the sonic arts research council is an example of such an application, where
colour change was used to respond to movement in a space. The advantage of using colour
detection in this situation is that anyone walking into the space can interact with the system, without
needing to stick fiducials on themselves.
Setting up the Eyesweb colour blob tracker with a short movie displaying a dancer wearing different
colours demonstrated the crosshair jitter when calculating the centre of the blob (see figure 2.21
below). Also some colours, such as the dancer's blue right foot on the darker flooring, would be
unrecognised although visible to the camera. This is represented by the negative value at the bottom
of the Blobs COGs (centre of gravity) table, and can be seen in the Tracked blobs window as a
rectangle without a crosshair.
Figure 2.21: Showing Eyesweb's Track Colour Blob object tracking six differently coloured
blobs on a dancer with a white background. Reproduced from Eyesweb
John Robinson and Dan Parnham within the department have also expressed the difficulties in using
pixel count colour filtering algorithms for position tracking. (Ozsan, 2004) also experienced
problems using colour filtering in the use of a CMUcam from “Active Robots”.
Communication with Volpe (2006) and evaluation of the latest version of Eyesweb confirmed that
it is constructed from a combination of ‘low level’ algorithms which perform particular tasks.
Groups of these tasks have been combined into objects such as the colour blob tracking block.
However, none have been combined to track fiducials (described in the fiducials section). This is
because fiducials use a very specific set of algorithms created specially for tracking the associated
fiducial (pattern). The combination of these fiducial tracking algorithms is an art in itself, and it
would be very difficult for Eyesweb to integrate and recognise every form of fiducial placed in front
of it. However, Gualtiero Volpe was interested in an integrated fiducial library set and Eyesweb
blocks which could be used in different configurations for gestural tracking.
2.4.2 Clip
The Class Library for Image Processing (CLIP) is an extensive library set created by John Robinson
for programming image processing. CLIP is implemented as a single header file called picture.h. It
has the capability to communicate with a web camera so that a frame grabber can be programmed.
However, it does not include an in-built fiducial library. The MECAD lecture series at York
provides a good understanding of how the classes operate. CLIP is downloadable from the internet
(CLIP, 2006).
2.4.3 Open Illusionist and fiducial library
The demonstration of the tracking algorithms by Dan Parnham showed great promise. The
augmented feedback, although on a computer screen (i.e. 2D), accurately projected a cube onto the top
of a 2D fiducial print-out: ‘The perspective of the virtual cube would change as the fiducial was
tilted backward and forward so that the cube was always sitting centrally and flat against the top of
it.’
2.5 Protocol
2.5.1 Open Sound Control (OSC)
Open Sound Control (OSC) is a ‘protocol for communication between computers, synthesizers and
other multimedia devices that is optimized for modern networking technology.’ It is becoming more
widely used by the designers of new musical interfaces. (Wright et al 2003)
An OSC system can be queried for its features and functionality, providing information so that
mappings to it can be made accordingly.
It is based on hierarchical, URL-style addressing which points to ‘nodes’ in OSC address spaces. For
example, Madden et al (2001) show that ‘to play the “glock” instrument at some pitch at some
loudness, the OSC command is /glock/playnote note loudness.’
Designed for high-speed systems with bandwidths of over ten megabits per second, it is roughly 300
times faster than MIDI (31.25 kilobits per second). Because of this extra bandwidth, data can be
encoded in 32-bit or 64-bit form, providing symbolic addressing and time-tag messages. (Wright, Freed
1997)
OSC delivers its data in packets (datagrams), which are arranged as follows (Wright, Freed 1997):
“The basic unit of Open Sound Control data is a message, which consists of the following:
• A symbolic address and message name
• Any amount of binary data up to the end of the message, which represent the
arguments to the message.
An Open Sound Control packet can contain either a single message or a bundle. A bundle consists
of the following:
• The special string "#bundle" (which is illegal as a message address)
• A 64 bit fixed point time tag
• Any number of messages or bundles, each preceded by a 4-byte integer byte count”
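As a concrete illustration of the message layout quoted above, the following sketch hand-encodes the /glock/playnote example as a single OSC message (address, type-tag string and big-endian arguments, each padded to four-byte boundaries) and sends it as one UDP datagram using POSIX sockets. The destination port and the choice of an integer note with a float loudness are assumptions made for the example; in practice an existing OSC library would normally be used.

#include <arpa/inet.h>    // htonl, inet_pton
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append a string, null-terminated and zero-padded to a 4-byte boundary (OSC rule).
static void padString(std::vector<char>& buf, const std::string& s) {
    buf.insert(buf.end(), s.begin(), s.end());
    buf.push_back('\0');
    while (buf.size() % 4 != 0) buf.push_back('\0');
}

// Append a 32-bit big-endian integer.
static void putInt32(std::vector<char>& buf, std::int32_t v) {
    std::uint32_t be = htonl(static_cast<std::uint32_t>(v));
    const char* p = reinterpret_cast<const char*>(&be);
    buf.insert(buf.end(), p, p + 4);
}

// Append a 32-bit big-endian float (bit pattern reinterpreted).
static void putFloat32(std::vector<char>& buf, float v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, 4);
    putInt32(buf, static_cast<std::int32_t>(bits));
}

int main() {
    // Message: /glock/playnote <int note> <float loudness>
    std::vector<char> msg;
    padString(msg, "/glock/playnote");
    padString(msg, ",if");            // type tags: one int32, one float32
    putInt32(msg, 60);                // note (a MIDI-style number is assumed)
    putFloat32(msg, 0.8f);            // loudness

    // Send the packet as a single UDP datagram (port 9000 is arbitrary).
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(9000);
    inet_pton(AF_INET, "127.0.0.1", &dest.sin_addr);
    sendto(sock, msg.data(), msg.size(), 0,
           reinterpret_cast<sockaddr*>(&dest), sizeof(dest));
    close(sock);
}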
OSC can be found in a number of environments including Max/MSP, PD, CSound and
SuperCollider. It has a number of benefits, including the ability to separate visual processing (i.e. the
GUI) and audio processing onto different computers, and it is quite open-ended. (Zumwalt 2003)
Updated information about OSC can be found at http://cnmat.berkeley.edu/OpenSoundControl/
2.6 Mapping
Unlike acoustic instruments, in which the timbre is derived from the inseparable playing interface
and sound source, electronic musical instruments consist of three main parts: the technological
hardware input, mapping and sound output. Mapping (also termed the mapping layer) is the
intermediary stage which connects input hardware (performer actions) to system parameters
(synthesis inputs).
Hunt et al (2002) demonstrated the “dramatic effect that mapping can have on bringing the interface
to life.” Hunt et al (2000) also described two types of mapping, ‘Explicit’ and ‘Implicit’; see figure
2.22 below.
Figure 2.22: Mapping of performer actions to synthesis parameters (Hunt et al 2000).
2.6.1 Explicit Mapping Strategies
One-to-One (Simple)
Each independent output gesture is assigned to one (low-level) sound parameter. Hunt and Kirk (2000)
found that although this is preferred by beginners, it was not as inspiring for experienced performers:
because it was easy, performers spent less time mastering it.
Figure 2.23: One to One Mapping
One-to-Many (Divergent)
One output gesture is used to control more than one
simultaneous musical parameter. This is good for macro
control, but is lacking when finer simple parametric
adjustments are needed.
Figure 2.24: One to Many mapping
Many-to-One (Convergent)
Many gestures are coupled to one musical parameter.
This strategy requires more practice with the system in
order to achieve effective control. It proves far more
expressive than simple mapping.
Figure 2.25: Many to One mapping
Many-to-Many (Complex)
This is a mixture of the above strategies, where gestures are interwoven to control a number of
synthesis parameters. Results from Hunt et al (2002) show that performers spent more time using
the complex version of the interface; although they were frustrated by its complexity, they found it
more engaging. “Several people commented that they would like to continue with it outside of the
tests” (Hunt, Wanderley 2002) (Wanderley 97)
The explicit mapping strategy used in the project very much depends on the end application.
For timbral space it would be interesting to map the cube's six degrees of freedom to high-level
timbral descriptors, which are discussed in the applications section under 3D timbral
manipulation. This would be a complex mapping.
However, for the application of spatial audio, namely ambisonics, the cube would initially be mapped
simply to panning positions in space.
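The difference between these strategies can be sketched in code. In the hypothetical C++ fragment below (the gesture and parameter names and the weightings are illustrative, not taken from any of the cited systems), a one-to-one mapping copies each gesture value straight to one synthesis input, whereas a convergent many-to-one mapping combines several gesture values into a single parameter.

#include <algorithm>

// Gesture values produced by the input hardware, normalised to 0..1.
struct Gestures { double x, y, tilt, speed; };

// Synthesis inputs expected by the sound engine (names are illustrative).
struct SynthParams { double pitch, brightness, loudness; };

// One-to-one (simple): each gesture drives exactly one parameter.
void mapOneToOne(const Gestures& g, SynthParams& p) {
    p.pitch      = g.x;
    p.brightness = g.y;
    p.loudness   = g.speed;
}

// Many-to-one (convergent): several gestures are combined into a single
// parameter; here brightness responds to tilt, vertical position and
// movement speed together, so it takes practice to control precisely.
void mapManyToOne(const Gestures& g, SynthParams& p) {
    p.brightness = std::clamp(0.5 * g.tilt + 0.3 * g.y + 0.2 * g.speed,
                              0.0, 1.0);
}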
2.6.2 Implicit Mapping (Generative)
This method derives its mapping strategy using a learning-based, generative system such as an
Artificial Neural Network (ANN). ANN implicit mapping works differently from explicit mapping:
explicit mapping involves the user setting predefined connections, which the user can change,
whereas an implicit system generates its own mapping based on information it receives during the
learning time. There are advantages and disadvantages to both systems. However, for the purpose of
this interface, focus will be put on the explicit mapping strategies.
2.6.3 Mapping Layers
Sometimes when designing the mapping network it is useful to divide the mapping stages into
layers. The first layer of the RIMM project (Hunt, Wanderley, 2002) involved the mapping of the
sensor inputs to meaningful parameters, such as ‘Sax Lift’, ‘Brightness’ and ‘Energy’ (see figure 2.26).
Mapping layer two splits the sound parameters and graphical parameters of the project. Finally, the
last layer involved the mapping of these to three different sets of parameters: the synthesis engine,
the 3D sound engine and the graphical engine.
Figure 2.26: Mapping Layers within the RIMM project (Hunt, Wanderley, 2002)
2.6.4 Metaphors for musical control
A metaphor in this context is a gesture or series of interactive operations used to reach a result; for
example, the click, drag and drop used when moving files on a computer with a mouse is a
metaphor.
Fels et al (2002) “defined a two-axis transparency framework that can be used as a predictor of the
expressivity of a musical device. One axis is the player’s transparency scale (does the mapping
make sense to the player?), while the other is the audience’s transparency scale (do the performers
actions make sense to the audience?).” The transparency or opaqueness of an interface’s mapping
can therefore be a predictor for its expressivity.
The concept of expressive transparency is used later in this project to describe the transparent
mapping used to move a sound in 3D space with a 3D controller.
2.7 Audio Applications
There are a number of applications which the interface targets, including multiple-media-related
environments such as Maya by Autodesk and 3ds Max by Discreet, which are both 3D graphics
applications for film, media and product design. Although 3D graphics applications may be
employed to develop the GUI of the interface, focus will be on two feasible 3D audio applications.
2.7.1 Timbral Manipulation
Unlike pitch and loudness, which can each be represented by a single value, timbre has many
different facets. A common goal in timbre research is to find a way of grouping sounds, and to
portray these differences accurately.
2.7.1.1 Multidimensional perceptual scaling
Multidimensional perceptual scaling techniques have been used to evaluate the perceptual
relationships between musical instrument sounds (Rossing et al 2002). Grey and Moorer (1977) asked
listeners to rate pairs of instrument sounds in terms of their similarity, giving a perceptual distance
of one sound from another.
The distances of these 16 instruments were then mapped into a 3D space; see figure 2.27 below.
Figure 2.27: Illustration of how 3D timbre space was mapped Grey, (1977).
The axis labelled I in figure 2.27 represents the spectral energy distribution; axis II is the spectral
synchronicity (how the partials as a collective start and finish, and whether they start and finish at
different times); axis III represents the preceding high-frequency energy (often inharmonic). The
lines show the increasing strength of relationships between instruments inside clusters, in order:
solid, dashed, dotted. This is known as hierarchical clustering analysis. (Johnson, 1967)
There were differences in judgement dependent on the order of sound presentation. Figure 2.28 below
shows 2D spectrographic visualisations relating to the axes of the 3D space diagram (Grey 1977).
Figure 2.28: 2D spectrographic visualisations relating to the Axis’ of the 3D space diagram
The instruments presented have been given labels to identify them. (Grey, 1977)
The Key is as follows:-
O1, O2: Oboes
X1, X2, X3: Saxophones (3-mf, p, sop)
C1, C2: Clarinets
S1, S2, S3: Strings
EH: English horn
FH: French horn
TM: Trombone
FL: Flute
BN: Bassoon
I = Axis I – The spectral energy distribution. For example, at the top of figure 2.28, FH and S3 have a
narrow spectral bandwidth and a concentration of low-frequency energy. At the other extreme, TM
towards the bottom has a wide spectral bandwidth and less of a concentration of energy in the lowest
harmonics.
II = Axis II – This axis relates to the synchronicity of the attacks and decays of all the harmonics as a
collective. Towards the left, the instruments' harmonics generally tend to all start at the same time
and then end at the same time. This can be seen almost as a divide between the woodwind on the left
(X1, X3, C1), with very rectangular-looking spectra, and the strings on the right (S1, S2, S3), which
rather tend to have tapering patterns, with the exceptions of the flute (FL) and the bassoon (BN).
III = Axis III – This axis separates tones at one extreme, which have high-frequency, low-amplitude
(most often inharmonic) energy during the attack segment, towards the right, from tones at the other
extreme, which either have low-frequency inharmonics or lack high-frequency energy in the attack
portion of the sound, towards the left.
This study marked the first such scaling of naturalistic, time-variant tones which employed a three
dimensional solution for interpretability.
The exceptions to the family clustering prove to be interesting (Grey 1977) and suggest the
possibility that certain physical factors may override the tendency for instruments to cluster by
family, such as the overblowing of a flute. Articulatory features also seem to play an important part
in this non-familial clustering.
As yet, a synthesizer which navigates this space and allows the user to define their own three-
dimensional mappings has not been created.
2.7.1.2 Tristimulus
The Tristimulus synthesizer, created by Riley (2004), is two-dimensional. It employs the evolution
of a timbre's fundamental, harmonics and partials, based on analysis by Pollard and Jansson in 1982. An
approximate timbre representation, by means of a Tristimulus diagram of the note onsets and the
steady-state portions of five sounds, is given in figure 2.29 below.
Figure 2.29: Tristimulus Diagram
(Howard and Angus, 2001).
In each case the onset starts at the end of a line and then travels towards the open circle, which
represents the approximate steady-state position. ‘Mid’ at the top represents stronger second, third
and fourth harmonics (resolved); ‘High’ represents stronger high-frequency partials (unresolved); ‘f0’
represents a stronger fundamental.
However, Pollard and Jansson (1982) state that the time course is not even and is not calibrated, for clarity.
The note onsets (black lines) lasted as follows: gedackt (10-60 ms); trumpet (10-100 ms); clarinet
(30-160 ms); principal (10-150 ms); and viola (10-65 ms) (Howard, Angus 2002)
The tracks taken by the notes are very different, and the steady states also all lie in different places.
This provides a straightforward means of representing timbre, which Riley (2004) implemented in a
real-time synthesizer for Pure Data.
The Tristimulus approach does not communicate how the sound ends (i.e. what happens after
the steady-state portion of the sound).
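For reference, the tristimulus coordinates can be calculated from the harmonic amplitudes of a single analysis frame. The sketch below uses the commonly quoted simplified form (fundamental; harmonics two to four; all remaining partials), normalised so that the three values sum to one. Pollard and Jansson's original formulation is loudness based, so this amplitude-based version is only an approximation.

#include <numeric>
#include <vector>

struct Tristimulus { double t1, t2, t3; };

// amps[0] is the fundamental's amplitude, amps[1..] the higher harmonics.
// Returns the fraction of energy in the fundamental (t1 = 'f0'), in
// harmonics two to four (t2 = 'Mid'), and in the remaining partials (t3 = 'High').
Tristimulus computeTristimulus(const std::vector<double>& amps) {
    const double total = std::accumulate(amps.begin(), amps.end(), 0.0);
    if (total <= 0.0 || amps.size() < 5) return {0.0, 0.0, 0.0};

    const double t1 = amps[0] / total;
    const double t2 = (amps[1] + amps[2] + amps[3]) / total;
    const double t3 = std::accumulate(amps.begin() + 4, amps.end(), 0.0) / total;
    return {t1, t2, t3};  // t1 + t2 + t3 == 1 by construction
}

Tracking these three values frame by frame traces out the same kind of onset-to-steady-state path that the Tristimulus diagram plots.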
Timbre is multi-dimensional. Although experiments have been made to visualise this, a definitive
solution to quantify the massive world of timbre still requires much research.
2.7.1.3 The Musician's most common Use of Timbral Descriptors
Current EPSRC-funded research into timbral descriptors at the University of York (Disley et al
2006) involves identifying the adjective couples most commonly and transparently used to describe
timbre. The basis for this research is the need for musically intuitive instruments which appeal to
musicians' use of adjectives rather than the underlying, low-level technological parameters. These
include “fundamental frequency, basic waveform shape, filter cut-off frequency and filter
resonance”.
The research excludes “nasal, ringing, metallic, wooden, evolving, pure and rich” due to
disagreement between users, but supports the use of “bright, clear, warm, thin, harsh, dull,
percussive and gentle” (Disley et al 2006)
2.7.2 3D Ambisonic Manipulation
Ambisonics is a solution to the problems of encoding directions and amplitudes of sound and
reproducing them over loudspeaker systems in such a way as to fool the listener into imagining that
they are hearing original sounds correctly located.
The ambisonic setup can be 360 degrees horizontal (pantophonic) or cover a full sphere (periphonic).
The format which first-order soundfield microphones create is converted into what is called B-format,
a four-channel encoded file. However, higher-order formats do exist (Malham 2006).
The encoding equations for B-format (where A is the source azimuth and B its elevation) are as follows:
W = 0.707 (pressure signal)
X = cosA.cosB (front-back)
Y = sinA.cosB (left-right)
Z = sinB (up-down)
It is useful to understand how each of these letters contributes to the sound field: W is the pressure
signal, X corresponds to sound from the front and back, Y is left and right, and Z is the vertical channel
(up and down). A more in-depth look at the ambisonic system is given in chapter 6, titled Applications.
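A minimal sketch of first-order encoding using the equations above is given below: a mono sample is spread across the four B-format channels according to its azimuth (A) and elevation (B). The gain-only approach shown here follows the equations directly; real encoders, such as those in Dave Malham's VST set, include further refinements.

#include <cmath>

struct BFormatSample { double w, x, y, z; };

// Encode one mono sample into first-order B-format.
// azimuth (A) and elevation (B) are in radians, following the
// equations quoted above: W = 0.707*s, X = cosA*cosB*s, and so on.
BFormatSample encodeBFormat(double s, double azimuth, double elevation) {
    BFormatSample out;
    out.w = 0.707 * s;                                   // pressure
    out.x = std::cos(azimuth) * std::cos(elevation) * s; // front-back
    out.y = std::sin(azimuth) * std::cos(elevation) * s; // left-right
    out.z = std::sin(elevation) * s;                     // up-down
    return out;
}

Moving a sound then amounts to updating the azimuth and elevation per sample or per block, which is exactly the kind of continuous control the tangible interface is intended to provide.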
The easiest way to approach panning in ambisonics is to imagine a sphere in which you position a
sound. Unfortunately, due to the two-dimensional nature of common computer interfaces and
graphical feedback, this is a very complex process, in some cases requiring the operator to move a
number of one-dimensional sliders at the same time. There are a number of ambisonic shareware
VSTs available from the internet. However, for immediate experimentation it was decided to
integrate with an environment in Pure Data, “a realtime graphical dataflow node programming
environment”, with an object providing the parametric outputs. It was found that ambisonic Pure
Data objects already exist which are a conversion of Dave Malham's VST set. Therefore, using a VST~
Pure Data object to connect to external .dll VSTs is unnecessary for quick experimental purposes.
The Pure Data object which receives information from the camera will need inputs if the Pure Data
application is to make use of camera functionality such as zoom, focus, panning and tilt control. However,
this would involve delving into the web camera driver source, and obtaining this information from the
associated companies proved difficult; sometimes secrecy is necessary for a company to be a
product leader.
2.8 Literature Review Summary
Commodity vision-based position tracking is cheaper and can produce an accurate, robust, six-degree-of-
freedom output over a greater distance than the alternative electronic hardware-based technologies.
A fundamental understanding from the literature survey is that tracking position by colour
detection and filtering does not produce a robust enough output, and is affected by external lighting
conditions.
Fiducial-based tracking systems are much more robust and more affordable; they are not affected by
colour change and are more flexible because a white background is not needed for them to be
tracked. There is, however, a particular threshold lighting level needed for them to be seen.
Creating the algorithms together with the fiducials seems to be somewhat of an art form, and requires
complex thought processes to detect their presence and retrieve a reliable tracking output.
For these reasons, coupled with the future possibility of integrating a fiducial library into
Eyesweb, the project was steered towards using fiducials rather than the previously imagined
Eyesweb environment and colour tracking.
A number of different tangible augmented user interfaces have been examined, as well as their
electronic hardware alternatives. For the growing market of laptop computer musicians, powered by
freely downloadable shareware VSTs or sequencers found inside cereal packets, the alternatives are
inappropriate or unaffordable.
Two audio applications have been identified for which extensive experimental research and musical
development is needed. However, a number of other applications may result from communication
and experimentation with the interface that have not been conceived in the literature review.
CHAPTER 3. SYSTEM FEASIBILITY
This section describes the installation and procedures necessary to start programming. It looks at
the three augmented tangible user interface environments described in the Literature Review and
quickly decides which is the easiest to understand in order to achieve the aim set at the beginning of
this project. It begins by examining the fiducial (pattern tracking) technology used in each.
3.1 Fiducial Tracking Algorithms
There are a number of machine vision tracking algorithms which work in different ways; these
include edge detection, ellipse detection, colour detection using colour filtering, and segmentation
(where, by either region-based or edge-based algorithmic methods, different coloured patches are
differentiated from one another). Fiducials are targets or symbols designed to be identified and
tracked using a set of algorithms. Pattern recognition seems to be robust and much less prone to
problems with lighting and background due to the nature of the tracking process. Therefore this
section focuses on fiducial tracking and relevant literature in the area.
There seems to be a plethora of different-looking fiducials, which all have advantages and
disadvantages. Each of the following fiducials has been integrated into an application environment.
The procedure required to begin programming in these environments is given in the development
section of this report. There might be a need to completely integrate a fiducial library into another
environment, such as a fiducial set from one of these environments into Eyesweb or Pure Data.
Parnham et al (2006) state that “in general fiducial design balances the following constraints and
criteria:
• Detection rate: True fiducials should be detected, that is, identified as being fiducials.
• Identification rate: Once detected, a fiducial’s identity (particular code) should be correctly read.
• Misdetection rate: Natural scene elements should not be interpreted as fiducials.
• Information content: The higher the number of different fiducials that can be generated within the
design, the better.
• Compactness: The fiducial should be as compact as possible.
• Image-position accuracy: The fiducial’s centre, and perhaps other information about its
disposition in the image, should be accurately extracted.
• World-position accuracy: Information about the fiducial’s disposition (location and orientation)
in the world should be accurately extracted.
• Robustness to lighting conditions: Illumination distribution and colour and shadowing should
have little effect on detection.
• Robustness to occlusion: The fiducial should be detectable/interpretable when partially occluded.
• Robustness to pose: The fiducial should be detectable when seen obliquely.
• Robustness to deformation: The fiducial should be detectable when displayed on a curved
surface or otherwise deformed.
• Efficiency (speed) of detection/location algorithms.
• Ease of rendering: A fiducial that can be drawn may be preferred to one that must be printed, and
a monochrome fiducial may be preferred to a colour one.”
3.1.1 Video Positioning System Fiducials
The findings of Johnston (2001), “Vision-Based Location System Using Fiducials”, provide a good
basis for the concepts behind fiducial detection with vision systems. The PhD focused on “The
production of a real-time augmented reality system that runs on commodity hardware (ie PC-class
machines) and allows the user to roam over a comparatively wide area, up to several hundred metres
– what might be called un-tethered”. (Johnston, Clark 2003) called the research “a visual positioning
system” (VPS), “which is essentially a library written in C and usable from C++, which takes an
image and yields estimates of the position and orientation. The library also provides an interface to
the video capture sub-system on Linux. Location information can be fed directly into the industry-
standard OpenGL library.” (Johnston, 2001) In simpler terms, the system is a cube which is tracked
through complete six-degree-of-freedom movement using fiducials whose positions are calculated and
combined to provide 360 degrees of rotation in three directions as well as positional translation.
The VPS system was accurate down to 0.1 of a pixel in the x and y directions. Because the fiducials
could be printed at any size, the only factor which limited the range was that the smallest component
of the fiducial needed to be larger than a pixel and within the focal range of the camera. (Johnston,
2006)
Johnston (2001) details the mathematical conversion matrices needed to transform the image plane
(the camera's 2D view) into the real-world coordinate system, including the maths involved in placing
fiducials in a cube formation so that absolute six-degree-of-freedom orientation can be tracked. Good
resources for the explanation of projective geometry and single-camera machine vision are chapter 4
of Faugeras (2001) and (3D Geometry, 2006).
The system tracked a cube in 3D space (see figure 3.1a); however, it did not yet track multiple cubes.
Initially it was thought that SML (Standard Meta Language) needed to be used to program the
tracking of fiducials, but in actual fact the fiducial tracking library was created in C. (Johnston,
2006)
VPS uses a system of fiducial identification named Region Adjacency Graphs, which was similarly
implemented in ReacTABLE. This process is explained in detail in the following section (3.1.2),
titled ReacTABLE Fiducials.
Figure 3.1: (a) Left, Showing VPS Cube with augmented cube wireframe (b) Right, A
Fiducial from the VPS library showing smallest components.( Johnston 2001)
3.1.2 Reactable Fiducials
ReacTable uses ReacTIVision fiducials developed by Bencina et al (2005), which were derived
from Enrico Costanza's fiducial system. They look for adjacent colour changes and the number of
these colour changes within each boundary. This method of detection can be displayed more clearly
on a Region Adjacency Graph (RAG). For example, figure 3.2a below shows one black outer ring
followed by one white inner ring and one black inner circle, inside of which lie three small white
circles. This is known as the topology, i.e. the fiducial pattern. It is represented in a basic form by
the RAGs beside each one, which follow in sequence from top to bottom. Notice that the RAGs of
3.2b and 3.2c are the same although the topologies are different. This is unlike Audio D-Touch,
which uses geometry to identify fiducials. Therefore “ReacTIVision fiducials are identified purely by
their RAG or Topological structure” (Bencina et al, 2005).
Figure 3.2: Simplified topologies and their corresponding region adjacency graphs.
Reproduced from (Bencina et al, 2005)
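The idea of identifying a fiducial purely by its topology can be illustrated with a small sketch: each region becomes a node whose children are the regions nested directly inside it, and two fiducials match if their trees have the same canonical form, regardless of geometry. This is a simplified illustration of the principle rather than the reacTIVision implementation.

#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// A region in the image; children are the regions nested directly inside it.
struct Region {
    std::vector<std::unique_ptr<Region>> children;
};

// Canonical form of a rooted, unordered tree: sort the children's
// canonical strings so that geometry and drawing order do not matter.
std::string canonical(const Region& r) {
    std::vector<std::string> parts;
    for (const auto& c : r.children) parts.push_back(canonical(*c));
    std::sort(parts.begin(), parts.end());
    std::string s = "(";
    for (const auto& p : parts) s += p;
    return s + ")";
}

// Two fiducials are topologically identical if their canonical forms match,
// which is why figure 3.2 (b) and (c) end up with the same identity.
bool sameTopology(const Region& a, const Region& b) {
    return canonical(a) == canonical(b);
}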
The centre and orientation of the fiducial are determined using its smallest entities, i.e. the last parts of
the RAG tree (leaves). These leaf centres (both black and white) are combined using a weighted
average to find the fiducial's centre point; see figure 3.3 below. The orientation is obtained by creating
a vector from the centroid of all the leaves (black and white together) to the centroid of either just the
white leaves or just the black leaves. (ReacTIVision uses this method because it is easy to find the
centre of an object if it is square, circular and/or relatively small.) (Bencina et al, 2005)
Figure 3.3: (a) reacTIVision fiducial (b) black and white leafs and their average centroid
(c) black leafs and their average centroid, and (d) the resultant vector used to
compute the orientation of the fiducial. (Bencina et al, 2005)
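A minimal sketch of the centre and orientation computation described above, assuming the leaf centre points and their colours have already been extracted by the tracker, is given below. For simplicity it uses a plain (unweighted) average: the fiducial centre is the mean of all leaf centres and the orientation is the vector from that centre to the centroid of the black leaves only.

#include <vector>

struct Point { double x = 0.0, y = 0.0; };
struct Leaf  { Point centre; bool isBlack; };

static Point average(const std::vector<Point>& pts) {
    Point sum;
    for (const auto& p : pts) { sum.x += p.x; sum.y += p.y; }
    if (!pts.empty()) { sum.x /= pts.size(); sum.y /= pts.size(); }
    return sum;
}

// Returns the fiducial centre and an orientation vector, following the
// averaging scheme described above (simplified, unweighted version).
void centreAndOrientation(const std::vector<Leaf>& leaves,
                          Point& centre, Point& orientation) {
    std::vector<Point> all, black;
    for (const auto& l : leaves) {
        all.push_back(l.centre);
        if (l.isBlack) black.push_back(l.centre);
    }
    centre = average(all);                        // centroid of all leaf centres
    const Point blackCentroid = average(black);   // centroid of black leaves only
    orientation.x = blackCentroid.x - centre.x;   // vector giving the rotation
    orientation.y = blackCentroid.y - centre.y;
}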
3.1.3 Open Illusionist Fiducials
The fiducial design used in Open Illusionist is the result of research by Parnham et al (2006) in “A
Compact Fiducial for Affine Augmented Reality.” The fiducial consists of an outer ring, inner
segments and a centre spot. The circular outer ring is processed by an edge detector algorithm, and
ellipse detection then finds the perspective at which the circular fiducial lies, because “a circle
under perspective projection is an ellipse.” (Parnham et al 2006)
The ellipse detection helps to define the centre position of the fiducial. The data of the fiducial is
stored in the segments, although one of the segments is dedicated to indicating which direction the
fiducial is pointing. Finally, the fiducial is read for its binary or ternary value (it is identified). The
only problem with these at present is that they do not compensate for partial coverage. Figure
3.4 below illustrates four of these fiducials, each coded differently.
(a) “A 15-segment binary fiducial, clearly displaying the directional pointer. It is binary
because it only has two shades (black and white) and has only one layer.
(b) Another 15-segment binary fiducial with the code “100111010101110”, reading
anticlockwise and from left to right (1 = white segment, 0 = black segment).
(c) A ternary fiducial containing 15 segments that reads “0120 0221 1201 012”
(14,348,907 combinations). This is because it has three shades.
(d) This fiducial has been split into two layers containing 30 segments, excluding the
pointing segment, which reads “012012012012012000111222001122” (2.1 × 10^14
combinations)” (Parnham et al 2006)
However, the tracking process is more robust with fewer segments, fewer levels, and binary rather
than ternary code. For the application of a six-degree-of-freedom audio cube it would therefore be
logical to trade a high combination count for robustness. The fiducials printed in the article are old;
recent research has led to the creation of differently typed fiducials with larger segments but fewer
combinations.
Figure 3.4: Four Open Illusionist Fiducials
see explanations for (a), (b), (c) and (d)
below. (Parnham et al 2006)
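To make the combination counts above concrete, the following sketch interprets a fiducial's segment shades as a base-two or base-three number. The mapping of shades to digits and the read order are assumptions made for illustration; the actual Open Illusionist encoding may differ.

#include <cstdint>
#include <string>

// Interpret a run of segment digits ('0', '1', and for ternary '2') as an ID.
// 15 binary segments give 2^15 = 32,768 codes; 15 ternary segments give
// 3^15 = 14,348,907, matching the figure quoted for fiducial (c) above.
std::uint64_t fiducialId(const std::string& segments, int base) {
    std::uint64_t id = 0;
    for (char c : segments) {
        if (c < '0' || c >= '0' + base) continue;  // skip spaces and other characters
        id = id * base + static_cast<std::uint64_t>(c - '0');
    }
    return id;
}

// Example: fiducialId("012002211201012", 3) collapses the ternary code of
// fiducial (c) into a single integer identity.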
3.2 Open Illusionist Implementation
Open Illusionist (OI) consists of two parts: the framework for creating interactive applications,
called Open Illusionist, and the Fiducial Library. Although the library could be integrated with
Open Illusionist, it was later found that they can function separately. The Fiducial Library
initialisation is split into a further two sections: the drawing of fiducials and the tracking
of fiducials.
3.2.1 Open Illusionist Framework Setup
The following sub-sections describe the process for installing all the Open Illusionist components
and setting them up in a free IDE (Integrated Development Environment.)
3.2.1.1 C++ Compiler / Debugger
To compile program code you need a compiler or, more commonly, an IDE.
Microsoft Visual C++ 2005 Express Edition is freely downloadable from the internet and has
unlimited usage time when registered.
http://msdn.microsoft.com/vstudio/express/visualc/download/
The installation files are also on the DVD, under:
Environment Setup Files\Visual C++ Express 2005 Installation\vcsetup
However, this installation file downloads files from the Internet, so it is necessary to connect to the
internet before this is attempted.
Visual C++ 2005 Express Edition does not come with the relevant Windows SDK (Software
Development Kit), so it needs to be downloaded and installed, otherwise you will see errors referring
to ‘Windows.h’ not found. Tutorials can be found on the DVD under:
Environment Setup Files\Visual C++ Express 2005\Tutorials
3.2.1.2 Windows SDK
The Microsoft Windows Software Development Kit (SDK) provides the libraries, header files,
samples, tools and documentation you need for the development of applications that run in
Windows.
1. Download the Platform SDK from the following website:-
http://www.microsoft.com/downloads/details.aspx?FamilyId=0BAF2B35-C656-4969-
ACE8E4C0C0716ADB&displaylang=en
You can either download the full SDK in chunks (Windows Server R2 Platform SDK Full
Download) or download an ISO image for burning to CD (Windows Server R2 Platform SDK ISO
Download). The DVD also contains an installation file for this under:
Environment Setup Files\Windows Server 2003 R2 Platform SDK
3.2.1.3 WXwidgets
WXwidgets is a cross platform Graphical User Interface Library.
“WXWidgets lets developers create applications for Win32, Mac OS X, GTK+, X11, Motif,
WinCE, and more using one codebase. It can be used from languages such as C++, Python,
Perl, and C#/.NET. Unlike other cross-platform toolkits, wxWidgets applications look and
feel native. This is because wxWidgets uses the platform’s own native controls rather than
emulating them. It’s also extensive, free, open-source, and mature.”
http://www.wxwidgets.org/
Figure 3.5: Screenshot of the Ca3D Engine world editor which uses wxWidgets
reproduced from (wxWidgets, 2006)
For PC, download and install wxMSW (the Microsoft Windows port) to a location on your hard drive,
and remember the location.
http://wxwidgets.org/downloads/ is a direct link to the website for the latest download; however, the
installation version used in this project can be found on the DVD under:
Environment Setup Files\WxWidgets\wxMSW-2.6.3 Install
3.2.1.4 Visual C++ Environment Setup
Open the $(WXWIN)\build\msw\wx.dsw project in the wxWidgets folder. Visual C++ 2005
Express should now load and ask you a question; select “yes to all” when asked to convert.
For the SDK setup procedures a useful link is as follows:
http://www.wxwidgets.org/wiki/index.php/Compiling_WxWidgets#Microsoft_Visual_C.2B.2B_2005_Express_Edition
This describes how to direct VC++ 2005 Express to point at the previously installed SDK library.
Go to Tools --> Options --> Projects and Solutions --> VC++ Directories. There is a ‘Show
directories for:’ box at the top right; use this to scroll down to the include files. Clicking on the include
files drop-down will change the information in the frame below. Within these drop-downs, where it
refers to the Platform SDK, you need to select and redirect this to the SDK include folder which you
previously installed. Look under both the include drop-down and the library drop-down.
A helpful link for procedures to set up wxWidgets in VC++ 2005 Express is
http://wxforum.shadonet.com/viewtopic.php?t=6261&postdays=0&postorder=asc&start=0
Building the solution may give an error as shown:
...\src\regex\regerror.c(103) : warning C4996: ‘strncpy’ was declared deprecated
C:\Program Files\Microsoft Visual Studio 8\VC\include\string.h(156) : see declaration of ‘strncpy’
Message: ‘This function or variable may be unsafe. Consider using
strncpy_s instead. To disable deprecation, use
_CRT_SECURE_NO_DEPRECATE. See online help for details.’
To correct this, one needs to select all the projects under the solution icon at the top of the
Solution Explorer, down the left-hand side of the screen in VC++ 2005 Express.
Clicking the icon at the very top left of the Solution Explorer panel (once all projects are selected)
should bring up a screen called Property Pages. Inside this screen there is a list of configuration
properties.
1. Under C/C++ --> Code Generation --> Enable C++ Exceptions: set to “Yes with SEH exceptions
(/EHa)”
2. Under C/C++ --> Command Line --> Additional options: REMOVE the /EHsc, and add
/D“_CRT_SECURE_NO_DEPRECATE”
/D“_CRT_NONSTDC_NO_DEPRECATE”
See instruction number 7 on the website for further details:
http://wxforum.shadonet.com/viewtopic.php?t=6261&postdays=0&postorder=asc&start=0
The wxWidgets library has two versions, the release version and the debug version, and both of these
versions need to have their solutions built.
To change between the versions inside VC++ Express 2005, go to
Build --> Configuration Manager --> and, under Active solution configuration, change to Release and
build the solution, then change to Debug and build the solution again.
This should now build without errors, though there may be some warnings.
3.2.1.5 Install and setup Open Illusionist
Download and install Open Illusionist from the project website; version 1.3.0 was available at the time
this paper was written (2006).
http://www.openillusionist.org.uk/documentation/doku.php?id=site:downloads
Alternatively, it is also possible to download the latest version of Open Illusionist from SourceForge
using TortoiseSVN (Subversion version control).
“The location of the wxWidget libraries must be known to Open Illusionist and any derived
applications so you will need to setup an environment variable (unless the wxWidgets installer has
already done so) called WXWIN. You can do this by opening the System Properties from the
Windows Control Panel, selecting the tab “Advanced” and clicking the button “Environment
Variables”. From there you can add a user variable called WXWIN and set its value to the absolute
path at which wxWidgets resides.”
http://www.openillusionist.org.uk/documentation/doku.php?id=install:preparation#wxwidgets
If you open the file called Illusionist (VC++ project) under Open Illusionist\v1.3.0\illusionist, this
will open the project in Visual C++ 2005 Express. Then you need to build both its debug and
release solutions.
This should put two library files into the Open Illusionist (version)\Lib folder.
Although there exist a number of workspaces for Open Illusionist, creating an executable file for the
fiducial library required the fiducial library to be extracted from Open Illusionist and programmed
separately. Closer examination revealed that the Open Illusionist fiducial component could be
treated separately from the Open Illusionist framework.
3.3 VPS Implementation
The approach taken here is similar to the implementation process used for Open Illusionist, in that
first the fiducial drawing program needs to be solved and then the tracking programs linked. The
drawing process, however, involves using Standard Meta Language New Jersey (SMLNJ). SML
is only necessary for writing lines into the command console to save fiducials to a file. The tracking
code was later found to be programmed in C under Linux; therefore the tracking programs can be
compiled in Visual C++ or any other C IDE.
3.3.1 Standard Meta Language Setup
First the smlnj.zip file for Microsoft Windows needs to be downloaded from the website
http://smlnj.cs.uchicago.edu/dist/working/110.59/ and unpacked to a specified location such as
C:\........\SML
This folder contains two sub-folders called Bin and Lib. The following process must be carried out to
set up SML and save fiducials to a postscript file.
First the Windows XP environment variables need to be edited. It is necessary to change the paths in
the environment variables to match the location of the SML installation.
To view or change environment variables:
“1. Right-click My Computer, and then click Properties.
2. Click the Advanced tab.
3. Click Environment variables.
4. Click on the following options, for either a user or a system variable:
Click New to add a new variable name and value.
Click an existing variable, and then click Edit to change its name or value.
Click an existing variable, and then click Delete to remove it.”
http://support.microsoft.com/default.aspx?scid=kb;en-us;310519&sd=tech
SMLNJ_HOME = c:\sml
PATH = <EXISTING STUFF>;c:\sml\bin
These path descriptions should be visible in the user variables window; in this case they are directed
to the C drive. In the case of the author,
c:\sml needed to be replaced by
G:\.......\SML
and c:\sml\bin needed to be replaced by
G:\.......\SML\bin
  • 4. 4 CHAPTER 6. APPLICATIONS ……………………………………………………….. 74 6.1 Sound Spatialization for Ambisonics ………………………………………. 74 6.1.1 A PAN Ctrl Ambisonic Panning Object ………………………………. 74 6.1.1.1 Simple Mapping ………………………………………………. 74 6.1.1.2 Spatial Mapping ………………………………………………. 77 6.1.2 Ambisonic Soundfield Zoom Control Object ………………………………. 81 6.1.3 Ambisonic Soundfield Orientation Manipulation Object …………………. 82 6.1.4 Volumetric Spatial Triggering ………………………………………………. 83 6.1.5 Timbral Manipulation ………………………………………………………. 87 6.1.5.1 Using the High Level Musician’s Use of Timbral Adjectives ……… 87 6.1.5.2 Timbral Space……………………………………………………………. 88 6.1.6 Musical Adventure …………………………………………………….... 89 6.1.7 Choir Spatialisation ………………………………………………………. 90 6.1.8 Musical Lego ………………………………………………………………. 91 CHAPTER 7. TECHNICAL CHALLENGES ………………………………………. 92 7.1 Application programming Interfaces for Web Cameras ………………. 92 7.2 Distortions ………………………………………………………. 92 7.3 Web Camera communication speed limitations…………………………….. 92 7.4 Multiple Fiducial Tracking issues ………………………………………. 93 CHAPTER 8. CONCLUSIONS AND FUTURE IMPROVEMENTS ……………… 94 8.1 Research conclusions ……………………………………………………… 94 8.2 Application conclusions ....…………………………………………………… 95 8.2.1 Volumetric Spatial Triggering ……………………………………………… 95 8.2.2 3D Ambisonic Panning ……………………………………………………… 96 8.3.3 Accuracy and Latency ……………………………………………………… 96 8.3 Future Improvements and Possibilities ……………………………………… 97 8.4 Summary ………………………………………………………………………….. 98 GLOSSARY ……………………………………………………………………………… 99 REFERENCES ………………………………..………………………………………… 100 BIBLIOGRAPHY ………..……………………………………………………………… 103 APPENDICES …………………………………………………………………………… 104 Appendix - A: Email correspondance ……………………………………… 104 Appendix - B: Programme Code ………………………………………………….. 110 Appendix - C: Cube Nets ……………………………………………………… 116 Appendix - D: VPS Fiducials ……………………………………………………… 120 Appendix - E: Global Pure Data Patches ……………………………………… 121 CONTACT DETAILS ……………………………………………………………… 123 DVD CONTENTS ……………………………………………………………………… 126
  • 5. 5 Abstract

This report describes the development of a new multidimensional interface that makes use of the latest robust pattern (fiducial) tracking algorithms running on affordable web camera vision systems. The cube prototype allows accessible study of the interface design considerations involved in manipulating 3D audio. The project focuses on 3D sound spatialisation and on multidimensional timbral space controlled through high-level timbral descriptors. Experimentation with the interface has also suggested further applications, including children's toys and affordable products for those with special needs. The project provides a solid technical base, as well as inspiration and design considerations, for future audiovisual and musical projects using this technology.
  • 6. 6 Acknowledgements I would like to say genuinely that I have never met such a nice group of people. The positive attitudes of all mentioned really gave me great drive for the project. I found their views and communication inspiring, and the whole process of being a communicator and drawing on specialist knowledge has in turn helped me to communicate on a number of different levels. Firstly I would like to thank my supervisor Andy Hunt, for inspiring this project, and for all the humorous supervision sessions, and his unforgettable multi personality dramatised lectures. Also to my second supervisor Dave Malham for his professional experience with Ambisonics. I really don’t know their secret but somehow they know how to bring out the best in people, whether you are on top of the world, or underneath it. Special thanks to Matt Paradis for continuous support, and down to earth explanations regarding the basics of C++ Object Oriented programming, also thanks for lending me your camera. A big thanks to John Robinson for allowing me to take part in the Media ECAD taught tutorials to learn the inner workings of CLIP and providing me with support to get the Fiducial Library up and running and in sharing his ethics of wearable computing. I don’t believe that helping such a persistent music technology Masters student is a usual occurrence for him. Also to the PhD students in the Media Lab for putting up with me, providing me with feedback over email and throughout the Open Illusionist forums , including Dan Parnham, Justen Hyde and LJ Li. Thanks to Simon Shelley and Enrico Costanza for communication regarding Audio D Touch and ReacTABLE fiducials. A warm thanks to David Johnston, who I was in constant communication with about his PhD. He sometimes stayed up late through the night to explain Faugeras theory to me, which I still am struggling with. By the time you read this I will have already helped you revive your old castle. Thanks also to Alistair Disley for communication and enjoyable seminar into the musicians use of high level timbral descriptors. I hope to keep in contact with you about future developments of your synthesis mappings. Alistair Edwards, Edwin Hancock and Emine Gokce Aydal of the Computer Science department for their knowledge and thoughts. I would like to thank David Howard for the pleasant chat and introducing me to Harald Jers, who inspired another application of 3D cubes with augmented reality for choir spatialisation. Thanks to members of Eyesweb, especially Gualtiero Volpe for discussing the current developments of Eyesweb. Thanks to Ed Corrigan, for his visual support with the mathematics involved in the project, and helping me to understand matrices. Thanks to Ambrose Field and Tony Myatt for their thoughts about the interface and professional musical advice. A number of industrial contacts for information into alternative tracking technologies including Andrew Ray and John Grist Much thanks also to thank the following music technology masters students and friends for their assistance. Peter Worth, Becky Stewart, Theo Burt and Rob Hinks. I would like to thank my family for proof reading my report and checking in on my health now and then.
  • 7. 7 CHAPTER 1. INTRODUCTION

The aim of the project is to explore the feasibility of a new multidimensional interface providing a mapping console for multiple media and 3D audio applications. It challenges current interfacing technologies by providing a robust, affordable web camera vision system that tracks special patterns (fiducials) freely in 3D space. As electronics and microprocessor technology become increasingly compact, the physical, anthropometric limitations of human beings begin to play a greater role in limiting the size of interfaces. This project highlights the limitations of one-dimensional and two-dimensional interfaces and provides a starting point for a truly multidimensional system.

Born into the monopoly of Microsoft, we spend our time cursing two-dimensional graphical user interfaces with little regard for the input devices we are using. With the globalisation of affordable home internet communication, and charities receiving money from thrown-away mobile phones with video imaging technology, cameras are becoming increasingly inexpensive. Ongoing research in the field of machine vision is providing robust detection algorithms. This, combined with developments in augmented reality using projectors, paves an entirely new direction for human computer interfacing and rivals the use of wearable computers.

Developments in digital audio workstations, combined with the plethora of high-quality free virtual studio technology and the affordability of computing, have given rise to a community of young portable electroacoustic composers and performers. Professional producers must now compete with teenagers who release hit records using software out of cereal packets. An intriguing example of live performance is Kid Beyond, who uses Ableton Live, a microphone and his laptop to simultaneously compose and perform live beatboxing. This example gives an idea of how much technology has changed in the last 40 years. However, the increasing availability has also increased functionality, such that the foreseeable limitations are not so much about what can be done with the software, but how the mass of parameters can be controlled creatively.

The concept of timbre categorisation spans many decades of research and development, using a number of clustering and multidimensional scaling techniques. Timbral space is still nebulous, and to effectively control and manipulate every aspect of timbre in real time with a multidimensional controller requires much experimental research. There are, however, more immediate applications such as ambisonics, where at present composers are required to pan and position sounds using a number of on-screen 2D potentiometers and sliders, which can be confusing and unrepresentative of the associated movements.

Experimentation, and an appreciation of the interface's interactive merits, led to a number of conceptual future projects which are not inconceivable. The challenge for these applications would be to make the interface preferable to the mouse or keyboard when dealing with multidimensionality.
  • 8. 8 1.1 Report Structure

This section gives a short overview of each chapter in this report.

Chapter 2 is dedicated to the literature review, which describes human computer interaction and the need for multidimensional interfacing. It also examines parallel interfacing technologies and outlines why pattern tracking is more robust than colour tracking. The literature review also examines two different musical possibilities for the interface.

Chapter 3 details competing fiducial technologies and gauges system feasibility for three different interfaces to decide on the best solution. The three systems are Open Illusionist, the Video Positioning System and ReacTABLE.

Chapter 4 dissects the Open Illusionist fiducial library, gathering information about its output parameters. This chapter takes an experimental approach to retrieving all the fiducial output parameters and deciding their functionality and flexibility.

Chapter 5 describes the Pure Data connection process necessary to link the fiducial library's parametric outputs to the Pure Data console via computer sockets. It also describes the programming code needed to track multiple fiducials and the problems which occurred.

Chapter 6 provides the development procedure for the implemented applications and gives design thought for those in the near future.

Chapter 7 outlines a number of technical challenges, identifying the benefit of web cameras with easily accessible Application Programming Interfaces and the speed limitations of the transfer protocols used. It gives guidance as to how these problems can be alleviated.

Chapter 8 draws together the findings of the project and gives future vision and ideas to inspire a new generation of compositions, installations and interactive sonic works.
  • 9. 9 1.2 Aims and Objectives This section describes the projects main aim which is then broken down into a number of objectives. Figure 1.1 shows the areas of research to be covered. Aim To research into parallel products and gauge feasibility of a new inexpensive musical interface, combining existing robust visual tracking algorithms for six degrees of freedom with a mapping console which can be used by multiple media and audio applications. Objectives • Give an overview of relevant Human Computer Interfaces. • Research thoroughly into the latest multidimensional tangible (tactile) user interfaces and augmented reality technologies, paying attention to ‘the state of the art.’ • Survey literature regarding haptic feedback including articles into Virtual Reality showing proof of the issues associated with non-tactile feedback in 3D space. • Research into the sound synthesis output stage to scope vision for the end application. • Decide which environment is the most feasible for programming a robust six degree of freedom interface and understand how to program within it. o Understand Fiducials by speaking to experts and reviewing literature. o Program an object for use in Pure Data or Eyesweb to recognise fiducials, providing 6-dimensional output. • Create a mapping console, mapping the degrees of freedom to sound parameters. o Communicate with local experts regarding high level timbral descriptors. o Communicate with local experts regarding sound spatialisation • Experiment and use a piece of music to gauge the musical results.
  • 10. 10 Figure 1.1: Project research sections and central focus.
  • 11. 11 CHAPTER 2. LITERATURE SURVEY This section aims to survey all the relevant literature regarding the factors of Human Computer Interactivity pertaining to parallel interfaces and alternative technologies. It examines current and emerging web camera interfaces and previously designed control interfaces. It discusses the programming environments available for vision based systems in order to identify a robust system for the applications of sound spatialisation and multidimensional timbral space using current research. 2.1 Human Computer Interaction and Electronic Musical Instruments The world of Human Computer Interfacing (HCI) and interaction of Electronic Musical Instruments is massive, and therefore this section has been scoped to cover only the interactive aspects in systems that sound artists, designers and studio engineers use. Figure 2.1 shows the linear structure and separation of the components involved in HCI with Electronic Musical Instruments. These components begin with the physical anthropometric limitations of human control which supports research into the theory of gestural input to manipulate multiple parameters simultaneously. Alternative input devices are evaluated comparing the benefits of electrical hardware magnetic position and orientation tracking with machine vision technologies. Visual, haptic and audible feedback from various devices is discussed. These include computer monitors, the real world and projective augmented reality where images are projected onto real world objects. This front end is however seemingly aimless as a musical instrument if it cannot be mapped to sound output parameters, so an investigation of current research into communication protocols and programming environments is required. The mapping process is where ‘low level’ degrees of freedom are mapped to the many synthesis or sound parameters. The mapping process can essentially decide the nature of the musical interface. The final stage is the sound synthesis or application phase. Due to the nature of the controller being both a multi-parametric and spatial controller, a number of mappings can be created for timbral manipulation and spatial audio manipulation. Figure 2.1: Showing the control system flow from human input to sound output
  • 12. 12 Today computer users are equipped with two main commodity input devices. “The computer keyboard began with the invention of the typewriter in 1868 which was later integrated in the teletype machine with the telegraph in 1930. Following punch cards, MIT, General Electric and Bell Laboratories together created the computer system “Multics” in 1964. This encouraged the development of video display terminals (VDT) enabling computer users to view graphics on a screen. In 1964 the first computer mouse prototype was made by Douglas Engelbart with a GUI (Windows) Engelbart received a patent for the wooden shell with two metal wheels in 1970, describing it in the patent application as an X-Y position indicator for a display system. During the time he was working at his own lab (Augmentation Research Centre, Stanford Research Institute) he staged a public demonstration of his mouse, windows and hypermedia with object linking and addressing, and video conferencing. Due to the commodity of his invention Douglas Engelbart was awarded the 1997 Lemelson-MIT Prize of $500,000, the world’s largest single prize for invention and innovation. He now has his own company “Bootstrap Institute” housed rent free, courtesy of the Logitech Corporation.” (About Inventors, 2006) The mouse is an example of a time multiplexed user pointing device (figure 2.2) which the user must drag in order to search through menu systems and control parameters, clicking to operate the virtual world. The mouse uses a sequential process of clicking, where each operation requires a number of clicks and drags to reach the desired outcome. The computer keyboard and audio mixer are examples of space multiplexed interfaces (figure 2.2). All the parameters are physically spaced out on a surface and the user can reach all the hardware functionality, although the subjects are limited to operating some of them at the same time (Fitzmaurice et al, 1997). Imagine a virtual keyboard where the mouse is used to click each letter. The process of writing a sentence would take much longer than using the keyboard (provided the user had no disabilities and the test was fair.) This simple analogy demonstrates the weaknesses of the mouse. However inversely one might favour the mouse to navigate a 2D space rather than using two singularly dimensional mixer sliders or arrow keys on a keyboard. Figure 2.2 : Shows time (mouse) and space (mixer) multiplexed interfaces (Reproduced from Fitzmaurice et al, 1997)
  • 13. 13 Though the question then is whether manipulating parameters in 3D space using a mouse or keyboard is as efficient as the invention of a commodity multidimensional input device which was purposefully designed to control these parameters simultaneously and how creative can one be in using such an interface. (Hunt et al, 2000) prove by survey that the “multi-parametric interface allowed people to think gesturally and had the most long-term potential”. Also the use of two- dimensional controllers in this survey was found “confusing, frustrating or at odds with their way of thinking.” Proof is given later in this survey which shows that multidimensional electrical hardware alternatives are expensive. There is also an ergonomic issue with gravity, holding heavy objects in air is more strenuous than simply interfacing with them on a desktop. It is possibly for these reasons that they have not made it to the desktop table of commodity controllers. The beauty of commodity interfacing is that it grants people access to improve their technique and is also much easier to create surveys and gather first person interface reviews from people on the other side of the world. The invention of Midi Creator at the University of York continues to provide musicians with the ability to rapidly create new custom-built electronic interfaces, so that their feasibility and musicality can be determined, “inspiring technology for sensory experiences”. (Hildred, 2006) Emerging technological developments point to the use of vision systems due to the increasing robustness of tracking algorithms. Systems using vision are not bound to the physical connections that limit many electronic interfaces, allowing for gestural control of parameters without cables. They take information from the real world as their source of interaction. The details of vision systems are discussed in section 2.1.2 (Vision Systems and Tangible Interfaces). However HCI encompasses not just the input device, but also the application in which it is used. Consideration for how these interact is especially important today since electronic musical instruments can be split into the input control and sound generator. Ergonomics, anthropometrics and aesthetics are important factors in the design of any human musical interface. (Hunt et al, 2000) proved that complex interfaces are more engaging and rewarding than simple interfaces which take very little effort and imagination to reach their maximum potential.
  • 14. 14 2.1.1 Haptic, Audible and Visual Feedback As human beings we use our senses to gather information from the environment. Without these senses we would not be able to communicate. Feedback is the process of communication with the interface one interacts with. Unlike traditional musical instruments where the feedback is bound to the object, the feedback of electronic musical instruments can be separated into three main parts, the Haptic (tactile), Audible and Visual feedback. It is because of this separation that electronic musical instruments (EMI) require a lot of skill, experimentation and knowledge to communicate correct feedback to the user and audience. Design considerations which help to prevent miscommunication associated with multiple feedback systems are very much application dependent. Haptic Feedback “refers to technology which interfaces the user via the sense of touch. It is an emerging technology that promises to have wide reaching implications” (Wikipedia, 2006) “The Moose” is an example of a “general-purpose haptic display device” (Gillespie et al, 1997) prototype where haptic technology was experimented with to display elements of Graphical User Interfaces such as MS Windows (see figure 2.3). When white ‘puck’ (connected to double flexures) is moved, the graphical mouse pointer also moves. If the pointer goes over an icon, a haptic representation of it is provided for the user to explore. For example “The edge of a window is realised haptically as a groove. A check box is represented as a border with a repelling block at one end which becomes an attracting spring when the checkbox is checked. Thus by moving the cursor over haptic objects the user can simultaneously identify their type and check their status” Visual Feedback is the most commonly used type of feedback, whereby through computer screens, or directly from the object, information is relayed to the user visually. However there is an inherent awkwardness with moving objects in 3D space and using 2D screen visual feedback to accurately judge their position as will be described. This is due to a lack of intuitive three dimensional spatial visual feedback technology. Instead Graphical User Interfaces (GUIs) have been designed around control of the commodity mouse and keyboard to manipulate virtual, non tactile objects. An example of how GUIs have been developed to portray 3D information in two dimensions can be seen in figure 2.4 above. By selecting individual orthographic projections the direction in which manipulations occur in 3D space is observed from the perspective view (bottom right). This separation can cause confusion for novice users. It is therefore conceivable that changing the input Figure 2.3 “The Moose”haptic feedback display device prototype. Figure 2.4 Orthographic projections of a simple model.
  • 15. 15 device to a multidimensional controller could benefit from a new type of visual feedback technology to provide accurate and direct feedback of parameter changes in 3D space. Current research by (Chiu, 2006) provides a more immediate solution to adapt 2D computer screens into 3D utilising a head tracking magnetic 3D position pointer. The system uses stereo vision spectacles (red and blue) to portray 3D information. Experimentation of the interface using a mouse and keyboard seemed very complicated and un-natural. When using the mouse and keyboard the biggest visual feedback problem existed when the virtual world was rotated but the directionality of the controllers remained the same, so that forward and backward movement at the start, when the virtual world was facing the front became left and right movement when rotated, although the mouse pointers input movement was forward and backward. User preference would dictate whether the forward and backward movements of the mouse should be mapped to the forward and backward movements on the screen or the upward and downward movements, which simply adds to confusion and demonstrates again the limitations of the mouse. Augmented reality interfacing provides a fresh approach to this problem whereby three dimensional visual feedback is projected onto objects which exist physically in their real world environment. A headset can be used to provide head tracked feedback, so that moving around the real world object physically provides an alterative perspective. However headsets have been described as an encumbrance and costly (Open Illusionist, 2006) Attempts at 3D effects exist in the latest windows Vista where interactive windows are spaced in perspective known as Flip 3D (see figure 2.5). Advancements in augmented reality mixed with holographic imaging systems could be the end of GUIs contorted to represent 3D information on 2D visual feedback technology. Figure 2.5 Windows Vista FLIP 3D showing window group in perspective.
  • 16. 16 Audible feedback can also be referred to as a sonification process of complex data or parameter manipulations. In the case of musical instruments “performer gestures affecting the control surface have to result in perceptible changes of sound and music parameters” (Mulder, 1998). In the case of musical instruments audible feedback and sound output is a fundamental descriptor. However it would be disturbing to the audience and user if spatial audible feedback was given which was related to the instruments position rather than the musicians musical expression. For example bleeps to tell a performer that they are out of range would be a distractor and a possible performance spoiler. For this specific problem a musical sonification of this could prove useful, such as the volume reducing as the performer reaches the edge of their space. 2.1.1.1 Graphical User Interface Design Issues The Graphical User Interface (GUI) is a very important part of modern day technologies. Companies now look for 2D GUI ergonomists who spend their time creating intuitive GUIs. “Cognitive ergonomics or cognitive engineering is an emergent branch of ergonomics that places particular emphasis on the analysis of cognitive mental processes – e.g., diagnosis, decision making and planning – required of operators in modern industries.” (Wikipedia, 2006) Some of the factors which contribute to good cognitive design are:- • A smaller visual distance between tasks. • Fewer tasks to reach the desired goal. • Grouped function families. A method has been devised called Hierarchical Task Analysis (HTA) where each mental and physical task is drawn into a methodical flow chart. This design flow diagram can be analysed and decisions made to increase the interface’s effectiveness. Though this can be extremely complicated involving thousands of steps when considering the above factors for each process throughout the application. Imagine that the goal within a particular GUI is to increase audio volume. Firstly the user would need to find the volume control on the GUI (a mental process), then they would receive visual feedback about the graphical slider or knob, which would tell them the position of the volume. If it was understood that the volume control was at its maximum then the goal would be abandoned, if not the user would grasp the mouse pointer and move it until it was hovering above the visual volume control. They would then click on the control and then move it. At this stage there exists a mental question whilst moving the mouse with the mouse button pressed down, “Is the volume loud enough” and if it is then the mouse button would be released for the mission to be accomplished. A study on the subject of HTA can be found in (Shepherd, 2001)
  • 17. 17 2.1.2. Tangible Interfaces using Vision Systems

This section introduces commodity web camera technology and describes a number of audiovisual tangible interfaces (those which are tactile and can be grasped) that use vision systems. Vision based systems have advantages and disadvantages, but web cameras are becoming increasingly affordable while offering improved quality, faster USB 2 interfacing, greater resolution (greater pixel count), motorised movement, zoom, the ability to be wireless, increased frame rates and a wider field of view (see figure 2.6). The Creative "Game Star Webcamera" comes with ten eye games (a similar scheme to those available for the Sony Playstation EyeToy). Perhaps a web camera could be branded by an audio equipment manufacturer and come supplied with a number of audio interfacing applications.

The main disadvantage of web camera technology is occlusion (when an object obscures the camera's vision, or moves in front of the object being tracked). However, using multiple web cameras arranged at different perspectives, so that the object is always in view, can help to overcome this problem; superstores now sell web cameras in pairs. The author was excited by the possibilities of high speed USB 2 web cameras with motor control. These allow faster data transfer during the image grabbing process, and the camera's tracking can be mapped to tilt and pan so that gestural movement over greater distances can be tracked. The Creative Live Motion boasts a rotation angle of 200º horizontal and 105º vertical. The general price range of web cameras at present is from £3 for a basic USB 1.0 model, £10 for a high quality USB 2 model, and £70 for a high quality motorised USB web camera. During the time of this project the Creative Live Motion web camera could be found on sale at half price from E-buyer. Suffice it to say, it is a rapidly developing market.

Figure 2.6: Showing the motorised Creative Live Motion web camera (Reproduced from Creative Labs Website)

The Creative Live Motion web camera comes supplied with a face tracking algorithm which automatically moves the camera to focus on a person's face, or zooms out to fit in multiple persons. On the whole, the technology seemed to be somewhat in its infancy: it was not very robust and needed a great deal of time before it locked onto facial characteristics.
  • 18. 18 The following tangible interfaces are similar in many ways and are also all at different stages of development. PhD research by (Fitzmaurice, 1996.) defines graspable (tangible) user interfaces as providing access to multiple objects which can be moved and manipulated at the same time, allowing multiple persons to interact with spatial arrangements. The benefits of tangible user interfaces (TUI) over graphical user interfaces are expressed by (Fitzmaurice, 1996) as being superior due to the reduction of interactive tasks. Fitzmaurice states that the GUI requires a three stage process and has been reinterpreted as follows: 1. Take hold of the controller (commonly the mouse), 2. Find the visual device 3. Manipulate it for the end result, whereas the TUI only requires two stages. 1. Take hold of the controller, 2. Manipulate it for the end result. “By using physical objects, we not only allow users to employ a larger expressive range of gestures and grasping behaviours but also to take leverage from a user's innate spatial reasoning skills and everyday knowledge of object manipulations.” (Fitzmaurice, 1996). (Fitzmaurice,1996) also discusses how TUIs provide interactive clues to the user relating to how one might use that particular object which are derived from the object itself . The example given was a pair of scissors, where the finger handles guides one to how it might be used. It was found that human beings exploit their spatial awareness to increase productivity. The example given involved the letters S P A C E M A T T E R S printed onto individual blocks left randomly in groups and on top of each other in towers. The subject was asked to arrange these into two block groups to spell the words “Space” and “Matters.” Though there existed two limitations 1. Only one block can be moved at a given time, 2. a block cannot be moved if another is on top of it. It was observed that the human approach taken was the easier route by arranging the blocks horizontally on the table top rather than vertically in columns. This also reduced the number of steps needed to reach the goal.
  • 19. 19 2.1.2.1 Audio D-Touch Audio D-Touch is a tangible user interface for music composition and performance developed and created at the University of York. It uses a “consumer grade web camera and customizable block objects to provide an interactive tangible interface for a variety of time based musical tasks such as sequencing, drum editing and collaborative composition” (Shelley et al., 2003) The Synthesis Toolkit (STK) is a free synthesis class library for C++ and was used in Audio D- Touch by (Shelley et al, 2003). It is freely downloadable from http://ccrma.stanford.edu/software/stk/ Audio D-Touch tracks the position of blocks by means of a fiducial (special pattern). These patterns can be seen in figure 2.7 as square black and white grids containing positive and negative signs. The recognition of these fiducials is done by a set of complex algorithms, which are discussed in section 2.4 (Vision Tracking algorithms). However the fiducials are only detected in two dimensions; their real world stance i.e. the depth from camera has not been implemented. This means that lifting them above the table top or twisting them from left to right does not produce another separate output parameter. (Costanza 2006) (a) (b) Figure 2.7: Audio D-touch (a) Detail of physical sequencer (b) Augmented Musical Stave (Reproduced from Shelley et al 2003) Video Demonstrations of the interface can be seen on the attached DVD in the following path:- Research Documents and VideosHCIVision SystemsAudio D-Touch
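The Synthesis Toolkit mentioned above follows the conventional "instrument/tick" pattern of C++ synthesis libraries. A minimal sketch in the style of the STK demo programs is given below; it assumes an STK version in which the classes live in the stk namespace (4.4 or later), so the exact header and class names may differ from the version used in Audio D-Touch.

    // Minimal STK-style sketch: one second of a 440 Hz sine sent to the sound card.
    #include "SineWave.h"
    #include "RtWvOut.h"
    using namespace stk;

    int main() {
        Stk::setSampleRate(44100.0);       // global STK sample rate
        SineWave sine;
        sine.setFrequency(440.0);
        RtWvOut output(1);                 // single-channel realtime output
        for (int i = 0; i < 44100; i++)    // one second of samples
            output.tick(sine.tick());
        return 0;
    }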
  • 20. 20 2.1.2.2 ReacTABLE “ReacTABLE is a state of the art multi-user, electro-acoustic music instrument with a tabletop tangible user interface.”(ReacTABLE, 2006) The main body of ReacTABLEs existence is a software program which tracks fiducials and creates adaptive graphics. ReacTABLEs fiducials are a development of those used in Audio D-touch. It incorporates a table underneath which a projector augments graphical, dynamic animations onto the surface. The camera underneath recognises the objects placed on the table allowing multiple users to interact with it simultaneously. It incorporates an audio engine using ‘Open Sound Control’ (OSC, discussed further in section 2.5 – Protocols) and the software and source code is downloadable from the ReacTABLE website. (ReacTABLE, 2006) It can be used on Win 32, MacOS X or Linux and works with any webcam with a proper driver. The website includes a number of up to date publications in the area of fiducial tracking and tangible user interfaces. However at present the fiducials are only tracked in two dimensions. They do not provide a 3rd depth dimension for calculating their position in 3D space, which is necessary for extracting data about the fiducial’s position in 3D space. (Costanza, 2006) The various objects which can be placed onto the table surface are different in shape and feel providing haptic feedback. These different objects also have different fiducials, and have been assigned to particular synthesis parameter groups. So a performer knows what the objects synthesis function is before they have even introduced it. The fiducials shown in Figure 2.8 are configured onto the sides of a hollowed out plastic cube so that different synth instances exist on a single physical object. This means one can simply rotate the block to reach the functionality of a fiducial on a perpendicular or opposite face rather than introducing a completely new object and fiducial onto the table. (Kaltenbrunner, 2006) (a) (b) Figure 2.8 ReacTABLE Photos (a) Interactivity with projection and fiducials (b) Fiducials on cubes. Reproduced from (ReacTABLE, 2006) Videos of the interaction can be seen on the DVD in the following path:- Research Documents and VideosHCIVision SystemsReactable
  • 21. 21 2.1.2.3 Open Illusionist

Open Illusionist (OI) is a software framework for creating augmented interactivity. Like ReacTABLE it has been implemented for use with a table top. The Open Illusionist team have already created a number of applications, naming each application an "illusion".

'Robot Ships' is an Open Illusionist illusion which uses sophisticated algorithms to detect the edges of objects placed on a table. A projector projects an oil rig, moving oil tankers and rescue craft onto the table's surface from above. Smashing one of the oil tankers creates a pool of oil; rescue boats and cleaning craft are then sent out from the oil rig to retrieve the oil and rescue survivors. The craft find a way round anything you place on the table, eventually reaching the spillage and survivors (see figure 2.9a). Each of the boats acts as an agent, and agents contain programs which decide what they do as individuals. Another illusion is 'Fish Tank', where edge detection of human movement decides which direction the fish flee in; the PC screen provides graphical feedback, as can be seen in figure 2.9b.

A fiducial library has been implemented and integrated with Open Illusionist. The system at present can augment a cube onto the surface of one of the fiducials, so that even when you move the fiducial and twist it in space the cube stays stuck to the surface and changes perspective and distance accordingly (see figure 2.9c). This is because the fiducial library contains additional programming code for pose or stance detection, effectively capturing the fiducial's position and tilt in our three-dimensional world. It boasts a frame rate of 30 frames per second on a 3GHz Intel Pentium 4 processor with 2GB of RAM. The library also tracks multiple fiducials, and the maximum limit can be set in the program code; the frame rate reduction for tracking multiple fiducials is negligible. (Open Illusionist 2006)

Figure 2.9: Open Illusionist images of (a) left: Robot Ships illusion, (b) middle: Fish Tank illusion screenshot, (c) right: augmented red wireframe cube on a fiducial surface.
  • 22. 22 2.1.2.4 Others

There exist many other tangible user interfaces varying slightly in their application and approach, and www.pixelsumo.com includes some of the latest inspiring audiovisual interactive art works. The ReacTABLE website contains a section titled 'related', where even more recent tangible user interfaces are displayed, including the audio cube and the music cube: http://www.iua.upf.edu/mtg/reacTable/?related

There are many tangible, augmented user interfaces, but they seem to rely on the use of a flat table and a projector which projects up onto the table surface from underneath. The literature review has shown no TUIs which target the 3D audio applications discussed in section 2.7 (Audio Applications) at the end of this literature review.
  • 23. 23 2.2 Six Dimensional Electronic Hardware Interfaces

This section describes three hardware based interfaces: the first two attempt to solve the problems of manipulating 3D audio, the last targets MRI and seismic data. (Paschalidou 2003) discusses a multi-parameter control for audio mixing developed at MIT. Attempts by Craig Wisneski and Ed Hammond to create a 6D controller (see figure 2.10) using hardware without haptic feedback led to confusion and the necessity to reduce the number of degrees of freedom.

Figure 2.10: Photo of first attempt. (Reproduced from Wisneski et al 1998)

The second attempt was inspired by SPIDAR, developed by Ishii and Sato at the Tokyo Institute of Technology. This was a 'totally hardware' based approach where a ball was suspended in the air by strings (see figure 2.11). Moving the ball around gave positional feedback. (Wisneski et al 1998)

Figure 2.11: Photo of second attempt. (Reproduced from Wisneski et al 1998)

The Cubic Mouse was developed by Frohlich et al (2000), see figure 2.12. They describe a new device for "Three-Dimensional Input": "Potentiometers are used to measure the positions of the rods. A Polhemus Fastrak sensor provides the spatial position and orientation information for the Cubic Mouse". The Polhemus magnetic sensor is described in the next section on alternative tracking technologies. (Frohlich et al, 2000) also describe several interesting applications, which include:
• data exploration for the oil and gas industry, where the cube is used to navigate a subsurface virtual 3D model;
• visualising volumetric medical data, such as CT, MRI and PET scans, where the Cubic Mouse was used to position and view cross sections.

Figure 2.12: Photo of Cubic Mouse. (Reproduced from Frohlich et al, 2000)
  • 24. 24 2.3 Alternative tracking technologies

There are a number of alternative tracking technologies. This section aims to critically analyse each technology for the application of both orientation and positional output. (Johnston 2001) describes a series of tracking technologies, including acoustic, laser, inertial, electromagnetic and vision, detailing the possible benefits of their hybrids.

2.3.1 Acoustic

Ultrasonic range finding devices are available which send sonic pulses above the human hearing range and measure the time taken for their return (see figure 2.13). These devices need to be on axis to the object whose distance they are measuring. Interference problems exist when using multiple pre-built ultrasonic range finding devices in close proximity, and therefore the devices need to be spread a distance apart; this was found by experimentation in the author's undergraduate major project "The Airstation" (Ozsan 2004). These devices would be ill suited to ascertaining the absolute rotational position of a cube, although they do have a reasonable range. Commercial musical devices using this technology exist, such as Midi Gesture and Sound Beam, but these do not attempt to solve the problems associated with the control of 3D audio.

Figure 2.13: SRF04 Ultrasonic Range Finder by Daventech, Receiver and Transmitter http://www.acroname.com/robotics/parts/R93-SRF04.html

2.3.2 Laser

"Lasers are usually used in combination with vision systems to scan the surface shape of objects to provide a depth field rather than an object's position." (Johnston 2001) Lasers measuring depth are somewhat similar in approach to ultrasonics but are more accurate; however, accuracy and distance with this type of technology come at a great cost. Break beam sensors (i.e. on/off measuring lasers) are much cheaper, but are inappropriate for capturing continuous controller information (Ozsan 2004).
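The ultrasonic ranging principle in section 2.3.1 reduces to a single time-of-flight conversion. A minimal sketch is given below; it assumes the host hardware has already measured the echo pulse width in seconds (the SRF04 itself only reports that pulse), and it uses a nominal speed of sound of 343 m/s for dry air at about 20 °C.

    // Convert an ultrasonic echo time into a distance estimate.
    double rangeFromEchoTime(double echoSeconds) {
        const double speedOfSound = 343.0;          // metres per second, ~20 degrees C
        return speedOfSound * echoSeconds / 2.0;    // halved: the pulse travels out and back
    }

For example, an echo time of 5.8 ms corresponds to roughly one metre, which is comfortably inside the SRF04's quoted range.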
  • 25. 25 2.3.3 Inertial and GPS

Inertial sensors are based on measuring acceleration. "In just 60 seconds, a one-dimensional Inertial Measurement Unit (IMU) using an accelerometer with an output noise level of just 0.004 g yields a positional uncertainty of about 70 metres. It is to do with the time scale involved: an inertial device will only be positionally accurate over a short time (sub seconds!) so it needs to be used with another device to continually correct its drift." (Johnston, 2001)

Hybrid sensors exist that combine survey grade Global Positioning System (GPS) receivers and Inertial Navigation System (INS) technology (see figure 2.14). The cost varies from 20 to 70 pounds, and the unit is 234x108x63 mm in size. The device is not really aimed at use in small spaces: "If the RT3000 is constrained to operating in a very small area, it is unlikely that it will be properly warmed up; this might degrade accuracy." (Datron, 2006) It is only accurate down to 2 cm over a long time, and this value varies with the satellite signal, which is sometimes good and other times not. It needs to be connected to one or two antennas, power and a serial cable to receive the data back. The obvious advantage is that it can be used over any distance, as long as the altitude does not exceed 18,000 m. The weight (1.7 kg), number of connections and signal-dependent accuracy, however, make it an inappropriate gestural controller for commodity control of 3D audio. (Datron, 2006)

Figure 2.14: RT3000 Inertial and GPS Navigation System Hybrid for measuring motion, position and orientation. (Reproduced from www.oxts.co.uk)
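The 70-metre figure quoted from (Johnston, 2001) can be checked by treating the 0.004 g noise level as a constant acceleration error and integrating it twice over the 60-second interval (a simplification, since real accelerometer noise is random rather than constant, but it reproduces the quoted order of magnitude):

    x = (1/2) a t^2 = (1/2) × (0.004 × 9.81 m/s²) × (60 s)² ≈ 71 m

So even a very small accelerometer error becomes an unusable position estimate within a minute, which is why the drift must be continually corrected by another sensor.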
  • 26. 26 2.3.4 Electromagnetic Inition Ltd (Inition Ltd 2006) sell a number of products. The Polhemus Liberty is a magnetic sensor based kit tracking six degrees of freedom (see figure 2.15). The Base unit can be upgraded to allow 16 sensor inputs. The prices start at a base price of £4,750 with one sensor allowing up to four inputs. This base unit then increases to £11,645 with one sensor but allowing up to 16 inputs. Each individual sensor addition costs £370, the range of which extends up to 2.52 metres (for extra cost). The further away, the higher the degradation. The optimal range from the transmitter is within 0.9 metres. Figure 2.15: Showing photo of Polhemus Liberty. (Reproduced from Inition Ltd 2006) The Polhemus Patriot (see figure 2.16) is slightly less expensive; costing £1,550.00 with six DOF (positional and orientational) but only comes with two possible inputs. Figure 2.16: Showing photo of Polhemus Patriot. (Reproduced fromInition Ltd 2006)
  • 27. 27 In both cases the magnetic field is distorted by metallic objects put within its range. Ascension Technology Corporation (Ascension 2006), like Inition, sell a number of products. The “flock of Bird” (see figure 2.17) uses magnetic field detection with orientation and positional output. However it uses pulsed DC magnetic technology which makes it less susceptible to distortion caused by nearby metal. It costs $2,495 with one sensor and again increases with more sensors, achieved by chain linking the flock of bird boxes. Figure 2.17: Showing photo of Ascension Flock of Bird. (Reproduced from Ascension 2006) The 6D Mouse is a pointing device (see figure 2.18) for use with Ascension trackers supporting a serial interface. It contains an embedded DC magnetic sensor for continuously tracking its position and orientation” The mouse costs $795 but must be used in conjunction with The Flock of Bird ($2,495). Figure 2.18: Showing photo of Ascension 6D Mouse (Reproduced from Ascension 2006) One of the primary benefits of magnetic tracking is that no line of sight is needed. Ascension are focused in their technology, but do not have any cheap vision based systems.
  • 28. 28 2.3.5 Reflection based Vision Systems VICON PEAK (Vicon 2006) use a number of optical, high resolution camera solutions to track position and orientation ( see Figure 2.19). Communication with Andy Ray revealed the following:- The system is based on reflective markers which are positioned on an object or person. At least two cameras are needed to track the reflective markers. Multiple cameras can be linked together, and this very much depends on the application (See Figure 2.20 for multiple camera linking). “Feature Films use systems in excess of 200 cameras for 15 person real time capture with hands and face detail or up to 50 subjects at once for post process capture, game developers use from 6 – 100 cameras again dependant on what they wish to capture and Gait labs can run with just 5 cameras up to 24 being the largest camera count for a Gait lab. Therefore optical camera systems have a large varied price; as a guide you would be looking at £50,000 for a very basic 6 camera system and in excess of £1.5 million for a system with over 200 cameras.” (Vicon 2006) “The second method of motion capture is via the video based system called Peak Motus. This uses anything from standard DV cameras to high end HD cameras used for creating 3D analysis. However this technique is used predominantly for Biomechanical sports analyses and does not provide a Real time output. The software for this solution starts around £15,000 for the 3D type and will be around £15,000 for the hardware required.” (Vicon 2006) After the recording the Vicon software reconstructs the trajectory of each ball, although the system also provides realtime feedback down to 3ms latency. Figure 2.19: Showing photo of Vicon professional camera (reproduced fromVicon 2006)
  • 29. 29 Figure 2.20: Showing photo of Vicon Motus 3D tracking system. (Reproduced from Vicon 2006)

2.3.6 Alternative Electronic Interfaces Summary

In summary, the alternatives to commodity vision system technologies are extremely expensive and, although they do have some advantages, they do not follow the trend set by the populated world of free Virtual Studio Technology and cheaper computing, which has led to home studios and bedroom producers.
  • 30. 30 2.4 Existing Vision System Environments

There are a number of emerging environments to help programmers in the vision community. The following section examines their benefits and drawbacks to help decide which is most applicable.

2.4.1 Eyesweb

Eyesweb is a programming environment developed by the vision community. It is visually based and open source, much like Pure Data, requiring you to link objects (blocks) using inlets and outlets. It is freely downloadable from the internet (Eyesweb, 2006).

"An alternative digital effects controller" by Paschalidou (2003) employed vision system technology (a webcam) to capture coloured finger and hand gestures against a white background. These gestures were mapped, using high level perceptual parameters, to a reverb VST via the audio programming environment Pure Data. This area of research is closely related, and the findings gave this project a boost forward. Paschalidou (2003) describes problems with the use of the colour tracking algorithms (objects) in Eyesweb: "It was found very difficult to extract information from Eyesweb that would stay steady and would exactly represent our hands position and motion"; she further says "the system has been very sensitive with lighting and background conditions in the colour tracking analysis process" (for example confusing bright yellow with skin). Therefore a white T-shirt with coloured thimbles was adopted, with the camera pointing towards the person. (Paschalidou 2003)

These problems exist because of the nature of lighting conditions and the way a camera detects colour through the process of segmentation. Colour perception can be described by the resultant colours of light that the object does not absorb and that are therefore reflected. So, for example, if a full spectrum of light were shone on an object which absorbed all the colours of light apart from blue, it would appear blue. Pigments selectively absorb particular colours of light. If equal intensities of red, green and blue shone on an object which absorbed blue, then the colours reflected would be red and green, and the object would appear yellow. Hence variations in the incident light create different coloured reflections and confusion for the camera. If you imagine that every environment contains a different selection of objects which all reflect different colours, and barriers which change the lighting in different parts of the room, including artificial and natural lighting at different times of the day, the problem becomes extremely complicated.

However, the decision to use colour tracking is dependent on the application. The Interactive Audio Environment project at the sonic arts research council is an example of such an application, where colour change was used to respond to movement in a space. The advantage of using colour detection in this situation is that anyone walking into the space can interact with the system, without needing to stick fiducials on themselves.

Setting up the Eyesweb colour blob tracker with a short movie of a dancer wearing different colours demonstrated the crosshair jitter when calculating the centre of the blob (see figure 2.21 below). Also, some colours, such as the dancer's right (blue) foot on the darker flooring, would go unrecognised although visible to the camera. This is represented by the bottom negative value
  • 31. 31 in the Blobs COGs (centre of gravity) table, and can be seen in the Tracked blobs window as a rectangle without a crosshair.

Figure 2.21: Showing Eyesweb's Track Colour Blob object tracking six differently coloured blobs on a dancer against a white background. Reproduced from Eyesweb.

John Robinson and Dan Parnham within the department have also expressed the difficulties of using colour pixel-count filtering algorithms for position tracking, and (Ozsan, 2004) experienced similar problems using colour filtering with a CMUcam from "Active Robots". Communication with (Volpe, 2006) and evaluation of the latest version of Eyesweb confirmed that it is constructed from a combination of 'low level' algorithms which perform particular tasks. Groups of these tasks have been combined into objects such as the colour blob tracking block, but none have been combined to track fiducials (described in the fiducials section). This is because fiducials use a very specific set of algorithms created specially for tracking the associated fiducial (pattern), and the combination of these fiducial tracking algorithms is an art in itself. It would be very difficult for Eyesweb to integrate and recognise every form of fiducial that was put in front of it. However, Gualtiero Volpe was interested in an integrated fiducial library set and Eyesweb blocks which could be used in different configurations for gestural tracking.
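The brittleness of colour segmentation described in 2.4.1 comes down to the fact that any fixed colour test is tied to the incident lighting. The fragment below is illustrative only – the thresholds are invented for the example and are not taken from Eyesweb – but it shows the kind of per-pixel test a colour blob tracker is built on; a change in lighting scales the R, G and B values and pushes genuine marker pixels outside the accepted box, producing exactly the jitter and missed blobs reported above.

    // Naive per-pixel colour test of the kind a colour blob tracker relies on.
    // The threshold box is hypothetical; real systems must tune it per environment.
    struct Rgb { unsigned char r, g, b; };

    bool isMarkerColour(const Rgb& p) {
        // Accept "bright yellow": strong red and green, little blue.
        return p.r > 180 && p.g > 160 && p.b < 90;
    }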
  • 32. 32 2.4.2 Clip

The Class Library for Image Processing (CLIP) is an extensive library created by John Robinson for image-processing programming. CLIP is implemented as a single header file called picture.h. It can communicate with a web camera, so a frame grabber can be programmed with it; however, it does not include a built-in fiducial library. The MECAD lecture series at York provides a good understanding of how the classes operate. CLIP is downloadable from the internet (CLIP, 2006).

2.4.3 Open Illusionist and fiducial library

The demonstration of the tracking algorithms by Dan Parnham showed great promise. The augmented feedback, although rendered on a computer screen (i.e. in 2D), accurately projected a cube onto the top of a 2D fiducial print-out: ‘The perspective of the virtual cube would change as the fiducial was tilted backward and forward so that the cube was always sitting centrally and flat against the top of it.’
  • 33. 33 2.5 Protocol 2.5.1 Open Sound Control (OSC) Open Sound Control (OSC) is a ‘protocol for communication between computers, synthesizers and other multimedia devices that is optimized for modern networking technology.’ It is becoming more widely used by the designers of new musical interfaces. (Wright et al 2003) By querying another OSC systems features and functionality, it provides information so that these can be mapped to accordingly. It is based on hierarchical URL style addressing which point to ‘nodes’ in OSC address spaces. For Example (Madden et al 2001) show that ‘to play the “glock” instrument at some pitch at some loudness, the OSC command is /glock/playnote note loudness.’ Designed for high speed systems with bandwidths of over ten megabits per second, it is roughly 300 times faster than midi (31.25 kilobits per second). Because of this extra bandwidth data can be encoded in 32-bit or 64-bit, providing symbolic addressing and time-tag messages. (Wright, Freed 1997) OSC delivers its data in packets “Diagrams” which are arranged as follows (Wright, Freed 1997): “The basic unit of Open Sound Control data is a message, which consists of the following: • A symbolic address and message name • Any amount of binary data up to the end of the message, which represent the arguments to the message. An Open Sound Control packet can contain either a single message or a bundle. A bundle consists of the following: • The special string "#bundle" (which is illegal as a message address) • A 64 bit fixed point time tag • Any number of messages or bundles, each preceded by a 4-byte integer byte count” OSC can be found in a number of environments including Max Msp, PD, CSound and SuperCollider. It has a number of benefits including the ability to separate visual processing i.e. GUI and Audio processing onto different computers. It is quite open ended. (Zumwalt 2003) Updated information about OSC can be found at http://cnmat.berkeley.edu/OpenSoundControl/
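As a concrete illustration of the message layout described above, the /glock/playnote example can be packed by hand into a byte buffer, with each field padded to a four-byte boundary, before being sent as a single UDP datagram. This is a minimal sketch of an OSC 1.0 message, not code from the project; in practice an existing OSC library would normally be used.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Append a string padded with null bytes to the next 4-byte boundary,
// as required by the OSC specification.
void appendPaddedString(std::vector<unsigned char>& buf, const std::string& s)
{
    for (char c : s) buf.push_back(static_cast<unsigned char>(c));
    buf.push_back(0);                                   // terminating null
    while (buf.size() % 4 != 0) buf.push_back(0);       // pad to a multiple of 4
}

// Append a 32-bit float in big-endian (network) byte order.
void appendFloat32(std::vector<unsigned char>& buf, float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    buf.push_back((bits >> 24) & 0xFF);
    buf.push_back((bits >> 16) & 0xFF);
    buf.push_back((bits >>  8) & 0xFF);
    buf.push_back( bits        & 0xFF);
}

int main()
{
    std::vector<unsigned char> packet;
    appendPaddedString(packet, "/glock/playnote");  // symbolic address
    appendPaddedString(packet, ",ff");              // type tags: two float arguments
    appendFloat32(packet, 60.0f);                   // note
    appendFloat32(packet, 0.8f);                    // loudness
    // 'packet' would now be sent as one UDP datagram to the receiving OSC server.
    std::printf("OSC packet size: %zu bytes\n", packet.size());
    return 0;
}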
  • 34. 34 2.6 Mapping

Unlike acoustic instruments, in which timbre arises from a playing interface and sound source that cannot be separated, electronic musical instruments consist of three main parts: the hardware input, the mapping and the sound output. Mapping (also termed the mapping layer) is the intermediary stage which connects the input hardware (performer actions) to the system parameters (synthesis inputs). Hunt et al (2002) demonstrated the “dramatic effect that mapping can have on bringing the interface to life.” Hunt et al (2000) also described two types of mapping, ‘explicit’ and ‘implicit’; see figure 2.22 below.

Figure 2.22: Mapping of performer actions to synthesis parameters (Hunt et al 2000).
  • 35. 35 2.6.1 Explicit Mapping Strategies

One-to-One (Simple): each independent output gesture is assigned to one (low-level) sound parameter. Hunt and Kirk (2000) found that although this strategy is preferred by beginners, it is less inspiring for experienced performers: because it is easy to learn, performers spend less time mastering it.

Figure 2.23: One to One Mapping

One-to-Many (Divergent): one output gesture is used to control more than one simultaneous musical parameter. This is good for macro control, but is lacking when finer, simple parametric adjustments are needed.

Figure 2.24: One to Many mapping

Many-to-One (Convergent): many gestures are coupled to one musical parameter. This strategy requires more practice with the system in order to achieve effective control, but proves far more expressive than simple mapping.

Figure 2.25: Many to One mapping

Many-to-Many (Complex): a mixture of the above strategies, in which gestures are interwoven to control a number of synthesis parameters. Results from Hunt et al (2002) show that performers spent more time using the complex version of the interface; although they were frustrated by its complexity, they found it more engaging. “Several people commented that they would like to continue with it outside of the tests” (Hunt, Wanderley 2002) (Wanderley 97).

The explicit mapping strategy used in the project depends very much on the end application. For timbral space it would be interesting to map the cube's six degrees of freedom to high-level timbral descriptors, which will be discussed in the applications section under 3D timbral manipulation; this would be a complex mapping. For the application of spatial audio, namely ambisonics, the cube would initially map simply to panning positions in space.
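To make the distinction concrete, the sketch below treats an explicit mapping as a weight matrix from a gesture vector to a synthesis-parameter vector: a one-to-one mapping has a single non-zero entry per row, while a many-to-many mapping fills the matrix densely. This is an illustrative sketch only, not the project's mapping code, and the gesture and parameter names are hypothetical.

#include <cstdio>

// Explicit mapping as a weight matrix: each synthesis parameter is a
// weighted sum of the incoming gesture values.
const int GESTURES = 3;   // e.g. cube x, y and tilt (hypothetical inputs)
const int PARAMS   = 3;   // e.g. pitch, brightness, reverb amount (hypothetical outputs)

void applyMapping(const float weights[PARAMS][GESTURES],
                  const float gesture[GESTURES],
                  float params[PARAMS])
{
    for (int p = 0; p < PARAMS; ++p) {
        params[p] = 0.0f;
        for (int g = 0; g < GESTURES; ++g)
            params[p] += weights[p][g] * gesture[g];
    }
}

int main()
{
    // One-to-one: each gesture drives exactly one parameter.
    const float oneToOne[PARAMS][GESTURES] = { {1, 0, 0}, {0, 1, 0}, {0, 0, 1} };

    // Many-to-many: every gesture contributes to every parameter.
    const float manyToMany[PARAMS][GESTURES] = { {0.6f, 0.3f, 0.1f},
                                                 {0.2f, 0.5f, 0.3f},
                                                 {0.1f, 0.2f, 0.7f} };

    const float gesture[GESTURES] = {0.8f, 0.2f, 0.5f};
    float simple[PARAMS], rich[PARAMS];
    applyMapping(oneToOne,   gesture, simple);
    applyMapping(manyToMany, gesture, rich);
    std::printf("one-to-one:   %.2f %.2f %.2f\n", simple[0], simple[1], simple[2]);
    std::printf("many-to-many: %.2f %.2f %.2f\n", rich[0], rich[1], rich[2]);
    return 0;
}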
  • 36. 36 2.6.2 Implicit Mapping (Generative) This method derives its mapping strategy using a learning based, generative system such as Artificial Neural Networks (ANN). ANN implicit mapping works in a different way than explicit mapping. Explicit mapping involves the user setting predefined connections, which the user can change. Implicit generates its own mapping based on information it receives during the learning time. There are advantages and disadvantages to both systems. However, for the purpose of this interface, focus will be put into the explicit mapping strategies. 2.6.3 Mapping Layers Sometimes when designing the Mapping Network, it is useful to divide the mapping stages into layers. The first layer of the RIMM project (Hunt, Wanderley, 2002) involved the mapping of the sensor inputs to meaningful parameters, such as ‘Sax Lift’‘Brightness’ and ‘Energy’(see figure 2.26) Mapping layer two splits the sound parameters and graphical parameters of the project. Finally the last layer involved the mapping of these to three different sets of parameters. The synthesis engine, the 3D sound engine and the graphical engine. Figure 2.26: Mapping Layers within the RIMM project (Hunt, Wanderley, 2002) 2.6.4 Metaphors for musical control A metaphor in this context is a gesture or series of interactive operations to reach a result, for example the click, drag and drop used when using a mouse to move files on a computer is a metaphor. Fels et al (2002) “defined a two-axis transparency framework that can be used as a predictor of the expressivity of a musical device. One axis is the player’s transparency scale (does the mapping make sense to the player?), while the other is the audience’s transparency scale (do the performers actions make sense to the audience?).” The transparency or opaqueness of an interface’s mapping can therefore be a predictor for its expressivity. The concept of expressive transparency is used later in this project to describe the transparent mapping used to move a sound in 3D space with a 3D controller.
  • 37. 37 2.7 Audio Applications

There are a number of applications which the interface targets, including multiple-media environments such as Maya by Autodesk and 3ds Max by Discreet, which are both 3D graphics applications for film, media and product design. Although 3D graphics applications may be employed to develop the GUI of the interface, the focus here will be on two feasible 3D audio applications.

2.7.1 Timbral Manipulation

Whereas pitch and loudness can each be represented by a single value, timbre has many different facets. A common goal in timbre research is to find a way of grouping sounds and of portraying their differences accurately.

2.7.1.1 Multidimensional perceptual scaling

Multidimensional perceptual scaling techniques have been used to evaluate the perceptual relationships between musical instrument sounds (Rossing et al 2002). Grey and Moorer (1977) asked listeners to rate pairs of instrument sounds in terms of their similarity, giving a perceptual distance of one sound from another. The distances between these 16 instruments were then mapped into a 3D space; see figure 2.27 below.

Figure 2.27: Illustration of how 3D timbre space was mapped (Grey, 1977).
  • 38. 38 The axis labelled I in figure 2.27 represents the spectral energy distribution; axis II represents spectral synchronicity, that is, whether the partials as a collective start and finish together or at different times; axis III represents the presence of preceding high-frequency (often inharmonic) energy. The lines show the increasing strength of relationships between instruments inside clusters, in the order solid, dashed, dotted. This is known as hierarchical clustering analysis (Johnson, 1967). There were also differences in judgement depending on the order of sound presentation. Figure 2.28 below shows 2D spectrographic visualisations relating to the axes of the 3D space diagram (Grey 1977).

Figure 2.28: 2D spectrographic visualisations relating to the axes of the 3D space diagram. The instruments presented have been given labels to identify them (Grey, 1977). The key is as follows:

O1, O2: Oboes
X1, X2, X3: Saxophones (mf, p, sop)
C1, C2: Clarinets
S1, S2, S3: Strings
EH: English horn
FH: French horn
TM: Trombone
FL: Flute
BN: Bassoon
  • 39. 39 Axis I – the spectral energy distribution. For example, at the top of figure 2.28 FH and S3 have a narrow spectral bandwidth and a concentration of low-frequency energy; at the other extreme, TM towards the bottom has a wide spectral bandwidth and less concentration of energy in the lowest harmonics.

Axis II – the synchronicity of the attacks and decays of all the harmonics as a collective. Towards the left, an instrument's harmonics generally all start and end at the same time. This can be seen almost as a divide between the woodwind on the left (X1, X3, C1), with very rectangular-looking spectra, and the strings on the right (S1, S2, S3), which tend to have tapering patterns, with the exceptions of the flute (FL) and the bassoon (BN).

Axis III – the presence of high-frequency, low-amplitude, most often inharmonic energy during the attack segment. Tones with such attack energy lie towards the right, while tones which have low-frequency inharmonics, or which lack high-frequency energy in the attack portion of the sound, lie towards the left.

This study marked the first such scaling of naturalistic, time-variant tones which employed a three-dimensional solution for interpretability. The exceptions to the family clustering prove to be interesting (Grey 1977), and suggest the possibility that certain physical factors, such as the overblowing of a flute, may override the tendency for instruments to cluster by family. Articulatory features also seem to play an important part in this non-familial clustering. To date, a synthesizer which navigates this space and allows the user to define their own three-dimensional mappings has not been created.
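Axis I, the spectral energy distribution, is commonly summarised by a single low-level feature, the spectral centroid (the amplitude-weighted mean frequency of the partials), which correlates well with perceived brightness. The sketch below shows that calculation; it is illustrative only, not part of the project, and the harmonic amplitudes are hypothetical.

#include <cstdio>
#include <vector>

// Spectral centroid of a harmonic tone: the amplitude-weighted mean of the
// partial frequencies.  A higher centroid means more energy in the upper
// partials, i.e. a "brighter" timbre (one simple proxy for Grey's Axis I).
double spectralCentroid(double f0, const std::vector<double>& amplitudes)
{
    double weightedSum = 0.0, total = 0.0;
    for (std::size_t n = 0; n < amplitudes.size(); ++n) {
        double freq = f0 * (n + 1);          // frequency of the (n+1)-th harmonic
        weightedSum += freq * amplitudes[n];
        total       += amplitudes[n];
    }
    return total > 0.0 ? weightedSum / total : 0.0;
}

int main()
{
    // Hypothetical harmonic amplitude sets for a 220 Hz tone.
    std::vector<double> dark   = {1.0, 0.5, 0.2, 0.05, 0.01};   // energy concentrated low
    std::vector<double> bright = {0.4, 0.6, 0.7, 0.6,  0.5};    // energy spread upwards

    std::printf("dark tone centroid:   %.1f Hz\n", spectralCentroid(220.0, dark));
    std::printf("bright tone centroid: %.1f Hz\n", spectralCentroid(220.0, bright));
    return 0;
}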
  • 40. 40 2.7.1.2 Tristimulus

The Tristimulus synthesizer, created by Riley (2004), is two dimensional. It employs the evolution of a timbre's fundamental, harmonics and partials based on analysis by Pollard and Jansson in 1982. An approximate timbre representation by means of a tristimulus diagram, for the note onsets and the steady-state portion of five sounds, is given in figure 2.29 below.

Figure 2.29: Tristimulus Diagram (Howard and Angus, 2001).

In each case the onset starts at the end of a line and then travels towards the open circle, which represents the approximate steady-state position. ‘Mid’ at the top represents stronger second, third and fourth harmonics (resolved); ‘High’ represents stronger high-frequency partials (unresolved); ‘f0’ represents a stronger fundamental. Pollard and Jansson (1982) note, however, that the time course is not even and, for clarity, is not calibrated. The note onsets (black lines) lasted as follows: gedackt (10-60 ms); trumpet (10-100 ms); clarinet (30-160 ms); principal (10-150 ms); and viola (10-65 ms) (Howard, Angus 2002). The tracks taken by the notes are very different, and the steady states also all lie in different places. The approach provides a straightforward means of representing timbre, which Riley (2004) implemented in a real-time synthesizer for Pure Data. What the tristimulus approach does not communicate is how the sound ends (i.e. what happens after the steady-state portion of the sound). Timbre is multi-dimensional; although experiments have been made to visualise this, a definitive solution to quantify the massive world of timbre still requires much research.
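Returning to the tristimulus representation itself, its three coordinates are commonly computed from the relative strength of the fundamental, of harmonics two to four, and of the remaining higher partials. The sketch below uses that simplified amplitude-based formulation (Pollard and Jansson's original measure is loudness-based); it is illustrative only and the harmonic amplitudes are hypothetical.

#include <cstdio>
#include <vector>

struct Tristimulus { double f0, mid, high; };   // the three coordinates sum to 1

// Simplified amplitude-based tristimulus:
//   f0   = relative strength of the fundamental
//   mid  = relative strength of harmonics 2-4
//   high = relative strength of harmonics 5 and above
Tristimulus tristimulus(const std::vector<double>& a)   // a[0] = fundamental, a[1] = 2nd harmonic, ...
{
    double total = 0.0, mid = 0.0, high = 0.0;
    for (std::size_t n = 0; n < a.size(); ++n) {
        total += a[n];
        if (n >= 1 && n <= 3) mid  += a[n];
        if (n >= 4)           high += a[n];
    }
    if (total <= 0.0) return {0.0, 0.0, 0.0};
    return { a[0] / total, mid / total, high / total };
}

int main()
{
    // Hypothetical snapshot of harmonic amplitudes during a note's steady state.
    std::vector<double> amps = {0.9, 0.5, 0.4, 0.3, 0.15, 0.1, 0.05};
    Tristimulus t = tristimulus(amps);
    std::printf("f0 = %.2f  mid = %.2f  high = %.2f\n", t.f0, t.mid, t.high);
    return 0;
}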
  • 41. 41 2.7.1.3 The Musician's most common Use of Timbral Descriptors

Current EPSRC-funded research into timbral descriptors at the University of York (Disley et al 2006) involves identifying the most commonly and most transparently used adjectives for describing timbre. The basis for this research is the need for musically intuitive instruments which appeal to musicians' use of adjectives rather than to the underlying low-level technological parameters, which include “fundamental frequency, basic waveform shape, filter cut-off frequency and filter resonance”. The research excludes “nasal, ringing, metallic, wooden, evolving, pure and rich” due to disagreement between users, but supports the use of “bright, clear, warm, thin, harsh, dull, percussive and gentle” (Disley et al 2006).

2.7.2 3D Ambisonic Manipulation

Ambisonics is a solution to the problem of encoding the directions and amplitudes of sounds and reproducing them over loudspeaker systems in such a way as to fool the listener into imagining that they are hearing the original sounds correctly located. The ambisonic setup can cover 360 degrees horizontally (pantophonic) or a full sphere (periphonic). The format which first-order soundfield microphones create is converted into what is called B-format, a four-channel encoding, although higher-order formats do exist (Malham 2006). The encoding equations for B-format are as follows:

W = 0.707 (pressure signal)
X = cosA.cosB (front-back)
Y = sinA.cosB (left-right)
Z = sinB (up-down)

It is useful to understand how each of these letters contributes to the sound field: W is the pressure signal, X corresponds to sound from the front and back, Y to left and right, and Z is the vertical channel (up and down). A more in-depth look at the ambisonic system is given in chapter 6, titled Applications. The easiest way to approach panning in ambisonics is to imagine a sphere in which you position a sound. Unfortunately, due to the two-dimensional nature of common computer interfaces and graphical feedback, this is a very complex process, in some cases requiring the operator to move a number of one-dimensional sliders at the same time. There are a number of ambisonic shareware VSTs available from the internet. However, for immediate experimentation it has been decided to integrate with Pure Data, “a realtime graphical dataflow node programming environment”, using an object providing the parametric outputs. Ambisonic Pure Data objects were found to exist which are a conversion of Dave Malham's VST set, so using a VST~ Pure Data object to connect to external .dll VSTs is unnecessary for quick experimental purposes. The Pure Data object which receives information from the camera would need extra inputs if the Pure Data application were to make use of camera functionality such as zoom, focus, panning and tilt control; however, this would involve delving into the web camera driver source, and obtaining this information from the associated companies proved difficult. Sometimes secrecy is necessary for a company to remain a product leader.
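Returning to the B-format equations above, they translate directly into a panning function, taking A as the azimuth and B as the elevation of the source. The sketch below is a minimal first-order B-format encoder following those equations; it is illustrative only and is not the Pure Data object or VST set used in the project.

#include <cmath>
#include <cstdio>

struct BFormatSample { float w, x, y, z; };

// Encode a mono sample into first-order B-format.
// azimuth   (A): anticlockwise angle in radians, 0 = straight ahead
// elevation (B): angle in radians above the horizontal plane
BFormatSample encodeBFormat(float mono, float azimuth, float elevation)
{
    BFormatSample out;
    out.w = mono * 0.707f;                                   // pressure signal
    out.x = mono * std::cos(azimuth) * std::cos(elevation);  // front-back
    out.y = mono * std::sin(azimuth) * std::cos(elevation);  // left-right
    out.z = mono * std::sin(elevation);                      // up-down
    return out;
}

int main()
{
    const float pi = 3.14159265f;
    // A source of amplitude 1.0 placed 90 degrees to the left and 30 degrees up.
    BFormatSample s = encodeBFormat(1.0f, pi / 2.0f, pi / 6.0f);
    std::printf("W=%.3f X=%.3f Y=%.3f Z=%.3f\n", s.w, s.x, s.y, s.z);
    return 0;
}

Running the function per sample with azimuth and elevation taken from the cube's tracked orientation would give exactly the sphere-based panning described above.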
  • 42. 42 2.8 Literature Review Summary Commodity vision position tracking is cheaper and can produce an accurate, robust, 6 degree of freedom output over a distance greater than the alternative electronic hardware based technologies. A fundamental understanding from the literature survey is the fact that tracking position by colour detection and filtering does not produce a robust enough output, and is affected by external lighting conditions. The fiducial based tracking systems are much more robust, more affordable and are not affected by colour change and so are more flexible because a white background is not needed for them to be tracked. There is however a particular threshold lighting level which is needed for them to be seen. Creating the algorithms together with the fiducials seems to be somewhat of an artform, and requires complex thought processes to detect their presence and retrieve a reliable tracking output. Because of these reasons, coupled with the future possibility of integrating a fiducial library into Eyesweb, the project was steered into using fiducials rather than using the previously imagined Eyesweb environment and colour tracking. A number of different tangible augmented user interfaces have been examined as well as their electronic hardware alternatives. For the growing market of laptop computer musicians powered with freely downloadable shareware VSTs or sequencers inside cereal packets the alternatives are inappropriate or unaffordable. Two audio applications have been identified for which extensive experimental research and musical development is needed. However a number of other applications may result from communication and experimentation of the interface that have not been conceived in the literature review.
  • 43. 43 CHAPTER 3. SYSTEM FEASIBILITY

This section describes the installation and procedures necessary to start programming. It looks at the three augmented tangible user interface environments described in the Literature Review and quickly decides which is the easiest to understand in order to achieve the aim set at the beginning of this project. It begins by examining the fiducial (pattern tracking) technology used in each.

3.1 Fiducial Tracking Algorithms

There are a number of machine vision tracking algorithms which work in different ways. These include edge detection, ellipse detection, colour detection using colour filtering, and segmentation (where, by either region-based or edge-based algorithmic methods, differently coloured patches are distinguished from one another). Fiducials are targets or symbols designed to be identified and tracked using a set of algorithms. Pattern recognition seems to be robust and much less prone to problems with lighting and background, due to the nature of the tracking process. This section therefore focuses on fiducial tracking and the relevant literature in the area. There is a plethora of different-looking fiducials, each with advantages and disadvantages. Each of the following fiducials has been integrated into an application environment. The procedure required to begin programming in these environments is given in the development section of this report. There might be a need to completely integrate a fiducial library into another environment, such as a fiducial set from one of these environments into Eyesweb or Pure Data.

Parnham et al (2006) state that in general “fiducial design balances the following constraints and criteria:

• Detection rate: True fiducials should be detected, that is, identified as being fiducials.
• Identification rate: Once detected, a fiducial's identity (particular code) should be correctly read.
• Misdetection rate: Natural scene elements should not be interpreted as fiducials.
• Information content: The higher the number of different fiducials that can be generated within the design, the better.
• Compactness: The fiducial should be as compact as possible.
• Image-position accuracy: The fiducial's centre, and perhaps other information about its disposition in the image, should be accurately extracted.
• World-position accuracy: Information about the fiducial's disposition (location and orientation) in the world should be accurately extracted.
• Robustness to lighting conditions: Illumination distribution and colour and shadowing should have little effect on detection.
• Robustness to occlusion: The fiducial should be detectable/interpretable when partially occluded.
• Robustness to pose: The fiducial should be detectable when seen obliquely.
• Robustness to deformation: The fiducial should be detectable when displayed on a curved surface or otherwise deformed.
• Efficiency (speed) of detection/location algorithms.
• Ease of rendering: A fiducial that can be drawn may be preferred to one that must be printed, and a monochrome fiducial may be preferred to a colour one.”
  • 44. 44 3.1.1 Video Positioning System Fiducials The findings of (Johnston, 2001) “Vision-Based Location System Using Fiducials” provides a good base for the concepts behind fiducial detection with vision systems. The PhD focused on “The production of a real-time augmented reality system that runs on commodity hardware (ie PC-class machines) and allows the user to roam over a comparatively wide area, up to several hundred metres – what might be called un-tethered ” (Johnston, Clark 2003) called the research “a visual positioning system” (VPS) “which is essentially a library written in C and usable from C++, which takes an image and yields estimates of the position and orientation. The library also provides an interface to the video capture sub-system on Linux. Location information can be fed directly into the industry- standard OpenGL library.”(Johnston, 2001) In simpler terms the system is a cube which tracks complete six degree of freedom movement using fiducials whose positions have been calculated and added together to provide 360 degrees of rotation in three directions as well as positional translation. The VPS system was accurate down to 0.1 of a pixel in the x and y direction. Because the fiducials could be printed any size, the only factor which limited the range was that the smallest component of the fiducial needed to be larger than a pixel and within the focal range of the camera. (Johnston, 2006) Johnston (2001) details the mathematical conversion matrices needed to transform the Image plane (Cameras 2D vision) into the real world coordinate system including the maths involved in placing fiducials in a cube formation so that absolute six degree of freedom orientation can be tracked. A good resource for the explanation of projective geometry and single camera machine vision can be reviewed in chapter 4 Faugeras, (2001) and (3D Geometry, 2006) The system tracked a cube in 3D space (see figure 3.1a) however did not yet track multiple cubes. Initially it was confused that SML (Standard Meta Language) needed to be used to program the tracking of fiducials, but in actual fact the fiducial tracking library was created in C. (Johnston, 2006) VPS uses a system of fiducial identification named Region Adjacency Graphs which was similarly implemented in ReacTABLE. This process is explained in detail in the following section (3.1.2) titled ReacTABLE Fiducials. Figure 3.1: (a) Left, Showing VPS Cube with augmented cube wireframe (b) Right, A Fiducial from the VPS library showing smallest components.( Johnston 2001)
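The core of the image-plane-to-world transform referred to above is the pinhole-camera projection relating a 3D point in the camera's coordinate frame to a 2D image-plane position; pose estimation essentially inverts this relationship using the known geometry of the fiducials. The fragment below is a generic textbook-style sketch, not Johnston's VPS code, and the focal length and principal point values are assumptions for a nominal 640x480 camera.

#include <cstdio>

struct Vec3  { double x, y, z; };     // point in camera coordinates (z pointing forward)
struct Pixel { double u, v; };        // position on the image plane, in pixels

// Ideal pinhole projection: u = fx * X/Z + cx,  v = fy * Y/Z + cy.
// fx, fy are focal lengths in pixels and (cx, cy) is the principal point.
Pixel project(const Vec3& p,
              double fx = 600.0, double fy = 600.0,
              double cx = 320.0, double cy = 240.0)
{
    return { fx * p.x / p.z + cx, fy * p.y / p.z + cy };
}

int main()
{
    // A fiducial corner 0.1 m to the right of the optical axis, 0.5 m from the camera.
    Vec3 corner{0.1, 0.0, 0.5};
    Pixel px = project(corner);
    std::printf("projects to pixel (%.1f, %.1f)\n", px.u, px.v);
    return 0;
}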
  • 45. 45 3.1.2 Reactable Fiducials

ReacTable uses reacTIVision fiducials developed by Bencina et al (2005), which were derived from Enrico Costanza's fiducial system. The tracking looks for adjacent colour changes and the number of these colour changes within each boundary. This method of detection can be displayed more clearly on a Region Adjacency Graph (RAG). For example, figure 3.2a below shows one black outer ring followed by one white inner ring and one black inner circle, inside of which lie three small white circles. This is known as the topology, i.e. the fiducial pattern. It is represented in a basic form by the RAG beside each fiducial, which follows in sequence from top to bottom. Notice that the RAGs of 3.2b and 3.2c are the same although the fiducials look different; this is unlike Audio D-Touch, which uses geometry to identify fiducials. Therefore “reacTIVision fiducials are identified purely by their RAG or topological structure” (Bencina et al, 2005).

Figure 3.2: Simplified topologies and their corresponding region adjacency graphs. Reproduced from (Bencina et al, 2005)

The centre and orientation of the fiducial are determined using its smallest entities, i.e. the last parts of the RAG tree (the leafs). The leaf centres (both black and white) are combined using a weighted average to find the fiducial's centre point; see figure 3.3 below. The orientation is then obtained by drawing a line, or creating a vector, from the average centroid of all the leafs (black and white together) to the average centroid of just the black (or just the white) leafs; figure 3.3 illustrates the black-leaf case. (reacTIVision uses this method because it is easy to find the centre of an object if it is square, circular and/or relatively small.) (Bencina et al, 2005)

Figure 3.3: (a) reacTIVision fiducial (b) black and white leafs and their average centroid (c) black leafs and their average centroid, and (d) the resultant vector used to compute the orientation of the fiducial. (Bencina et al, 2005)
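As a rough illustration of that centre-and-orientation step, the centre can be taken as the mean of all leaf positions and the orientation as the vector from that mean to the mean of the black leafs only. This is a simplified sketch, not the reacTIVision source: equal weighting is assumed and the leaf positions are hypothetical.

#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x, y; };
struct Leaf  { Point pos; bool black; };   // smallest regions of the RAG tree

// Mean position of a set of points (equal weighting assumed here).
Point mean(const std::vector<Point>& pts)
{
    Point m{0.0, 0.0};
    for (const Point& p : pts) { m.x += p.x; m.y += p.y; }
    if (!pts.empty()) { m.x /= pts.size(); m.y /= pts.size(); }
    return m;
}

int main()
{
    // Hypothetical leaf positions extracted from one fiducial.
    std::vector<Leaf> leafs = {
        {{10.0, 10.0}, true},  {{12.0, 11.0}, true},  {{11.0, 14.0}, true},
        {{14.0, 10.5}, false}, {{13.5, 13.0}, false}
    };

    std::vector<Point> all, black;
    for (const Leaf& l : leafs) {
        all.push_back(l.pos);
        if (l.black) black.push_back(l.pos);
    }

    Point centre    = mean(all);                            // estimated fiducial centre
    Point blackMean = mean(black);
    double angle    = std::atan2(blackMean.y - centre.y,
                                 blackMean.x - centre.x);   // orientation vector angle

    std::printf("centre (%.2f, %.2f), orientation %.1f degrees\n",
                centre.x, centre.y, angle * 180.0 / 3.14159265);
    return 0;
}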
  • 46. 46 3.1.3 Open Illusionist Fiducials

The fiducial design used in Open Illusionist is a result of research by Parnham et al (2006) in “A Compact Fiducial for Affine Augmented Reality.” The fiducial consists of an outer ring, inner segments and a centre spot. The circular outer ring is processed by an edge-detection algorithm, and ellipse detection finds the perspective at which the circular fiducial is viewed, because “a circle under perspective projection is an ellipse” (Parnham et al 2006). The ellipse detection also helps to define the centre position of the fiducial. The data of the fiducial is stored in the segments, although one segment is dedicated to indicating which direction the fiducial is pointing. Finally the fiducial is read for its binary or ternary value (it is identified). The only problem with these fiducials at present is that they do not compensate for partial coverage. Figure 3.4 below illustrates four of these fiducials, each coded differently:

(a) “A 15 segment binary fiducial, clearly displaying the directional pointer. It is binary because it only has two shades (black and white) and has only one layer.
(b) Another 15 segment binary fiducial with the code “100111010101110”, reading anticlockwise and from left to right (1 = white segment, 0 = black segment).
(c) A ternary fiducial containing 15 segments that reads “0120 0221 1201 012” (14,348,907 combinations). This is because it has three shades.
(d) This fiducial has been split into two layers containing 30 segments excluding the pointing segment, which reads “012012012012012000111222001122” (2.1 x 10^14 combinations)” (Parnham et al 2006)

However, the tracking process is more robust with fewer segments, fewer levels and a binary rather than a ternary code. For the application of a six degree of freedom audio cube it would therefore be logical to trade a high combination count for robustness. The fiducials printed in the article are old; recent research has led to the creation of differently typed fiducials with larger segments but fewer combinations (a sketch of the code-to-identifier conversion follows after figure 3.4).

Figure 3.4: Four Open Illusionist Fiducials, see explanations for (a), (b), (c) and (d) below. (Parnham et al 2006)
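The combination counts quoted above follow from treating the segment string as a number in base 2 or base 3: fifteen ternary segments give 3^15 = 14,348,907 codes, and thirty give roughly 2.1 x 10^14. A minimal, illustrative conversion is sketched below; this is not the Open Illusionist decoder, and the code string is simply example (c) above with its grouping spaces removed.

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>

// Interpret a fiducial segment string as a base-N number to obtain its ID.
// Each character is a segment shade: '0', '1' (binary) or '0', '1', '2' (ternary).
std::uint64_t segmentsToId(const std::string& code, int base)
{
    std::uint64_t id = 0;
    for (char c : code)
        id = id * base + static_cast<std::uint64_t>(c - '0');
    return id;
}

int main()
{
    std::string ternary = "012002211201012";   // example (c), spaces removed
    std::printf("ternary code %s -> ID %llu (out of %.0f possible)\n",
                ternary.c_str(),
                (unsigned long long)segmentsToId(ternary, 3),
                std::pow(3.0, (double)ternary.size()));
    return 0;
}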
  • 47. 47 3.2 Open Illusionist Implementation Open Illusionist (OI) consists of two parts. The framework for creating interactive applications called Open Illusionist and the Fiducial Library. Although the Library could be integrated with Open Illusionist, it was later found that they can function separately. The Fiducial Library initialisation is split into a further two sections. These are the drawing of fiducials and the tracking of fiducials. 3.2.1 Open Illusionist Framework Setup The following sub-sections describe the process for installing all the Open Illusionist components and setting them up in a free IDE (Integrated Development Environment.) 3.2.1.1 C++ Compiler / Debugger To compile program code you need a compiler or more commonly an IDE. Microsoft Visual C++ 2005 Express Edition is freely downloadable from the internet and has unlimited usage time when registered. http://msdn.microsoft.com/vstudio/express/visualc/download/ The installation files are also on the DVD. Environment Setup FilesVisual C++ Express 2005 Installationvcsetup However this installation file downloads files from the Internet, so it is necessary to connect to the internet before this is attempted. Visual C++ 2005 Express Edition does not come with the relevant windows SDK(Software Development Kit), so it needs to be downloaded and installed otherwise you will see errors referring to ‘Windows.h’ not found. Tutorials can be found on the DVD under Environment Setup FilesVisual C++ Express 2005Tutorials 3.2.1.2 Windows SDK The Microsoft Windows Software Development Kit (SDK) provides the libraries, header files, samples, tools and documentation you need for the development of applications that run in Windows. 1. Download the Platform SDK from the following website:- http://www.microsoft.com/downloads/details.aspx?FamilyId=0BAF2B35-C656-4969- ACE8E4C0C0716ADB&displaylang=en You can either download the full SDK in chunks (Windows Server R2 Platform SDK Full Download) or download an ISO image for burning to CD (Windows Server R2 Platform SDK ISO Download) The DVD also contains an installation file for this under:- Environment Setup FilesWindows Server 2003 R2 Platform SDK.
  • 48. 48 3.2.1.3 WXwidgets WXwidgets is a cross platform Graphical User Interface Library. “WXWidgets lets developers create applications for Win32, Mac OS X, GTK+, X11, Motif, WinCE, and more using one codebase. It can be used from languages such as C++, Python, Perl, and C#/.NET. Unlike other cross-platform toolkits, wxWidgets applications look and feel native. This is because wxWidgets uses the platform’s own native controls rather than emulating them. It’s also extensive, free, open-source, and mature.” http://www.wxwidgets.org/ Figure 3.5: Screenshot of the Ca3D Engine world editor which uses wxWidgets reproduced from (wxWidgets, 2006) For PC, download and install WXmsw (Microsoft Windows) to a location on your hard-drive. ‘Remember the location’ http://wxwidgets.org/downloads/ is a direct link to the website for the latest download, however the installation version used in this project can be found on the DVD under:- Environment Setup FilesWxWidgetswxMSW-2.6.3 Install. 3.2.1.4 Visual C++ Environment Setup Open the $(WXWIN)buildmswwx.dsw project in the wxWidgets folder. Visual C++ 2005 Express should now load and ask you a question, select “yes to all” when asked to convert For the SDK Setup procedures a useful link is as follows:- http://www.wxwidgets.org/wiki/index.php/Compiling_WxWidgets#Microsoft_Visual_C.2B.2B_20 05_Express_Edition
  • 49. 49 This describes how to direct VC++ 2005 Express to point at the previously installed SDK Library. Go to Tools Options Projects and Solutions VC++ Directories. There is a ‘show directories for:’ bar at the top right, Use this to scroll down to include files. Clicking on the include files drop down will change the information in the frame below. Within these drop downs, where it refers to platform SDK you need to select and redirect this to the SDK include folder which you previously installed. Look under both the include drop down and library drop down. A helpful link for procedures to setup Wxwidgets in VC ++ 2005 Express is http://wxforum.shadonet.com/viewtopic.php?t=6261&postdays=0&postorder=asc&start=0 Building the Solution may give an error as shown:- ....srcregexregerror.c(103) : warning C4996: ‘strncpy’ was declared deprecated C:Program FilesMicrosoft Visual Studio 8VCincludestring.h(156) : see declaration of ‘strncpy’ Message: ‘This function or variable may be unsafe. Consider using strncpy_s instead. To disable deprecation, use _CRT_SECURE_NO_DEPRECATE. See online help for details.’ To correct this bug one needs to select all the projects under the solution icon at the top of the Solution explorer down the left hand side of the screen in VC++ 2005 Express. Clicking the icon at the very top left of the Solution Explorer panel (once all projects selected) should bring up a screen called Property Pages. Inside this screen there are a list of configuration properties. 1. under C/C++ Code Generation Enable C++ Exceptions: set to “Yes with SEH exceptions (/Eha) 2. under C/C++ Command Line Additional options: REMOVE the Ehsc, and add /D“_CRT_SECURE_NO_DEPRECATE” /D“_CRT_NONSTDC_NO_DEPRECATE” See instruction number 7. on website for further details… http://wxforum.shadonet.com/viewtopic.php?t=6261&postdays=0&postorder=asc&start=0 The wxWidget library has two versions. The release version and the debug version, both of these versions need to have their solutions built. To change between the version inside VC++ Express 2005 you need to go to Build --> Configuration Manager --> and under Active solution configuration change to release and then build solution and change to Debug and then build solution. This should now build without errors, though there may be some warnings.
  • 50. 50 3.2.1.5 Install and setup Open Illusionist Download and install Open Illusionist from their website, version 1.3.0 was available at the time this paper was written (2006) http://www.openillusionist.org.uk/documentation/doku.php?id=site:downloads Alternatively it is also possible to download the latest version of Open Illusionist from Source Forge using Tortoise SVN (Subversion Version Control) “The location of the wxWidget libraries must be known to Open Illusionist and any derived applications so you will need to setup an environment variable (unless the wxWidgets installer has already done so) called WXWIN. You can do this by opening the System Properties from the Windows Control Panel, selecting the tab “Advanced” and clicking the button “Environment Variables”. From there you can add a user variable called WXWIN and set its value to the absolute path at which wxWidgets resides.” http://www.openillusionist.org.uk/documentation/doku.php?id=install:preparation#wxwidgets If you open the file called Illusionist (VC++ project) under Open Illusionistv1.3.0illusionist, this will open the project in Visual C++ 2005 Express. Then you need to build both its debug and release solutions. This should put two library files into the open illusionist (version) Lib Although there exist a number of workspaces for open illusionist, creating an executable file for the fiducial library required the fiducial library to be extracted from open illusionist and programmed separately. Closer examination of this revealed that the open illusionist fiducial component could be treated separately from the open illusionist framework.
  • 51. 51 3.3 VPS Implementation The approach taken for this is similar to the implementation process used for Open Illusionist in that first the Fiducial Drawing program needs to be solved and then the tracking programs linked. The drawing process however involves using the Standard Meta Language New Jersey (SMLNJ). SML is only necessary for writing lines into the command console to save fiducials to a file. The tracking code was found later to be programmed in C using Linux. Therefore the tracking programs can be compiled in visual C++ or any other C IDE. 3.3.1 Standard Meta Language Setup First the smlnj.zip file for Microsoft Windows needs to be installed from the website http://smlnj.cs.uchicago.edu/dist/working/110.59/ and unpacked to a specified location such as. C:........SML This folder contains two sub folders called Bin and Lib. The following process must be done to understand SML and save fiducials to a postscript file First the windows XP environment variables needed to be edited. It is necessary to change the file extension in the environment variables to match the location of the SML installation. To view or change environment variables: “1. Right-click My Computer, and then click Properties. 2. Click the Advanced tab. 3. Click Environment variables. 4. Click on the following options, for either a user or a system variable: Click New to add a new variable name and value. Click an existing variable, and then click Edit to change its name or value. Click an existing variable, and then click Delete to remove it.” http://support.microsoft.com/default.aspx?scid=kb;en-us;310519&sd=tech SMLNJ_HOME = c:sml PATH = <EXISTING STUFF>; c:smlbin These path descriptions should be visible in the user variables window, In this case they are directed to the C drive In the case of the author C:sml needed to be replaced by G:.......SML And C:smlbin needed to be replaced by G:.......SMLbin