1. Integrating Human-Computer Interaction with Planning for a Telerobotic System
Zunaid Kazi
Computer and Information Sciences
Applied Science and Engineering Laboratories
3. Definitions
Telerobots
Robotic devices that extend a person's manipulation capability to a location remote from the person
Telemanipulation
Manipulating objects using a telerobot
4. Problem domain
Telemanipulation in an unstructured domain where direct physical control is limited due to
Time delay
Distance
Lack of structure
Lack of sensation and coordination
5. The problem of control
Direct control
Autonomous control
Supervised control
6. The control gamut
[Diagram: Direct Control — the user carries the load; Autonomous — the computer carries the load]
7. The control gamut
[Diagram: Supervised — the load is traded between user and computer; Shared — the load is shared between user and computer]
8. Need for intervention
Non-repetitive and unpredictable tasks
Incomplete domain knowledge
Unpredictable changes
Insufficient sensory information
Inherent inaccuracy of the telerobot
9. The proposed solution
A new telemanipulation technique for unstructured environments that
integrates the human user into a shared control mechanism
extends Bolt's (MIT 1980) "Put that there" interaction scheme to true 3-D unstructured worlds
10. System requirements
Shared control
Flexible human-machine interface
Semi-autonomous task-planning
Adaptability and reactivity
Robust perception
15. Knowledge-driven planner
Semi-autonomous planner
Uses three knowledge-bases
Two knowledge-bases of objects defined in an abstraction hierarchy
WorldBase
DomainBase
A knowledge-base of user-extendible plans
PlanBase
17. Knowledge representation
Name and class
Shape
Height, width and thickness
Location, pose and color
Constraints
Plan fragments
Other attributes
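The attribute list above can be sketched as a simple object record. This is an illustrative sketch only: the field names and types are assumptions based on the slide, not the system's actual representation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an object record in the WorldBase/DomainBase,
# with fields taken from the attribute list on the slide.
@dataclass
class WorldObject:
    name: str                      # e.g. "straw"
    obj_class: str                 # position in the abstraction hierarchy
    shape: str                     # e.g. "cylinder", used for grasp planning
    height: float                  # dimensions from the vision system
    width: float
    thickness: float
    location: tuple                # (x, y, z) in workspace coordinates
    pose: tuple                    # orientation, from the vision system
    color: str
    constraints: list = field(default_factory=list)     # e.g. "keep upright"
    plan_fragments: list = field(default_factory=list)  # object-specific plan steps
    attributes: dict = field(default_factory=dict)      # weight, malleability, ...

straw = WorldObject("straw", "cylinder", "cylinder",
                    120.0, 5.0, 5.0, (287, 97, 0), (0, 0, 1), "white")
```

Shape and dimensions come from the vision system; constraints, plan fragments, and the remaining attributes would be learned during operation or supplied by the user.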
18. Multimodal interface
Combined speech and gesture input
Objects identified through gesture and speech
Parsed input string passed to supervisor
19. Motivation
Critical disambiguating function
Relaxing perceptual and processing requirements
Extending direct control metaphor to 3-D domains
Simplification of processing
21. Parsed output
Generally two types of user instructions
Assigns a value
Performs an action
Example
That's a straw
Insert the straw into the cup
22. Illustration
That's a straw
:assign: that's 2874 970
:value: :object: straw
Insert the straw into the cup
:action: insert
:object: straw
:direction: into
:object: cup
23. Supervisor's interpreter
LISP-like language
Parses input string into a list of S-Expressions
Recursively evaluates the list
Responsible for speech-gesture combination
24. Illustration
• Incorporating gesture
(:assign: thats nn nn
:value: :object: straw)
• Invoking the planner
(:action: insert :object: straw
:direction: into :object: cup)
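A minimal sketch of how such a LISP-like interpreter might dispatch on these parsed S-expressions. The handler names and dispatch table are illustrative assumptions, not the actual MUSIIC interpreter:

```python
# Minimal sketch of an interpreter dispatching on parsed S-expressions
# such as (:assign: ...) and (:action: ...).

def handle_assign(args):
    # Combine the deictic ("thats") and its gesture coordinates with the
    # named object, binding the object to the gestured location.
    deictic, coords, obj = args[0], args[1], args[-1]
    return ("bound", obj, coords)

def handle_action(args):
    # Hand the task name and its instantiated parameters to the planner.
    return ("plan-invoked", args[0], args[1:])

HANDLERS = {":assign:": handle_assign, ":action:": handle_action}

def evaluate(sexpr):
    """Recursively evaluate a parsed S-expression list."""
    head, *rest = sexpr
    # Evaluate nested sub-expressions first, then dispatch on the head symbol.
    rest = [evaluate(x) if isinstance(x, list) else x for x in rest]
    return HANDLERS[head](rest)

# "That's a straw", with gesture coordinates attached by the parser:
print(evaluate([":assign:", "thats", (2874, 970),
                ":value:", ":object:", "straw"]))
```

An `:assign:` expression binds an object name to the gestured location, while an `:action:` expression invokes the planner, mirroring the two instruction types on slide 21.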
25. The planner
Uses user-extendible plan library
Exhibits shared control
Capable of
Interacting with the user
Supervised learning
Reactivity
Modifying/adapting old plans
26. Plan classification
Simple plans
opengripper, home, etc.
Complex plans
insert, pickup, rotate, etc.
User-defined plans
feed, open-door, etc.
27. Plan structure
Plan name
Plan type
Plan preconditions
Plan body
Plan goals
28. Illustration
Plan name
(insert :object :location)
Plan type
(N)
Plan preconditions
((robothomed) (notholding))
Plan body
((grab :object) (moveto :location) (slowdrop) (opengripper) (ready))
Plan goal
(objectat :location)
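The STRIPS-like plan record above can be encoded as a plain dict for illustration. The field names mirror the slide; this encoding, and the reading of `(N)` as marking a non-primitive (complex) plan, are assumptions rather than the system's actual representation:

```python
# Illustrative encoding of the "insert" plan shown on the slide.
insert_plan = {
    "name": ("insert", ":object", ":location"),
    "type": "N",                                  # (N): assumed to mean non-primitive
    "preconditions": [("robothomed",), ("notholding",)],
    "body": [("grab", ":object"), ("moveto", ":location"),
             ("slowdrop",), ("opengripper",), ("ready",)],
    "goal": ("objectat", ":location"),
}
```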
29. Plan execution
Plan synthesis
check for preconditions and constraints
build list of primitives
Plan execution (for each primitive)
check for preconditions and constraints
execute primitive and test for success
check and test for user input
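The two-step synthesize-then-execute process above can be sketched as follows. `PLAN_LIBRARY` and the helper functions are hypothetical stand-ins for illustration, not the system's actual API:

```python
# Sketch of plan synthesis followed by per-primitive execution.

def check_preconditions(plan):
    # Placeholder: the real system tests world-state predicates here and,
    # under shared control, may fall back to asking the user.
    return True

def run_primitive(plan):
    print("executing", plan["name"])   # stand-in for commanding the telerobot
    return True

def poll_user_input():
    pass  # placeholder: check for user modification or a halt request

PLAN_LIBRARY = {
    "grab": {"name": "grab", "type": "simple"},
    "moveto": {"name": "moveto", "type": "simple"},
    "opengripper": {"name": "opengripper", "type": "simple"},
}

def synthesize(plan):
    """Recursively expand a plan into a flat list of primitive plans."""
    if plan["type"] == "simple":
        return [plan]
    primitives = []
    for step in plan["body"]:
        sub = PLAN_LIBRARY[step]
        check_preconditions(sub)       # much of the shared control happens here
        primitives.extend(synthesize(sub))
    return primitives

def execute(plan):
    for prim in synthesize(plan):
        check_preconditions(prim)      # re-check just before execution
        if not run_primitive(prim):    # execute and test for success
            raise RuntimeError("primitive failed: " + prim["name"])
        poll_user_input()              # user may modify the plan mid-execution

execute({"name": "insert", "type": "complex",
         "body": ["grab", "moveto", "opengripper"]})
```

The precondition check appears in both phases because the world can change between synthesis and execution in an unstructured domain; polling for user input after each primitive is what makes mid-plan intervention possible.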
30. Advanced planning
Supervised learning
off-line
on-line
Plan adaptation
adapting old plan to new situation
adapting old plan to a new object
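Adapting an old plan to a new object can be pictured as parameter substitution: a plan taught for one object is reused on a similar object. This hypothetical sketch illustrates the idea only; it is not the actual adaptation algorithm:

```python
# Sketch of plan adaptation by object substitution.

def adapt_plan(plan, old_obj, new_obj):
    """Return a copy of `plan` with every reference to old_obj replaced."""
    def substitute(step):
        return tuple(new_obj if tok == old_obj else tok for tok in step)
    adapted = dict(plan)
    adapted["body"] = [substitute(step) for step in plan["body"]]
    adapted["goal"] = substitute(plan["goal"])
    return adapted

# A plan taught for a cup, reused for a glass of similar shape:
cup_plan = {
    "name": "serve-drink",
    "body": [("grab", "cup"), ("moveto", "user"), ("slowdrop",)],
    "goal": ("objectat", "user"),
}
glass_plan = adapt_plan(cup_plan, "cup", "glass")
```

In the real system the substitution would be guided by the objects' shared properties in the abstraction hierarchy (shape, grasp position), not by name alone.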
33. Integrating HCI and AI
Achieve accuracy and reliability
Faster and unconstrained control
Easier control and minimized demands
Overcome time-delays
Failstop capability
Function as assistive robot
34. Contribution
A novel telemanipulation technique
Operates in unstructured domains
Overcomes limitations in A.I., vision and robotics
Is scalable beyond the test domain
Augments HCI with A.I. or vice versa
37. Completing the context
Exploring NASA application for remote telemanipulation in conjunction with the Bartol Institute
Continuing clinical studies to use the technology for people with disabilities
Extending to projects involving mobile robots
38. Acknowledgements
Rehabilitation Engineering Research Center on Rehabilitation Robotics, National Institute on Disability and Rehabilitation Research Grant #H133E30013 of the US Department of Education
Nemours Research Programs
Editor's notes
This talk presents my doctoral research on integrating Human-Computer Interaction with planning for a telerobotic system.
I will first outline the order of this presentation.
Before launching into the core of my presentation I will first define some necessary terms.
I will then set the scene by defining the problem domain and providing background information.
I will then present the actual core of the research, followed by a discussion of the contributions of this research.
I will finally conclude by comparing this research to relevant work by other researchers and touch on future work.
Some definitions are in order
......
The key phrase to note here is unstructured. We are not only interested in telemanipulation under restricted physical control but in a domain that is NOT structured.
Physical control may be limited as a result of a number of factors:
Examples include:
1. Remote exploration; robot arm on the space shuttle
2. Hazardous material manipulation; nuclear power plant
3. Assistive robot; the fruition of my own research
The mode of control effectively dictates whether telemanipulation is effective under these circumstances:
Researchers have generally looked at three different modes of control:
1. Direct,
2. Autonomous
3. Supervised
Each of these modes of control has its own drawbacks, which I shall now elaborate upon.
The control method actually dictates how much of the task load is carried by the human and how much by the telerobot.
In direct control the user is in charge of all the motions of the robot and therefore carries the entire task load. Direct control is only possible in unstructured environments when there is full sensory feedback and no delay. And even when this is possible, one has the physical and cognitive load to deal with.
Autonomous telerobots essentially replace the human user and carry the entire task load. However, a number of reasons preclude autonomous systems, mostly stemming from the current state of the art in the A.I., machine vision and robotics communities: planning under all contingencies, full natural language understanding, general-purpose object recognition.
Problems such as these prevent us from having a practical and effective system of telemanipulation
Then we have supervised control systems, where the user and the telerobot trade control. Some tasks are done by the system and some by the user, and the trade-off is strictly delineated. This mechanism, while solving some of these problems, inherits others.
However, consider shared control, where the line of control is not as rigorously delineated and control is shared by the user and the system as the need arises. In this control mechanism not only is some of the task load carried by the system, the system actually extends the user's carrying capacity.
Therefore some degree of human intervention is necessary
This is even more apparent if we consider the following points
Therefore the solution for operating a telemanipulator in an unstructured environment necessitates some mechanism that integrates the user into the control schema, thereby not only overcoming the inherent difficulties of direct control but also overcoming the limitations imposed by the current state-of-the-art in A.I., vision and robotics
The means of achieving this is the core of my dissertation
The requirements for such a system to be effective are:
This leads us to MUSIIC, which represents the new telemanipulation technique. MUSIIC stands for Multimodal User Supervised Interface and Intelligent Control.
This is achieved through
First obviously, shared control
A multimodal human-machine interface
Object-oriented knowledge-driven planning
The three major components that bring about the system are
1. The vision system
2. The multimodal human-computer interface
3. The knowledge driven planner
A supervisor coordinates the different components and the human user is integral to the whole
Before going into the details, a video is in order at this point.
To summarize;
The vision system, while being integral to the system, does not form a part of my research;
The other two components are the two integral components in achieving this new telemanipulation technique
The first major component is the knowledge-driven planner.
This is a semi-autonomous planner that interacts with the user. (These will be explained in detail as we explore further.)
The planner uses three knowledge bases to synthesize and execute plans:
2 are .....
Where WorldBase is ... DomainBase is
PlanBase is .....
One of the keys to achieving telemanipulation in an unstructured domain is the knowledge structure of the objects.
Objects are represented in a 4-tier abstraction hierarchy of increasing specialization.
The 4 tiers are:
The top tier.... what is known is what is obtained from the vision system. As long as an object is located, regardless of what it is, manipulation from general principles is possible.
The second tier represents objects classified in terms of shape. Since grasping is essential to manipulation, shape is essential to accurate telemanipulation.
The third tier represents objects that may be found in the user domain, such as.... They inherit properties from the previous class and may have further attributes that control and affect manipulation, such as orientation and grasp position.
The fourth tier represents instantiations of objects in the domain. Such as a specific cup.... etc.....
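The four tiers described above can be sketched as a small class hierarchy, from the most general (an object known only by location, from vision) down to a specific instance. The class names here are assumptions for illustration, not the system's actual types:

```python
# Sketch of the 4-tier object abstraction hierarchy.

class LocatedObject:
    """Tier 1: all that is known is what the vision system provides."""
    def __init__(self, location):
        self.location = location

class ShapedObject(LocatedObject):
    """Tier 2: classified by shape, which is what grasping needs."""
    def __init__(self, location, shape):
        super().__init__(location)
        self.shape = shape

class DomainObject(ShapedObject):
    """Tier 3: a known object type in the user's domain (e.g. a cup)."""
    def __init__(self, location, shape, name, grasp_position=None):
        super().__init__(location, shape)
        self.name = name
        self.grasp_position = grasp_position  # attribute affecting manipulation

# Tier 4: an instantiation — one specific cup in the workspace.
my_cup = DomainObject(location=(120, 45, 0), shape="cylinder", name="cup")
```

Each tier inherits the properties of the one above it, so even an object that only reaches tier 1 can still be manipulated from general principles.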
(A video illustration will be useful here....)
What kind of information is contained in an object definition?
Height, width, thickness, pose and color are determined from the vision system.
Others are learned during operation or are user-supplied.
Constraints are ....
Plan fragments are ...
Other attributes may include weight, malleability etc..
We next change our focus to the multimodal interface, where:
Speech and gesture input are the means of interaction, as was evidenced by the video.
Objects in the domain are identified by gestures (in the implementation we used pointing), which can be expanded to include more complex gestures.
The user input string is parsed and then sent to the supervisor for interpretation (the process includes combining the speech with the gesture)
What advantages do we garner from having such a human-computer interface?
1. Focus of user intention is marked by gestures, obviating general purpose object recognition schema
2. This relaxes the perceptual and processing requirements of the system and hence makes it more practical
3. Extends the direct control metaphor into 3-D domains with all the inherent advantages
4. There is an overall simplification of processing
Let us look into the parsing process in more details:
The grammar is phrase-structured
The construct is of the general form VSO where V is ....
In order for speech to be combined with gesture, deictics are time-stamped. - This process needs to be described in a bit more detail
Parsed output-
User input generally takes two forms
1. Where the user assigns a value
2. Where an action needs to be performed
As examples let us consider the following two instructions from our demo video:
1. That’s a straw
Assigning the object straw in the WorldBase to the deictic "thats"
2. Insert the straw into the cup
Performing the task of inserting the straw into the cup
Here we illustrate the actual parsed output (the grammar detail is available)
The parsed string from the input is sent to the supervisor, where further processing takes place. There is another interpreter at the supervisor end, which is a LISP-like language.
It parses the input string into a list of S-Expressions, and recursively evaluates the list:
During the recursive evaluation
1. Some internal procedures may be invoked (such as those that deal with speech and gesture combination)
2. The planner may be invoked
As an illustration, two examples are provided.
The first one deals with incorporating gesture information:
The second one involves invoking the planner:
The word :action: invokes the planner, which then searches the plan library for a plan for insert.
The parameters for the plan are then instantiated.
In this case the parameters involve, what is being inserted and to where....
As has been shown, the planner uses the plan-library of task plans. This plan library is user-extendible (we will show this later).
The planner exhibits the shared-control that is essential
The planner is also capable of
..
..
All of these features will be discussed in succeeding slides..
Plans may be
Simple: where there is a one-to-one correspondence between a task and a low-level robot action, e.g....
Complex: where the task may be built up of more than one plan, complex or simple. Examples include
Then you have user-defined plans, which are taught during the operation of the system, such as ....
Plan structure:
The structure of the plans is very similar to STRIPS, with some differences.
Plan name: the name of the task and parameters
Plan type: whether simple or complex
Plan preconditions: conditions that must be true prior to execution
Plan body: series of tasks whose successful execution implies the execution of this task
Plan goals: the primary goal to be achieved by executing this task
As an illustration, let us consider the following complex plan:
The planning process involves two steps, the plan synthesis process and the plan execution process
The plan synthesis process involves checking for preconditions and constraints; it is during this process that much of the shared control is exhibited. Let me illustrate. The planner then recursively builds a list of primitive plans that comprise the top-level plan.
The plan execution process then takes each task in the list built earlier and... .....
it again checks for preconditions and constraints
executes the primitive and tests for success
checks for user input for any modification
More advanced planning involves
1. Supervised learning, where the user can teach the system new tasks both off-line and on-line.
2. Plan adaptation, where a plan that fails because of some constraint violation is modified to succeed.
3. Plan modification, where a plan for doing something on some object can be modified to perform a similar task on a different object, based on object properties.
All of these features will be illustrated by the video
How does this new technique for manipulation compare to other work?
There are numerous examples of research focusing on the first three, but there has not been any on shared control as I have proposed.
Looking at it from the interface viewpoint we do have researchers who have looked at speech () and gesture () for robot control.
Multimodal control (incorporating speech with gesture) was first demonstrated by Bolt in a 2-D domain at the Media Lab and then extended by Cannon to 3-D domains in robotics. However, Cannon presupposes a structured environment with complete a priori knowledge of objects, object recognition being the focus of his work.
So what advantages do we accrue from integrating HCI with AI in this domain?
1. Accuracy and reliability of the machine without sacrificing the cognitive ability of the human user
2. control that is fast and unconstrained
3. Control is easier and there is less load on the user
4. We can overcome time delays between the user and the remote device
5. Failstop capability
6. The telemanipulator can function as an assistive robot for a user with physical manipulation disability
The contributions of this thesis can be summarized thus:
The validation is the implementation, which was shown in the different video clips.
No research is complete if it has not unearthed more questions to answer:
Each of these areas has evoked, and continues to evoke, significant research effort.