Human-AI communication for human-human communication: Applying interpretable unsupervised anomaly detection to executive coaching
In this paper, we discuss the potential of applying unsupervised anomaly detection to construct AI-based interactive systems that deal with highly contextual situations, i.e., human-human communication, in collaboration with domain experts. We arrived at this approach through our experience of developing a computational support tool for executive coaching, which taught us the importance of providing interpretable results so that expert coaches can take both the results and the context into account. The key idea is to leave room for expert coaches to apply their own open-ended interpretations, rather than reducing social interactions to well-defined problems that are tractable by conventional supervised algorithms. In addition, we found that this approach can be extended to nurturing novice coaches: prompting them to interpret the results from the system gives them educational opportunities. Although the applicability of this approach should be validated in other domains, we believe that leveraging unsupervised anomaly detection to construct AI-based interactive systems sheds light on another direction of human-AI communication.
Human-AI communication for human-human communication / CHAI Workshop @ IJCAI '22
1. Human-AI communication for human-human communication:
Applying interpretable unsupervised anomaly detection to executive coaching
(†: equal contribution)
CHAI Workshop @ IJCAI '22
July 24, 2022
Riku Arakawa†
Carnegie Mellon University, USA
Hiromu Yakura†
University of Tsukuba, Japan
2. Background: Deep-learning-based human behavior analysis
Advancement in human behavior
analysis techniques:
・Facial expression recognition [1]
・Posture estimation [2]
[1] I. Çugu, et al., 2017. MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Frontal Face Images. arXiv.
[2] S.-E. Wei, et al., 2016. Convolutional Pose Machines. IEEE CVPR.
It is expected that we can analyze
and support human communication
by applying these techniques.
3. Background: A tool for helping public speaking with feedback
[3] M. I. Tanveer, et al., 2015. A Real-Time In-Situ Intelligent Interface to Help People With Public Speaking. ACM IUI.
[4] I. Damian, et al., 2015. Measuring the impact of multimodal behavioural feedback loops on social interactions. ACM ICMI.
Speech-feature-based feedback [3]
Show feedback such as “louder”
and “faster” on a Google Glass
based on speech speed or volume.
Posture-based feedback [4]
Alert a speaker when they cross
their arms for a long time
based on posture estimation.
4. Our perspective: Limitation of heuristic approach
Human-to-human communication is very contextual:
[5] J. Navarro and M. Karlins, 2008. What Every BODY Is Saying: An Ex-FBI Agent’s Guide to Speed Reading People. HarperCollins, New York.
[6] R. Friedman and A. J. Elliot, 2008. The effect of arm crossing on persistence and performance. Europ. J. Soc. Psych.
The same posture (e.g., crossed arms) can indicate a defensive attitude [5] or deep thinking [6].
Thus, we need a new framework of human-AI communication:
• Heuristic approach
• Supervised approach w/ training data of numerous classes
• Unsupervised approach w/o rules or training data
5. Research object: Executive coaching
• It consists of one-on-one conversation, in
which coaches are required to observe the
nonverbal behavior of coachees [7].
• The importance of observing nonverbal
behavior is emphasized in terms of reading
the nuance of what the coachee said [8].
We hypothesized that AI can help novice coaches in the observation process.
But notifying coaches of specific detected postures (e.g., crossed arms) or emotions (e.g., confusion) without context was not appreciated.
[7] E. Cox, et al., 2009. The Complete Handbook of Coaching. SAGE Publications, Los Angeles.
[8] D. B. Drake, 2009. Narrative coaching. In The Complete Handbook of Coaching. SAGE Publications, Los Angeles.
6. Key idea: Separating observation and judgment
Coaches ignored the outputs
once the outputs contradicted
their observation or intuition.
They found it difficult to rely on
outputs based on simplified classes
that are indifferent to subtle context.
This guided us to reframe the way of human-AI communication:
Humans
Pros: Good at understanding context
Cons: Difficult to keep a stable perspective due to their skills or mental load
AIs
Pros: Stable performance
Cons: Not good at dealing with context
Separation of observation and judgment would be an alternative way of human-AI communication.
7. REsCUE: Real-time feedback using anomaly detection
We developed a supporting system that observes the nonverbal behavior of coachees using unsupervised anomaly detection.
It detects informative cues of the behavior and notifies the coach in real time.
Detailed workflow
1. Extract posture and gaze information of the coachee.
2. Calculate an outlierness score using an anomaly detection algorithm.
3. Notify the coach in real time with an interpretive visualization.
8. REsCUE: How the anomaly detection algorithm works
We use an algorithm based on a time-adaptive Gaussian mixture model (GMM) [9].
[9] K. Yamanishi, et al., 2004. On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. Data Mining and Knowledge Discovery.
Time-series behavior data of the coachee is taken from a webcam.
The parameters of the GMM (e.g., mean and covariance) are updated with a forgetting rate r.
• The GMM gradually adapts to newly obtained nonverbal behavior data.
• When the trend of the input data suddenly changes, it is detected as a spike in the negative log-likelihood.
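The update with a forgetting rate r can be sketched as follows. This is a minimal illustration in the spirit of the discounting approach of [9], not the REsCUE implementation: it assumes a single scalar feature instead of full posture and gaze vectors, and the class name `DiscountingGMM1D`, the smoothing parameter `alpha`, the variance floor, and all default values are illustrative assumptions.

```python
import math


class DiscountingGMM1D:
    """Online 1-D Gaussian mixture whose statistics are discounted with rate r."""

    def __init__(self, means, var=1.0, r=0.05, alpha=1.0, var_floor=1e-3):
        k = len(means)
        self.r = r                    # forgetting rate: larger forgets faster
        self.alpha = alpha            # smoothing that keeps every weight positive
        self.var_floor = var_floor    # assumed floor to avoid degenerate variances
        self.w = [1.0 / k] * k        # mixing weights
        self.mu = list(means)         # component means
        self.var = [var] * k          # component variances
        # Discounted first and second moments kept per component.
        self.s1 = [w * m for w, m in zip(self.w, self.mu)]
        self.s2 = [w * (var + m * m) for w, m in zip(self.w, self.mu)]

    def _joint(self, x):
        """Weighted density of x under each component."""
        return [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(self.w, self.mu, self.var)
        ]

    def update(self, x):
        """Score x under the current model, then fold x into the model."""
        joint = self._joint(x)
        total = max(sum(joint), 1e-300)
        nll = -math.log(total)        # negative log-likelihood before the update
        k = len(joint)
        for i, p in enumerate(joint):
            # Smoothed responsibility of component i for x.
            g = (1 - self.alpha * self.r) * p / total + self.alpha * self.r / k
            self.w[i] = (1 - self.r) * self.w[i] + self.r * g
            self.s1[i] = (1 - self.r) * self.s1[i] + self.r * g * x
            self.s2[i] = (1 - self.r) * self.s2[i] + self.r * g * x * x
            self.mu[i] = self.s1[i] / self.w[i]
            self.var[i] = max(self.s2[i] / self.w[i] - self.mu[i] ** 2,
                              self.var_floor)
        return nll


model = DiscountingGMM1D(means=[-0.5, 0.5], r=0.05)
calm = [0.1, -0.2, 0.0, 0.15, -0.1] * 10      # stable behavior
calm_scores = [model.update(x) for x in calm]
spike_score = model.update(5.0)               # abrupt change in the input
print(max(calm_scores), spike_score)
```

In this toy stream the score for the abrupt value is far larger than any score seen during the stable period, which is the spike the slide refers to. Scoring each sample before updating matters: otherwise the model would already have partly adapted to the sample it is judging.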
9. REsCUE: Visualization based on GMM
The GMM allows us to provide interpretative visualization.
In GMM, each component fits
the past representative states.
The most anomalous frames can be
identified by sorting the frames by likelihood.
Just by arranging these frames, the coach can compare them
and understand the change easily even during the session.
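Selecting the frames to arrange can be sketched as below. The function name `most_anomalous_frames` and the parameter `k` are hypothetical, and the scores are assumed to be per-frame negative log-likelihoods, so a higher score means a lower likelihood.

```python
import heapq


def most_anomalous_frames(nll_scores, k=3):
    """Return the indices of the k least likely frames in chronological order."""
    top = heapq.nlargest(k, range(len(nll_scores)), key=lambda i: nll_scores[i])
    return sorted(top)


scores = [0.9, 1.1, 7.5, 1.0, 6.2, 0.8]
print(most_anomalous_frames(scores, k=2))  # [2, 4]
```

Returning the indices in chronological order keeps the selected frames in the order the coach saw them, which should make side-by-side comparison easier.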
10. REsCUE: Detection results
These behaviors were detected without
any rules or heuristics and regarded as
informative by professional coaches.
The algorithm sometimes detected
obvious behavioral changes
(e.g., taking a personal organizer out of a bag).
The visualization allows the coach to
interpret why the scene was detected,
which helps preserve their trust.
REsCUE is now deployed in practice
as a supporting system.
11. Lens of Parasuraman’s framework of automation
The design of our approach can be explained using Parasuraman's framework.
[Figure: realm of automation, on a scale from 1 to 10, for each of four stages: information acquisition, information analysis, decision & action selection, and action implementation]
10: the computer decides everything, acts autonomously, ignoring the human
1: the computer offers no assistance; human must take all decisions and actions
Trade-off between human performance, automation reliability, and cost of consequences
13. Lens of Parasuraman’s framework of automation
[Figure: realm of automation for each of the four stages (information acquisition, information analysis, decision & action selection, action implementation)]
This characteristic plot of our approach came from ... observation
Low human performance:
• Dependency on the skills or mental load
High automation reliability:
• No dependency on heuristics or training data
Low cost of consequence:
• Interpretable visualization to discern uninformative cues
14. Lens of Parasuraman’s framework of automation
[Figure: realm of automation for each of the four stages (information acquisition, information analysis, decision & action selection, action implementation)]
This characteristic plot of our approach came from ... interpretation
High human performance:
• Good at dealing with context
Low automation reliability:
• Automatic interpretation can be insensitive to subtle context
High cost of consequence:
• Risk of asking irrelevant questions that disturb the session
15. Application: Supporting skill transfer
The informativeness of the detected cues depends on the coach's skill:
• A skillful coach gains information from seemingly trivial behaviors.
• A novice coach often disregards such behaviors.
The difference in how each coach interprets the cues
reveals the difference in their skills.
This can be utilized for skill transfer of coaches by helping novice coaches to
learn how skillful coaches gain information from various behaviors.
16. Application: Supporting skill transfer
Annotation phase:
A skillful coach and a novice coach each classify whether each detected cue is informative or not.
Discussion phase:
Through the discussion about the discrepancies,
the novice coach can learn the way of interpretation.
The transparency of the results and the design that
allows open-ended interpretation make this tool possible.
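Finding the cues to discuss can be sketched as below. The function name `find_discrepancies`, the cue identifiers, and the dictionary layout (cue id mapped to whether the coach judged it informative) are hypothetical choices for illustration, not part of the tool described here.

```python
def find_discrepancies(skilled, novice):
    """Return the cues that both coaches annotated but judged differently."""
    shared = skilled.keys() & novice.keys()
    return sorted(cue for cue in shared if skilled[cue] != novice[cue])


# True means the coach judged the detected cue to be informative.
skilled = {"cue-1": True, "cue-2": True, "cue-3": False}
novice = {"cue-1": True, "cue-2": False, "cue-3": False}
print(find_discrepancies(skilled, novice))  # ['cue-2']
```

Only cues annotated by both coaches are compared, so a cue that one coach skipped does not show up as a disagreement.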
17. Conclusion & On-going work
• We introduced a new framework of human-AI communication that is based on
the unsupervised anomaly detection algorithm.
• Its design of separating observation and interpretation enables human-AI
collaboration in highly contextual situations, such as executive coaching.
• Its interpretable visualization enabled by GMM provides transparency in
its detection results, which helps maintain trust with humans.
We remark that REsCUE does not require any prior
knowledge or rules, so it could be used in various domains.
We are now working on applying it
to the analysis of sales communication.
Read our
paper!