SlideShare ist ein Scribd-Unternehmen logo
1 von 83
1
1
How to build someone
we can talk to
Jonathan Mugan
@jmugan
Data Day Texas
November 17, 2022
2
2
Steve Jurvetson from Los Altos, USA, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0>, via Wikimedia
https://commons.wikimedia.org/wiki/File:Rethink_Robotics_%E2%80%94_Brooks_and_Baxter_%288000143255%29.jpg
So many
have left
us
Many intelligent
robots have
come and gone,
failing to become
a commercial
success
3
3
Maurizio Pesce from Milan, Italia, CC BY 2.0
<https://creativecommons.org/licenses/by/2.0>, via Wikimedia Commons
Romo is
gone
4
4
Cynthiabreazeal, CC BY-SA 4.0
<https://creativecommons.org/licenses/by-sa/4.0>, via
Wikimedia Commons
Jibo never
seemed to
get off the
ground
5
5
Sony photographer, CC0, via Wikimedia Commons
Mighty
Sony
couldn’t
make Aibo
stay
6
6
Gregory Varnum, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons
These robots didn’t reach
their potential because you
can’t talk to them, not really.
Even
Alexa is
reducing
staff
https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-colossal-failure-on-pace-to-lose-10-billion-this-year/
7
7
Data Day 2017
Talked about how language
evolved and the current
state-of-the-art for chatbots
and how they won’t work
without understanding
Went into detail about how
neural networks can
generate text
Data Day 2018
Talked about two paths to
getting understanding
Data Day 2019
We got
distracted and
did MLOps with
TensorFlow
Extended
Data Day 2020
Data Day 2021
Data Day 2022
meaning
symbolic
subsymbolic
Covid
8
8
Conversations are interactive
• but they are also hyperlocal
and hyper-specific to the
current context
• context entails the current
situation and conversation
history
Conversation is a
unique medium
9
9
Conversation is a
unique medium
When you have a
conversation with someone,
you direct their imagination
and they yours
10
10
Conversation is a
unique medium
We don’t have any other
medium that has those
properties
• movies, tv, videos, don’t
• The best conversations teach
• Conversation is one of the
hardest things we do as
humans—why they say
people need to socialize to
keep their thinking sharp
11
11
Why a
conversation
partner needs to be
“somebody”
Reliability: If it represents
the world the way that we do
less likely to make a
catastrophic mistake
(since its knowledge structure
is built on our representation)
Trust: need the mistakes it
does make to make sense
to us
12
12
Why a
conversation
partner needs to be
“somebody”
Fun: much more interesting to
talk to somebody.
Consider the humor
• Household robot: should
have some goal obscure to
the owner, like tapping on
the walls. “What the hell is it
doing?”
• NPC: “You’re playing a
video game, right? Can you
get me out of here?”
13
13
Why a
conversation
partner needs to be
“somebody”
Truth: most important
• Truth comes from
interaction with the real
world
• A person seeks to bend
the world to their goals
• A person strives for
things
• ChatGPT: you can kind-
of have a conversation
with it, but it isn’t there
14
14
Somebody
home requires
State
• The world around it
Transitions (actions)
• How the world changes
Goals
• Transitions enable one to talk
about goals
• Why would it talk at all?
• Thinking beyond natural-
language interface
15
15
Somebody
home requires
• Back to Kant and before
• He calls it Judgment
• https://plato.stanford.e
du/entries/kant-
judgment/
Inference
• Inference is mapping a
continuous situation to
some space of values
• Keeps us from being brittle
• See Eiffel tower, in Paris
16
16
Somebody
home requires
Inference
• Inference is mapping a
continuous situation to
some space of values
• Keeps us from being brittle
• See Eiffel tower, in Paris
“The Beginning of Infinity,” by David
Deutsch
• “People” are entities that can create
explanations.
• When we see stars, we don’t see
what they are, we just see points of
light.
17
17
Inference has a precise formulation
𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠|𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛) =
𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠)
𝑃(𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛)
𝑃 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ∝ 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠)
𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 is your model of the world
Hypothesis: is an explanation in the David Deutsch sense
“A big burning ball of hydrogen and helium burning millions
of miles away might look like that”
it’s the Bayesian equation
Or more simply
18
18
Inference has a precise formulation
𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠|𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛) =
𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠)
𝑃(𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛)
𝑃 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ∝ 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠)
it’s the Bayesian equation
Or more simply
𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 must be composable and you
have to come up with it on the fly
19
19
Words
Mental
Scene
Possibilitie
s
“The table is
on the table.”
• If you want to pick up top table,
you must first walk to the table
• If you push the bottom table, the
top table will fall
• Party guests will think this is
weird looking.
Inference leads to “meaning”
Meaning
(inner world)
Reminiscent of Robert Brandom
through Richard Evans
https://www.doc.ic.ac.uk/~re14/Evan
s-R-2020-PhD-Thesis.pdf
20
20
Words
Mental
Scene
Possibilitie
s
Inference leads to “meaning”
Meaning
(inner world)
“The table is
on the table.” What?
21
21
Words
Mental
Scene
Possibilitie
s
Inference leads to “meaning”
Meaning
(inner world)
Two ways to not understand in a conversation:
1. Wrong mental scene
2. Not knowing the possibilities
22
22
Stimuli
Mental
Scene
Possibilitie
s
Inference leads to “meaning”
Meaning
(inner world)
𝑃 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒 𝑠𝑡𝑖𝑚𝑢𝑙𝑖)
∝ 𝑃 𝑠𝑡𝑖𝑚𝑢𝑙𝑖 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒 𝑃(𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒) 𝑃 𝑛𝑒𝑥𝑡 𝑠𝑡𝑎𝑡𝑒 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒)
Transition model
𝑃(𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒)
State prior
• These two models must be composable.
• They must be dynamic and can’t be precomputed, as we will see.
𝑃 𝑠𝑡𝑖𝑚𝑢𝑙𝑖 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒
State model
23
23
Inference for conversation requires dynamic composition
Coordination
of meaning
Coordination
of inner worlds
Instruction
Acknowledgement
Acknowledgement
Break
Break
Modified from
Gärdenfors (2014),
which was based on
Winter (1998)
• Levels of discourse
• Complicated to go
up and down the pyramid
24
24
A fishing pole
is a stick, string
and hook
You can catch fish
with a fishing pole
Get me some fish
Acknowledgement
Break
Break
Acknowledgement
Modified from
Gärdenfors (2014),
which was based on
Winter (1998)
• Levels of discourse
• Complicated to go
up and down the pyramid
Inference for conversation requires dynamic composition
25
25
Inference is over more than words: conversation has its own rules
(pragmatics)
• Conversational maxims: Grice (1975, 1978)
• Breaking these rules is a way to communicate more than the meaning of the words.
Maxim of Quantity: Say only what is not implied.
Yes: “Bring me the table.”
No: “Bring me the table by transporting it to my location.”
What did she mean by that?
Maxim of Quality: Say only things that are true.
Yes: “I hate carrying tables.”
No: “I love carrying tables, especially when they are covered in fire ants.”
She must be being sarcastic.
Maxim of Relevance: Say only things that matter.
Yes: “Bring me the table.”
No: “Bring me the table and birds sing.”
What did she mean by that?
Maxim of Manner: Speak in a way that can be understood.
Yes: “Bring me the table.”
No: “Use personal physical force to levitate the table and transport it to me.”
What did she mean by that?
26
26
When I saw this,
my first thought
was, “Where do
people enter?”
Words are
only hints at
possible
meanings
27
27
Our inference models need to compose for
maximum coverage
Foundational Metaphors (See Mark Johnson and others. Steven Pinker talks about two main ones)
• Force
• An offer or a person can be attractive
• A broken air conditioner can force you to move a meeting
• Location in space
• AI has come a long way in the last 15 years
Conceptual Blending (The Way We Think by Gilles Fauconnier and Mark Turner)
• That running back is a truck
• Dall-e 2 could generate a good picture of this, but it couldn’t imagine what it is like to
tackle a vehicle
Analogies (Melanie Mitchell and Douglas Hofstadter)
• We can broadly apply the story of sour grapes (Hofstadter in Surfaces and Essences)
Multiplicative: Most general sense, a bird that can drive a car
28
28
Prompt:
“Darth Vader on the cover of
Vogue magazine”
OpenAI Dall-e 2
• Trained on combinations of
images and text; CLIP and
diffusion models
• Can create images from
whatever you type
https://openai.com/dall-e-2/
Thanks to you @hardmaru for
the images!
29
29
Outline
• Conversation makes compelling robots and what conversation
requires
• The symbolic path of autonomous simulation
• The sub-symbolic path of coaxing neural networks
30
30
Outline
• Conversation makes compelling robots and what conversation
requires
• The symbolic path of autonomous simulation
• The sub-symbolic path of coaxing neural networks
31
31
Robots can use simulation for inference and
meaning
• One way to build supple
intelligence is to have a
robot build a simulation of
its environment.
• The robot simulates what
you say in something like
Unity.
A table on
a table
See “Computers could understand natural language using simulated physics” from 2017 for additional details.
https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da
32
32
Robots can use simulation for inference and
meaning
• Simulation enables robust
inference because it
provides an unbroken
description of the dynamics
of the environment, even if it
is not complete in full detail.
• This unbroken description is
a grounding.
A table on
a table
See “Computers could understand natural language using simulated physics” from 2017 for additional details.
https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da
33
33
Robots can use simulation for inference and
meaning
What will happen if I push the
bottom table?
A table on
a table
See “Computers could understand natural language using simulated physics” from 2017 for additional details.
https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da
Well, mentally push it and see.
34
34
Robots can use simulation for inference and
meaning
Classic AI question, can birds fly?
∀𝑥 𝑏𝑖𝑟𝑑 𝑥 → 𝑐𝑎𝑛_𝑓𝑙𝑦(𝑥)
Okay, okay
∀𝑥 𝑏𝑖𝑟𝑑 𝑥 ∧ 𝑓𝑙𝑦𝑖𝑛𝑔_𝑏𝑖𝑟𝑑(𝑥) → 𝑐𝑎𝑛_𝑓𝑙𝑦 𝑥
WKRP "As God as my witness,
I thought turkeys could fly”
https://www.youtube.com/watch?v=lf3mgmEdfwg&ab_channel=EpicHouston
∀𝑥 𝑏𝑖𝑟𝑑 𝑥 ∧ 𝑓𝑙𝑦𝑖𝑛𝑔_𝑏𝑖𝑟𝑑(𝑥)
∧ 𝑛𝑜𝑡 𝑏𝑟𝑜𝑘𝑒𝑛_𝑤𝑖𝑛𝑔(𝑥) → 𝑐𝑎𝑛_𝑓𝑙𝑦 𝑥
Okay, okay
Okay, okay, what if it is covered
in maple syrup?
35
35
Words
Mental
Scene
Possibilitie
s
“The table is
on the table.”
• If you want to pick up top table,
you must first walk to the table
• If you push the bottom table, the
top table will fall
• Party guests will think this is
weird looking.
The AI must build its own simulations
Meaning
(inner world)
36
36
How to Build an AI that Imagines through
Simulation
Gradually
working
our way
to real
machine
learning
manually build it → manual configuration → machine learning
Build it in two steps
1. Design hypothesis space
2. Human-like machine learning
Step 1 Step 2
We don’t have to put as much information in as if we were using logic because
the simulations will automatically combine things, but we still have a challenging
problem ahead of us. How do we do it?
37
37
By Sewaqu - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=11967659
𝑦 = 𝑚𝑥 + 𝑏
Simplest
hypothesis space,
the space of all
real values of x
and b
The hypothesis space is the foundation of
machine learning
38
38
Recall from 2015
Neural networks are
just millions of linear
regressions with little
nonlinear functions
thrown in between the
layers to keep it from
squashing down.
The hypothesis space is the foundation of
machine learning
39
39
But it doesn’t have to be simple like that
Structured hypothesis spaces
• Space of Bayesian networks
• Space of probabilistic programs
• Space of antenna shapes
• Space of relational database schemas
• Space of DNA sequences
Once you set up the space,
then you can search in that
space
• Genetic algorithms
• Hillclimbing
• Backpropagation (like in
neural networks, if the
space is continuous)
40
40
Interlude: Best
book if you are
interested in the
fundaments of
machine learning
You can read it for free!
Tom Mitchell,
“Machine Learning”
http://www.cs.cmu.edu/~tom/mlbook.html
41
41
So, we got to set up a particularly complicated
hypothesis space
Domain, Classes, Relations https://web.stanford.edu/~jurafsky/slp3/21.pdf
Frames: frames, slots, classes, instances, types, and values
• My view: you can’t get very far with formal representation and reasoning.
• Even simple problems are hard.
• Yale shooting problem https://en.wikipedia.org/wiki/Yale_shooting_problem
• Humans don’t do it that way. We have to be taught How the Mind Works, Steven Pinker
So, it has to be some kind of embodied (problem specific) representation, but it needs to be structured
enough that you can add to it without having to write Python code.
manually build it → manual configuration → machine learning
Step 1
42
42
Building the hypothesis space: Manual
Configuration
Embodied construction grammar https://github.com/icsi-berkeley
Protégé is another familiar
example
https://protege.stanford.edu/
43
43
Build for composability
Foundational Metaphors and Analogies
• Build it so code reuses code with “duck” typing
• Evolution progresses by finding new uses for existing structures
Conceptual Blending
• Focus on properties; create new objects using unions of
properties.
Multiplicative
• Build it so code reuses code with “duck” typing
44
44
2. Use the hypothesis space for machine
learning
manually build it → manual configuration → machine learning
Step 2
With the right hypothesis space, we can do real machine learning. We don’t need
special algorithms; basic rule learning and decision-tree type stuff will be enough.
If A and B were there when great thing C happened, A and B are good and A,B C
Real (human-like) machine learning means you have all the pieces, you just have to label
some subsets.
45
45
2. Use the hypothesis space for machine
learning
manually build it → manual configuration → machine learning
Step 2
Different from the standard paradigm:
while not converged
run training examples through and update parameters based on errors
Real (human-like) machine learning means you have all the pieces, you just have to label
some subsets.
Less of a pushing in continuous space and more of labeling groups.
46
46
2. Use the hypothesis space for machine
learning
manually build it → manual configuration → machine learning
Step 2
Real (human-like) machine learning also means you have all the pieces those pieces
should causally fit together
If A and B were there when great thing C happened, A and B are good and A,B C
Based on other associations between A and B and C there should be a causal
mechanism that ties them together
“Oh, that makes sense because A  M  N and B  Q and, of course and
everyone knows that N&GC”
47
47
Words
Mental
Scene
Possibilitie
s
How to build someone to talk to using the
symbolic method
Meaning
(inner world)
Build autonomous simulation through
manually build it → manual configuration → machine learning
You don’t build an AI that can understand stories. You build an AI that can *build* stories. Then, it
can “understand” stories by constructing a sequence of events that meets the constraints of the
story.
48
48
Outline
• Conversation makes compelling robots and what conversation
requires
• The symbolic path of autonomous simulation
• The sub-symbolic path of coaxing neural networks
49
49
ChatGPT is
amazing
50
50
ChatGPT is
amazing
51
51
ChatGPT is
amazing but
it has no
sense of
truth
52
52
Encoding sentence meaning into a vector
h0
The
“The patient fell.”
Using a recurrent neural network (RNN).
53
53
Encoding sentence meaning into a vector
h0
The
h1
patient
“The patient fell.”
Using a recurrent neural network (RNN).
54
54
Encoding sentence meaning into a vector
h0
The
h1
patient
h2
fell
“The patient fell.”
Using a recurrent neural network (RNN).
55
55
Encoding sentence meaning into a vector
RNN is like a hidden Markov model but doesn’t make the
Markov assumption and benefits from a vector representation.
h0
The
h1
patient
h2
fell
h3
.
“The patient fell.”
56
56
Decoding sentence meaning
El
h3
Machine translation.
57
57
Decoding sentence meaning
El
h3 h4
Machine translation.
58
58
Decoding sentence meaning
El
h3
paciente
h4
Machine translation.
59
59
Decoding sentence meaning
Machine translation.
El
h3
paciente
h4
se
h5
cayó
h6
[Cho et al., 2014]
• It keeps generating until it generates a stop symbol.
• It is using a kind of interpolation from a huge set of training data.
60
60
Attention [Bahdanau et al., 2014]
El
h3
paciente
h4
se
h5
cayó
h6
h0
The
h1
patient
h2
fell
h3
.
61
61
Transformers:
Attention is all you need
https://arxiv.org/abs/1706.03762
62
62
You can’t have truth without world interaction
Is this stove hot? Ouch!
Language is a representation medium
for the world—it isn't the world itself.
• When we talk, we only say what
can't be inferred because we
assume the listener has a basic
understanding of the dynamics of the
world (e.g., if I push a table the
things on it will also move)
The inferences are not
about the world
Although ChatGPT could interact with a virtual machine
https://www.engraved.blog/building-a-virtual-machine-inside/
We need to build an AI that knows what a toddler
knows before we can build one that understands
Wikipedia.
63
63
Robots can watch YouTube and learn to
imitate, analogous to ChatGPT
Multimodal, language and object
manipulation
The trick is the tokenization of events in video, but Google
has made some good progress
Robotics Transformer 1 (RT-1)
• Transformer model trained by copying demonstrations
• Predict the next most likely action based on what it has learned from the demonstrations
https://blog.google/technology/ai/helping-robots-learn-from-each-other/
https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html
https://robotics-transformer.github.io/assets/rt1.pdf
Imitation Learning from Video with
Transformers
64
64
Imitation Learning from Video with
Transformers
Image used with permission.
Thanks Keerthana Gopalakrishnan!
@keerthanpg
The trick is the tokenization of events in video, but Google
has made some good progress
Robotics Transformer 1 (RT-1)
• Transformer model trained by copying demonstrations
• Predict the next most likely action based on what it has learned from the demonstrations
https://blog.google/technology/ai/helping-robots-learn-from-each-other/
https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html
https://robotics-transformer.github.io/assets/rt1.pdf
65
65
Recall the
report card for
transformer
trained on text
66
66
Imitation Learning from Video with
Transformers
Using robots in the real or simulated
world provides State.
Goals are weak since it is imitation
learning.
We have transitions/actions that
happen in the real world.
Also limited due to imitation learning
67
67
Reinforcement learning trained in simulation
Instead of building a system that can learn to build its own
simulations, we train robots in simulated worlds that we build
Microsoft Flight Simulator
https://www.flightsimulator.com/
AI2Thor by Allen AI
https://ai2thor.allenai.org/
68
68
Reinforcement learning trained in simulation
Instead of building a system that can learn to build its own
simulations, we train robots in simulated worlds that we build
There has been some exciting progress in the area from DeepMind and others
https://www.deepmind.com/blog/from-motor-control-to-embodied-intelligence Soccer!
Decision transformers
https://huggingface.co/blog/decision-transformers
https://deumbra.com/2022/08/rllib-for-deep-hierarchical-
multiagent-reinforcement-learning/
69
69
Imitation Learning from Video with
Transformers
Using robots in the real or simulated
world provides State.
Goals are weak since it is imitation
learning.
We have transitions/actions that
happen in the real world.
Also limited due to imitation learning
70
70
Using robots in the real or simulated
world provides State.
Goals are more real with
reinforcement learning
We have transitions/actions that
happen in the real world.
This will be the challenge to
generalize
Reinforcement learning trained in in simulation
71
71
Words
Mental
Scene
Possibilitie
s
How to build someone to talk to using the
subsymbolic method
Meaning
(inner world)
Have it watch
videos with
dialog
Give it reward for getting to good
states in a simulation of our
world, where actions include
words
72
72
Conclusion
When we build an AI with somebody home, it will have goals.
It will talk with us to achieve those goals.
It may not be familiar and chatty like ChatGPT.
It will be a little alien and sometimes be hard to understand.
But it will be more reliable because it understands the conversation, and it
will be more trustworthy because the mistakes it makes will make sense to
us.
It will be somebody worth talking to!
73
73
Bonus slides
74
74
Time flies like an arrow
“Subconscious” neural network tokens
“Conscious” symbolic tokens
The “subconscious” tokens from neural networks could fill in the gaps in the symbolic.
The symbolic could then “imagine” creating a simulated world for reinforcement learning.
?
Learn these
associations
75
75
Human-Like Machine Learning
This is the trick. Two solutions
1. New neural-network architectures
2. Human-like symbolic learning on top
76
76
Prompt:
“Photograph of two robot cats
going on a date in Manhattan
at night”
• Trained on combinations of
images and text; CLIP and
diffusion models
• Can create images from
whatever you type
https://openai.com/dall-e-2/
Thanks to you @hardmaru for
the images!
OpenAI Dall-e 2
77
77
Prompt:
“Photograph of Apes
attending the World
Economic Forum in Davos”
OpenAI Dall-e 2
• Trained on combinations of
images and text; CLIP and
diffusion models
• Can create images from
whatever you type
https://openai.com/dall-e-2/
Thanks to you @hardmaru for
the images!
78
78
Using language models to guide robot action
Have a single-armed mobile robot in a kitchen setting.
Human: I spilled my drink, can you help?
Robot: tries the text associated of each action and sees which one most likely in the language model.
Action 1: get sponge
Action 2: get vacuum
Since sponge has a higher likelihood in the language model, the sponge is the best action.
Language Model
I spilled my drink, can you help? I’ll get a sponge. score = .062
I spilled my drink, can you help? I’ll get a vacuum. score = .013
Do As I Can, Not As I Say:
Grounding Language in Robotic
Affordances
https://say-can.github.io/
Robotics at Google and Everyday Robots
They are extracting world knowledge from language, very cool,
but still not enough, as we will see.
79
79
DeepMind Flamingo: Conversations about pictures
Foundational Language Model
Foundational Vision Encoder
Other trained models
to connect them
together
Conversations about images
Image of
weird
cloth monster
in soup
Example from DeepMind
https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Human: What is in this picture
Flamingo: It’s a bowl of soup with a monster face on it
Human: What is the monster made out of?
Flamingo: It is made out of vegetables
Human: No, it’s made out of a kind of fabric, can you see what kind?
Flamingo: It’s made out of a woolen fabric.
80
80
AI needs the experience children get
Foundational models learn from text, but too many basic things aren’t written
down, because everyone knows them
Some of you may recall the romance novel I’m writing:
Words are only hints at possible meanings; to understand those hints
we need experience in addition to text, even videos may be insufficient.
He pushed the table between them aside to embrace her, and
all the objects on the table moved as well, and the liquid
within the glasses spilled, and some things fell to the ground,
but because it was carpet it wasn’t as loud as it would be if it
was a hard floor, and the table had friction with the floor, so
it didn’t fly through the window and into the street, and …
81
81
while alive:
if there is a breakdown:
1. the cortex populates mental state
2. internal attention says where to focus
3. non-interchangeable value function
says what is good
else:
continue in auto zombie mode
if expected state 𝑠𝑒 ≠ 𝑠
receive state 𝑠
𝑇 𝑠, 𝑎 → 𝑠′
Attention(𝑠′) → 𝑠
𝑉(𝑠) s.t. 𝑉 𝑠1 + 𝑉(𝑠2) ≠ 𝑉(𝑠1 + 𝑠2)
𝜋 𝑠 → 𝑎
A pseudocode of consciousness
and choose policy 𝜋
According to this organization of ideas, if we can avoid building a non-interchangeable value function in a
computer we can keep it from being conscious. Will save us a lot of trouble.
An idea that makes me chuckle: What if by some fluke virus, printers are conscious? Would
explain why they are so recalcitrant.
Reinforcement learning notation
82
82
https://www.flickr.com/photos/obamawhitehouse/4921383047/
Can AI tell us why this picture is funny?
Karpathy, 2012
http://karpathy.github.io/2012/10/22/state-of-
computer-vision/
Yannic Kilcher discusses how Flamingo
gets close
https://www.youtube.com/watch?v=smUHQndcmO
Y&t=152s&ab_channel=YannicKilcher
• You can’t ask why it is funny, but you
can ask “Where is Obama’s foot
positioned?”
Tweet by Florin Bulgarov
https://twitter.com/florinzf/status/152263
3003511517187/photo/1
Volleyball locker room at UT Austin.
83
83
CC BY-SA 2.5,
https://en.wikipedia.org/w/index.php?curid=9399198
Robotic lab

Weitere ähnliche Inhalte

Ähnlich wie How to build someone we can talk to

Image is Everything: ipadpalooza16
Image is Everything: ipadpalooza16Image is Everything: ipadpalooza16
Image is Everything: ipadpalooza16Amy Burvall
 
3 d web round table 2 (10 feb 2013)
3 d web round table 2 (10 feb 2013)3 d web round table 2 (10 feb 2013)
3 d web round table 2 (10 feb 2013)David Fliesen
 
Open Your Mind, Open Your Library (Slides): Texas Library Association 2016
Open Your Mind, Open Your Library (Slides): Texas Library Association 2016Open Your Mind, Open Your Library (Slides): Texas Library Association 2016
Open Your Mind, Open Your Library (Slides): Texas Library Association 2016M.J. D'Elia
 
Stupidity spawns creativity
Stupidity spawns creativityStupidity spawns creativity
Stupidity spawns creativityAli Anani, PhD
 
UX 101: The secrets of good (web & mobile) design
UX 101: The secrets of good (web & mobile) designUX 101: The secrets of good (web & mobile) design
UX 101: The secrets of good (web & mobile) designMary Lan
 
Week1 Interactivity
Week1 InteractivityWeek1 Interactivity
Week1 InteractivityCMoz
 
Normal Considered Harmful
Normal Considered HarmfulNormal Considered Harmful
Normal Considered Harmfulgreenwop
 
Gordon brown brain juice - workshop 1
Gordon brown   brain juice - workshop 1Gordon brown   brain juice - workshop 1
Gordon brown brain juice - workshop 1Anna-Marie Taylor
 
From Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial IntelligenceFrom Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial IntelligenceJonathan Mugan
 
Data Day Seattle, Chatbots from First Principles
Data Day Seattle, Chatbots from First PrinciplesData Day Seattle, Chatbots from First Principles
Data Day Seattle, Chatbots from First PrinciplesJonathan Mugan
 
03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolving03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolvinghttps://www.cia.gov.com
 
03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolving03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolvinghttps://www.cia.gov.com
 
Visual Rhetoric Feb 3: Early Show
Visual Rhetoric Feb 3: Early ShowVisual Rhetoric Feb 3: Early Show
Visual Rhetoric Feb 3: Early ShowMiami University
 
Storytelling January 6 2010
Storytelling January 6 2010Storytelling January 6 2010
Storytelling January 6 2010Hanson Hosein
 
Webstock 2010 - Stack Overflow: Building Social Software for the Anti-Social
Webstock 2010 - Stack Overflow: Building Social Software for the Anti-SocialWebstock 2010 - Stack Overflow: Building Social Software for the Anti-Social
Webstock 2010 - Stack Overflow: Building Social Software for the Anti-Socialcodinghorror
 
GCurtis_IASummit_poster
GCurtis_IASummit_posterGCurtis_IASummit_poster
GCurtis_IASummit_posterGayle Curtis
 
Perrone crafting outstandingpresentationsstorytelling
Perrone crafting outstandingpresentationsstorytellingPerrone crafting outstandingpresentationsstorytelling
Perrone crafting outstandingpresentationsstorytellingTihamer
 
The Future of Marketing: Make Things People Want or Make People Want Things?
The Future of Marketing: Make Things People Want or Make People Want Things?The Future of Marketing: Make Things People Want or Make People Want Things?
The Future of Marketing: Make Things People Want or Make People Want Things?John V Willshire
 
The man being carried does not realize how far away the town really is.
The man being carried does not realize how far away the town really is.The man being carried does not realize how far away the town really is.
The man being carried does not realize how far away the town really is.Rhea Myers
 
Steps In Research Paper Writing. Steps Of Writing A Research Proposa
Steps In Research Paper Writing. Steps Of Writing A Research ProposaSteps In Research Paper Writing. Steps Of Writing A Research Proposa
Steps In Research Paper Writing. Steps Of Writing A Research ProposaStacy Taylor
 

Ähnlich wie How to build someone we can talk to (20)

Image is Everything: ipadpalooza16
Image is Everything: ipadpalooza16Image is Everything: ipadpalooza16
Image is Everything: ipadpalooza16
 
3 d web round table 2 (10 feb 2013)
3 d web round table 2 (10 feb 2013)3 d web round table 2 (10 feb 2013)
3 d web round table 2 (10 feb 2013)
 
Open Your Mind, Open Your Library (Slides): Texas Library Association 2016
Open Your Mind, Open Your Library (Slides): Texas Library Association 2016Open Your Mind, Open Your Library (Slides): Texas Library Association 2016
Open Your Mind, Open Your Library (Slides): Texas Library Association 2016
 
Stupidity spawns creativity
Stupidity spawns creativityStupidity spawns creativity
Stupidity spawns creativity
 
UX 101: The secrets of good (web & mobile) design
UX 101: The secrets of good (web & mobile) designUX 101: The secrets of good (web & mobile) design
UX 101: The secrets of good (web & mobile) design
 
Week1 Interactivity
Week1 InteractivityWeek1 Interactivity
Week1 Interactivity
 
Normal Considered Harmful
Normal Considered HarmfulNormal Considered Harmful
Normal Considered Harmful
 
Gordon brown brain juice - workshop 1
Gordon brown   brain juice - workshop 1Gordon brown   brain juice - workshop 1
Gordon brown brain juice - workshop 1
 
From Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial IntelligenceFrom Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial Intelligence
 
Data Day Seattle, Chatbots from First Principles
Data Day Seattle, Chatbots from First PrinciplesData Day Seattle, Chatbots from First Principles
Data Day Seattle, Chatbots from First Principles
 
03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolving03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolving
 
03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolving03012019 cia secrets_tocreativeproblemsolving
03012019 cia secrets_tocreativeproblemsolving
 
Visual Rhetoric Feb 3: Early Show
Visual Rhetoric Feb 3: Early ShowVisual Rhetoric Feb 3: Early Show
Visual Rhetoric Feb 3: Early Show
 
Storytelling January 6 2010
Storytelling January 6 2010Storytelling January 6 2010
Storytelling January 6 2010
 
Webstock 2010 - Stack Overflow: Building Social Software for the Anti-Social
Webstock 2010 - Stack Overflow: Building Social Software for the Anti-SocialWebstock 2010 - Stack Overflow: Building Social Software for the Anti-Social
Webstock 2010 - Stack Overflow: Building Social Software for the Anti-Social
 
GCurtis_IASummit_poster
GCurtis_IASummit_posterGCurtis_IASummit_poster
GCurtis_IASummit_poster
 
Perrone crafting outstandingpresentationsstorytelling
Perrone crafting outstandingpresentationsstorytellingPerrone crafting outstandingpresentationsstorytelling
Perrone crafting outstandingpresentationsstorytelling
 
The Future of Marketing: Make Things People Want or Make People Want Things?
The Future of Marketing: Make Things People Want or Make People Want Things?The Future of Marketing: Make Things People Want or Make People Want Things?
The Future of Marketing: Make Things People Want or Make People Want Things?
 
The man being carried does not realize how far away the town really is.
The man being carried does not realize how far away the town really is.The man being carried does not realize how far away the town really is.
The man being carried does not realize how far away the town really is.
 
Steps In Research Paper Writing. Steps Of Writing A Research Proposa
Steps In Research Paper Writing. Steps Of Writing A Research ProposaSteps In Research Paper Writing. Steps Of Writing A Research Proposa
Steps In Research Paper Writing. Steps Of Writing A Research Proposa
 

Mehr von Jonathan Mugan

Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedJonathan Mugan
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksJonathan Mugan
 
Chatbots from first principles
Chatbots from first principlesChatbots from first principles
Chatbots from first principlesJonathan Mugan
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceJonathan Mugan
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingJonathan Mugan
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceJonathan Mugan
 

Mehr von Jonathan Mugan (6)

Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
 
Chatbots from first principles
Chatbots from first principlesChatbots from first principles
Chatbots from first principles
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial Intelligence
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
What Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial IntelligenceWhat Deep Learning Means for Artificial Intelligence
What Deep Learning Means for Artificial Intelligence
 

Kürzlich hochgeladen

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Kürzlich hochgeladen (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

How to build someone we can talk to

  • 1. 1 1 How to build someone we can talk to Jonathan Mugan @jmugan Data Day Texas November 17, 2022
  • 2. 2 2 Steve Jurvetson from Los Altos, USA, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0>, via Wikimedia https://commons.wikimedia.org/wiki/File:Rethink_Robotics_%E2%80%94_Brooks_and_Baxter_%288000143255%29.jpg So many have left us Many intelligent robots have come and gone, failing to become a commercial success
  • 3. 3 3 Maurizio Pesce from Milan, Italia, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0>, via Wikimedia Commons Romo is gone
  • 4. 4 4 Cynthiabreazeal, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons Jibo never seemed to get off the ground
  • 5. 5 5 Sony photographer, CC0, via Wikimedia Commons Mighty Sony couldn’t make Aibo stay
  • 6. 6 6 Gregory Varnum, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons These robots didn’t reach their potential because you can’t talk to them, not really. Even Alexa is reducing staff https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-colossal-failure-on-pace-to-lose-10-billion-this-year/
  • 7. 7 7 Data Day 2017 Talked about how language evolved and the current state-of-the-art for chatbots and how they won’t work without understanding Went into detail about how neural networks can generate text Data Day 2018 Talked about two paths to getting understanding Data Day 2019 We got distracted and did MLOps with TensorFlow Extended Data Day 2020 Data Day 2021 Data Day 2022 meaning symbolic subsymbolic Covid
  • 8. 8 8 Conversations are interactive • but they are also hyperlocal and hyper-specific to the current context • context entails the current situation and conversation history Conversation is a unique medium
  • 9. 9 9 Conversation is a unique medium When you have a conversation with someone, you direct their imagination and they yours
  • 10. 10 10 Conversation is a unique medium We don’t have any other medium that has those properties • movies, tv, videos, don’t • The best conversations teach • Conversation is one of the hardest things we do as humans—why they say people need to socialize to keep their thinking sharp
  • 11. 11 11 Why a conversation partner needs to be “somebody” Reliability: If it represents the world the way that we do less likely to make a catastrophic mistake (since its knowledge structure is built on our representation) Trust: need the mistakes it does make to make sense to us
  • 12. 12 12 Why a conversation partner needs to be “somebody” Fun: much more interesting to talk to somebody. Consider the humor • Household robot: should have some goal obscure to the owner, like tapping on the walls. “What the hell is it doing?” • NPC: “You’re playing a video game, right? Can you get me out of here?”
  • 13. 13 13 Why a conversation partner needs to be “somebody” Truth: most important • Truth comes from interaction with the real world • A person seeks to bend the world to their goals • A person strives for things • ChatGPT: you can kind- of have a conversation with it, but it isn’t there
  • 14. 14 14 Somebody home requires State • The world around it Transitions (actions) • How the world changes Goals • Transitions enable one to talk about goals • Why would it talk at all? • Thinking beyond natural- language interface
  • 15. 15 15 Somebody home requires • Back to Kant and before • He calls it Judgment • https://plato.stanford.e du/entries/kant- judgment/ Inference • Inference is mapping a continuous situation to some space of values • Keeps us from being brittle • See Eiffel tower, in Paris
  • 16. 16 16 Somebody home requires Inference • Inference is mapping a continuous situation to some space of values • Keeps us from being brittle • See Eiffel tower, in Paris “The Beginning of Infinity,” by David Deutsch • “People” are entities that can create explanations. • When we see stars, we don’t see what they are, we just see points of light.
  • 17. 17 17 Inference has a precise formulation 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠|𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛) = 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠) 𝑃(𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛) 𝑃 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ∝ 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠) 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 is your model of the world Hypothesis: is an explanation in the David Deutsch sense “A big burning ball of hydrogen and helium burning millions of miles away might look like that” it’s the Bayesian equation Or more simply
  • 18. 18 18 Inference has a precise formulation 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠|𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛) = 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠) 𝑃(𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛) 𝑃 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ∝ 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝑃(ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠) it’s the Bayesian equation Or more simply 𝑃 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 must be composable and you have to come up with it on the fly
  • 19. 19 19 Words Mental Scene Possibilitie s “The table is on the table.” • If you want to pick up top table, you must first walk to the table • If you push the bottom table, the top table will fall • Party guests will think this is weird looking. Inference leads to “meaning” Meaning (inner world) Reminiscent of Robert Brandom through Richard Evans https://www.doc.ic.ac.uk/~re14/Evan s-R-2020-PhD-Thesis.pdf
  • 20. 20 20 Words Mental Scene Possibilitie s Inference leads to “meaning” Meaning (inner world) “The table is on the table.” What?
  • 21. 21 21 Words Mental Scene Possibilitie s Inference leads to “meaning” Meaning (inner world) Two ways to not understand in a conversation: 1. Wrong mental scene 2. Not knowing the possibilities
  • 22. 22 22 Stimuli Mental Scene Possibilitie s Inference leads to “meaning” Meaning (inner world) 𝑃 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒 𝑠𝑡𝑖𝑚𝑢𝑙𝑖) ∝ 𝑃 𝑠𝑡𝑖𝑚𝑢𝑙𝑖 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒 𝑃(𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒) 𝑃 𝑛𝑒𝑥𝑡 𝑠𝑡𝑎𝑡𝑒 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒) Transition model 𝑃(𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒) State prior • These two models must be composable. • They must be dynamic and can’t be precomputed, as we will see. 𝑃 𝑠𝑡𝑖𝑚𝑢𝑙𝑖 𝑚𝑒𝑛𝑡𝑎𝑙 𝑠𝑐𝑒𝑛𝑒 State model
  • 23. 23 23 Inference for conversation requires dynamic composition Coordination of meaning Coordination of inner worlds Instruction Acknowledgement Acknowledgement Break Break Modified from Gärdenfors (2014), which was based on Winter (1998) • Levels of discourse • Complicated to go up and down the pyramid
  • 24. 24 24 A fishing pole is a stick, string and hook You can catch fish with a fishing pole Get me some fish Acknowledgement Break Break Acknowledgement Modified from Gärdenfors (2014), which was based on Winter (1998) • Levels of discourse • Complicated to go up and down the pyramid Inference for conversation requires dynamic composition
  • 25. 25 25 Inference is over more than words: conversation has its own rules (pragmatics) • Conversational maxims: Grice (1975, 1978) • Breaking these rules is a way to communicate more than the meaning of the words. Maxim of Quantity: Say only what is not implied. Yes: “Bring me the table.” No: “Bring me the table by transporting it to my location.” What did she mean by that? Maxim of Quality: Say only things that are true. Yes: “I hate carrying tables.” No: “I love carrying tables, especially when they are covered in fire ants.” She must be being sarcastic. Maxim of Relevance: Say only things that matter. Yes: “Bring me the table.” No: “Bring me the table and birds sing.” What did she mean by that? Maxim of Manner: Speak in a way that can be understood. Yes: “Bring me the table.” No: “Use personal physical force to levitate the table and transport it to me.” What did she mean by that?
  • 26. 26 26 When I saw this, my first thought was, “Where do people enter?” Words are only hints at possible meanings
  • 27. 27 27 Our inference models need to compose for maximum coverage Foundational Metaphors (See Mark Johnson and others. Steven Pinker talks about two main ones) • Force • An offer or a person can be attractive • A broken air conditioner can force you to move a meeting • Location in space • AI has come a long way in the last 15 years Conceptual Blending (The Way We Think by Gilles Fauconnier and Mark Turner) • That running back is a truck • Dall-e 2 could generate a good picture of this, but it couldn’t imagine what it is like to tackle a vehicle Analogies (Melanie Mitchell and Douglas Hofstadter) • We can broadly apply the story of sour grapes (Hofstadter in Surfaces and Essences) Multiplicative: Most general sense, a bird that can drive a car
  • 28. 28 28 Prompt: “Darth Vader on the cover of Vogue magazine” OpenAI Dall-e 2 • Trained on combinations of images and text; CLIP and diffusion models • Can create images from whatever you type https://openai.com/dall-e-2/ Thanks to you @hardmaru for the images!
  • 29. 29 29 Outline • Conversation makes compelling robots and what conversation requires • The symbolic path of autonomous simulation • The sub-symbolic path of coaxing neural networks
  • 30. 30 30 Outline • Conversation makes compelling robots and what conversation requires • The symbolic path of autonomous simulation • The sub-symbolic path of coaxing neural networks
  • 31. 31 31 Robots can use simulation for inference and meaning • One way to build supple intelligence is to have a robot build a simulation of its environment. • The robot simulates what you say in something like Unity. A table on a table See “Computers could understand natural language using simulated physics” from 2017 for additional details. https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da
  • 32. 32 32 Robots can use simulation for inference and meaning • Simulation enables robust inference because it provides an unbroken description of the dynamics of the environment, even if it is not complete in full detail. • This unbroken description is a grounding. A table on a table See “Computers could understand natural language using simulated physics” from 2017 for additional details. https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da
  • 33. 33 33 Robots can use simulation for inference and meaning What will happen if I push the bottom table? A table on a table See “Computers could understand natural language using simulated physics” from 2017 for additional details. https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da Well, mentally push it and see.
  • 34. 34 34 Robots can use simulation for inference and meaning Classic AI question, can birds fly? ∀𝑥 𝑏𝑖𝑟𝑑 𝑥 → 𝑐𝑎𝑛_𝑓𝑙𝑦(𝑥) Okay, okay ∀𝑥 𝑏𝑖𝑟𝑑 𝑥 ∧ 𝑓𝑙𝑦𝑖𝑛𝑔_𝑏𝑖𝑟𝑑(𝑥) → 𝑐𝑎𝑛_𝑓𝑙𝑦 𝑥 WKRP "As God as my witness, I thought turkeys could fly” https://www.youtube.com/watch?v=lf3mgmEdfwg&ab_channel=EpicHouston ∀𝑥 𝑏𝑖𝑟𝑑 𝑥 ∧ 𝑓𝑙𝑦𝑖𝑛𝑔_𝑏𝑖𝑟𝑑(𝑥) ∧ 𝑛𝑜𝑡 𝑏𝑟𝑜𝑘𝑒𝑛_𝑤𝑖𝑛𝑔(𝑥) → 𝑐𝑎𝑛_𝑓𝑙𝑦 𝑥 Okay, okay Okay, okay, what if it is covered in maple syrup?
  • 35. 35 35 Words Mental Scene Possibilitie s “The table is on the table.” • If you want to pick up top table, you must first walk to the table • If you push the bottom table, the top table will fall • Party guests will think this is weird looking. The AI must build its own simulations Meaning (inner world)
  • 36. 36 36 How to Build an AI that Imagines through Simulation Gradually working our way to real machine learning manually build it → manual configuration → machine learning Build it in two steps 1. Design hypothesis space 2. Human-like machine learning Step 1 Step 2 We don’t have to put as much information in as if we were using logic because the simulations will automatically combine things, but we still have a challenging problem ahead of us. How do we do it?
  • 37. 37 37 By Sewaqu - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=11967659 𝑦 = 𝑚𝑥 + 𝑏 Simplest hypothesis space, the space of all real values of x and b The hypothesis space is the foundation of machine learning
  • 38. 38 38 Recall from 2015 Neural networks are just millions of linear regressions with little nonlinear functions thrown in between the layers to keep it from squashing down. The hypothesis space is the foundation of machine learning
  • 39. 39 39 But it doesn’t have to be simple like that Structured hypothesis spaces • Space of Bayesian networks • Space of probabilistic programs • Space of antenna shapes • Space of relational database schemas • Space of DNA sequences Once you set up the space, then you can search in that space • Genetic algorithms • Hillclimbing • Backpropagation (like in neural networks, if the space is continuous)
  • 40. 40 40 Interlude: Best book if you are interested in the fundaments of machine learning You can read it for free! Tom Mitchell, “Machine Learning” http://www.cs.cmu.edu/~tom/mlbook.html
  • 41. 41 41 So, we got to set up a particularly complicated hypothesis space Domain, Classes, Relations https://web.stanford.edu/~jurafsky/slp3/21.pdf Frames: frames, slots, classes, instances, types, and values • My view: you can’t get very far with formal representation and reasoning. • Even simple problems are hard. • Yale shooting problem https://en.wikipedia.org/wiki/Yale_shooting_problem • Humans don’t do it that way. We have to be taught How the Mind Works, Steven Pinker So, it has to be some kind of embodied (problem specific) representation, but it needs to be structured enough that you can add to it without having to write Python code. manually build it → manual configuration → machine learning Step 1
  • 42. 42 42 Building the hypothesis space: Manual Configuration Embodied construction grammar https://github.com/icsi-berkeley Protégé is another familiar example https://protege.stanford.edu/
  • 43. 43 43 Build for composability Foundational Metaphors and Analogies • Build it so code reuses code with “duck” typing • Evolution progresses by finding new uses for existing structures Conceptual Blending • Focus on properties; create new objects using unions of properties. Multiplicative • Build it so code reuses code with “duck” typing
  • 44. 44 44 2. Use the hypothesis space for machine learning manually build it → manual configuration → machine learning Step 2 With the right hypothesis space, we can do real machine learning. We don’t need special algorithms; basic rule learning and decision-tree type stuff will be enough. If A and B were there when great thing C happened, A and B are good and A,B C Real (human-like) machine learning means you have all the pieces, you just have to label some subsets.
  • 45. 45 45 2. Use the hypothesis space for machine learning manually build it → manual configuration → machine learning Step 2 Different from the standard paradigm: while not converged run training examples through and update parameters based on errors Real (human-like) machine learning means you have all the pieces, you just have to label some subsets. Less of a pushing in continuous space and more of labeling groups.
  • 46. 46 46 2. Use the hypothesis space for machine learning manually build it → manual configuration → machine learning Step 2 Real (human-like) machine learning also means you have all the pieces those pieces should causally fit together If A and B were there when great thing C happened, A and B are good and A,B C Based on other associations between A and B and C there should be a causal mechanism that ties them together “Oh, that makes sense because A  M  N and B  Q and, of course and everyone knows that N&GC”
  • 47. 47 47 Words Mental Scene Possibilitie s How to build someone to talk to using the symbolic method Meaning (inner world) Build autonomous simulation through manually build it → manual configuration → machine learning You don’t build an AI that can understand stories. You build an AI that can *build* stories. Then, it can “understand” stories by constructing a sequence of events that meets the constraints of the story.
  • 48. 48 48 Outline • Conversation makes compelling robots and what conversation requires • The symbolic path of autonomous simulation • The sub-symbolic path of coaxing neural networks
  • 51. 51 51 ChatGPT is amazing but it has no sense of truth
  • 52. 52 52 Encoding sentence meaning into a vector h0 The “The patient fell.” Using a recurrent neural network (RNN).
  • 53. 53 53 Encoding sentence meaning into a vector h0 The h1 patient “The patient fell.” Using a recurrent neural network (RNN).
  • 54. 54 54 Encoding sentence meaning into a vector h0 The h1 patient h2 fell “The patient fell.” Using a recurrent neural network (RNN).
  • 55. 55 55 Encoding sentence meaning into a vector RNN is like a hidden Markov model but doesn’t make the Markov assumption and benefits from a vector representation. h0 The h1 patient h2 fell h3 . “The patient fell.”
  • 57. 57 57 Decoding sentence meaning El h3 h4 Machine translation.
  • 59. 59 59 Decoding sentence meaning Machine translation. El h3 paciente h4 se h5 cayó h6 [Cho et al., 2014] • It keeps generating until it generates a stop symbol. • It is using a kind of interpolation from a huge set of training data.
  • 60. 60 60 Attention [Bahdanau et al., 2014] El h3 paciente h4 se h5 cayó h6 h0 The h1 patient h2 fell h3 .
  • 61. 61 61 Transformers: Attention is all you need https://arxiv.org/abs/1706.03762
  • 62. 62 62 You can’t have truth without world interaction Is this stove hot? Ouch! Language is a representation medium for the world—it isn't the world itself. • When we talk, we only say what can't be inferred because we assume the listener has a basic understanding of the dynamics of the world (e.g., if I push a table the things on it will also move) The inferences are not about the world Although ChatGPT could interact with a virtual machine https://www.engraved.blog/building-a-virtual-machine-inside/ We need to build an AI that knows what a toddler knows before we can build one that understands Wikipedia.
  • 63. 63 63 Robots can watch YouTube and learn to imitate, analogous to ChatGPT Multimodal, language and object manipulation The trick is the tokenization of events in video, but Google has made some good progress Robotics Transformer 1 (RT-1) • Transformer model trained by copying demonstrations • Predict the next most likely action based on what it has learned from the demonstrations https://blog.google/technology/ai/helping-robots-learn-from-each-other/ https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html https://robotics-transformer.github.io/assets/rt1.pdf Imitation Learning from Video with Transformers
  • 64. 64 64 Imitation Learning from Video with Transformers Image used with permission. Thanks Keerthana Gopalakrishnan! @keerthanpg The trick is the tokenization of events in video, but Google has made some good progress Robotics Transformer 1 (RT-1) • Transformer model trained by copying demonstrations • Predict the next most likely action based on what it has learned from the demonstrations https://blog.google/technology/ai/helping-robots-learn-from-each-other/ https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html https://robotics-transformer.github.io/assets/rt1.pdf
  • 65. 65 65 Recall the report card for transformer trained on text
  • 66. 66 66 Imitation Learning from Video with Transformers Using robots in the real or simulated world provides State. Goals are weak since it is imitation learning. We have transitions/actions that happen in the real world. Also limited due to imitation learning
  • 67. 67 67 Reinforcement learning trained in simulation Instead of building a system that can learn to build its own simulations, we train robots in simulated worlds that we build Microsoft Flight Simulator https://www.flightsimulator.com/ AI2Thor by Allen AI https://ai2thor.allenai.org/
  • 68. 68 68 Reinforcement learning trained in simulation Instead of building a system that can learn to build its own simulations, we train robots in simulated worlds that we build There has been some exciting progress in the area from DeepMind and others https://www.deepmind.com/blog/from-motor-control-to-embodied-intelligence Soccer! Decision transformers https://huggingface.co/blog/decision-transformers https://deumbra.com/2022/08/rllib-for-deep-hierarchical- multiagent-reinforcement-learning/
  • 69. 69 69 Imitation Learning from Video with Transformers Using robots in the real or simulated world provides State. Goals are weak since it is imitation learning. We have transitions/actions that happen in the real world. Also limited due to imitation learning
  • 70. 70 70 Using robots in the real or simulated world provides State. Goals are more real with reinforcement learning We have transitions/actions that happen in the real world. This will be the challenge to generalize Reinforcement learning trained in in simulation
  • 71. 71 71 Words Mental Scene Possibilitie s How to build someone to talk to using the subsymbolic method Meaning (inner world) Have it watch videos with dialog Give it reward for getting to good states in a simulation of our world, where actions include words
  • 72. 72 72 Conclusion When we build an AI with somebody home, it will have goals. It will talk with us to achieve those goals. It may not be familiar and chatty like ChatGPT. It will be a little alien and sometimes be hard to understand. But it will be more reliable because it understands the conversation, and it will be more trustworthy because the mistakes it makes will make sense to us. It will be somebody worth talking to!
  • 74. 74 74 Time flies like an arrow “Subconscious” neural network tokens “Conscious” symbolic tokens The “subconscious” tokens from neural networks could fill in the gaps in the symbolic. The symbolic could then “imagine” creating a simulated world for reinforcement learning. ? Learn these associations
  • 75. 75 75 Human-Like Machine Learning This is the trick. Two solutions 1. New neural-network architectures 2. Human-like symbolic learning on top
  • 76. 76 76 Prompt: “Photograph of two robot cats going on a date in Manhattan at night” • Trained on combinations of images and text; CLIP and diffusion models • Can create images from whatever you type https://openai.com/dall-e-2/ Thanks to you @hardmaru for the images! OpenAI Dall-e 2
  • 77. 77 77 Prompt: “Photograph of Apes attending the World Economic Forum in Davos” OpenAI Dall-e 2 • Trained on combinations of images and text; CLIP and diffusion models • Can create images from whatever you type https://openai.com/dall-e-2/ Thanks to you @hardmaru for the images!
  • 78. 78 78 Using language models to guide robot action Have a single-armed mobile robot in a kitchen setting. Human: I spilled my drink, can you help? Robot: tries the text associated of each action and sees which one most likely in the language model. Action 1: get sponge Action 2: get vacuum Since sponge has a higher likelihood in the language model, the sponge is the best action. Language Model I spilled my drink, can you help? I’ll get a sponge. score = .062 I spilled my drink, can you help? I’ll get a vacuum. score = .013 Do As I Can, Not As I Say: Grounding Language in Robotic Affordances https://say-can.github.io/ Robotics at Google and Everyday Robots They are extracting world knowledge from language, very cool, but still not enough, as we will see.
  • 79. 79 79 DeepMind Flamingo: Conversations about pictures Foundational Language Model Foundational Vision Encoder Other trained models to connect them together Conversations about images Image of weird cloth monster in soup Example from DeepMind https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model Human: What is in this picture Flamingo: It’s a bowl of soup with a monster face on it Human: What is the monster made out of? Flamingo: It is made out of vegetables Human: No, it’s made out of a kind of fabric, can you see what kind? Flamingo: It’s made out of a woolen fabric.
  • 80. 80 80 AI needs the experience children get Foundational models learn from text, but too many basic things aren’t written down, because everyone knows them Some of you may recall the romance novel I’m writing: Words are only hints at possible meanings; to understand those hints we need experience in addition to text, even videos may be insufficient. He pushed the table between them aside to embrace her, and all the objects on the table moved as well, and the liquid within the glasses spilled, and some things fell to the ground, but because it was carpet it wasn’t as loud as it would be if it was a hard floor, and the table had friction with the floor, so it didn’t fly through the window and into the street, and …
  • 81. 81 81 while alive: if there is a breakdown: 1. the cortex populates mental state 2. internal attention says where to focus 3. non-interchangeable value function says what is good else: continue in auto zombie mode if expected state 𝑠𝑒 ≠ 𝑠 receive state 𝑠 𝑇 𝑠, 𝑎 → 𝑠′ Attention(𝑠′) → 𝑠 𝑉(𝑠) s.t. 𝑉 𝑠1 + 𝑉(𝑠2) ≠ 𝑉(𝑠1 + 𝑠2) 𝜋 𝑠 → 𝑎 A pseudocode of consciousness and choose policy 𝜋 According to this organization of ideas, if we can avoid building a non-interchangeable value function in a computer we can keep it from being conscious. Will save us a lot of trouble. An idea that makes me chuckle: What if by some fluke virus, printers are conscious? Would explain why they are so recalcitrant. Reinforcement learning notation
  • 82. 82 82 https://www.flickr.com/photos/obamawhitehouse/4921383047/ Can AI tell us why this picture is funny? Karpathy, 2012 http://karpathy.github.io/2012/10/22/state-of- computer-vision/ Yannic Kilcher discusses how Flamingo gets close https://www.youtube.com/watch?v=smUHQndcmO Y&t=152s&ab_channel=YannicKilcher • You can’t ask why it is funny, but you can ask “Where is Obama’s foot positioned?” Tweet by Florin Bulgarov https://twitter.com/florinzf/status/152263 3003511517187/photo/1 Volleyball locker room at UT Austin.