Machine Learning LABoratory
Seungjoon Lee (graduate student). 2023-09-22. sjlee1218@postech.ac.kr
Semantic Exploration from
Language Abstractions and
Pretrained Representations
NeurIPS 2022. Paper Summary
Contents
• Introduction
• Methods
• Experiments
• Conclusion
Caution!!!
• This material summarizes a paper for my personal research meeting.
• Some of the contents may be incorrect!
• Some experiments are intentionally excluded because they are not directly related to my research interests.
• Methods are simplified for easy explanation.
• Please email me at sjlee1218@postech.ac.kr if you want to contact me (for corrections, additions, ideas to develop this paper, or discussion).
Situations
• Novelty-based RL exploration methods incentivize exploration by using novelty as an intrinsic reward.
• Novelty is calculated from the degree to which an observation is new.
Complication
• Existing visual novelty-based methods can fail in partially observable, high-dimensional state spaces, especially in 3D environments,
• because semantically similar states can look very different depending on the point of view.
Questions & Hypothesis
• Question:
• Can novelty-based exploration recognize the similarity of high-dimensional states that are semantically similar but visually different?
• Hypothesis:
• Language abstraction can produce a semantics-based novelty intrinsic reward, accelerating exploration.
Contributions
• This paper shows that novelty calculation using language abstraction can accelerate RL exploration because
• 1) language abstracts a state space coarsely, and
• 2) language abstracts a state space semantically.
• Furthermore, this paper shows the above idea is applicable in environments without language, using a vision-language model (VLM).
Methods
Problem Formulation
• Goal-conditioned MDP $(S, A, G, P, R^e, \gamma)$.
• $G$: goal space. A goal $g$ is a language instruction in this paper.
• $R^e : S \times G \to \mathbb{R}$: extrinsic reward.
• $\mathcal{O} : S \to L$: language oracle, used in the proof-of-concept (PoC) experiments.
• The oracle output is never observed by the agent and is distinct from the instruction $g$.
• The policy $\pi_g(\cdot \mid s)$ that maximizes $E\big[\sum_{t=0}^{H} \gamma^t (r^e_t + \beta r^i_t)\big]$ is considered.
Method Outline
• Novelty calculation baseline + RL + (Language encoder or pretrained VLM)
• Novelty calculation baseline: Random Network Distillation
• The RL agent never sees the oracle language and shares no parameters with the pretrained models.
Method - Novelty Calculation Baseline
Outline
• Random Network Distillation (RND)
• RND produces higher intrinsic rewards when the agent visits unfamiliar states, measured with a trainable network.
• This paper calls the original RND visual RND (Vis-RND).
Method - Novelty Calculation Baseline
Environment interaction diagram
• RND makes intrinsic rewards using two state encoders:
• a fixed target function $f_{\text{fixed}}$ and a trainable predictor function $f_\psi$.
Method - Novelty Calculation Baseline
Calculation of intrinsic reward
• Intrinsic reward $r^i = \|f_{\text{fixed}}(s) - f_\psi(s)\|^2$.
• Target function $f_{\text{fixed}} : S \to \mathbb{R}^k$:
• a deterministic, randomly initialized, fixed NN.
• Predictor function $f_\psi : S \to \mathbb{R}^k$:
• a trainable NN with parameters $\psi$.
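A minimal PyTorch sketch of this reward, assuming flattened image observations; the sizes and layer shapes are illustrative, not the paper's architecture.

# RND intrinsic reward: distillation error between a frozen random target and a predictor.
import torch
import torch.nn as nn

obs_dim, k = 64 * 64 * 3, 128                      # assumed observation / embedding sizes

def make_encoder() -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, k))

f_fixed = make_encoder()                           # random, fixed target f_fixed : S -> R^k
for p in f_fixed.parameters():
    p.requires_grad_(False)
f_psi = make_encoder()                             # trainable predictor f_psi : S -> R^k

def intrinsic_reward(s: torch.Tensor) -> torch.Tensor:  # s: (batch, obs_dim)
    with torch.no_grad():
        err = (f_fixed(s) - f_psi(s)).pow(2).sum(dim=-1)
    return err                                     # high for unfamiliar states

The same squared error, computed with gradients enabled, is the predictor loss $L(\psi)$ on the next slide.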
Method - Novelty Calculation Baseline
Training of the state encoder
• $f_\psi$ is trained to mimic the random feature $f_{\text{fixed}}(s)$:
• $L(\psi) = \|f_{\text{fixed}}(s) - f_\psi(s)\|^2$.
• $f_\psi$ implicitly stores the visit counts, i.e., the familiarity of states.
Method - Novelty Calculation Baseline
Training of a RL agent
• RL agent: on-policy IMPALA.
• Value loss $L(\phi) = \sum_t \big[y_t - V_\phi(s_t, g)\big]^2$, where $y_t = \sum_{k=t}^{t+n-1} \gamma^{k-t} r_k + \gamma^n V_\phi(s_{t+n}, g)$ and $r_k = r^e_k + \beta r^i_k$.
• Policy loss $L(\theta) = -\sum_t \big[\log \pi_\theta(a_t \mid s_t, g)\,(r_t + \gamma y_{t+1} - V_\phi(s_t, g))\big]$.
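A sketch of the n-step target used in these losses, assuming simple tensor inputs and omitting IMPALA's V-trace off-policy correction.

# n-step bootstrapped targets y_t with the mixed reward r_k = r_e_k + beta * r_i_k.
import torch

def n_step_targets(r_e, r_i, values, beta=0.1, gamma=0.99, n=5):
    # r_e, r_i: (T,) extrinsic / intrinsic rewards; values: (T+1,) estimates V(s_t, g).
    r = r_e + beta * r_i
    T = r.shape[0]
    ys = []
    for t in range(T):
        end = min(t + n, T)
        y = sum(gamma ** (j - t) * r[j] for j in range(t, end))
        ys.append(y + gamma ** (end - t) * values[end])
    return torch.stack(ys)

# Value loss: ((ys - values[:-1]) ** 2).sum(); the policy loss then uses log pi * advantage.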
Method - Language Encoder
• Language-RND (Lang-RND) makes higher intrinsic rewards when the agent receives unfamiliar language.
• The language comes from the oracle; $f_{\text{fixed}} : L \to \mathbb{R}^k$ is a fixed random LSTM.
• Lang-RND shows that language's coarse abstraction is helpful for RL exploration.
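A sketch of the Lang-RND target under assumed token and embedding sizes: a frozen, randomly initialized LSTM that maps an oracle caption to a feature vector.

# Fixed random language target f_fixed : L -> R^k (weights never trained).
import torch
import torch.nn as nn

vocab_size, emb_dim, k = 1000, 32, 128            # assumed sizes
embed = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, k, batch_first=True)
for p in list(embed.parameters()) + list(lstm.parameters()):
    p.requires_grad_(False)

def f_fixed(token_ids: torch.Tensor) -> torch.Tensor:   # (batch, seq_len) int64 tokens
    _, (h, _) = lstm(embed(token_ids))
    return h[-1]                                  # final hidden state as the feature

A trainable predictor with the same signature is then distilled toward this target, exactly as in Vis-RND.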
Method - Oracle Language Distillation
• Language Distillation (LD) makes higher intrinsic rewards when the agent gets a visual observation with unfamiliar linguistic meaning.
• $f_{\text{fixed}} : S \to L$: the oracle.
• $f_\psi : S \to L$: trained to generate text captions like the oracle, with a CNN encoder and an LSTM decoder.
• $r^i = -\sum_{k=1}^{K} \log \big(f_\psi(s)\big)_k\big[(f_{\text{fixed}}(s))_k\big]$, the negative log-likelihood of the oracle caption under $f_\psi$, where $K$ is the length of the oracle caption.
• LD shows that semantic meaning can accelerate RL exploration.
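A sketch of this reward, assuming the captioner exposes per-token logits; tokenization details are omitted.

# LD intrinsic reward: NLL of the oracle caption under the trainable captioner f_psi.
import torch
import torch.nn.functional as F

def ld_reward(caption_logits: torch.Tensor, oracle_tokens: torch.Tensor) -> torch.Tensor:
    # caption_logits: (K, vocab) token distributions from f_psi(s); oracle_tokens: (K,)
    log_probs = F.log_softmax(caption_logits, dim=-1)
    picked = log_probs[torch.arange(oracle_tokens.shape[0]), oracle_tokens]
    return -picked.sum()    # high while the caption of s is still unfamiliar to f_psi

The same quantity is the captioner's training loss, so the reward decays as a state's caption becomes familiar.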
Method - VLM Encoder
• Network Distillation (ND) makes higher intrinsic rewards when the agent gets a visual observation with unfamiliar linguistic meaning.
• $f_{\text{fixed}} : S \to \mathbb{R}^k$: pretrained so that visual embeddings are aligned with the corresponding language embeddings.
• ND shows this paper's idea is applicable in environments without language.
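A sketch of ND under the assumption that vlm_image_encoder is a frozen, pretrained VLM image tower (CLIP or ALIGN style) whose outputs live in a language-aligned space; the predictor architecture is illustrative.

# ND: distill a frozen VLM image encoder; the distillation error is the intrinsic reward.
import torch
import torch.nn as nn

k = 512                                            # assumed VLM embedding size
f_psi = nn.Sequential(nn.Flatten(),
                      nn.Linear(3 * 224 * 224, 1024), nn.ReLU(), nn.Linear(1024, k))

def nd_reward(images: torch.Tensor, vlm_image_encoder) -> torch.Tensor:
    with torch.no_grad():
        target = vlm_image_encoder(images)         # semantic, language-aligned embedding
    pred = f_psi(images)
    return (target - pred).pow(2).sum(dim=-1)      # unfamiliar semantics -> high reward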
Experiments
PoC: Is Language a Meaningful Abstraction?
• Using the oracle language, the authors run a proof of concept (PoC) showing:
• 1) Language abstraction forces an RL agent to explore many more states,
• because language coarsely abstracts states.
• 2) Language abstraction forces an RL agent to explore semantically diverse states,
• because language semantically abstracts states.
PoC
Environment
• Playroom Environment:
• Rooms with various household objects.
• Tasks: lift, put, find.
• Goal: an instruction like “find <object>”.
• If the goal is achieved, the reward is +1 and the episode ends.
• The oracle language is generated by the Unity engine.
PoC - Coarse Abstraction by Language
Results
• Claim:
• If language first abstracts the state space coarsely, novelty computed from that abstraction accelerates RL exploration.
PoC: Methods Taxonomy
Method      Coarsely abstracted?   Semantics considered?   Target function
Vis-RND     X                      X                       Fixed random NN $S \to \mathbb{R}^k$
Lang-RND    O                      △                       Fixed random NN $L \to \mathbb{R}^k$
PoC - Coarse Abstraction by Language
Results
• Claim:
• If language first abstracts the state space coarsely, novelty computed from that abstraction accelerates RL exploration.
• Exploration with language novelty solves the tasks much faster than
exploration with visual novelty.
[Figure: trajectory comparison, Lang-RND vs. Vis-RND. Lang-RND: state → language → random feature; Vis-RND: state → random feature.]
PoC - Coarse Abstraction by Language
Why coarse? And so what?
• States are coarsely grouped into language by the Unity oracle.
• Because random language features are made from the oracle language, the random feature space coarsely abstracts the states.
• Therefore, the agent must explore more widely to keep earning high intrinsic rewards.
PoC - Semantic Diversity from Images
• Claims:
• 1) Coarse abstraction alone is not enough; semantics should be considered.
• 2) We can use language-abstraction-based novelty from visual states.
PoC: Methods Taxonomy
Method                                Coarsely abstracted?   Semantics considered?   Target function
Vis-RND                               X                      X                       Fixed random NN $S \to \mathbb{R}^k$
Lang-RND                              O                      △                       Fixed random NN $L \to \mathbb{R}^k$
Shuffled Language Distillation (S-LD) O                      X                       Fixed random NN $S \to L$ whose output distribution matches the oracle's
Language Distillation (LD)            O                      O                       Unity oracle $S \to L$
PoC - Semantic Diversity from Images
Results
• Claims:
• 1) Coarse abstraction alone is not enough; semantics should be considered.
• 2) We can use language-abstraction-based novelty from visual states.
• Exploration with a coarse and meaningful abstraction helps more than exploration with a coarse but meaningless one.
[Figure: LD maps states to meaningful text; S-LD maps states to meaningless text. S-LD output examples shown.]
PoC - Semantic Diversity
Why semantically diverse? And so what?
• LD makes higher intrinsic rewards when visiting states with newer semantics.
• The RL agent should explore semantically diverse states to get a higher $r^i$.
• The dramatic gap between LD and S-LD is partly due to the environment choice:
• the oracle captions agent interactions, so LD interacts more and keeps earning high intrinsic rewards from new language.
Experiments - Intrinsic Rewards with VLM
• The authors use a VLM encoder to eliminate the need for the language oracle.
• The agent gets higher intrinsic rewards when visiting states with an unfamiliar linguistic embedding.
Experiments - Intrinsic Rewards with VLM
Results
• With the coarse and semantic embedding of a VLM, ALM-ND learns faster than Vis-RND, without the oracle language.
• ALM-ND uses an ALIGN model encoder, pretrained to align image embeddings with the corresponding text embeddings.
Conclusion
• Conclusion:
• Novelty calculation using language abstraction can accelerate RL exploration because it abstracts the state space 1) coarsely and 2) semantically.
• It works across various settings: on-policy and off-policy agents, different novelty calculations, and different 3D domains, even without an oracle language.
• Limitations:
• There is no 2D-environment performance comparison with existing visual novelty methods.
• The quality of the pretrained VLM strongly affects the resulting RL sample efficiency.
Appendix
Contents
• Related work and rationale
• Methods - novelty calculation baseline: Random Network Distillation PoC
• Methods - novelty calculation baseline: Never Give Up
• Methods - construction of S-LD
• More experiments
Why is This New?
• The existing family of intrinsic-reward exploration methods can fail in 3D state spaces because they all use visual state representations.
• This method abstracts states using semantics, avoiding useless exploration.
• Existing RL methods with language require environment-specific annotations or semantic parsers.
• This method can be applied to any visually natural environment using a pretrained VLM.
• Existing RL with pretrained embeddings mainly feeds the embedding directly into the agent.
• This method shows large pretrained models can instead be used to guide exploration.
Rationale
• Why would VLM representations be helpful for semantic novelty-based exploration? Intuitions?
• 1. Language is inherently abstract.
• Language links situations that are superficially distinct but causally related, and vice versa.
• 2. Language carries important information efficiently while ignoring miscellaneous noise.
Random Network Distillation
Proof of Concept
• Question: can $\|f_\psi(s) - f_{\text{fixed}}(s)\|^2$ be a novelty measure?
• Dataset: many 0s and $N$ images of another digit (e.g., 5000 images of 0, 10 images of 1).
• $f_\psi$ is trained to $\min_\psi \|f_\psi(s) - f_{\text{fixed}}(s)\|^2$.
[Figure: MSE on unseen data vs. $N$, the number of target-class images in the training data.]
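A self-contained sketch of this proof of concept, using random tensors as stand-ins for the digit images (real MNIST loading, e.g., via torchvision, is assumed away).

# Distill f_fixed into f_psi on an imbalanced set, then check the MSE on unseen
# examples of the rare class: the smaller N is, the higher the MSE (novelty) stays.
import torch
import torch.nn as nn

def enc() -> nn.Module:
    return nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))

f_fixed, f_psi = enc(), enc()
for p in f_fixed.parameters():
    p.requires_grad_(False)

common = torch.randn(5000, 784)          # stand-ins for 5000 images of digit 0
rare = torch.randn(10, 784)              # stand-ins for N = 10 images of digit 1
train = torch.cat([common, rare])
opt = torch.optim.Adam(f_psi.parameters(), lr=1e-3)
for _ in range(200):
    loss = (f_fixed(train) - f_psi(train)).pow(2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()

unseen_rare = torch.randn(100, 784)      # held-out rare-class stand-ins
mse = (f_fixed(unseen_rare) - f_psi(unseen_rare)).pow(2).sum(-1).mean()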
Novelty Calculation Baseline - Never Give Up
Outline
• Never Give Up (NGU) makes $r^i$ based on how new the state is within the current episode.
• NGU components:
• $f_\psi$, a state encoder $s \to \mathbb{R}^k$.
• $M$, an episodic memory storing $f(s)$ for all states $s$ visited in the episode.
• $M$ is distinct from the experience replay buffer for the whole game.
Novelty Calculation Baseline - Never Give Up
Environment interaction diagram
• $r^i$ is made by the encoder $f_\psi$ and a non-parametric buffer $M$ of encoded states.
• $f_\psi$ does not share any parameters with the RL agent.
Novelty Calculation Baseline - Never Give Up
Calculation of intrinsic reward
• $r^i = R(f(s'), M) \propto \sum_{f(x) \in \mathrm{knn}(f(s'), M)} \|f(s') - f(x)\|^2$
• $\mathrm{knn}(f(s'), M)$ is the set of k-nearest neighbors of $f(s')$ in the episodic memory $M$.
• $r^i$ is bigger when the encoded state is far from the existing encoded states.
• $M$ is filled with $f(s)$ for all states $s$ visited so far in this episode.
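A minimal sketch of this episodic novelty, assuming the memory is kept as a list of embeddings and reset at every episode start.

# Episodic novelty: sum of squared distances to the k nearest neighbours in M.
import torch

def episodic_novelty(f_s: torch.Tensor, M: torch.Tensor, k: int = 10) -> torch.Tensor:
    # f_s: (dim,) embedding of the new state; M: (n, dim) embeddings seen this episode.
    d2 = ((M - f_s) ** 2).sum(dim=-1)
    knn = torch.topk(d2, k=min(k, d2.shape[0]), largest=False).values
    return knn.sum()                      # large when f_s is far from everything in M

memory: list = []                         # reset at the start of each episode
# per step: r_i = episodic_novelty(f_psi(s), torch.stack(memory)) if memory else 0.0
#           memory.append(f_psi(s))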
Novelty Calculation Baseline - Never Give Up
Training of the state encoder
• $f_\psi$ is trained to extract visual features related only to the agent's actions.
• $a_t = h(f_\psi(s_t), f_\psi(s_{t+1}))$, where $h$ is an MLP (an inverse-dynamics model).
• $f_\psi$ should extract the features relevant to the agent's actions.
• In today's paper, only Vis-NGU trains $f_\psi$ this way.
• Lang-NGU and LSE-NGU use a fixed pretrained $f_\psi$ (CLIP, ALM, etc.).
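A sketch of this inverse-dynamics objective with assumed toy shapes; training $h$ and $f_\psi$ jointly keeps only action-relevant features in the embedding.

# Inverse dynamics: predict a_t from phi(s_t) and phi(s_{t+1}).
import torch
import torch.nn as nn
import torch.nn.functional as F

k, n_actions = 64, 8                      # assumed embedding size / action count
f_psi = nn.Sequential(nn.Flatten(), nn.Linear(784, k))
h = nn.Sequential(nn.Linear(2 * k, 64), nn.ReLU(), nn.Linear(64, n_actions))

def inverse_dynamics_loss(s_t, s_next, a_t):
    logits = h(torch.cat([f_psi(s_t), f_psi(s_next)], dim=-1))
    return F.cross_entropy(logits, a_t)   # gradients flow into both h and f_psi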
Novelty Calculation Baseline - Never Give Up
Training of a RL agent
• RL agent: DRQN + $\epsilon$-greedy.
• Q-function loss: $L(\phi) = \|(r^e_t + \beta r^i_t) + \gamma Q_\phi(s_{t+1}, a_{t+1}) - Q_\phi(s_t, a_t)\|^2$, where $(s_t, a_t, r^e_t, r^i_t, s_{t+1})$ is sampled from the experience replay buffer for the whole game.
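A sketch of this loss for a feed-forward stand-in (the DRQN recurrence is omitted); it follows the slide's formula, which bootstraps on the action actually taken next.

# Q-learning loss with the mixed reward r_e + beta * r_i.
import torch

def q_loss(q_net, s_t, a_t, r_e, r_i, s_next, a_next, beta=0.1, gamma=0.99):
    # q_net(s): (batch, n_actions); a_t, a_next: (batch,) int64 action indices.
    q_next = q_net(s_next).gather(-1, a_next.unsqueeze(-1)).squeeze(-1)
    target = (r_e + beta * r_i) + gamma * q_next
    pred = q_net(s_t).gather(-1, a_t.unsqueeze(-1)).squeeze(-1)
    return ((target.detach() - pred) ** 2).mean()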
S-LD Construction
• S-LD uses a fixed target network $f_{\text{fixed}} : S \to L$ whose output distribution matches the oracle's, but whose state-to-caption mapping is random.
• $f_{\text{fixed}}$ construction procedure:
• Get the empirical oracle language distribution from $\pi_{LD}$, which is trained by LD.
• Get an image embedding as a real number using a random fixed NN.
• Map the random real number to language according to the oracle distribution.
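A sketch of this construction; the caption list and its probabilities are hypothetical stand-ins for the empirical oracle distribution extracted via $\pi_{LD}$.

# S-LD target: a fixed random state-to-caption mapping with oracle-matched marginals.
import torch
import torch.nn as nn

g = nn.Sequential(nn.Flatten(), nn.Linear(784, 1))   # random fixed NN S -> R
for p in g.parameters():
    p.requires_grad_(False)

captions = ["lift the duck", "open the drawer"]      # hypothetical oracle captions
probs = torch.tensor([0.7, 0.3])                     # assumed empirical distribution
cum = torch.cumsum(probs, dim=0)

def s_ld_target(s: torch.Tensor) -> str:             # s: (1, 784)
    u = torch.sigmoid(g(s)).item()                   # state -> pseudo-random u in (0, 1)
    idx = int(torch.searchsorted(cum, torch.tensor(u)).item())
    return captions[min(idx, len(captions) - 1)]     # fixed but semantically meaningless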
Pretrained Image Model instead of VLM
• A CNN encoder pretrained on ImageNet is compared.
• The language-aligned embedding performs much better on the harder tasks (Put, Find).
Oracle Language vs. Image Embedding from VLM
• Methods using image embeddings are not significantly worse than methods using the oracle language.
Visited State Heatmap in City Environment
• NGU variants explore the City environment using only intrinsic rewards.
• Language abstraction widens the set of visited states.
[Figure: observation examples in the City environment.]