Code available at: http://vision.snu.ac.kr/projects/cb
Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty
ICML 2019
Youngjin Kim
Hyunwoo Kim*
Wontae Nam*
Jihoon Kim
Gunhee Kim
(*equal contribution)
Exploitation vs. Exploration
Image source: UC Berkeley AI course slide, lecture 11
Extrinsic Reward vs. Intrinsic Reward
Extrinsic reward:
+500 SCORE for getting an item!
-150 SCORE for stepping on a bomb :(
Intrinsic reward:
+200 MOTIVATION SCORE, as I’ve never been to this place!
-150 MOTIVATION SCORE, as I’ve been here too many times
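To make the two reward types concrete, here is a minimal Python sketch. It assumes a simple count-based novelty bonus, which is a common stand-in and not the method proposed in these slides. The names `intrinsic_bonus` and `total_reward` and the weight `beta_int` are hypothetical, and unlike the slide's example this bonus only shrinks toward zero instead of turning negative.

```python
from collections import Counter
import math

# Hypothetical visit counter: how often each state has been seen so far.
visit_counts = Counter()

def intrinsic_bonus(state):
    """Count-based novelty bonus: large for rarely visited states."""
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])

def total_reward(extrinsic, state, beta_int=0.5):
    """Extrinsic reward from the game plus a weighted intrinsic bonus."""
    return extrinsic + beta_int * intrinsic_bonus(state)

# Visiting the same place twice: the second visit earns a smaller bonus.
first = total_reward(0.0, (3, 4))
second = total_reward(0.0, (3, 4))
```

The extrinsic term comes from the environment (items, bombs), while the intrinsic term depends only on the agent's own history.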
Previous Research on Exploration
• Source of novelty: anything novel, whether task-irrelevant or task-relevant
Our Research
• Source of novelty: task-relevant novelty only; task-irrelevant novelty is filtered out
Problematic situation: Exploration under Distraction
1. Distractive environments are widespread
• Real-world observations contain novel but task-irrelevant information.
[Figure: a navigating robot in (a) a known place and (b) a known place with strangers]
2. Degeneration of prior novelty-based exploration strategies
• Due to task-agnostic intrinsic reward
• Need mechanisms to prioritize task-relevant novelty
[Figure: the navigating robot in (a) a known place, labeled "Not Novel", and (b) a known place with strangers, labeled "Novel"]
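Our approach: Curiosity-Bottleneck
Quantify the ‘Degree of Compression’ using a compressive value network
[Diagram: the Environment emits observation $x_t$ and external reward $r^e_t$; the Policy $\pi$ maps $x_t$ to action $a_t$, which is sent back to the Environment; the Compressor and Value Predictor process $x_t$ to produce the intrinsic reward $r^i_t$ and the value prediction $\hat{y}_t$]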
Compressor
• Encode rare $x$ into a long code and common $x$ into a shorter code
• Discard information about $x$ during compression
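The first Compressor bullet follows the classic source-coding intuition: an ideal code spends about $-\log_2 p(x)$ bits on an observation that occurs with probability $p(x)$. The minimal Python sketch below illustrates this with made-up observation counts. The `counts` data and the `code_lengths` helper are hypothetical, and the slides' Compressor is a learned network, not this lookup.

```python
import math

def code_lengths(counts):
    """Ideal code length in bits, -log2 p(x), for each observation."""
    total = sum(counts.values())
    return {x: -math.log2(c / total) for x, c in counts.items()}

# Hypothetical visit statistics: the corridor is common, the other rooms are rare.
counts = {"corridor": 6, "doorway": 1, "treasure_room": 1}
lengths = code_lengths(counts)

for name, bits in lengths.items():
    print(f"{name}: {bits:.3f} bits")
```

Rare observations receive longer codes, which is why code length can serve as a novelty signal.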
Value Predictor
• Prevent the Compressor from discarding task-related information
1. Objective Function
• Minimize the average code length of representation $Z$
• Discard information about observation $X$
$\min\; H(Z) - H(Z \mid X) = \min\; I(X; Z)$
• Preserve information related to value estimate $Y$
$\max\; I(Z; Y)$
$L = -I(Z; Y) + \beta\, I(X; Z)$
2. Intrinsic Reward: Per-instance Mutual Information
$r^i(x) = \int_z p(z \mid x) \log \frac{p(x, z)}{p(x)\, p(z)} \, dz$
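As a sanity check on the two formulas above, the following Python sketch uses a small discrete joint distribution (the numbers in `joint` are made up for illustration, and sums replace the integral). It shows that the per-instance term $r^i(x)$ averages to $I(X;Z)$ under $p(x)$, and that this matches $H(Z) - H(Z \mid X)$.

```python
import math

# Hypothetical joint distribution p(x, z): 2 observations, 2 codes.
joint = {("common", 0): 0.60, ("common", 1): 0.20,
         ("rare", 0): 0.02, ("rare", 1): 0.18}

# Marginals p(x) and p(z).
p_x, p_z = {}, {}
for (x, z), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_z[z] = p_z.get(z, 0.0) + p

def per_instance_mi(x):
    """r^i(x) = sum_z p(z|x) log[p(x,z) / (p(x) p(z))]."""
    return sum((joint[(x, z)] / p_x[x])
               * math.log(joint[(x, z)] / (p_x[x] * p_z[z]))
               for z in p_z)

def entropy(dist):
    """Shannon entropy in nats of a dict of probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# I(X;Z) as the p(x)-weighted average of the per-instance terms.
mi_from_rewards = sum(p_x[x] * per_instance_mi(x) for x in p_x)

# I(X;Z) as H(Z) - H(Z|X).
h_z_given_x = sum(p_x[x] * entropy({z: joint[(x, z)] / p_x[x] for z in p_z})
                  for x in p_x)
mi_from_entropies = entropy(p_z) - h_z_given_x
```

In this toy joint, the rare observation gets the larger per-instance value, which matches the intended use of $r^i(x)$ as a novelty reward.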
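3. Approximation
Variational Information Bottleneck with Gaussian assumptions
$L_{\theta,\phi} = \mathbb{E}_{x,y}\left[-\log q_\phi(y \mid z) + \beta\, \mathrm{KL}[p_\theta(Z \mid x) \,\|\, q(Z)]\right]$
$r^i(x) = \mathrm{KL}[p_\theta(Z \mid x) \,\|\, q(Z)]$
[Diagram: the Compressor maps $x_t$ to $(\mu_\theta, \sigma_\theta)$ and samples $z_t \sim p_\theta(Z \mid x_t)$; the term $\mathrm{KL}[p_\theta(Z \mid x_t) \,\|\, q(Z)]$ gives the intrinsic reward $r^i_t$; the Value Predictor maps $z_t$ to $(\mu_\phi, \sigma_\phi)$ and gives $-\log q_\phi(y_t \mid z_t)$; both terms are combined into $L_{\theta,\phi}$]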
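Under Gaussian assumptions both terms of the loss have closed forms. The Python sketch below is a minimal illustration, not the authors' implementation. It assumes a diagonal Gaussian posterior, a standard normal prior $q(Z) = \mathcal{N}(0, I)$, a scalar Gaussian value predictor, and a hypothetical weight `beta=0.001`. The networks themselves are omitted, so the means and standard deviations are passed in as plain numbers.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s)
               for m, s in zip(mu, sigma))

def gaussian_nll(y, mu, sigma):
    """-log N(y; mu, sigma^2) for a scalar value target y."""
    return (0.5 * math.log(2.0 * math.pi * sigma * sigma)
            + (y - mu) ** 2 / (2.0 * sigma * sigma))

def cb_loss(y, mu_theta, sigma_theta, mu_phi, sigma_phi, beta=0.001):
    """Single-sample loss: value NLL plus beta times the compression KL.

    Returns (loss, kl); the KL term doubles as the intrinsic reward r^i(x).
    """
    kl = kl_to_standard_normal(mu_theta, sigma_theta)
    return gaussian_nll(y, mu_phi, sigma_phi) + beta * kl, kl

# A posterior equal to the prior carries no information, so the reward is zero.
familiar = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])
# A posterior far from the prior yields a larger intrinsic reward.
unusual = kl_to_standard_normal([2.0, -1.0], [0.5, 0.5])
loss, reward = cb_loss(1.0, [2.0, -1.0], [0.5, 0.5], mu_phi=1.0, sigma_phi=1.0)
```

The same KL value is used twice: once as a regularizer inside the loss, and once as the per-observation reward handed to the policy.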
Proof of concept: static images
Detects novelty ($p_t$) while being robust to distraction ($p_d$)
[Figure: (a) Input, with rows Random Box, Object, and Pixel Noise, compared with (b) Ideal, (c) CB, (d) CB-noKL, (e) RND, (f) SimHash; axes $p_t$ and $p_d$, each ranging from 0.1 to 0.9]
Experiment: Treasure Hunt
Outline of the game
• The agent is depicted as a circle
• An item (triangle) with a reward is hidden somewhere
• The item appears only when the agent is nearby
• Once the agent obtains an item, the next item is spawned in another area (also hidden)
• The traces (pentagons) of eaten items remain
• Get the maximum score!
[Figure: example of the gameplay]
2 types of onset conditions for distraction
• Movement condition: when the agent stays in the same location
• Location condition: when the agent stays in the corners of the map
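The two onset conditions can be read as simple predicates on the agent's position. The Python sketch below is one hypothetical reading: the function name `distraction_active`, the `(row, col)` grid coordinates, and the choice to treat a "corner" as a single corner cell are all assumptions, since the slides do not give these details.

```python
def distraction_active(condition, position, previous_position, grid_size):
    """Return True when the distraction should be switched on."""
    if condition == "movement":
        # Movement condition: the agent stayed in the same location.
        return position == previous_position
    if condition == "location":
        # Location condition: the agent is in a corner of the map.
        row, col = position
        last = grid_size - 1
        return row in (0, last) and col in (0, last)
    raise ValueError(f"unknown condition: {condition}")

# Example on a hypothetical 5x5 grid.
stayed_put = distraction_active("movement", (2, 2), (2, 2), 5)
in_corner = distraction_active("location", (0, 4), (1, 4), 5)
```

In both cases the distraction is tied to behavior that does not help collect items, so an agent rewarded for any novelty can get stuck there.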
Consistently outperforms baselines on different distraction settings
[Figure: mean episodic reward (axis scale 1e6) for CB, CB-noKL, RND, Dynamics, and SimHash under (a) Movement Condition and (b) Location Condition]
Illustration of the adaptive exploration strategy
[Figure: posteriors $p_\theta(Z \mid x_t)$ and $p_\theta(Z \mid x_d)$ relative to the prior $q(Z)$ over the range of experiences, with the gradients $\nabla \mathrm{KL}$ and $-\nabla \log q_\phi$ acting on them, shown for (a) early training steps and (b) after collecting rewards; the target value $y$ and its prediction are also plotted; numeric labels in the figure: 18.2, 8.0, 18.1, 4.6]
The adaptive exploration strategy
• The compression loss term, $\mathrm{KL}[p_\theta(Z \mid x) \,\|\, q(Z)]$, induces task-agnostic exploration in early stages
[Figure: Grad-CAM visualization; panels (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash]
• The value prediction loss term, $-\log q_\phi(y \mid z)$, induces task-specific exploration after collecting external rewards
[Figure: Grad-CAM visualization; panels (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash]
Experiment: Atari Hard-exploration Games
[Figure: results on Gravitar, Solaris, and Montezuma, with and without distraction, for CB, CB-noKL, RND, Dynamics, and SimHash]
Contributions
• First work to discriminate information by task relevancy
→ Focuses on task-relevant novelty and filters out distractive information
• Utilizes the information bottleneck as a novelty measure
→ Uses the KL-divergence term as the degree of compression
• Extensive experiments
→ A custom grid-world environment shows situations where previous methods suffer; Atari environments show generality
• Psychologically plausible
