Utilizing OpenCV for Q-Learning State Space Reduction in
Re-Purposed Off-The-Shelf FPV Rovers
Dr. Elan Barenholtz, William Hahn, Shawn Martin, Paul Morris,
Nick Tutuianu, Marcus McGuire, Washington Garcia
Machine Perception and Cognitive Robotics Laboratory
Center for Complex Systems and Brain Sciences
Introduction
• In our lab we use Brookstone's Rover 2.0 Spy Tank to conduct our
experiments; it can be purchased off the shelf for $99. Five years ago, a
rover of comparable capability would have cost well over $500, which shows
how quickly this class of hardware is becoming accessible. With this rover
and our software, we can tap into all of its functionality, most
importantly its movement mechanics, IR sensors, and controllable camera.
• Our software combines the OpenCV computer vision library with Q-Learning,
a reinforcement learning algorithm. In its simplest form, Q-Learning is a
process in which the rover learns through an action-reward system (see
Figure 3). Combined with OpenCV, this allows the rover to detect particular
shapes or colors and, ultimately, to learn to seek them out without being
explicitly told to do so.
Method
• The rover was placed inside a box environment with four differently
colored sides, designed to provide a simple setting with a limited number
of choices. A reward was given whenever pink was detected in the center of
the camera image (a minimal sketch of such a reward check appears below).
On its initial trials the rover made random movements; eventually, the
program guided the rover to the pink side in the fewest steps, i.e. along
the most efficient route.
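Below is a minimal sketch of how such a center-of-image pink reward could be
computed with OpenCV. The HSV bounds, the size of the center region, and the
detection threshold are illustrative placeholders, not the exact values used
in our program:

```python
import cv2
import numpy as np

# Illustrative HSV bounds for "pink"; real thresholds would be tuned to the
# rover camera's lighting and white balance.
PINK_LOW = np.array([150, 80, 80])
PINK_HIGH = np.array([175, 255, 255])

def pink_reward(frame_bgr, min_fraction=0.25):
    """Return 1 if enough pink is visible in the center strip of the frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, PINK_LOW, PINK_HIGH)   # binary mask of pink pixels
    height, width = mask.shape
    center = mask[:, width // 3 : 2 * width // 3]  # middle third of the image
    fraction = cv2.countNonZero(center) / center.size
    return 1 if fraction >= min_fraction else 0
```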
• The program uses OpenCV functions to mask image frames and search for
specific color ranges. The Q-Learning algorithm (see Figure 3) then weighs
each decision to turn the rover left or right from its current position. Put
simply, it works by reinforcing the decisions made just before a reward was
received. As the simulation runs more and more times, the program
accumulates weighted values for more and more decisions taken from more and
more locations. This table of values is called a Q-table (a minimal update
sketch is shown below).
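As a concrete illustration of the Q-table idea, the following is a minimal
one-step tabular Q-Learning update. The state count matches the 12-state
simulation of Figure 2, while the learning rate, discount factor, and
two-action (left/right) set are illustrative assumptions rather than our
exact settings:

```python
import numpy as np

N_STATES = 12            # one state per heading around the box (see Figure 2)
ACTIONS = [-1, +1]       # turn left / turn right by one state
ALPHA, GAMMA = 0.1, 0.9  # illustrative learning rate and discount factor

# The Q-table: one weighted value per (state, action) pair.
Q = np.zeros((N_STATES, len(ACTIONS)))

def q_update(state, action_idx, reward, next_state):
    """Standard one-step Q-Learning (Bellman) update."""
    target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action_idx] += ALPHA * (target - Q[state, action_idx])
```

After enough episodes, following the action with the largest value in each
row of the table traces the learned route toward the rewarded (pink) state.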
Results
Our neural-network Q-Learning implementation was tested in a simulation of
the box with 12 possible states (see Figure 2). After 2000 training
iterations at a learning rate of 1e-4, the length of the path the network
learned to take was evaluated against the true shortest path. The problem
was evaluated across multiple start/goal configurations, in which the
shortest path length fell into one of three categories (a reproduction
sketch follows the list):
• Shortest path = 2 state changes: average learned path length over 100
runs = 2.00 (diff = 0.00); random baseline average = 26.52 (diff = 22.52).
• Shortest path = 4 state changes: average learned path length over 100
runs = 5.07 (diff = 1.07); random baseline average = 39.71 (diff = 35.71).
• Shortest path = 6 state changes: average learned path length over 100
runs = 6.00 (diff = 0.00); random baseline average = 45.07 (diff = 41.07).
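For context, a compact sketch of the simulated evaluation is shown below. It
treats the box as a ring of 12 states with a single rewarded (pink) state
and, for brevity, swaps the neural network used in our experiments for the
tabular update illustrated in the Method section; the hyperparameters are
placeholders:

```python
import numpy as np

N_STATES, GOAL = 12, 0
ACTIONS = [-1, +1]                  # turn left / turn right by one state
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def train(episodes=2000):
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s = np.random.randint(N_STATES)
        while s != GOAL:
            # Epsilon-greedy action selection.
            if np.random.rand() < EPSILON:
                a = np.random.randint(2)
            else:
                a = int(np.argmax(Q[s]))
            s_next = (s + ACTIONS[a]) % N_STATES
            r = 1.0 if s_next == GOAL else 0.0
            Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q

def greedy_path_length(Q, start, limit=100):
    """Number of state changes the greedy policy takes to reach the goal."""
    s, steps = start, 0
    while s != GOAL and steps < limit:
        s = (s + ACTIONS[int(np.argmax(Q[s]))]) % N_STATES
        steps += 1
    return steps

Q = train()
for start in (2, 4, 6):             # shortest paths of 2, 4, and 6 state changes
    print(start, greedy_path_length(Q, start))
```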
Discussion
1. The Q-Learning algorithm performed perfectly when the shortest path had
length 2 or 6. It still performed well on length-4 paths, but occasionally
found a length-8 path instead, meaning it turned in the other direction.
This result appears to occur because, at a learning rate of 1e-4, only a
heavily discounted reward propagated to states 2-4 steps away from the
terminal reward, so the network saw nearly the same value for turning left
as for turning right (see the note below this list). At length 6 the issue
did not arise, because the distances around to the left and to the right
are equal.
2. In future research, we intend to resolve this learning rate issue so
that the algorithm generalizes. We will then run a physical rover using a
network pre-trained in the simulator, to demonstrate that this research
translates to intelligent behavior in the real world.
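A brief worked illustration of the length-4 failure mode (the discount
factor below is an assumed value, since it is not stated on the poster):
with a discount factor of, say, 0.9, the value Q-Learning backs up to a
state k steps from the terminal reward r is roughly 0.9^k · r, so a start
state that is 4 steps from the goal one way and 8 steps the other must
separate values of about 0.66r and 0.43r. At a learning rate of 1e-4, only
a tiny fraction of that difference reaches the weights on each update, so
after 2000 iterations the two turning directions can remain effectively
indistinguishable. At length 6 both directions carry the same value
(0.9^6 · r), so either choice is optimal.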
References
1. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D.
Wierstra, and M. Riedmiller. "Playing Atari with deep reinforcement
learning". Neural Information Processing Systems (NIPS) Deep Learning
Workshop, 2013.
Undergraduate Research Symposium, April 6, 2012
Figure 5: The rover has found and
encompassed the entire pink card in its view.
Figure 2: In this simulation, the arrow has
learned to find the pink side of the box in the
shortest path.
Figure 1: UML Sequence Diagram of our
program structure
Figure 4: Image of Brookstone's Rover 2.0
Spy Tanks used in our lab.
Figure 3: Q-Learning Algorithm