International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.5/6, November 2020
DOI: 10.5121/ijaia.2020.11605
AUTOMATIC TRANSFER RATE ADJUSTMENT FOR
TRANSFER REINFORCEMENT LEARNING
Hitoshi Kono¹, Yuto Sakamoto², Yonghoon Ji³ and Hiromitsu Fujii⁴
¹ Department of Engineering, Tokyo Polytechnic University, Kanagawa, Japan
² Department of Electronics and Mechatronics, Tokyo Polytechnic University, Kanagawa, Japan
³ Graduate School for Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
⁴ Department of Advanced Robotics, Chiba Institute of Technology, Chiba, Japan
ABSTRACT
This paper proposes a novel parameter for transfer reinforcement learning to avoid over-fitting when an
agent uses a transferred policy from a source task. Learning robot systems have recently been studied for
many applications, such as home robots, communication robots, and warehouse robots. However, if the
agent reuses knowledge that has been sufficiently learned in the source task, deadlock may occur and
appropriate transfer learning may not be realized. In previous work, a parameter called the transfer rate
was proposed to adjust the ratio of transfer, and its contributions include avoiding deadlock in the target
task. However, adjusting the parameter depends on human intuition and experience. Furthermore, a
method for deciding the transfer rate has not been discussed. Therefore, an automatic method for adjusting the
transfer rate using a sigmoid function is proposed in this paper. Computer simulations are used to
evaluate the effectiveness of the proposed method in improving environmental adaptation performance in
a target task, that is, the task in which knowledge is reused.
KEYWORDS
Reinforcement Learning, Transfer Learning, Transfer rate, Overfitting, Overlearning
1. INTRODUCTION
Machine learning systems and intelligent robot systems are increasingly being deployed to solve
practical problems, such as house cleaning and conveyance systems in warehouses [1] [2]. The
reinforcement learning framework has been widely discussed in applications of machine learning,
such as deep Q-networks [3] [4]. Basic reinforcement learning techniques are usually time
consuming. Thus, they are disadvantageous for implementation in actual applications, such as
robot systems. To address this problem, transfer learning has been proposed for reinforcement
learning [5-8]. Transfer learning allows prior knowledge to be applied to another
similar task. In reinforcement learning, a learning agent extracts policies from
previous tasks (source tasks) and transfers them to current tasks (target tasks). Advantages of transfer learning in
reinforcement learning include improved learning speed, fast adaptation to environments, and
exploration of more effective performance. Agent systems that incorporate transfer learning have
been successful in some cases. However, transfer learning is not successful in all applications,
and in most cases, it is necessary to adjust learning conditions and avoid negative transfer
situations, such as deadlock. In previous work, adjusting the ratio of policy reuse was
proposed as a method to increase the environmental adaptation performance in the target task.
The method of adjusting the transfer rate is a mechanism that determines the action value of the
reused policy and then uses it in the target task [9] [10]. However, the choice of transfer rate
depends on human intuition and experience, and a method to determine the transfer rate has not
been discussed. In addition, the transfer rate is a fixed value, whereas it is desirable to adjust it
adaptively according to the environment and behavioral conditions. Therefore, a method for adjusting
the transfer rate using a sigmoid function is proposed in this paper. The proposed method
automatically adjusts the transfer rate: the value is discounted
when the reuse of knowledge leads to a bad learning situation in the target task,
such as collision with an obstacle. Computer simulation is used to confirm whether the proposed
method for transfer learning in reinforcement learning can adjust the transfer rate so that the knowledge
obtained is not reused in a negative transfer situation, such as a deadlock, but
can be reused in other cases. In recent years, transfer in reinforcement learning for
multi-agent systems and human-robot teams has also been discussed, and it is considered essential to
avoid deadlock, that is, negative transfer caused by transfer learning [11-13].
The remainder of this paper is organized as follows. Section 2 provides an overview of the
reinforcement learning and transfer learning methods, and discusses previous methods that
adjust the transfer ratio of reused knowledge. Section 3 presents the proposed method, which adjusts the
transfer ratio with a sigmoid function. Section 4 presents the
computer simulation experiments and results to evaluate the performance of the proposed method
compared with previous methods. Section 5 presents the concluding remarks.
2. PREVIOUS WORKS
2.1. Reinforcement Learning
Reinforcement learning is a machine learning algorithm [3]. The reinforcement learning agent
explores the optimal solution via trial-and-error and creates its own knowledge as policy. Thus,
reinforcement learning does not require training datasets, unlike supervised learning. Many types
of reinforcement learning algorithms have been developed in the past decades. In this study,
Q-learning was adopted as the learning algorithm [14]. Q-learning is defined by
\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left\{ r_{t+1} + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right\}, \qquad (1) \]
where 𝑠 ∈ 𝑺 is the state of the environment in the state space, 𝑎 ∈ 𝑨 is the action of the agent in the action
space, 𝛼 is the learning rate (0 < 𝛼 ≤ 1), 𝛾 is the discount rate (0 < 𝛾 ≤ 1), and 𝑟 denotes the
reward. 𝑄(𝑠, 𝑎), also known as the Q-table, stores an action value for every pair of environmental state
and action.
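As an illustration of Eq. (1), the update can be written as a minimal Python sketch. The NumPy table layout (states as rows, actions as columns), the function name q_learning_update, and the toy three-state table are assumptions introduced here for illustration and are not taken from the original implementation. The default values of α and γ follow the experimental setup in Section 4.1.

```python
import numpy as np

def q_learning_update(Q, s, a, r_next, s_next, alpha=0.1, gamma=0.99):
    """Apply one step of Eq. (1) to the Q-table in place and return the new value."""
    td_target = r_next + gamma * np.max(Q[s_next])  # r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q[s, a]

# Hypothetical toy problem: 3 states, 2 actions, all action values start at zero.
Q = np.zeros((3, 2))
new_value = q_learning_update(Q, s=0, a=1, r_next=1.0, s_next=2)
print(new_value)
```

Only the entry for the visited state-action pair changes in each step, which is one reason basic Q-learning needs many episodes before the table reflects the whole environment.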
The agent also needs an action selection method, and several have been employed in previous
reinforcement learning research. In this study, the Boltzmann distribution model is adopted as the
action selection function [3]. The action selection probability based on the Boltzmann
distribution is defined by
\[ P(a_i \mid s_t) = \frac{\exp\{ Q(s_t, a_i) / T \}}{\sum_{a \in A} \exp\{ Q(s_t, a) / T \}}, \qquad (2) \]
where T is a parameter that determines the randomness of action selection. The following
discussion is based on Watkins’s Q-learning, with the action selection function derived from the
Boltzmann distribution model.
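The action selection of Eq. (2) can be sketched in Python as follows. The function names and the example action values are hypothetical. Subtracting the maximum action value before exponentiation is an implementation choice made here, not something stated in the paper: it leaves the probabilities of Eq. (2) unchanged but avoids overflow when T is as small as the value 0.01 used in Section 4.1.

```python
import numpy as np

def boltzmann_probabilities(q_values, temperature=0.01):
    """Eq. (2): probability of each action under the Boltzmann distribution."""
    q = np.asarray(q_values, dtype=float)
    # Shifting by the maximum does not change the ratio in Eq. (2).
    weights = np.exp((q - q.max()) / temperature)
    return weights / weights.sum()

def select_action(q_values, temperature=0.01, rng=None):
    """Sample an action index according to Eq. (2)."""
    rng = np.random.default_rng() if rng is None else rng
    p = boltzmann_probabilities(q_values, temperature)
    return int(rng.choice(len(p), p=p))

# Hypothetical action values for the four moves in one state.
probs = boltzmann_probabilities([0.0, 0.02, 0.01, 0.0], temperature=0.01)
action = select_action([0.0, 0.02, 0.01, 0.0], rng=np.random.default_rng(0))
```

With a small T, even small differences between action values produce a strongly peaked distribution, so the agent behaves almost greedily with respect to the current action values.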
2.2. Policy Copy in Transfer Learning
For traditional transfer learning in reinforcement learning, Taylor et al. defined a method for
transferring the learned action-value function [5] [6]. This method is referred to as policy copy in
transfer learning, and is defined as follows:
𝑄𝑡(𝑠, 𝑎) ← 𝑄𝑡(𝑠, 𝑎) + 𝑄𝑠(𝑠, 𝑎). (3)
Here, 𝑄𝑡(𝑠, 𝑎) is the policy in the target task, including its initial values. It is used for learning
and action selection. 𝑄𝑠(𝑠, 𝑎) is the reused policy from the source task, which holds the
state-action values obtained in the source task. In Taylor’s method, the agent in the initial state of
the target task sums the initial value of 𝑄𝑡(𝑠, 𝑎) and the transferred policy 𝑄𝑠(𝑠, 𝑎). The agent then
learns using the assimilated policy 𝑄𝑡(𝑠, 𝑎) in the target task. In short, the agent reuses the
transferred policy and overwrites it through trial-and-error at the same time.
Originally, Taylor’s method includes inter-task mapping, which defines the mapping of the
agent’s observable environmental states S and executable actions A between tasks. Inter-task mapping is
omitted in this paper for simplicity of description.
In Taylor’s method, if the transferred policy has been sufficiently learned in the source task, the agent will
behave according to that policy even in a different environment, and deadlock, such as
collision with obstacles, may occur. This phenomenon is called over-fitting or over-learning.
The agent can overwrite the policy through its behavior in the target task, but this requires some learning
time.
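The policy copy of Eq. (3) can be illustrated with a short Python sketch. It assumes, as in this paper, that no inter-task mapping is needed, so both Q-tables have the same shape. The function name, the table values, and the single update step (learning rate 0.1, reward 0, next-state value assumed to be 0) are hypothetical.

```python
import numpy as np

def policy_copy(Q_target_init, Q_source):
    """Eq. (3): add the source-task action values to the target's initial values."""
    if Q_target_init.shape != Q_source.shape:
        raise ValueError("Q-tables must share a shape when no inter-task mapping is used")
    return Q_target_init + Q_source

Q_source = np.array([[0.0, 0.8], [0.6, 0.0]])  # hypothetical learned values
Q_target = policy_copy(np.zeros((2, 2)), Q_source)

# Learning writes directly into the merged table, so the transferred value is
# overwritten only gradually: one update shrinks 0.8 by a factor (1 - alpha).
alpha = 0.1
Q_target[0, 1] += alpha * (0.0 - Q_target[0, 1])
print(Q_target[0, 1])
```

Under these assumptions a single unrewarded step removes only a tenth of the transferred value, which illustrates why relearning after a plain policy copy takes time.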
2.3. Adjusting of Transfer Ratio
Takano et al. proposed a method to adjust the action value of the transferred policy [9] by using
a parameter ζ in the policy assimilation equation. One of the methods in Takano’s study is defined
by
\[ Q_c(s, a) \leftarrow \frac{1}{2} (1 - \zeta)\, Q_t(s, a) + \zeta\, Q_s(s, a), \qquad (4) \]
where ζ is the adjusting parameter (0 < 𝜁 < 1). 𝑄𝑐(𝑠, 𝑎) is the assimilated policy, which is used
for action selection, whereas learning in the target task updates the policy 𝑄𝑡(𝑠, 𝑎). This method
discounts 𝑄𝑠(𝑠, 𝑎) by ζ and, conversely, weights 𝑄𝑡(𝑠, 𝑎) by 1 − 𝜁. Thus, it is possible
to reduce the effect on the target task even if a policy 𝑄𝑠(𝑠, 𝑎) with a high action value is reused.
However, Takano’s method requires the value of 𝜁 to be determined before operation, which
depends on human intuition and experience. Furthermore, the policy 𝑄𝑡(𝑠, 𝑎), which should be
trusted, is also discounted.
2.4. Transfer Rate
Kono et al. proposed the transfer rate to adjust the ratio of transferring the obtained policy [10].
The method is based on Takano’s method and is similar to it, but simpler.
The transfer method with the transfer rate can be defined as
𝑄𝑐(𝑠, 𝑎) ← 𝑄𝑡(𝑠, 𝑎) + 𝜏𝑄𝑠(𝑠, 𝑎), (5)
where 𝜏 (0 < 𝜏 ≤ 1) is the transfer rate, and 𝑄𝑐(𝑠, 𝑎) is the assimilated policy, which is used for
action selection in the target task. 𝑄𝑡(𝑠, 𝑎) is the policy in the target task and is used for learning
in the target task. 𝑄𝑠(𝑠, 𝑎) is the reused policy that was learned in the source task. Kono’s method
also requires the parameter 𝜏 to be determined before the learning operation, and the value setting
depends on human intuition and experience. In this paper, to distinguish them from Taylor’s method, Kono’s
and Takano’s methods are collectively called Takano’s method.
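To make the roles of the tables in Eq. (5) concrete, the following Python sketch keeps 𝑄𝑠 read-only, learns only in 𝑄𝑡, and forms 𝑄𝑐 on demand for action selection. The table values and function names are hypothetical, and the update is assumed to bootstrap from 𝑄𝑡 alone, which the text implies but does not state explicitly. The collision reward of −1 and the values α = 0.1 and γ = 0.99 are taken from Section 4.1.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.99  # learning rate and discount rate from Section 4.1

def assimilated_values(Q_t, Q_s, s, tau):
    """Eq. (5) for one state: Q_c(s, .) = Q_t(s, .) + tau * Q_s(s, .)."""
    return Q_t[s] + tau * Q_s[s]

def learn(Q_t, s, a, r, s_next):
    """Q-learning step applied to Q_t only (assumption: the target uses Q_t)."""
    Q_t[s, a] += ALPHA * (r + GAMMA * Q_t[s_next].max() - Q_t[s, a])

Q_s = np.array([[0.0, 0.9], [0.5, 0.0]])  # hypothetical source-task values
Q_t = np.zeros_like(Q_s)                  # target-task values start at zero

# In the target task, action 1 in state 0 hits a wall: state unchanged, r = -1.
learn(Q_t, s=0, a=1, r=-1.0, s_next=0)

greedy_low = int(np.argmax(assimilated_values(Q_t, Q_s, 0, tau=0.1)))
greedy_high = int(np.argmax(assimilated_values(Q_t, Q_s, 0, tau=1.0)))
print(greedy_low, greedy_high)
```

In this toy case one collision is enough to change the preferred action when 𝜏 = 0.1, whereas with 𝜏 = 1.0 the transferred value still dominates, and 𝑄𝑠 itself is never modified.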
3. PROPOSED METHOD
In previous studies, the policy transfer methods either overwrote the transferred policy or used fixed discount
parameters. An automatic method for adjusting the parameter is proposed in this section. In
a negative transfer situation, an action that the agent could originally
execute cannot be executed in the target task because the policy is reused, and thus a collision
with an obstacle occurs. Therefore, to avoid this situation, the transfer rate should be lowered when
the agent becomes unable to act. To lower the transfer rate, the value is
adjusted using a sigmoid function according to the number of times that the agent cannot act. The
sigmoid function is defined as:
\[ \frac{1}{1 + e^{-\sigma g}}, \qquad (6) \]
where 𝑔 is the gain, which is set to an arbitrary value, and 𝜎 is the input value of
the sigmoid function. The transfer method is reformulated using the sigmoid function as defined
by
\[ Q_c(s, a) \leftarrow Q_t(s, a) + \frac{1}{1 + e^{-\sigma g}}\, Q_s(s, a), \qquad (7) \]
where 𝜎 is decremented when 𝑠𝑡 = 𝑠𝑡+1 and incremented when 𝑠𝑡 ≠ 𝑠𝑡+1. The inability of
the agent to act means that the current environmental state 𝑠𝑡 and the next environmental state
𝑠𝑡+1 are the same; in that case, an arbitrary constant 𝒹 is subtracted from 𝜎. Conversely, if the action
is performed by reusing the policy and the state changes, the constant 𝒹 is added to 𝜎 for each action. This
mechanism is defined as follows:
\[ \sigma = \begin{cases} \sigma - d & (s_t = s_{t+1}) \\ \sigma + d & (s_t \neq s_{t+1}) \end{cases}. \qquad (8) \]
The above function is evaluated for each action, that is, at each step of the agent. It is possible to
adjust the rate of reuse of the policy by comparing the environmental state at each time
step in a generalized form, regardless of the actual situation, such as a collision with an obstacle.
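The mechanism of Eqs. (6) to (8) can be sketched in Python as follows. The function names, the list of transitions, and in particular the starting value of 𝜎 are assumptions: the text does not state the initial 𝜎, and 0.3 is chosen here only so that the change in the rate is easy to see. The defaults 𝑔 = 9.0 and 𝒹 = 0.1 follow Section 4.1. The two-branch form of the sigmoid is an implementation choice to avoid overflow for very negative inputs.

```python
import math

def transfer_rate(sigma, gain=9.0):
    """Eq. (6): sigmoid of sigma * gain, written to avoid overflow in exp."""
    x = sigma * gain
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def update_sigma(sigma, s_t, s_next, d=0.1):
    """Eq. (8): subtract d if the state did not change, otherwise add d."""
    return sigma - d if s_t == s_next else sigma + d

def assimilated_value(q_t, q_s, sigma, gain=9.0):
    """Eq. (7) for one state-action pair: Q_c = Q_t + rate * Q_s."""
    return q_t + transfer_rate(sigma, gain) * q_s

sigma = 0.3  # assumed starting value; not given in the text
history = []
# Hypothetical transitions: three blocked moves, then one successful move.
for s_t, s_next in [(5, 5), (5, 5), (5, 5), (5, 6)]:
    sigma = update_sigma(sigma, s_t, s_next)
    history.append(transfer_rate(sigma))
print([round(v, 3) for v in history])
```

Each blocked move lowers the effective transfer rate and each successful move raises it again, so the weight on 𝑄𝑠 follows the agent's recent ability to act rather than a hand-picked constant.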
4. EVALUATION WITH COMPUTER SIMULATION
4.1. Experimental Setup
This experiment aims to confirm the environmental adaptation performance of an agent using
the proposed method. The shortest path problem with a single agent is adopted in this study as the basic
evaluation. The learning environment used in the computer
simulation is shown in Figure 1. Figures 1 (a) and (b) show the source task environment and target task environment,
respectively. In this environment, if the agent reaches the goal, it obtains a reward
𝑟 = 1 from the environment. The agent can observe its own position. The reinforcement learning
parameters are set as α = 0.1 and γ = 0.99. The parameter T of the Boltzmann distribution model
is set as 𝑇 = 0.01. These parameters are common to all conditions. In each experiment, 300
episodes were conducted for the source and target tasks. The agent can move one grid cell in each
step, and it can move in four directions: front, right, back, and left. The period from when
the agent leaves the start until it reaches the goal is referred to as a single episode.
(a) Source task (b) Target task.
Figure 1. Grid world of the shortest path problem. The circle marks the agent and its initial position in each simulation.
The green square is the goal position. In each sub-figure, the red line is the shortest path between the start position and the goal
position obtained with reinforcement learning.
For the experimental conditions, the learning results are compared among Taylor’s, Takano’s, and the
proposed methods. In Taylor’s method, reinforcement learning is executed in the environment shown in Figure 1
(a), and the acquired policy is transferred to the agent in Figure 1 (b) for transfer learning. For
Takano’s method, Kono’s method was adopted as the implementation in this experiment. It is
clear that Takano’s method overfits, and it is expected to require a considerable amount of time to learn
the target task. Therefore, the agent is set up to obtain a negative reward 𝑟 = −1 in the event
of a collision with a wall. In addition, it was confirmed in advance that transfer learning was
possible, and the transfer rate was set to 𝜏 = 0.1. In the proposed method, the parameter 𝒹 is set
to 0.1, and the gain 𝑔 is set to 9.0.
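The environment dynamics described in this setup can be sketched as a small Python class. The actual maps of Figure 1 are not recoverable from the text, so the 3 × 3 layout, the wall and goal positions, the class and field names, and the mapping of front, right, back, and left to grid offsets are all hypothetical. Only the rewards (1 at the goal, −1 for a wall collision in the Takano condition) and the four single-cell moves come from the description above.

```python
from dataclasses import dataclass

# Assumed mapping of the four moves to (row, column) offsets.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # front, right, back, left

@dataclass(frozen=True)
class GridWorld:
    height: int
    width: int
    walls: frozenset
    goal: tuple
    collision_reward: float = 0.0  # set to -1.0 for the Takano condition

    def step(self, pos, action):
        """Return (next_position, reward, done) for one move."""
        d_row, d_col = MOVES[action]
        nxt = (pos[0] + d_row, pos[1] + d_col)
        inside = 0 <= nxt[0] < self.height and 0 <= nxt[1] < self.width
        if not inside or nxt in self.walls:
            # Collision: the agent cannot act, so the state stays the same.
            return pos, self.collision_reward, False
        if nxt == self.goal:
            return nxt, 1.0, True
        return nxt, 0.0, False

# Hypothetical 3 x 3 layout with one wall cell.
env = GridWorld(height=3, width=3, walls=frozenset({(0, 1)}), goal=(2, 2),
                collision_reward=-1.0)
blocked = env.step((0, 0), 1)   # moving right into the wall
reached = env.step((2, 1), 1)   # moving right onto the goal
```

A blocked move returns the unchanged position, which is exactly the condition 𝑠𝑡 = 𝑠𝑡+1 that the proposed method uses to lower the transfer rate.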
4.2. Results
In this section, the experimental results under the three conditions are presented. The obtained
learning curves are shown in Figure 2. A learning curve represents performance as
learning progresses: the horizontal axis is the number of learning episodes and the
vertical axis is the number of steps required from the start to the goal. Each learning curve is the average of 10
trials. In Figure 2, the learning curve of reinforcement learning that does not reuse a
policy in the target task is the baseline for performance evaluation. The black line in Figure 2
represents this learning curve of reinforcement learning.
4.2.1. Result of Taylor’s Method
Taylor’s method shows a high number of steps in the early stage of learning, which
suggests that an overfitting state emerges. However, because the policy is relearned in the target task, the
learning curve converges to the optimal solution, i.e., the shortest path, as the learning curve of
reinforcement learning does.
4.2.2. Result of Takano’s Method
The learning curve of Takano’s method has a relatively small number of steps from the early stage of
learning compared with the learning curve of reinforcement learning, and its convergence is
considered to be equivalent to that of reinforcement learning. High performance in
the early stages of the learning curve is called a jump start, and is one of the basic benefits of
transfer reinforcement learning. It has been confirmed that relearning in the target task is not
possible when the transfer rate 𝜏 = 1.0 because, as learning in the target task progresses, each
update becomes small and it is time-consuming to cancel the action value of the transferred
policy.
Figure 2. Learning curves
4.2.3. Result of the Proposed Method
The learning curve of the proposed method also exhibits a jump start. Its convergence is
slower than that of Takano’s method and is close to the learning curve of reinforcement learning.
From the figure, it can be observed that the proposed method performs better than reinforcement
learning. In the initial state, the proposed method is equivalent to Takano’s method with the transfer rate 𝜏 = 1.0,
which normally requires time for learning. However, because the
transfer rate changes adaptively, the performance of the proposed method is verified against Takano’s method with a
transfer rate 𝜏 = 0.1.
4.2.4. Discussion
The transfer ratio was used to compare the performance quantitatively. The transfer ratio
evaluates the change in learning time relative to a baseline learning curve, such as that of
reinforcement learning. The transfer ratio 𝑟 can be defined as follows:
\[ r = \frac{\sum L_t(t) - \sum L_{wt}(t)}{\sum L_{wt}(t)}, \qquad (9) \]
where 𝐿𝑡(𝑡) is the learning curve with transfer and 𝑡 is the episode number. Therefore,
∑ 𝐿𝑡(𝑡) is the area under the learning curve with transfer. In addition, 𝐿𝑤𝑡(𝑡) is the learning curve without
transfer, which corresponds to reinforcement learning in the target task, and ∑ 𝐿𝑤𝑡(𝑡) is the
area under the learning curve without transfer. Table 1 presents the transfer ratios of Taylor’s, Takano’s, and
the proposed methods.
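The computation in Eq. (9) can be sketched in Python as follows. The function name and the two short learning curves are made-up illustrations and are not the experimental data behind Table 1.

```python
import numpy as np

def transfer_ratio(steps_with_transfer, steps_without_transfer):
    """Eq. (9): relative difference between the areas under two learning curves."""
    area_with = float(np.sum(steps_with_transfer))
    area_without = float(np.sum(steps_without_transfer))
    return (area_with - area_without) / area_without

# Hypothetical steps-per-episode curves over three episodes.
with_transfer = [50, 30, 20]
without_transfer = [100, 60, 40]
ratio = transfer_ratio(with_transfer, without_transfer)
print(ratio)
```

Because the learning curves count steps to the goal, a negative value under this definition means that the method with transfer needed fewer steps in total than reinforcement learning without transfer, and a positive value means it needed more.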
Table 1. Comparison of transfer ratio in each condition
In Table 1, it can be observed that only Taylor’s method increases the learning time, by
approximately 1.41 times. Takano’s method and the proposed method decrease the learning time.
Figure 3 shows the transition of the output value of the sigmoid function at the beginning and end
of learning. In the early stage of learning, the value corresponding to the transfer rate is adjusted
through interaction with the environment, and the transferred policy is sometimes used while the agent
moves to the goal. At the end of learning, the reused policy may not be used even once. It is more
time-consuming to use the policy at the end of learning than in the early stages of learning because the
agent has learned a new policy and obtained high action values; therefore, it becomes possible to reach
the goal without being affected by the reused policy. However, once the agent has obtained a sufficient
policy, there is no need to adjust the ratio of policy reuse. The learning curve of the proposed
method does not converge because the ratio is still adjusted, even at the end of learning.
Compared with Takano’s method, the results suggest that the proposed method allows
the agent to adjust the ratio of policy reuse automatically through interaction with the environment,
without manual adjustment or selection of the transfer rate. Moreover, compared with the
previous method, it was shown that the agent can improve its environmental adaptability while
maintaining the characteristic of not overwriting the reused policy from the source task.
(a) Early stage of learning (1 episode) (b) End of learning (300 episodes)
Figure 3. Transition of the value output from the sigmoid function
5. CONCLUSIONS
This paper proposed a novel parameter for transfer reinforcement learning to avoid over-fitting in
the relearning process of the target task. In the proposed method, the ratio of reusing the policy from
the source task is adjusted by a sigmoid function whose input value changes with events such as collision with
an obstacle. Basic experiments were performed on the shortest path problem with transfer
reinforcement learning based on Taylor’s method without inter-task mapping. The results suggest
that the learning agent can adapt to the environment. Moreover, compared with the previous
method, it was shown that the agent can improve its environmental adaptability while
maintaining the characteristic of not overwriting the reused policy from the source task. This result
suggests that the method can be applied to the selection of reused policies in future applications of reinforcement
learning agents.
The proposed method does not show convergence in the target task, unlike the previous
method. For future work, it is necessary to discuss how to adjust the ratio of
policy reuse at the end of learning. In the proposed method, the ratio of reuse is adjusted
adaptively, so the effect of the transferred policy cannot be ignored even at the end of learning.
Moreover, the proposed method is evaluated in a simple task and environment in this study. The
computer simulation environment is a Markov decision process (MDP), which is different from
the real-world environment. To demonstrate the effectiveness of the proposed method, it is
necessary to carry out various types of more complex experiments and evaluations. In addition,
the proposed method should be evaluated not only in various environments but also in non-MDP
environments, such as multi-agent systems.
ACKNOWLEDGEMENTS
This study was partially supported by JSPS KAKENHI Grant number JP18K18133, and
SUZUKI Foundation. This work was funded by The Japan Atomic Energy Agency (JAEA)
FY2020 Center of World Intelligence Project for Nuclear Science/Technology and Human
Resource Development (Grant no. R02I015).
REFERENCES
[1] B. Ramalingam, A. K. Lakshmanan, M. Ilyas, A. V. Le, and M. R. Elara (2018). “Cascaded Machine-Learning
Technique for Debris Classification in Floor-Cleaning Robot Application.” Applied
Sciences 8(12): 1-19.
[2] R. D’Andrea (2012). “Guest Editorial: A Revolution in the Warehouse: A Retrospective on Kiva
Systems and the Grand Challenges Ahead.” IEEE Transactions on Automation Science and
Engineering 9(4): 638-639.
[3] R. S. Sutton and A. G. Barto (1998). “Reinforcement Learning: An Introduction.” MIT Press.
[4] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M.
Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King,
D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015). “Human-level control through deep
reinforcement learning.” Nature 518: 529-533.
[5] M. E. Taylor and P. Stone (2009). “Transfer learning for reinforcement learning domains: A survey.”
Journal of Machine Learning Research 10(Jul): 1633-1685.
[6] M. E. Taylor (2009). “Transfer in Reinforcement Learning Domains.” Studies in Computational
Intelligence 216: Springer.
[7] A. Lazaric (2012). “Transfer in Reinforcement Learning: A Framework and a Survey.” Reinforcement
Learning. Adaptation, Learning, and Optimization 12: 143-173. Berlin, Heidelberg, Springer.
[8] Q. Yang, Y. Zhang, W. Dai, and S. J. Pan (2020). “Transfer learning.” Cambridge University Press.
[9] T. Takano, H. Takase, H. Kawanaka, H. Kita, T. Hayashi, and S. Tsuruoka (2011). “Transfer Learning
based on Forbidden Rule Set in Actor-Critic Method.” International Journal of Innovative
Computing, Information and Control 7(5(B)).
[10] H. Kono, A. Kamimura, K. Tomita, Y. Murata, and T. Suzuki (2014). “Transfer Learning Method
Using Ontology for Heterogeneous Multi-agent Reinforcement Learning.” International Journal of
Advanced Computer Science and Application 5(10): 156-164.
[11] R. Ramakrishnan, C. Zhang, and J. Shah (2017). “Perturbation Training for Human-Robot Teams.”
Journal of Artificial Intelligence Research 59: 495-541.
[12] F. L. Da Silva and A. H. R. Costa (2019). “A Survey on Transfer Learning for Multiagent
Reinforcement Learning Systems.” Journal of Artificial Intelligence Research 69: 645-703.
[13] F. L. Da Silva, G. Warnell, A. H. R. Costa, and P. Stone (2020). “Agents teaching agents: a survey on
inter-agent transfer learning.” Autonomous Agents and Multi-Agent Systems 34(9).
[14] C. J. C. H. Watkins and P. Dayan (1992). “Q-Learning.” Machine Learning 8: 279-292.

Weitere ähnliche Inhalte

Was ist angesagt?

COMBINED CLASSIFIERS FOR TIME SERIES SHAPELETS
COMBINED CLASSIFIERS FOR TIME SERIES SHAPELETSCOMBINED CLASSIFIERS FOR TIME SERIES SHAPELETS
COMBINED CLASSIFIERS FOR TIME SERIES SHAPELETScsandit
 
Genetic Approach to Parallel Scheduling
Genetic Approach to Parallel SchedulingGenetic Approach to Parallel Scheduling
Genetic Approach to Parallel SchedulingIOSR Journals
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionXin-She Yang
 
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021Chris Ohk
 
SigOpt_Bayesian_Optimization_Primer
SigOpt_Bayesian_Optimization_PrimerSigOpt_Bayesian_Optimization_Primer
SigOpt_Bayesian_Optimization_PrimerIan Dewancker
 
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection:  Comparative StudyA Threshold Fuzzy Entropy Based Feature Selection:  Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
 
Tree Physiology Optimization in Constrained Optimization Problem
Tree Physiology Optimization in Constrained Optimization ProblemTree Physiology Optimization in Constrained Optimization Problem
Tree Physiology Optimization in Constrained Optimization ProblemTELKOMNIKA JOURNAL
 
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...IJERA Editor
 
Manager’s Preferences Modeling within Multi-Criteria Flowshop Scheduling Prob...
Manager’s Preferences Modeling within Multi-Criteria Flowshop Scheduling Prob...Manager’s Preferences Modeling within Multi-Criteria Flowshop Scheduling Prob...
Manager’s Preferences Modeling within Multi-Criteria Flowshop Scheduling Prob...Waqas Tariq
 
Concatenated decision paths classification for time series shapelets
Concatenated decision paths classification for time series shapeletsConcatenated decision paths classification for time series shapelets
Concatenated decision paths classification for time series shapeletsijics
 
Hybrid Genetic Algorithm with Multiparents Crossover for Job Shop Scheduling ...
Hybrid Genetic Algorithm with Multiparents Crossover for Job Shop Scheduling ...Hybrid Genetic Algorithm with Multiparents Crossover for Job Shop Scheduling ...
Hybrid Genetic Algorithm with Multiparents Crossover for Job Shop Scheduling ...CHUNG SIN ONG
 
Artificial Intelligence in Robot Path Planning
Artificial Intelligence in Robot Path PlanningArtificial Intelligence in Robot Path Planning
Artificial Intelligence in Robot Path Planningiosrjce
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...André Gonçalves
 
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...aciijournal
 

Was ist angesagt? (17)

COMBINED CLASSIFIERS FOR TIME SERIES SHAPELETS
COMBINED CLASSIFIERS FOR TIME SERIES SHAPELETSCOMBINED CLASSIFIERS FOR TIME SERIES SHAPELETS
COMBINED CLASSIFIERS FOR TIME SERIES SHAPELETS
 
Genetic Approach to Parallel Scheduling
Genetic Approach to Parallel SchedulingGenetic Approach to Parallel Scheduling
Genetic Approach to Parallel Scheduling
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential Evolution
 
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al, 2021
 
SigOpt_Bayesian_Optimization_Primer
SigOpt_Bayesian_Optimization_PrimerSigOpt_Bayesian_Optimization_Primer
SigOpt_Bayesian_Optimization_Primer
 
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection:  Comparative StudyA Threshold Fuzzy Entropy Based Feature Selection:  Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 
Tree Physiology Optimization in Constrained Optimization Problem
Tree Physiology Optimization in Constrained Optimization ProblemTree Physiology Optimization in Constrained Optimization Problem
Tree Physiology Optimization in Constrained Optimization Problem
 
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
AUTOMATIC TRANSFER RATE ADJUSTMENT FOR TRANSFER REINFORCEMENT LEARNING

International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.5/6, November 2020
DOI: 10.5121/ijaia.2020.11605

Hitoshi Kono¹, Yuto Sakamoto², Yonghoon Ji³ and Hiromitsu Fujii⁴
¹ Department of Engineering, Tokyo Polytechnic University, Kanagawa, Japan
² Department of Electronics and Mechatronics, Tokyo Polytechnic University, Kanagawa, Japan
³ Graduate School for Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
⁴ Department of Advanced Robotics, Chiba Institute of Technology, Chiba, Japan

ABSTRACT

This paper proposes a novel parameter for transfer reinforcement learning to avoid over-fitting when an agent uses a transferred policy from a source task. Learning robot systems have recently been studied for many applications, such as home robots, communication robots, and warehouse robots. However, if the agent reuses knowledge that has been sufficiently learned in the source task, deadlock may occur and appropriate transfer learning may not be realized. In previous work, a parameter called the transfer rate was proposed to adjust the ratio of transfer, and its contributions include avoiding deadlock in the target task. However, adjusting the parameter depends on human intuition and experience. Furthermore, a method for deciding the transfer rate has not been discussed. Therefore, an automatic method for adjusting the transfer rate using a sigmoid function is proposed in this paper. Further, computer simulations are used to evaluate the effectiveness of the proposed method in improving the environmental adaptation performance in a target task, that is, in a situation where knowledge is reused.

KEYWORDS

Reinforcement Learning, Transfer Learning, Transfer rate, Overfitting, Overlearning

1. INTRODUCTION

Machine learning systems and intelligent robot systems are increasingly being deployed to solve practical problems, such as house cleaning and conveyance systems in warehouses [1] [2]. The reinforcement learning framework has been widely discussed in applications of machine learning, such as deep Q-networks [3] [4]. Basic reinforcement learning techniques are usually time consuming. Thus, they are disadvantageous for implementation in actual applications, such as robot systems. To address this problem, transfer learning has been proposed for reinforcement learning [5-8].

Transfer learning theory allows the application of prior knowledge to another similar task. In reinforcement learning, a learning agent draws policies from previous tasks (source tasks) and transfers them to current tasks (target tasks). Advantages of transfer learning in reinforcement learning include improved learning speed, fast adaptation to environments, and exploration of more effective performances. Agent systems that incorporate transfer learning have been successful in some cases. However, transfer learning is not successful in all applications, and in most cases, it is necessary to adjust learning conditions and avoid negative transfer situations, such as deadlock.

In previous work, adjusting the ratio of policy reuse was proposed as a method to increase the environmental adaptation performance in the target task. The method of adjusting the transfer rate is a mechanism that determines the action value of the reused policy, and then uses it in the target task [9] [10]. However, the transfer rate decision depends on human intuition and experience, and a method to determine the transfer rate has not been discussed. In addition, the transfer rate is a fixed value, and it is desirable to adjust it adaptively according to the environment and behavioral conditions.

Therefore, a method of adjusting the transfer rate using a sigmoid function is proposed in this paper. The proposed method automatically adjusts the transfer rate, lowering its value when the reuse of knowledge triggers a bad learning situation in the target task, such as a collision with an obstacle. Computer simulation is used to confirm whether the proposed method for transfer learning in reinforcement learning can adjust the transfer rate. The knowledge obtained is not reused in the case of a negative transfer situation, such as a deadlock; however, it can be reused in other cases. In particular, in recent years, transfer in reinforcement learning in multi-agent systems and human-robot teams has been discussed, and it is considered essential to avoid deadlock, that is, negative transfer by transfer learning [11-13].

The remainder of this paper is organized as follows. Section 2 provides an overview of reinforcement learning and transfer learning methods, and discusses the previous methods, which adjust the transfer ratio of reused knowledge. Section 3 discusses the method for adjusting the transfer ratio with the sigmoid function as the proposed method. Section 4 presents the computer simulation experiments and results to evaluate the performance of the proposed method compared with previous methods. Section 5 presents the concluding remarks.

2. PREVIOUS WORKS

2.1. Reinforcement Learning

Reinforcement learning is a type of machine learning algorithm [3]. The reinforcement learning agent explores the optimal solution via trial-and-error and creates its own knowledge as a policy. Thus, reinforcement learning does not require training datasets, unlike supervised learning. Many types of reinforcement learning algorithms have been developed in the past decades. In this study, Q-learning was adopted as the learning algorithm [14]. Q-learning is defined by

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left\{ r_{t+1} + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right\}, \tag{1}$$

where $s \in S$ is the state of the environment in the state space, $a \in A$ is the action of the agent in the action space, $\alpha$ is the learning rate ($0 < \alpha \le 1$), $\gamma$ is the discount rate ($0 < \gamma \le 1$), and $r$ denotes the reward. $Q(s, a)$, also known as the Q-table, contains the action value of every state-action pair of the environment.

The agent chooses its actions with an action selection method, as in previous reinforcement learning research. In this study, the Boltzmann distribution model is adopted as the action selection function [3]. The action selection probability derived from the Boltzmann distribution is defined by

$$P(a_i \mid s_t) = \frac{\exp\{Q(s_t, a_i)/T\}}{\sum_{a \in A} \exp\{Q(s_t, a)/T\}}, \tag{2}$$

where $T$ is a parameter that determines the randomness of action selection. The following discussion is based on Watkins's Q-learning, with the action selection function derived from the Boltzmann distribution model.
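As a rough illustration of Eqs. (1) and (2), the following minimal Python sketch applies one tabular Q-learning update and computes Boltzmann action probabilities. The function names, the dictionary-based Q-table, and the tuple-valued states are assumptions made for this example and are not taken from the paper; the default values of α, γ, and T follow the experimental setup in Section 4.1.

```python
import math
import random
from collections import defaultdict

# The four movement directions used in the grid world of Section 4.1.
ACTIONS = ["front", "right", "back", "left"]


def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update (Eq. 1) to the table q in place."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def boltzmann_probs(q, s, actions, temperature=0.01):
    """Return the action selection probabilities of Eq. 2 for state s."""
    scaled = [q[(s, a)] / temperature for a in actions]
    # Subtracting the maximum leaves the probabilities unchanged
    # but avoids overflow in exp() when the temperature is small.
    m = max(scaled)
    weights = [math.exp(v - m) for v in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q, s, actions, temperature=0.01, rng=random):
    """Sample one action according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q, s, actions, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]


# Hypothetical usage: the Q-table starts at zero (an assumption of this
# sketch) and the agent receives reward 1 after moving "front".
q_table = defaultdict(float)
q_update(q_table, s=(0, 0), a="front", r=1.0, s_next=(0, 1), actions=ACTIONS)
probs = boltzmann_probs(q_table, (0, 0), ACTIONS)
```

With a temperature as small as T = 0.01, even a small difference in action values makes the selection almost greedy, which is why a strongly learned transferred policy can dominate the agent's behavior.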
2.2. Policy Copy in Transfer Learning

For traditional transfer learning in reinforcement learning, Taylor et al. defined a method for transferring the learned action-value function [5] [6]. This method is referred to as policy copy in transfer learning, and is defined as follows:

$$Q_t(s, a) \leftarrow Q_t(s, a) + Q_s(s, a). \tag{3}$$

Here, $Q_t(s, a)$ is the policy that includes the initial value in the target task. It is used for learning and action selection. $Q_s(s, a)$ is the reused policy from the source task, which holds the state-action values obtained in the source task. In Taylor's method, the agent in the initial state of the target task sums the initial value of $Q_t(s, a)$ and the transferred policy $Q_s(s, a)$. The agent then learns with the assimilated policy $Q_t(s, a)$ in the target task. Simply put, the agent reuses the transferred policy and overwrites it through trial-and-error at the same time. Originally, Taylor's method includes inter-task mapping, which defines the mapping between the agent's observable environmental states $S$ and executable actions $A$. Inter-task mapping is omitted in this paper for simplicity of description.

In Taylor's method, if the transferred policy is sufficiently learned in the source task, the agent will behave according to that policy when it is used in different environments, and deadlock, such as collision with obstacles, may occur. This phenomenon is called overfitting or overlearning. The agent can overwrite the policy through its behavior in the target task, but this requires some learning time.

2.3. Adjusting of Transfer Ratio

Takano et al. proposed a method to adjust the action value of the transferred policy [9] by using a parameter $\zeta$ in the policy assimilation equation. One of the methods in Takano's study is defined by

$$Q_c(s, a) \leftarrow \tfrac{1}{2}(1 - \zeta)\, Q_t(s, a) + \zeta Q_s(s, a), \tag{4}$$

where $\zeta$ is the adjusting parameter ($0 < \zeta < 1$). $Q_c(s, a)$ is the assimilated policy, and is used for action selection. Learning in the target task uses the policy $Q_t(s, a)$. This method discounts $Q_s(s, a)$ by $\zeta$, and conversely uses $Q_t(s, a)$ by the amount of $1 - \zeta$. Thus, it is possible to reduce the effect on the target task even if a policy $Q_s(s, a)$ with a high action value is reused. However, Takano's method requires the value of $\zeta$ to be determined before operation, and this depends on human intuition and experience. Furthermore, the policy $Q_t(s, a)$, which should be trusted, is discounted.

2.4. Transfer Rate

Kono et al. proposed the transfer rate to adjust the ratio of transferring the obtained policy [10]. The method is based on Takano's method, and is similar to but simpler than it. The transfer method with the transfer rate can be defined as

$$Q_c(s, a) \leftarrow Q_t(s, a) + \tau Q_s(s, a), \tag{5}$$

where $\tau$ ($0 < \tau \le 1$) is the transfer rate, and $Q_c(s, a)$ is the assimilated policy, which is used for action selection in the target task. $Q_t(s, a)$ is the policy in the target task, and is used for learning in the target task. $Q_s(s, a)$ is the reused policy that was learned in the source task. Kono's method also requires the parameter $\tau$ to be determined before the learning operation, and the value setting depends on human intuition and experience. In this paper, apart from Taylor's method, Kono's and Takano's methods are collectively called Takano's method.

3. PROPOSED METHOD

In previous studies, policy transfer methods either overwrote the transferred policy or used fixed discount parameters. An automatic method for adjusting the parameter is proposed in this section. In a negative transfer situation, an action that the agent could originally execute cannot be executed in the target task because the policy is reused, and thus a collision with an obstacle occurs. Therefore, to avoid this situation, the transfer rate value should be lowered if the agent becomes unable to act. To lower the value of the transfer rate, the value is adjusted using a sigmoid function according to the number of times that the agent cannot act. The sigmoid function is defined as:

$$\frac{1}{1 + e^{-\sigma g}}, \tag{6}$$

where $g$ is the gain value, which is set to an arbitrary value, and $\sigma$ is the input value of the sigmoid function. The transfer method is reformulated using the sigmoid function as defined by

$$Q_c(s, a) \leftarrow Q_t(s, a) + \frac{1}{1 + e^{-\sigma g}}\, Q_s(s, a). \tag{7}$$

$\sigma$ is decremented when $s_t = s_{t+1}$ and incremented when $s_t \ne s_{t+1}$. The inability of the agent to act means that the current environmental state $s_t$ and the next environmental state $s_{t+1}$ are the same; in that case, an arbitrary constant value $d$ is subtracted from $\sigma$. Conversely, if the action is performed by reusing the policy, the arbitrary constant $d$ is added for each action. This mechanism is defined as follows:

$$\sigma = \begin{cases} \sigma - d & (s_t = s_{t+1}) \\ \sigma + d & (s_t \ne s_{t+1}) \end{cases}. \tag{8}$$

The above function is calculated for each action, that is, for each step of the agent.
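The mechanism of Eqs. (6) to (8) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the starting value of σ is an assumption because the paper does not state it (it is chosen here so that the initial rate is close to 1.0). The gain g = 9.0 and the step d = 0.1 follow the experimental setup in Section 4.1.

```python
import math


def sigmoid_transfer_rate(sigma, gain=9.0):
    """Eq. 6: transfer rate obtained from the sigmoid input sigma."""
    x = sigma * gain
    # Two algebraically equal forms are used so that exp() never
    # receives a large positive argument and cannot overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def update_sigma(sigma, state, next_state, delta=0.1):
    """Eq. 8: lower sigma if the agent could not move, raise it otherwise."""
    return sigma - delta if state == next_state else sigma + delta


def assimilated_value(q_target, q_source, sigma, gain=9.0):
    """Eq. 7: action value used for action selection in the target task."""
    return q_target + sigmoid_transfer_rate(sigma, gain) * q_source


# Hypothetical run: the agent is blocked for 20 consecutive steps
# (the state does not change), so the transfer rate keeps falling.
sigma = 1.0  # assumed initial value, not given in the paper
rates = []
for _ in range(20):
    sigma = update_sigma(sigma, state=(2, 3), next_state=(2, 3))
    rates.append(sigmoid_transfer_rate(sigma))
```

Because σ is unbounded in Eq. (8), a long run of successful moves pushes the rate toward 1 and a long run of blocked moves pushes it toward 0; the gain g controls how sharply the rate switches between the two.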
It is possible to adjust the rate of reuse of the transferred policy by comparing the environmental state at each time step in a generalized form, regardless of actual situations, such as collisions with obstacles.

4. EVALUATION WITH COMPUTER SIMULATION

4.1. Experimental Setup

This experiment aims to confirm the environmental adaptation performance of an agent using the proposed method. The shortest path problem is adopted in this study as a basic evaluation with a single agent. The learning environment used in the computer simulation is shown in Figure 1. Figures 1 (a) and (b) show the source task environment and target task environment, respectively. In this environment, if the agent reaches the goal, it obtains a reward $r = 1$ from the environment. The agent can observe its own position. The reinforcement learning parameters are set as $\alpha = 0.1$ and $\gamma = 0.99$. The parameter $T$ of the Boltzmann distribution model is set as $T = 0.01$. These parameters are common to all conditions. In each experiment, 300 episodes were conducted for the source and target tasks. The agent can move one grid cell per step in one of four directions: front, right, back, and left. One run of the agent from the start until it reaches the goal is referred to as a single episode.

Figure 1. Grid world of the shortest path problem: (a) source task, (b) target task. The circle marks the agent at its initial position in each simulation. The green square is the goal position. In each sub-figure, the red line is the shortest path between the start position and the goal position obtained with reinforcement learning.

For the experimental conditions, the learning results are compared using Taylor's method, Takano's method, and the proposed method. In Taylor's method, reinforcement learning is executed in the environment shown in Figure 1 (a), and the acquired policy is transferred to the agent in Figure 1 (b) for transfer learning. For Takano's method, Kono's method was adopted for the implementation in this experiment. Takano's method clearly overfits and is expected to require a considerable amount of time to learn in the target task. Therefore, the agent is set up to obtain a negative reward $r = -1$ in the event of a collision with a wall. In addition, it was confirmed in advance that transfer learning was possible, and the transfer rate was set to $\tau = 0.1$. In the proposed method, the parameter $d$ is set to 0.1, and the gain $g$ is set to 9.0.

4.2. Results

In this section, the experimental results under the three conditions are presented. The obtained learning curves are shown in Figure 2. A learning curve represents the performance as learning progresses: the horizontal axis is the number of learning episodes and the vertical axis is the number of steps required from the start to the goal. Each learning curve is the average of 10 trials. In Figure 2, the learning curve of reinforcement learning that does not reuse a policy in the target task is the standard for performance evaluation. The black line in Figure 2 represents the learning curve of reinforcement learning.

Figure 2. Learning curves

4.2.1. Result of Taylor's Method

Taylor's method shows a high number of steps in the early stage of learning, and it is considered that an overfitting state emerges. However, because the target task is relearned, the optimal solution, i.e., the convergence to the shortest path, appears from the learning curve of reinforcement learning.

4.2.2. Result of Takano's Method

The learning curve of Takano's method has a relatively small number of steps from the early stage of learning compared with the learning curve of reinforcement learning, and its convergence is considered to be equivalent to the learning curve of reinforcement learning. High performance in the early stages of the learning curve is called the jump start, and is one of the basic benefits of transfer reinforcement learning. It has been confirmed that relearning in the target task is not possible when the transfer rate is $\tau = 1.0$, because as learning in the target task progresses, the value of each update is small and it is time-consuming to cancel out the action value of the transferred policy.

4.2.3. Result of the Proposed Method

The learning curve of the proposed method also exhibits a jump start. Its convergence is slower than that of Takano's method and is close to the learning curve of reinforcement learning. From the figure, it can be observed that the proposed method performs better than reinforcement learning. In the initial state, the proposed method is equivalent to Takano's method with the transfer rate $\tau = 1.0$, which normally requires time for learning. However, to adaptively change the transfer rate, the performance of the proposed method is verified by Takano's method with a transfer rate $\tau = 0.1$.

4.2.4. Discussion

The transfer ratio was used to quantitatively compare the performance. The transfer ratio evaluates the improvement in learning time relative to a baseline learning curve, such as that of reinforcement learning. The transfer ratio $r$ can be defined as follows:

$$r = \frac{\sum L_t(t) - \sum L_{wt}(t)}{\sum L_{wt}(t)}, \tag{9}$$

where $L_t(t)$ is the learning curve with transfer and $t$ is the number of episodes. Therefore, $\sum L_t(t)$ is calculated as the area of the learning curve. In addition, $L_{wt}(t)$ is the learning curve without transfer, which indicates reinforcement learning in the target task, and $\sum L_{wt}(t)$ is the area of the learning curve without transfer. Table 1 presents the transfer ratios of Taylor's method, Takano's method, and the proposed method after evaluation.

Table 1. Comparison of transfer ratios under each condition
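As a small worked example of Eq. (9), the sketch below computes the transfer ratio from two learning curves given as lists of steps per episode. The function name and the numbers are invented for illustration only and are not results from the paper.

```python
def transfer_ratio(steps_with_transfer, steps_without_transfer):
    """Eq. 9: relative change in learning-curve area caused by transfer.

    Each argument is a sequence holding the number of steps per episode.
    """
    area_with = sum(steps_with_transfer)
    area_without = sum(steps_without_transfer)
    if area_without == 0:
        raise ValueError("the baseline learning curve has zero area")
    return (area_with - area_without) / area_without


# Made-up curves: transfer halves the steps needed in every episode.
baseline = [100, 60, 40]
with_transfer = [50, 30, 20]
ratio = transfer_ratio(with_transfer, baseline)
```

Under this definition a negative ratio means that transfer reduced the total number of steps relative to plain reinforcement learning, while a positive ratio means that transfer made learning slower.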
Table 1 shows that only Taylor's method increases the learning time, by a factor of approximately 1.41. Takano's method and the proposed method decrease the learning time. Figure 3 shows the transition of the output value of the sigmoid function at the beginning and end of learning. In the early stage of learning, the value corresponding to the transfer rate is adjusted through interaction with the environment. Sometimes, the transferred policy is used by the agent as it moves to the goal. At the end of learning, the reused policy may not be used even once. It is more time-consuming to use the policy at the end of learning than in the early stages of learning because the agent learns a new policy and obtains a high action value; therefore, it becomes possible to reach the goal without being affected by the reused policy. However, if the agent has obtained a sufficient policy, there is no need to adjust the ratio of policy reuse. The learning curve of the proposed method does not converge because the ratio is still adjusted, even at the end of learning.

Therefore, compared with Takano's method, the result suggests that the proposed method allows the agent to interact with the environment and automatically adjust the ratio of policy reuse, without manual adjustment or determination of the transfer rate. Moreover, compared with the previous method, it was shown that the agent can improve its environmental adaptability while maintaining the characteristic of not overwriting the reused policy from the source task.

Figure 3. Transition of the value output from the sigmoid function: (a) early stage of learning (1 episode), (b) end of learning (300 episodes)

5. CONCLUSIONS

This paper proposed a novel parameter for transfer reinforcement learning to avoid over-fitting in the relearning process of the target task. In the proposed method, the ratio of policy reuse from the source task is adjusted by a sigmoid function whose input value changes with events such as collisions with obstacles. Basic experiments were performed on the shortest path problem with transfer reinforcement learning based on Taylor's method without inter-task mapping. The result suggests that the learning agent can adapt to the environment. Moreover, compared with the previous method, it was shown that the agent can improve its environmental adaptability while maintaining the characteristic of not overwriting the reused policy from the source task. This result suggests that the method can be applied to the selection of reused policies in future applications of reinforcement learning agents.

Unlike the previous method, the proposed method does not show convergence in the target task. For future work, it is necessary to discuss how to adjust the ratio of policy reuse at the end of learning. In the proposed method, the ratio of reuse was adjusted adaptively, so the effect of the transferred policy cannot be ignored even at the end of learning.
Moreover, the proposed method was evaluated only in a simple task and environment in this study. The computer simulation environment is a Markov decision process (MDP), which is different from the real-world environment. To demonstrate the effectiveness of the proposed method, it is necessary to carry out various types of more complex experiments and evaluations. In addition, the proposed method should be evaluated not only in various environments but also in non-MDP environments, such as multi-agent systems.

ACKNOWLEDGEMENTS

This study was partially supported by JSPS KAKENHI Grant number JP18K18133, and SUZUKI Foundation. This work was funded by The Japan Atomic Energy Agency (JAEA) FY2020 Center of World Intelligence Project for Nuclear Science/Technology and Human Resource Development (Grant no. R02I015).

REFERENCES

[1] B. Ramalingam, A. K. Lakshmanan, M. Ilyas, A. V. Le, and M. R. Elara (2018). "Cascaded Machine-Learning Technique for Debris Classification in Floor-Cleaning Robot Application." Applied Sciences 8(12): 1-19.
[2] R. D'Andrea (2012). "Guest Editorial: A Revolution in the Warehouse: A Retrospective on Kiva Systems and the Grand Challenges Ahead." IEEE Transactions on Automation Science and Engineering 9(4): 638-639.
[3] R. S. Sutton and A. G. Barto (1998). "Reinforcement learning: An introduction." MIT Press.
[4] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015). "Human-level control through deep reinforcement learning." Nature 518: 529-533.
[5] M. E. Taylor and P. Stone (2009). "Transfer learning for reinforcement learning domains: A survey." Journal of Machine Learning Research 10(Jul): 1633-1685.
[6] M. E. Taylor (2009). "Transfer in Reinforcement Learning Domains." Studies in Computational Intelligence 216: Springer.
[7] A. Lazaric (2012). "Transfer in Reinforcement Learning: A Framework and a Survey." Reinforcement Learning. Adaptation, Learning, and Optimization 12: 143-173. Berlin, Heidelberg, Springer.
[8] Q. Yang, Y. Zhang, W. Dai, and S. J. Pan (2020). "Transfer learning." Cambridge University Press.
[9] T. Takano, H. Takase, H. Kawanaka, H. Kita, T. Hayashi, and S. Tsuruoka (2011). "Transfer Learning based on Forbidden Rule Set in Actor-Critic Method." International Journal of Innovative Computing, Information and Control 7(5(B)).
[10] H. Kono, A. Kamimura, K. Tomita, Y. Murata, and T. Suzuki (2014). "Transfer Learning Method Using Ontology for Heterogeneous Multi-agent Reinforcement Learning." International Journal of Advanced Computer Science and Application 5(10): 156-164.
[11] R. Ramakrishnan, C. Zhang, and J. Shah (2017). "Perturbation Training for Human-Robot Teams." Journal of Artificial Intelligence Research 59: 495-541.
[12] F. L. Da Silva and A. H. R. Costa (2019). "A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems." Journal of Artificial Intelligence Research 69: 645-703.
[13] F. L. Da Silva, G. Warnell, A. H. R. Costa, and P. Stone (2020). "Agents teaching agents: a survey on inter-agent transfer learning." Autonomous Agents and Multi-Agent Systems 34(9).
[14] C. J. C. H. Watkins and P. Dayan (1992). "Q-Learning." Machine Learning 8: 279-292.