Paper presented at MISTA2013, Gent.
In this paper, we present a method based on Learning Automata to solve Hybrid Flexible Flowline Scheduling Problems (HFFSP) with additional constraints such as sequence-dependent setup times, precedence relations between jobs, and machine eligibility. This category of production scheduling problems is noteworthy because it involves several types of constraints that occur in complex real-life production scheduling problems, such as those in the process industry and batch production. In the proposed technique, Learning Automata play a dispersion game to determine the order in which jobs are processed so that makespan is minimized and precedence constraint violations are avoided. Experiments on a set of benchmark problems indicate that this method can yield better results than the best previously known.
A Reinforcement Learning Approach for Hybrid Flexible Flowline Scheduling Problems
2. A Reinforcement Learning Approach to Solving
Hybrid Flexible Flowline Scheduling Problems
Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowé
3. Authors
• Bert Van Vreckem, HoGent Business and Information
Management
bert.vanvreckem@hogent.be
• Dmitriy Borodin, OMPartners
dborodin@ompartners.com
• Wim De Bruyn, HoGent Business and Information
Management
wim.debruyn@hogent.be
• Ann Nowé, Artificial Intelligence Lab, Vrije Universiteit Brussel
ann.nowe@vub.ac.be
HFFSP MISTA2013: 29 August 2013 3/28
14. Hybrid Flexible Flowline Scheduling Problems
Other constraints: Precedence relations between jobs
[Gantt chart over time slots 1–12: two alternative schedules of jobs J1 and J2 on machines M1 and M2, one for each precedence ordering of the jobs]
15. Hybrid Flexible Flowline Scheduling Problems
Precedence relations between jobs make the problem so much
harder that the MILP/CPLEX approach no longer works
for larger instances (Urlings, 2010)
20. A Machine Learning Approach
to Solving Hybrid Flexible Flowline Scheduling Problems
Two stages:
• Job permutations → Learning Automata
• Machine assignment → Earliest Preparation Next Stage
(EPNS) (Urlings, 2010)
21. Reinforcement learning
At every discrete time step t:
• Agent perceives environment state s(t)
• Agent chooses action a(t) ∈ A = {a1, . . . , an} according to
some policy
• Environment places agent in new state s(t + 1) and gives
reinforcement r(t)
• Goal: learn a policy that maximizes long-term cumulative
reward ∑t r(t)
[Diagram: agent and environment interaction loop, with state s and reward r flowing to the agent, and action a flowing to the environment]
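In code, one sweep of this interaction loop looks roughly as follows; the `Environment` class and the uniformly random policy are illustrative placeholders, not part of the paper:

```python
import random

class Environment:
    """Toy stateless environment (placeholder): reward 1 for action 0, else 0."""
    def step(self, action):
        reward = 1.0 if action == 0 else 0.0
        next_state = 0  # single-state environment, so s(t+1) is always 0
        return next_state, reward

actions = [0, 1, 2]
env = Environment()
state, total_reward = 0, 0.0
for t in range(100):
    action = random.choice(actions)   # a(t) chosen by some policy
    state, reward = env.step(action)  # environment returns s(t+1) and r(t)
    total_reward += reward            # goal: maximize the cumulative reward
```

A learning agent would replace `random.choice` with a policy that adapts to the observed rewards, which is exactly what the Learning Automata on the next slide do.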
22. Learning Automata (LA)
Reinforcement Learning agents that choose actions according to a
probability distribution p(t) = (p1(t), . . . , pn(t)), with
pi(t) = Prob[a(t) = ai] and s.t. ∑i=1..n pi(t) = 1

pi(0) = 1/n   (1)

pi(t + 1) = pi(t) + αrew r(t)(1 − pi(t)) − αpen(1 − r(t))pi(t)   (2)
if ai is the action taken at instant t

pj(t + 1) = pj(t) − αrew r(t)pj(t) + αpen(1 − r(t))(1/(n − 1) − pj(t))   (3)
if aj ≠ ai
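Update rules (2) and (3) translate directly into code; the function name and default learning rates below are illustrative:

```python
def la_update(p, i, r, a_rew=0.1, a_pen=0.5):
    """Linear reward-penalty update for one Learning Automaton.

    p : action probabilities p(t); i : index of the action taken;
    r : reward r(t) in [0, 1]. Returns p(t+1) per equations (2) and (3).
    """
    n = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if j == i:  # chosen action: equation (2)
            new_p.append(pj + a_rew * r * (1 - pj) - a_pen * (1 - r) * pj)
        else:       # all other actions: equation (3)
            new_p.append(pj - a_rew * r * pj
                         + a_pen * (1 - r) * (1 / (n - 1) - pj))
    return new_p
```

Note that the reward and penalty terms cancel across the chosen and non-chosen actions, so the probabilities still sum to 1 after every update.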
36. Probabilistic Basic Simple Strategy (PBSS)
(Wauters, 2012)
• An LA is assigned to every position of a permutation
• LAs play a dispersion game to choose unique actions, resulting
in a permutation
• Quality of solution is evaluated
• Update probabilities according to the LA update rule, Linear
Reward-Inaction (αpen = 0):
• Better result than best one so far: r(t) = 1
• If not, r(t) = 0
• Repeat until convergence
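One PBSS iteration might look as follows in Python. The collision handling in the dispersion game is simplified here (each position's LA samples among the jobs still free, in order); this is an assumption for illustration, not the exact scheme of (Wauters, 2012):

```python
import random

def pbss_iteration(prob_rows, evaluate, best_cost, a_rew=0.1):
    """One PBSS iteration (simplified sketch).

    prob_rows[k][j] = probability that the LA at position k picks job j.
    evaluate(perm) returns the cost (e.g. makespan) of a permutation.
    Applies the Linear Reward-Inaction update (a_pen = 0): only an
    improvement on the best cost so far, r(t) = 1, changes probabilities.
    """
    n = len(prob_rows)
    free = list(range(n))
    perm = []
    for probs in prob_rows:                  # dispersion game: unique actions
        weights = [probs[j] for j in free]
        job = (random.choices(free, weights=weights)[0]
               if sum(weights) > 0 else free[0])
        free.remove(job)
        perm.append(job)
    cost = evaluate(perm)
    if cost < best_cost:                     # better than best so far: r(t) = 1
        for k, probs in enumerate(prob_rows):  # reward-inaction update
            for j in range(n):
                if j == perm[k]:
                    probs[j] += a_rew * (1 - probs[j])
                else:
                    probs[j] -= a_rew * probs[j]
        best_cost = cost
    return perm, cost, best_cost
```

Repeating this until the probability distributions converge (or a time budget runs out) yields the learned permutation.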
40. Probabilistic Basic Simple Strategy (PBSS)
• PBSS: great results in several optimization problems that
involve learning permutations
• but doesn’t work well when precedence constraints are
involved
• PBSS only learns from positive experience (i.e. improving on
previous solutions)
• Doesn’t learn to avoid invalid permutations
46. Extending PBSS for precedence constraints
Updating probabilities:
• If the job permutation is invalid, perform an update with
r(t) = 0 and αpen > 0 for all agents that are involved in the
violation of precedence constraints.
• If the job permutation is valid, perform an LR−I update for all
agents, depending on the resulting makespan ms and the best
makespan so far, msbest:
• improved: r(t) = 1;
• equally good: r(t) = 1/2;
• worse: r(t) = msbest/(2 ms);
• no valid schedule found: r(t) = 0;
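The reward scheme above can be captured in a small helper; the function and its arguments are illustrative, not taken from the authors' implementation:

```python
def reward(ms, ms_best, valid, feasible=True):
    """Reward signal for the extended PBSS update (illustrative helper).

    valid    : the permutation respects all precedence constraints.
    feasible : a valid schedule could be built from the permutation.
    ms, ms_best : resulting makespan and best makespan found so far.
    """
    if not valid or not feasible:
        return 0.0                 # triggers the penalty update (a_pen > 0)
    if ms < ms_best:
        return 1.0                 # improved on the best makespan
    if ms == ms_best:
        return 0.5                 # equally good
    return ms_best / (2 * ms)      # worse: reward shrinks with the gap
```

Because a worse makespan still earns a reward strictly between 0 and 1/2, the update gently reinforces near-optimal permutations instead of treating them like constraint violations.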
48. Experiments
• HFFSP benchmark problems from (Ruiz et al., 2008)²
• problem sets with 5, 7, 9, 11, 13, 15 jobs, 96 instances per set
• + additional constraints that make the problems harder (precedence
relations!)
• αrew = 0.1; αpen = 0.5 (no tuning)
• Run until convergence, or for at most 300 seconds
² Available at http://soa.iti.es/problem-instances
53. Results and Discussion
Contributions:
• Extension of PBSS for learning permutations with precedence
constraints
• A simple model + RL approach can yield good-quality results
for challenging HFFSP instances
Discussion & future work:
• Precedence relations do make the problem harder
• Parameter tuning
• Convergence
• Larger instances (50, 100 jobs)
• Explore possibilities for improvement in machine assignment