Paper presented at MISTA2013, Gent.
In this paper, we present a method based on Learning Automata to solve Hybrid Flexible Flowline Scheduling Problems (HFFSP) with additional constraints such as sequence-dependent setup times, precedence relations between jobs, and machine eligibility. This category of production scheduling problems is noteworthy because it involves several types of constraints that occur in complex real-life production scheduling problems, such as those in the process industry and batch production. In the proposed technique, Learning Automata play a dispersion game to determine the order in which jobs are processed so that makespan is minimized and precedence constraint violations are avoided. Experiments on a set of benchmark problems indicate that this method can yield better results than the best previously known.
A Reinforcement Learning Approach for Hybrid Flexible Flowline Scheduling Problems
2. A Reinforcement Learning Approach to Solving
Hybrid Flexible Flowline Scheduling Problems
Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowé
3. Authors
• Bert Van Vreckem, HoGent Business and Information
Management
bert.vanvreckem@hogent.be
• Dmitriy Borodin, OMPartners
dborodin@ompartners.com
• Wim De Bruyn, HoGent Business and Information
Management
wim.debruyn@hogent.be
• Ann Nowé, Artificial Intelligence Lab, Vrije Universiteit Brussel
ann.nowe@vub.ac.be
HFFSP MISTA2013: 29 August 2013 3/28
14. Hybrid Flexible Flowline Scheduling Problems
Other constraints: Precedence relations between jobs
[Gantt chart over time slots 1–12: two alternative schedules of jobs J1 and J2 on machines M1 and M2, one for each precedence ordering of the jobs]
15. Hybrid Flexible Flowline Scheduling Problems
Precedence relations between jobs make the problem so much
harder that the MILP/CPLEX approach no longer works
for larger instances (Urlings, 2010)
20. A Machine Learning Approach
to Solving Hybrid Flexible Flowline Scheduling Problems
Two stages:
• Job permutations → Learning Automata
• Machine assignment → Earliest Preparation Next Stage
(EPNS) (Urlings, 2010)
21. Reinforcement learning
At every discrete time step t:
• Agent perceives environment state s(t)
• Agent chooses action a(t) ∈ A = {a1, . . . , an} according to
some policy
• Environment places agent in new state s(t + 1) and gives
reinforcement r(t)
• Goal: learn a policy that maximizes long-term cumulative
reward ∑t r(t)
[Diagram: agent and environment interaction loop, with state s and reward r flowing to the agent, and action a flowing to the environment]
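In code, one sweep of this interaction loop looks roughly as follows; the `Environment` class and the uniformly random policy are illustrative placeholders, not part of the paper:

```python
import random

class Environment:
    """Toy stateless environment (placeholder): reward 1 for action 0, else 0."""
    def step(self, action):
        reward = 1.0 if action == 0 else 0.0
        next_state = 0  # single-state environment, so s(t+1) is always 0
        return next_state, reward

actions = [0, 1, 2]
env = Environment()
state, total_reward = 0, 0.0
for t in range(100):
    action = random.choice(actions)   # a(t) chosen by some policy
    state, reward = env.step(action)  # environment returns s(t+1) and r(t)
    total_reward += reward            # goal: maximize the cumulative reward
```

A learning agent would replace `random.choice` with a policy that adapts to the observed rewards, which is exactly what the Learning Automata on the next slide do.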
22. Learning Automata (LA)
Reinforcement Learning agents that choose actions according to a
probability distribution p(t) = (p1(t), . . . , pn(t)), with
pi(t) = Prob[a(t) = ai] and s.t. ∑i=1..n pi(t) = 1

pi(0) = 1/n   (1)

pi(t + 1) = pi(t) + αrew r(t)(1 − pi(t)) − αpen(1 − r(t))pi(t)   (2)
if ai is the action taken at instant t

pj(t + 1) = pj(t) − αrew r(t)pj(t) + αpen(1 − r(t))(1/(n − 1) − pj(t))   (3)
if aj ≠ ai
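Update rules (2) and (3) translate directly into code; the function name and default learning rates below are illustrative:

```python
def la_update(p, i, r, a_rew=0.1, a_pen=0.5):
    """Linear reward-penalty update for one Learning Automaton.

    p : action probabilities p(t); i : index of the action taken;
    r : reward r(t) in [0, 1]. Returns p(t+1) per equations (2) and (3).
    """
    n = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if j == i:  # chosen action: equation (2)
            new_p.append(pj + a_rew * r * (1 - pj) - a_pen * (1 - r) * pj)
        else:       # all other actions: equation (3)
            new_p.append(pj - a_rew * r * pj
                         + a_pen * (1 - r) * (1 / (n - 1) - pj))
    return new_p
```

Note that the reward and penalty terms cancel across the chosen and non-chosen actions, so the probabilities still sum to 1 after every update.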
36. Probabilistic Basic Simple Strategy (PBSS)
(Wauters, 2012)
• An LA is assigned to every position of a permutation
• LAs play a dispersion game to choose unique actions, resulting
in a permutation
• Quality of solution is evaluated
• Update probabilities according to the LA update rule, Linear
Reward-Inaction (αpen = 0):
• Better result than best one so far: r(t) = 1
• If not, r(t) = 0
• Repeat until convergence
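One PBSS iteration might look as follows in Python. The collision handling in the dispersion game is simplified here (each position's LA samples among the jobs still free, in order); this is an assumption for illustration, not the exact scheme of (Wauters, 2012):

```python
import random

def pbss_iteration(prob_rows, evaluate, best_cost, a_rew=0.1):
    """One PBSS iteration (simplified sketch).

    prob_rows[k][j] = probability that the LA at position k picks job j.
    evaluate(perm) returns the cost (e.g. makespan) of a permutation.
    Applies the Linear Reward-Inaction update (a_pen = 0): only an
    improvement on the best cost so far, r(t) = 1, changes probabilities.
    """
    n = len(prob_rows)
    free = list(range(n))
    perm = []
    for probs in prob_rows:                  # dispersion game: unique actions
        weights = [probs[j] for j in free]
        job = (random.choices(free, weights=weights)[0]
               if sum(weights) > 0 else free[0])
        free.remove(job)
        perm.append(job)
    cost = evaluate(perm)
    if cost < best_cost:                     # better than best so far: r(t) = 1
        for k, probs in enumerate(prob_rows):  # reward-inaction update
            for j in range(n):
                if j == perm[k]:
                    probs[j] += a_rew * (1 - probs[j])
                else:
                    probs[j] -= a_rew * probs[j]
        best_cost = cost
    return perm, cost, best_cost
```

Repeating this until the probability distributions converge (or a time budget runs out) yields the learned permutation.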
40. Probabilistic Basic Simple Strategy (PBSS)
• PBSS: great results in several optimization problems that
involve learning permutations
• but doesn’t work well when precedence constraints are
involved
• PBSS only learns from positive experience (i.e. improving on
previous solutions)
• Doesn’t learn to avoid invalid permutations
46. Extending PBSS for precedence constraints
Updating probabilities:
• If the job permutation is invalid, perform an update with
r(t) = 0 and αpen > 0 for all agents that are involved in the
violation of precedence constraints.
• If the job permutation is valid, perform an LR−I update for all
agents, depending on the resulting makespan ms and the best
makespan so far, msbest:
• improved: r(t) = 1;
• equally good: r(t) = 1/2;
• worse: r(t) = msbest/(2 ms);
• no valid schedule found: r(t) = 0;
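The reward scheme above can be captured in a small helper; the function and its arguments are illustrative, not taken from the authors' implementation:

```python
def reward(ms, ms_best, valid, feasible=True):
    """Reward signal for the extended PBSS update (illustrative helper).

    valid    : the permutation respects all precedence constraints.
    feasible : a valid schedule could be built from the permutation.
    ms, ms_best : resulting makespan and best makespan found so far.
    """
    if not valid or not feasible:
        return 0.0                 # triggers the penalty update (a_pen > 0)
    if ms < ms_best:
        return 1.0                 # improved on the best makespan
    if ms == ms_best:
        return 0.5                 # equally good
    return ms_best / (2 * ms)      # worse: reward shrinks with the gap
```

Because a worse makespan still earns a reward strictly between 0 and 1/2, the update gently reinforces near-optimal permutations instead of treating them like constraint violations.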
48. Experiments
• HFFSP benchmark problems from (Ruiz et al., 2008)²
• problem sets with 5, 7, 9, 11, 13, 15 jobs, 96 instances per set
• + additional constraints that make the problems harder (precedence
relations!)
• αrew = 0.1; αpen = 0.5 (no tuning)
• Run until convergence, or for at most 300 seconds
² Available at http://soa.iti.es/problem-instances
53. Results and Discussion
Contributions:
• Extension of PBSS for learning permutations with precedence
constraints
• A simple model + RL approach can yield good-quality results
for challenging HFFSP instances
Discussion & future work:
• Precedence relations do make the problem harder
• Parameter tuning
• Convergence
• Larger instances (50, 100 jobs)
• Explore possibilities for improvement in machine assignment