04 Heuristic Search
1. Heuristic Search
(Where we try to be smarter in how we choose among alternatives)
Dr. Ali Abdallah
(dr_ali_abdallah@hotmail.com)
2. Search Algorithm (Modified)
• INSERT(initial-node, FRINGE)
• Repeat:
  • If empty(FRINGE) then return failure
  • n ← REMOVE(FRINGE)
  • s ← STATE(n)
  • If GOAL?(s) then return path or goal state
  • For every state s' in SUCCESSORS(s)
    - Create a node n' as a successor of n
    - INSERT(n', FRINGE)
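As a sketch, the algorithm above can be transcribed into Python. The node representation (a `(state, parent)` pair) and the FIFO fringe (which makes this breadth-first) are illustrative choices, not prescribed by the slide:

```python
from collections import deque

def generic_search(initial_state, goal_test, successors):
    """Sketch of the modified search algorithm above.
    A node is a (state, parent-node) pair so the path can be reconstructed.
    A FIFO fringe makes this breadth-first; other fringes give other strategies."""
    fringe = deque()
    fringe.append((initial_state, None))          # INSERT(initial-node, FRINGE)
    while fringe:                                 # empty(FRINGE) -> failure
        node = fringe.popleft()                   # n <- REMOVE(FRINGE)
        state = node[0]                           # s <- STATE(n)
        if goal_test(state):                      # GOAL?(s) -> return path
            path = []
            while node is not None:
                path.append(node[0])
                node = node[1]
            return list(reversed(path))
        for s2 in successors(state):              # for every s' in SUCCESSORS(s)
            fringe.append((s2, node))             # INSERT(n', FRINGE)
    return None                                   # failure
```

Swapping the fringe data structure is exactly what turns this skeleton into the best-first and A* variants discussed next.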
3. Best-First Search
• It exploits the state description to estimate how promising each search node is
• An evaluation function f maps each search node N to a positive real number f(N)
• Traditionally, the smaller f(N), the more promising N
• Best-first search sorts the fringe in increasing f [random order is assumed among nodes with equal values of f]
4. Best-First Search
"Best" refers only to the value of f, not to the quality of the actual path.
Best-first search does not generate optimal paths in general
5. How to construct an evaluation function?
• There are no limitations on f: any function of your choice is acceptable. But will it help the search algorithm?
• The classical approach is to construct f(N) as an estimator of the cost of a solution path through N
6. Heuristic Function
• The heuristic function h(N) estimates the distance of STATE(N) to a goal state.
  Its value is independent of the current search tree; it depends only on STATE(N) and the goal test
• Example (8-puzzle, blank shown as _):

      5 _ 8        1 2 3
      4 2 1        4 5 6
      7 3 6        7 8 _
      STATE(N)     Goal state

• h1(N) = number of misplaced tiles = 6
7. Other Examples

      5 _ 8        1 2 3
      4 2 1        4 5 6
      7 3 6        7 8 _
      STATE(N)     Goal state

• h1(N) = number of misplaced tiles = 6
• h2(N) = sum of the (Manhattan) distances of every tile to its goal position
        = 2 + 3 + 0 + 1 + 3 + 0 + 3 + 1 = 13
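Both heuristics are easy to compute. A minimal Python sketch (the function names are mine; a state is a tuple read row by row, with 0 standing for the blank):

```python
def misplaced_tiles(state, goal):
    """h1: number of tiles (ignoring the blank, 0) not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal, width=3):
    """h2: sum of the Manhattan distances of every tile to its goal position."""
    goal_pos = {tile: divmod(i, width) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:                      # the blank does not count
            r, c = divmod(i, width)
            gr, gc = goal_pos[tile]
            total += abs(r - gr) + abs(c - gc)
    return total

# The example from the slide (blank in the middle of the top row):
state = (5, 0, 8, 4, 2, 1, 7, 3, 6)
goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
```

On this state, `misplaced_tiles` returns 6 and `manhattan` returns 13, matching the slide's values.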
8. Other Examples (Robot Navigation)

[figure: robot at position (xN, yN), goal at (xg, yg)]

h1(N) = √((xN − xg)² + (yN − yg)²)   (Euclidean distance)
h2(N) = |xN − xg| + |yN − yg|        (Manhattan distance)
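In code, with positions as (x, y) pairs (a small sketch, names mine):

```python
import math

def euclidean(n, g):
    """h1: straight-line distance from node position n to goal position g."""
    return math.hypot(n[0] - g[0], n[1] - g[1])

def manhattan_distance(n, g):
    """h2: |xN - xg| + |yN - yg|, the 4-connected grid distance."""
    return abs(n[0] - g[0]) + abs(n[1] - g[1])
```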
9. Classical Evaluation Functions
• h(N): heuristic function [independent of the search tree]
• g(N): cost of the best path found so far between the initial node and N [dependent on the search tree]
• f(N) = h(N)  →  greedy best-first search
• f(N) = g(N) + h(N)
14. Can we Prove Anything?
• If the state space is finite and we discard nodes that revisit states, the search is complete, but in general it is not optimal
• If the state space is finite and we do not discard nodes that revisit states, in general the search is not complete
• If the state space is infinite, in general the search is not complete
15. Admissible Heuristic
• Let h*(N) be the cost of the optimal path from N to a goal node
• The heuristic function h(N) is admissible if:
      0 ≤ h(N) ≤ h*(N)
• An admissible heuristic function is always optimistic!
• Note: if G is a goal node, then h(G) = 0
16. 8-Puzzle Heuristics

      5 _ 8        1 2 3
      4 2 1        4 5 6
      7 3 6        7 8 _
      STATE(N)     Goal state

• h1(N) = number of misplaced tiles = 6 is admissible
• h2(N) = sum of the (Manhattan) distances of every tile to its goal position
        = 2 + 3 + 0 + 1 + 3 + 0 + 3 + 1 = 13 is admissible
17. Robot Navigation Heuristics
Cost of one horizontal/vertical step = 1
Cost of one diagonal step = √2
h1(N) = √((xN − xg)² + (yN − yg)²) is admissible
h2(N) = |xN − xg| + |yN − yg| is admissible if moving along diagonals is not allowed, and not admissible otherwise
18. A* Search
(most popular algorithm in AI)
• f(N) = g(N) + h(N), where:
  • g(N) = cost of best path found so far to N
  • h(N) = admissible heuristic function
• for all arcs: 0 < ε ≤ c(N, N')
• the "modified" search algorithm is used
⇒ Best-first search is then called A* search
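A minimal A* sketch in Python: the fringe is a priority queue ordered by f = g + h. The `successors(s)` interface yielding `(state, step_cost)` pairs is an assumption of this sketch, and, as in the slides' Result #1 setting, nodes revisiting states are not discarded:

```python
import heapq
from itertools import count

def a_star(start, goal_test, successors, h):
    """Minimal A* sketch: best-first search with f(N) = g(N) + h(N).
    `successors(s)` yields (next_state, step_cost) pairs; h must be admissible.
    Nodes revisiting states are NOT discarded."""
    tie = count()                       # breaks f-ties without comparing nodes
    node = (start, None)                # a node is a (state, parent-node) pair
    fringe = [(h(start), next(tie), 0, node)]
    while fringe:
        _f, _, g, node = heapq.heappop(fringe)     # node with smallest f
        state = node[0]
        if goal_test(state):
            path = []
            while node is not None:
                path.append(node[0])
                node = node[1]
            return list(reversed(path)), g         # path and its cost
        for s2, cost in successors(state):
            g2 = g + cost
            heapq.heappush(fringe, (g2 + h(s2), next(tie), g2, (s2, node)))
    return None                          # failure: fringe exhausted
```

The `count()` tie-breaker is a standard trick so the heap never has to compare two nodes whose f values are equal.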
19. Result #1
A* is complete and optimal
[This result holds if nodes revisiting
states are not discarded]
20. Proof (1/2)
• If a solution exists, A* terminates and returns a solution
  - For each node N on the fringe, f(N) ≥ d(N)×ε, where d(N) is the depth of N in the tree
  - As long as A* hasn't terminated, a node K on the fringe lies on a solution path
  - Since each node expansion increases the length of one path, K will eventually be selected for expansion
21. Proof (2/2)
• Whenever A* chooses to expand a goal node, the path to this node is optimal
  - C* = h*(initial-node)
  - Let G' be a non-optimal goal node in the fringe:
    f(G') = g(G') + h(G') = g(G') > C*
  - A node K in the fringe lies on an optimal path:
    f(K) = g(K) + h(K) ≤ C*
  - So, G' is not selected for expansion
22. Time Limit Issue
• When a problem has no solution, A* runs forever if the state space is infinite or states can be revisited an arbitrary number of times (the search tree can grow arbitrarily large). In other cases, it may take a huge amount of time to terminate
• So, in practice, A* must be given a time limit. If it has not found a solution within this limit, it stops. Then there is no way to know whether the problem has no solution or A* needed more time to find it
• In the past, when AI systems were "small" and solved a single search problem at a time, this was not too much of a concern. As AI systems become larger, they have to solve a multitude of search problems concurrently. Then a question arises: what should the time limit be for each of them?
23. Time Limit Issue
Hence the usefulness of a simple test, like in the (n²−1)-puzzle, that determines whether the goal is reachable
Unfortunately, such a test rarely exists
24. 8-Puzzle
f(N) = g(N) + h(N) with h(N) = number of misplaced tiles
[figure: A* search tree over the 8-puzzle; each node is labeled g+h, e.g. 0+4 at the root, then 1+3, 1+5, 2+3, 2+4, 3+2, 3+3, 3+4, 4+1, 5+2 and 5+0 at the goal]
29. How to create an admissible h?
• An admissible heuristic can usually be seen as the cost of an optimal solution to a relaxed problem (one obtained by removing constraints)
• In robot navigation on a grid:
  • The Manhattan distance corresponds to removing the obstacles
  • The Euclidean distance corresponds to removing both the obstacles and the constraint that the robot moves on a grid
• More on this topic later
30. What to do with revisited states?
[figure: a small graph with edge costs 1, 2, 1, 90 and 100, and heuristic values h = 100, 1, 2 and 0 at the nodes]
The heuristic h is clearly admissible
31. What to do with revisited states?
[figure: the same graph with f values: f = 1+100 and 2+1 after the first expansions, then 4+90 on the second, cheaper path reaching the already-visited state]
If we discard this new node, then the search algorithm expands the goal node next and returns a non-optimal solution (cost 104)
32. What to do with revisited states?
[figure: the same graph; keeping the revisiting node gives f = 2+90 on the better path, so the search returns the optimal solution of cost 102 instead of 104]
Instead, if we do not discard nodes revisiting states, the search terminates with an optimal solution
33. But ...
If we do not discard nodes revisiting states, the size of the search tree can be exponential in the number of visited states
[figure: a chain of states with two unit-cost edges between each pair; the search tree doubles at every level, producing node labels 1, 1+1, 2+1, ... and 2^k leaves]
34.
• It is not harmful to discard a node revisiting a state if the new path to this state has a higher cost than the previous one
• A* remains optimal
• Fortunately, for a large family of admissible heuristics (consistent heuristics) there is a much easier way of dealing with revisited states
35. Consistent Heuristic
A heuristic h is consistent if
1) for each node N and each child N' of N:
      h(N) ≤ c(N,N') + h(N')
2) for each goal node G:
      h(G) = 0
(triangle inequality)
[figure: triangle with nodes N, N' and the goal; sides labeled c(N,N'), h(N') and h(N)]
The heuristic is also said to be monotone
36. Consistency Violation
If h says that N is 100 units from the goal (h(N) = 100), then moving from N along an arc costing 10 units (c(N,N') = 10) should not lead to a node N' that h estimates to be only 10 units away from the goal (h(N') = 10): this violates the triangle inequality, since 100 > 10 + 10
37. Admissibility and Consistency
• A consistent heuristic is also admissible
• An admissible heuristic may not be consistent, but many admissible heuristics are consistent
38. 8-Puzzle

      5 _ 8        1 2 3
      4 2 1        4 5 6
      7 3 6        7 8 _
      STATE(N)     Goal

• h1(N) = number of misplaced tiles
• h2(N) = sum of the (Manhattan) distances of every tile to its goal position
are both consistent
39. Robot Navigation
Cost of one horizontal/vertical step = 1
Cost of one diagonal step = √2
h1(N) = √((xN − xg)² + (yN − yg)²) is consistent
h2(N) = |xN − xg| + |yN − yg| is consistent if moving along diagonals is not allowed, and not consistent otherwise
40. Result #2
If h is consistent, then whenever A* expands a node, it has already found an optimal path to this node's state
41. Proof
• Consider a node N and its child N'.
  Since h is consistent: h(N) ≤ c(N,N') + h(N')
  f(N) = g(N) + h(N) ≤ g(N) + c(N,N') + h(N') = f(N')
  So, f is non-decreasing along any path
• If K is selected for expansion, then any other node K' in the fringe satisfies f(K') ≥ f(K).
  So, if a node K' lies on another path to the state of K, the cost of this other path is no smaller than that of the path to K
42. Result #2
If h is consistent, then whenever A* expands a node, it has already found an optimal path to this node's state
[figure: two nodes N1 and N2 reach the same state S; the path through N1 is the optimal path to S, so N2 can be discarded]
43. Revisited States with a Consistent Heuristic
• When a node is expanded, store its state into CLOSED
• When a new node N is generated:
  • If STATE(N) is in CLOSED, discard N
  • If there exists a node N' in the fringe such that STATE(N') = STATE(N), discard the node (N or N') with the largest f
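This bookkeeping can be grafted onto an A* implementation: a CLOSED set of expanded states, plus a map keeping only the cheaper of two fringe nodes that share a state (which is the one with the smaller f, since both share the same h). A sketch, assuming a `successors(s)` interface yielding `(state, step_cost)` pairs and a consistent h:

```python
import heapq
from itertools import count

def a_star_closed(start, goal_test, successors, h):
    """A* for a CONSISTENT h: once a state is expanded it enters CLOSED,
    and any node revisiting it is discarded (justified by Result #2).
    `best_g` keeps only the cheapest fringe node per state."""
    tie = count()
    fringe = [(h(start), next(tie), 0, (start, None))]
    closed = set()
    best_g = {start: 0}
    while fringe:
        _f, _, g, node = heapq.heappop(fringe)
        state = node[0]
        if state in closed:
            continue                     # STATE(N) in CLOSED: discard N
        closed.add(state)
        if goal_test(state):
            path = []
            while node is not None:
                path.append(node[0])
                node = node[1]
            return list(reversed(path)), g
        for s2, cost in successors(state):
            g2 = g + cost
            if s2 in closed or best_g.get(s2, float("inf")) <= g2:
                continue                 # keep only the smaller-f duplicate
            best_g[s2] = g2
            heapq.heappush(fringe, (g2 + h(s2), next(tie), g2, (s2, node)))
    return None
```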
44. Is A* with some consistent heuristic all that we need?
No!
• There are very dumb consistent heuristics
45. h ≡ 0
• It is consistent (hence, admissible)!
• A* with h ≡ 0 is uniform-cost search
• Breadth-first and uniform-cost are particular cases of A*
46. Heuristic Accuracy
Let h1 and h2 be two consistent heuristics such that for all nodes N:
      h1(N) ≤ h2(N)
h2 is said to be more accurate (or more informed) than h1

      5 _ 8        1 2 3
      4 2 1        4 5 6
      7 3 6        7 8 _
      STATE(N)     Goal state

• h1(N) = number of misplaced tiles
• h2(N) = sum of the distances of every tile to its goal position
• h2 is more accurate than h1
47. Result #3
• Let h2 be more accurate than h1
• Let A1* be A* using h1, and A2* be A* using h2
• Whenever a solution exists, all the nodes expanded by A2* are also expanded by A1*
48. Proof
• C* = h*(initial-node)
• Every node N such that f(N) < C* is eventually expanded
• Equivalently, every node N such that h(N) < C* − g(N) is eventually expanded
• Since h1(N) ≤ h2(N), every node expanded by A2* is also expanded by A1*
• ...except for nodes N such that f(N) = C*
49. Effective Branching Factor
• It is used as a measure of the effectiveness of a heuristic
• Let n be the total number of nodes expanded by A* for a particular problem and d the depth of the solution
• The effective branching factor b* is defined by
      n = 1 + b* + (b*)² + ... + (b*)^d
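The defining equation has no closed form for b*, but the right-hand side is increasing in b*, so it can be solved numerically, e.g. by bisection (a sketch; the function name is mine):

```python
def effective_branching_factor(n, d, tol=1e-9):
    """Solve n = 1 + b + b^2 + ... + b^d for b by bisection."""
    def total(b):
        return sum(b ** i for i in range(d + 1))   # 1 + b + ... + b^d
    lo, hi = tol, float(n)        # total(lo) ~ 1 <= n <= total(hi) for d >= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, a uniform tree with n = 7 nodes at depth d = 2 gives exactly b* = 2, and the 227-node, depth-12 entry in the table below gives b* ≈ 1.42.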
50. Experimental Results
• 8-puzzle with:
  • h1 = number of misplaced tiles
  • h2 = sum of distances of tiles to their goal positions
• Random generation of many problem instances
• Average effective branching factors (number of expanded nodes in parentheses):

      d    IDS                A1*            A2*
      2    2.45               1.79           1.79
      6    2.73               1.34           1.30
      12   2.78 (3,644,035)   1.42 (227)     1.24 (73)
      16   --                 1.45           1.25
      20   --                 1.47           1.27
      24   --                 1.48 (39,135)  1.26 (1,641)
51. How to create good heuristics?
• By solving relaxed problems at each node
• In the 8-puzzle, the sum of the distances of each tile to its goal position (h2) corresponds to solving 8 simple problems, one per tile (e.g. moving tile 5 alone to its goal position):

      5 _ 8        1 2 3
      4 2 1        4 5 6
      7 3 6        7 8 _

• It ignores negative interactions among tiles
52. Can we do better?
• For example, we could consider two more complex relaxed problems:
  [figure: the example state split into two relaxed sub-problems, one keeping only tiles 1-4 and one keeping only tiles 5-8]
• ⇒ h = d1234 + d5678 [disjoint pattern heuristic]
• These distances could have been precomputed in a database
53. Can we do better?
Several order-of-magnitude speedups
have been obtained this way for the
15- and 24-puzzle (see R&N)
54. On Completeness and Optimality
• A* with a consistent heuristic has nice properties: completeness, optimality, no need to revisit states
• Theoretical completeness does not mean "practical" completeness if you must wait too long to get a solution (remember the time limit issue)
• So, if one can't design an accurate consistent heuristic, it may be better to settle for a non-admissible heuristic that "works well in practice", even though completeness and optimality are no longer guaranteed
55. Iterative Deepening A* (IDA*)
• Idea: reduce the memory requirement of A* by applying a cutoff on values of f
• Consistent heuristic h
• Algorithm IDA*:
  1. Initialize the cutoff to f(initial-node)
  2. Repeat:
     • Perform depth-first search, expanding all nodes N such that f(N) ≤ cutoff
     • Reset the cutoff to the smallest value f of the non-expanded (leaf) nodes
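A recursive sketch of IDA* in Python (the cycle check along the current path is my addition, not part of the slide's algorithm; the `successors` interface is assumed as before):

```python
def ida_star(start, goal_test, successors, h):
    """IDA* sketch: depth-first search bounded by a cutoff on f = g + h;
    the cutoff is then reset to the smallest f that exceeded it."""
    def dfs(path, state, g, cutoff):
        f = g + h(state)
        if f > cutoff:
            return None, f               # report the overflowing f value
        if goal_test(state):
            return path + [state], f
        smallest = float("inf")
        for s2, cost in successors(state):
            if s2 in path:               # avoid cycles on the current path
                continue
            found, f2 = dfs(path + [state], s2, g + cost, cutoff)
            if found:
                return found, f2
            smallest = min(smallest, f2) # smallest f among pruned nodes
        return None, smallest

    cutoff = h(start)                    # f(initial-node), since g = 0
    while cutoff < float("inf"):
        found, cutoff = dfs([], start, 0, cutoff)
        if found:
            return found
    return None
```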
62.-67. 8-Puzzle (one IDA* iteration, step by step)
f(N) = g(N) + h(N) with h(N) = number of misplaced tiles
[figure sequence: depth-first search with Cutoff = 5; the root has f = 4; successors with f = 4 or 5 are expanded in turn, while nodes with f = 6 or 7 exceed the cutoff and are left unexpanded]
68. Advantages/Drawbacks of IDA*
• Advantages:
  • Still complete and optimal
  • Requires less memory than A*
  • Avoids the overhead of sorting the fringe
• Drawbacks:
  • Can't avoid revisiting states not on the current path
  • Available memory is poorly used
69. SMA* (Simplified Memory-bounded A*)
• Works like A* until memory is full
• Then SMA* drops the node in the fringe with the largest f value and "backs up" this value to its parent
• When all children of a node N have been dropped, the smallest backed-up value replaces f(N)
• In this way, the root of an erased subtree remembers the best path in that subtree
• SMA* will regenerate this subtree only if all other nodes in the fringe have greater f values
• SMA* can't completely avoid revisiting states, but it does a better job of this than IDA*
70. Local Search
• Light-memory search method
• No search tree; only the current state is represented!
• Only applicable to problems where the path is irrelevant (e.g., 8-queens), unless the path is encoded in the state
• Many similarities with optimization techniques
71. Steepest Descent
• S ← initial state
• Repeat:
  • S' ← arg min over S' in SUCCESSORS(S) of h(S')
  • if GOAL?(S') then return S'
  • if h(S') < h(S) then S ← S' else return failure
Similar to:
- hill climbing with −h
- gradient descent over a continuous space
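A direct transcription of the loop above (a sketch; the `successors`, `h` and `goal_test` interfaces are assumptions):

```python
def steepest_descent(initial, successors, h, goal_test):
    """Steepest descent: repeatedly move to the best (lowest-h) successor;
    return failure when no successor improves on the current state."""
    s = initial
    while True:
        best = min(successors(s), key=h)   # arg min over SUCCESSORS(S) of h
        if goal_test(best):
            return best
        if h(best) < h(s):
            s = best
        else:
            return None                    # stuck in a local minimum/plateau
```

For example, with h(x) = x², successors x ± 1 and goal h = 0, it walks straight from 5 down to 0.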
72. Application: 8-Queens
Repeat n times:
1) Pick an initial state S at random with one queen in each column
2) Repeat k times:
   • If GOAL?(S) then return S
   • Pick an attacked queen Q at random
   • Move Q in its column so that the number of attacking queens is minimized → new S [min-conflicts heuristic]
3) Return failure
[figure: a board annotated with the number of conflicts for each candidate square in the chosen queen's column, e.g. 1, 2, 0, 3, 2, ...]
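The min-conflicts procedure above can be sketched in Python (all names are mine; a board is a list giving each column's queen row):

```python
import random

def min_conflicts_queens(n=8, restarts=200, steps=200):
    """Min-conflicts for n-queens: one queen per column; repeatedly pick an
    attacked queen at random and move it within its column to a row that
    minimizes the number of attacks. Random restarts escape local minima."""
    def conflicts(board, col, row):
        # number of other queens attacking square (col, row)
        return sum(1 for c in range(n)
                   if c != col and (board[c] == row or
                                    abs(board[c] - row) == abs(c - col)))

    for _ in range(restarts):
        board = [random.randrange(n) for _ in range(n)]   # random initial state
        for _ in range(steps):
            attacked = [c for c in range(n) if conflicts(board, c, board[c]) > 0]
            if not attacked:
                return board                              # GOAL?(S): no attacks
            col = random.choice(attacked)                 # pick an attacked queen
            board[col] = min(range(n),
                             key=lambda r: conflicts(board, col, r))
        # no solution after `steps` moves: restart from a fresh random board
    return None                                           # failure
```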
73. Why does it work?
• There are many goal states that are well distributed over the state space
• If no solution has been found after a few steps, it's better to start all over again. Building a search tree would be much less efficient because of the high branching factor
• The running time is almost independent of the number of queens
74. Steepest Descent
may easily get stuck in local minima
→ Random restart (as in the n-queens example)
→ Monte Carlo descent: choose probabilistically among the successors
75. Monte Carlo Descent
• S ← initial state
• Repeat k times:
  • If GOAL?(S) then return S
  • S' ← successor of S picked at random
  • if h(S') ≤ h(S) then S ← S'
  • else
    - Δh = h(S') − h(S)
    - with probability ~ exp(−Δh/T), where T is called the "temperature": S ← S' [Metropolis criterion]
• Return failure
Simulated annealing lowers T over the k iterations: it starts with a large T and slowly decreases it
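Both procedures can be sketched as follows (interfaces assumed as before; `successors(S)` returns a list of candidate states, and the cooling schedule `T *= alpha` is one common choice among many):

```python
import math
import random

def monte_carlo_descent(S, successors, h, goal_test, k=10000, T=1.0):
    """Monte Carlo descent: a worse successor is still accepted with
    probability exp(-dh/T) (Metropolis criterion), at fixed temperature T."""
    for _ in range(k):
        if goal_test(S):
            return S
        S2 = random.choice(successors(S))
        dh = h(S2) - h(S)
        if dh <= 0 or random.random() < math.exp(-dh / T):
            S = S2
    return None                      # failure after k iterations

def simulated_annealing(S, successors, h, goal_test, k=20000,
                        T0=10.0, alpha=0.999):
    """Same walk, but T is lowered geometrically over time: it starts large
    (near-random walk) and slowly decreases (near-steepest descent)."""
    T = T0
    for _ in range(k):
        if goal_test(S):
            return S
        S2 = random.choice(successors(S))
        dh = h(S2) - h(S)
        if dh <= 0 or random.random() < math.exp(-dh / T):
            S = S2
        T *= alpha
    return None
```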
76. Search problems
Blind search
Heuristic search: best-first and A*
Construction of heuristics
Variants of A*
Local search
77. When to Use Search Techniques?
1) The search space is small, and
   • no other technique is available, or
   • developing a more efficient technique is not worth the effort
2) The search space is large, and
   • no other technique is available, and
   • there exist "good" heuristics