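Markov decision process:

$$M = \{S, A, p_T, p_0, g\}$$

$$\Pr\{S_{t+1} = s' \mid A_t = a, S_t = s, \dots\} = \Pr\{S_{t+1} = s' \mid A_t = a, S_t = s\} =: p_T(s' \mid s, a), \qquad \Pr(S_0 = s) =: p_0(s)$$

Markov policy $\pi \in \Pi^M$:

$$\Pr(A_t = a \mid S_t = s, \dots) = \Pr(A_t = a \mid S_t = s) =: \pi(a \mid s)$$

Value function $V^\pi$:

$$V^\pi(s) := \mathbb{E}^\pi[C_0 \mid S_0 = s], \qquad C_t := \sum_{i=0}^{\infty} \gamma^i g(A_{t+i}, S_{t+i}), \qquad \gamma \in [0, 1)$$

Objective $f(\pi)$:

$$f(\pi) := \sum_{s \in S} p_0(s) V^\pi(s)$$

Goal: find $\pi \in \Pi^M$ maximizing $f(\pi)$ for the MDP $M$.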
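
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy numbers are assumptions, not taken from the slides, and the infinite sum $C_0$ is truncated to a finite list of observed values of $g$.

```python
def discounted_return(rewards, gamma):
    """Truncated C_0 = sum_i gamma**i * rewards[i] for a finite list."""
    total = 0.0
    # Accumulate backwards, using C_t = g_t + gamma * C_{t+1}.
    for g_i in reversed(rewards):
        total = g_i + gamma * total
    return total


def objective(p0, v_pi):
    """f(pi) = sum_s p0(s) * V^pi(s); both lists are indexed by state."""
    return sum(p * v for p, v in zip(p0, v_pi))


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25
print(objective([0.25, 0.75], [4.0, 8.0]))      # 0.25 * 4 + 0.75 * 8
```

Accumulating from the last reward backwards mirrors the recursion $C_t = g(A_t, S_t) + \gamma C_{t+1}$ that the Bellman equation is built on.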
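
Bellman equation for $V^\pi$:

$$
\begin{aligned}
V^\pi(s) &= \mathbb{E}^\pi[C_0 \mid S_0 = s] \\
&= \mathbb{E}^\pi[g(A_0, S_0) + \gamma C_1 \mid S_0 = s] \\
&= \sum_{a \in A} \pi(a \mid s)\, g(a, s) + \gamma \sum_{a \in A} \sum_{s' \in S} \pi(a \mid s)\, p_T(s' \mid s, a)\, \mathbb{E}^\pi[C_1 \mid S_1 = s'] \\
&= \sum_{a \in A} \pi(a \mid s) \Big( g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^\pi(s') \Big), \quad \forall s \in S
\end{aligned}
$$

The last step uses $\mathbb{E}^\pi[C_1 \mid S_1 = s'] = V^\pi(s')$, which holds because neither $\pi$ nor $p_T$ depends on $t$.

Optimal value function $V^*$:

$$V^*(s) := \max_{(\pi_0, \pi_1, \dots)} V^{(\pi_0, \pi_1, \dots)}(s)$$

$$
\begin{aligned}
V^*(s) &= \max_{(\pi_0, \pi_1, \dots)} \mathbb{E}^{(\pi_0, \pi_1, \dots)}[g(A_0, S_0) + \gamma C_1 \mid S_0 = s] \\
&= \max_{\pi_0} \mathbb{E}^{\pi_0}\Big[ g(A_0, S_0) + \gamma \max_{(\pi_1, \pi_2, \dots)} \mathbb{E}^{(\pi_1, \pi_2, \dots)}[C_1 \mid S_1 \sim p_T(\cdot \mid S_0, A_0)] \,\Big|\, S_0 = s \Big] \\
&= \max_{\pi_0} \sum_{a \in A} \pi_0(a \mid s) \Big( g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^*(s') \Big) \\
&= \max_{a \in A} \Big( g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^*(s') \Big), \quad \forall s \in S
\end{aligned}
$$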
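
One way to read the Bellman equation for $V^\pi$ is as an update rule that can be applied repeatedly. The sketch below does that for a small two-state, two-action MDP. The indexing convention (`g[s][a]`, `p_t[s][a][s2]`, `pi[s][a]`), the function name, the fixed number of sweeps, and all numbers are assumptions made for illustration, not part of the slides.

```python
def evaluate_policy(pi, g, p_t, gamma, sweeps=1000):
    """Approximate V^pi by repeatedly applying the Bellman equation.

    pi[s][a]      : probability of action a in state s
    g[s][a]       : g(a, s)
    p_t[s][a][s2] : p_T(s2 | s, a)
    """
    n_states = len(g)
    v = [0.0] * n_states
    for _ in range(sweeps):
        # The right-hand side is computed from the old v, then assigned.
        v = [
            sum(
                pi[s][a]
                * (g[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(p_t[s][a])))
                for a in range(len(g[s]))
            )
            for s in range(n_states)
        ]
    return v


# Hypothetical toy MDP: two states, two actions.
G = [[1.0, 0.0], [0.0, 2.0]]
P_T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
ALWAYS_ACTION_1 = [[0.0, 1.0], [0.0, 1.0]]

print(evaluate_policy(ALWAYS_ACTION_1, G, P_T, gamma=0.9))
```

For this policy state 1 is absorbing with $g = 2$ per step, so $V^\pi(1) = 2 / (1 - 0.9) = 20$, which gives a quick sanity check on the printed values.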
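
Bellman operators:

$$B_\pi(V) := \sum_{a \in A} \pi(a \mid \cdot) \Big( g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, V(s') \Big)$$

$$B_*(V) := \max_{a \in A} \Big\{ g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, V(s') \Big\}$$

With these operators, both Bellman equations take the fixed-point form

$$V = B(V), \qquad B \in \{B_*, B_\pi\}$$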
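
The two operators can be written as plain functions that map one value vector to another. This is a minimal sketch: the names `q_value`, `bellman_pi`, `bellman_opt`, the list-based indexing (`g[s][a]`, `p_t[s][a][s2]`, `pi[s][a]`), and the toy MDP are assumptions for illustration.

```python
def q_value(v, s, a, g, p_t, gamma):
    """g(a, s) + gamma * sum_{s'} p_T(s' | s, a) * v(s')."""
    return g[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(p_t[s][a]))


def bellman_pi(v, pi, g, p_t, gamma):
    """B_pi(v): average the bracketed term over actions drawn from pi."""
    return [
        sum(pi[s][a] * q_value(v, s, a, g, p_t, gamma) for a in range(len(g[s])))
        for s in range(len(g))
    ]


def bellman_opt(v, g, p_t, gamma):
    """B_*(v): take the maximum of the bracketed term over actions."""
    return [
        max(q_value(v, s, a, g, p_t, gamma) for a in range(len(g[s])))
        for s in range(len(g))
    ]


# Hypothetical toy MDP: two states, two actions.
G = [[1.0, 0.0], [0.0, 2.0]]
P_T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
UNIFORM = [[0.5, 0.5], [0.5, 0.5]]

v0 = [0.0, 0.0]
print(bellman_pi(v0, UNIFORM, G, P_T, 0.9))  # [0.5, 1.0]
print(bellman_opt(v0, G, P_T, 0.9))          # [1.0, 2.0]
```

Starting from $v_0 = 0$ the discounted term vanishes, so one application of $B_\pi$ returns the expected immediate $g$ under $\pi$, and one application of $B_*$ returns the largest immediate $g$ in each state.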
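
For $v, v' : S \to \mathbb{R}$, define

$$v \le v' \iff v(s) \le v'(s), \quad \forall s \in S$$

$$\|v - v'\| := \max_{s \in S} |v(s) - v'(s)|$$

Properties of $B \in \{B_*, B_\pi\}$:

Monotonicity: $v \le v' \Rightarrow B(v) \le B(v')$

Constant shift: $B(v + c) = B(v) + \gamma c, \quad \forall c \in \mathbb{R}$

Contraction: $\|B(v) - B(v')\| \le \gamma \|v - v'\|$

Fixed point: there exists $v^*$ with $v^* = B(v^*)$, and

$$\lim_{k \to \infty} B^k(v_0) = v^*, \quad \forall v_0 : S \to \mathbb{R}$$

Proof of monotonicity. If $v \le v'$, then

$$
\begin{aligned}
B_*(v)(s) &= \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v(s') \Big\} \\
&\le \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v'(s') \Big\} \\
&= B_*(v')(s), \quad \forall s \in S
\end{aligned}
$$

The same argument applies to $B_\pi$.

Proof of the constant-shift property. Since $\sum_{s' \in S} p_T(s' \mid s, a) = 1$,

$$
\begin{aligned}
B_*(v + c)(s) &= \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, (v(s') + c) \Big\} \\
&= \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v(s') \Big\} + \gamma c \\
&= B_*(v)(s) + \gamma c, \quad \forall s \in S
\end{aligned}
$$

The same argument applies to $B_\pi$.

Proof of contraction. Applying monotonicity and the constant-shift property to the first line gives the second:

$$
\begin{aligned}
& v' - \|v - v'\| \le v \le v' + \|v - v'\| \\
\Rightarrow\ & B(v') - \gamma \|v' - v\| \le B(v) \le B(v') + \gamma \|v' - v\| \\
\Rightarrow\ & \|B(v') - B(v)\| \le \gamma \|v - v'\|
\end{aligned}
$$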
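
The three properties can also be spot-checked numerically for $B_*$. The sketch below draws random value vectors on a toy MDP and tests each inequality up to a floating-point tolerance. The MDP, the helper names, the sampling ranges, and the tolerance are all assumptions for illustration; passing the check is evidence on one example, not a proof.

```python
import random

# Hypothetical toy MDP: G[s][a] = g(a, s), P_T[s][a][s2] = p_T(s2 | s, a).
GAMMA = 0.9
G = [[1.0, 0.0], [0.0, 2.0]]
P_T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]


def bellman_opt(v):
    """B_*(v) on the toy MDP above."""
    return [
        max(
            G[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P_T[s][a]))
            for a in range(len(G[s]))
        )
        for s in range(len(G))
    ]


def sup_norm(u, w):
    """||u - w|| = max_s |u(s) - w(s)|."""
    return max(abs(x - y) for x, y in zip(u, w))


def check_properties(trials=1000, seed=0, tol=1e-9):
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.uniform(-10, 10) for _ in G]
        w = [rng.uniform(-10, 10) for _ in G]
        c = rng.uniform(-5, 5)
        bv, bw = bellman_opt(v), bellman_opt(w)
        # Monotonicity: hi >= v pointwise, so B(hi) >= B(v) pointwise.
        hi = [max(x, y) for x, y in zip(v, w)]
        if any(b > bh + tol for b, bh in zip(bv, bellman_opt(hi))):
            return False
        # Constant shift: B(v + c) = B(v) + gamma * c.
        shifted = bellman_opt([x + c for x in v])
        if any(abs(bs - (b + GAMMA * c)) > tol for bs, b in zip(shifted, bv)):
            return False
        # Contraction: ||B(v) - B(w)|| <= gamma * ||v - w||.
        if sup_norm(bv, bw) > GAMMA * sup_norm(v, w) + tol:
            return False
    return True


print(check_properties())
```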
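
Proof of the fixed-point property, using the definitions and the contraction property $\|B(v) - B(v')\| \le \gamma \|v - v'\|$ stated earlier. For any $v, v' : S \to \mathbb{R}$, the triangle inequality gives

$$
\begin{aligned}
\|v - v'\| &\le \|B(v) - B(v')\| + \|v - B(v)\| + \|v' - B(v')\| \\
&\le \gamma \|v - v'\| + \|v - B(v)\| + \|v' - B(v')\| \\
\Rightarrow\ \|v - v'\| &\le \frac{\|v - B(v)\| + \|v' - B(v')\|}{1 - \gamma}
\end{aligned}
$$

Let $v_k := B^k(v_0)$, so that $v_n - B(v_n) = B^n(v_0) - B^n(v_1)$. Then

$$
\begin{aligned}
\|v_n - v_m\| &\le \frac{\|B^n(v_0) - B^n(v_1)\| + \|B^m(v_0) - B^m(v_1)\|}{1 - \gamma} \\
&\le \frac{\gamma^n \|v_0 - v_1\| + \gamma^m \|v_0 - v_1\|}{1 - \gamma} \\
&= \frac{\gamma^n + \gamma^m}{1 - \gamma} \|v_0 - v_1\|
\end{aligned}
$$

$$\lim_{n, m \to \infty} \|v_n - v_m\| = 0$$

Hence $(v_k)$ is a Cauchy sequence and converges to a limit $v^*$, which satisfies $v^* = B(v^*)$ because $B$ is continuous. Taking $v = v_n$ and $v' = v^*$ in the first inequality, where $\|v^* - B(v^*)\| = 0$:

$$\|v_n - v^*\| \le \frac{\|B^n(v_0) - B^n(v_1)\|}{1 - \gamma} \le \frac{\gamma^n}{1 - \gamma} \|v_0 - v_1\|$$

$$\lim_{n \to \infty} \|v_n - v^*\| = 0$$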
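
The error bound $\|v_n - v^*\| \le \frac{\gamma^n}{1 - \gamma} \|v_0 - v_1\|$ can be compared with the actual error on a small example. In the sketch below the toy MDP and helper names are assumptions for illustration, and $v^*$ is replaced by a numerical stand-in, namely the iterate after 2000 applications of $B_*$.

```python
# Hypothetical toy MDP: G[s][a] = g(a, s), P_T[s][a][s2] = p_T(s2 | s, a).
GAMMA = 0.9
G = [[1.0, 0.0], [0.0, 2.0]]
P_T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]


def bellman_opt(v):
    """B_*(v) on the toy MDP above."""
    return [
        max(
            G[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P_T[s][a]))
            for a in range(len(G[s]))
        )
        for s in range(len(G))
    ]


def sup_norm(u, w):
    """||u - w|| = max_s |u(s) - w(s)|."""
    return max(abs(x - y) for x, y in zip(u, w))


def iterates(v0, n):
    """Return [v_0, v_1, ..., v_n] with v_k = B_*^k(v_0)."""
    vs = [v0]
    for _ in range(n):
        vs.append(bellman_opt(vs[-1]))
    return vs


vs = iterates([0.0, 0.0], 2000)
v_star = vs[-1]                 # numerical stand-in for v*
d01 = sup_norm(vs[0], vs[1])    # ||v_0 - v_1||

for n in (0, 10, 50):
    err = sup_norm(vs[n], v_star)
    bound = GAMMA**n / (1 - GAMMA) * d01
    print(n, err, bound)
```

The bound needs only $v_0$ and $v_1$, so it can be evaluated after a single application of $B$, before $v^*$ is known.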
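
Recall the operator

$$B_*(V) := \max_{a \in A} \Big\{ g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, V(s') \Big\}$$

Optimal deterministic policy $\pi^d_*$:

$$\pi^d_*(s) := \arg\max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^*(s') \Big\}$$

Because

$$\lim_{k \to \infty} B_*^k(v_0) = v^*, \quad \forall v_0 : S \to \mathbb{R}$$

$v^*$ and $\pi^d_*$ can be approximated by the following iteration (value iteration).

Input: $M = \{S, A, p_T, p_0, g\}$, $\varepsilon \in (0, \infty)$

Output: $v' : S \to \mathbb{R}$, $\pi^d_{v'} : S \to A$

1. Initialize $v' : S \to \mathbb{R}$ arbitrarily and set $v = v'$.
2. Update
$$v' = \max_{a \in A} \Big\{ g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, v(s') \Big\}$$
3. If $\|v - v'\| < \varepsilon$, return $v'$ and, as an approximation of $\pi^d_*$, the greedy policy
$$\pi^d_{v'}(s) := \arg\max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v'(s') \Big\}$$
4. Otherwise set $v = v'$ and go to step 2.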
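
The iteration above can be sketched directly in Python. The function name, the zero initialization, the list-based indexing (`g[s][a]`, `p_t[s][a][s2]`), and the toy MDP are assumptions for illustration. $p_0$ is omitted because the iteration itself never uses it.

```python
def value_iteration(g, p_t, gamma, eps):
    """Return (v', greedy policy w.r.t. v') once ||v - v'|| < eps."""
    n_states = len(g)

    def q(v, s, a):
        # g(a, s) + gamma * sum_{s'} p_T(s' | s, a) * v(s')
        return g[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(p_t[s][a]))

    v_new = [0.0] * n_states  # arbitrary initial v' (zeros is an assumption)
    while True:
        v = v_new  # v = v'
        v_new = [max(q(v, s, a) for a in range(len(g[s]))) for s in range(n_states)]
        if max(abs(x - y) for x, y in zip(v, v_new)) < eps:
            break

    # Greedy deterministic policy: argmax over actions, first maximizer on ties.
    policy = [
        max(range(len(g[s])), key=lambda a: q(v_new, s, a))
        for s in range(n_states)
    ]
    return v_new, policy


# Hypothetical toy MDP: two states, two actions.
G = [[1.0, 0.0], [0.0, 2.0]]
P_T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]

v, policy = value_iteration(G, P_T, gamma=0.9, eps=1e-8)
print(v, policy)
```

In this toy MDP, staying in state 1 with action 1 yields $g = 2$ at every step, so $V^*(1) = 2 / (1 - 0.9) = 20$. In state 0 the greedy choice is action 1, which gives up the immediate reward to reach state 1 more quickly.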


Reinforcement Learning Study Group Materials (Session 3)
