- states
- actions
- rewards
- discount factor $\gamma$
- return
- policy $\pi$
- definition: the sum of the rewards that the system gets, weighted by the discount factor
- compute:
- $R_i$ : reward of state $i$
- $\gamma$ : discount factor (usually close to 1), which makes the reinforcement learning agent impatient (earlier rewards count more than later ones)

$return = R_1 + \gamma R_2 + \cdots + \gamma^{n-1} R_n$
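A minimal sketch of computing this return in Python, with a made-up reward list:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards weighted by powers of the discount factor: R1 + gamma*R2 + gamma^2*R3 + ..."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

# example: no reward until a terminal reward of 100 at the fourth step
print(discounted_return([0, 0, 0, 100], gamma=0.5))  # 0.5**3 * 100 = 12.5
```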
policy $\pi$ maps state $s$ to some action $a$

$\pi(s) = a$
the goal of reinforcement learning is to find a policy $\pi$ that maps every state $s$ to an action $a$ so as to maximize the return
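For a small discrete problem, a policy can be written as a plain lookup table; the state and action names below are hypothetical:

```python
# a deterministic policy pi(s) = a as a dictionary (hypothetical states/actions)
policy = {"low_battery": "recharge", "high_battery": "explore"}

def pi(state):
    return policy[state]

print(pi("low_battery"))  # recharge
```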
1. definition
$Q(s, a)$ = return if you
- start in state $s$
- take action $a$ once
- behave optimally after that
2. usage
- the best possible return from state $s$ is $\max_a Q(s, a)$
- the best possible action in state $s$ is the action $a$ that gives $\max_a Q(s, a)$
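Given Q-values for one state, both uses reduce to a max/argmax over actions; the numbers here are made up:

```python
# hypothetical Q(s, a) values for the current state s over three actions
q_values = {"left": 12.5, "right": 10.0, "stay": 8.0}

best_action = max(q_values, key=q_values.get)  # action achieving max_a Q(s, a)
best_return = q_values[best_action]            # best possible return from s

print(best_action, best_return)  # left 12.5
```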
$s$ : current state
$a$ : current action
$s'$ : state you get to after taking action $a$
$a'$ : action that you take in state $s'$

$Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')$
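A worked one-liner for the Bellman target, with made-up numbers ($R(s) = 0$, $\gamma = 0.5$, next-state Q-values of 25, 10, 5):

```python
def bellman_target(reward, gamma, q_next):
    """Q(s, a) = R(s) + gamma * max_a' Q(s', a')"""
    return reward + gamma * max(q_next)

print(bellman_target(reward=0, gamma=0.5, q_next=[25, 10, 5]))  # 0 + 0.5 * 25 = 12.5
```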
1. definition
use a neural network to learn $Q(s, a)$

$x = (s, a)$
$y = R(s) + \gamma \max_{a'} Q(s', a')$
$f_{w, b}(x) \approx y$
2. steps (see the sketch after this list)
- initialize the neural network randomly as a guess of $Q(s, a)$
- repeat:
    - take actions, get $(s, a, R(s), s')$
    - store the $N$ most recent $(s, a, R(s), s')$ tuples (replay buffer)
    - train the neural network:
        - create a training set of $N$ examples using $x = (s, a)$ and $y = R(s) + \gamma \max_{a'} Q(s', a')$
        - train $Q_{new}$ such that $Q_{new}(s, a) \approx y$
    - set $Q = Q_{new}$
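A minimal runnable sketch of this loop. Everything here is an assumption for illustration: a toy 5-state chain environment, a linear one-hot model standing in for the neural network, and made-up hyperparameters.

```python
import random
import numpy as np

# toy chain environment (hypothetical): 5 states, actions 0 = left, 1 = right,
# reward 1 only when reaching the rightmost state
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def env_step(s, a):
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next

# linear one-hot model standing in for the neural network f_{w,b}(x) ~ Q(s, a)
def features(s, a):
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def Q(s, a, w):
    return w @ features(s, a)

w = np.random.randn(N_STATES * N_ACTIONS) * 0.01    # random initial guess of Q(s, a)
buffer, BUFFER_SIZE, BATCH, LR, EPS = [], 1000, 32, 0.1, 0.1

s = 0
for t in range(5000):
    # epsilon-greedy: explore sometimes, otherwise take the greedy action
    if random.random() < EPS:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda a_: Q(s, a_, w))
    r, s_next = env_step(s, a)
    buffer.append((s, a, r, s_next))
    buffer = buffer[-BUFFER_SIZE:]                   # keep the N most recent tuples
    s = 0 if s_next == N_STATES - 1 else s_next      # restart the episode at the goal

    if len(buffer) >= BATCH:
        w_new = w.copy()
        for (s_i, a_i, r_i, sn_i) in random.sample(buffer, BATCH):
            # y = R(s) + gamma * max_a' Q(s', a'), computed with the current Q
            y = r_i + GAMMA * max(Q(sn_i, a_, w) for a_ in range(N_ACTIONS))
            # one gradient-descent step pulling Q_new(s, a) toward y
            w_new -= LR * (Q(s_i, a_i, w_new) - y) * features(s_i, a_i)
        w = w_new                                    # set Q = Q_new

# learned estimate of the best return from each state
print([round(max(Q(s_, a_, w) for a_ in range(N_ACTIONS)), 2) for s_ in range(N_STATES)])
```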
3. optimizations
4. $\epsilon$-greedy policy
- with probability $1 - \epsilon$, pick the action $a$ that maximizes $Q(s, a)$
- with probability $\epsilon$, pick an action $a$ randomly
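A minimal sketch of $\epsilon$-greedy selection over a hypothetical array of Q-values:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.05, rng=np.random.default_rng()):
    """With probability 1 - epsilon exploit (argmax), with probability epsilon explore (random action)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # random action
    return int(np.argmax(q_values))               # greedy action

print(epsilon_greedy(np.array([1.2, 3.4, 0.5])))  # usually 1, occasionally a random index
```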
5. mini-batch
use a subset of the dataset on each gradient descent step
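A sketch of sampling such a subset from the stored tuples; the buffer contents are made up:

```python
import random

def sample_minibatch(replay_buffer, batch_size=32):
    """Use only a random subset of the stored tuples for each gradient-descent step."""
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))

# made-up (s, a, R(s), s') tuples
buffer = [(s, 0, 0.0, s + 1) for s in range(100)]
print(len(sample_minibatch(buffer, batch_size=8)))  # 8
```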
6. soft update
instead of setting $Q = Q_{new}$ directly, blend the new parameters into the old ones gradually (with a small $\alpha$):

$w = \alpha w_{new} + (1 - \alpha) w$
$b = \alpha b_{new} + (1 - \alpha) b$
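A sketch of the soft update on numpy parameter arrays, with a small made-up $\alpha$:

```python
import numpy as np

def soft_update(w, b, w_new, b_new, alpha=0.01):
    """Blend the newly trained parameters into the old ones instead of replacing them outright."""
    w = alpha * w_new + (1 - alpha) * w
    b = alpha * b_new + (1 - alpha) * b
    return w, b

w, b = np.zeros(4), 0.0
w_new, b_new = np.ones(4), 1.0
print(soft_update(w, b, w_new, b_new))  # each parameter moves 1% of the way toward the new value
```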