達飝: AI/ANN - Reinforcement Learning (Artificial Intelligence / Artificial Neurons: Reinforcement Learning Methods Explained [taught in English])
2018-11-13
Audience
European and American foreign-invested enterprises
Purpose
See below
Content


Artificial Intelligence / Artificial Neurons: Reinforcement Learning Methods Explained [Taught in English]


AI/ANN -Reinforcement Learning




【Background & Goals】

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Because the problem is so general, it is studied in many disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In this course, reinforcement learning is viewed as approximate dynamic programming. The same problem is studied in the theory of optimal control, though most work there is concerned with the existence and characterization of optimal solutions rather than with learning or approximation. In machine learning, the environment is typically formulated as a Markov decision process (MDP), because many reinforcement learning algorithms for this setting use dynamic programming techniques.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Instead the focus is on performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
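The exploration vs. exploitation trade-off can be illustrated with a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. The arm means, the Gaussian reward noise, the step count and the value of `epsilon` are all assumptions chosen for illustration; they are not taken from the course material.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian bandit; return value estimates and pull counts."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms        # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)                   # noisy reward (assumed)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

# Hypothetical three-armed bandit; the last arm is the best one.
estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 2.0])
print(counts, [round(e, 2) for e in estimates])
```

With a small `epsilon` the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps the other estimates from going stale.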

The main difference between classical dynamic programming techniques and reinforcement learning algorithms is that the latter do not need full knowledge of the MDP (Markov decision process), and they can target large MDPs where exact methods become infeasible.


【Trainees】

Programmers and managers working on AI/ANN reinforcement learning applications, and managers of the related business functions.

Trainees must have a solid grounding in modern advanced mathematics.

【Timing】 6 class hours (6 Class hrs/day)


【General Content】

PART 1  Necessary & Essential AI Knowledge

PART 2  A Smart Robot in a Room: Example

PART 3  Defining a Markov Decision Process

PART 4  Monte Carlo methods

PART 5  Consolidating and Strengthening RL: Q-learning


【Detailed Content】


PART 1  Necessary & Essential AI Knowledge

1.1 Supervised learning

     classification, regression

1.2 Unsupervised learning

     clustering, dimensionality reduction

1.3 Reinforcement learning

     a more general setting than supervised or unsupervised learning

     learn from interaction with the environment to achieve a goal


PART 2  A Smart Robot in a Room: Example

What’s the strategy to achieve max reward?

What if the actions were deterministic?

There is no teacher to say “good” or “bad”

Explore the environment and learn from the experience


PART 3  Defining a Markov Decision Process

3.1 Solving an MDP using dynamic programming

states, actions and rewards

solution and policy

Markov Decision Process (MDP)

maximize cumulative reward in the long run

Computing return from rewards
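As a small illustration of computing a return from a reward sequence, the sketch below sums rewards backwards with a discount factor. The discount factor `gamma` and the example rewards are assumptions; the outline does not say whether the course uses discounted or undiscounted returns (set `gamma=1.0` for the latter).

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):   # work backwards: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g

def returns_to_go(rewards, gamma=0.9):
    """Return the list [G_0, G_1, ...] of returns from every time step."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

print(discounted_return([1, 1, 1], gamma=0.5))   # 1 + 0.5 + 0.25 = 1.75
```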

3.2 Value functions

Optimal value functions

Policy evaluation/improvement

Policy/Value iteration
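Value iteration can be sketched in a few lines when the MDP is small and fully known. The three-state chain below, its rewards and the discount factor are hypothetical and stand in for whatever example the lectures use; the transition table format `P[s][a] = [(prob, next_state, reward), ...]` is likewise an assumption made for this sketch.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (prob, next_state, reward); terminal states have no actions."""
    def q(s, a, V):
        # Expected one-step return of taking action a in state s.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:                      # terminal state: value stays 0
                continue
            best = max(q(s, a, V) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best                       # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q(s, a, V)) for s in P if P[s]}
    return V, policy

# Hypothetical deterministic chain: 0 <-> 1 -> 2 (goal), reward 1 on reaching the goal.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}
V, policy = value_iteration(P)
print(V, policy)
```

Policy iteration differs only in alternating a full policy-evaluation step with a greedy improvement step instead of taking the max inside every sweep.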


PART 4  Monte Carlo methods

4.1 Monte Carlo methods

don’t need full knowledge of environment

averaging sample returns

4.2 Monte Carlo policy evaluation

want to estimate V^π(s)

first-visit MC
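First-visit Monte Carlo policy evaluation can be sketched as follows: sample episodes under a fixed policy and, for each state, average the return that follows its first visit in each episode. The random-walk environment, the episode format (a list of `(state, reward)` pairs) and the episode count are assumptions made for this illustration.

```python
import random
from collections import defaultdict

def first_visit_mc(sample_episode, n_episodes=5000, gamma=1.0):
    """Estimate V^pi(s) by averaging returns after the first visit to s per episode."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        episode = sample_episode()            # [(state, reward), ...]
        first_return, g = {}, 0.0
        # Walk backwards and overwrite: the final write for each state
        # belongs to its earliest (first) visit in the episode.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_return[state] = g
        for state, g_first in first_return.items():
            returns_sum[state] += g_first
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

rng = random.Random(0)

def random_walk_episode(start=2, n_states=3):
    """Hypothetical random walk on states 1..n_states; reward 1 for exiting on the right."""
    state, episode = start, []
    while 1 <= state <= n_states:
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == n_states + 1 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode

V = first_visit_mc(random_walk_episode)
print({s: round(v, 2) for s, v in sorted(V.items())})
```

For this walk the exact values are 0.25, 0.5 and 0.75 for states 1, 2 and 3, so the estimates give a quick check that the averaging is working; no model of the environment is needed, only sampled episodes.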

4.3 Monte Carlo control

4.4 Maintaining exploration

4.5 Simulated experience

4.6 Summary of Monte Carlo


PART 5  Consolidating and Strengthening RL: Q-learning

5.1 off-policy learning

5.2 State representation

5.3 Function approximation

5.4 Features

5.5 Splitting and aggregation

5.6 Designing rewards

5.7 Case study: Backgammon
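As a companion to the off-policy learning topic in Part 5, here is a minimal sketch of tabular Q-learning. The corridor environment, the reward of 1 at the right end and the hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions for illustration only. The agent behaves epsilon-greedily but learns about the greedy policy, which is what makes the method off-policy.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor with the goal at the right end."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Behaviour policy: epsilon-greedy, with random tie-breaking.
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Target policy: greedy. The max ignores which action is taken next.
            target = reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q

Q = q_learning()
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)                      # greedy action per non-terminal state
print([round(max(q), 3) for q in Q])
```

A table works here because there are only five states; with large state spaces the table would be replaced by a function approximator over state features, as items 5.2 to 5.5 suggest.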
