Defined
- Machine learning technique which learns a strategy, called a policy, that optimizes an objective for an agent acting in an environment
- Agent performs an action, checks the state of the environment, and gets rewarded based on the resulting state of the environment
- Aims to maximize the long-term reward that the agent receives as a result of its actions
- Good for solving problems where you want your model to make independent decisions
- Markov Decision Process (MDP)
- Based on Markov Decision Process (MDP) models
- Works through a sequence of time steps with each step containing the following
- Environment: operating space of the RL model
- State: complete information describing the environment and relevant past steps
- Action: agent’s activity
- Reward: number that reflects the state resulting from the last action
- Observation: data about the state of the environment available at each step
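The time-step terms above can be sketched as a small loop. This is only an illustrative toy, not anything SageMaker provides: `CorridorEnv`, `run_episode`, `random_policy`, and `always_right` are hypothetical names, and the reward values are assumptions chosen for the example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0  # state: the cell the agent occupies

    def reset(self):
        self.position = 0
        return self.position  # observation (here the full state is visible)

    def step(self, action):
        # action: -1 moves left, +1 moves right; the position is clamped
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # assumed reward scheme: small cost per step, bonus on reaching the goal
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: observe, act, collect the reward, repeat."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(observation):
    """Policy that ignores the observation and explores at random."""
    return random.choice([-1, 1])


def always_right(observation):
    """Policy that always moves toward the goal."""
    return 1


print("always_right:", run_episode(CorridorEnv(), always_right))
print("random_policy:", run_episode(CorridorEnv(), random_policy))
```

A policy is just a mapping from observation to action; an RL algorithm would adjust that mapping over many episodes so that the total reward goes up.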
Use Cases
- Robotics
- Traffic light control
- Predictive auto scaling
- Tuning parameters of a web system
- Optimizing chemical reactions
- Personalized recommendations
- Gaming
- Deep learning: “see an environment” and learn how to interact with it
SageMaker Algorithms
Deep Learning Framework
- Reinforcement Learning in TensorFlow and Apache MXNet
- Uses a Reinforcement Learning toolkit
- Manages the agent’s interaction with the environment
- SageMaker supports Intel Coach and Ray RLlib toolkits
- Uses an environment
- Custom environment
- Open source environments: EnergyPlus, RoboSchool
- Commercial environments: MATLAB, Simulink
- Important Hyperparameters
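- learning_rate: how fast the model learns, i.e. how large each update step is
- discount_factor: how much future rewards count relative to immediate ones; low values favor short-term rewards, values near 1 favor long-term rewards
- entropy: degree of uncertainty in the policy; balances exploiting what’s already known against thorough exploration of the environment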
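A rough sketch of what these three hyperparameters control, in plain Python. The functions below are hypothetical helpers written for illustration; the tabular Q-learning update is a stand-in chosen for simplicity, not necessarily the algorithm a given SageMaker RL toolkit runs.

```python
import math


def discounted_return(rewards, discount_factor):
    """Sum of rewards where the reward at step t is weighted by discount_factor**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount_factor * total
    return total


def q_update(old_value, reward, best_next_value, learning_rate, discount_factor):
    """One tabular Q-learning update; learning_rate scales the correction."""
    target = reward + discount_factor * best_next_value
    return old_value + learning_rate * (target - old_value)


def policy_entropy(action_probabilities):
    """Shannon entropy (in nats) of a probability distribution over actions."""
    return -sum(p * math.log(p) for p in action_probabilities if p > 0)


# Assumed example: a reward of 10 that only arrives at the fourth step
delayed = [0, 0, 0, 10]
print(discounted_return(delayed, 0.5))  # a low discount factor shrinks the late reward
print(discounted_return(delayed, 1.0))  # no discounting keeps it at full value
print(policy_entropy([0.5, 0.5]))       # uniform policy: maximum uncertainty
print(policy_entropy([1.0, 0.0]))       # deterministic policy: no uncertainty
```

A higher entropy setting typically pushes the policy toward the uniform end of this range (more exploration), while a lower one lets it settle on the actions it already believes are best.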
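This article was written by Oscaner and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as a repost or attributed to another source, all articles on this site are original or translated by this site; please credit the author before reposting.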