What is reinforcement learning? How AI trains itself

by | Sep 5, 2022 | Technology

Machine learning (ML) might be considered the core subset of artificial intelligence (AI), and reinforcement learning may be the quintessential subset of ML that people imagine when they think of AI.

Reinforcement learning is the process by which a machine learning algorithm, a robot or a similar system can be programmed to respond to complex, real-time, real-world environments so that it reaches a desired target or outcome in the best way available. Think of the challenge posed by self-driving cars.

The algorithms involved can also “learn” from, or be improved by, this process of taking in and responding to new circumstances.

Other forms of ML are “trained” on sets of “training data,” some of them massive. This often enables an algorithm to classify or cluster data, or otherwise recognize patterns, based on the relationships and outcomes on which it has been trained. Machine learning algorithms begin with training data and create models that capture some of the patterns and lessons embedded in the data.

Reinforcement learning is a part of the training process that often continues after deployment, while the model is in use. New data captured from the environment is used to tweak and adjust the model to fit current conditions.

Reinforcement learning is accomplished with a feedback loop based on “rewards” and “penalties.” The scientist or user defines which outcomes count as successful and which as unsuccessful, and the AI then uses that feedback to adjust the model. It might tweak some of the weights in the model, or even reevaluate some or all of the training data in light of the new reward or penalty.
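
To make the feedback loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular system is built: the action names (`brake_early`, `brake_late`), the success probabilities, the reward values and the function `run_feedback_loop` are all hypothetical. The sketch uses a simple epsilon-greedy strategy with a running-average update, one of many possible ways to turn rewards and penalties into model adjustments.

```python
import random

# Hypothetical scoring of outcomes: a reward for success, a penalty for failure.
REWARDS = {"success": 1.0, "failure": -1.0}


def run_feedback_loop(success_prob, steps=5000, epsilon=0.1, step_size=0.05, seed=0):
    """Estimate how good each action is from rewards and penalties.

    success_prob maps each action name to the hidden chance that the
    simulated environment reports a success when that action is taken.
    """
    rng = random.Random(seed)
    actions = list(success_prob)
    values = {a: 0.0 for a in actions}  # stand-in for the model's adjustable weights

    for _ in range(steps):
        # Explore a random action occasionally; otherwise use the best-rated one.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=values.get)

        # The simulated environment responds, and the outcome is scored.
        outcome = "success" if rng.random() < success_prob[action] else "failure"
        reward = REWARDS[outcome]

        # Nudge the estimate toward the observed reward or penalty.
        values[action] += step_size * (reward - values[action])

    return values


# Assumed toy environment: braking early usually works, braking late usually fails.
values = run_feedback_loop({"brake_early": 0.9, "brake_late": 0.2})
best = max(values, key=values.get)
print(best, values)
```

Each pass through the loop mirrors the cycle described above: act, observe the outcome, score it as a reward or penalty, and adjust. Over many steps the estimate for the action that tends to succeed rises, and the one that tends to fail sinks.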

For instance, an autonomous car may have a set of straightforward rewards and penalties that are predetermined. The algo …
