Is the reinforcement learning problem adaptable to a setting where there is only one (final) reward? I am aware of problems with sparse and delayed rewards, but what about a single reward at the end of a quite long path?


1 Answer


RL can be used in these cases, but in such a setting the experience the agent collects along the trajectory provides little information about the quality of individual actions.

Games can often be formulated as episodic tasks. For example, you could treat a chess match as one episode and give a reward only at the end of the match. However, it will then be hard for the RL agent to "understand" which moves contributed most to the reward received. This is called the credit assignment problem.

The expression "delayed rewards" also covers the case where you receive only one reward, at the end of the episode.
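
To make the single-terminal-reward case concrete, here is a minimal sketch of how the one final reward is spread back over earlier steps through discounted returns. The function name `discounted_returns`, the 5-step episode, and `gamma=0.9` are all assumptions chosen for illustration; this is not a full learning algorithm.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Hypothetical episode: zero reward everywhere except the final step.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)

for t, g in enumerate(returns):
    print(f"step {t}: return {g:.4f}")
```

Every step ends up with a nonzero return, but that return depends only on how far the step is from the end, not on whether the action taken there was good or bad. That is the credit assignment problem in miniature: the signal reaches every step but cannot tell helpful actions from unhelpful ones within a single episode.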

