REBEL: A Reinforcement Learning RL Algorithm that Reduces the Problem of RL to Solving a Sequence of Relative Reward Regression Problems on Iteratively Collected Datasets

REBEL: A Reinforcement Learning RL Algorithm that Reduces the Problem of RL to Solving a Sequence of Relative Reward Regression Problems on Iteratively Collected Datasets