For a reinforcement learning project on a small gridworld example with many simulated episodes, I want to do the following in R:
- 0a. As a burn-in, I simulate 1000 episodes with a random strategy.
- 0b. For each possible action, I train a model on the already simulated runs, with the state as my features.
- 1. I simulate a new episode, choosing the actions epsilon-greedily with the current state as the features.
- 2. I evaluate the model so far by handing out a reward based on its performance.
- 3. With the newly learned episode I update the models and then, for the next episode, start again at step 1, until I have enough episodes overall and the performance no longer improves significantly (a sketch of this loop follows the list).
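A minimal sketch of that loop, where `simulate_episode()`, `fit_model()` and `update_models()` are hypothetical placeholders for the gridworld simulator and the per-action models (the action names, epsilon and episode counts are assumptions too):

```r
# Sketch of the training loop described above. simulate_episode(),
# fit_model() and update_models() are hypothetical helpers standing in
# for the gridworld simulator and the per-action models.
actions      <- c("up", "down", "left", "right")  # assumed gridworld actions
n_burnin     <- 1000
epsilon      <- 0.1
max_episodes <- 5000

# 0a. burn-in: episodes generated with a purely random policy
burnin_data <- do.call(rbind, lapply(seq_len(n_burnin), function(i)
  simulate_episode(policy = "random")))

# 0b. one model per possible action: expected reward ~ state
models <- lapply(actions, function(a)
  fit_model(burnin_data[burnin_data$action == a, ]))
names(models) <- actions

for (ep in seq_len(max_episodes)) {
  # 1. simulate a new episode, choosing actions epsilon-greedily
  new_data <- simulate_episode(policy = "epsilon_greedy",
                               models = models, epsilon = epsilon)
  # 2./3. the episode's observed rewards are used to update each
  # action's model before the next episode starts
  models <- update_models(models, new_data)
  # (stopping rule on overall performance omitted here)
}
```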
For this I first thought about using a decision tree to model expected reward ~ state, but a tree has no "update" concept, so after each step the model would have to be retrained completely from scratch.
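To illustrate that point with `rpart` (the data frame names are placeholders): the only way to incorporate new observations is a full refit on all data accumulated so far.

```r
library(rpart)

# all_data / new_episode_data are placeholder data frames containing
# the state features plus a reward column
tree <- rpart(reward ~ ., data = all_data)

# a new episode arrives: rpart cannot update the existing tree, so the
# whole model is refit on the enlarged data set
all_data <- rbind(all_data, new_episode_data)
tree     <- rpart(reward ~ ., data = all_data)
```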
Because neural networks are trained sequentially, observation by observation, with stochastic gradient descent, I now want to train a neural network on the burn-in observations and then evaluate and update it after each episode (each episode can create multiple observations per action, so multiple update steps may have to be performed).
The question now is whether anyone knows an R package that can fit a neural net and then update it with one or more new observations.
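To make the kind of interface I am after concrete, here is a minimal sketch with the `keras` package: a compiled Keras model keeps its weights between calls, so calling `fit()` again continues training from the current state. The network size, the two-column state encoding and the dummy data are assumptions for a small gridworld.

```r
library(keras)

# dummy data standing in for the burn-in observations: states as a
# two-column matrix (x/y coordinates), one expected-reward target
burnin_states  <- matrix(runif(2000), ncol = 2)
burnin_rewards <- runif(1000)

# one small network per action: expected reward ~ state
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = 2) %>%
  layer_dense(units = 1)

model %>% compile(loss = "mse", optimizer = "sgd")

# initial fit on the burn-in data
model %>% fit(burnin_states, burnin_rewards, epochs = 20, verbose = 0)

# after a new episode: call fit() again on the new observations only;
# Keras continues from the current weights, i.e. an incremental update
new_states  <- matrix(runif(10), ncol = 2)
new_rewards <- runif(5)
model %>% fit(new_states, new_rewards, epochs = 1, verbose = 0)
```

As far as I know, `nnet::nnet()` also accepts a `Wts` argument with initial weights, so a refit could at least be warm-started from the previous parameters if installing a full deep learning backend is too heavy.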
I once wrote a neural net myself in R, but it is not the fastest implementation, and I am pretty sure faster and better ones exist.
Thanks a lot for your help!