Offline Learning

In this research I study offline reinforcement learning. I combine conservative Q-learning (CQL) with a Double DQN actor-critic structure and add an entropy regularizer from the Soft Actor-Critic (SAC) model. Assuming the uncertainty follows a Gaussian distribution with fixed variance, I derive a bound showing that the model still obtains conservative value estimates compared to a traditional Double DQN. I evaluate the effectiveness of this method on a recommender system problem.
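The combined objective can be sketched in a minimal, tabular form: a Double DQN TD error (online network selects the next action, target network evaluates it), a CQL-style conservatism penalty (log-sum-exp over all actions minus the Q-value of the dataset action), and a SAC-style entropy bonus on the softmax policy induced by the Q-values. This is an illustrative sketch, not the paper's implementation; the function names, the loss weights `alpha` and `beta`, and the tabular setting are assumptions for clarity.

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log-sum-exp over a vector of Q-values.
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def combined_loss(q_online, q_target, s, a, r, s_next,
                  alpha=1.0, beta=0.1, gamma=0.99):
    """Sketch of one-transition loss: Double DQN TD error
    + CQL conservatism penalty - SAC-style entropy bonus.
    q_online, q_target: (num_states, num_actions) Q-tables (illustrative).
    alpha, beta: hypothetical weights on the penalty and entropy terms."""
    # Double DQN target: online net picks the action, target net scores it.
    a_star = int(np.argmax(q_online[s_next]))
    td_target = r + gamma * q_target[s_next, a_star]
    td_error = (q_online[s, a] - td_target) ** 2
    # CQL penalty: push down Q on all actions, push up on the dataset action.
    cql_penalty = logsumexp(q_online[s]) - q_online[s, a]
    # Entropy of the softmax policy induced by the online Q-values.
    p = np.exp(q_online[s] - logsumexp(q_online[s]))
    entropy = -(p * np.log(p + 1e-12)).sum()
    return td_error + alpha * cql_penalty - beta * entropy
```

Increasing `alpha` strengthens the conservatism penalty, which is what drives the pessimistic (lower-bound) value estimates relative to plain Double DQN.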

Article

Dingrong Wang et al. “Conservative Evidential Learning of Long-Term User Preferences” In Submission.