Unifying Variational Inference And Policy Optimization