A key issue in video saliency prediction is how to exploit the temporal context around each keyframe. This paper proposes a deep reinforcement learning agent for video saliency prediction that looks around adjacent frames and adaptively generates a salient contextual window containing the information most correlated with the keyframe. More specifically, at each step the agent selects an action from an action set that decides whether to expand the window, while a state set and a reward function evaluate the effectiveness of the current window. The agent is trained with the deep Q-learning algorithm to learn a policy that achieves this goal. The proposed agent is plug-and-play and compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that the method achieves advanced performance.
Yiran Tao, Yaosi Hu, Zhenzhong Chen
2021 International Conference on Visual Communications and Image Processing (VCIP)
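
The step-by-step window expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-action set (`EXPAND_LEFT`, `EXPAND_RIGHT`, `STOP`), the function names `grow_window` and `td_target`, the `max_steps` cap, and the discount `gamma` are all assumptions introduced here for illustration. The abstract states only that actions decide whether to expand the window and that training follows deep Q-learning. The Q-function is passed in as a plain callable, where a trained network would normally sit.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical action set (assumption): the abstract only says that
# actions decide whether to expand the window.
EXPAND_LEFT, EXPAND_RIGHT, STOP = 0, 1, 2

Window = Tuple[int, int]  # inclusive (start, end) frame indices


def grow_window(
    key_idx: int,
    n_frames: int,
    q_values: Callable[[Window], Sequence[float]],
    max_steps: int = 10,
) -> Window:
    """Greedily expand a window around the keyframe using a Q-function."""
    start = end = key_idx
    for _ in range(max_steps):
        q = list(q_values((start, end)))
        # Mask expansions that would step outside the video.
        if start == 0:
            q[EXPAND_LEFT] = float("-inf")
        if end == n_frames - 1:
            q[EXPAND_RIGHT] = float("-inf")
        action = max(range(3), key=lambda a: q[a])
        if action == STOP:
            break
        if action == EXPAND_LEFT:
            start -= 1
        else:
            end += 1
    return start, end


def td_target(
    reward: float, next_q: Sequence[float], done: bool, gamma: float = 0.9
) -> float:
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q)


def toy_q(window: Window) -> Sequence[float]:
    """Toy stand-in for a trained Q-network (assumption): prefers
    expanding right until the window spans three frames, then stops."""
    length = window[1] - window[0] + 1
    return [0.0, 1.0, 0.5] if length < 3 else [0.0, 0.0, 1.0]
```

With the toy Q-function, `grow_window(5, 10, toy_q)` expands rightward twice and then stops. In a setting like the one the abstract describes, the reward fed to `td_target` would reflect how much the current window helps saliency prediction on the keyframe; the exact reward definition is not given in the abstract and is left open here.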