This document proposes a project on video saliency prediction using deep neural networks. The objectives are to understand state-of-the-art saliency models, establish a baseline on the DHF1K dataset using SalGAN, and explore complementary modalities, such as temporal dynamics, as additional inputs to SalGAN. The experiments include checking the evaluation metrics, setting up a PyTorch SalGAN baseline on SALICON, fine-tuning that baseline on DHF1K, and adding extra inputs such as depth and coordinates, which improve performance. The conclusions cover the project environment and code, the performance of state-of-the-art models, and the improvement of the baseline on DHF1K video saliency prediction. Future work proposes exploring LSTM,