
Depth estimation in videos is essential for visual perception in real-world applications. However, existing methods either rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies, or on computationally demanding temporal modeling that is unsuitable for real-time applications. These limitations significantly restrict general applicability and performance in practical settings. To address this, we propose VeloDepth, an efficient and robust online video depth estimation pipeline that effectively leverages spatiotemporal priors from previous depth predictions and performs deep feature propagation. Our method introduces a novel Propagation Module that refines and propagates depth features and predictions using flow-based warping coupled with learned residual corrections. In addition, our design structurally enforces temporal consistency, resulting in stable depth predictions across consecutive frames with improved efficiency. Comprehensive zero-shot evaluation on multiple benchmarks demonstrates the state-of-the-art temporal consistency and competitive accuracy of VeloDepth, alongside its significantly faster inference compared to existing video-based depth estimators. VeloDepth thus provides a practical, efficient, and accurate solution for real-time depth estimation suitable for diverse perception tasks. Code and models are available at lpiccinelli-eth/velodepth.
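To make the idea of the Propagation Module concrete, below is a minimal PyTorch-style sketch of flow-based warping combined with a learned residual correction. All names (`PropagationModule`, `warp_with_flow`) and the exact layer configuration are illustrative assumptions, not the released implementation.

```python
# Minimal sketch: propagate previous-frame depth features into the current
# frame via flow-based warping, then apply a learned residual correction.
# This is an assumed, simplified layout, not the official VeloDepth code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_with_flow(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp previous-frame features into the current frame.

    feat: (B, C, H, W) features or depth from the previous frame.
    flow: (B, 2, H, W) flow mapping current-frame pixels to the previous frame (pixels).
    """
    b, _, h, w = feat.shape
    # Pixel-coordinate grid of the current frame.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(b, -1, -1, -1)
    # Shift by the flow to locate each pixel's source in the previous frame.
    src = grid + flow
    # Normalize coordinates to [-1, 1] for grid_sample.
    src_x = 2.0 * src[:, 0] / max(w - 1, 1) - 1.0
    src_y = 2.0 * src[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((src_x, src_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, sample_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


class PropagationModule(nn.Module):
    """Warps previous depth features to the current frame and refines them
    with a residual predicted from the current frame's features."""

    def __init__(self, channels: int):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, prev_feat, cur_feat, flow):
        warped = warp_with_flow(prev_feat, flow)
        # Residual correction conditioned on warped history and current features.
        delta = self.residual(torch.cat((warped, cur_feat), dim=1))
        return warped + delta
```

Because the correction is a residual on top of warped history rather than a fresh per-frame prediction, consecutive outputs stay consistent by construction while only a lightweight refinement runs at each step.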

A comparison between the Base Model (UniK3D) and its propagated version.
Comparison with other state-of-the-art models, evaluated in the online setting.
The model may struggle to recover from fast disocclusions.
The model may struggle to recover from a poor or blurry initialization from the Base Model.
@inproceedings{piccinelli2026velodepth,
  title     = {Video Depth Propagation},
  author    = {Piccinelli, Luigi and Wandel, Thiemo and Sakaridis, Christos and Abbeloos, Wim and Van Gool, Luc},
  booktitle = {Proceedings of the International Conference on 3D Vision (3DV)},
  year      = {2026}
}