Monocular 3D estimation is crucial for visual perception. However, current methods fall short by relying on oversimplified assumptions, such as pinhole camera models or rectified images. These limitations severely restrict their general applicability, causing poor performance in real-world scenarios with fisheye or panoramic images and resulting in substantial context loss. To address this, we present UniK3D, the first generalizable method for monocular 3D estimation able to model any camera. Our method introduces a spherical 3D representation which allows for better disentanglement of camera and scene geometry and enables accurate metric 3D reconstruction for unconstrained camera models. Our camera component features a novel, model-independent representation of the pencil of rays, achieved through a learned superposition of spherical harmonics. We also introduce an angular loss, which, together with the camera module design, prevents the contraction of the 3D outputs for wide-view cameras. A comprehensive zero-shot evaluation on 13 diverse datasets demonstrates the state-of-the-art performance of UniK3D across 3D, depth, and camera metrics, with substantial gains in challenging large-field-of-view and panoramic settings, while maintaining top accuracy in conventional pinhole small-field-of-view domains. Code and models are available on GitHub.
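For intuition about the camera component, below is a minimal, illustrative sketch of a ray representation built as a superposition of spherical harmonics: a fixed real spherical-harmonics basis is evaluated on an angular grid and combined with coefficients (random here, network-predicted in practice) to yield per-pixel unit ray directions. The grid parameterization, `l_max`, and all names are assumptions for illustration, not the released implementation.

```python
# Illustrative sketch only (not the released UniK3D code): per-pixel ray
# directions as a learned superposition of spherical harmonics.
import numpy as np
from scipy.special import sph_harm

def sh_basis(polar, azim, l_max=3):
    """Real spherical-harmonics basis at polar angle `polar`, azimuth `azim`."""
    basis = []
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            # scipy convention: sph_harm(order, degree, azimuth, polar)
            y = sph_harm(abs(m), l, azim, polar)
            if m > 0:
                basis.append(np.sqrt(2) * (-1) ** m * y.real)
            elif m < 0:
                basis.append(np.sqrt(2) * (-1) ** m * y.imag)
            else:
                basis.append(y.real)
    return np.stack(basis, axis=-1)  # (..., K), K = (l_max + 1) ** 2

H, W, l_max = 4, 6, 3
# Canonical angular grid over the image plane (an assumption for this sketch;
# the model learns the mapping end to end rather than fixing it).
polar = np.linspace(0.25 * np.pi, 0.75 * np.pi, H)[:, None].repeat(W, 1)
azim = (np.linspace(-0.5 * np.pi, 0.5 * np.pi, W)[None, :].repeat(H, 0)) % (2 * np.pi)
B = sh_basis(polar, azim, l_max)  # (H, W, K)

K = (l_max + 1) ** 2
coeffs = 0.1 * np.random.randn(K, 2)   # stand-in for network-predicted coefficients
offsets = B @ coeffs                   # (H, W, 2): per-pixel polar/azimuth offsets
p, a = polar + offsets[..., 0], azim + offsets[..., 1]

# Unit rays on the sphere: the spherical 3D representation decouples these
# camera rays from the per-pixel radial distance predicted by the network.
rays = np.stack([np.sin(p) * np.cos(a), np.sin(p) * np.sin(a), np.cos(p)], axis=-1)
```

Because the basis is defined on the sphere rather than on a planar image grid, the same coefficients can describe pinhole, fisheye, and panoramic pencils of rays without switching camera models.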
Please visit our Hugging Face Space for an installation-free test on your own images!
UniK3D is applied to each frame independently; no post-processing is applied.
The point clouds are the raw output of our model for the corresponding input images:
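To produce such point clouds on your own images, here is a minimal usage sketch. The checkpoint name, the `from_pretrained`/`infer` interface, and the output keys below are assumptions modeled on this repository's conventions and may differ across versions.

```python
# Minimal usage sketch; checkpoint name, infer() signature, and output keys
# are assumptions and may differ from the released code.
import numpy as np
import torch
from PIL import Image
from unik3d.models import UniK3D

model = UniK3D.from_pretrained("lpiccinelli/unik3d-vitl")  # assumed checkpoint id
model = model.to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Load an RGB image as a (3, H, W) uint8 tensor.
rgb = torch.from_numpy(
    np.array(Image.open("image.png").convert("RGB"))
).permute(2, 0, 1)

with torch.no_grad():
    predictions = model.infer(rgb)  # the camera is estimated when not provided

points = predictions["points"]  # metric 3D point cloud, one 3D point per pixel
depth = predictions["depth"]    # metric depth map
```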
```bibtex
@inproceedings{piccinelli2025unik3d,
    title     = {{U}ni{K3D}: Universal Camera Monocular 3D Estimation},
    author    = {Piccinelli, Luigi and Sakaridis, Christos and Segu, Mattia and Yang, Yung-Hsu and Li, Siyuan and Abbeloos, Wim and Van Gool, Luc},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2025}
}
```