This page is for downloading and using the pretrained models in PyTorch. You can also try a demo in your browser or train your own models.


git clone
cd omnidata-tools/torch
conda create -n testenv -y python=3.8
source activate testenv
pip install -r requirements.txt

You can see the complete list of required packages in omnidata-tools/torch/requirements.txt. We recommend using virtualenv for the installation.

Pretrained Models

We are providing our pretrained models which (as of publishing time) have state-of-the-art performance in depth and surface normal estimation.

Network Architecture

The surface normal network is based on the UNet architecture (6 down/6 up). It is trained with both angular and L1 loss and input resolutions between 256 and 512.

The depth networks have DPT-based architectures (similar to MiDaS v3.0) and are trained with scale- and shift-invariant loss and scale-invariant gradient matching term introduced in MiDaS, and also virtual normal loss. You can see a public implementation of the MiDaS loss here. We provide 2 pretrained depth models for both DPT-hybrid and DPT-large architectures with input resolution 384.

Download pretrained models

sh ./tools/
sh ./tools/

These will download the pretrained models for depth and normals to a folder called ./pretrained_models.

Run our models on your own image

After downloading the pretrained models, you can run them on your own image with the following command:

python --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT

The --task flag should be either normal or depth. To run the script for a normal target on an example image:

python --task normal --img_path assets/demo/test1.png --output_path assets/


If you find the code or models useful, please cite our paper:

  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},