Installation
git clone https://github.com/EPFL-VILAB/omnidata
cd omnidata/omnidata_tools/torch
conda create -n testenv -y python=3.8
source activate testenv
pip install -r requirements.txt
You can see the complete list of required packages in omnidata_tools/torch/requirements.txt. We recommend installing into a virtual environment.
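To confirm the environment is usable before downloading any models, a minimal sanity check (not part of the repo) is to verify that PyTorch imports and can see a GPU:

# Quick environment sanity check (illustrative; not part of the repo)
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))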
Pretrained Models
You can download our pretrained models for surface normal estimation and depth estimation. For each task there are two versions of the models: V1, used in the paper, and a stronger V2, released in March 2022.
Network Architecture
Version 2 models (stronger than V1)
These are DPT architectures trained on more data using both 3D Data Augmentations and Cross-Task Consistency. Here's the list of updates in Version 2 models:
- Surface Normal Prediction:
  - The new model is based on the DPT architecture.
  - The Habitat-Matterport 3D Dataset (HM3D) is added to the training data.
  - 1 week of training with 2D and 3D data augmentations, followed by 1 week of training with cross-task consistency.
| Method | Training Data | Angular Error: Mean | Angular Error: Median | % Within 11.25° | % Within 22.5° | % Within 30° | Relative Normal: AUC_o | Relative Normal: AUC_p |
|---|---|---|---|---|---|---|---|---|
| Hourglass | OASIS | 23.91 | 18.16 | 31.23 | 59.45 | 71.77 | 0.5913 | 0.5786 |
| UNet (v1) | Omnidata | 24.87 | 18.04 | 31.02 | 59.53 | 71.37 | 0.6692 | 0.6758 |
| DPT (v2) | Omnidata | 24.16 | 18.23 | 27.71 | 60.95 | 74.15 | 0.6646 | 0.7261 |
| Human (Approx.) | - | 17.27 | 12.92 | 44.36 | 76.16 | 85.24 | 0.8826 | 0.6514 |
- Depth Prediction:
  - The Habitat-Matterport 3D Dataset (HM3D) and 5 MiDaS dataset components (RedWebDataset, HRWSIDataset, MegaDepthDataset, TartanAirDataset, BlendedMVS) are added to the training data.
  - 1 week of training with 2D and 3D data augmentations, followed by 1 week of training with cross-task consistency (an illustrative sketch of the 2D augmentation idea is given after this list).
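The 3D data augmentations come from the 3D Common Corruptions paper cited at the end of this section, and cross-task consistency is its own training scheme, not sketched here. Purely to illustrate the 2D side, a label-preserving photometric augmentation pipeline could look like the following generic torchvision sketch (not the actual Omnidata training code):

# Generic 2D photometric augmentation sketch (illustrative only; not the actual training pipeline).
# Color/blur jitter is applied to the RGB input only, so the depth/normal targets remain valid.
import torchvision.transforms as T

rgb_augment = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    T.ToTensor(),
])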
Version 1 models
The surface normal network is based on the UNet architecture (6 down/6 up blocks). It is trained with both angular and L1 losses, on input resolutions between 256 and 512.
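As a rough illustration of that objective (not the repo's exact implementation), a combined L1 plus angular loss on predicted normal maps can be sketched as:

# Sketch of a combined L1 + angular loss for surface normals (illustrative; not the exact training loss).
import torch.nn.functional as F

def normal_loss(pred, target, angular_weight=1.0):
    # pred, target: (B, 3, H, W) surface normal maps
    l1 = F.l1_loss(pred, target)
    # Angular term: 1 - cosine similarity between the normal vectors at each pixel.
    cos = F.cosine_similarity(pred, target, dim=1)  # (B, H, W)
    angular = (1.0 - cos).mean()
    return l1 + angular_weight * angular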
The depth networks have DPT-based architectures (similar to MiDaS v3.0) and are trained with the scale- and shift-invariant loss and the scale-invariant gradient-matching term introduced in MiDaS, plus a virtual normal loss. You can see a public implementation of the MiDaS loss here. We provide two pretrained depth models, with DPT-hybrid and DPT-large architectures, both at input resolution 384.
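For reference, the scale- and shift-invariant idea is to align the prediction to the ground truth with a least-squares scale and shift before measuring the error. A minimal sketch of that alignment step, assuming no masking and ignoring the gradient-matching and virtual normal terms:

# Minimal sketch of a scale- and shift-invariant loss in the spirit of MiDaS (illustrative only;
# the real loss also handles masking, trimming, and the gradient-matching term).
import torch

def ssi_loss(pred, target, eps=1e-6):
    # pred, target: (B, N) flattened predictions and ground truth
    p_mean = pred.mean(dim=1, keepdim=True)
    t_mean = target.mean(dim=1, keepdim=True)
    p_c, t_c = pred - p_mean, target - t_mean
    # Closed-form least-squares scale s and shift t that best align pred to target.
    s = (p_c * t_c).sum(dim=1, keepdim=True) / (p_c.pow(2).sum(dim=1, keepdim=True) + eps)
    t = t_mean - s * p_mean
    aligned = s * pred + t
    return (aligned - target).abs().mean()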
Download pretrained models
sh ./tools/download_depth_models.sh
sh ./tools/download_surface_normal_models.sh
These will download the pretrained models for depth and normals to a folder called ./pretrained_models.
Run our models on your own image
After downloading the pretrained models, you can run them on your own image with the following command:
python demo.py --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT
The --task flag should be either normal or depth. To run the script for a normal target on an example image:
python demo.py --task normal --img_path assets/demo/test1.png --output_path assets/
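The same command with --task depth should produce a depth prediction for the same example image:

python demo.py --task depth --img_path assets/demo/test1.png --output_path assets/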
Citation
If you find the code or models useful, please cite our paper:
@inproceedings{eftekhar2021omnidata,
title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={10786--10796},
year={2021}
}
If you use our latest pretrained models, please also cite the following paper:
@inproceedings{kar20223d,
title={3D Common Corruptions and Data Augmentation},
author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18963--18974},
year={2022}
}