
Download/Installation

omnitools.download is a one-line utility for rapidly downloading the starter (& similar) datasets. For more about the tools themselves (omnitools.download and omnitools.upload), please see the dedicated page.

To download the starter dataset, make sure that omnidata-tooling is installed and then run the full download command which will prompt you to accept the component licenses to proceed:

Run the following (estimated download time for [RGB + 1 task + masks]: 1 day; full dataset [30TB]: 5 days):

# Make sure everything is installed
sudo apt-get install aria2
pip install 'omnidata-tools'

# Download the 'debug' subset of the Replica and Taskonomy components of the dataset
omnitools.download rgb normals point_info \
  --components replica taskonomy \
  --subset debug \
  --dest ./omnidata_starter_dataset/ --agree-all

You should see a prompt asking you to accept the licenses of the selected components.
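If the command fails to start instead, a quick sanity check is to confirm that the prerequisites installed above actually resolved on your PATH (aria2c comes from apt, omnitools.download from the pip package):

```shell
# Check that each required tool is reachable; tally the results.
found=0
missing=0
for tool in aria2c omnitools.download; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
    found=$((found + 1))
  else
    echo "$tool: MISSING"
    missing=$((missing + 1))
  fi
done
```

If anything reports MISSING, rerun the install commands above before retrying the download.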

Examples

Here are some other examples:

Download the full Omnidata dataset and agree to licenses:

omnitools.download all --components all --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree

Download Taskonomy only:

omnitools.download all --components taskonomy --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree

Download just RGB, depth, and the valid-pixel masks from all of Omnidata, keeping the compressed files:

omnitools.download rgb depth mask_valid --components all --subset fullplus \
  --dest ./omnidata_starter_dataset/ --keep_compressed True \
  --connections_total 40 --agree

Download meshes for Clevr and keep the compressed files:

omnitools.download mesh --components clevr_simple --subset fullplus \
  --dest ./omnidata_starter_dataset/ \
  --dest_compressed ./omnidata_starter_dataset_compressed --keep_compressed True \
  --connections_total 40 --agree

Use multiple workers to download Omnidata. This runs chunk 7 of 100 (--num_chunk is zero-indexed), as a dry run:

omnitools.download all --components all --subset fullplus \
  --num_chunk 6 --num_total_chunks 100 \
  --dest ./omnidata_starter_dataset/ \
  --connections_total 40 --agree --dryrun
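When scripting a chunked download yourself, the per-worker commands can be generated in a loop. A minimal sketch, assuming one invocation per chunk and zero-indexed chunk numbers; echo is a stand-in so nothing is actually downloaded (drop it to execute):

```shell
# Print the command each of N workers should run; worker k runs chunk k.
N=100
i=0
while [ "$i" -lt "$N" ]; do
  echo omnitools.download all --components all --subset fullplus \
    --num_chunk "$i" --num_total_chunks "$N" \
    --dest ./omnidata_starter_dataset/ \
    --connections_total 40 --agree
  i=$((i + 1))
done
```

In practice you would dispatch each printed command to a different machine or job slot rather than running them all locally.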

...you get the idea :)

Command-line options

omnitools.download is quite configurable: you can choose which components, subsets, splits, and tasks to download and extract. The downloader spawns multiple workers to download the compressed files, verify each download against checksums on the server, and unpack the results. Here are the available options:

> omnitools.download -h
usage: omnitools.download [-h] [--subset {debug,tiny,medium,full,fullplus}]
                          [--split {train,val,test,all}]
                          [--components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]]
                          [--dest DEST] [--dest_compressed DEST_COMPRESSED]
                          [--keep_compressed KEEP_COMPRESSED] [--only_download ONLY_DOWNLOAD]
                          [--max_tries_per_model MAX_TRIES_PER_MODEL]
                          [--connections_total CONNECTIONS_TOTAL]
                          [--connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD]
                          [--n_workers N_WORKERS] [--num_chunk NUM_CHUNK]
                          [--num_total_chunks NUM_TOTAL_CHUNKS] [--ignore_checksum IGNORE_CHECKSUM]
                          [--dryrun] [--aria2_uri ARIA2_URI]
                          [--aria2_cmdline_opts ARIA2_CMDLINE_OPTS]
                          [--aria2_create_server ARIA2_CREATE_SERVER] [--aria2_secret ARIA2_SECRET]
                          [--agree_all]
                          domains [domains ...]

Downloads Omnidata starter dataset. --- The data is stored on the remote server in a compressed
format (.tar.gz). This function downloads the compressed files and decompresses them. Examples: download rgb
normals point_info --components clevr_simple clevr_complex --connections_total 30

positional arguments:
  domains                                         Domains to download (comma-separated or 'all')

optional arguments:
  -h, --help                                      show this help message and exit
  --subset {debug,tiny,medium,full,fullplus}      Subset to download (default: debug)
  --split {train,val,test,all}                    Split to download (default: all)
  --components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]
                                                  Component datasets to download (comma-separated)
                                                  (default: all)
  --dest DEST                                     Where to put the uncompressed data (default:
                                                  uncompressed/)
  --dest_compressed DEST_COMPRESSED               Where to download the compressed data (default:
                                                  compressed/)
  --keep_compressed KEEP_COMPRESSED               Don't delete compressed files after decompression
                                                  (default: False)
  --only_download ONLY_DOWNLOAD                   Only download compressed data (default: False)
  --max_tries_per_model MAX_TRIES_PER_MODEL       Number of times to try to download model if
                                                  checksum fails. (default: 3)
  --connections_total CONNECTIONS_TOTAL           Number of simultaneous aria2c connections overall
                                                  (note: if not using the RPC server, this is per-
                                                  worker) (default: 8)
  --connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD
                                                  Number of simultaneous aria2c connections per
                                                  server per download. Defaults to
                                                  'total_connections' (note: if not using the RPC
                                                  server, this is per-worker)
  --n_workers N_WORKERS                           Number of workers to use (default: 32)
  --num_chunk NUM_CHUNK                           Download the kth slice of the overall dataset
                                                  (default: 0)
  --num_total_chunks NUM_TOTAL_CHUNKS             Download the dataset in N total chunks. Use with '
                                                  --num_chunk' (default: 1)
  --ignore_checksum IGNORE_CHECKSUM               Ignore checksum validation (default: False)
  --dryrun                                        List what would be downloaded without downloading
                                                  it (default: False)
  --aria2_uri ARIA2_URI                           Location of aria2c RPC (if None, use CLI)
                                                  (default: http://localhost:6800)
  --aria2_cmdline_opts ARIA2_CMDLINE_OPTS         Opts to pass to aria2c (default: )
  --aria2_create_server ARIA2_CREATE_SERVER       Create a RPC server at aria2_uri (default: True)
  --aria2_secret ARIA2_SECRET                     Secret for aria2c RPC (default: )
  --agree_all                                     Agree to all license clickwraps. (default: False)
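The flags above also compose into a two-phase workflow: fetch the archives first with --only_download, then rerun without it later to unpack. A sketch built only from the options documented above (the paths are illustrative, and echo keeps this a dry run; remove it to execute):

```shell
# Phase 1: download the compressed archives only; decompression happens in a
# later run of the same command without --only_download.
fetch_cmd="omnitools.download rgb --components replica --subset debug \
  --dest ./omnidata_starter_dataset/ \
  --dest_compressed ./omnidata_starter_dataset_compressed/ \
  --only_download True --agree_all"
echo "$fetch_cmd"
```

This is useful when the download and the (disk-hungry) decompression should happen on different machines or at different times.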

Citation

If you find the code or models useful, please cite our paper:

@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}