Tools for efficiently downloading, uploading, and moving annotator data. Install with: pip install 'omnidata-tools'

omnitools.download

omnitools.download is a one-line utility for rapidly downloading the starter dataset and similarly formatted datasets.

The download tool is designed to be fast and easy to use (it is built on aria2). We regularly see 70 MB/s when downloading from the EPFL servers to Berkeley. It is written fairly generally, so it can also download other datasets stored in the same format as the annotator outputs, such as Taskonomy.
Note: There's also an inverse omnitools.upload for uploading an annotator-generated dataset to a server.
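
As a rough illustration, the example invocation from the tool's own help text can be assembled from Python. This is only a sketch: build_download_command is a hypothetical helper, not part of omnidata-tools, and it uses only flags that appear in the help output (--components, --subset, --split, --dest, --connections_total). It builds the command without running it.

```python
import shlex


def build_download_command(domains, components=("all",), subset="debug",
                           split="all", dest="uncompressed/",
                           connections_total=8):
    """Build an argv list for omnitools.download (hypothetical helper).

    Defaults mirror the defaults printed in the tool's help output.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    return [
        "omnitools.download",
        *domains,                      # positional arguments come first
        "--components", *components,
        "--subset", subset,
        "--split", split,
        "--dest", dest,
        "--connections_total", str(connections_total),
    ]


# Same request as the example embedded in the help text.
cmd = build_download_command(
    ["rgb", "normals", "point_info"],
    components=["clevr_simple", "clevr_complex"],
    connections_total=30,
)
print(shlex.join(cmd))
```

To start the download, the list could be passed to subprocess.run(cmd, check=True). Note that the tool may prompt for license agreement unless --agree_all is given.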

Here is the help output for the tool:

> omnitools.download -h
usage: omnitools.download [-h] [--subset {debug,tiny,medium,full,fullplus}]
                          [--split {train,val,test,all}]
                          [--components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]]
                          [--dest DEST] [--dest_compressed DEST_COMPRESSED]
                          [--keep_compressed KEEP_COMPRESSED] [--only_download ONLY_DOWNLOAD]
                          [--max_tries_per_model MAX_TRIES_PER_MODEL]
                          [--connections_total CONNECTIONS_TOTAL]
                          [--connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD]
                          [--n_workers N_WORKERS] [--num_chunk NUM_CHUNK]
                          [--num_total_chunks NUM_TOTAL_CHUNKS] [--ignore_checksum IGNORE_CHECKSUM]
                          [--dryrun] [--aria2_uri ARIA2_URI]
                          [--aria2_cmdline_opts ARIA2_CMDLINE_OPTS]
                          [--aria2_create_server ARIA2_CREATE_SERVER] [--aria2_secret ARIA2_SECRET]
                          [--agree_all]
                          domains [domains ...]

Downloads Omnidata starter dataset. --- The data is stored on the remote server in a compressed
format (.tar.gz). This function downloads the compressed and decompresses it. Examples: download rgb
normals point_info --components clevr_simple clevr_complex --connections_total 30

positional arguments:
  domains                                         Domains to download (comma-separated or 'all')

optional arguments:
  -h, --help                                      show this help message and exit
  --subset {debug,tiny,medium,full,fullplus}      Subset to download (default: debug)
  --split {train,val,test,all}                    Split to download (default: all)
  --components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]
                                                  Component datasets to download (comma-separated)
                                                  (default: all)
  --dest DEST                                     Where to put the uncompressed data (default:
                                                  uncompressed/)
  --dest_compressed DEST_COMPRESSED               Where to download the compressed data (default:
                                                  compressed/)
  --keep_compressed KEEP_COMPRESSED               Don't delete compressed files after decompression
                                                  (default: False)
  --only_download ONLY_DOWNLOAD                   Only download compressed data (default: False)
  --max_tries_per_model MAX_TRIES_PER_MODEL       Number of times to try to download model if
                                                  checksum fails. (default: 3)
  --connections_total CONNECTIONS_TOTAL           Number of simultaneous aria2c connections overall
                                                  (note: if not using the RPC server, this is per-
                                                  worker) (default: 8)
  --connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD
                                                  Number of simulatneous aria2c connections per
                                                  server per download. Defaults to
                                                  'total_connections' (note: if not using the RPC
                                                  server, this is per-worker)
  --n_workers N_WORKERS                           Number of workers to use (default: 32)
  --num_chunk NUM_CHUNK                           Download the kth slice of the overall dataset
                                                  (default: 0)
  --num_total_chunks NUM_TOTAL_CHUNKS             Download the dataset in N total chunks. Use with '
                                                  --num_chunk' (default: 1)
  --ignore_checksum IGNORE_CHECKSUM               Ignore checksum validation (default: False)
  --dryrun                                        Keep compressed files even after decompressing
                                                  (default: False)
  --aria2_uri ARIA2_URI                           Location of aria2c RPC (if None, use CLI)
                                                  (default: http://localhost:6800)
  --aria2_cmdline_opts ARIA2_CMDLINE_OPTS         Opts to pass to aria2c (default: )
  --aria2_create_server ARIA2_CREATE_SERVER       Create a RPC server at aria2_uri (default: True)
  --aria2_secret ARIA2_SECRET                     Secret for aria2c RPC (default: )
  --agree_all                                     Agree to all license clickwraps. (default: False)
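
The --num_chunk and --num_total_chunks options suggest a way to split one download across several machines or jobs. The sketch below generates one command per chunk. It assumes chunk indices are 0-based (the help lists a default of 0 alongside a default of 1 total chunk); chunked_download_commands is a hypothetical helper, not part of omnidata-tools, and the commands are only built, not executed.

```python
def chunked_download_commands(domains, num_total_chunks, extra_args=()):
    """Return one omnitools.download argv list per chunk (hypothetical helper).

    Assumption: --num_chunk is 0-based, so valid values are
    0 .. num_total_chunks - 1.
    """
    if num_total_chunks < 1:
        raise ValueError("num_total_chunks must be at least 1")
    if not domains:
        raise ValueError("at least one domain is required")
    return [
        [
            "omnitools.download",
            *domains,
            *extra_args,
            "--num_chunk", str(k),
            "--num_total_chunks", str(num_total_chunks),
        ]
        for k in range(num_total_chunks)
    ]


# Four jobs, each fetching a different slice of the 'tiny' subset.
cmds = chunked_download_commands(
    ["rgb", "normals"], 4, extra_args=["--subset", "tiny"]
)
for argv in cmds:
    print(" ".join(argv))
```

Each generated command could then be submitted as a separate job, for example one per node in a cluster.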

omnitools.upload

TODO