Tools to efficiently download/upload/move annotator data.
pip install 'omnidata-tools'
omnitools.download is a one-line utility for rapidly downloading the starter (& similar) datasets.
The download tool is designed to be fast and easy to use (it's built on top of aria2). We regularly see speeds around 70 MB/s when downloading from the EPFL servers to Berkeley. It's also written generally enough that it can download other datasets stored in the same format (i.e. datasets laid out like annotator outputs, such as Taskonomy).
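For instance, a typical invocation might look like the sketch below. The domain names and flags all come from the help output further down; the destination path is just a placeholder:

> omnitools.download rgb normals point_info \
    --components clevr_simple clevr_complex \
    --subset debug \
    --dest ./omnidata_starter_dataset/ \
    --agree_all

The --agree_all flag skips the per-component license prompts (see the full option list below).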
Note: There's also an inverse tool, omnitools.upload, for uploading an annotator-generated dataset to a server.
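Assuming it exposes the same argparse-style interface as the download tool (an assumption, not shown here), its options can be listed the same way:

> omnitools.upload -h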
Here is the man page for the tool:
> omnitools.download -h
usage: omnitools.download [-h] [--subset {debug,tiny,medium,full,fullplus}]
                          [--split {train,val,test,all}]
                          [--components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]]
                          [--dest DEST] [--dest_compressed DEST_COMPRESSED]
                          [--keep_compressed KEEP_COMPRESSED]
                          [--only_download ONLY_DOWNLOAD]
                          [--max_tries_per_model MAX_TRIES_PER_MODEL]
                          [--connections_total CONNECTIONS_TOTAL]
                          [--connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD]
                          [--n_workers N_WORKERS] [--num_chunk NUM_CHUNK]
                          [--num_total_chunks NUM_TOTAL_CHUNKS]
                          [--ignore_checksum IGNORE_CHECKSUM] [--dryrun]
                          [--aria2_uri ARIA2_URI]
                          [--aria2_cmdline_opts ARIA2_CMDLINE_OPTS]
                          [--aria2_create_server ARIA2_CREATE_SERVER]
                          [--aria2_secret ARIA2_SECRET] [--agree_all]
                          domains [domains ...]

Downloads Omnidata starter dataset.
---
The data is stored on the remote server in a compressed format (.tar.gz).
This function downloads the compressed data and decompresses it.

Examples:
  download rgb normals point_info --components clevr_simple clevr_complex --connections_total 30

positional arguments:
  domains               Domains to download (comma-separated or 'all')

optional arguments:
  -h, --help            show this help message and exit
  --subset {debug,tiny,medium,full,fullplus}
                        Subset to download (default: debug)
  --split {train,val,test,all}
                        Split to download (default: all)
  --components {all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} [{all,replica,taskonomy,gso_in_replica,hypersim,blendedmvs,hm3d,clevr_simple,clevr_complex} ...]
                        Component datasets to download (comma-separated)
                        (default: all)
  --dest DEST           Where to put the uncompressed data (default:
                        uncompressed/)
  --dest_compressed DEST_COMPRESSED
                        Where to download the compressed data (default:
                        compressed/)
  --keep_compressed KEEP_COMPRESSED
                        Don't delete compressed files after decompression
                        (default: False)
  --only_download ONLY_DOWNLOAD
                        Only download compressed data (default: False)
  --max_tries_per_model MAX_TRIES_PER_MODEL
                        Number of times to try to download model if checksum
                        fails. (default: 3)
  --connections_total CONNECTIONS_TOTAL
                        Number of simultaneous aria2c connections overall
                        (note: if not using the RPC server, this is per-
                        worker) (default: 8)
  --connections_per_server_per_download CONNECTIONS_PER_SERVER_PER_DOWNLOAD
                        Number of simultaneous aria2c connections per server
                        per download. Defaults to 'total_connections' (note:
                        if not using the RPC server, this is per-worker)
  --n_workers N_WORKERS
                        Number of workers to use (default: 32)
  --num_chunk NUM_CHUNK
                        Download the kth slice of the overall dataset
                        (default: 0)
  --num_total_chunks NUM_TOTAL_CHUNKS
                        Download the dataset in N total chunks. Use with
                        '--num_chunk' (default: 1)
  --ignore_checksum IGNORE_CHECKSUM
                        Ignore checksum validation (default: False)
  --dryrun              Keep compressed files even after decompressing
                        (default: False)
  --aria2_uri ARIA2_URI
                        Location of aria2c RPC (if None, use CLI) (default:
                        http://localhost:6800)
  --aria2_cmdline_opts ARIA2_CMDLINE_OPTS
                        Opts to pass to aria2c (default: )
  --aria2_create_server ARIA2_CREATE_SERVER
                        Create a RPC server at aria2_uri (default: True)
  --aria2_secret ARIA2_SECRET
                        Secret for aria2c RPC (default: )
  --agree_all           Agree to all license clickwraps. (default: False)
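For large downloads, one possible pattern (a sketch using only the flags documented above; the component names come from the choices list, and the paths are placeholders) is to split the work across several machines with --num_chunk / --num_total_chunks:

# Run on machine k of 4 (k = 0..3), changing --num_chunk accordingly.
> omnitools.download rgb normals \
    --subset fullplus --split train \
    --components replica taskonomy hypersim \
    --num_chunk 0 --num_total_chunks 4 \
    --connections_total 30 \
    --dest /data/omnidata/uncompressed/ \
    --dest_compressed /data/omnidata/compressed/ \
    --agree_all

Each machine then downloads and decompresses only its own slice of the dataset.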