$ tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz
It would be great if it were this simple!
The command failed because I ran out of the 41 GB of free disk space before the archive was fully expanded.
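A failure like this can be anticipated by comparing the uncompressed size of the archive with the free space on the target disk. The following is a minimal sketch under my own assumptions: `archive_fits` is a hypothetical helper name, and the size of the uncompressed tar stream is only an approximation of what the extracted files will occupy on disk.

```shell
# Hypothetical helper: succeed only if the uncompressed tar stream of "$1"
# fits into the free space of the filesystem that holds "$2".
archive_fits() {
    local archive="$1" dest="$2" need_kb free_kb
    # Size of the uncompressed tar stream in KiB. This decompresses the
    # whole archive once but writes nothing to disk.
    need_kb=$(( $(gzip -dc "$archive" | wc -c) / 1024 + 1 ))
    # Available space in KiB; -P forces the portable output format,
    # where the fourth column of the second line is the free space.
    free_kb=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
    echo "need ${need_kb} KiB, free ${free_kb} KiB"
    [ "$need_kb" -le "$free_kb" ]
}
```

It could be called as `archive_fits BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz . && tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz`. For a very large archive the check itself takes a while, since the whole file has to be decompressed to count the bytes.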
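Alternatively, I considered going one directory at a time,
$ tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz directory_path
with a script that traverses the directories. This way I can keep track of which directories were correctly expanded. Note that directory_path has to be written exactly as it appears in the archive listing (tar tzf), which usually means without a leading slash.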
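Such a script could look roughly like the sketch below. It is not the exact script I used: `extract_by_directory` and the log file are hypothetical names, and it assumes the top-level entries of the archive are the directories to extract one by one.

```shell
# Hypothetical helper: extract "$1" into "$2" one top-level entry at a time,
# appending each successfully extracted entry to the log file "$3" so that
# a rerun skips what is already done.
extract_by_directory() {
    local archive="$1" dest="$2" done_log="$3" list dir
    mkdir -p "$dest" || return 1
    touch "$done_log" || return 1
    list=$(mktemp) || return 1

    # Collect the unique top-level entries, keeping a leading "./" if the
    # archive stores its member names that way.
    tar -tzf "$archive" |
        awk -F/ '{ if ($1 == ".") { if ($2 != "") print "./" $2 } else if ($1 != "") print $1 }' |
        sort -u > "$list"

    while IFS= read -r dir; do
        # Skip entries already recorded as extracted.
        if grep -Fxq -e "$dir" "$done_log"; then
            continue
        fi
        if tar -xzf "$archive" -C "$dest" "$dir"; then
            printf '%s\n' "$dir" >> "$done_log"
        else
            echo "failed to extract: $dir" >&2
            rm -f "$list"
            return 1
        fi
    done < "$list"
    rm -f "$list"
}
```

A call such as `extract_by_directory BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz /mnt/disk1/data extracted.log` can be repeated with a different destination once a disk fills up (the `/mnt/disk1/data` path is only an example). The trade-off is speed: a .tar.gz has no index, so every `tar` call reads the archive from the beginning again.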
At this point I had ended up with multiple directories on various disks, so a directory merging tool is very useful:
# parameters:
# -a, --archive; recurse into directories and preserve permissions, times and symlinks
# -i, --itemize-changes; print a change summary for each file
# -h, --human-readable; print sizes in human-readable units
# -W, --whole-file; copy whole files, skipping the delta-transfer algorithm
# --progress; show progress in the terminal
# --log-file=XYZ.log; log the progress to a file, which might be useful when resuming
$ rsync -aihW --progress --log-file=XYZ.log source_directory/ destination_directory/
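With partial extractions spread over several disks, the same rsync call can be repeated for each source. The sketch below is one way to do that, with `merge_directories` as a hypothetical helper name; it uses only the -a and -W options described above.

```shell
# Hypothetical helper: merge the contents of every source directory
# ("$2", "$3", ...) into the single destination directory "$1".
merge_directories() {
    local dest="$1" src
    shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        # The trailing slash on the source makes rsync copy the contents
        # of the directory rather than the directory itself.
        rsync -aW "${src%/}/" "$dest/" || return 1
    done
}
```

For example, `merge_directories /mnt/big/dataset /mnt/disk1/data /mnt/disk2/data`, where all three paths are placeholders. If the same file exists in more than one source with a different size or modification time, the copy from the later source replaces the earlier one.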
References:
- https://www.thegeekstuff.com/2010/04/unix-tar-command-examples/
- https://medium.com/@sethgoldin/a-gentle-introduction-to-rsync-a-free-powerful-tool-for-media-ingest-86761ca29c34