A very BIG ML dataset un-TAR GZIP command

I learned that none of my GUI Mac programs could expand the 13 GB dataset; the command line, however, had no problem starting on it.

$ tar xvfz BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz

It would be great if it were this simple!

The command failed: I ran out of my 41 GB of free disk space before the archive finished expanding.
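A check like the following could have caught this before the extraction started. It is only a sketch: the function names `uncompressed_bytes` and `free_bytes` are my own, not part of any tool. It streams the archive through `gzip -dc` and counts the bytes, because the size reported by `gzip -l` is stored in a 32-bit field and is not reliable for archives that expand beyond 4 GB.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any tool).

# Print the uncompressed size of a .gz file in bytes.
# The data is streamed to wc, so nothing is written to disk.
uncompressed_bytes() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# Print the free space, in bytes, on the filesystem that holds
# the given directory. df -Pk reports available space in 1 KB blocks.
free_bytes() {
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Example use (commented out; the archive name is the one used above):
# need=$(uncompressed_bytes BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz)
# have=$(free_bytes .)
# [ "$need" -lt "$have" ] || echo "not enough free space" >&2
```

Counting the bytes means decompressing the whole archive once, so on a 13 GB file this takes a while, but it costs no disk space. The figure is the size of the tar stream; the space the files take once on disk can be somewhat larger when there are many small files.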

Alternatively, I considered extracting one directory at a time,

$ tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz directory_path

using a script that traverses the directories. Note that directory_path is the path as stored inside the archive, which normally has no leading slash. This way I can keep track of which directories were expanded correctly.
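Such a script might look like the sketch below. The function name `extract_by_dir` and the completion log are my own inventions for illustration, and it assumes the archive stores its members without a leading `./` prefix. It lists the top-level entries with `tar -tzf`, extracts them one by one, and records each success so that a rerun skips what is already done.

```shell
#!/bin/sh
# Hypothetical helper: extract_by_dir ARCHIVE DEST DONE_LOG
# Extracts each top-level entry of ARCHIVE into DEST, one at a time,
# appending the entry name to DONE_LOG after a successful extraction.
# Assumes member names in the archive do not start with "./".
extract_by_dir() {
    archive=$1
    dest=$2
    done_log=$3

    mkdir -p "$dest"
    touch "$done_log"

    # List every member, keep the first path component, de-duplicate.
    tar -tzf "$archive" | cut -d/ -f1 | sort -u |
    while IFS= read -r dir; do
        [ -n "$dir" ] || continue

        # Skip entries that an earlier run already finished.
        if grep -Fxq -- "$dir" "$done_log"; then
            continue
        fi

        if tar -xzf "$archive" -C "$dest" "$dir"; then
            printf '%s\n' "$dir" >> "$done_log"
        else
            echo "failed: $dir" >&2
        fi
    done
}

# Example use (commented out; destination and log names are placeholders):
# extract_by_dir BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz /Volumes/disk2/dataset done.log
```

The trade-off is speed: a gzip stream cannot be read from the middle, so every pass decompresses the archive from the start. With many thousands of folders it may be more practical to extract in larger batches, or to point different runs at different disks.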

At this point I had ended up with multiple directories on various disks, so a directory-merging tool is very useful:

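# parameters (the command below uses only -a and -W; the rest are optional):
# -a, --archive; copy recursively, preserving permissions, times and links
# -i, --itemize-changes; print a summary of the changes made to each file
# -h, --human-readable; print sizes in human-readable units
# -W, --whole-file; copy whole files instead of computing file deltas
# --progress; show progress in the terminal
# --log-file=XYZ.log; log the progress to a file, which might be useful when resuming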
$ rsync -aW source_directory/ destination_directory/
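For a merge across disks it may be worth previewing the result first. The sketch below wraps the same rsync call in a hypothetical function, `merge_dirs` (my name, not rsync's), and adds `--dry-run`, a standard rsync flag that is not used above, to list what would be copied before anything is written. Whether `--log-file` is available depends on the rsync version installed, so treat that part as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: merge_dirs SOURCE DEST LOG
# First shows what rsync would change, then performs the merge
# and writes a log file that can be inspected if the copy is interrupted.
merge_dirs() {
    src=$1
    dst=$2
    log=$3

    # Preview only: --dry-run makes no changes to DEST.
    rsync -aihW --dry-run "$src"/ "$dst"/ || return 1

    # Real run, with progress on the terminal and a log on disk.
    rsync -aihW --progress --log-file="$log" "$src"/ "$dst"/
}

# Example use (commented out):
# merge_dirs source_directory destination_directory merge.log
```

The trailing slash after the source matters: it tells rsync to copy the contents of the source directory into the destination rather than creating a nested copy of the directory itself.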


  • https://www.thegeekstuff.com/2010/04/unix-tar-command-examples/
  • https://medium.com/@sethgoldin/a-gentle-introduction-to-rsync-a-free-powerful-tool-for-media-ingest-86761ca29c34
