Tarsplit: A Utility to Split Tarballs Into Multiple Parts

But what does Docker have to do with tar?

“Good tea. Nice house.”

How to install Tarsplit

If you’re running Homebrew on a Mac or Linux, here’s how to install Tarsplit:

How Tarsplit Works

Python ships with a module called tarfile, which provides a high-level interface to tarballs. I used that module to read the contents of the tarball being split, divide those contents into chunks of roughly equal size, and write each chunk out as its own tarball. All of this happens in a single thread.
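To make that concrete, here is a minimal sketch of the idea using nothing but tarfile. It is not Tarsplit's actual source: the function name `split_tarball`, the output naming scheme, and the "start a new part once the byte target is reached" rule are all assumptions made for illustration.

```python
import tarfile


def split_tarball(source, num_parts, prefix):
    """Split the tarball at `source` into at most `num_parts` gzipped tarballs.

    Hypothetical helper for illustration only, not Tarsplit's real code.
    Returns the list of part filenames that were written.
    """
    part_paths = []
    with tarfile.open(source, "r:*") as src:
        members = src.getmembers()
        # Ideal number of payload bytes per part (headers are ignored).
        target = sum(m.size for m in members) / num_parts

        part, written = None, 0
        try:
            for member in members:
                # Open a new part if none is open yet, or if the current one
                # has reached its target and we may still open another.
                if part is None or (written >= target and len(part_paths) < num_parts):
                    if part is not None:
                        part.close()
                    path = f"{prefix}-part-{len(part_paths) + 1:03d}.tar.gz"
                    part = tarfile.open(path, "w:gz")
                    part_paths.append(path)
                    written = 0
                # Only regular files carry data; directories and links are
                # copied as header-only entries.
                data = src.extractfile(member) if member.isfile() else None
                part.addfile(member, data)
                written += member.size
        finally:
            if part is not None:
                part.close()
    return part_paths
```

Because a single file is never broken across parts, the parts only come out "close to" equal: one very large file can make its part noticeably bigger than the others.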

Why not use multi-threading?

Yeah, I tried that after release 1.0. It turns out that even when using every trick I knew, a multithreaded approach consisting of one thread per chunk to be written was slower than just doing everything in a single thread. I observed this on a 10-core machine with an SSD, so I’m just gonna go ahead and point the finger at the GIL and remind myself that threading in Python is cursed.
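For readers curious what "one thread per chunk" might look like, here is a rough sketch. It is a guess at the general shape of that approach, not the code that was actually benchmarked, and the names `plan_chunks`, `write_chunk`, and `split_threaded` are hypothetical.

```python
import tarfile
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(source, num_parts):
    """Group member names into at most `num_parts` lists of similar payload size."""
    with tarfile.open(source, "r:*") as src:
        members = src.getmembers()
    target = sum(m.size for m in members) / num_parts
    chunks, current, written = [], [], 0
    for member in members:
        # Close the current chunk once it reaches the target, but always
        # leave room for one final chunk to absorb the remainder.
        if current and written >= target and len(chunks) < num_parts - 1:
            chunks.append(current)
            current, written = [], 0
        current.append(member.name)
        written += member.size
    if current:
        chunks.append(current)
    return chunks


def write_chunk(source, names, path):
    """Copy the named members from `source` into a new tarball at `path`."""
    # Each thread opens its own handle on the source, since a TarFile
    # object keeps a single file position and is not meant to be shared.
    with tarfile.open(source, "r:*") as src, tarfile.open(path, "w:gz") as out:
        for name in names:
            member = src.getmember(name)
            data = src.extractfile(member) if member.isfile() else None
            out.addfile(member, data)
    return path


def split_threaded(source, num_parts, prefix):
    """Write every chunk in its own worker thread and return the part paths."""
    chunks = plan_chunks(source, num_parts)
    paths = [f"{prefix}-part-{i:03d}.tar.gz" for i in range(1, len(chunks) + 1)]
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        return list(pool.map(write_chunk, [source] * len(chunks), chunks, paths))
```

Note that in this sketch every worker has to open and scan the source tarball for itself, so some work is repeated per thread. Whether that, the GIL, or something else dominates will depend on the machine and the tarball.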

Tarsplit In Action

The syntax of Tarsplit is fairly straightforward:

Tarsplit With Docker

How do things look in Docker? This is what I now see in Docker while pushing the Splunk Lab image:

In Closing

I hope you find this utility useful. I had fun writing it, and I enjoy the ability to make my Docker images just a little more manageable.



Douglas Muth


Engineer. AWS, CyberSec, DMARC, Docker, Splunk, White Mage. Staffs way too many cons. he/him. 28% Cheetah.