As I was trying to parallelize a bash script today, quite unexpectedly, I had more fun than I was hoping for. It was a bumpy ride, so please follow the story from the safety of a comfy chair, basking in the glow of your favorite terminal emulator.
So I was sitting there, facing off against a small snippet of bash code that screamed to be optimized. Truthfully, what was actually screaming at me was the fact that the last stage in a Go pipeline was taking some 27 minutes.
Here’s the infamous snippet:
Actually, not that bad. OK, the $host value could be passed as an argument to the function r_refresh_host rather than being used implicitly, but all in all not too shabby.
Anyway, this can clearly be optimized by connecting to the nodes in parallel. So let’s do just that:
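For illustration, here’s a minimal sketch of that kind of change. It is not the script from the pipeline: the host names and the body of r_refresh_host are made-up placeholders, and the host is passed in as an argument instead of being picked up implicitly.

```shell
#!/usr/bin/env bash

# Placeholder for the real function; here it only reports which host it got.
r_refresh_host() {
  echo "refreshed $1"
}

refresh_all() {
  # Start one background job per host (hypothetical host names)...
  for host in node1 node2 node3; do
    r_refresh_host "$host" &
  done
  # ...then block until every background job has finished.
  wait
}

refresh_all
```

The order of the output lines is not guaranteed, since the jobs run concurrently.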
Trivial change and voilà! The deed is done. Well, not really. The snippet is embedded in a script that declares set -e. I would not have thought anything of it had a colleague not mentioned a few days prior that the failure of a subshell does not kill the main script. That’s not the behaviour I wanted, as I’d like all the nodes to be properly updated.
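To make the problem concrete, here’s a small sketch of how it can be handled in plain bash. It assumes a hypothetical refresh function that fails for one made-up host. A bare wait with no arguments returns 0 no matter how the jobs ended, so the sketch remembers each PID and waits on them one by one.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical job: succeeds for every host except node2.
refresh() {
  [ "$1" != "node2" ]
}

pids=""
for host in node1 node2 node3; do
  refresh "$host" &      # a failure in here does not trigger set -e in the main script
  pids="$pids $!"        # remember the PID of the job just started
done

failed=0
for pid in $pids; do
  # `wait PID` returns that job's exit status; the `||` keeps set -e from
  # aborting the script before the remaining jobs have been checked.
  wait "$pid" || failed=$((failed + 1))
done

echo "failed jobs: ${failed}"
```

This prints "failed jobs: 1". A real script would probably exit with a non-zero status at this point instead of just reporting the count.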
The same colleague told me that he uses something called parallel to solve similar issues.
Enter GNU Parallel
GNU Parallel is “a shell tool for executing jobs in parallel”. It even has quite a nice logo.
Thus began my suffering.
I played with parallel on my Mac after installing it via Homebrew. It worked nicely. The only thing that annoyed me a bit was that I needed the --no-notice flag to silence some undesired output.
To be truthful, I fumbled a bit to get there as I’m not a bash expert™. But that’s less important for the story.
After running the Go pipeline, in a test stage that rolls out the change to only one node, I found out that the parallel version I had just installed on that Ubuntu machine does not support the --no-notice flag. Oh, well… The flag went away.
When I retriggered the pipeline, the final stage, which rolls out the configuration to all nodes, failed. This was confusing, as the previous stage that tests the rollout to one node had succeeded. What was going on?
I dug deeper, and it turned out that not all the nodes had the same version of GNU parallel. Some of them (version 20120422) worked just fine, but others (version 20121122) failed miserably. More concretely, even the simplest command:
\n depending on the version in use.
While checking the version, there was a somewhat random message:
Not knowing the Tollef fellow, I added the --gnu flag.
The (happy?) end
And it worked.
Here’s the final one-liner:
Somehow you’d think that it’s not necessary to use a --gnu flag in something called GNU parallel. But life is full of surprises.
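For the curious, here’s a rough sketch of what an invocation with that flag can look like. It is not the one-liner from the pipeline: the host names are invented and echo stands in for the real per-host work. The sketch first checks that the parallel on the PATH really is GNU parallel, since other tools ship a binary with the same name.

```shell
#!/usr/bin/env bash

status=0
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # --gnu asks for GNU behaviour even on installs that default to --tollef.
  # {} is replaced by each argument listed after ::: (hypothetical hosts).
  # The exit status is the number of jobs that failed, 0 if none did.
  parallel --gnu echo refreshed {} ::: node1 node2 node3 || status=$?
else
  echo "GNU parallel not found, skipping" >&2
fi

echo "parallel exit status: ${status}"
```

Because parallel reports failed jobs through its exit status, a script running under set -e stops when any of the jobs fails, which is the behaviour the background-job version lacked.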
To be fair, in the end parallel does the job. And it is better than solving the same problem with additional bash logic. I definitely plan on using it again. Check it out, maybe you should too. Just don’t forget to use a flag or two.
As a side note, this particular Go stage went from 27 min to 4.5 min. All is well that ends well.
One reader was kind enough to explain why the issue happens.
Maybe something else to mention: I’m a big fan of GNU and respect everything it stands for. So take this post as intended and laugh a bit. :)