Ivan Kusalic - home page

GNU Parallel - the Adventure

As I was trying to parallelize a bash script today, I quite unexpectedly had more fun[1] than I was hoping for. It was a bumpy ride, so please follow the story from the safety of a comfy chair, basking in the glow of your favorite terminal emulator.

The task

So I was sitting there, facing off against a small snippet of bash code that screamed to be optimized. Truthfully, what was actually screaming at me was the fact that the last stage in a Go pipeline was taking some 27 minutes.

Here’s the infamous snippet:

original snippet
for host in ${hosts}; do
    echo "Refreshing: ${host}"
    r_refresh_host
    echo
done

Actually, not that bad. OK, the $host value could be passed as an argument to the function r_refresh_host rather than being used implicitly, but all in all not too shabby.
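For illustration, here is a minimal sketch of that explicit-argument variant. The body of r_refresh_host below is a hypothetical stand-in (the real function is not shown in this post) and the host names are made up; only the calling convention is the point.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real r_refresh_host, which is not shown here.
# It takes the host as its first argument instead of reading a global ${host}.
r_refresh_host() {
    local host="$1"
    echo "would refresh ${host}"
}

# Assumed for this sketch: a whitespace-separated list of host names.
hosts="node1 node2 node3"

# ${hosts} is intentionally left unquoted so the shell splits it into words.
for host in ${hosts}; do
    echo "Refreshing: ${host}"
    r_refresh_host "${host}"
    echo
done
```

With the host passed explicitly, the function no longer depends on a variable that happens to be set by the surrounding loop.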

So what does this snippet actually do? It connects to 13 nodes and updates the configuration using Puppet. If that floats your boat, imagine using Chef instead as it is not important for the story.

Anyway, this can clearly be optimized by connecting to the nodes in parallel. So let’s do just that:

snippet with subshells
for host in ${hosts}; do
    logfile=refresh_${host}.log
    echo "Refreshing: ${host}, log: ${logfile}"
    ( r_refresh_host > "${logfile}" ) &
done

wait

A trivial change and voilà! The deed is done. Well, not really. The snippet is embedded in a script that declares set -e. I would not have thought anything of it had a colleague not mentioned a few days prior that the failure of a background subshell does not kill the main script. That’s not the behaviour I wanted, as I’d like all the nodes to be properly updated and the script to fail if any of them is not.
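To make the problem concrete, here is a small sketch (not the approach this post ends up using). It shows that a bare wait hides a failed background job even under set -e, while waiting on each PID individually exposes it. The fake_refresh function, its failure condition, and the host names are all made up for illustration.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical job: pretend that refreshing "node2" fails.
fake_refresh() {
    [ "$1" != "node2" ]
}

hosts="node1 node2 node3"

# Variant 1: a bare `wait` returns 0 once all background jobs have finished,
# regardless of their exit statuses, so `set -e` never triggers.
for host in ${hosts}; do
    ( fake_refresh "${host}" ) &
done
wait
echo "bare wait: script still running"

# Variant 2: remember each PID and wait on it individually;
# `wait PID` returns the exit status of that particular job.
pids=""
for host in ${hosts}; do
    ( fake_refresh "${host}" ) &
    pids="${pids} $!"
done

failed=0
for pid in ${pids}; do
    # The `||` keeps `set -e` from aborting before all jobs are collected.
    wait "${pid}" || failed=$((failed + 1))
done
echo "failed jobs: ${failed}"
```

The sketch only counts the failures. A real script would presumably exit with a non-zero status when the count is greater than zero, which is exactly the kind of additional bash logic a dedicated tool can take off your hands.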

The same colleague told me that he uses something called parallel to solve similar issues.

Enter GNU Parallel

GNU Parallel is “a shell tool for executing jobs in parallel”. It even has quite a nice logo:

[Image: GNU Parallel logo]

Thus began my suffering.

I played with parallel on my Mac after installing it via Homebrew. It worked nicely. The only thing that annoyed me a bit was that I needed to use the --no-notice flag to silence some undesired output. The snippet:

snippet using parallel on the mac
echo "${hosts}" | parallel --no-notice -P 25 '. ./scripts/utils.sh; host={}; logfile=refresh_{}.log; r_refresh_host > ${logfile}'

To be truthful, I fumbled a bit to get there as I’m not a bash expert™. But that’s less important for the story.

After running the Go pipeline, in a test stage that rolls out the change to only one node, I found out that the parallel version I had just installed on that Ubuntu machine does not support the --no-notice flag. Oh, well… The flag went away.

When I retriggered the pipeline, the final stage, which rolls out the configuration to all nodes, failed. This was confusing, as the previous stage, which tests the rollout to one node, had succeeded. What is going on?

The twist

I dug deeper and it turned out that not all the nodes had the same version of GNU parallel. Some of them (version 20120422) worked just fine, but others (version 20121122) failed miserably. More concretely, even the simplest command:

echo 42 | parallel echo

printed either 42 or just an empty line (\n), depending on the version in use.

While checking the version, there was a somewhat random message:

WARNING: YOU ARE USING --tollef. USE --gnu FOR GNU PARALLEL

Not knowing the Tollef fellow, I added the --gnu flag.

The (happy?) end

And it worked.

Here’s the final one-liner:

final snippet using parallel
echo "${hosts}" | parallel --gnu -P 25 '. ./scripts/utils.sh; host={}; logfile=refresh_{}.log; r_refresh_host > ${logfile}'

Somehow you’d think that it’s not necessary to use a --gnu flag in something called GNU parallel. But life is full of surprises.

To be fair, in the end parallel does the job. And it is better than solving the same problem with additional bash logic. I definitely plan on using it again. Check it out, maybe you should too. Just don’t forget to use a flag or two.

As a side note, this particular Go stage went from 27 min to 4.5 min. All is well that ends well.

EDIT:

One reader was kind enough to explain why the issue happens.

Maybe something else to mention: I’m a big fan of GNU and respect everything it stands for. So take this post as intended and laugh a bit. :)


  1. As in Dwarf Fortress fun