How To Run Multiple Commands In Parallel on Linux

Sarath Pillai's picture
Parallel Execution of Commands in Linux

Traditionally, a computer can do only one thing at a time. On a single-core machine (a computer with just one CPU), there is no true multitasking: the computer only gives you the illusion that multiple things are happening simultaneously.


You might be running several things at once, but at any given moment the CPU executes just one task from your list. It then quickly switches to the next task, and the next, and so on, so all the tasks make progress a little at a time.


This switching between tasks happens so fast that it is very difficult for us to notice. A typical computer switches between tasks hundreds of times per second, which is why we are under the illusion that multiple tasks are being executed at the same time.


This switching is generally called context switching. When too many tasks are waiting in line for the CPU, we say that the “machine is under load”. In this article, we are going to discuss the methods available to execute multiple processes in parallel on a Linux system.


The best example is executing a command across tens of servers from one machine. If you go one by one, it will consume a lot of time. But if we have a method to run the command on all the servers in parallel, that saves a lot of time. Again, from a CPU standpoint, each core still deals with one task at a time and keeps switching between tasks very quickly, which is perfectly fine for us as long as all the tasks keep progressing.


Let us dig through the different methods available in Linux to execute commands simultaneously (in parallel). The very first one is the bash shell's job control mechanism. We simply execute a command and ask the shell to place it in the background, then proceed with the next command (while the first is still running in the background), then the next, and so on.


To demonstrate this, I have a few junk files sitting in an object storage. Downloading all these files to a Linux machine simultaneously is a simple way to test and see how this parallel execution works.


Running Commands in Parallel using Bash Shell


I need to run 3 wget commands in parallel. The simplest method is to put all the wget commands in one script and execute it. The only thing to note is that each wget command must be put in the background (shell background). See our simple script file below.


root@instance-10:/tmp# cat
wget &
wget &
wget &



Notice the & at the end of each command. It puts the command in the background, so the shell immediately moves on to the next command (which is also put in the background), and so on.


You can confirm that all these commands are running simultaneously by opening another shell and looking at the process list (in our case it should show three wget commands, each with a different process ID).


root@instance-10:~# ps aux|grep wget
root      4985 16.2  0.1  27732  5080 pts/0    D+   07:05   0:00 wget
root      4986 15.8  0.1  27732  5164 pts/0    D+   07:05   0:00 wget
root      4987 16.6  0.1  27732  5228 pts/0    D+   07:05   0:00 wget
root      4994  0.0  0.0  10480  2140 pts/1    S+   07:05   0:00 grep --color=auto wget



We can clearly see from the above output that our three wget commands are running in parallel.
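If a script needs to know whether each background command succeeded, the shell keeps the process ID of the most recent background job in $!, and wait accepts a PID and returns that job's exit status. The snippet below is only a minimal sketch of that idea: sleep stands in for the wget downloads, and the second job is made to fail on purpose so that there is something to count.

```shell
# Start three stand-in jobs in the background and record each PID from $!.
pids=""
for n in 1 2 3; do
  # Job 2 exits with status 1 to stand in for a failed download.
  ( sleep 1; [ "$n" -ne 2 ] ) &
  pids="$pids $!"
done

# "wait PID" returns that job's exit status, so failures can be counted.
started=0
failed=0
for pid in $pids; do
  started=$((started + 1))
  wait "$pid" || failed=$((failed + 1))
done
echo "started=$started failed=$failed"
```

With the stand-in jobs above this prints started=3 failed=1, and it takes about one second rather than three because all the jobs sleep at the same time.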


In case you need to execute several processes in batches, or in chunks, you can use the shell builtin command called "wait". See below.


wget &
wget &
wget &
wait
wget &
wget &
wget &
wget &
wget &
wget &
wait


The first three wget commands will be executed in parallel. "wait" makes the script pause until all three have finished. Once they are done, the script runs the next 6 commands simultaneously, waits until they complete, and so on.
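The batching idea can also be wrapped in a small helper. This is a rough sketch under a few assumptions: run_in_batches is a hypothetical name (not a shell builtin), each command is passed as one quoted string and run through sh -c, and the sleep/echo jobs simply stand in for real work such as the wget downloads.

```shell
# run_in_batches SIZE CMD... (hypothetical helper)
# Runs each CMD string in the background and calls wait after every SIZE jobs.
run_in_batches() {
  size=$1
  shift
  count=0
  for cmd in "$@"; do
    sh -c "$cmd" &
    count=$((count + 1))
    if [ $((count % size)) -eq 0 ]; then
      wait   # block until the current batch has finished
    fi
  done
  wait       # collect any jobs left in a final partial batch
}

# Demo: two batches of two jobs, each appending one line to a log file.
log=$(mktemp)
run_in_batches 2 \
  "sleep 1; echo a >> $log" \
  "sleep 1; echo b >> $log" \
  "echo c >> $log" \
  "echo d >> $log"
result=$(cat "$log")
rm -f "$log"
echo "$result"
```

Because of the wait after the first batch, the lines a and b always land in the log before c and d, even though c and d would otherwise finish first.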


We can modify our script and make it a bit more generic as shown below.


root@instance-10:/tmp# cat
for task in "$@"; do {
  $task &
} done


Now you can run as many commands as you like by using the script as shown below.


./ "cmd1 arg1 arg2 arg3" "cmd2 arg1" "cmd3 arg1 arg2"



How to Run Multiple Processes Simultaneously using Xargs?


The next method that we can use to run processes in parallel is our regular xargs command. Xargs supports an option to specify the number of processes that you want to run simultaneously. See below.


seq 1 3 | xargs -I{} -n 1 -P 3 wget{}


The seq command prints 1, 2, and 3 on three separate lines, which are passed to xargs as input through a standard Linux pipe. The -I{} option tells xargs to substitute each input line wherever {} appears in the command, so the number is attached directly to the end of the URL.

Without -I{} and the trailing junkfile{}, xargs would append each number as a separate argument, and the command would be built with "junkfile 1", "junkfile 2" and so on (note the space), rather than junkfile1, junkfile2 and so on.
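The difference is easy to see without downloading anything. In the sketch below, echo stands in for wget, and the word "fetched" and the one-second sleep are made up for illustration. The last command passes the number to sh as $1 instead of pasting it into the script text, which is a common way to keep the input from being interpreted as shell code.

```shell
# Without -I: each input item is appended as a separate argument.
without=$(seq 1 3 | xargs -n 1 echo junkfile)

# With -I{}: each input line replaces {} inside the argument.
with=$(seq 1 3 | xargs -I{} echo junkfile{})

# Same substitution with up to three processes at a time (-P 3).
# The completion order is not guaranteed, so the output is sorted here.
parallel_out=$(seq 1 3 |
  xargs -P 3 -I{} sh -c 'sleep 1; echo "fetched junkfile$1"' _ {} |
  sort)

printf '%s\n' "$without" "$with" "$parallel_out"
```

The first variable holds "junkfile 1" to "junkfile 3" (with the space), while the second holds junkfile1 to junkfile3.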

You can quickly confirm that 3 processes are running in parallel (as we passed -P 3), using another terminal and counting the number of wget processes as we did earlier. See below.


root@instance-10:/tmp# ps aux|grep wget
root      2197  0.0  0.0   6376   772 pts/0    S+   04:42   0:00 xargs -I{} -n 1 -P 3 wget{}
root      2198 11.0  0.1  27732  5160 pts/0    D+   04:42   0:01 wget
root      2199  7.6  0.1  27732  5232 pts/0    D+   04:42   0:01 wget
root      2200  6.4  0.1  27732  5040 pts/0    D+   04:42   0:00 wget
root      2209  0.0  0.0  10480  2180 pts/1    S+   04:42   0:00 grep --color=auto wget


wget is just an example that we are using here for our tutorial. One reason for using wget with those three junk files is to keep the processes alive for a few minutes, so that we can confirm the parallel execution (as the files are quite big, these processes will take a couple of minutes to finish). You can replace wget with whatever is applicable in your use case.


How to Use GNU Parallel to Run commands simultaneously?


Apart from this, there is a tool from GNU that is designed to execute jobs in parallel. It's called GNU Parallel. It can be installed with the commands below (depending on your Linux distribution).

For RedHat Based Systems:

yum install parallel


For Debian Based Systems(Ubuntu):

apt-get install parallel


You can use GNU Parallel for some of the use cases below. These use cases nicely cover regular shell-based system admin activities.

  • A list of files that can be passed as input to parallel command, to do some operations in parallel on all of them
  • You can give a list of IP addresses/hostnames, on which you need to fire up a command in parallel
  • List of links/URLs (similar to our wget example we saw with xargs and shell above)


GNU Parallel was designed with xargs in mind, so the majority of its command line options and parameters match those of the xargs command. Let's first run the wget example that we saw earlier, this time using GNU parallel.


seq 1 3 | parallel -j 5 -I{} wget{}


In the above command, parallel uses the -j option (very similar to -P in xargs) to specify the maximum number of jobs to run in parallel, and the -I option to set the replacement string that each input value is substituted for. Since only three inputs are supplied, all 3 wget commands run simultaneously.


Do you want to compress all files in the current directory in parallel? We can do that very easily with parallel. Shown below is an example that achieves just that.


ls | parallel -j 10 gzip

In the above example, a maximum of 10 compressions will happen together. Similarly, to uncompress (gunzip) all the files simultaneously, run the below.

ls | parallel -j 10 gunzip


You can gzip all the files in the current directory using the below method as well (in parallel). In the example below, we have limited the number of jobs that run in parallel to 10.


parallel -j 10 gzip ::: *


Parallel can also read its arguments from a file, so that it runs the command against each entry in the file. For example, you can have a file with a list of URLs to download.

root@instance-10:/tmp# cat list-of-urls.txt


parallel -j 10 -a list-of-urls.txt wget

The above should download the URLs listed in the file "list-of-urls.txt" in parallel.
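The same arguments can reach parallel in three ways: from a file with -a, from a file with ::::, or on standard input. The snippet below is a small sketch that compares them, with echo standing in for wget and a made-up list of words in place of URLs. It first checks that the installed parallel really is GNU Parallel, since the moreutils package ships a different tool under the same name; the fallback message is just a placeholder for this example.

```shell
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  list=$(mktemp)
  printf '%s\n' alpha beta gamma > "$list"
  # Three equivalent ways to feed the same arguments; -k keeps input order.
  from_file=$(parallel -k -j 3 -a "$list" echo got)
  from_colons=$(parallel -k -j 3 echo got :::: "$list")
  from_stdin=$(parallel -k -j 3 echo got < "$list")
  rm -f "$list"
else
  # Placeholder result so the script still completes without GNU Parallel.
  from_file="GNU parallel not installed"
  from_colons=$from_file
  from_stdin=$from_file
fi
echo "$from_file"
```

When GNU Parallel is available, all three variables hold the lines "got alpha", "got beta" and "got gamma".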


Parallel can also execute a series of commands specified in a text file. See an example below. Let's first create a text file with a few sleep commands in it.

cat job.txt
sleep 100; echo "first"
sleep 100; echo "second"
sleep 100; echo "third"


Now let us ask parallel to execute all the commands in that file simultaneously. This can be done as shown below.


parallel -j 10 :::: job.txt


How to Manage Output in GNU Parallel?

A common issue while executing multiple commands in parallel is output. The output of different commands should not get mixed up. If you use the very first method that we saw in this article (i.e. the shell job control mechanism), there is no guarantee about the order of the output. For example, let us try the ping command towards multiple hosts using the shell method of &.


root@testserver:~# ping -c 3 & ping -c 3 & ping -c 3 &
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.027 ms
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.018 ms
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from icmp_seq=2 ttl=64 time=0.029 ms


You can clearly see that the output is completely messed up (the outputs of those three pings are mixed together). Now let's try the same with parallel, which prevents the output from getting mixed up. See below.


root@testserver:~# parallel -j 4 ping -c 3 :::
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from icmp_seq=3 ttl=64 time=0.025 ms

--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.025/0.030/0.041/0.009 ms
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.021 ms
64 bytes from icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from icmp_seq=3 ttl=64 time=0.019 ms

--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.018/0.019/0.021/0.003 ms
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from icmp_seq=2 ttl=64 time=0.019 ms
64 bytes from icmp_seq=3 ttl=64 time=0.023 ms

--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.016/0.019/0.023/0.004 ms


Basically, parallel shows the complete output of a process only when that process completes. Does the order of the output matter to you? If you want the output to be in the same order as your input, you can use the -k option in parallel as shown below.


parallel -k -j 4 ping -c 3 :::
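To see what grouped and ordered output means, here is a plain-shell sketch that imitates the effect of -k without using parallel at all. It is not how GNU Parallel is implemented, only an illustration of the idea: each job writes to its own temporary file, and the files are printed in input order once every job has finished. The job names and sleep times are made up so that the first job finishes last.

```shell
tmpdir=$(mktemp -d)
i=0
for word in first second third; do
  i=$((i + 1))
  # Each job writes to its own file, so lines from different jobs never mix.
  # Earlier jobs sleep longer, so they finish in reverse order.
  ( sleep $((3 - i)); echo "start $word"; echo "end $word" ) > "$tmpdir/$i.out" &
done
wait

# Print the buffered outputs in input order, regardless of finish order.
ordered=$(cat "$tmpdir/1.out" "$tmpdir/2.out" "$tmpdir/3.out")
rm -rf "$tmpdir"
echo "$ordered"
```

The output always lists the two lines of "first", then "second", then "third", even though "third" completes before the others.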


Please keep in mind that, by default, GNU Parallel finds out the number of CPU cores available in the system and runs one job per core. We used the -j option in our examples to override this default behavior. If you have 100 commands to execute using GNU parallel, they will not all start at once: parallel keeps at most as many jobs running as -j allows, and starts the next command as soon as a running one finishes.


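If you want to stop GNU parallel without leaving half-finished jobs, you can fire up the below command. On receiving this signal, GNU parallel stops starting new jobs, lets the currently running jobs finish, and then exits. The exact signal handling has changed between releases, so check the man page of your installed version.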


killall -TERM parallel


Parallel Execution of Commands on a list of Remote Machines

To be honest, I have not found GNU Parallel that user friendly when it comes to executing remote commands on a list of servers simultaneously. GNU Parallel does have parameters like sshlogin and sshloginfile for this, but I did not find them straightforward, and in my test cases some of these commands were not stable enough to recommend running against a number of servers in parallel. For this purpose, there are tools like clustershell and pdsh.

I am going to start with clustershell and then move on to pdsh.

Clustershell can be easily installed using the command applicable to your platform.


RedHat Based Systems:

yum install clustershell


Ubuntu Based Systems:

apt-get install clustershell


As the name indicates, it is used for administering a cluster of servers, basically to execute something on, or fetch information from, all machines in a cluster. Similar to other Linux utilities, clustershell has a configuration file, located at /etc/clustershell/clush.conf. I have the below inside that file.


#cat /etc/clustershell/clush.conf
fanout: 64
connect_timeout: 15
command_timeout: 0
color: auto
fd_max: 16384
history_size: 100
node_count: yes
verbosity: 1
ssh_user: ubuntu
ssh_path: /usr/bin/ssh
ssh_options: -oStrictHostKeyChecking=no

You can execute a command of your interest across a comma-separated list of servers using clustershell as below.


#clush -l ubuntu -w, uname -r
4.4.0-1041-aws
3.13.0-48-generic


Please keep in mind that clush connects over SSH and uses the private key in the /home/$user/.ssh/id_rsa file. For example, if I am executing this command as the "ubuntu" user, then the private key used will be /home/ubuntu/.ssh/id_rsa. The corresponding public key is expected to be present on all the servers where you are executing the command using clustershell.




If your servers follow a numbered naming pattern, you can use range shortcuts in the host list instead of typing every name. See below.


#clush -l ubuntu -w node[1-2] uname -r
4.4.0-1041-aws
3.13.0-48-generic


You can copy files in parallel to multiple servers with clustershell using the --copy option.


clush -w node[1-2] --copy /home/ubuntu/testfile

The above command will copy the file /home/ubuntu/testfile to the same location on all servers.


You can also create groups of servers using the file /etc/clustershell/groups (if the file does not exist, create it). An example groups file is shown below.

# cat /etc/clustershell/groups
web: node[1,2]
db: node[3-4]

You can now execute commands against these groups by calling the group name (web and db in our case).


#clush -l ubuntu -w @web uname -r
4.4.0-1041-aws
3.13.0-48-generic


# clush -l ubuntu -w @db uname -r
4.4.0-1041-aws
4.4.0-1041-aws


Clustershell supports an interactive mode for executing commands across multiple machines, which is quite interesting. Running clush against one of our groups without giving a command starts an interactive prompt, from which we can fire commands at the whole group. The -b option used here asks clush to gather nodes that return identical output. An example is below.


# clush -l ubuntu -w @web -b
Enter 'quit' to leave this interactive mode
Working with nodes: node[1-2]
clush> uptime
 00:09:09 up 709 days, 18:19,  1 user,  load average: 0.11, 0.05, 0.05
 00:09:09 up 47 days, 21:18,  0 users,  load average: 0.00, 0.00, 0.00
clush> quit


Similar to clustershell is another utility named pdsh. The idea for pdsh is similar to the utility rsh, which can be used to execute something on one remote host. However, pdsh can be used to execute commands in parallel on multiple hosts. 

Similar to clustershell, the installation is quite straightforward (a single apt-get or yum command depending upon your distribution).

RedHat Based Systems:

yum install pdsh


Debian/Ubuntu Based Systems:

apt-get install pdsh


The very first thing to do is to tell pdsh that we would like to use SSH for remote connections. This can be done using an environment variable.

export PDSH_RCMD_TYPE=ssh


To make it permanent, you can add it inside the user's .bashrc file as well. After executing the below command, you can logout and login to confirm that the environment variable is set and available for the user.


echo "export PDSH_RCMD_TYPE=ssh" >> ~/.bashrc


Let us fire up a command against our node[1-2] using pdsh. See below.


# pdsh -l ubuntu -w node[1-2] uname -r
node2: 4.4.0-1041-aws
node1: 3.13.0-48-generic
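To get a feel for what pdsh and clustershell automate, here is a rough plain-shell sketch of the same fan-out: one ssh per host, all started in the background, with every output line prefixed by the host name. It is an illustration only, not how either tool works internally. run_on_hosts and the REMOTE_SHELL variable are hypothetical names invented for this example, and the demo swaps ssh for echo so that it can be tried without any reachable hosts.

```shell
# run_on_hosts CMD HOST... (hypothetical helper)
# Runs CMD on every HOST at the same time and prefixes each output line
# with the host name, similar to the "node1: ..." lines printed by pdsh.
run_on_hosts() {
  cmd=$1
  shift
  for host in "$@"; do
    # REMOTE_SHELL defaults to ssh; stdin comes from /dev/null so that
    # background ssh processes never try to read from the terminal.
    ${REMOTE_SHELL:-ssh} "$host" "$cmd" < /dev/null 2>&1 | sed "s/^/$host: /" &
  done
  wait
}

# Dry run: echo replaces ssh and simply prints the arguments ssh would get.
REMOTE_SHELL=echo
out=$(run_on_hosts "uname -r" node1 node2 | sort)
echo "$out"
```

In the dry run each line shows the host prefix followed by the arguments that would have been handed to ssh. Unsetting REMOTE_SHELL makes the helper use ssh itself, in which case key-based login to every host is assumed. Tools like pdsh add fanout limits, timeouts and host list handling on top of this basic pattern.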

Even without setting the PDSH_RCMD_TYPE environment variable, you can select SSH directly on the command line, as shown below.


root@jenkins:~# pdsh -w ssh:ubuntu@node[1-2] uptime
node2:  00:32:34 up 47 days, 21:42,  0 users,  load average: 0.00, 0.00, 0.00
node1:  00:32:34 up 709 days, 18:43,  1 user,  load average: 0.08, 0.07, 0.05


Another interesting environment variable that we can use with pdsh is WCOLL. This will specify a file which contains a list of servers.

cat nodelist


export WCOLL=/root/nodelist


Similar to our previous example, you can add this one to .bashrc to make it permanent.


# pdsh -l ubuntu uptime
node2:  00:36:46 up 47 days, 21:46,  0 users,  load average: 0.00, 0.00, 0.00
node1:  00:36:47 up 709 days, 18:47,  1 user,  load average: 0.09, 0.06, 0.05

