Linux CPU Performance Monitoring Tutorial

Sarath Pillai's picture
Linux performance monitoring

One of the main tasks of a system administrator during a crisis is to find the performance bottlenecks of a running system. Although Linux offers many ready-made tools that report the current status of the system, it is not easy to draw a conclusion about performance from their output.

 

The main reason is that there are many concepts and factors that need to be understood before the output of these tools can be interpreted. You need to know exactly what is going on in the system right now, and you should be able to fetch the details and draw conclusions from them. Once you have the required details at hand, you can try to fine-tune the system to get optimum performance.

In this article I will try to explain and elaborate on certain things that can help you come to a conclusion. The things we will be discussing are listed below.

 

  • Monitor System Performance

  • Find the root cause of the problem

  • Apply possible fixes to solve the problem

I am quite sure that most experienced system administrators, as well as newbies, are aware of the command called "top". The top command can be used to fetch the current status of a running system.

The top command shows how busy the server is. It's always better to note the normal statistics of a system, so that you can compare them with the results measured during a bottleneck.

 

top - 00:56:55 up 1 min,  2 users,  load average: 0.15, 0.11, 0.05
Tasks:  74 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    499148k total,   229116k used,   270032k free,    27520k buffers
Swap:   522236k total,        0k used,   522236k free,    67244k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  996 ubuntu    20   0 73440 1676  888 S  0.3  0.3   0:00.06 sshd
 1162 root      20   0 17336 1248  944 R  0.3  0.3   0:00.03 top
    1 root      20   0 24200 2204 1360 S  0.0  0.4   0:02.06 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.34 ksoftirqd/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.06 kworker/0:0
    5 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0H
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rcu_bh
   10 root      20   0     0    0    0 S  0.0  0.0   0:00.18 rcu_sched
   11 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
   12 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 cpuset

 

Shown above is the top output of a normal, idle system. The first line of the output includes three load average values: the average over the last minute, the last 5 minutes, and the last 15 minutes. Most servers these days use multi-core processors for better processing power; single-core processors are rarely used.

 

But an important point to understand is that the load average in the top output describes the system as a whole, not the individual cores.

 

If you have two quad-core processors (8 cores in total), you need to divide each of these three average values by 8. So if the load average for the last minute is 8 on such a system, the per-core value is 8/8 = 1.

 

Please also note that the load average displayed is for the system, and not only for the CPU. On Linux it is possible to have a load of 1.0 even when the CPU is doing nothing, because processes that are blocked waiting for I/O are counted as well. This condition is a normal one while a process waits for I/O.

 

Related: Linux System IO Monitoring

 

The thing to understand while monitoring load is that it's not always bad to have a load average higher than the normal values. It depends on the kind of processes running on the system.

If a process is very CPU intensive, other processes have to wait. But if the process is I/O intensive, then during the I/O wait time (the time spent waiting for input/output operations to complete) the CPU can work on another process.

 

Thus the kind of process running on your system plays a major role in drawing conclusions about the system load.

 

root@ubuntu2:~# nproc
1

 

 

The nproc command shown above gives the number of CPU cores on your Linux system. As top only gives you a summary across all CPU cores, it is better to divide the load average values from top by the nproc output to get an accurate picture of your system load.
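
The division described above can be sketched in a few lines of shell. This is only a minimal sketch, assuming a Linux system where /proc/loadavg and nproc are available; the helper name per_core_load is made up for this example.

```shell
#!/bin/sh
# per_core_load LOAD CORES: print LOAD divided by CORES with two decimals.
# (hypothetical helper name, not part of any standard tool)
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Normalise the three load averages of the running system.
# /proc/loadavg starts with the 1, 5 and 15 minute load averages.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    cores=$(nproc)
    read -r one five fifteen rest < /proc/loadavg
    echo "cores: $cores"
    echo "1 min per core:  $(per_core_load "$one" "$cores")"
    echo "5 min per core:  $(per_core_load "$five" "$cores")"
    echo "15 min per core: $(per_core_load "$fifteen" "$cores")"
fi
```

With the example from earlier, a load average of 8 on 8 cores comes out as 1.00 per core.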

 

You can also see the details of each CPU core in the interactive top console. Press 1 to see per-core statistics. A sample output is shown below.

 

top - 02:58:48 up 94 days, 12:15,  1 user,  load average: 0.31, 0.29, 0.24
Tasks: 173 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1011048k total,   949720k used,    61328k free,     9176k buffers
Swap:   524284k total,    98804k used,   425480k free,   103884k cached

 

There are different values shown in the above output. Let's see what each of them means.

 

  • us: This shows the CPU time spent on processes run by normal users. In other words, this is the load caused by applications.
  • sy: This indicates the CPU time spent by the system, which means work executed by the Linux kernel. It normally stays low, but rises for tasks in which the kernel is heavily involved.
  • ni: This is the CPU time spent running user processes whose nice value has been raised (that is, whose priority has been lowered). If you are new to nice values, refer to the article below.

 

Read:  Process Priority and Nice Value in Linux

  • id: This indicates the amount of time the CPU spent doing nothing. A higher id value means the CPU is idle most of the time.
  • wa: The amount of time the CPU spent waiting for I/O operations to complete.
  • hi: This is the time spent servicing hardware interrupts. It becomes high when you have heavy disk or network activity.
  • si: This is the time spent servicing software interrupts. It normally stays very low.
  • st: This is steal time, and it relates to virtual machines. It is the time a virtual CPU wanted to run but had to wait because the hypervisor was serving another virtual machine. If you have too many virtual machines running on a server and this value is high inside them, it is good to tear down a few virtual hosts.
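
To make these fields more concrete, here is a rough sketch of how such percentages can be derived by hand. It assumes that the first line of /proc/stat lists cumulative tick counters in the order user, nice, system, idle, iowait, irq, softirq, steal; the helper name cpu_breakdown is invented for this example, and top itself may compute its figures somewhat differently.

```shell
#!/bin/sh
# cpu_breakdown LINE1 LINE2: given two "cpu ..." lines taken from /proc/stat
# at different times, print the share of each state over that interval.
# (hypothetical helper; the field order is assumed as described above)
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - first[i]; total += delta[i] }
            if (total == 0) { print "no ticks elapsed"; exit }
            split("us ni sy id wa hi si st", name, " ")
            for (i = 2; i <= 9; i++)
                printf "%s%s=%.1f%%", (i > 2 ? " " : ""), name[i - 1], 100 * delta[i] / total
            printf "\n"
        }'
}

# Take two snapshots one second apart on the running system.
if [ -r /proc/stat ]; then
    first=$(head -n 1 /proc/stat)
    sleep 1
    second=$(head -n 1 /proc/stat)
    cpu_breakdown "$first" "$second"
fi
```

The counters only ever grow, so a single reading says little; it is the difference between two readings that gives the percentages for that interval.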

 

Most of the processors used by servers these days have to handle multiple processes at one time. As a processor (well, a processor core) can only do one thing at a time, it has to queue up the processes that need to run.

Shown below are the first few lines of the top output on a loaded system.

top - 15:08:41 up 95 days, 25 min,  1 user,  load average: 7.12, 6.50, 5.40
Tasks: 172 total,   2 running, 170 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.9%us,  0.2%sy,  0.0%ni, 95.8%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1011048k total,   980280k used,    30768k free,    10016k buffers
Swap:   524284k total,   103676k used,   420608k free,   130100k cached

 

 

We discussed the load average a little earlier. Although at first it appears simple (the load average of the server over the last one, five, and fifteen minutes), it really helps to understand the concepts involved in arriving at that value. The main reason is that it is difficult to judge which value is alarming and which is acceptable just by looking at the numbers.

 

As I said before, a Linux system is capable of running multiple processes at the same time. Although we say at the same time, a single core runs only one process at any instant. Other eligible processes have to wait until the core becomes free. The switching happens so fast that we feel the processor is doing multiple things at once. A multi-core processor, for example one with 8 cores, really can do 8 different tasks at one time, as each of them is processed by a different core.

 

The load average value indicates the number of processes queued to run (this queue is called the run queue). In the previously shown output, you can see load averages of 7.12, 6.50, and 5.40.

This means that during the last minute there were on average 7.12 processes in the run queue, during the last 5 minutes 6.50 processes, and during the last 15 minutes 5.40 processes.

 

Now the question is how to check the current run queue on a Linux system. We will see that shortly. The important thing to understand here is that every process enters the run queue before it is processed by the processor. Each core has its own run queue (this is why we divide the load average by the number of CPU cores reported by the nproc command). Once a process is inside the run queue, it can have one of two states: runnable or blocked.

 

The kernel's job is to select the next process to run based on its priority.

 

Please note that the load average values count the processes in the run queue, no matter whether they are in the runnable or the blocked state.

 

The vmstat tool can be used to check the number of processes in the run queue. Let's see a sample output of vmstat.

 

root@ubuntu1:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 315616  25852  56040    0    0     0     0   15   17  0  0 100  0
 0  0      0 315616  25852  56040    0    0     0     0   19   21  0  0 100  0
 0  0      0 315616  25852  56040    0    0     0     0   15   17  0  0 100  0
 0  0      0 315616  25852  56040    0    0     0     0   18   21  0  0 100  0
 0  0      0 315616  25852  56040    0    0     0     0   13   13  0  0 100  0

 

The argument 1 I gave in the above vmstat command asks it to refresh every second. The output has multiple columns. The first two columns, "r" and "b", show the number of processes in the run queue that are runnable and blocked, respectively.
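
If you collect several samples, it can help to average the r and b columns instead of eyeballing them. The following is a small sketch under the assumption that vmstat prints r and b as its first two columns, as in the output above; the helper name queue_summary is made up for this example.

```shell
#!/bin/sh
# queue_summary: read vmstat output on stdin and print the average of the
# r (runnable) and b (blocked) columns. Header lines are skipped because
# their first field is not a number.
# (hypothetical helper name)
queue_summary() {
    awk '$1 ~ /^[0-9]+$/ { r += $1; b += $2; n++ }
         END { if (n) printf "samples=%d avg_r=%.2f avg_b=%.2f\n", n, r / n, b / n }'
}

# Typical use on a live system (takes about 4 seconds):
#   vmstat 1 5 | queue_summary
```

Fed with the five sample lines shown above, it reports an average of 0.20 runnable and 0.00 blocked processes.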

 

As we discussed earlier, Linux is a multitasking operating system, which means the kernel has to switch between processes many times. Although it looks simple, the processor has to do several things to make multitasking work. To run multiple processes at the same time (which is very normal), it has to do the following.

  • The processor needs to save all context information of the currently running process before switching to another process. This is necessary because the processor has to switch back to this process later.
  • The processor has to fetch the context information of the new process it is about to run.

 

Related: Linux Process Administration and Monitoring Tutorial

 

Although context switching is a nice feature and is necessary for multitasking, a high number of context switches can have a severe performance impact. Each time the processor switches between processes, it has to store the current state to memory and then retrieve the state of the new process from memory, which takes time and processing power.

Context switching can happen for two reasons. The first is a genuine switch decided by the kernel scheduler. The second is an interrupt raised by hardware or by software applications. We will get back to interrupts shortly.

 

As we discussed earlier, most processors these days are multi-core, which means multiple processors inside one package (to the Linux kernel, each core is an individual processor). Normally a process moves between different cores over time, depending on how its affinity is set. By default a process is allowed to run on all cores. Although multiple cores improve overall system performance, this movement can hurt the performance of individual processes. When a process is moved from one core to another, the data it had built up in the old core's cache is no longer at hand, and the cache on the new core has to be filled again. Now imagine your process running on an 8-core system: it may go through many such cache flushes during its lifetime, which means a performance impact.

 

To prevent too many cache flushes, you can bind a process to a dedicated core. The cores are numbered 0 to 7 if you have 8 cores. Let's see the default affinity of a couple of processes on my server.

Process affinity can be checked with the taskset command.

[root@www ~]# taskset -c -p 6389
pid 6389's current affinity list: 0-7
[root@www ~]# taskset -c -p 6580
pid 6580's current affinity list: 0-7

 

The PIDs shown above (6389 and 6580) belong to the mysql and nginx processes. As I have not set affinity for either of them, each has the default affinity of 0-7, which means it may run on any of the 8 cores during its lifetime. You can set the affinity with the same command, as shown below.

 

[root@www ~]# taskset -c -p 0,1 6389

 

 

The above command binds process ID 6389 to cores 0 and 1 (on an 8-core system the cores are numbered 0 to 7).
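
Besides taskset itself, the binding can be cross-checked through the Cpus_allowed_list field that the kernel exposes in /proc/<pid>/status. The sketch below assumes a Linux /proc file system; the helper name allowed_cpus is made up for this example.

```shell
#!/bin/sh
# allowed_cpus FILE: print the Cpus_allowed_list value found in a
# /proc/<pid>/status style file.
# (hypothetical helper name)
allowed_cpus() {
    awk -F':[ \t]*' '$1 == "Cpus_allowed_list" { print $2 }' "$1"
}

# Show the CPUs the current shell is allowed to run on.
if [ -r "/proc/$$/status" ]; then
    echo "shell $$ may run on CPUs: $(allowed_cpus "/proc/$$/status")"
fi
```

For a process bound to cores 0 and 1, the value should read 0-1, while an unbound process on an 8-core machine should show 0-7.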

 

As we discussed, context switching is useful because it allows the Linux kernel to allocate a fair share of time to each process. Once a process has run for the duration it was allowed, the processor takes another process from the run queue. This is managed by a periodic timer interrupt. The number of timer interrupts might vary depending on your kernel version and architecture. /proc is the place to look for the current number of interrupts.

 

Related: /proc file system in linux Tutorial

 

As the values you find in /proc are populated dynamically by the kernel, you need to read them several times (say, once per second) to see how they change. To know the total number of timer interrupts, look inside the file below and grep for "timer".

 

root@ubuntu1:~# cat /proc/interrupts | grep timer
  0:         49   IO-APIC-edge      timer
LOC:     139261   Local timer interrupts

 

These are the timer interrupt counters of your system, counted since boot. On a multi-core system, the file shows a separate timer interrupt counter for each processor core. A small script that reads this file every second gives you the timer interrupts per second. Now here we need to understand something very important: the values in /proc/interrupts are the genuine timer interrupts fired by the kernel to switch processes.
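
A small script of that kind could look like the sketch below. It assumes the /proc/interrupts layout shown above, where each matching line carries one counter per CPU right after the label; the helper name timer_total is made up for this example.

```shell
#!/bin/sh
# timer_total: read /proc/interrupts style text on stdin and print the sum
# of the per-CPU counters on every line that mentions "timer".
# Counting stops at the first non-numeric field of a line, so the
# description at the end of the line is never added.
# (hypothetical helper name)
timer_total() {
    awk '/timer/ {
             for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
         }
         END { printf "%.0f\n", sum }'
}

# Difference between two readings taken one second apart.
if [ -r /proc/interrupts ]; then
    before=$(timer_total < /proc/interrupts)
    sleep 1
    after=$(timer_total < /proc/interrupts)
    echo "timer interrupts per second: $((after - before))"
fi
```

Run it a few times, or wrap the last part in a loop, to see how steady the rate is on your system.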

 

If the number of context switches per second on your system is much higher than the number of timer interrupts per second, then some of your applications are doing a lot of read/write, that is I/O, requests.

 

Now, how do we check the total number of context switches on the system? This can be done by running the vmstat command with a 1 second interval, as shown below.

 

root@ubuntu1:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 235060  44896 104604    0    0     4     0   20   25  0  0 100  0
 0  0      0 235052  44896 104604    0    0     0     0   18   41  0  0 100  0
 0  0      0 235052  44896 104604    0    0     0     0   15   20  0  0 100  0
 0  0      0 235052  44896 104604    0    0     0     0   32   24  0  1 99  0

 

The cs column in the vmstat output shows the number of context switches per second. The in column shows the number of interrupts per second (per second, because we asked vmstat to refresh every second with the argument 1).

We saw earlier that moving between cores makes context switches on a multi-core system a little more expensive. We also saw how to set affinity on a PID of interest so that it does not move around between cores, which saves CPU cache flushes. But how do we check which core a process is currently running on? And how do you verify that a process whose affinity you changed to a single core is behaving as expected?

 

This can be done with the top command. The default top output does not show these details. To view them, press the f key in the top interface, then press j, and then press Enter (the exact keys can differ between top versions). The output will now include a column showing the processor each process is running on. A sample output is shown below.

 

top - 04:24:03 up 96 days, 13:41,  1 user,  load average: 0.11, 0.14, 0.15
Tasks: 173 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.1%us,  0.2%sy,  0.0%ni, 88.4%id,  0.1%wa,  0.0%hi,  0.0%si,  4.2%st
Mem:   1011048k total,   950984k used,    60064k free,     9320k buffers
Swap:   524284k total,   113160k used,   411124k free,    96420k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
12426 nginx     20   0  345m  47m  29m S 77.6  4.8  40:24.92 7 php-fpm
 6685 mysql     20   0 3633m  34m 2932 S  4.3  3.5  63:12.91 4 mysqld
19014 root      20   0 15084 1188  856 R  1.3  0.1   0:01.20 4 top
    9 root      20   0     0    0    0 S  1.0  0.0 129:42.53 1 rcu_sched
 6349 memcache  20   0  355m  12m  224 S  0.3  1.2   9:34.82 6 memcached
    1 root      20   0 19404  212   36 S  0.0  0.0   0:20.64 3 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:30.02 4 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:12.45 0 ksoftirqd/0

 

The P column in the output shows the number of the processor core on which the process is currently being executed. If you monitor this for a few minutes, you will see that a PID switches between processor cores. You can also verify whether a PID for which you have set affinity is running on that particular core only (please remember that core counting starts from 0).

 

Although we discussed context switches, we have not yet discussed how to check which process is doing a lot of them. Identifying this can be very helpful in finding the process that is causing the performance issue on the system. There is a very nice tool for this, called pidstat. It is available as part of the sysstat package; Ubuntu users might have to install sysstat to get this command. A sample output is shown below.

 

root@ubuntu1:~# pidstat -w
Linux 3.8.0-29-generic (ubuntu1)        03/11/2014      _x86_64_        (1 CPU)

09:32:48 AM       PID   cswch/s nvcswch/s  Command
09:32:48 AM         1      0.03      0.05  init
09:32:48 AM         2      0.01      0.00  kthreadd
09:32:48 AM         3      2.20      0.00  ksoftirqd/0
09:32:48 AM         5      0.00      0.00  kworker/0:0H
09:32:48 AM         6      0.00      0.00  kworker/u:0
09:32:48 AM         7      0.00      0.00  kworker/u:0H
09:32:48 AM         8      0.00      0.00  migration/0
09:32:48 AM         9      0.00      0.00  rcu_bh
09:32:48 AM        10      2.99      0.00  rcu_sched

 

The -w option I used with pidstat shows context switch details for all processes on the system. The pidstat man page explains the fields very nicely; its explanation of the output is shown below.

 

cswch/s: Total number of voluntary context switches the task made per second.  A voluntary context switch occurs when a task blocks because it requires a resource that is unavailable.

nvcswch/s: Total  number  of  non voluntary context switches the task made per second.  A involuntary context switch takes place when a task executes for the duration of its time slice and then is forced to relinquish the processor.
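
If sysstat is not installed, the same two kinds of context switches can be read from /proc/<pid>/status, which has the fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. Note that these are totals since the process started, not per-second rates like the pidstat columns. The sketch below assumes a Linux /proc file system; the helper name ctxt_counts is made up for this example.

```shell
#!/bin/sh
# ctxt_counts FILE: print the voluntary and involuntary context switch
# totals found in a /proc/<pid>/status style file.
# (hypothetical helper name)
ctxt_counts() {
    awk -F':[ \t]*' '
        $1 == "voluntary_ctxt_switches"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches" { nv = $2 }
        END { printf "voluntary=%d involuntary=%d\n", v, nv }' "$1"
}

# Totals for the current shell.
if [ -r "/proc/$$/status" ]; then
    ctxt_counts "/proc/$$/status"
fi
```

Reading the file twice, a second apart, and subtracting the totals gives a rough per-second figure for a single PID.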

 

CPU performance monitoring covers all the concepts that we discussed till now. Some of them are outlined for simplicity.

  • Run Queue
  • Context Switching Rate
  • CPU utilization, split into system time and user time
  • Idle time spent by the cpu

 

A system with 2 to 3 processes per processor core in the run queue is fine. This means that for an 8-core system a load average of 16 to 24 is acceptable, provided the CPU utilization shown by the vmstat command stays roughly within the ranges below.

  • 60 to 70 percent of user time
  • 30 to 40 percent of system time
  • 5 to 10 percent of idle time.
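
The run queue part of this rule of thumb is easy to turn into a quick check. The sketch below simply treats anything above 3 processes per core as high; that threshold is taken from the guideline above and is not a hard limit, and the helper name check_load is made up for this example.

```shell
#!/bin/sh
# check_load LOAD CORES: print the load per core and whether it is within
# the rule of thumb of at most 3 queued processes per core.
# (hypothetical helper name; the threshold of 3 is only a guideline)
check_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        status = (per_core <= 3) ? "ok" : "high"
        printf "%.2f per core: %s\n", per_core, status
    }'
}

# Check the 1 minute load average of the running system.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one rest < /proc/loadavg
    check_load "$one" "$(nproc)"
fi
```

A result of "ok" only covers the run queue; the user, system, and idle percentages still have to be read from vmstat.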

 

The above details can be fetched from vmstat output. Let me run vmstat with 1 second interval.

root@ubuntu1:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 166600  45256 169068    0    0     5     2   20   26  0  0 100  0
 0  0      0 166592  45256 169068    0    0     0     0   23   47  0  0 100  0
 0  0      0 166592  45256 169068    0    0     0     0   14   18  0  0 100  0
 0  0      0 166592  45256 169068    0    0     0     0   38   30  1  0 99  0
 0  0      0 166592  45256 169068    0    0     0     0   27   18  0  0 100  0
 0  0      0 166592  45256 169068    0    0     0     0   23   40  0  0 100  0

 

 

As my system is doing nothing (no process in the run queue), you can see that the processor is spending 0 percent of its time in user mode (us column) and 0 percent in system mode (sy column). A high number of context switches can be ignored if the CPU utilization in the us, sy, and id columns is balanced as per the percentage levels mentioned above.

vmstat is one of the best tools in Linux for drawing conclusions about what is happening on the system. Another major plus point is that it needs very little processing power to run, so it can be used to fetch details even when the system is heavily loaded.

A similar tool called mpstat can be used to fetch the statistics of each individual core. A sample output of that command is shown below.

 

[root@www ~]# mpstat -P ALL
Linux 3.11.6-x86_64         03/12/2014      _x86_64_        (8 CPU)
05:31:41 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
05:31:41 AM  all    3.26    0.00    0.11    0.74    0.00    0.00    0.54    0.00   95.34
05:31:41 AM    0    1.12    0.00    0.16    1.41    0.00    0.00    0.39    0.00   96.93
05:31:41 AM    1   10.43    0.00    0.17    0.99    0.00    0.01    1.22    0.00   87.17
05:31:41 AM    2    7.47    0.00    0.19    0.76    0.00    0.01    0.89    0.00   90.69
05:31:41 AM    3    2.99    0.00    0.11    0.65    0.00    0.00    0.58    0.00   95.67
05:31:41 AM    4    1.34    0.00    0.07    0.58    0.00    0.00    0.40    0.00   97.61
05:31:41 AM    5    0.88    0.00    0.06    0.54    0.00    0.00    0.31    0.00   98.20
05:31:41 AM    6    0.91    0.00    0.05    0.52    0.00    0.00    0.28    0.00   98.23
05:31:41 AM    7    0.92    0.00    0.05    0.49    0.00    0.00    0.27    0.00   98.27

 

 

Several scenarios are listed below that can help you work out what is happening on the system.

 

  • If the us column in vmstat goes very high without a similar rise in context switches, it is possible that a single process is using the processor for a substantial amount of time.
  • If the wa column is also very high at the same time (us is high and cs is substantially low), the process that keeps the us value high is doing some heavy I/O.
  • A high number of interrupts (the in column in vmstat) together with a substantially lower number of context switches (the cs column) can indicate that a particular application, possibly a single-threaded one, is sending too many hardware requests.

 

Please note that the points above are only meant to give an idea of what to look for during a crisis. They are not to be treated as hard reference points.

I request my readers to point out any mistakes or suggestions, so that we all benefit from this and can keep this article updated. Please let me know through the comments about any interesting analysis you have done or any good methods you use. I hope this article was helpful in understanding several CPU monitoring concepts.


Comments

How about using something called "dstat"?? How about using something called "pmap"? to name a few...nice writeup anyway...keep up the good job fellas.

Good article!
For nginx servers,'2 to 3 process in run queue per processor core' sometimes may not be acceptable.
It will impact the response time badly, saying that you want the response time less than 1 second.
My question is whats the cause that process like migration/0 consume much cpu time?

I find it interesting and very helpful.

Really superb article!!!

Awesome!

It gives me more than enough information. Thanks a lot.

Thanks...learned a lot and will now be my go-to load reference

I had real hard time understanding the use of these tools and how to monitor load on a system.
You guys have explained it for a newbie's point of view. Thanks a lot. I had been struggling to understand how should I conclude that a particular area is a problem.
Never had an idea to interpret what should I do with the load averages provided by top. The other article I read about I/O monitoring is awesome too.
Thanks guys.
