Unix Process Control

Introduction

Administering a Unix system requires practical knowledge of how to start and stop programs. A process is the activity on a system associated with a running program. Some processes you already know about: they have GUI windows with conventional ways of starting and stopping them. Alternatively, you might be aware that a command you have run, e.g.

./cautious
is a program stored in a file containing the program instructions. When you run it, certain input, output and behaviour are observed. There is a means to start the program (a shell command or pipeline), and the program stops when it finishes running. If you want to automate process control, it helps to know the shell commands used for this purpose.

Some other programs run in the background and have no obvious behaviour, except that some service is performed. For example, if you are running a webserver on a system, you might be able to put its local URL into a web browser: http://localhost/ and you will see the web page at the root directory used by the webserver for serving pages, e.g. /var/www/index.html . Clearly, for a webpage to be displayed, something must be serving it, even though the user interface is in the web browser or client.

Another situation where you will need to control programs is when they are in a stuck or suspended state, no longer responding to normal input and taking up resources or blocking other activity. If you are providing high availability services, or just don't want to waste time, rebooting the system is best avoided unless necessary.

Controlling programs from a shell

If you run a program normally from the shell, the shell prompt doesn't come back until the program has finished. You can change this behaviour by putting an ampersand (&) after the command. For example, try this with the sleep command. The

sleep 5
command on its own will take about 5 seconds to complete. Here is what happened when I put an ampersand after it, then pressed enter every 2 seconds or so:

rich@saturn:~$ sleep 5 &
[1] 7284
rich@saturn:~$
rich@saturn:~$
[1]+  Done                    sleep 5

The shell prompt came back immediately, but with the line:

 [1] 7284 
This tells me that the job has job number 1 and a process ID (PID) of 7284.
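In a script, the PID that the shell prints interactively is available in the special parameter $!, and the wait builtin blocks until that process finishes. The following is a minimal sketch; the variable names bg_pid and bg_status are illustrative only.

```shell
# Start a short job in the background; the shell does not wait for it.
sleep 1 &
bg_pid=$!            # $! holds the PID of the most recent background job
echo "started sleep with PID $bg_pid"

# ... other work could happen here while sleep runs ...

# wait blocks until that PID finishes and returns its exit status.
wait "$bg_pid"
bg_status=$?
echo "sleep finished with exit status $bg_status"
```

Capturing $! straight after the & matters, because the next background command overwrites it.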

We can find out about background jobs of the current shell using the jobs command:

rich@saturn:~$ sleep 60 &
[1] 7290
rich@saturn:~$ sleep 65 &
[2] 7291
rich@saturn:~$ jobs
[1]-  Running                 sleep 60 &
[2]+  Running                 sleep 65 &
rich@saturn:~$

We can also put a background job into the foreground, using the fg (foreground) command and its job number:

rich@saturn:~$ fg 2
sleep 65

We can stop a foreground job by pressing <ctrl> and <z> keys together:

[2]+  Stopped                 sleep 65
rich@saturn:~$ jobs
[1]-  Running                 sleep 60 &
[2]+  Stopped                 sleep 65

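This doesn't end the program; it suspends it, and the process stays in a stopped state until it is resumed. (This is not the same as a zombie, which is a process that has already exited but whose parent has not yet collected its exit status.) We can resume a stopped job in the background using the bg command: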

rich@saturn:~$ bg 2
[2]+ sleep 65 &
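A script has no keyboard to press ctrl-z on, but a similar effect can be had by sending signals with the kill command described later: SIGSTOP suspends a process and SIGCONT resumes it. The sketch below assumes a Linux-style ps that understands the stat output keyword; the variable names are illustrative only.

```shell
sleep 30 &                      # a stand-in for a long-running job
pid=$!

kill -STOP "$pid"               # suspend it, much as ctrl-z would
sleep 1
ps -o pid= -o stat= -p "$pid"   # on Linux the state column should start with T

kill -CONT "$pid"               # let it carry on, much as bg would
sleep 1
ps -o pid= -o stat= -p "$pid"   # the state should no longer start with T

# Signal 0 sends nothing; it only checks that the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    resumed=yes
else
    resumed=no
fi

kill "$pid"                     # tidy up the example job
wait "$pid" 2>/dev/null
```

Unlike the signal sent by ctrl-z, SIGSTOP cannot be caught or ignored by the target program.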

The jobs, bg and fg commands are shell builtins, i.e. these commands are part of the bash shell program; they are not external programs run from the bash shell. Documentation on these and other builtin commands is in bash-builtins(7), or at the end of (the very long) bash(1) manpage.
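A quick way to check whether a command is a builtin is the type command, sketched below. The exact wording of the output varies between shells, and sleep is included only as an example of an external program.

```shell
# Ask the shell how it would interpret each command name.
for cmd in jobs fg bg sleep; do
    type "$cmd"
done
```

In bash, help followed by a builtin's name (for example help jobs) prints a short description of that builtin.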

Use of the ps (process status) command

The ps command lists processes on the system. On its own ps will only give a short listing of processes associated with the current terminal. You are likely to want more information than this. ps accepts flags both with and without a leading '-' hyphen, and the same letter can mean different things in each style. The following table describes the flags without the hyphen.

flag  description
x     include processes that have no controlling terminal; on its own this lists all of your processes, and combined with a it lists every process on the system
f     display processes in tree format
l     long format: more information about each process, including parent process ID and owner UID
u     user-oriented format: display the username owning each process

ps output example

$ ps fx
  PID TTY      STAT   TIME COMMAND
 7558 ?        Ss     0:00 /bin/sh /usr/bin/startkde
 7602 ?        Ss     0:00  \_ /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-wi
 7671 ?        S      0:00  \_ kwrapper ksmserver
 7722 ?        Sl     0:00 /usr/lib/evolution/evolution-data-server-1.6 --oaf-ac
 7719 ?        Sl     0:00 /usr/lib/evolution/2.6/evolution-exchange-storage --o
 7717 ?        Ss     0:00 /usr/lib/bonobo-activation/bonobo-activation-server -
 7711 ?        S      0:00 knotify [kdeinit]
 7706 ?        S      0:00 /usr/lib/libgconf2-4/gconfd-2 12
 7703 ?        S      0:00 kscd -session 10e5cf7475000115209283200000187910013_1
 7687 ?        S      0:01 adept_notifier
 7685 ?        S      0:00 klipper [kdeinit]
 7683 ?        S      0:00 kbluetoothd --dontforceshow
 7682 ?        S      0:00 kio_uiserver [kdeinit]
 7679 ?        S      0:11 kicker [kdeinit]
 7677 ?        S      0:01 kdesktop [kdeinit]
 7673 ?        S      0:00 ksmserver [kdeinit]
 7656 ?        S      0:00 kaccess [kdeinit]
 7643 ?        R      0:00 /usr/lib/gamin/gam_server
 7641 ?        S      0:01 kded [kdeinit]
 7637 ?        S      0:00 dcopserver [kdeinit] --nosid
 7634 ?        Ss     0:00 kdeinit Running...
 7639 ?        S      0:00  \_ klauncher [kdeinit]
 7650 ?        S      0:01  \_ /usr/bin/artsd -F 10 -S 4096 -d -s 15 -m artsmess
 7674 ?        S      0:03  \_ kwin [kdeinit] -session 10e5cf7475000114969617900
 7699 ?        S      0:00  \_ katapult -session 10d9d39668000112680660600000081
 7700 ?        Sl     0:00  \_ /usr/lib/evolution/2.6/evolution-alarm-notify --s
 7701 ?        S      0:01  \_ gaim --session 10e5cf7475000115227963600000051270
 7781 ?        S      0:01  \_ konqueror [kdeinit] --silent
 7783 ?        S      0:00  \_ kio_file [kdeinit] file /tmp/ksocket-rich/klaunch
 8222 ?        R      0:13  \_ quanta
 8257 ?        R      0:00  \_ konsole [kdeinit]
 8258 pts/1    Ss     0:00      \_ /bin/bash
 8283 pts/1    R+     0:00          \_ ps fx
 7606 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session /usr/bin/sta
 7605 ?        Ss     0:00 dbus-daemon --fork --print-pid 8 --print-address 6 --

For more information see ps(1) .
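For automation it is often easier to ask ps about one specific process than to read a full listing. The sketch below uses the hyphenated, standard option style (-p to select a PID, -o to choose columns) rather than the flags in the table above; proc_info and is_running are hypothetical helper names.

```shell
# Print PID, parent PID and command name for one process.
# A trailing "=" after a column name suppresses its header.
proc_info() {
    ps -o pid= -o ppid= -o comm= -p "$1"
}

# Succeed if a process with this PID currently exists.
is_running() {
    ps -p "$1" >/dev/null 2>&1
}

proc_info "$$"                  # $$ is the PID of the current shell

if is_running "$$"; then
    echo "shell $$ is running"
fi
```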

The kill command

The kill command can be used to send a signal to a process. Programs with user interfaces which have not hung are better controlled by normal means. On Windows you can perform similar work using the Task Manager, reached by pressing the ctrl, alt and del keys together. As with all shell commands, use of kill can be automated. You might want to script this if, for example, a buggy program won't release a lockfile and you need to stop it before your (automated, of course) nightly backup runs, or if a hung instance of a batch program is holding resources needed to reschedule it.

If the program needing control runs in the background, sending it a signal is likely to be the best means of controlling it. For example, suppose you have updated the configuration of an email server, which normally runs in the background around the clock. This program will be designed to catch certain signals and respond to them in various ways. For example, Sendmail will re-read its configuration if it catches a SIGHUP signal. The signal names and numbers are documented in signal(7); only a few numbers (such as 1, 2, 9 and 15) are the same on every system, so names are more portable than numbers. See also kill(1).
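The shell's trap builtin shows how a long-running program can catch signals. The sketch below is a toy stand-in for a real server, not how Sendmail itself is implemented; toy_daemon, the log file and the messages are all hypothetical.

```shell
log=$(mktemp)

# A toy "daemon" that reacts to SIGHUP and SIGTERM.
toy_daemon() {
    trap 'echo "re-reading configuration" >>"$log"' HUP
    trap 'echo "shutting down" >>"$log"; exit 0' TERM
    echo "started" >>"$log"
    while :; do
        sleep 1          # a trap runs once the current command finishes
    done
}

toy_daemon &             # run it in the background
daemon_pid=$!
sleep 1                  # give it time to install its handlers

kill -HUP "$daemon_pid"  # ask it to reload; it keeps running
sleep 2
kill -TERM "$daemon_pid" # ask it to exit cleanly
wait "$daemon_pid"
daemon_status=$?

cat "$log"
```

The log should record the start, the reload and the shutdown, and the process survives the SIGHUP only because it installed a handler for it.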

Signal 9, otherwise known as SIGKILL, is popular for interactive use, as it is designed to kill a program and cannot be caught. Because the program gets no chance to clean up, it is usually worth trying the default signal (SIGTERM, 15) first. Generally this approach involves finding the process ID (PID) of the process you want to kill using ps and then using kill.

kill example

rich@saturn:~$ sleep 60 &
[1] 19721
rich@saturn:~$ ps ax | grep sleep
19721 pts/2    S      0:00 sleep 60
19723 pts/2    S+     0:00 grep sleep
rich@saturn:~$ kill -9 19721
rich@saturn:~$ jobs
[1]+  Killed                  sleep 60
rich@saturn:~$  
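When this is automated, a common pattern is to send SIGTERM first and fall back to SIGKILL only if the process is still there after a grace period. The following is a minimal sketch; stop_process is a hypothetical helper name, and the timeout values are arbitrary.

```shell
# Ask a process to exit with SIGTERM; escalate to SIGKILL after a timeout.
# $1 = PID, $2 = seconds to wait (default 5).
stop_process() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # already gone
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only tests existence
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null                # uncatchable last resort
}

sleep 60 &
victim=$!
stop_process "$victim" 3
wait "$victim"
victim_status=$?
echo "sleep exited with status $victim_status"
```

A process ended by a signal is reported by the shell with an exit status of 128 plus the signal number, so a status of 143 here would indicate SIGTERM was enough.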

Unix Resource Monitoring

It is useful to be able to monitor Unix resources from the shell in situations where an automated action might fail if the resources it needs are unavailable. An example might be a background program that needs a certain amount of free memory or disk space.

The df command

This stands for "disk free": it reports how much space is used and how much is still available on each mounted filesystem. df has various options, the most useful being the -h (human readable) option. Read df(1) for others.

df example

rich@saturn:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda8              43G  5.5G   35G  14% /
varrun                490M  100K  490M   1% /var/run
varlock               490M  4.0K  490M   1% /var/lock
procbususb             10M  116K  9.9M   2% /proc/bus/usb
udev                   10M  116K  9.9M   2% /dev
devshm                490M     0  490M   0% /dev/shm
lrm                   490M   18M  473M   4% /lib/modules/2.6.17-10-386/volatile
/dev/hda5              48G  9.3G   36G  21% /media/hda5
/dev/hda7              48G  2.0G   44G   5% /media/hda7
/dev/hda6              48G   24G   24G  51% /home
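The -h output is meant for people; a script is better served by the -P (portable format) and -k (kilobyte units) options, which give one predictable line per filesystem. The sketch below checks for space before an automated action; avail_kb, the /tmp path and the 100 MB threshold are assumptions chosen for illustration.

```shell
# Print the available space, in kilobytes, on the filesystem holding a path.
# -P forces one line per filesystem; -k forces 1024-byte units.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

needed_kb=102400                      # hypothetical requirement: 100 MB
have_kb=$(avail_kb /tmp)

if [ "$have_kb" -ge "$needed_kb" ]; then
    echo "enough space: ${have_kb} KB available"
else
    echo "low on space: only ${have_kb} KB available" >&2
fi
```

Reading the fourth column assumes the filesystem name contains no spaces, which holds for typical device names.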

The free and vmstat commands

These can be used to check memory, buffer and swap space usage and disk I/O activity. If a system is performing slowly these can help investigate potential resource starvation.

rich@saturn:~$ free
             total       used       free     shared    buffers     cached
Mem:       1003388     991000      12388          0      10240     750660
-/+ buffers/cache:     230100     773288
Swap:      2755108      18228    2736880

The apparent lack of free memory on this system is due to the kernel making use of nearly all memory not used by active processes for disk buffers and cache. In practice the amount of swap used and the amount of swapping activity are better indicators of memory pressure. vmstat(8) gives more detailed information. In this example, 2 samples were taken at 10 second intervals.

rich@saturn:~$ vmstat -a -n 10 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa
 0  0  18224  69024 321524 566204    0    0    74   104    4  228 10  0 89  1
 0  0  18224  69056 321556 566212    0    0     0    28  312  222  0  0 99  0
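To act on these figures automatically, a script can pick out the Swap: line of the free output shown earlier. The sketch below reads free output on standard input so that it can be tried on saved text; swap_used_percent is a hypothetical helper name, and it assumes the total and used figures are the second and third fields as in the sample above.

```shell
# Read free(1) output on stdin and print swap usage as a whole-number percentage.
swap_used_percent() {
    awk '$1 == "Swap:" {
        if ($2 > 0) printf "%d\n", $3 * 100 / $2   # used * 100 / total
        else print 0                               # no swap configured
    }'
}

# Try it on a made-up line: 250 of 1000 KB used.
printf 'Swap: 1000 250 750\n' | swap_used_percent

# On a live system, assuming free is installed:
# free | swap_used_percent
```

For the Swap: line in the sample free output, 18228 of 2755108 KB is under one percent, which fits the zero si and so columns reported by vmstat.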