The kernel maintains a list of all the current processes in a "process table"; you can use the ps command to view the contents of this table.
Each process can also be assigned a priority, or "niceness" level, which is a value ranging from -20 to 19. A niceness of "-20" is the most favourable setting, so the process will be given access to the CPU more often, whereas a niceness of "19" is the least favourable, so the process will mostly be given CPU time when little else needs it.
You can use the nice and renice commands to specify and alter these values for specific processes.
From your shell prompt (bash), you will usually instruct the system to run an application for you; for example, "vi". This will then cause your "bash" process to "fork" off a new process. The initial process is referred to as the "parent process", and the process which it forked as the "child process".
The process table contains the parent PID (PPID), and uses this to track which processes spawned which other ones.
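The parent/child relationship can be observed directly from the shell. The following is a minimal sketch, assuming that sh is on your PATH; the temporary file is only there so that the child's answer can be read back by the parent.

```shell
# $$ expands to the PID of the current shell (the parent).
parent_pid=$$

# Start a child shell and have it record its parent PID (PPID).
tmpfile=$(mktemp)
sh -c 'echo $PPID' > "$tmpfile"
read -r child_ppid < "$tmpfile"
rm -f "$tmpfile"

echo "parent PID:   $parent_pid"
echo "child's PPID: $child_ppid"
```

When this is run from a shell prompt or as a script, the two numbers should normally be the same, because the shell forks the child directly.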
As well as the standard user processes that you would expect to find running, such as your shell and perhaps your editor, there are also several system processes. Examples of these include the cron daemon (crond), which handles job scheduling, and the system log daemon (syslogd), which handles the logging of system messages.
There are two methods of scheduling jobs on a Unix system. One is called at, which is used for once-off batch jobs. The other is called cron, which is used for regularly run tasks.
The at jobs are serviced by the "at daemon (atd)".
SYNTAX: at [-f script] TIME
This command is used to schedule batch jobs.
You can either give it a script to run with the "-f" parameter, or you can type the commands at the "at>" prompt that appears after you've entered the command.
The "TIME" parameter can be in the form of HH:MM, or "now + n minutes". There are several other complicated methods of specifying the time, which you should look up in the man page for at(1).
debian:~# at now + 5 minutes
warning: commands will be executed using /bin/sh
at> echo hello!
at> <EOT>
job 1 at 2004-03-12 13:27
We have now scheduled a job to run in 5 minutes' time; that job will simply echo the string "hello!". Because an at job runs without a terminal, its output does not appear on your screen; it is normally mailed to the user who scheduled the job.
To tell at that you're finished typing commands to be executed, press Ctrl-d; this displays the <EOT> marker that you can see above.
SYNTAX: atq
This command displays the current batch jobs that are queued:
debian:~# atq
1       2004-03-12 13:27 a root
debian:~#
This is the job that we queued earlier.
The first number is the "job id", followed by the date and time that the job will be executed, the queue that the job is in ("a" is the default queue for at), and the user who the job belongs to.
SYNTAX: atrm <job id>
This command simply removes jobs from the queue.
debian:~# atrm 1
debian:~# atq
debian:~#
We've now removed our scheduled job from the queue, so it won't run.
Let's add another one, and see what happens when it is executed:
debian:~# at now + 1 minute
warning: commands will be executed using /bin/sh
at> touch /tmp/at.job.finished
at> <EOT>
job 3 at 2004-03-12 13:27
debian:~# atq
3       2004-03-12 13:27 a root
debian:~# date
Fri Mar 12 13:26:57 SAST 2004
debian:~# date
Fri Mar 12 13:27:04 SAST 2004
debian:~# atq
debian:~# ls -l /tmp/at.job.finished
-rw-r--r-- 1 root root 0 Mar 12 13:27 /tmp/at.job.finished
As you can see, we scheduled a job to execute one minute from now, and then waited for a minute to pass. You'll notice how it was removed from the queue once it was executed.
SYNTAX: crontab [ -u user ] { -l | -r | -e }
        crontab [ -u user ] filename
You can use the crontab command to edit, display and delete existing cron tables.
The "-u" switch lets the root user specify another user's crontab to perform the operation on.
Table 7.1. crontab options
-l | lists current crontab |
-r | removes current crontab |
-e | edits current crontab |
If a filename is specified instead, that file is made the new crontab.
The syntax for a crontab is as follows:
# minute hour day month weekday command
Example:
# minute hour day month weekday command
0 1 * * * backup.sh
This cron job will execute the backup.sh script, at 01:00 every day of the year.
A more complicated example:
# minute hour day month weekday command
5 2 * * 5 backup-fri.sh
This cron job will execute the backup-fri.sh script, at 02:05 every Friday.
Weekdays are as follows:
1 - Monday
2 - Tuesday
etc.
7 - Sunday (0 also means Sunday)
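The field layout described above can be tried out without touching your real crontab. The sketch below writes a sample cron table to a temporary file and counts its entries; backup.sh and backup-fri.sh come from the examples above, while check-disk.sh and monthly-report.sh are hypothetical script names used purely for illustration. The "*/15" step notation is an extension understood by the cron shipped with Debian and most Linux systems, but it may not be accepted everywhere.

```shell
# Write a sample crontab to a temporary file (nothing is installed).
cronfile=$(mktemp)
cat > "$cronfile" <<'EOF'
# minute hour day month weekday command
0 1 * * * backup.sh
5 2 * * 5 backup-fri.sh
*/15 * * * * check-disk.sh
30 8 1 * * monthly-report.sh
EOF

# Count the entries, skipping comment lines and blank lines.
entries=$(grep -c -v -e '^#' -e '^$' "$cronfile")
echo "entries: $entries"

# To install the file as your crontab you would run (left commented out):
# crontab "$cronfile"
rm -f "$cronfile"
```

The third entry runs every 15 minutes, and the fourth runs at 08:30 on the first day of each month.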
Note: There is also a "system crontab", which differs slightly from the user crontabs explained above. You can find the system crontab in a file called /etc/crontab.
You can edit this file directly with an editor such as vi; you must not use the crontab command to edit it.
You'll also notice that this file has an additional field, which specifies the username under which the job should run.
debian:~# cat /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file.
# This file also has a username field, that none of the other crontabs do.
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# m h dom mon dow user command
25 6 * * * root test -e /usr/sbin/anacron || run-parts --report /etc/cron.daily
47 6 * * 7 root test -e /usr/sbin/anacron || run-parts --report /etc/cron.weekly
52 6 1 * * root test -e /usr/sbin/anacron || run-parts --report /etc/cron.monthly
#
Some of the daily system-wide jobs that run are:
logrotate - this checks to see that the files in /var/log don't grow too large.
find - this builds the locate database, used by the "locate" command.
man-db - this builds the "whatis" database, used by the whatis command.
standard - this makes a backup of critical system files from the /etc directory, namely your passwd, shadow and group files; the backups are given a .bak extension.
The following commands are vital for monitoring system resources:
The ps command displays the process table.
SYNTAX: ps [auxwww]
a -- select all with a tty except session leaders
u -- select by effective user ID - shows username associated with each process
x -- select processes without controlling ttys (daemon or background processes)
w -- wide format
debian:~# ps
  PID TTY          TIME CMD
 1013 pts/0    00:00:00 bash
 1218 pts/0    00:00:00 ps
debian:~# ps auxwww
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.4  0.1  1276  496 ?        S    13:46   0:05 init
root         2  0.0  0.0     0    0 ?        SW   13:46   0:00 [kflushd]
root         3  0.0  0.0     0    0 ?        SW   13:46   0:00 [kupdate]
root         4  0.0  0.0     0    0 ?        SW   13:46   0:00 [kswapd]
root         5  0.0  0.0     0    0 ?        SW   13:46   0:00 [keventd]
root       140  0.0  0.2  1344  596 ?        S    13:46   0:00 /sbin/syslogd
root       143  0.0  0.3  1652  836 ?        S    13:46   0:00 /sbin/klogd
root       151  0.0  0.1  1292  508 ?        S    13:46   0:00 /usr/sbin/inetd
daemon     180  0.0  0.2  1388  584 ?        S    13:46   0:00 /usr/sbin/atd
root       183  0.0  0.2  1652  684 ?        S    13:46   0:00 /usr/sbin/cron
root       682  0.0  0.4  2208 1256 tty1     S    13:48   0:00 -bash
root      1007  0.0  0.4  2784 1208 ?        S    13:51   0:00 /usr/sbin/sshd
root      1011  0.0  0.6  5720 1780 ?        S    13:52   0:00 /usr/sbin/sshd
root      1013  0.0  0.4  2208 1236 pts/0    S    13:52   0:00 -bash
root      1220  0.0  0.4  2944 1096 pts/0    R    14:06   0:00 ps auxwww
The USER column is the user to whom that particular process belongs; the PID is that process's unique process ID. You can use this PID to send signals to a process using the kill command.
For example, you can signal the "sshd" process (PID = 1007) to quit, by sending it the terminate (TERM) signal:
debian:~# ps auxwww | grep 1007
root      1007  0.0  0.4  2784 1208 ?        S    13:51   0:00 /usr/sbin/sshd
debian:~# kill -SIGTERM 1007
debian:~# ps auxwww | grep 1007
The "TERM" signal is the default that the kill command sends, so you can usually leave the signal parameter out.
If a process refuses to exit gracefully when you send it a TERM signal, you can send it a KILL signal, which cannot be caught or ignored; e.g. "kill -SIGKILL <pid>".
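The effect of a signal can be seen in the exit status that the shell reports for the process. The following is a small sketch using a throwaway sleep process rather than a real daemon; the value 143 mentioned afterwards assumes that TERM is signal number 15, as it is on Linux.

```shell
# Start a long-running background process and remember its PID.
sleep 300 &
pid=$!

# kill -0 sends no signal; it only checks that the process exists.
kill -0 "$pid" && echo "process $pid is running"

# Ask the process to terminate (TERM is also the default signal).
kill -TERM "$pid"

# wait returns 128 plus the signal number if the child was killed by a signal.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

An exit status of 143 (128 + 15) tells you that the process was ended by TERM. Had it been ended with "kill -KILL", the status would be 137 (128 + 9).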
The top command will display a running process table of the top CPU processes:
 14:15:34 up 29 min, 2 users, load average: 0.00, 0.00, 0.00
20 processes: 19 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 1.4% user, 0.9% system, 0.0% nice, 97.7% idle
Mem: 257664K total, 45104K used, 212560K free, 13748K buffers
Swap: 64224K total, 0K used, 64224K free, 21336K cached
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    1 root       0   0   496  496   428 S     0.0  0.1   0:05 init
    2 root       0   0     0    0     0 SW    0.0  0.0   0:00 kflushd
    3 root       0   0     0    0     0 SW    0.0  0.0   0:00 kupdate
    4 root       0   0     0    0     0 SW    0.0  0.0   0:00 kswapd
[ ... ]
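SYNTAX: nice -<niceness> <command>
To run the sleep command with a niceness of "-10" (the first dash introduces the option and the second is the minus sign; only root may set a negative niceness):
debian:~# nice --10 sleep 50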
If you then run the top command in a different terminal, you should see that the NI column for the sleep process has been altered:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 2708 root      -9 -10   508  508   428 S <   0.0  0.1   0:00 sleep
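You can also check a niceness value without opening top. The sketch below relies on the behaviour of GNU nice, which prints the current niceness when run with no arguments, and it uses the "-n" form of the option, which is equivalent to the "-<niceness>" form shown above. It raises the niceness rather than lowering it, so it does not need root.

```shell
# With no arguments, nice prints the niceness of the current shell.
base=$(nice)

# Run a command with its niceness raised by 5. The command here is nice
# itself, so it reports the value it was started with.
raised=$(nice -n 5 nice)

echo "shell niceness:   $base"
echo "command niceness: $raised"
```

The adjustment is relative to the niceness of the shell that starts the command, and the result is capped at 19.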
You can use the renice command to alter the niceness level of an already running process.
SYNTAX: renice <niceness> [ -p <pid> ] [ -u <user> ]
The "-p pid" parameter specifies the PID of a specific process, and the "-u user" parameter specifies a specific user, all of whose currently running processes will have their niceness value changed.
To renice all of user "student"'s processes to a value of "-10":
debian:~# renice -10 -u student
The vmstat command gives you statistics on the virtual memory system.
SYNTAX: vmstat [delay [count]]
debian:~# vmstat 1 5
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 212636  13748  21348   0   0     4     4  156    18   1   1  98
 0  0  0      0 212636  13748  21348   0   0     0     0  104    12   0   0 100
 0  0  0      0 212636  13748  21348   0   0     0     0  104     8   0   0 100
 0  0  0      0 212636  13748  21348   0   0     0     0  104    10   0   0 100
 0  0  0      0 212636  13748  21348   0   0     0     0  104     8   0   0 100
debian:~#
Table 7.2. procs
r | processes waiting for run time |
b | processes in uninterruptible sleep |
w | processes swapped out but otherwise runnable |
Table 7.3. memory
swpd | virtual memory used (KB) |
free | idle memory (KB) |
buff | memory used as buffers (KB) |
Table 7.5. io
bi | blocks received from a block device (blocks/s) |
bo | blocks sent to a block device (blocks/s) |
Table 7.7. cpu
us | user time as a percentage of total CPU time |
sy | system time as a percentage of total CPU time |
id | idle time as a percentage of total CPU time |
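When vmstat output is fed into a script, the fields are usually picked out by position. The following sketch does this for one data line copied from the sample output above, so it runs without vmstat being installed. The column positions are an assumption tied to that 16-column layout; newer versions of vmstat print different columns, so check the header line on your own system first.

```shell
# One data line copied from the vmstat output above (16 columns:
# r b w swpd free buff cache si so bi bo in cs us sy id).
sample="0 0 0 0 212636 13748 21348 0 0 4 4 156 18 1 1 98"

# Column 5 is free memory and column 16 is the idle CPU percentage.
free_kb=$(echo "$sample" | awk '{print $5}')
idle_pct=$(echo "$sample" | awk '{print $16}')
busy_pct=$((100 - idle_pct))

echo "free memory: ${free_kb} KB"
echo "CPU busy:    ${busy_pct}%"
```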
It is often useful to be able to keep a historical record of system activity and resource usage. This is useful to spot possible problems before they occur (such as running out of disk space), as well as for future capacity planning.
Usually, these tools are built by using system commands (such as vmstat, ps and df), coupled together with rrdtool or mrtg, which store the data and generate graphs.
rrdtool: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/
mrtg: http://people.ee.ethz.ch/~oetiker/webtools/mrtg/
Some of the more complex monitoring systems that have been built using these tools include the following:
Cacti: http://www.raxnet.net/products/cacti/
Zabbix: http://www.zabbix.com/
All of the above tools are open source and free to use.
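The data-collection half of such a tool can be very small. The following is a rough sketch that records how full the root filesystem is; the log location is a temporary file chosen only for illustration, whereas a real collector would append to a fixed path and be run regularly from cron.

```shell
# Hypothetical log location; a real collector would use a fixed path.
logfile=$(mktemp)

# df -P prints one line per filesystem; column 5 is the percentage used.
used=$(df -P / | awk 'NR == 2 {print $5}')

# Append a timestamped sample. Run regularly, this builds up a history.
echo "$(date '+%Y-%m-%d %H:%M:%S') root_used=$used" >> "$logfile"
cat "$logfile"
```

Each run adds one line, which a graphing tool can later read back.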
You may use the bash shell built-in command "ulimit" to limit the system resources that your processes are allowed to consume.
The following is an excerpt from the bash man page:
SYNTAX: ulimit [-SHacdflmnpstuv [limit]]
Provides control over the resources available to the shell and to processes started by it, on systems that allow such control. The -H and -S options specify that the hard or soft limit is set for the given resource. A hard limit cannot be increased once it is set; a soft limit may be increased up to the value of the hard limit. If neither -H nor -S is specified, both the soft and hard limits are set.
The value of limit can be a number in the unit specified for the resource or one of the special values hard, soft, or unlimited, which stand for the current hard limit, the current soft limit, and no limit, respectively. If limit is omitted, the current value of the soft limit of the resource is printed, unless the -H option is given. When more than one resource is specified, the limit name and unit are printed before the value.
Other options are interpreted as follows:
-a All current limits are reported
-c The maximum size of core files created
-d The maximum size of a process's data segment
-f The maximum size of files created by the shell
-l The maximum size that may be locked into memory
-m The maximum resident set size
-n The maximum number of open file descriptors (most systems do not allow this value to be set)
-p The pipe size in 512-byte blocks (this may not be set)
-s The maximum stack size
-t The maximum amount of cpu time in seconds
-u The maximum number of processes available to a single user
-v The maximum amount of virtual memory available to the shell
If limit is given, it is the new value of the specified resource (the -a option is display only). If no option is given, then -f is assumed. Values are in 1024-byte increments, except for -t, which is in seconds, -p, which is in units of 512-byte blocks, and -n and -u, which are unscaled values. The return status is 0 unless an invalid option or argument is supplied, or an error occurs while setting a new limit.
On a Debian system, the default ulimit settings should appear as follows:
debian:~# ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 256
virtual memory        (kbytes, -v) unlimited
A common use of this command is to stop long-running processes that leak memory, such as web servers (e.g., Apache) and CGI scripts, from consuming all available system resources. Using the ulimit command to reduce the locked memory and memory size options before starting up your web server would mitigate this problem.
Another common use is on shell servers where users may not "tidy up" after themselves; you can then set the cpu time limit in /etc/profile, thus having the system automatically terminate long-running processes.
Another example specifically relates to core files. You'll notice that the default core file size is set to 0. This means that when an application crashes (for example, with a "segmentation fault"), it does not leave behind a core file (a file containing the memory contents of the application at the time that it crashed). This core file can prove invaluable for debugging, but can obviously be quite large, as it can be as big as the amount of memory the application was consuming at the time! Hence the default "0" value.
However, to enable core dumps, you can specify the "-c" switch:
debian:~# ulimit -c
0
debian:~# ulimit -c 1024
debian:~# ulimit -c
1024
Any applications subsequently launched from this shell that crash will now generate core dump files of up to 1024 blocks in size.
The core files are normally named "core" or sometimes processname.core, and will be written to the current working directory of the specific application that crashed.
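Limits set with ulimit apply to the shell in which they are set and to the processes started from it, not to the rest of the system. The following sketch illustrates this by changing the soft core file limit inside a subshell only; it lowers the limit to 0, which is always permitted.

```shell
# Current soft limit on core file size in this shell.
before=$(ulimit -S -c)

# Lower the soft limit inside a subshell; only the subshell and the
# processes it starts are affected.
inside=$( ulimit -S -c 0; ulimit -S -c )

# Back in the original shell the limit is unchanged.
after=$(ulimit -S -c)

echo "before: $before, inside subshell: $inside, after: $after"
```

This is why a limit intended for every login session is placed in a file such as /etc/profile, which each login shell reads when it starts.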
On a Linux system, you should find all the system log files are in the /var/log directory.
The first place you should look if you are experiencing problems with a running system is the system "messages" logfile.
You can use the tail command to see the last few entries:
$ tail /var/log/messages
It's sometimes useful to keep the log scrolling in a window as entries are added, and you can use tail's -f (follow) flag to achieve this:
$ tail -f /var/log/messages
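By default tail prints the last 10 lines of a file, and the -n option selects a different count. The sketch below shows this on a small sample file; the file and its messages are made up for illustration and merely stand in for /var/log/messages, which normally requires suitable permissions to read.

```shell
# Build a small sample log (a stand-in for /var/log/messages).
log=$(mktemp)
for i in 1 2 3 4 5; do
    echo "Mar 12 13:2$i:00 debian kernel: sample message $i" >> "$log"
done

# Show only the two most recent entries.
last_two=$(tail -n 2 "$log")
echo "$last_two"
rm -f "$log"
```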
Other files of interest in /var/log:
auth.log -- log messages relating to system authentication
daemon.log -- log messages relating to running daemons on the system
debug -- debug level messages
syslog -- system level log messages
kern.log -- kernel messages
The process which writes to these logfiles is called "syslogd", and its behavior is configured by /etc/syslog.conf.
Each log message has a facility (kern, mail, news, daemon) and a severity (debug, info, warn, err, crit). The syslog.conf file uses these to determine where to send the messages.
As machines continue to run over time, these files can obviously become quite large. Rather than the system administrator having to manually trim them, there is a utility called "logrotate".
This utility can be configured to rotate (back up and compress) old log files and make way for new ones. It can also be configured to keep only a certain number of log files, making sure that you keep your disk space free.
The file which controls this behavior is /etc/logrotate.conf. See the logrotate(8) man page for details on the syntax of this file.