20 - The Zen of Linux System Administration

In This Chapter

The role of the system administrator
Developing an appropriate mindset
The challenges of system administration
Ethics and responsibility
Never-do's in system administration
Knowing your network
File system planning
Installing software

Gather ten system administrators together in a room and ask them to define their job. You'll probably get ten distinct, diverse answers. Why? Because system administration has a lot of aspects to it: some concrete and tangible, and some abstract and intangible. On the concrete side of the equation, today's typical administrator needs to have a strong grasp of all things hardware-related, a good understanding of network topologies and protocols, and how to make everything "play nice together" in the same "sandbox." The job frequently demands in-depth, hands-on experience with at least two or three operating systems. There's software to install and update, programs to reconfigure, and kernels to tweak. User accounts need to be created and removed. File systems require ongoing maintenance. Backups need to be scheduled and tested. The list goes on and on.

Now factor in the intangibles. Users have needs, and sometimes those needs directly contradict system integrity or security. Did someone mention training (usually for a product the administrator has little or no direct experience with)? Network or filesystem design, anyone? How about that new VPN that needed to be in place yesterday?

This chapter takes a sweeping look at some of the concepts of system administration. While the balance of this book deals with the hands-on implementation of installing, configuring, and maintaining a Linux system, the following topics are meant to provoke thought and reflection. Keep in mind that if there's any rule to system administration, it's that there are no rules per se: there's more than one way to successfully implement something or solve a problem.

Several topics discussed in the following pages might at first glance appear to apply only to large networks with hundreds of users. We disagree. Whether you're tasked with the care and feeding of one standalone machine, or a complex network fed by a diverse range of operating systems, we believe the underlying principles are the same.

The Role of the System Administrator

The job of a system administrator is complex. It's demanding, frequently requires the patience of Job, and definitely requires a good sense of humor. In addition, you need to be able to switch hats quickly and without breaking stride. System administration is a lot like working in the emergency ward of a hospital. You need to be a doctor, a psychologist, and - when your instruments fail - a mechanic. Often you need to work with limited resources. You need to be inventive in a crisis; you need to know a lot of facts and figures about the way computers work. You need to recognize that the answers are not always written down for you to copy, and that machines do not always behave the way you think they should. And, on top of everything else, you need to learn hundreds of new things a year.

While each administrator's knowledge base will vary according to the hardware, software, and number of users they're supporting, one aspect of the job is consistent - attitude. Being a system administrator is as much a state of mind as it is about being knowledgeable. You need to be ready for the unexpected, resigned to an uncertain future, and able to plan for the unknown. It requires organization and the ability to be systematic. There is often no right answer, but there is always a wrong answer. It's about making something robust that works as advertised, with a minimum amount of fuss, muss, or expense. In spite of stereotypes, today's system administrator is neither haphazard nor messy. Modern computing systems demand the very best of organizational skills and a professional attitude.

System administration also demands a certain degree of self-confidence. In the beginning, the novice is typically too nervous or worried about breaking something to get any real work done. Later, as experience is gained, aspects of the job become routine and second-nature. It's like learning to ski - there are three stages:

Under-confidence: You start out on flat ground, well away from spectators, and fall down a lot.
Over-confidence: You ignore the world around you, race down the mountainside at Mach 8, and while showing off your best moves to the opposite sex, ski straight into a tree and break your nose.
Resignation to your fate: You are conscious of your environment, know where the potholes are, and concentrate on the task at hand instead of style.

To get going, you need to know your stuff and have confidence in your abilities - but you also need to know your limitations.

A lot of people think that system administration begins and ends with installing an operating system and creating a few user accounts. In fact, this is only the beginning. Correctly installing an OS and a handful of software programs is the easy part; keeping that system running smoothly, without interruption or incident, for months at a time, is the challenge.

Developing an Appropriate Mindset

The days when one could fly a system by the seat of one's pants are over. System administration demands practical skills but nothing can replace real understanding. Today, keeping up with the theory is as important as upgrading the software. Successful system administration means approaching problems with the appropriate mindset. The following attitudes are ill-advised:

Insisting that there is a right or wrong answer to every question.
Running to someone else whenever there is a problem.
Getting distraught and upset when things do not work as expected.
Expecting every problem to have a beginning, a middle, and an end.

In contrast it is recommended to begin by:

Looking for answers in manuals and newsgroups.
Using trial and error (carefully!) to locate problems.
Being cheerful and patient with the users of your system.
Listening to people who tell you that there is a problem. It might be true, even if you can't see it yourself.
Writing down your experiences so that you know how to solve the same problem again in the future.
Taking responsibility for your own actions. Be prepared for accidents. They are going to happen and they will be your fault. And you are going to have to fix them.
Remembering the tedious jobs like vacuum cleaning the hardware once a year.

System administrators need to be able to read documentation, to be able to communicate clearly with others, and to ask the appropriate questions when necessary.

It is important to start by being systematic from the beginning. Every time you read something, or learn some isolated fact, think: how does this apply to me? Do yourself a favor and take the time to write down your experiences. Start now. And try to find your own method for remembering what you have learned.

The Challenges of System Administration

System administration is not just about installing operating systems. It's about planning, designing, and maintaining an efficient network of computers that are going to be used by people on an ongoing basis. On the computer side of the equation, this means:

Designing a network that is logical and efficient.
Designing and deploying machines that can be easily upgraded.
Deciding what services are needed and implementing them effectively.
Planning and implementing system security.
Providing a comfortable environment for users.
Developing ways of fixing errors and problems that occur.
Designing and testing a "disaster" recovery plan.
Keeping track of and understanding how to use the enormous amount of relevant knowledge that increases every year.

Some system administrators are responsible for both the network hardware and the computers that it connects, that is, the cables as well as the computers. Some are only responsible for the computers. Either way, an understanding of how data flows from machine to machine is essential as well as an understanding of how each machine affects every other one. A central axiom for network administration is as follows:

Once a computer is plugged into the network, it no longer belongs to any one individual. It becomes part of a community of machines that share resources and communicate with the whole. What your machine does affects other machines. What other machines do affects your machine.

The ethical issues associated with connecting to the network are not trivial. Administrators are ultimately responsible for how an organization conducts itself electronically. This is a responsibility that must be exercised wisely and with care.

Ethics and Responsibility

A system administrator wields great power. He or she has the means to read everyone's mail, change files, and to start and kill processes. This power can easily be abused and at times the temptation to play divine ruler is very alluring.

Tip
Another danger an administrator often gets caught up in is thinking a system exists primarily for their own entertainment and/or use, and that users are simply a nuisance to the smooth operation of a network. We'd like to gently remind all administrators that their role is defined by user needs, not the other way around.

The ethical integrity of a system administrator is clearly an important issue. Administrators for top secret government organizations and administrators for small businesses have the same responsibilities towards their users and their organizations. A quick glance at the governing institutions around the world should quickly tell you that power corrupts. And despite noble intentions, few individuals are immune to the temptations of such power at one time or other.

Administrators "watch over" backups, e-mail, and private communications, and they have access to everyone's files. While it is almost never necessary to look at a user's private files, it is possible at any time, and users do not usually consider the fact that their files are available to other individuals in this way. Users need to be able to trust the system and its administrator. Here are some things to think about as you mediate the systems for your users:

The kind of rules you can fairly impose on users
The responsibilities you have to the rest of the network community, that is, the rest of the world
Censoring of information or views
Restriction of personal freedom
Taking sides in personal disputes
Extreme views (some institutions have policies about this)
Unlawful behavior

Objectivity of the administrator means avoiding taking sides in ethical, moral, religious or political debates. Personal views should be kept separate from professional views. However, the extent to which this is possible depends strongly on the individual and organizations have to be aware of this. Some organizations dictate policy for their employees. This is also an issue to be cautious with: if a policy is too loose it can lead to laziness and unprofessional behavior; if it is too paranoid or restrictive it can lead to hostile feelings within the organization. Historically, unhappy employees have been responsible for the largest cases of computer crime.

Note
Almost all the material in this chapter is generic to all variants of Unix. Feel free to substitute the word "Linux" for "Unix" anywhere it's used.

Never-Do's in System Administration

The following is a list of Tom and Brian's NEVER-DO's regarding root access and administrative privileges. Most of these were learned directly from the school of hard knocks - in other words, we've made our share of errors in the past and (in most cases) learned our lessons.

The root account has unlimited privileges. Never log into the system as the root user. Use the su command to gain root privileges when you need them and quit immediately after completing the task at hand. And never run ordinary programs with root privileges - doing so increases the risk of a system being compromised by a virus or a program accessing data with inappropriate permissions.
Never leave a root shell running on a machine you're not actively using. If you forget you're working in the "big stick" account, it's easy to do something destructive under root privileges. (In other words, "Put the sledgehammer down when you are not using it!")
Never leave services running if they are not used for anything. All running services provide a potential back-door into the system for intruders.
Never give users physical access to a machine that stores important data. If users can touch the system, it's theirs.
Windows 95, Windows 98, and the Macintosh OS are inherently insecure systems. They cannot be secure by virtue of their design (they have no access control of any kind - all access is privileged). When setting up a network in a potentially hostile environment, use an operating system like Unix (or NT, if corporate policy forbids all but MS products). Put insecure machines on a subnet secured at its entry points. (This is our "protect the weak and helpless" rule.)

Knowing Your Network

System administration requires its operatives to know a lot of facts about hardware and software. The road to real knowledge is long and winding; where's the best place to start?

A top-down approach is useful for understanding a network's interrelationships. The best place to start is at the network level. In most daily situations, one starts with a network already in place - that is, one doesn't have to build one from scratch. It is important to know what hardware one has to work with and where everything is to be found; how it is organized (or not), and so on. Here's a general checklist:

How does the network fit together? (What is its topology?)
Which function does each host/machine have on the network?
Which machines supply which network services?

Having thought about the network as a whole, now think about individual hosts/machines. First there's a hardware list.

What kinds of machines are on the network? What are their names and addresses and where are they? Do they have hard disks? How big? What's the current BIOS level? How much memory do they have? What brand of NIC is installed and how is it configured? Some devices allow firmware to be updated (for example, CD-ROMs and video cards); what revision is installed and are there updates available from the vendor?
What operating systems are running on the machines?
Which kinds of network cables are in use? Is it thin/thick Ethernet? Is it a star net (hubs/twisted pair), or fiber optic FDDI net?
Where are hubs/repeaters/the router or other network control boxes located? Who is responsible for maintaining them?
Who is responsible for determining if a hardware or software upgrade is justified?
Which machines, if any, are covered by service contracts? What are the details of the contract?
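
As a starting point for the hardware list, a short shell function can collect the basics from each host. The sketch below is only an illustration: the function name host_inventory is our own invention, and it reports just what uname, df, and (on Linux) /proc/meminfo can tell it. Details like BIOS level, NIC configuration, or firmware revisions still have to be recorded by hand or with vendor tools.

```shell
#!/bin/sh
# host_inventory: hypothetical helper that prints a few basic facts
# about the machine it runs on, suitable for pasting into your notes.
host_inventory() {
    echo "hostname: $(uname -n)"
    echo "os: $(uname -s) $(uname -r)"
    echo "arch: $(uname -m)"
    # /proc/meminfo is Linux-specific; skip it quietly elsewhere.
    if [ -r /proc/meminfo ]; then
        echo "memory: $(awk '/^MemTotal:/ {print $2, $3}' /proc/meminfo)"
    fi
    # df -P gives one line per mounted filesystem in a portable format.
    echo "filesystems:"
    df -P | sed 's/^/  /'
}

# Example use: append a dated snapshot to a per-host notes file.
#   { date; host_inventory; } >> "$HOME/inventory-$(uname -n).txt"
```

Running the function on each machine and keeping the output alongside your handwritten notes is one low-effort way to start a hardware list.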

Then there's the network implementation list:

How many different subnets does your network have?
What are their network addresses?
What are the router addresses (the default routes) on each segment?
What is the netmask?
What broadcast address convention is used? 255 or the older 0?
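
The relationship between a host address, its netmask, and the network and broadcast addresses is simple bit arithmetic, and can be sketched in portable shell. The function names below are hypothetical and the addresses are made-up examples. The code assumes a shell with 64-bit arithmetic and dotted-quad input without leading zeros. It computes the "255" style of broadcast address (host bits all ones); under the older "0" convention the broadcast address is the same as the network address.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
# The subshell body "( ... )" keeps the IFS change local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the network and broadcast addresses for an address/netmask pair.
subnet_info() (
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    # Network address: keep only the bits covered by the netmask.
    network=$(( $ip & $mask ))
    # Broadcast address ("255" convention): set every host bit to one.
    broadcast=$(( $network | (~$mask & 4294967295) ))
    echo "network=$(int_to_ip "$network") broadcast=$(int_to_ip "$broadcast")"
)

# Example:
#   subnet_info 192.168.1.37 255.255.255.0
```

For the example address, the function reports network 192.168.1.0 and broadcast 192.168.1.255.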

And there are many other such questions, such as... Which machines provide the key services on the network? What kind of file sharing services are in use? (NFS? Samba shares? NT shares?) What permissions are allowed on the shares? Has anyone checked these permissions for accuracy lately? Does the network have its own DNS server? If not, who provides DNS services? Does the company run its own web server? Is it secured? Which machines provide firewall services? When's the last time the firewall was tested against an attack scenario?

Accurately documenting both machines and the network itself is extremely important to all installations, large and small. Reviewing the documentation (or creating it if it doesn't exist) is a good way for an administrator to become familiar with the tools at hand, how they're currently being utilized, and how they might be better deployed.

Finding and recording the above information is not only an important learning process; an accurately documented network is also crucial. The information changes as time goes by. Networks are not static; they grow and evolve with time.

File System Planning

All operating systems are based on some form of hierarchical file system, and use directories and subdirectories to sort and organize the data they contain. Disks can also be divided up into logical partitions. The key reason behind directories and partitions is to keep logically distinct files separate from each other. For example, you can have the following directory structure:

User home directories
Development work
Commercial software
Free software
Local scripts and databases

One of the challenges of system design is to find a directory structure for data that's simple, flexible, and allows users to immediately grasp where to find data and where the administrator expects them to save new files.

Hint
Data, programs, and operating system files should always be segregated at the very least by directory tree, and preferably by disk (or at least by disk partition). Mixing data and program files with the operating system tree makes upgrading or re-installing an OS unnecessarily difficult.

It makes no sense to mix logically separate file trees. Operating systems usually have a special place for installed software. Regrettably, some vendors break this rule and mix software with the operating system's file tree. On Unix machines, the place for installed software is traditionally /usr/local; fortunately separate disk partitions can be placed anywhere in the file tree on a directory boundary, so this is not a practical problem as long as everything lies under a common directory. Under NT, software is often installed in the same directory as the operating system itself; also, thanks to Microsoft's concept of a central "Registry," re-installation of NT means re-installation of all program software as well.

The Filesystem Hierarchy Standard
The locations for the installation of various file types are delineated by the Linux Filesystem Hierarchy Standard (FHS), which is found at http://www.pathname.com/fhs/. One area of confusion is the location of locally installed software (as opposed to software that comes with the distribution). Our reading of the FHS indicates that small tools, utilities, and assorted add-in software should indeed be placed in /usr/local. The /opt subdirectory is reserved for application-level software, like the office suites that we reviewed in Chapter 11.

Data files installed or created locally are not usually subject to any location constraints; they can reside anywhere. One can therefore find a naming scheme that gives the system logical clarity. This benefits both users and system management. Again, directories are typically used for this purpose. Operating systems that are descended from DOS also have the concept of drive designators like A:, B:, C: and so on. These are assigned to different disk partitions. Some Unix operating systems have virtual file systems that allow administrators to add disks transparently without ever reaching a practical limit. Users never see partition boundaries. This has both advantages and disadvantages, since small partitions are a cheap way to contain groups of misbehaving users without resorting to disk quotas.

Each operating system has a model for laying out its files in a standard pattern, but user files are usually left unspecified. The naming convention and layout of these files is generally left to the discretion of the system administrator. Choosing a sound layout for data can make the difference between incomprehensible chaos and a neat orderly structure. An orderly structure is beneficial not only to the users of the system, but also when making backups. Some relevant issues are as follows:

Disk partitions are associated with drives or directory trees when connected to operating systems. These need names.
Naming schemes for files and disks are operating system-dependent.
The name of a partition should reflect its function or contents.
In a network, the name of a partition should contain the name of the host.

It is good practice to consolidate file storage into a few special locations rather than spreading it out all over the network. Data kept on many machines can be difficult to manage compared to data collected on a few dedicated file servers. Also, insecure operating systems offer no protection for files on a local disk.

The site/host/purpose model of organizing a filesystem is but one of many. It has an advantage over some other schemes in that anyone who knows the naming rules can quickly identify the host and function of a network resource. Also, it falls nicely into a hierarchical directory pattern. A simple but effective implementation is to use a three-level mount point for adding disks: each user disk is mapped onto a directory with a name of the form /site/host/purpose. This approach works well even for large organizations and can be extended in obvious ways.

Within an organization, using this structure provides a global naming scheme, like those used in true network filesystems like AFS, NFS, and DFS. These use the name of the host on which a resource is physically located to provide a point of reference. This is also an excellent way of labeling backups of partitions since it is then immediately clear where the data belongs. A few rules of thumb allow this naming scheme to live painlessly alongside traditional Unix naming schemes.

When mounting a remote filesystem on a host, the client and server directories should always have exactly the same name. Anything else only causes confusion and problems later on.
The name of every filesystem mount point should be unique and tell the user something meaningful about where it is located and what its function is.
Symbolic links can be used to map programs that insist on installing themselves in non-standard locations to common locations like /usr/local.

It doesn't matter whether software compiles the path names of special directories into its binaries, as long as you follow the previous points. The first link in the mount point is the part of the organization or site that the host belongs to, the second link is the name of the host to which the disk is physically connected, and the third and final link is a name that reflects the contents of the partition. Some examples are as follows:

/syroidmanor/janus/local/usr
/syroidmanor/hydras/home
/syroidmanor/hydras/storage

/orbdesigns/gryphon/home
/orbdesigns/grendel/data
/orbdesigns/grendel/db
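
The naming rule is mechanical enough to script. As a rough sketch (the function names are ours, and the NFS mount options shown are placeholders to adapt, not recommendations), the following shell functions build a mount-point name from its three links, split one back apart, and print an /etc/fstab-style line that mounts a remote filesystem under the same name on the client as on the server:

```shell
#!/bin/sh
# Build a /site/host/purpose mount-point name from its three links.
mount_point_name() {
    printf '/%s/%s/%s\n' "$1" "$2" "$3"
}

# Split a mount-point name back into its parts.
# The subshell body "( ... )" keeps the IFS change local.
describe_mount_point() (
    IFS=/
    set -- $1    # the leading slash yields an empty first field
    echo "site=$2 host=$3 purpose=$4"
)

# Print an fstab-style line for mounting host's disk over NFS,
# using exactly the same directory name on client and server.
# "defaults 0 0" is a placeholder; choose options to suit your site.
nfs_fstab_line() {
    mp=$(mount_point_name "$1" "$2" "$3")
    printf '%s:%s %s nfs defaults 0 0\n' "$2" "$mp" "$mp"
}

# Examples:
#   mount_point_name orbdesigns grendel data
#   describe_mount_point /syroidmanor/hydras/home
#   nfs_fstab_line syroidmanor hydras home
```

Generating names this way, rather than typing them, is one way to keep client and server directories identical, as the first rule of thumb above requires.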

The problem of drive names under NT and Windows is awkward, especially if your goal is Linux/NT interoperability. In practice, many networks based on NT and Windows use Microsoft's model throughout, and while it might not gleam with elegance, it does the job. The problem of backups is confined to the domain servers, so the fact that Windows is not a fully distributed operating system restricts the problem to manageable proportions.

Installing Software

Unlike many other systems, Unix and its newer sibling Linux are often used by people who write their own software rather than relying on off-the-shelf products. The Internet contains gigabytes of software for Unix systems that is completely free. Large organizations, such as oil companies and newspapers, can afford off-the-shelf software for Unix, but most individuals can't.

There are therefore two kinds of software installation: the installation of commercial software and the installation of freeware (free either as in the public domain, or as in licensed under a sharing license like the GNU GPL). Commercial software is usually installed from a CD by running a simple script and by following the instructions carefully; the only decision you need to make is where you want to install the software. Freeware usually comes as source code and must therefore be compiled. Unix programmers have gone to great lengths to make this process as simple as possible for system administrators.

Structuring installed software

The first step in installing software is to decide where you want to keep it. You can, of course, put the software anywhere but keep the following in mind:

You should keep third-party software separate from the operating system installed files. This allows the OS to be reinstalled or upgraded without affecting your software installation.
Compiled software should be grouped together, with a bin directory and a lib directory so that binaries and libraries conform to the usual Unix conventions. This makes the system more consistent and easier to understand, and because programs are centralized, it also makes it easier to administer the PATH for user commands.
You should try to keep files and programs that are host-specific separate from files that could be used anywhere. One common approach is to put host-specific files under /usr/local and third-party applications under /opt.

The directory traditionally chosen for installed software is called /usr/local. Within that sub-hierarchy, create the sub-directories /usr/local/bin and /usr/local/lib, and so on. Linux has a de-facto naming standard (set out by the aforementioned FHS) for directories that you should try to stick to so others can quickly grasp the structure of a system.

bin - Binaries or executables for normal user programs
sbin - Binaries or executables for programs that only system administrators require. Those files in /sbin are often statically linked to avoid problems with libraries which lie on unmounted disks during system boot
lib - Libraries and support files for special software
etc - Configuration files
share - Files which might be shared by several programs or hosts: For instance, databases or help information; other common resources

Below is one suggestion for structuring installed software.

                     /usr/local
                         |
       ---------------------------------------------
       |                 |                     |
      bin/              gnu/bin              site/bin
      lib/              gnu/lib              site/lib
      etc/              gnu/etc              site/etc
      sbin/             gnu/sbin             site/sbin
      share/            gnu/share            site/share
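
The layout in the diagram can be created in one pass with mkdir -p. The following is a minimal sketch, not a required procedure: the function name make_software_tree is hypothetical, and the category names gnu and site are simply the ones used in the diagram. The function takes the root of the tree as an argument so that it can be tried out in a scratch directory before being pointed at /usr/local.

```shell
#!/bin/sh
# Create bin, lib, etc, sbin, and share directly under the given root,
# and again under its gnu/ and site/ sub-categories.
make_software_tree() (
    root=${1:-/usr/local}
    for category in "" gnu site; do
        for sub in bin lib etc sbin share; do
            # ${category:+...} adds "gnu/" or "site/" only when set.
            mkdir -p "$root/${category:+$category/}$sub" || exit 1
        done
    done
)

# Example (try it somewhere harmless first):
#   make_software_tree /tmp/scratch/usr/local
```

Because mkdir -p leaves existing directories alone, the function can be rerun safely on a tree that is already partly in place.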

In this example, we use three categories: regular installed software, GNU software, and site software. The reasons for this are as follows:

/usr/local is the traditional place for software that does not belong to the OS. You could keep everything here, but you will end up installing a lot of software after a while, so you might like to create two other sub-categories.
GNU software, which is written by the Free Software Foundation, forms a self-contained set of tools that replace many of the older Unix equivalents, like ls and cp. GNU software has its own system of installation and set of standards. GNU will also eventually become an operating system in its own right, and should therefore be kept separate.
Site-specific software includes programs and data that you build locally to replace, accompany, or work in conjunction with the software and data supplied with your operating system. It also can include special data like the database of aliases for e-mail and the DNS tables for your site. Since it is special to your site, you should keep it separate so that it can be backed up separately and you always know where to find site-specific stuff.

When installing software, you will usually be asked for the name of a prefix or location for the package. The prefix in the above cases is /usr/local for ordinary software, /usr/local/gnu for GNU software, and /usr/local/sitename for site-specific software. Most software installation scripts place executables under bin and libraries under lib automatically.

To begin compiling software, you should always start by looking for a file called README or INSTALL. This will tell you what you have to do to compile and install the software. In most cases, you will only have to type a couple of commands, as in the following example.

GNU software example

The following example illustrates the GNU method of installing software from sources. The steps are:

Collect the software package by ftp from a site like ftp.uu.net or ftp.sourceforge.net. Use a program like ncftp for painless anonymous login.
Unpack the file using tar zxvf software.tar.gz.
Enter the directory that is unpacked, cd software.

Read the installation directions, readme files, and type ./configure --help to get guidance on compilation options.

Type ./configure [options].
Type make.
If all goes well, type make install. This should be enough to install the software.
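
The steps above, from unpacking onward, can be sketched as a single shell function. This is an illustration under stated assumptions rather than a universal installer: the names run, install_from_source, and DRY_RUN are our own, software.tar.gz is the placeholder archive name from the steps, and the function assumes the archive unpacks into a directory named after itself. Setting DRY_RUN=1 prints the commands instead of running them, which is a convenient way to check the prefix before anything is touched.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unpack, configure, build, and install a GNU-style source package.
#   $1 = path to the .tar.gz archive
#   $2 = installation prefix (defaults to /usr/local)
# The subshell body "( ... )" keeps the cd from affecting the caller.
install_from_source() (
    tarball=$1
    prefix=${2:-/usr/local}
    # Assumption: software.tar.gz unpacks into a directory "software".
    srcdir=$(basename "$tarball" .tar.gz)
    run tar zxvf "$tarball" || exit 1
    run cd "$srcdir" || exit 1
    run ./configure --prefix="$prefix" || exit 1
    run make || exit 1
    run make install
)

# Example: preview the commands for a GNU package.
#   DRY_RUN=1; install_from_source software.tar.gz /usr/local/gnu
```

Reading the README and the output of ./configure --help is still a manual step. If the prefix is writable only by root, the final step can be run on its own with something like su -c 'make install', in keeping with the earlier advice to hold root privileges only for as long as a task requires.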

Some installation scripts leave files with the wrong permissions so ordinary users cannot access the files. Always check new program installations and make sure the executable binaries (the ones you want average users to run, that is) have file permission of 755. Again, most installation scripts ensure the correct permissions bits are set, and if for some reason they are not, this is usually noted in the program's documentation.
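
A quick way to carry out this check is with find. The sketch below uses two hypothetical helper names, list_wrong_perms and fix_bin_perms; only the find and chmod commands themselves are standard. Note the caution in the comments: the "fix" resets every file it finds to 755, including any that are deliberately setuid or setgid, so look at the list before applying it.

```shell
#!/bin/sh
# List regular files under a directory whose mode is not exactly 755.
list_wrong_perms() {
    find "$1" -type f ! -perm 755
}

# Reset those files to 755.
# Caution: this also clears setuid/setgid bits on any file it touches,
# so review the output of list_wrong_perms first.
fix_bin_perms() {
    find "$1" -type f ! -perm 755 -exec chmod 755 {} +
}

# Example:
#   list_wrong_perms /usr/local/bin
```

Point these only at directories of user-runnable programs, such as a bin directory; library and configuration files usually do not need the execute bit.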

This procedure should be more or less the same for just about any software an administrator encounters. Older software packages sometimes provide only Makefiles that you must customize yourself. Some X11-based windowing software requires you to use the xmkmf (X make-makefiles) command instead of configure. You should always consult a program's INSTALL file.

Installing shared libraries

Systems that use shared libraries or shared objects sometimes need to be reconfigured as new libraries are added to the system. This is because the names of the libraries are cached for fast access. A library installed in a non-standard directory will generally not be found if it is not in the cache file.

To register a library directory with the operating system, add it to /etc/ld.so.conf. Then run ldconfig to update the /etc/ld.so.cache file. ld.so is the Linux dynamic linker, which loads the shared code libraries that a program needs. It consults /etc/ld.so.cache before falling back to default locations such as /lib and /usr/lib.
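
Registering a directory can be sketched as a small shell function. The name register_lib_dir is hypothetical, and /usr/local/lib is only an example directory. The function takes the configuration file as an optional second argument so that it can be tried against a scratch copy first, and it appends the directory only if it is not already listed, so running it twice does no harm.

```shell
#!/bin/sh
# Append a directory to an ld.so.conf-style file unless it is
# already listed there on a line of its own.
#   $1 = library directory to register
#   $2 = configuration file (defaults to /etc/ld.so.conf)
register_lib_dir() (
    dir=$1
    conf=${2:-/etc/ld.so.conf}
    grep -qxF "$dir" "$conf" 2>/dev/null || echo "$dir" >> "$conf"
)

# Example, as root:
#   register_lib_dir /usr/local/lib
#   ldconfig        # rebuild /etc/ld.so.cache
```

After ldconfig has run, ldconfig -p prints the libraries currently in the cache, which is a convenient way to confirm that a new library was picked up.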

Beating you about the head...
Always scour a program's README file before installation - especially when you're updating an existing version. On occasion, new libraries are required by program updates and failure to install them will either break the product completely, or leave it in a very unstable state. This is especially relevant to administrators who prefer to compile programs from source code, as opposed to package releases that incorporate automatic dependency resolution or notification.

The ever-present upgrade dilemma

Some software (especially some free software) suffers from frequent update cycles. You could easily spend your entire life just chasing the latest versions of your favorite software packages. Don't!

It is a waste of time.
Sometimes new versions contain more bugs than the old one, and an even-newer-version is just around the corner.
Users will not thank you for routinely changing the software they use every day. Stability is a virtue in the eye of the user.

On the other hand, some program updates contain important security fixes or features that benefit user productivity. There's a fine line here, of course, and each administrator has to weigh an update's benefits against the time involved in applying it and potential user re-training or re-familiarization.

We do not recommend upgrading front-line server applications "cold turkey." Ideally, install the update on a second tier server and test it thoroughly before putting it into active duty. If extra hardware is at a premium or simply unavailable, consider installing the product in a different location for testing purposes and run it in conjunction with your existing version. If neither of these options works for you, at the very least don't install an upgrade the day it hits the streets - monitor the product's mailing lists, and let other people do the testing for you.

Summary

In this chapter we examined several philosophical issues pertaining to the role of system administrator, and looked at three generic topics regarding efficient system design and implementation: getting to know the topology of a network, planning a file system, and installing software.

System administration is a complex topic, and as such, demands a diverse skill-set. Beyond the obvious hardware/software knowledge, administrators also need to be able to effectively design a network, interact with users, ask questions, and listen to the answers received.
We provided our own list of NEVER-DO's based on experience and past mistakes. The bottom line is that an administrator holds the keys to the kingdom, and should keep this in mind every time the su command is issued. Don't leave open root shells lying around, and don't give average users physical access to a system containing data.
Getting to know an unfamiliar network is best approached from the "top down." Study the big structures, and work your way down to specific details.
Effective file system design is 90 percent careful planning, and 10 percent implementation.
An example was provided based on the popular site/host/purpose naming schema. The benefits of this approach are: scalability, consistency, and immediate identification of a given filesystem's purpose.
Operating systems, applications programs, and user data should all reside on separate file systems, and ideally on separate disks and/or partitions.
Mixing operating system files with application and data makes upgrading an OS unnecessarily complex and difficult.
The application programs are, by long-standing convention, typically put under the /usr/local filesystem. This convention can be further enhanced by categorizing applications based on their source, as well as whether they are machine-specific or available to network users.
Program upgrades should be carefully considered before being applied. Security updates aside, "if it ain't broke, don't fix it" is a good rule of thumb to follow unless the upgrades add needed functionality or features.
